189 Commits

Author SHA1 Message Date
108b7aa331 Describe version test-v6 2017-10-28 18:46:30 +02:00
a3adf31b89 Merge branch 'fs#86-activate-status-api' into develop 2017-10-28 18:44:42 +02:00
614f288bb9 Activate jogobot status api for onwiki disabling
Related Task: [FS#86](https://fs.golderweb.de/index.php?do=details&task_id=86)
2017-10-28 18:44:05 +02:00
c450a045bf Merge branch 'fs#159-space-before-anchor' into develop 2017-10-28 18:43:13 +02:00
84802cf521 Remove leading or trailing spaces in articles
Some articles contain spaces between the title and the anchor part, which are now stripped

Related Task: [FS#159](https://fs.golderweb.de/index.php?do=details&task_id=159)
2017-10-28 18:41:06 +02:00
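The stripping described in this commit can be sketched as follows (a minimal illustration with a hypothetical helper name, not the bot's actual code):

```python
def strip_anchor_spaces(title):
    """Remove spaces around the '#' separating title and anchor.

    "Foo Bar # Some anchor" -> "Foo Bar#Some anchor"
    """
    page, sep, anchor = title.partition("#")
    return page.rstrip() + sep + anchor.lstrip()
```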
5f6c443ba8 Merge branch 'test-v5' back into develop 2017-10-28 18:17:01 +02:00
0c135ef1bb Describe version test-v5 2017-09-23 23:50:42 +02:00
8b8221cfcd Merge branch 'fs#152-respect-always-flag' into develop 2017-09-23 23:49:59 +02:00
bdccc8417c Set always in Pywikibot.Basebot
If the cmdline param -always is set, set the related option in the
Pywikibot.Basebot object for automatic edits without further prompts

Related Task: [FS#152](https://fs.golderweb.de/index.php?do=details&task_id=152)
2017-09-23 23:49:41 +02:00
a70835c58a Merge back branch 'test-v4' into develop 2017-09-23 23:48:25 +02:00
ec2b84df2a Add requirements
To make setup of environment for this module easier
2017-09-23 21:09:58 +02:00
88848cb084 Prepare Version test-v4 for release
Add a README.md file for this project
2017-09-23 20:32:13 +02:00
5057aed0d3 Merge branch 'fs#157-lowercase-title' into develop 2017-09-09 21:47:03 +02:00
02e53475f1 Prevent lowercase article titles in Parser
Since genuinely lowercase article titles are not allowed, make sure to
convert the first letter of every article title to uppercase. This is
necessary since pywikibot may return article titles with a lowercase
first letter.

Related Task: [FS#157](https://fs.golderweb.de/index.php?do=details&task_id=157)
2017-09-09 21:35:36 +02:00
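The first-letter conversion can be sketched like this (hypothetical helper, not the bot's code; note that `str.capitalize()` would be wrong here because it also lowercases the rest of the title):

```python
def ucfirst(title):
    # MediaWiki titles are case-insensitive only in their first letter,
    # so uppercase just that one.  str.capitalize() would turn
    # "iPhone" into "Iphone" instead of "IPhone".
    return title[:1].upper() + title[1:]
```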
d6f9b460c9 Merge branch 'fs#156-dbapi-charset' into develop 2017-09-02 22:13:20 +02:00
ff03ca8f13 Explicitly set charset for PyMySQL-Connection
Since the PyMySQL connection otherwise uses the charset 'latin-1',
explicitly set the connection charset to 'utf8'

http://docs.sqlalchemy.org/en/rel_1_0/dialects/mysql.html#charset-selection
http://docs.sqlalchemy.org/en/rel_1_0/core/engines.html?highlight=url#sqlalchemy.engine.url.URL

Related Task: [FS#156](https://fs.golderweb.de/index.php?do=details&task_id=156)
2017-09-02 22:10:25 +02:00
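Following the SQLAlchemy docs linked above, the charset can be forced via the connection URL; a sketch with hypothetical credentials and database names:

```python
from sqlalchemy import create_engine

# Hypothetical user/host/db names; the charset=utf8 query parameter is
# the relevant part -- without it PyMySQL may fall back to latin-1.
engine = create_engine(
    "mysql+pymysql://bot:secret@localhost/redbot?charset=utf8"
)
```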
88692ca678 Merge branch 'fs#155-article-surouding-space' into develop 2017-09-02 22:08:31 +02:00
d9b4fcc0bd Strip spaces before adding articles to redfam
Some article links have surrounding spaces in their linktext. Remove them
before adding the article to the RedFam to have a canonical title

Related Task: [FS#155](https://fs.golderweb.de/index.php?do=details&task_id=155)
2017-09-02 22:06:30 +02:00
22ff78ea98 Merge branch 'fs#154-categorie-colons-missing' into develop 2017-09-02 16:02:45 +02:00
b3cfcdc259 Improve title detection to get correct behaviour
Make sure that category links start with a colon and that non-article
pages are returned with their namespace.

Related Task: [FS#154](https://fs.golderweb.de/index.php?do=details&task_id=154)
2017-09-02 15:59:34 +02:00
b3e0ace2f4 Merge branch 'fs#153-nested-templates' into develop 2017-09-02 14:25:21 +02:00
f8002c85da Do not search for templates recursively
Since nested templates do not get an index in the global wikicode object,
searching for the index of a nested template results in a ValueError

Related Task: [FS#153](https://fs.golderweb.de/index.php?do=details&task_id=153)
2017-09-02 14:23:25 +02:00
49bc05d29b Merge branch 'fs#151-normalize-article-titles-anchor' into develop 2017-09-02 13:36:17 +02:00
8a26b6d92a Normalize article titles with anchors
In our db, article titles with anchors are stored with underscores in
the anchor string. Therefore we need to replace the spaces in the anchor
string given by pywikibot.Page.title().

Related Task: [FS#151](https://fs.golderweb.de/index.php?do=details&task_id=151)
2017-08-25 18:11:41 +02:00
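The normalization above can be sketched as follows (hypothetical helper name; the title part keeps its spaces while the anchor part gets underscores, matching how anchors are stored in the db):

```python
def normalize_anchor(title):
    # Split off the anchor, if any, and replace its spaces with
    # underscores; the title part is left untouched.
    page, sep, anchor = title.partition("#")
    return page + sep + anchor.replace(" ", "_")
```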
49a8230d76 Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-25 17:11:28 +02:00
31c10073a2 Prevent index errors searching for comments
Make sure not to exceed the existing indexes of the wikicode object while
searching for comments

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-25 17:09:38 +02:00
642a29b022 Improve regex for blank lines
Do not match consecutive linebreaks as one

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-24 18:47:18 +02:00
2f90751dc2 Merge branch 'fs#146-famhash-generator' into develop 2017-08-24 12:27:54 +02:00
024be69fe1 Use famhash as generator
If famhash is defined, fetch explicitly that redfam from db and work
only on this

Related Task: [FS#146](https://fs.golderweb.de/index.php?do=details&task_id=146)
2017-08-24 12:27:13 +02:00
b6d7268a7f select by famhash: Add methods to get param in bot
We need a method as a callback to get bot-specific params passed through
to our bot class.
Introduce the -famhash parameter to work on a specific famhash

Related Task: [FS#146](https://fs.golderweb.de/index.php?do=details&task_id=146)
2017-08-24 12:27:13 +02:00
526184c1e1 Merge branch 'fs#148-articles-mixed-up' into develop 2017-08-24 12:26:53 +02:00
3aa6c5fb1c Disable PreloadingGenerator temporarily
The PreloadingGenerator mixes up the yielded pages. This is very
inconvenient for the semi-automatic workflow with manual checks, as the
articles of the RedFams no longer follow each other.

Related Task: [FS#148](https://fs.golderweb.de/index.php?do=details&task_id=148)
2017-08-24 12:23:17 +02:00
ec8f459db5 Merge branch 'fs#138-marked-articles-shown-again' into develop 2017-08-24 12:19:24 +02:00
3b2cb95f36 Do not fetch marked redfams from db
Exclude marked Redfams from DB-Query to prevent marking them again

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-24 12:09:43 +02:00
41e5cc1a9d Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-24 12:06:03 +02:00
9b9d50c4d2 Improve detection of empty lines
Search with RegEx as empty lines could also contain spaces

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-24 12:04:45 +02:00
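A blank-line pattern along these lines (hypothetical, not the bot's actual regex) satisfies both constraints from the FS#141 commits: it tolerates spaces or tabs inside the "empty" line, and it matches each blank line in a run separately instead of swallowing them as one:

```python
import re

# The lookahead leaves the closing newline unconsumed, so each blank
# line in a run of blank lines yields its own match.
BLANK_LINE = re.compile(r"\n[ \t]*(?=\n)")
```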
a755288700 Merge branch 'fs#147-templates-in-heading' into develop 2017-08-23 14:55:43 +02:00
14ec71dd09 Rewrite get_disc_link to handle special cases
Use methods of the pywikibot site object and mwparser to get rid of any
special elements like templates or links in headings when constructing
our disc link.
Replace `&nbsp;` by hand as it would otherwise occur as a normal space
and won't work

Related Task: [FS#147](https://fs.golderweb.de/index.php?do=details&task_id=147)
2017-08-23 14:53:22 +02:00
e283eb78ac Merge branch 'fs#140-also-mark-redirects' into develop 2017-08-22 21:59:22 +02:00
cc02006fd2 Do not exclude redirects from being marked
In accordance with Zulu55, redirect discussion pages should also get
a notice; therefore do not exclude redirects.

Related Task: [FS#140](https://fs.golderweb.de/index.php?do=details&task_id=140)
2017-08-22 21:59:07 +02:00
37b0cbef08 Merge branch 'fs#138-marked-articles-shown-again' into develop 2017-08-22 21:58:22 +02:00
4137d72468 Look for existing notice by simple in-check
To detect notices that may already be present (possibly commented out),
check for them using just a simple Python x in y check over the whole
wikicode

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-22 21:56:43 +02:00
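The in-check amounts to a plain substring test; a minimal sketch (the template name is hypothetical, not necessarily the one the bot uses):

```python
def notice_already_present(talkpage_text, notice):
    # A plain substring check also finds notices that sit inside
    # <!-- ... --> comments, which a template parser would skip.
    return notice in talkpage_text
```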
cd87d1c2bb Fix bug where already marked articles were shown again
Since we search for matching states for articles to include or exclude
in a loop, we could not control the outer loop via the default break/
continue. The Python docs recommend using exceptions and try/except
structures to achieve this most conveniently.

https://docs.python.org/3/faq/design.html#why-is-there-no-goto

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-22 21:45:58 +02:00
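The exception-based loop control from the linked Python FAQ can be sketched like this (illustrative names, not the bot's actual classes):

```python
class ArticleNotReady(Exception):
    """Raised inside the inner loop to leave the outer loop early."""

def all_articles_done(articles, done_states):
    # break/continue only affect the innermost loop; per the Python FAQ,
    # raising an exception and catching it outside is the conventional
    # way to abort a nested loop as a whole.
    try:
        for article in articles:
            for state in article:
                if state not in done_states:
                    raise ArticleNotReady()
    except ArticleNotReady:
        return False
    return True
```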
456b2ba3d4 Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-21 22:11:51 +02:00
47b85a0b5e Add missing line break if there is no template
To make sure our notice template resides on its own line in every case

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-21 22:09:59 +02:00
a6fdc974bd Merge branch 'fs#144-PyMySQL-instead-oursql' into develop 2017-08-21 13:58:34 +02:00
30de2a2e12 Replace oursql with PyMySQL
Since PyMySQL is preferred on Toolforge and works out of the box after
installing via pip, replace oursql, which caused some problems.
In particular, oursql was not able to connect to the db via an ssh tunnel.

Related Task: [FS#144](https://fs.golderweb.de/index.php?do=details&task_id=144)
2017-08-21 13:55:33 +02:00
4a6855cf7b Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-21 13:51:32 +02:00
8422d08cb6 Keep comments and leading templates together
Prevent splitting up existing comments and templates, as those often
document the archive templates' behaviour

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-21 13:49:34 +02:00
ed78501821 Merge branch 'fs#115-lsec-no-template' into test-v3 2017-08-21 13:16:28 +02:00
34e7e0d3be Prevent IndexError if no template in leadsec
Check if there is a template in the leadsec before accessing the list
item to prevent IndexErrors

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=115 FS#115]
2017-03-11 12:22:10 +01:00
f9f081d072 Merge branch 'fs#114-remove-underscores-in-articles' into test-v3 2017-03-11 11:43:49 +01:00
0f930082b4 Also canonicalise anchor parts of articles
Replace spaces in anchors with underscores as spaces are not correct
there

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=114 FS#114]
2017-03-11 11:40:41 +01:00
80c94ccf4f Replace underscores in article titles
Remove underscores in article titles and replace them with spaces to
have a canonical state for all articles.
Therefore we need to split the title and possible anchors in the heading
parser

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=114 FS#114]
2017-03-11 11:30:19 +01:00
37704c6661 Replace pywikibot.showDiff with patched version
Pywikibot.bot.userPut does not support setting the diff context size,
so it is always zero. Therefore we need to patch either userPut or
showDiff to get some context.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=113 FS#113]
2017-03-11 10:39:31 +01:00
4e4be1c6d0 Merge branch 'fs#110-markpages-status-problems' into test-v3 2017-03-11 00:06:00 +01:00
3e69a1c77e Remove problem-indicating states when marked
Remove states which indicate problems in previous runs once the article,
and with it the whole RedFam, has been successfully marked

[https://fs.golderweb.de/index.php?do=details&task_id=112 FS#112]

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=110 FS#110]
2017-03-11 00:03:42 +01:00
56f326b568 Fix error where all current redfams were marked on quit
Restructure update_status to make sure that marked is only set when all
articles are marked or gone (i.e. deleted or redirected)

[https://fs.golderweb.de/index.php?do=details&task_id=111 FS#111]

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=110 FS#110]
2017-03-10 23:45:48 +01:00
868894a38b Format fixes
Set locale to de_DE.utf-8 for whole Task

Make sure Template is added in own source line
2017-03-10 23:28:24 +01:00
65de6decb2 markpages: Filter redirects
Do not mark discussion pages of redirects
2017-03-10 21:51:59 +01:00
889be30a47 Merge branch 'fs#109-redpage-redfam-join-type' into test-v3 2017-03-09 15:33:05 +01:00
147e96d388 Add wrapper class for Parser to RedPage
Add a wrapper class to override the type of items returned by the
RedPage.redfams relationship

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=109 FS#109]
2017-03-09 15:30:51 +01:00
76666aa294 Merge branch 'fs#25-mark-done' into test-v3 2017-03-09 10:50:14 +01:00
db39bb5ff4 Merge branch 'fs#88-mark-pages-bot' into fs#25-mark-done 2017-03-09 10:49:16 +01:00
ec7880207b Merge branch 'fs#95-sqlalchemy' into fs#88-mark-pages-bot 2017-03-09 10:48:10 +01:00
4aaacf1443 Add redfams to redpage-obj after parsing
To have redfams available for updates immediately after parsing.
Duplicate redfams will then be treated as updates.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=108 FS#108]
2017-03-09 10:25:52 +01:00
281f1c49a8 mysqlred: Set family via pywikibot
Get family/language part of table names from PyWikiBot Site
2017-03-09 00:12:41 +01:00
3fe47e666f Fix polymorphism problem with relationships
Since we are using subclasses of the ORM mapped classes, disable
typechecks for ORM relations
2017-03-09 00:12:41 +01:00
e16925197c Fix PEP 8 compliance
To be consistent with the coding style, fix PEP 8 compliance
2017-03-09 00:12:31 +01:00
9ba7d2e517 Change redfam generator filters
Change and clear up the filters in redfam generator to keep track of
article status and use positive conditionals
2017-03-09 00:10:51 +01:00
844fee52ae Make markpages using new DB/Class structure
Update markpages and RedFamWorker-Code to use the new sqlalchemy based
DB ORM Interface
2017-03-09 00:10:51 +01:00
43e31c108a Working RedFamWorker query
Modify RedfamWorker class to work with new DB API
2017-03-09 00:10:50 +01:00
89b50e3312 Remove old status API
Now we use the methods of status object directly
2017-03-09 00:10:50 +01:00
bf8e47f916 Improve new status API
Make sure state changes are only detected as such by sqlalchemy if they
are real changes
2017-03-09 00:10:50 +01:00
467f829af2 Some cleanups
Remove old commented out code from manual mysql solution
2017-03-09 00:10:50 +01:00
6e973369cd sqlalchemy working for parser
Needs some testing, presumably contains some bugs
2017-03-09 00:08:48 +01:00
0ebf307bb8 Add markpages as subtask
Markpages is a subtask of our Red-Bot

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=89 FS#89]

2016-11-05 19:41:22 +01:00
4e4d5005fd Merge branch 'fs#89-article-status' into fs#88-mark-pages-bot 2016-11-05 19:39:30 +01:00
65fb2ecb28 Generate Fam status based on article status
Some article states should be reflected in the RedFam status

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=89 FS#89]
2016-11-05 19:27:56 +01:00
d55c81c97b Set article status when worked on talkpage
To detect whole redfam status after run over all articles

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=89 FS#89]
2016-08-30 18:05:51 +02:00
870ed4bf25 Update redfam.article_generator use article status
To be able to filter articles by status of that article

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=89 FS#89]
2016-08-30 17:47:02 +02:00
e13320820c Add API to manage status per article
To be able to track changes to articles to update redfam status

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=89 FS#89]
2016-08-30 17:45:18 +02:00
4ae562590e Merge branch 'fs#94-data-structure' into fs#88-mark-pages-bot 2016-08-30 17:37:31 +02:00
6149dcdb8b Apply changes to data structure
See related ticket

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=94 FS#94]
2016-08-30 14:28:28 +02:00
f021f2ea60 Merge branch 'fs#93-update-talkpage-template' into fs#88-mark-pages-bot 2016-08-30 12:08:57 +02:00
8c56125a7b Update talkpage notice template
The exact date is not necessary, and the end can be omitted if it is in
the same month

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=93 FS#93]
2016-08-30 12:07:11 +02:00
c19f642d11 Merge branch 'fs#92-mark-done-edit-summary' into fs#88-mark-pages-bot 2016-08-30 11:51:07 +02:00
20b811bc2a Make sure edit summary starts with bot
Due to bot policy all edit summaries of bot edits have to start with
"Bot:"

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=92 FS#92]
2016-08-30 11:48:07 +02:00
59d4d23c83 Set edit summary for each edit
Each bot edit needs an edit summary

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=92 FS#92]
2016-08-30 11:33:54 +02:00
2b93e4cf16 Check if notice is present before add
To prevent duplications we need to check whether the notice is already
present on the talkpage

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 21:39:54 +02:00
9beca7f6c9 Implement method to add notice to disc page
Adds the generated notice to the talkpage and starts saving the page

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 21:33:36 +02:00
c4d8a95672 Implement build_generator-method
Build_generator adds the redfam_talkpages_generator to the genFactory,
builds a generator from the genFactory and sets self.gen, which is used
as the generator for run()

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 18:13:27 +02:00
da4f9b5d6b Add wrapper-generator to redfam.article_generator
We need a wrapper around redfam.article_generator to pass it to
pagegenerators.PageWithTalkPageGenerator and to add a reference to
related redfam to each pywikibot.page-object before yielding it

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 18:09:04 +02:00
ecc78bef96 Import needed modules and add redfams-generator
We will need a couple of modules to build the needed generator
Also we will need a generator with redfams to work with

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 18:01:02 +02:00
efa919ff27 Add new bot with basic structure
We need a bot to work on pages which are subjects of redfams and on the
belonging talk page

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=88 FS#88]
2016-08-28 16:39:32 +02:00
72c6165de8 Merge branch 'fs#87-redfam-article-generator' into fs#25-mark-done 2016-08-28 16:20:53 +02:00
c0b18f88e5 Add filter options to redfam.article_generator
To provide the possibility to filter out non-existing pages or redirect
pages, or vice versa.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=87 FS#87]
2016-08-28 15:06:17 +02:00
e5989305a4 Add a generator to redfam yielding article pages
To work on articles of a redfam a generator which yields belonging
articles is necessary

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=87 FS#87]
2016-08-28 13:20:13 +02:00
8ce6f03641 Merge branch 'fs#29-generate-articledisc-notice' into fs#25-mark-done 2016-08-27 19:50:18 +02:00
6717fa4fba Add method to generate notice for article discpage
We need a method to generate the template to add to article discpages

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=29 FS#29]
2016-08-27 19:49:50 +02:00
8acba7d0f9 Merge branch 'fs#81-get-reddisc-link' into fs#25-mark-done 2016-08-27 19:48:24 +02:00
3723aba578 Add a method to get link to related reddisc
To generate notices or other output it is necessary to add links to the
related reddisc.
This method returns wikilink text pointing to the redfam's reddisc.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=81 FS#81]
2016-08-27 19:48:03 +02:00
9d3bc74c80 Merge branch 'fs#26-done-redfam-gen' into fs#25-mark-done 2016-08-27 19:46:41 +02:00
b36dc250d2 Request information about reddisc page for redfams
To generate links to the related reddisc it is necessary to have at least
the title of the related reddisc page. As storing the same data twice in
the db would be worse, we retrieve it via a join from the red_pages table

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
4055dc52d8 Make it possible to get a RedPage-Object by pageid
When working on redfams it is necessary to have information about redpage

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
594130c8a6 Restore changes from 45df35431
Documented to prevent them from being deleted again

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
b271a0b0b1 Add generator wrapper to fetch RedFams by status and ending
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
ad088126e7 Define method to update Status after Working with RedFam
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
151c22a735 Add fetched mysql_data to _mysql-Object of parent class for using change-method to update db
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
a97d8c722e Move handling of mysql-Connection from RedFamParser and RedFamWorker to RedFam-Class and make it protected instead of private
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
58dfd8c86a For RedFamilies not fetched individually we need to provide the fam hash as index
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
9481116777 Add new generator-method to fetch RedFams by Status and Ending
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=26 FS#26]
2016-08-27 19:44:33 +02:00
eaa7596a8f Merge branch 'fs#70-refactoring' into test-v3 2016-08-27 19:40:00 +02:00
449d83d7b5 Merge branch 'fs#82-subtask-wrapper' into fs#70-refactoring 2016-08-27 19:19:40 +02:00
4ac9b305f5 Merge branch 'fs#85-move-start-api-to-jogobot' into fs#82-subtask-wrapper 2016-08-27 19:17:10 +02:00
604b7bd8b7 Now use Bot-Start API from jogobot framework
API was moved to jogobot to share with other tasks

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=85 FS#85]
2016-08-27 19:04:13 +02:00
d0fa15d0ed Update jogobot module to get the standard Start-API
[FS#84]

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=85 FS#85]
2016-08-27 18:50:36 +02:00
71e41bfed3 Merge branch 'fs#83-wrapper-compatibility' into fs#82-subtask-wrapper 2016-08-27 18:24:14 +02:00
2be0a8903d Adjust constructor for wrapper-script
The new wrapper-script calls a standardized API.
We need to conform to that

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=83 FS#83]
2016-08-27 17:02:51 +02:00
0ceb2e6e83 Add methods to build gen to DiscussionParser
With the new wrapper script the bot gets a GenFactory and has to build
a generator from it on its own

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=83 FS#83]
2016-08-27 16:58:20 +02:00
3540cc2a7d Move functional sections to functions in main()
To make the main() function less complicated, functional sections are
moved to dedicated functions

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
460d2db183 Add Bot run with exception handling
Errors, especially those caused by a missing run method, need to be
caught to provide information in the logfile, and also to know whether
the bot run was successful

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
156f117b18 Add Bot initiation with exception handling
Bot initiation needs to catch errors from the bot to enforce at least
basic logging, and also to be sure init was successful before starting
the bot.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
1679e2ad6a Prepare environment for starting subtasks
Before we init and run the bot we need to provide an environment for it,
like parsed args

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
b88efb6bdd Reflect structure changes in code
Since the bot class was moved to a separate dir/file we need to make
some changes to rebuild the functionality

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
177a8f920f Prepare new structure to use subtasks
To have only one entry point for the bot we want a single file (red.py)
which calls the specific task class from the bots dir with a
standardized call

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=82 FS#82]
2016-08-27 15:40:09 +02:00
0549cbd2c2 Merge branch 'fs#80-remove-deprecated-methods' into fs#70-refactoring 2016-08-25 22:43:57 +02:00
78eda10562 Remove deprecated methods
Deprecated functions which are no longer used can be removed to make the
code clearer and improve maintainability

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=80 FS#80]
2016-08-25 22:41:13 +02:00
510771509b Merge branch 'fs#79-mysql-table-prefix' into fs#70-refactoring 2016-08-25 13:15:23 +02:00
71b99b5f58 Delay definition of db_table_prefix
db_table_prefix should be defined at init of MysqlRed and not at import
time, so that cmdline args are already parsed.
Otherwise it uses the default family

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=79 FS#79]
2016-08-25 13:06:32 +02:00
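The init-time prefix can be sketched as follows (a minimal illustration assuming a `site.dbName()` accessor as in pywikibot; `FakeSite` is a stand-in, not part of the bot):

```python
class FakeSite:
    """Stand-in for pywikibot.Site; only dbName() matters here."""
    def __init__(self, dbname):
        self._dbname = dbname

    def dbName(self):
        return self._dbname

class MysqlRed:
    # Compute the prefix in __init__, not at import time: the cmdline
    # args selecting the wiki are only parsed after the module import,
    # so an import-time prefix would always use the default family.
    def __init__(self, site):
        self.db_table_prefix = site.dbName() + "_"
```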
77d1de4473 Add a tablename prefix depending on Site
To be able to run the bot on different wikis, the db tables should be
named depending on the pywikibot.Site and changed automatically

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=79 FS#79]
2016-08-24 23:53:10 +02:00
cac04f344f Merge branch 'fs#74-helpermodules-lib' into fs#70-refactoring 2016-08-24 23:09:33 +02:00
e28acf88d1 Introduce new directory structure
To clarify which is a bot and which are helper scripts

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=74 FS#74]
2016-08-24 22:41:41 +02:00
af48888535 Merge branch 'fs#78-redfam-section-false-positives' into fs#70-refactoring 2016-08-24 20:05:04 +02:00
ac54aea698 Use callback to detect redfam.section
Detecting redfam sections via RegExp caused some false positives due to
badly formatted wikisyntax. See task.

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=78 FS#78]
2016-08-24 20:02:48 +02:00
2deb02fe47 Merge branch 'fs#77-errors-on-old-archives' into fs#70-refactoring 2016-08-24 19:59:53 +02:00
1e4c8646bf Reparse redfam-heading with mwparser
See the related ticket for a detailed failure explanation

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=77 FS#77]
2016-08-24 19:57:25 +02:00
fe2810f07c Merge branch 'fs#76-redfam-without-dates' into fs#70-refactoring 2016-08-24 17:01:00 +02:00
ab430e0085 Use month of reddisc as beginning if missing
Construct a fictive but sensible beginning if we can't detect one.
Needed since the beginning is mandatory

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=76 FS#76]
2016-08-24 16:56:54 +02:00
95be313859 Pass reddisc pywikibot.page object to redfam
To access page information of the reddisc page, like the page title
(e.g. to get dates from it)

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=76 FS#76]
2016-08-24 16:53:45 +02:00
0bb0b2d957 Make sure var beginning is always defined
To prevent UnboundLocalErrors caused by using the undeclared variable
beginning if the redfam section does not contain any timestamp

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=76 FS#76]
2016-08-24 16:51:23 +02:00
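The fix pattern amounts to assigning a default before the loop; a minimal sketch (hypothetical helper, not the bot's parser):

```python
import datetime

def extract_beginning(timestamps):
    # Assign a default before the loop: if the section contains no
    # timestamp at all, reading "beginning" afterwards would otherwise
    # raise UnboundLocalError.
    beginning = None
    for ts in timestamps:
        if beginning is None or ts < beginning:
            beginning = ts
    return beginning
```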
db32c9e8f6 Merge branch 'fs#75-mysql-flush-error-false-reddisc' into fs#70-refactoring 2016-08-24 16:49:37 +02:00
bd2d221c48 Prevent flush from creating cursor without con
MysqlRed.flush() tried to create a cursor in any case. If there was no
connection (because the subclasses haven't been instantiated), an oursql
error occurred.
Instead, check first whether there is a connection and otherwise raise an error

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=75 FS#75]
2016-08-24 16:46:51 +02:00
ee8ebbc8bc Only flush db if there are redfams
To avoid doing unnecessary work and trying to use a non-existing
db connection

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=75 FS#75]
2016-08-24 15:47:43 +02:00
dcc4851513 Check reddisc page titles against regex
To prevent parsing pages which have been wrongly categorized in the
configured cats or were given via cmd params.
Parsing them results in unexpected behaviour

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=75 FS#75]
2016-08-24 15:27:42 +02:00
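Such a title check could look like this (the pattern is hypothetical; the bot's actual regex and page-naming scheme are not shown in the log):

```python
import re

# Accept only titles that look like daily redundancy discussion pages
# and reject anything else that may show up in the categories or on
# the command line.
REDDISC_TITLE = re.compile(r"^Wikipedia:Redundanz/\d{1,2}\. \w+ \d{4}$")
```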
0ea1b0039d Merge branch 'fs#72-rewrite-reddiscparser' into fs#70-refactoring 2016-08-24 11:23:38 +02:00
2f878ee901 Correct filename in header
Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=72 FS#72]
2016-08-24 11:20:28 +02:00
17bfb32ded Building generators of config cats in sep Function
Since the main() function was too complex, the logic to build generators
out of the categories provided in jogobot.conf was moved to a separate
function

[https://fs.golderweb.de/index.php?do=details&task_id=73 FS#73]

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=72 FS#72]
2016-08-24 11:17:00 +02:00
6cb92c1da7 Rewrite parse control using pywikibot.bot classes
To use the default pywikibot classes, making life easier at some points,
and to be standard-conform with pywikibot in handling args

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=72 FS#72]
2016-08-23 21:53:44 +02:00
a8605bcee6 Mv pages-parser.py to reddiscparser.py
New, more meaningful naming convention, from redpage to reddisc (page)

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=72 FS#72]
2016-08-23 21:50:22 +02:00
5d31bdd7eb Jogobot submodule updated 2016-08-23 21:28:13 +02:00
7f8ab1897e Merge branch 'fs#69-deprecated-decorators-param-str' into test-v3 2016-08-23 21:26:47 +02:00
79dbde2413 Provide Replacement to @deprecated() as str
Since the use of pywikibot-master (or Python 3.5, @see ticket below)
the @deprecated decorator requires a str as param and no longer a
callable object as before

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=69 FS#69]
2016-08-23 21:23:24 +02:00
36a480a042 Merge branch 'fs#68-mysql-db-port' into test-v3 2016-08-23 21:16:09 +02:00
bd9dbdfa17 Make use of declared db_host_port
The port to connect to the MySQL server was previously always assumed to
be the default one, so the library was incompatible with DBs on
nonstandard ports

Related Task: [https://fs.golderweb.de/index.php?do=details&task_id=68 FS#68]
2016-08-23 21:12:07 +02:00
944bea488a Merge branch 'restucture-parsers' into test-v3 2016-03-05 15:02:31 +01:00
7cac294181 Merge branch 'parser-script' into restucture-parsers 2016-03-05 15:01:09 +01:00
a24f208449 Add parse-pages.py Script 2016-03-04 20:39:41 +01:00
0af7eb11d6 Move parsing of redfams from RedPageParser to RedFamParser.parser so RedPageParser won't do anything with redfams
except for returning a generator of text-sections
2016-03-03 20:41:14 +01:00
7422307985 Rewrite RedPage.parse using mwparserfromhell to make it simpler 2016-03-03 17:37:46 +01:00
b81694c6d3 Rewrite heading_parser using mwparserfromhell to make it simpler 2016-03-03 17:30:39 +01:00
a2dfffc74b Let old date-extracting methods use dates_extract and mark them as deprecated 2016-03-03 17:23:44 +01:00
163972c924 New method dates_extract which finds beginning and ending at once 2016-03-03 17:20:57 +01:00
baf4ae2a07 Merge branch 'new-structure' into test-v3 2016-03-02 17:22:04 +01:00
10f64199ab Remove relative imports as we are not in a package anymore 2016-03-02 17:19:11 +01:00
24f1a7f516 Remove __init__.py as we won't use it as a package 2016-03-02 17:14:30 +01:00
9113a40704 Merge branch 'warning-non-flushed-mysql-cache' into test-v3 2016-03-02 17:13:08 +01:00
f53a5b3745 Output a warning if there are update/insert queries cached when exiting the program 2016-03-02 17:10:08 +01:00
673e49c55a Merge branch 'jogobot' into test-v3
Use new jogobot package
2016-02-29 11:37:40 +01:00
24adafeee7 Changes for new jogobot-module 2016-02-29 11:35:48 +01:00
b26f04db8c Use updated version of jogobot with ast.literal_eval parsed config entries 2016-02-29 11:13:14 +01:00
f29dfd5003 Use new jogobot module 2016-02-28 18:00:46 +01:00
ef9c13324a Improve documentation of MysqlRed.flush() 2015-09-20 18:17:59 +02:00
e186f2f22b Use dictionary with page_id / fam_hash as key for cached_insert_data to prevent double entries 2015-09-20 17:45:07 +02:00
7d6cd8bb30 Strip leading and trailing whitespace in Links to prevent wrong fam_hashes (when receiving redfam from db) since MySQL drops it 2015-09-19 22:44:43 +02:00
4e21b6696a Remove unnecessary whitespace from error messages 2015-09-19 20:51:52 +02:00
6992f82f02 Start Implementing of RedFamWorker 2015-09-19 20:51:21 +02:00
dbcfe8f106 Add a generator to MysqlRedFam to retrieve redfams from db by status 2015-09-19 19:50:38 +02:00
8059bb9992 Change behavior of MysqlRedFam to be able to get an instance without a known fam_hash 2015-09-19 19:49:20 +02:00
b5ca69077c Remove double appearance of heading parameter in repr of RedFam 2015-09-19 19:47:09 +02:00
523d029fdc Fix bug causing db table cells containing empty strings 2015-09-19 19:45:34 +02:00
4518efc504 Fix bug (cached queries not executed) caused by class attribute protection level --> changed from private to protected
Reformat MySQL queries to remove whitespace generated by indentation
2015-09-18 18:08:13 +02:00
b1b37f9b9e Implement functions for flushing db query caches 2015-09-17 20:00:13 +02:00
8dc7fe678d Fix bug caused by adding fam_hash to repr of RedFam class, since it was not yet defined while outputting the warning caused by too many articles 2015-09-17 19:57:53 +02:00
53f53ddb8b Implement cached queries in MysqlRedFam 2015-09-17 19:56:39 +02:00
26f5912f88 Collect writing db queries for running once in MysqlRedPage
Add classmethod to MysqlRed for executing collected queries
2015-09-16 21:02:02 +02:00
1dea5d7e84 NOT WORKING Cache SQL queries to reduce amount of queries 2015-09-16 18:31:54 +02:00
b514eb5c42 Move configuration to jogobot module
Use custom Error classes
2015-09-15 21:21:05 +02:00
db5bb7401e Update RedFam class to rebuild the whole structure of RedFamParser generated object
Move fam_hash() method from RedFamParser to RedFam
Define custom Error classes
2015-09-15 21:19:07 +02:00
16 changed files with 2027 additions and 829 deletions

3
.gitmodules vendored Normal file

@@ -0,0 +1,3 @@
[submodule "jogobot"]
path = jogobot
url = ../jogobot

60
README.md Normal file

@@ -0,0 +1,60 @@
jogobot-red
===========
Dependencies
------------
* pywikibot-core
* mwparserfromhell
The libraries above need to be installed and configured manually following the [documentation of pywikibot-core](https://www.mediawiki.org/wiki/Manual:Pywikibot).
* SQLAlchemy
* PyMySQL
Those can be installed using pip and the _requirements.txt_ file provided with this package
pip install -r requirements.txt
Versions
--------
* test-v6
- jogobot status API enabled (Bot can be disabled onwiki)
- Fixed problem with spaces between article title and anchor
* test-v5
- Feature _markpages_ working in full-automatic mode with _always_-flag
python red.py -task:markpages -family:wikipedia -always
* test-v4
- Feature _markpages_ working in semi-automatic mode using command
python red.py -task:markpages -family:wikipedia
- Work on specific redfam using param
-famhash:[sha1-famhash]
- Use _PyMySQL_ instead of _OurSQL_
- Correctly parse redfams with articles with a leading lowercase character or spaces in the wikilink
* test-v3
* test-v2
* test-v1
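The `-famhash` parameter above expects a SHA-1 hash identifying a redundancy family by its article list. A minimal sketch of how such a hash is derived, mirroring the `RedFam.calc_famhash()` code further down in this diff (the padding to 8 slots matches the db schema):

```python
import hashlib

def calc_famhash(articles):
    """Compute the SHA-1 hex digest of an 8-slot article list."""
    articles = list(articles)
    # Pad to the fixed 8 slots used by the db schema; longer lists are cropped.
    while len(articles) < 8:
        articles.append(None)
    h = hashlib.sha1()
    h.update(str(articles[:8]).encode("utf-8"))
    return h.hexdigest()
```

Because of the padding, a two-article family and the same family with explicit trailing `None` slots hash identically, while article order changes the hash.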
License
-------
GPLv3
Author Information
------------------
Copyright 2017 Jonathan Golder jonathan@golderweb.de https://golderweb.de/
alias Wikipedia.org-User _Jogo.obb_ (https://de.wikipedia.org/Benutzer:Jogo.obb)

__init__.py Deleted file

@@ -1,26 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# __init__.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Scripts for our redundances bot
"""

2
bots/__init__.py Normal file

@@ -0,0 +1,2 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

367
bots/markpages.py Normal file

@@ -0,0 +1,367 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# markpages.py
#
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Bot to mark pages which were/are subjects of redundance discussions
with templates
"""
import re
from datetime import datetime
import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import CurrentPageBot
from pywikibot.diff import PatchManager
import mwparserfromhell as mwparser
import jogobot
from lib.redfam import RedFamWorker
class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
"""
Bot class to mark pages which were/are subjects of redundance discussions
with templates
"""
def __init__( self, genFactory, **kwargs ):
"""
Constructor
Parameters:
@param genFactory GenFactory with parsed pagegenerator args to
build generator
@type genFactory pagegenerators.GeneratorFactory
@param **kwargs Additional args
@type iterable
"""
# Init attribute
self.__redfams = None # Will hold a generator with our redfams
if "famhash" in kwargs:
self.famhash = kwargs["famhash"]
# We do not use the predefined genFactory as there is no sensible case to
# give a generator via cmd-line for this right now
self.genFactory = pagegenerators.GeneratorFactory()
# Build generator with genFactory
self.build_generator()
# Run super class init with the built generator
super( MarkPagesBot, self ).__init__(
generator=self.gen,
always=True if "always" in kwargs else False )
def run(self):
"""
Controls the overall parsing process, using the super class for page switching
Needed to do things before/after the treating of pages is done
"""
try:
super( MarkPagesBot, self ).run()
except:
raise
else:
# Do redfam status updates
for redfam in self.redfams:
redfam.update_status()
RedFamWorker.flush_db_cache()
@property
def redfams(self):
"""
Holds redfams generator to work on in this bot
"""
# Create generator if not present
if not self.__redfams:
end_after = datetime.strptime(
jogobot.config["red.markpages"]["mark_done_after"],
"%Y-%m-%d" )
if hasattr(self, "famhash"):
self.__redfams = list(
RedFamWorker.session.query(RedFamWorker).filter(
RedFamWorker.famhash == self.famhash ) )
else:
self.__redfams = list( RedFamWorker.gen_by_status_and_ending(
"archived", end_after) )
return self.__redfams
def build_generator( self ):
"""
Builds generator to pass to super class
"""
# Add Talkpages to work on to generatorFactory
self.genFactory.gens.append( self.redfam_talkpages_generator() )
# Set generator to pass to super class
# Since PreloadingGenerator mixes up the Pages, do not use it right now
# (FS#148)
# We can do so for automatic runs (FS#150)
# self.gen = pagegenerators.PreloadingGenerator(
# self.genFactory.getCombinedGenerator() )
self.gen = self.genFactory.getCombinedGenerator()
def redfam_talkpages_generator( self ):
"""
Wraps the redfam.article_generator and
passes it to pagegenerators.PageWithTalkPageGenerator().
Then it iterates over the generator and adds a reference to the
related redfam to each talkpage-object.
"""
for redfam in self.redfams:
# We need the talkpage (and only this) of each existing page
for talkpage in pagegenerators.PageWithTalkPageGenerator(
redfam.article_generator(
filter_existing=True,
exclude_article_status=["marked"] ),
return_talk_only=True ):
# Add reference to redfam to talkpages
talkpage.redfam = redfam
yield talkpage
def treat_page( self ):
"""
Handles work on current page
We get a reference to related redfam in current_page.redfam
"""
# First we need to have the current text of page
# and parse it as wikicode
self.current_wikicode = mwparser.parse( self.current_page.text )
# Add notice
# Returns True if added
# None if already present
add_ret = self.add_disc_notice_template()
# Convert wikicode back to string to save
self.new_text = str( self.current_wikicode )
# Define edit summary
summary = jogobot.config["red.markpages"]["mark_done_summary"].format(
reddisc=self.current_page.redfam.get_disc_link() ).strip()
# Make sure summary starts with "Bot:"
if not summary[:len("Bot:")] == "Bot:":
summary = "Bot: " + summary.strip()
# will return True if saved
# False if not saved because of errors
# None if change was not accepted by user
save_ret = self.put_current( self.new_text, summary=summary )
# Normalize title with anchor (replace spaces in anchor)
article = self.current_page.toggleTalkPage().title(
asLink=True, textlink=True)
article = article.strip("[]")
article_parts = article.split("#", 1)
if len(article_parts) == 2:
article_parts[1] = article_parts[1].replace(" ", "_")
article = "#".join(article_parts)
# Status
if add_ret is None or ( add_ret and save_ret ):
self.current_page.redfam.article_remove_status(
"note_rej",
title=article)
self.current_page.redfam.article_remove_status(
"sav_err",
title=article)
self.current_page.redfam.article_add_status(
"marked",
title=article)
elif save_ret is None:
self.current_page.redfam.article_add_status(
"note_rej",
title=article)
else:
self.current_page.redfam.article_add_status(
"sav_err",
title=article)
def add_disc_notice_template( self ):
"""
Will take self.current_wikicode and adds disc notice template after the
last template in leading section or as first element if there is no
other template in leading section
"""
# The notice to add
self.disc_notice = \
self.current_page.redfam.generate_disc_notice_template()
# Check if it is already present in wikicode
if self.disc_notice_present():
return
# Find the right place to insert notice template
# Therefore we need the first section (if there is one)
leadsec = self.current_wikicode.get_sections(
flat=False, include_lead=True )[0]
# There is none on empty pages, so we need to check
if leadsec:
# Get the last template in leadsec
ltemplates = leadsec.filter_templates(recursive=False)
# If there is one, add notice after this
if ltemplates:
# Make sure not to separate the template and a possibly following comment
insert_after_index = self.current_wikicode.index(
ltemplates[-1] )
# If there is more content
if len(self.current_wikicode.nodes) > (insert_after_index + 1):
# Filter one linebreak
if isinstance( self.current_wikicode.get(
insert_after_index + 1),
mwparser.nodes.text.Text) and \
re.search( r"^\n[^\n\S]+$", self.current_wikicode.get(
insert_after_index + 1 ).value ):
insert_after_index += 1
while len(self.current_wikicode.nodes) > \
(insert_after_index + 1) and \
isinstance(
self.current_wikicode.get(insert_after_index + 1),
mwparser.nodes.comment.Comment ):
insert_after_index += 1
self.current_wikicode.insert_after(
self.current_wikicode.get(insert_after_index),
self.disc_notice )
# To have it on its own line we need to add a linebreak before
self.current_wikicode.insert_before(self.disc_notice, "\n" )
# If there is no template, add before first element on page
else:
self.current_wikicode.insert( 0, self.disc_notice )
# To have it on its own line we need to add a linebreak after it
self.current_wikicode.insert_after(self.disc_notice, "\n" )
# If there is no leadsec (and therefore no template in it), we will add it
# before the first element
else:
self.current_wikicode.insert( 0, self.disc_notice )
# To have it on its own line we need to add a linebreak after it
self.current_wikicode.insert_after(self.disc_notice, "\n" )
# Notice was added
return True
def disc_notice_present(self):
"""
Checks if disc notice which shall be added is already present.
"""
if self.disc_notice in self.current_wikicode:
return True
# Iterate over Templates with same name (if any) to search equal
# Link to decide if they are the same
for present_notice in self.current_wikicode.ifilter_templates(
matches=self.disc_notice.name ):
# Get reddisc page.title of notice to add
add_notice_link_title = self.disc_notice.get(
"Diskussion").partition("#")[0]
# Get reddisc page.title of possible present notice
present_notice_link_title = present_notice.get(
"Diskussion").partition("#")[0]
# If those are equal, notice is already present
if add_notice_link_title == present_notice_link_title:
return True
# If nothing is found, loop will run till its end
else:
return False
# We need to override this since the original from pywikibot.bot.CurrentPageBot
# does not return the result of self._save_page
def put_current(self, new_text, ignore_save_related_errors=None,
ignore_server_errors=None, **kwargs):
"""
Call L{Bot.userPut} but use the current page.
It compares the new_text to the current page text.
@param new_text: The new text
@type new_text: basestring
@param ignore_save_related_errors: Ignore save related errors and
automatically print a message. If None, uses this instance's default.
@type ignore_save_related_errors: bool or None
@param ignore_server_errors: Ignore server errors and automatically
print a message. If None, uses this instance's default.
@type ignore_server_errors: bool or None
@param kwargs: Additional parameters directly given to L{Bot.userPut}.
@type kwargs: dict
"""
# Monkey patch pywikibot.showDiff
pywikibot.showDiff = showDiff
if ignore_save_related_errors is None:
ignore_save_related_errors = self.ignore_save_related_errors
if ignore_server_errors is None:
ignore_server_errors = self.ignore_server_errors
return self.userPut(
self.current_page, self.current_page.text, new_text,
ignore_save_related_errors=ignore_save_related_errors,
ignore_server_errors=ignore_server_errors,
**kwargs)
# We need a patched version to set the context param to a value greater than 0,
# as pywikibot.bot.userPut() currently does not support this
def showDiff(oldtext, newtext, context=3):
"""
Output a string showing the differences between oldtext and newtext.
The differences are highlighted (only on compatible systems) to show which
changes were made.
"""
PatchManager(oldtext, newtext, context=context).print_hunks()
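The module above ends by monkey-patching `pywikibot.showDiff` only to pass a nonzero `context` value through to `PatchManager`. The effect of context lines in a diff can be sketched with the standard library's `difflib`; this is an assumed stand-in for illustration, not the bot's actual code path:

```python
import difflib

def show_diff(oldtext, newtext, context=3):
    """Return a unified diff with `context` lines around each change,
    analogous to PatchManager(..., context=context).print_hunks()."""
    diff = difflib.unified_diff(
        oldtext.splitlines(keepends=True),
        newtext.splitlines(keepends=True),
        n=context,  # number of unchanged context lines shown per hunk
    )
    return "".join(diff)
```

With `context=1`, only one unchanged line before and after each change is shown, keeping hunks short for interactive review.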

182
bots/reddiscparser.py Normal file

@@ -0,0 +1,182 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# reddiscparser.py
#
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Bot to parse all reddisc pages in given Generator or configured categories
"""
import re
import pywikibot # noqa
from pywikibot import pagegenerators # noqa
from pywikibot.bot import ExistingPageBot, NoRedirectPageBot
import jogobot
from lib.redpage import RedPageParser
from lib.redfam import RedFamParser
class DiscussionParserBot(
# CurrentPageBot, # via next two sets 'current_page' on each treat()
ExistingPageBot, # CurrentPageBot only treats existing pages
NoRedirectPageBot ): # class which only treats non-redirects
"""
Bot class which initialises the parsing process of redundancy discussions
"""
# RegEx to filter wrong pages
onlyinclude_re = re.compile(
jogobot.config["redundances"]["reddiscs_onlyinclude_re"] )
def __init__( self, genFactory, **kwargs ):
"""
Constructor
Parameters:
@param genFactory GenFactory with parsed pagegenerator args to
build generator
@type genFactory pagegenerators.GeneratorFactory
@param **kwargs Additional args
@type iterable
"""
# Copy needed args
self.genFactory = genFactory
# Build generator with genFactory
self.build_generator()
# Run super class init with the built generator
super( DiscussionParserBot, self ).__init__(generator=self.gen)
def build_generator(self):
"""
Builds generator to work on, based on self.genFactory
"""
# Check whether there are generators waiting for factoring; if not,
# use configured categories
if not self.genFactory.gens:
self.apply_conf_cat_generators()
# Create combined Generator (Union of all Generators)
gen = self.genFactory.getCombinedGenerator()
if gen:
# The preloading generator is responsible for downloading multiple
# pages from the wiki simultaneously.
self.gen = pagegenerators.PreloadingGenerator(gen)
else:
pywikibot.showHelp()
def apply_conf_cat_generators( self ):
"""
Builds generators for categories read from jogobot.config and appends
them to self.genFactory
"""
# Create Generators for configured Categories
for category in jogobot.config["redundances"]["redpage_cats"]:
gen = self.genFactory.getCategoryGen(
category, gen_func=pagegenerators.CategorizedPageGenerator)
# If there is one, append to genFactory
if gen:
self.genFactory.gens.append(gen)
# Reset gen for next iteration
gen = None
def run( self ):
"""
Controls the overall parsing process, using the super class for page switching
Needed to do things before/after the treating of pages is done
"""
try:
super( DiscussionParserBot, self ).run()
except:
raise
else:
# If successfully parsed all pages in cat, flush db write cache
RedPageParser.flush_db_cache()
def treat_page( self ):
"""
Handles work on current page
"""
# Short circuit excluded pages
if self.current_page.title() in (
jogobot.config["redundances"]["redpage_exclude"] ):
return
# Exclude pages which do not match the pattern
if not type(self).onlyinclude_re.search( self.current_page.title() ):
return
# Initiate RedPage object
redpage = RedPageParser.session.query(RedPageParser).filter(
RedPageParser.pageid == self.current_page.pageid ).one_or_none()
if redpage:
redpage.update( self.current_page )
else:
redpage = RedPageParser( self.current_page )
# Check whether parsing is needed
if redpage.is_parsing_needed():
# Count families for failure analysis
fam_counter = 0
# Iterate over returned generator with redfam sections
for fam in redpage.parse():
# Run RedFamParser on section text
RedFamParser.parser( fam, redpage, redpage.archive )
fam_counter += 1
else:
# If successfully parsed whole page, flush
# db write cache
if( fam_counter ):
RedFamParser.flush_db_cache()
jogobot.output( "Page [[{reddisc}]] parsed".format(
reddisc=redpage.page.title() ) )
else:
jogobot.output(
"\03{red}" + "Page [[{reddisc}]], ".format(
reddisc=redpage.page.title() ) +
"containing no redfam, parsed!",
"WARNING" )

1
jogobot Submodule

Submodule jogobot added at 49ada2993e

337
lib/mysqlred.py Normal file

@@ -0,0 +1,337 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# mysqlred.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides interface classes for communication of redundances bot with mysql-db
"""
import atexit # noqa
import pywikibot # noqa
from pywikibot import config
import jogobot
from sqlalchemy import (
create_engine, Column, Integer, String, Text, DateTime, ForeignKey )
from sqlalchemy import text # noqa
from sqlalchemy.engine.url import URL
from sqlalchemy.ext.declarative import (
declarative_base, declared_attr, has_inherited_table )
from sqlalchemy.ext.mutable import MutableComposite, MutableSet
from sqlalchemy.orm import sessionmaker, relationship, composite
from sqlalchemy.orm.collections import attribute_mapped_collection
import sqlalchemy.types as types
Base = declarative_base()
url = URL( "mysql+pymysql",
username=config.db_username,
password=config.db_password,
host=config.db_hostname,
port=config.db_port,
database=config.db_username + jogobot.config['db_suffix'],
query={'charset': 'utf8'} )
engine = create_engine(url, echo=True)
Session = sessionmaker(bind=engine)
session = Session()
family = pywikibot.Site().family.dbName(pywikibot.Site().code)
class Mysql(object):
session = session
@declared_attr
def _tableprefix(cls):
return family + "_"
@declared_attr
def _tablesuffix(cls):
return "s"
@declared_attr
def __tablename__(cls):
if has_inherited_table(cls):
return None
name = cls.__name__[len("Mysql"):].lower()
return cls._tableprefix + name + cls._tablesuffix
def changedp(self):
return self.session.is_modified(self)
class MutableSet(MutableSet):
"""
Extended version of the mutable set for our states
"""
def has(self, item):
"""
Check if item is in set
@param item Item to check
"""
return item in self
def add(self, item):
"""
Extended add method, which only result in changed object if there is
really an item added.
@param item Item to add
"""
if item not in self:
super().add(item)
def discard(self, item):
"""
Wrapper for extended remove below
@param item Item to discard
"""
self.remove(item)
def remove(self, item, weak=True ):
"""
Extended remove method, which only results in changed object if there
is really an item removed. Additionally, combine remove and discard!
@param item Item to remove/discard
@param weak Set to false to use remove, else discard behavior
"""
if item in self:
if weak:
super().discard(item)
else:
super().remove(item)
class ColumnList( list, MutableComposite ):
"""
Combines multiple Colums into a list like object
"""
def __init__( self, *columns ):
"""
Wrapper to the list constructor deciding whether we have initialization
with individual params per article or with an iterable.
"""
# Individual params per article (from db), first one is a str
if isinstance( columns[0], str ) or \
isinstance( columns[0], MutableSet ) or columns[0] is None:
super().__init__( columns )
# Iterable articles list
else:
super().__init__( columns[0] )
def __setitem__(self, key, value):
"""
The MutableComposite class needs to be notified about changes in our
component. So we tweak the setitem process.
"""
# set the item
super().__setitem__( key, value)
# alert all parents to the change
self.changed()
def __composite_values__(self):
"""
The Composite method needs to have this method to get the items for db.
"""
return self
class Status( types.TypeDecorator ):
impl = types.String
def process_bind_param(self, value, dialect):
"""
Returns status as comma-separated string (to save in DB)
@returns Raw status string
@rtype str
"""
if isinstance(value, MutableSet):
return ",".join( value )
elif isinstance(value, String ) or value is None:
return value
else:
raise TypeError(
"Value should be an instance of one of {0:s},".format(
str( [type(MutableSet()), type(String()), type(None)] ) ) +
"given value was an instance of {1:s}".format(
str(type(value))) )
def process_result_value(self, value, dialect):
"""
Sets status based on a comma-separated list
@param value Comma-separated string of statuses (from DB)
@type value str
"""
if value:
return MutableSet( value.strip().split(","))
else:
return MutableSet([])
def copy(self, **kw):
return Status(self.impl.length)
class MysqlRedFam( Mysql, Base ):
famhash = Column( String(64), primary_key=True, unique=True )
__article0 = Column('article0', String(255), nullable=False )
__article1 = Column('article1', String(255), nullable=False )
__article2 = Column('article2', String(255), nullable=True )
__article3 = Column('article3', String(255), nullable=True )
__article4 = Column('article4', String(255), nullable=True )
__article5 = Column('article5', String(255), nullable=True )
__article6 = Column('article6', String(255), nullable=True )
__article7 = Column('article7', String(255), nullable=True )
__articlesList = composite(
ColumnList, __article0, __article1, __article2, __article3,
__article4, __article5, __article6, __article7 )
heading = Column( Text, nullable=False )
redpageid = Column(
Integer, ForeignKey( family + "_redpages.pageid" ), nullable=False )
beginning = Column( DateTime, nullable=False )
ending = Column( DateTime, nullable=True )
_status = Column( 'status', MutableSet.as_mutable(Status(255)),
nullable=True )
__article0_status = Column(
'article0_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article1_status = Column(
'article1_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article2_status = Column(
'article2_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article3_status = Column(
'article3_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article4_status = Column(
'article4_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article5_status = Column(
'article5_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article6_status = Column(
'article6_status', MutableSet.as_mutable(Status(64)), nullable=True )
__article7_status = Column(
'article7_status', MutableSet.as_mutable(Status(64)), nullable=True )
__articlesStatus = composite(
ColumnList, __article0_status, __article1_status, __article2_status,
__article3_status, __article4_status, __article5_status,
__article6_status, __article7_status )
redpage = relationship( "MysqlRedPage", enable_typechecks=False,
back_populates="redfams" )
@property
def articlesList(self):
"""
List of articles belonging to the redfam
"""
return self.__articlesList
@articlesList.setter
def articlesList(self, articlesList):
# Make sure to always have full length for complete overwrites
while( len(articlesList) < 8 ):
articlesList.append(None)
self.__articlesList = ColumnList(articlesList)
@property
def status( self ):
"""
Current fam status
"""
return self._status
@status.setter
def status( self, status ):
if status:
self._status = MutableSet( status )
else:
self._status = MutableSet()
@property
def articlesStatus(self):
"""
List of status strings/sets for the articles of the redfam
"""
return self.__articlesStatus
@articlesStatus.setter
def articlesStatus(self, articlesStatus):
self.__articlesStatus = ColumnList(articlesStatus)
class MysqlRedPage( Mysql, Base ):
pageid = Column( Integer, unique=True, primary_key=True )
revid = Column( Integer, unique=True, nullable=False )
pagetitle = Column( String(255), nullable=False )
__status = Column( 'status', MutableSet.as_mutable(Status(255)),
nullable=True )
redfams = relationship(
"MysqlRedFam", enable_typechecks=False,
back_populates="redpage", order_by=MysqlRedFam.famhash,
collection_class=attribute_mapped_collection("famhash") )
@property
def status( self ):
"""
Current page status
"""
return self.__status
@status.setter
def status( self, status ):
if status:
self.__status = MutableSet( status )
else:
self.__status = MutableSet()
Base.metadata.create_all(engine)
class MysqlRedError(Exception):
"""
Basic Exception class for this module
"""
pass
class MysqlRedConnectionError(MysqlRedError):
"""
Raised if there are Errors with Mysql-Connections
"""
pass
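The `Status` type decorator above round-trips a set of status flags through a comma-separated MySQL string. The same round trip sketched without SQLAlchemy, with plain functions standing in for `process_bind_param` / `process_result_value` (sorting is added here only to make output deterministic; the real class joins the set in its iteration order):

```python
def status_to_db(value):
    """Serialize a set of status flags to a comma-separated string."""
    if value is None:
        return None
    return ",".join(sorted(value))  # sorted only for deterministic output

def status_from_db(value):
    """Deserialize a comma-separated string back into a set of flags."""
    if value:
        return set(value.strip().split(","))
    return set()
```

Empty or NULL db cells map to an empty set, so callers can always use set operations without None checks.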

763
lib/redfam.py Normal file

@@ -0,0 +1,763 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# redfam.py
#
# Copyright 2017 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides classes for working with RedFams
"""
import hashlib
import locale
import re
from datetime import datetime
import mwparserfromhell as mwparser # noqa
import pywikibot # noqa
from pywikibot.tools import deprecated # noqa
import jogobot
from lib.mysqlred import MysqlRedFam, text
class RedFam( MysqlRedFam ):
"""
Basic class for RedFams, containing the basic data structure
"""
def __init__( self, articlesList, beginning, ending=None, redpageid=None,
status=None, famhash=None, heading=None ):
"""
Generates a new RedFam object
@param articlesList list List of articles
@param beginning datetime Beginning date
@param ending datetime Ending date
@param redpageid int MW pageid of containing RedPage
@param status str Status of RedFam
@param famhash str SHA-1 hash of articlesList
@param heading str Original heading of RedFam (Link)
"""
# Having pywikibot.Site() is a good idea most of the time
self.site = pywikibot.Site()
super().__init__(
articlesList=articlesList,
beginning=beginning,
ending=ending,
redpageid=redpageid,
famhash=famhash,
heading=heading,
status=status,
articlesStatus=None
)
def __repr__( self ):
"""
Returns the repr string of the RedFam object
@returns str repr() string
"""
__repr = "RedFam( " + \
"articlesList=" + repr( self.articlesList ) + \
", heading=" + repr( self.heading ) + \
", beginning=" + repr( self.beginning ) + \
", ending=" + repr( self.ending ) + \
", redpageid=" + repr( self.redpageid ) + \
", status=" + repr( self.status ) + \
", famhash=" + repr( self.famhash ) + \
", articlesStatus=" + repr( self.articlesStatus ) + \
" )"
return __repr
@classmethod
def calc_famhash(cls, articlesList ):
"""
Calculates the SHA-1 hash for the articlesList of a redundancy family.
Since we don't need cryptographic security, SHA-1 is just fine.
@returns str String with the hexadecimal hash digest
"""
h = hashlib.sha1()
# Since the articlesList attr of RedFam will always have 8 members we
# need to pad shorter lists with None (longer ones are cropped below).
while len( articlesList) < 8:
articlesList.append(None)
h.update( str( articlesList[:8] ).encode('utf-8') )
return h.hexdigest()
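A self-contained sketch of this padding-and-cropping scheme (plain `hashlib`, outside the class; the function name is illustrative):

```python
import hashlib

def calc_famhash(articles):
    """Pad the list to 8 entries with None (crop longer lists),
    then SHA-1 the str() of the result."""
    padded = list(articles) + [None] * (8 - len(articles))
    return hashlib.sha1(str(padded[:8]).encode("utf-8")).hexdigest()

# Padding makes the hash independent of trailing None slots
h1 = calc_famhash(["Foo", "Bar"])
h2 = calc_famhash(["Foo", "Bar", None, None, None, None, None, None])
assert h1 == h2
```

Because the list is always brought to exactly 8 slots before hashing, the same set of articles rediscovers the same family regardless of how the section was padded.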
@classmethod
def flush_db_cache( cls ):
"""
Calls flush method of Mysql Interface class
"""
cls.session.commit()
def article_add_status(self, status, index=None, title=None ):
"""
Adds the given status to the status set of an article (identified
by its title or its index in articlesList)
@param status Statusstring to add
@type status str
@param index Add to article with index in articlesList
@type index int
@param title Add to article with title in articlesList
@type title str
"""
if title and not index:
index = self.articlesList.index( title )
if isinstance( index, int ) and index < len(self.articlesList):
self.articlesStatus[index].add(status)
else:
raise IndexError( "No index given or wrong format!")
def article_remove_status(self, status, index=None, title=None, weak=True):
"""
Removes the given status from the status set of an article
(identified by its title or its index in articlesList)
If weak is set to False a KeyError is raised when trying to
remove a status that is not set.
@param status Statusstring to add
@type status str
@param index Remove from article with index in articlesList
@type index int
@param title Remove from article with title in articlesList
@type title str
@param weak Change behavior on missing status
@type weak bool
"""
if title and not index:
index = self.articlesList.index( title )
if isinstance( index, int ) and index < len(self.articlesList):
if weak:
self.articlesStatus[index].discard(status)
else:
self.articlesStatus[index].remove(status)
else:
raise IndexError( "No index given or wrong format!")
def article_has_status(self, status, index=None, title=None ):
"""
Checks whether an article (identified by its title or its index
in articlesList) has the given status in its status set
@param status Statusstring to check
@type status str
@param index Check article with index in articlesList
@type index int
@param title Check article with title in articlesList
@type title str
"""
if title and not index:
index = self.articlesList.index( title )
if isinstance( index, int ) and index < len(self.articlesList):
if status in self.articlesStatus[index]:
return True
else:
return False
else:
raise IndexError( "No index given or wrong format!")
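Stripped of the DB layer, the three methods above are plain set operations on a list of per-article status sets; a minimal stand-in (hypothetical data, no persistence):

```python
articles = ["Foo", "Bar", None]
statuses = [set() for _ in articles]

def add_status(status, title):
    statuses[articles.index(title)].add(status)

def remove_status(status, title, weak=True):
    s = statuses[articles.index(title)]
    if weak:
        s.discard(status)   # no error if the status is missing
    else:
        s.remove(status)    # raises KeyError if the status is missing

def has_status(status, title):
    return status in statuses[articles.index(title)]

add_status("deleted", "Foo")
assert has_status("deleted", "Foo")
remove_status("redirect", "Foo")    # weak removal: silently ignored
```

The `weak` flag maps directly onto the `set.discard`/`set.remove` distinction used in `article_remove_status`.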
class RedFamParser( RedFam ):
"""
Provides an interface to RedFam for adding/updating redundance families
while parsing redundance pages
"""
# Define the timestamp format
__timestamp_format = jogobot.config['redundances']['timestamp_format']
# Define section heading re.pattern
__sectionhead_pat = re.compile( r"^(.*\[\[.+\]\].*\[\[.+\]\].*)" )
# Define timestamp re.pattern
__timestamp_pat = re.compile( jogobot.config['redundances']
['timestamp_regex'] )
# Text patterns for recognition of done-notices
__done_notice = ":<small>Archivierung dieses Abschnittes \
wurde gewünscht von:"
__done_notice2 = "{{Erledigt|"
def __init__( self, articlesList, heading, redpage, redpagearchive,
beginning, ending=None ):
"""
Creates a RedFam object based on data collected while parsing red_pages
combined with possibly already known data from the db
@param articlesList list List of articles
@param heading str Wikitext heading of section
@param redpage RedPage RedPage object the section belongs to
@param redpagearchive bool Is redpage an archive
@param beginning datetime Timestamp of beginning
str as strptime parseable string
@param ending datetime Timestamp of ending
str strptime parseable string
"""
# Calculates the sha1 hash over self._articlesList to
# rediscover known redundance families
famhash = type(self).calc_famhash(articlesList)
# Set object attributes:
self.redpage = redpage
# Parse Timestamps
beginning = self.__datetime(beginning)
if ending:
ending = self.__datetime(ending)
super().__init__( articlesList,
beginning,
ending=ending,
redpageid=redpage.page._pageid,
famhash=famhash,
heading=heading )
# Check status changes
self.check_status()
self.session.add(self)
def update( self, articlesList, heading, redpage, redpagearchive,
beginning, ending=None ):
self.articlesList = articlesList
self.heading = heading
self.redpage = redpage
self.redpageid = redpage.pageid
self.add_beginning( beginning )
if ending:
self.add_ending( ending )
self._redpagearchive = redpagearchive
# Check status changes
self.check_status()
@classmethod
def heading_parser( cls, heading ):
"""
Parses the given heading string and returns the articles list
@param heading Heading of RedFam-Section
@type heading wikicode or mwparser-parseable
"""
# Parse string heading with mwparser again every time
# In some cases the given wikicode is broken due to syntax errors
# (Task FS#77)
heading = mwparser.parse( str( heading ) )
articlesList = []
for link in heading.ifilter_wikilinks():
article = str( link.title ).strip()
# Short circuit empty links
if not article:
continue
# Make sure first letter is uppercase
article = article[0].upper() + article[1:]
# Split in title and anchor part
article = article.split("#", 1)
# Replace underscores in title with spaces
article[0] = article[0].replace("_", " ")
if len(article) > 1:
# Strip both parts to prevent leading/trailing spaces
article[0] = article[0].strip()
article[1] = article[1].strip()
# other way round, replace spaces with underscores in anchors
article[1] = article[1].replace(" ", "_")
# Rejoin title and anchor
article = "#".join(article)
# Add to list
articlesList.append(article)
return articlesList
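Given an already-extracted link title, the normalization steps above (uppercase first letter, underscores to spaces in the title, spaces to underscores in the anchor, stripping around the "#") can be sketched without mwparserfromhell:

```python
def normalize(article):
    """Normalize one wikilink title the way heading_parser does."""
    article = article.strip()
    if not article:
        return None                      # short-circuit empty links
    article = article[0].upper() + article[1:]
    parts = article.split("#", 1)
    parts[0] = parts[0].replace("_", " ")
    if len(parts) > 1:
        parts[0] = parts[0].strip()
        parts[1] = parts[1].strip().replace(" ", "_")
    return "#".join(parts)

assert normalize("foo_bar # my anchor") == "Foo bar#my_anchor"
```

The stripping around the "#" is exactly the FS#159 fix from the commit log, and the forced uppercase first letter is the FS#157 fix.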
def add_beginning( self, beginning ):
"""
Adds the beginning date of a redundance diskussion to the object
@param beginning datetime Beginning date
"""
self.beginning = self.__datetime( beginning )
def add_ending( self, ending ):
"""
Adds the ending date of a redundance diskussion to the object.
@param ending datetime Ending date
"""
self.ending = self.__datetime( ending )
def __datetime( self, timestamp ):
"""
Decides whether the given timestamp is a parseable string or a
datetime object and returns a datetime object in either case
@param timestamp datetime Datetime object
str Parseable string with timestamp
@returns datetime Datetime object
"""
# Make sure locale is set to 'de_DE.UTF-8' to prevent problems
# with wrong month abbreviations in strptime
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
if( isinstance( timestamp, datetime ) ):
return timestamp
else:
result = datetime.strptime( timestamp,
type( self ).__timestamp_format )
return result
def check_status( self ):
"""
Handles detection of correct status
There are four possible states:
- 0 Discussion running --> no ending, page is not an archive
- 1 Discussion over --> ending present, page is not an archive
- 2 Discussion archived --> ending (normally) present, page is an archive
- 3 and greater --> status was set by worker script, do not change it
"""
# No ending, discussion is running:
# Sometimes archived discussions also have no detectable ending
if not self.ending and not self.redpage.archive:
self.status.add("open")
else:
self.status.remove("open")
if not self.redpage.archive:
self.status.add("done")
else:
self.status.remove("done")
self.status.remove("open")
self.status.add("archived")
@classmethod
def is_section_redfam_cb( cls, heading ):
"""
Used as callback for wikicode.get_sections in redpage.parse to
select sections which are redfams
"""
# Because of strange behavior in some cases, parse heading again
# (Task FS#77)
heading = mwparser.parse( str( heading ) )
# Make sure we have min. two wikilinks in heading to assume a redfam
if len( heading.filter_wikilinks() ) >= 2:
return True
else:
return False
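The callback's heuristic is simply "two or more wikilinks in the heading". A regex-based stand-in (the real code counts mwparserfromhell wikilink nodes):

```python
import re

WIKILINK = re.compile(r"\[\[[^\]]+\]\]")

def looks_like_redfam(heading):
    """True if the heading contains at least two wikilinks."""
    return len(WIKILINK.findall(str(heading))) >= 2

assert looks_like_redfam("== [[Foo]] - [[Bar]] ==")
assert not looks_like_redfam("== General remarks ==")
```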
@classmethod
def parser( cls, text, redpage, isarchive=False ):
"""
Handles parsing of redfam section
@param text Text of RedFam-Section
@type text wikicode or mwparser-parseable
"""
# Parse heading with mwparse if needed
if not isinstance( text, mwparser.wikicode.Wikicode ):
text = mwparser.parse( text )
# Extract heading text
heading = next( text.ifilter_headings() ).title.strip()
# Extract beginning and maybe ending
(beginning, ending) = RedFamParser.extract_dates( text, isarchive )
# Missing beginning (Task: FS#76)
# Use first day of month of reddisc
if not beginning:
match = re.search(
jogobot.config["redundances"]["reddiscs_onlyinclude_re"],
redpage.page.title() )
if match:
beginning = datetime.strptime(
"01. {month} {year}".format(
month=match.group(1), year=match.group(2)),
"%d. %B %Y" )
articlesList = RedFamParser.heading_parser( heading )
famhash = RedFamParser.calc_famhash( articlesList )
# Check for existing objects in DB first in current redpage
redfam = redpage.redfams.get(famhash)
with RedFamParser.session.no_autoflush:
if not redfam:
# Otherwise in db table
redfam = RedFamParser.session.query(RedFamParser).filter(
RedFamParser.famhash == famhash ).one_or_none()
if redfam:
# Existing redfams need to be updated
redfam.update( articlesList, str(heading), redpage, isarchive,
beginning, ending )
else:
# Create the RedFam object
redfam = RedFamParser( articlesList, str(heading),
redpage, isarchive, beginning, ending )
# Add redfam to redpage object
redpage.redfams.set( redfam )
@classmethod
def extract_dates( cls, text, isarchive=False ):
"""
Returns tuple of the first and maybe last timestamp of a section.
Last timestamp is only returned if there is a done-notice or param
*isarchive* is set to True
@param text Text to search in
@type text Any type castable to str
@param isarchive If True, skip searching for a done-notice (on archive pages)
@type isarchive bool
@returns Timestamps, otherwise None
@returntype tuple of strs
"""
# Match all timestamps
matches = cls.__timestamp_pat.findall( str( text ) )
if matches:
# First one is beginning
# Since some timestamps are broken we need to reconstruct them
# by regex match groups
beginning = ( matches[0][0] + ", " + matches[0][1] + ". " +
matches[0][2] + ". " + matches[0][3] )
# The last one may be the ending if one of the done-notice
# formats is present, or if we are on an archive page
if ( cls.__done_notice in text or
cls.__done_notice2 in text or
isarchive ):
ending = ( matches[-1][0] + ", " + matches[-1][1] + ". " +
matches[-1][2] + ". " + matches[-1][3] )
else:
ending = None
# Missing dates (Task: FS#76)
else:
beginning = None
ending = None
return (beginning, ending)
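Assuming the configured timestamp regex resembles the previously hard-coded German-signature pattern, the extract-and-reassemble logic can be sketched standalone (the pattern and done-notice marker below are assumptions modeled on the old code, not the live config):

```python
import re

# Assumed pattern, modeled on the earlier hard-coded version
TIMESTAMP = re.compile(
    r"(\d{2}:\d{2}), (\d{1,2})\. "
    r"(Jan|Feb|Mär|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez)\.? (\d{4})")
DONE = "{{Erledigt|"

def extract_dates(text, isarchive=False):
    matches = TIMESTAMP.findall(text)
    if not matches:
        return (None, None)              # missing dates (Task FS#76)
    def rebuild(m):
        # reassemble from match groups since some timestamps are broken
        return "{0}, {1}. {2}. {3}".format(*m)
    beginning = rebuild(matches[0])
    ending = rebuild(matches[-1]) if (DONE in text or isarchive) else None
    return (beginning, ending)

text = "Start 12:34, 1. Jan. 2017 ... 09:00, 3. Feb. 2017 {{Erledigt|1=x}}"
assert extract_dates(text) == ("12:34, 1. Jan. 2017", "09:00, 3. Feb. 2017")
```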
class RedFamWorker( RedFam ):
"""
Handles working with redundance families stored in the database
where the discussion is finished
"""
def __init__( self ):
super().__init__()
# Make sure locale is set to 'de_DE.UTF-8' to prevent problems
# with wrong month abbreviations in strptime
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
def article_generator(self, # noqa
filter_existing=None, filter_redirects=None,
exclude_article_status=[],
onlyinclude_article_status=[] ):
"""
Yields pywikibot page objects for the articles belonging to this
redfam
@param filter_existing Set to True to only get existing pages
set to False to only get nonexisting pages
unset/None results in not filtering
@type filter_existing bool/None
@param filter_redirects Set to True to get only non-redirect pages,
set to False to get only redirect pages,
unset/None results in not filtering
@type filter_redirects bool/None
@param exclude_article_status Skip articles having any of these statuses
@type exclude_article_status list
@param onlyinclude_article_status Only yield articles having all of
these statuses
@type onlyinclude_article_status list
"""
# Helper to leave multidimensional loop
# https://docs.python.org/3/faq/design.html#why-is-there-no-goto
class Continue(Exception):
pass
class Break(Exception):
pass
# Iterate over articles in redfam
for article in self.articlesList:
# To be able to control outer loop from inside child loops
try:
# Not all list elements contain articles
if not article:
raise Break()
page = pywikibot.Page( pywikibot.Link(article),
pywikibot.Site() )
# Filter existing pages if requested with filter_existing=False
if page.exists():
self.article_remove_status( "deleted", title=article )
if filter_existing is False:
raise Continue()
# Filter non existing Pages if requested with
# filter_existing=True
else:
self.article_add_status( "deleted", title=article )
if filter_existing:
raise Continue()
# Filter redirects if requested with filter_redirects=True
if page.isRedirectPage():
self.article_add_status( "redirect", title=article )
if filter_redirects:
raise Continue()
# Filter noredirects if requested with filter_redirects=False
else:
self.article_remove_status("redirect", title=article )
if filter_redirects is False:
raise Continue()
# Exclude by article status
for status in exclude_article_status:
if self.article_has_status( status, title=article ):
raise Continue()
# Only include by article status
for status in onlyinclude_article_status:
if not self.article_has_status( status, title=article ):
raise Continue()
# Proxy loop control to outer loop
except Continue:
continue
except Break:
break
# Yield filtered pages
yield page
def update_status( self ):
"""
Updates the fam status ('marked', 'sav_err', 'note_rej') based on
the per-article statuses after the fam was worked on
"""
for article in self.articlesList:
if not article:
break
if self.article_has_status( "sav_err", title=article ):
self.status.add( "sav_err" )
return
elif self.article_has_status( "note_rej", title=article ):
self.status.add( "note_rej" )
return
elif not self.article_has_status("deleted", title=article ) and \
not self.article_has_status("redirect", title=article) and\
not self.article_has_status("marked", title=article):
return
self.status.remove("sav_err")
self.status.remove("note_rej")
self.status.add( "marked" )
def get_disc_link( self ):
"""
Constructs and returns the link to Redundancy discussion
@returns Link to the discussion
@rtype str
"""
# Expand templates using pwb site object
site = pywikibot.Site()
anchor_code = site.expand_text(self.heading.strip())
# Remove possibly embedded files
anchor_code = re.sub( r"\[\[\w+:[^\|]+(?:\|.+){2,}\]\]", "",
anchor_code )
# Replace non-breaking-space by correct urlencoded value
anchor_code = anchor_code.replace( "&nbsp;", ".C2.A0" )
# Use mwparser to strip and normalize
anchor_code = mwparser.parse( anchor_code ).strip_code()
# We try it without any further parsing, as MediaWiki will parse the page itself
return ( self.redpage.pagetitle + "#" + anchor_code.strip() )
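Leaving template expansion (pywikibot) and full wikicode stripping (mwparserfromhell) aside, the anchor construction can be approximated with the stdlib; the wikilink-unwrapping regex below is a crude stand-in for `strip_code()`, not the method's real behavior:

```python
import re

def disc_link(pagetitle, heading):
    anchor = heading.strip()
    # drop embedded files like [[Datei:x.png|thumb|caption]]
    anchor = re.sub(r"\[\[\w+:[^\|]+(?:\|.+){2,}\]\]", "", anchor)
    # MediaWiki encodes the non-breaking space in anchors as .C2.A0
    anchor = anchor.replace("&nbsp;", ".C2.A0")
    # crude stand-in for mwparser.strip_code(): unwrap plain wikilinks
    anchor = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", anchor)
    return pagetitle + "#" + anchor.strip()

assert disc_link("WP:Red/Okt", "[[Foo]] - [[A|Bar]]") == "WP:Red/Okt#Foo - Bar"
```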
def generate_disc_notice_template( self ):
"""
Generates the notice template to add on the discussion pages of
articles when the redundancy discussion is finished
@return Notice template to add on article disc
@rtype wikicode-node
"""
# Generate template boilerplate
template = mwparser.nodes.template.Template(
jogobot.config['redundances']['disc_notice_template_name'])
# Index of first article's param
param_cnt = 3
# Iterate over articles in redfam
for article in self.articlesList:
if not article:
break
# Make sure to only use 8 articles (max. param 10)
if param_cnt > 10:
break
# Add param for article
template.add( param_cnt, article, True )
param_cnt += 1
# Add begin
begin = self.beginning.strftime( "%B %Y" )
template.add( "Beginn", begin, True )
# Add end (if not same as begin)
end = self.ending.strftime( "%B %Y" )
if not end == begin:
template.add( "Ende", end, True )
# Add link to related reddisc
template.add( "Diskussion", self.get_disc_link(), True )
# Add signature and timestamp
# Not used atm
# template.add( 1, "-- ~~~~", True )
return template
@classmethod
def list_by_status( cls, status ):
"""
Lists red_fams stored in db by given status
"""
mysql = MysqlRedFam()
for fam in mysql.get_by_status( status ):
try:
print( cls( fam ) )
except RedFamHashError:
print(fam)
raise
@classmethod
def gen_by_status_and_ending( cls, status, ending ):
"""
Yield red_fams stored in db by given status which have an ending after
given one
"""
for redfam in RedFamWorker.session.query(RedFamWorker).filter(
# NOT WORKING WITH OBJECT NOTATION
# RedFamWorker._status.like('archived'),
# RedFamWorker._status.like("%{0:s}%".format(status)),
text("status LIKE '%archived%'"),
text("status NOT LIKE '%marked%'"),
RedFamWorker.ending >= ending ):
yield redfam
class RedFamError( Exception ):
"""
Base class for all Errors of RedFam-Module
"""
def __init__( self, message=None ):
"""
Handles Instantiation of RedFamError's
"""
if not message:
self.message = "An error occurred while executing a RedFam action"
else:
self.message = message
def __str__( self ):
"""
Output of error message
"""
return self.message
class RedFamHashError( RedFamError ):
"""
Raised when the given RedFam hash does not match the calculated one
"""
def __init__( self, givenHash, calculatedHash ):
message = "Given fam_hash ('{given}') does not match the \
calculated one ('{calc}')".format( given=givenHash, calc=calculatedHash )
super().__init__( message )
class RedFamHeadingError ( RedFamError ):
"""
Raised when given RedFamHeading does not match __sectionhead_pat Regex
"""
def __init__( self, heading ):
message = "Error while trying to parse section heading. Given heading \
'{heading}' does not match RegEx".format( heading=heading )
super().__init__( message )

lib/redpage.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# redpage.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides a class for handling redundance discussion pages and archives
"""
import pywikibot # noqa
import mwparserfromhell as mwparser
import jogobot # noqa
from lib.mysqlred import (
MysqlRedPage, relationship, attribute_mapped_collection )
from lib.redfam import RedFamParser
class RedPage( MysqlRedPage ):
"""
Class for handling redundance discussion pages and archives
"""
def __init__( self, page=None, pageid=None, archive=False ):
"""
Generate a new RedPage object based on the given pywikibot page object
@param page Pywikibot/MediaWiki page object for page
@type page pywikibot.Page
@param pageid MW-Pageid for related page
@type pageid int
"""
# Save the pywikibot page object
if page:
self._page = page
super().__init__(
pageid=self._page.pageid,
revid=self._page._revid,
pagetitle=self._page.title(),
status=None
)
self.is_archive()
self.session.add(self)
def update( self, page ):
self._page = page
self.revid = page._revid
self.pagetitle = page.title()
self.is_archive()
@property
def page(self):
if not hasattr(self, "_page"):
self._page = pywikibot.Page( pywikibot.Site(), self.pagetitle )
return self._page
@property
def archive(self):
self.is_archive()
return self.status.has("archive")
def is_archive( self ):
"""
Detects whether the current page is an archive of discussions
"""
if( ( u"/Archiv" in self.page.title() ) or
( "{{Archiv}}" in self.page.text ) or
( "{{Archiv|" in self.page.text ) ):
self.status.add("archive")
else:
self.status.discard("archive")
def is_parsing_needed( self ):
"""
Decides whether the current RedPage needs to be parsed or not
"""
return self.changedp() or not self.status.has("parsed")
def parse( self ):
"""
Handles the parsing process
"""
# Generate Wikicode object
self.wikicode = mwparser.parse( self.page.text )
# Select RedFam-sections
# matches = regex or callback
#   (the callback gets the heading content as wikicode as param 1)
# include_lead = if True, include the first section (intro)
# include_headings = if True, include headings
fams = self.wikicode.get_sections(
matches=RedFamParser.is_section_redfam_cb,
include_lead=False, include_headings=True )
# Iterate over RedFam
for fam in fams:
yield fam
else:
self.status.add("parsed")
self._parsed = True
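The for/else construct above relies on a loop's else block running once iteration finishes without a break; inside a generator this means the "parsed" status is only set after the caller has consumed every yielded section. A minimal illustration:

```python
def sections(items, state):
    for item in items:
        yield item
    else:
        state["parsed"] = True       # runs only after full consumption

state = {}
assert list(sections([1, 2, 3], state)) == [1, 2, 3]
assert state["parsed"] is True

partial = {}
gen = sections([1, 2], partial)
next(gen)                            # consume only the first section
assert "parsed" not in partial
```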
@classmethod
def flush_db_cache( cls ):
"""
Calls flush method of Mysql Interface class
"""
cls.session.commit()
class RedPageParser( RedPage ):
"""
Wrapper class to change the type of redfams collection elements in parser
"""
redfams = relationship(
"RedFamParser", enable_typechecks=False, back_populates="redpage",
collection_class=attribute_mapped_collection("famhash") )

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# mysqlred.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides interface classes for communication of redundances bot with mysql-db
"""
# Prefer using oursql over MySQLdb
try:
import oursql as mysqldb
except ImportError:
import MySQLdb as mysqldb
from pywikibot import config
import jogobot
class MysqlRed:
"""
Basic interface class, containing opening of connection
Specific queries should be defined in descendant classes per data type
"""
# Save mysqldb-connection as class attribute to use only one
# in descendant classes
connection = False
db_hostname = config.db_hostname
db_username = config.db_username
db_password = config.db_password
db_name = config.db_username + jogobot.db_namesuffix
def __init__( self ):
"""
Opens a connection to MySQL-DB
@returns mysql-stream MySQL Connection
"""
# Connect to mysqldb only once
if not type( self ).connection:
type( self ).connection = mysqldb.connect(
host=type( self ).db_hostname,
user=type( self ).db_username,
passwd=type( self ).db_password,
db=type( self ).db_name )
def __del__( self ):
"""
Before deleting class, close connection to MySQL-DB
"""
type( self ).connection.close()
class MysqlRedPage( MysqlRed ):
"""
MySQL-db Interface for handling queries for RedPages
"""
def __init__( self, page_id ):
"""
Creates a new instance, runs __init__ of parent class
"""
super().__init__( )
self.__page_id = int( page_id )
self.data = self.get_page()
def __del__( self ):
pass
def get_page( self ):
"""
Retrieves a red page row from MySQL-Database for given page_id
@param int page_id MediaWiki page_id for page to retrieve
@returns tuple Tuple with data for given page_id
bool FALSE if none found
"""
cursor = type( self ).connection.cursor(mysqldb.DictCursor)
cursor.execute( 'SELECT * FROM `red_pages` WHERE `page_id` = ?;',
( self.__page_id, ) )
res = cursor.fetchone()
if res:
return res
else:
return False
def add_page( self, page_title, rev_id, status=0 ):
"""
Inserts a red page row in MySQL-Database for given page_id
@param int rev_id MediaWiki current rev_id
@param str page_title MediaWiki new page_title
@param int status Page parsing status
"""
cursor = type( self ).connection.cursor()
if not page_title:
page_title = self.data[ 'page_title' ]
if not rev_id:
rev_id = self.data[ 'rev_id' ]
query = 'INSERT INTO `red_pages` \
( page_id, page_title, rev_id, status ) \
VALUES ( ?, ?, ?, ? );'
data = ( self.__page_id, page_title, rev_id, status )
cursor.execute( query, data)
type( self ).connection.commit()
self.data = self.get_page()
def update_page( self, rev_id=None, page_title=None, status=0 ):
"""
Updates the red page row in MySQL-Database for given page_id
@param int rev_id MediaWiki current rev_id
@param str page_title MediaWiki new page_title
@param int status Page parsing status
"""
cursor = type( self ).connection.cursor()
if not page_title:
page_title = self.data[ 'page_title' ]
if not rev_id:
rev_id = self.data[ 'rev_id' ]
query = 'UPDATE `red_pages` \
SET `page_title` = ?, `rev_id` = ?, `status`= ? \
WHERE `page_id` = ?;'
data = ( page_title, rev_id, status, self.__page_id )
cursor.execute( query, data)
type( self ).connection.commit()
class MysqlRedFam( MysqlRed ):
"""
MySQL-db Interface for handling queries for RedFams
"""
def __init__( self, fam_hash ):
"""
Creates a new instance, runs __init__ of parent class
"""
super().__init__( )
self.__fam_hash = fam_hash
self.data = self.get_fam()
def __del__( self ):
pass
def get_fam( self ):
"""
Retrieves a red family row from MySQL-Database for given fam_hash
@returns dict Dictionary with data for given fam hash
False if none found
"""
cursor = type( self ).connection.cursor( mysqldb.DictCursor )
cursor.execute( 'SELECT * FROM `red_families` WHERE `fam_hash` = ?;',
( self.__fam_hash, ) )
res = cursor.fetchone()
if res:
return res
else:
return False
def add_fam( self, articlesList, heading, red_page_id,
beginning, ending=None, status=0 ):
cursor = type( self ).connection.cursor()
query = 'INSERT INTO `red_families` \
( fam_hash, red_page_id, beginning, ending, status, heading, \
article0, article1, article2, article3, \
article4, article5, article6, article7 ) \
VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? );'
data = [ self.__fam_hash, red_page_id, beginning, ending,
status, heading ]
for article in articlesList:
data.append( str( article ) )
while len( data ) < 14:
data.append( None )
data = tuple( data )
cursor.execute( query, data)
type( self ).connection.commit()
self.data = self.get_fam()
def update_fam( self, red_page_id, heading, beginning, ending, status ):
"""
Updates the red fam row in MySQL-Database for given fam_hash
@param int red_page_id MediaWiki page_id
@param datetime beginning Timestamp of beginning
@param datetime ending Timestamp of ending
@param int status red_fam status
"""
cursor = type( self ).connection.cursor()
query = 'UPDATE `red_families` \
SET `red_page_id` = ?, `heading` = ?, `beginning` = ?, \
`ending` = ?, `status`= ? WHERE `fam_hash` = ?;'
data = ( red_page_id, heading, beginning,
ending, status, self.__fam_hash )
cursor.execute( query, data)
type( self ).connection.commit()

red.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# reddiscparser.py
#
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Wrapper script to invoke all redundances bot tasks
"""
import os
import locale
import pywikibot
import jogobot
def prepare_bot( task_slug, subtask, genFactory, subtask_args ):
"""
Handles importing subtask Bot class and prepares specific args
Raises an exception if the bot does not exist
@param task_slug Task slug, needed for logging
@type task_slug str
@param subtask Slug of given subtask
@type subtask str
@param genFactory GenFactory with parsed pagegenerator args
@type genFactory pagegenerators.GeneratorFactory
@param subtask_args Additional args for subtasks
@type subtask_args dict
@returns The following tuple
@return 1 Subtask slug (replaced None for default)
@rtype str
@return 2 Botclass of given subtask (Arg "-task")
@rtype Class
@return 3 GenFactory with parsed pagegenerator args
@rtype pagegenerators.GeneratorFactory
@return 4 Additional args for subtasks
@rtype dict
@rtype tuple
"""
# kwargs are passed to selected bot as **kwargs
kwargs = subtask_args
if not subtask or subtask == "discparser":
# Default case: discparser
subtask = "discparser"
# Import related bot
from bots.reddiscparser import DiscussionParserBot as Bot
elif subtask == "markpages":
# Import related bot
from bots.markpages import MarkPagesBot as Bot
# Subtask error
else:
jogobot.output( (
"\03{{red}} Given subtask \"{subtask}\" " +
"does not exist!" ).format( subtask=subtask ), "ERROR" )
raise Exception
return ( subtask, Bot, genFactory, kwargs )
def parse_red_args( argkey, value ):
"""
Process additional args for red.py
@param argkey The arguments key
@type argkey str
@param value The arguments value
@type value str
@return Tuple with (key, value) if given pair is relevant, else None
@rtype tuple or None
"""
if argkey.startswith("-famhash"):
return ( "famhash", value )
return None
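parse_red_args acts as a per-task filter over extra command-line pairs: it maps `-famhash <value>` to a ("famhash", value) tuple and drops everything else. A sketch of how such a filter composes with a collector (the `collect` driver is hypothetical; the real collection happens inside jogobot.bot.parse_local_args):

```python
def parse_red_args(argkey, value):
    if argkey.startswith("-famhash"):
        return ("famhash", value)
    return None

def collect(arg_pairs, parser):
    """Keep only the pairs the task-specific parser recognises."""
    return dict(p for p in (parser(k, v) for k, v in arg_pairs) if p)

args = [("-famhash", "abc123"), ("-always", "")]
assert collect(args, parse_red_args) == {"famhash": "abc123"}
```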
def main(*args):
"""
Process command line arguments and invoke bot.
If args is an empty list, sys.argv is used.
@param args: command line arguments
@type args: list of unicode
"""
# Make sure locale is set to 'de_DE.UTF-8' to prevent problems
# with wrong month abbreviations in strptime
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
# Process global arguments to determine desired site
local_args = pywikibot.handle_args(args)
# Get the jogobot-task_slug (basename of current file without ending)
task_slug = os.path.basename(__file__)[:-len(".py")]
# Before running, we need to check whether we are currently active or not
if not jogobot.bot.active( task_slug ):
return
# Parse local Args to get information about subtask
( subtask, genFactory, subtask_args ) = jogobot.bot.parse_local_args(
local_args, parse_red_args )
# select subtask and prepare args
( subtask, Bot, genFactory, kwargs ) = prepare_bot(
task_slug, subtask, genFactory, subtask_args )
# Init Bot
bot = jogobot.bot.init_bot( task_slug, subtask, Bot, genFactory, **kwargs)
# Run bot
jogobot.bot.run_bot( task_slug, subtask, bot )
if( __name__ == "__main__" ):
main()

redfam.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# redfam.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides classes for working with RedFams
"""
import hashlib
import locale
import re
from datetime import datetime
import pywikibot
from .mysqlred import MysqlRedFam
class RedFam:
"""
Basic class for RedFams, containing the basic data structure
"""
def __init__( self, fam_hash=None, articlesList=None, red_page_id=None,
beginning=None, ending=None, status=0 ):
"""
Generates a new RedFam object
@param articlesList list List of articles
@param beginning datetime Beginning date
@param ending datetime Ending date
"""
pass
def __repr__( self ):
if( self._beginning ):
beginning = ", beginning=" + repr( self._beginning )
else:
beginning = ""
if( self._ending ):
ending = ", ending=" + repr( self._ending )
else:
ending = ""
__repr = "RedFam( " + repr( self._articlesList ) + beginning +\
ending + ", status=" + repr( self._status ) + " )"
return __repr
class RedFamParser( RedFam ):
"""
Provides an interface to RedFam for adding/updating redundance families
while parsing redundance pages
"""
# Define the timestamp format
__timestamp_format = "%H:%M, %d. %b. %Y"
# Define section heading re.pattern
__sectionhead_pat = re.compile( r"^(=+)(.*\[\[.+\]\].*\[\[.+\]\].*)\1" )
# Define timestamp re.pattern
__timestamp_pat = re.compile( r"(\d{2}:\d{2}), (\d{1,2}). (Jan|Feb|Mär|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez).? (\d{4})" ) # noqa
    # Text patterns for recognition of done notices
__done_notice = ":<small>Archivierung dieses Abschnittes \
wurde gewünscht von:"
__done_notice2 = "{{Erledigt|"
def __init__( self, heading, red_page_id, red_page_archive,
beginning, ending=None ):
"""
        Creates a RedFam object based on data collected while parsing
        red_pages, combined with possibly already known data from the db
        @param heading str Wikitext heading of section
        @param red_page_id int MediaWiki page_id
        @param red_page_archive bool Whether red_page is an archive
        @param beginning datetime Timestamp of beginning
                         str as strptime parseable string
        @param ending datetime Timestamp of ending
                         str as strptime parseable string
"""
# Set object attributes:
self._red_page_id = red_page_id
self._red_page_archive = red_page_archive
# Method self.add_beginning sets self._beginning directly
self.add_beginning( beginning )
# Method self.add_ending sets self._ending directly
if( ending ):
self.add_ending( ending )
else:
# If no ending was provided set to None
self._ending = None
self._status = None
        # Parse the provided heading of the redundancy section
# to set self._articlesList
self.heading_parser( heading )
        # Calculate the SHA-1 hash over self._articlesList to
        # rediscover known redundancy families
self.fam_hash()
# Open database connection, ask for data if existing,
# otherwise create entry
self.__handle_db()
# Check status changes
self.status()
# Triggers db update if anything changed
self.changed()
def __handle_db( self ):
"""
Handles opening of db connection
"""
# We need a connection to our mysqldb
self.__mysql = MysqlRedFam( self._fam_hash )
if not self.__mysql.data:
self.__mysql.add_fam( self._articlesList, self._heading,
self._red_page_id, self._beginning,
self._ending )
def heading_parser( self, heading ):
"""
        Parses the given heading string and saves the articles list
"""
# Predefine a pattern for wikilinks' destination
wikilink_pat = re.compile( r"\[\[([^\[\]\|]*)(\]\]|\|)" )
# Parse content of heading for generating section links later
match = self.__sectionhead_pat.search( heading )
if match:
self._heading = match.group(2).lstrip()
else:
raise ValueError( "Heading is not valid" )
# We get the pages in first [0] element iterating over
# wikilink_pat.findall( line )
self._articlesList = [ link[0] for link
in wikilink_pat.findall( self._heading ) ]
        # Catch sections with more than 8 articles and print a warning
if len( self._articlesList ) > 8:
pywikibot.output( "{datetime} \03{{lightred}}[WARNING] \
Maximum number of articles in red_fam exceeded, \
maximum number is 8, {number:d} were given\n\
{repress}".format(
datetime=datetime.now().strftime( "%Y-%m-%d %H:%M:%S" ),
number=len( self._articlesList ), repress=repr( self ) ) )
self._articlesList = self._articlesList[:8]
def fam_hash( self ):
"""
        Calculates the SHA-1 hash over the articlesList of the redundancy
        family and stores it in self._fam_hash. Since we do not need
        cryptographic strength here, SHA-1 is just fine.
"""
h = hashlib.sha1()
h.update( str( self._articlesList ).encode('utf-8') )
self._fam_hash = h.hexdigest()
def add_beginning( self, beginning ):
"""
        Adds the beginning date of a redundancy discussion to the object
        @param beginning datetime Beginning date
"""
self._beginning = self.__datetime( beginning )
def add_ending( self, ending ):
"""
        Adds the ending date of a redundancy discussion to the object.
        @param ending datetime Ending date
"""
self._ending = self.__datetime( ending )
def __datetime( self, timestamp ):
"""
        Decides whether the given timestamp is a parseable string or a
        datetime object and returns a datetime object in both cases
        @param timestamp datetime Datetime object
                         str Parseable string with timestamp
@returns datetime Datetime object
"""
        # Make sure locale is set to 'de_DE.UTF-8' to prevent problems
        # with wrong month abbreviations in strptime
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
if( isinstance( timestamp, datetime ) ):
return timestamp
else:
result = datetime.strptime( timestamp,
type( self ).__timestamp_format )
return result
def status( self ):
"""
        Handles detection of the correct status
        There are four possible status values:
        - 0 Discussion running  --> no ending, page is not an archive
        - 1 Discussion over     --> ending present, page is not an archive
        - 2 Discussion archived --> ending (normally) present, page is archive
        - 3 and greater: status was set by a worker script, do not change it
"""
        # Do not change statuses set by worker scripts etc.
if not self.__mysql.data['status'] > 2:
# No ending, discussion is running:
# Sometimes archived discussions also have no detectable ending
if not self._ending and not self._red_page_archive:
self._status = 0
else:
if not self._red_page_archive:
self._status = 1
else:
self._status = 2
else:
self._status = self.__mysql.data[ 'status' ]
def changed( self ):
"""
        Checks whether anything has changed and triggers a db update if needed
"""
        # On archived red_fams, do not delete a possibly existing ending
if( not self._ending and self._status > 1
and self.__mysql.data[ 'ending' ] ):
self._ending = self.__mysql.data[ 'ending' ]
        # If the status or any other tracked field changed, update the database
if( self._status != self.__mysql.data[ 'status' ] or
self._beginning != self.__mysql.data[ 'beginning' ] or
self._ending != self.__mysql.data[ 'ending' ] or
self._red_page_id != self.__mysql.data[ 'red_page_id' ] or
self._heading != self.__mysql.data[ 'heading' ]):
self.__mysql.update_fam( self._red_page_id, self._heading,
self._beginning, self._ending,
self._status )
@classmethod
def is_sectionheading( cls, line ):
"""
        Checks whether the given line is a red_fam section heading
        @param str line String to check
        @returns bool Returns True if it is a section heading
        """
        return bool( cls.__sectionhead_pat.search( line ) )
@classmethod
def is_beginning( cls, line ):
"""
Returns the first timestamp found in line, otherwise None
@param str line String to search in
@returns str Timestamp, otherwise None
"""
match = cls.__timestamp_pat.search( line )
if match:
# Since some timestamps are broken we need to reconstruct them
# by regex match groups
result = match.group(1) + ", " + match.group(2) + ". " +\
match.group(3) + ". " + match.group(4)
return result
else:
return None
@classmethod
def is_ending( cls, line ):
"""
        Returns the timestamp of the done notice ( if any ), otherwise None
@param str line String to search in
@returns str Timestamp, otherwise None
"""
if ( cls.__done_notice in line ) or ( cls.__done_notice2 in line ):
match = cls.__timestamp_pat.search( line )
if match:
# Since some timestamps are broken we need to reconstruct them
# by regex match groups
result = match.group(1) + ", " + match.group(2) + ". " +\
match.group(3) + ". " + match.group(4)
return result
return None
@classmethod
def is_ending2( cls, line ):
"""
Returns the last timestamp found in line, otherwise None
@param str line String to search in
@returns str Timestamp, otherwise None
"""
matches = cls.__timestamp_pat.findall( line )
if matches:
# Since some timestamps are broken we need to reconstruct them
# by regex match groups
result = matches[-1][0] + ", " + matches[-1][1] + ". " +\
matches[-1][2] + ". " + matches[-1][3]
return result
else:
return None
class RedFamWorker( RedFam ):
"""
    Handles working with redundancy families stored in the database
    whose discussion is finished
"""
pass
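The broken-timestamp handling in `is_beginning`/`is_ending2` above can be sketched standalone. This is a minimal sketch: the helper name `first_timestamp` and the sample line are assumptions, while the regex is the one from `RedFamParser`.

```python
import re

# Pattern copied from RedFamParser; the unescaped dots deliberately
# tolerate slightly broken timestamps (e.g. a missing dot after the
# month abbreviation)
timestamp_pat = re.compile(
    r"(\d{2}:\d{2}), (\d{1,2}). "
    r"(Jan|Feb|Mär|Apr|Mai|Jun|Jul|Aug|Sep|Okt|Nov|Dez).? (\d{4})" )


def first_timestamp( line ):
    # Reconstruct a clean "HH:MM, D. Mon. YYYY" string from the match
    # groups, as is_beginning does; return None if no timestamp is found
    match = timestamp_pat.search( line )
    if match:
        return "{}, {}. {}. {}".format( *match.groups() )
    return None


ts = first_timestamp( "12:34, 5. Okt 2017 (CEST)" )
# ts == "12:34, 5. Okt. 2017" -- the missing dot after "Okt" is repaired
```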


@@ -1,182 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# redpage.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Provides a class for handling redundancy discussion pages and archives
"""
import pywikibot # noqa
from .mysqlred import MysqlRedPage
from .redfam import RedFamParser
class RedPage:
"""
    Class for handling redundancy discussion pages and archives
"""
def __init__( self, page, archive=False ):
"""
        Generates a new RedPage object based on the given pywikibot page object
        @param page page Pywikibot/MediaWiki page object for the page
        @param archive bool Whether the page is already known to be an archive
"""
        # Save the pywikibot page object
self.page = page
self._archive = archive
self.__handle_db( )
self.is_page_changed()
self._parsed = None
if( self._changed or self.__mysql.data[ 'status' ] == 0 ):
self.parse()
self.__update_db()
def __handle_db( self ):
"""
Handles opening of db connection
"""
# We need a connection to our mysqldb
self.__mysql = MysqlRedPage( self.page._pageid )
if not self.__mysql.data:
self.__mysql.add_page( self.page.title(), self.page._revid )
def is_page_changed( self ):
"""
        Check whether the page was changed since the last run
"""
if( self.__mysql.data != { 'page_id': self.page._pageid,
'rev_id': self.page._revid,
'page_title': self.page.title(),
'status': self.__mysql.data[ 'status' ] } ):
self._changed = True
else:
self._changed = False
def is_archive( self ):
"""
        Detects whether the current page is an archive of discussions
"""
if( self._archive or ( u"/Archiv" in self.page.title() ) or
( "{{Archiv}}" in self.page.text ) or
( "{{Archiv|" in self.page.text ) ):
return True
else:
return False
def parse( self ):
"""
Handles the parsing process
"""
        # Since self.page.text is a string we need to split it into lines
text_lines = self.page.text.split( "\n" )
length = len( text_lines )
# Initialise line counter
i = 0
fam_heading = None
beginning = None
ending = None
        # Set the line number of the last detected redundancy family to 0
last_fam = 0
# Iterate over the lines of the page
for line in text_lines:
            # Check whether we have a redundancy-family section heading
if RedFamParser.is_sectionheading( line ):
                # Save the line number of the last detected redundancy family
last_fam = i
# Save heading
fam_heading = line
# Defined (re)initialisation of dates
beginning = None
ending = None
            # Check whether we are currently in a redundancy-family section
if i > last_fam and last_fam > 0:
                # Check if we have already recognized the beginning date of the
                # discussion (in a former iteration) or if we have a done notice
if not beginning:
beginning = RedFamParser.is_beginning( line )
elif not ending:
ending = RedFamParser.is_ending( line )
                # Detect the end of a red_fam section (next line is a new
                # section heading) or the end of the file
                # Prevent running out of index
if i < (length - 1):
test = RedFamParser.is_sectionheading( text_lines[ i + 1 ] )
else:
test = False
if ( test or ( length == ( i + 1 ) ) ):
# Create the red_fam object
if( fam_heading and beginning ):
                        # Maybe we can find an ending further up if we have
                        # none yet (no done notice on archive pages)
if not ending and self.is_archive():
j = i
while (j > last_fam) and not ending:
j -= 1
ending = RedFamParser.is_ending2( text_lines[ j ] )
# Create the RedFam object
red_fam = RedFamParser( fam_heading, self.page._pageid,
self.is_archive(), beginning,
ending )
# Increment line counter
i += 1
else:
self._parsed = True
def __update_db( self ):
"""
Updates the page meta data in mysql db
"""
if( self._parsed or not self._changed ):
status = 1
if( self.is_archive() ):
status = 2
else:
status = 0
self.__mysql.update_page( self.page._revid, self.page.title(), status )
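The two-stage heading parsing done in `heading_parser` above (section-heading match, then wikilink extraction) can be sketched standalone. The helper name `parse_heading` and the sample heading are assumptions; the patterns are copied from `RedFamParser`.

```python
import re

# Section heading containing at least two wikilinks, with matching "="
# runs on both sides, as defined in RedFamParser
sectionhead_pat = re.compile( r"^(=+)(.*\[\[.+\]\].*\[\[.+\]\].*)\1" )
# Destination part of a wikilink, stopping at "]]" or a "|" label separator
wikilink_pat = re.compile( r"\[\[([^\[\]\|]*)(\]\]|\|)" )


def parse_heading( heading ):
    # Return the list of linked article titles, as heading_parser does
    match = sectionhead_pat.search( heading )
    if not match:
        raise ValueError( "Heading is not valid" )
    return [ link[0] for link in wikilink_pat.findall( match.group(2) ) ]


articles = parse_heading( "== [[Foo]] - [[Bar|label]] ==" )
# articles == ['Foo', 'Bar'] -- the "|label" part is stripped
```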

requirements.txt

@@ -0,0 +1,23 @@
# This is a PIP 6+ requirements file for using jogobot-red
#
# All dependencies can be installed using:
# $ sudo pip install -r requirements.txt
#
# It is good practice to install packages using the system
# package manager if it has a packaged version. If you are
# unsure, please use pip as described at the top of this file.
#
# To get a list of potential matches, use
#
# $ awk -F '[#>=]' '{print $1}' requirements.txt | xargs yum search
# or
# $ awk -F '[#>=]' '{print $1}' requirements.txt | xargs apt-cache search
# Needed for Database-Connection
# SQLAlchemy Python ORM-Framework
SQLAlchemy>=1.1
# PyMySQL DB-Connector
PyMySQL>=0.7
# Also needed, but not covered here, is a working copy of pywikibot-core
# which also brings mwparserfromhell
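As an alternative to the sudo invocation suggested at the top of the file, an isolated virtualenv-based setup could look like the following sketch (the venv directory name is an assumption):

```shell
# Create an isolated environment instead of installing system-wide
python3 -m venv .venv
# Activate it for the current shell session
. .venv/bin/activate
# Install the dependencies listed above into the venv
pip install -r requirements.txt
```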


@@ -1,2 +0,0 @@
[flake8]
ignore = E129,E201,E202,W293