73 Commits

Author SHA1 Message Date
5c277495a3 Merge branch 'release-1.1' 2018-05-17 12:41:35 +02:00
a466ab4e74 Prepare release-1.1 2018-05-17 12:41:06 +02:00
860a285ab0 Merge branch 'i#68-exclude-users' into develop 2018-05-17 12:36:37 +02:00
2c105336b0 RedFamWorker: Exclude users and user talkpages
Users can't be part of valid redundances

Issue #68 (#68)
2018-05-17 12:35:38 +02:00
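A minimal sketch of the user filter from this commit, assuming plain (title, namespace) tuples in place of pywikibot.Page objects; the helper name is_user_page is hypothetical. In MediaWiki, namespace 2 is User and namespace 3 is User talk, which matches the page.namespace() check in the redfam.py diff.

```python
# Hypothetical helper mirroring the namespace check in
# RedFamWorker.article_generator (namespace 2 = User, 3 = User talk)
USER_NAMESPACES = {2, 3}


def is_user_page(namespace: int) -> bool:
    """Return True if a page in this namespace cannot belong to a redfam."""
    return namespace in USER_NAMESPACES


# (title, namespace) pairs stand in for pywikibot.Page objects
articles = [
    ("Foo", 0),
    ("Diskussion:Foo", 1),
    ("Benutzer:Jogo.obb", 2),
    ("Benutzer Diskussion:Jogo.obb", 3),
]
kept = [title for title, ns in articles if not is_user_page(ns)]
print(kept)  # ['Foo', 'Diskussion:Foo']
```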
ea85ca731f Merge branch 'i#69-already-talkpage' into develop 2018-05-17 12:28:17 +02:00
6e119ea98f RedFamWorker: Improve talkpage toggling
Do not toggle to the main page if we already have a talkpage, and vice versa

Issue #69 (#69)
2018-05-17 12:26:37 +02:00
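The toggling condition in the diff (talkpages and not page.isTalkPage() or not talkpages and page.isTalkPage()) is an exclusive or over the two flags. A minimal sketch with hypothetical function names and integer namespace numbers instead of pywikibot.Page objects; it assumes the MediaWiki convention that each talk namespace is the odd number directly after its subject namespace, which only holds for namespaces 0 and above.

```python
def needs_toggle(want_talkpages, is_talk: bool) -> bool:
    """Toggle only if the page kind differs from the requested kind.

    want_talkpages may be None (unset), which counts as False, as in the diff.
    """
    return bool(want_talkpages) != bool(is_talk)


def toggle_namespace(namespace: int) -> int:
    """Map a subject namespace to its talk namespace and vice versa."""
    if namespace < 0:
        # Virtual namespaces such as Special (-1) have no talk pages
        raise ValueError("namespace has no talk counterpart")
    # Flipping the lowest bit maps 0 <-> 1, 2 <-> 3, 4 <-> 5, ...
    return namespace ^ 1


print(needs_toggle(True, True))  # False: already a talkpage
print(toggle_namespace(0))       # 1
```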
67aaf3cbbe Merge branch 'i#70-follow-moved-pages' into develop 2018-05-17 12:24:00 +02:00
fa13e2a5cf Follow moved pages
Keep notice together with content
https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Jogo.obb&oldid=176464377#Redundanzhinweis_zu_zwischenzeitlich_verschobenen_Artikeln

Issue #70 (#70)
2018-05-17 12:18:13 +02:00
562e689418 Merge branch 'release-1.0' back into develop 2017-11-05 12:30:18 +01:00
ae1ee7d6a5 Merge branch 'release-1.0' 2017-11-05 12:28:21 +01:00
93447d8dc6 Prepare release v1.0
Update Copyright Notices
Version information
2017-11-05 12:25:13 +01:00
1b6faf9e53 Use own db for red-task
Since we have several tables and sometimes need to create a copy on
replication servers.
2017-11-05 12:17:05 +01:00
b4c193eedc Disable echoing of SQLAlchemy Engine
We don't need this extensive output in production
2017-11-05 12:07:38 +01:00
788a3df0cd Update jogobot-submodule to v0.1 2017-11-05 12:00:28 +01:00
04f591b466 Merge branch 'fs#161-add-article-titles' into develop 2017-11-05 11:24:15 +01:00
9640467f69 markpages: Use redarticle attribute of Page
Instead of trying to reconstruct our db article title, use the one added
to Page-object by redfam.article_generator

Related Task: [FS#161](https://fs.golderweb.de/index.php?do=details&task_id=161)
2017-11-05 11:22:43 +01:00
bfec2abf98 markpages: Get rid of PageWithTalkPageGenerator
Since redfam.article_generator can yield talkpage with additional
information about redfam and current article from db, we do not need it
anymore.

Related Task: [FS#161](https://fs.golderweb.de/index.php?do=details&task_id=161)
2017-11-05 11:20:55 +01:00
20103d589d redfam: article_generator add redfam info to page
Add reference to redfam object and article title from db to Page object
since Page.title() may differ (short namespace names, anchors, special chars)

Related Task: [FS#161](https://fs.golderweb.de/index.php?do=details&task_id=161)
2017-11-05 11:18:53 +01:00
e18aa96a84 redfam: article_generator can return talkpage
To make pywikibot.pagegenerators.PageWithTalkPageGenerator unnecessary,
so we can manipulate the talkpage object directly

Related Task: [FS#161](https://fs.golderweb.de/index.php?do=details&task_id=161)
2017-11-05 11:15:04 +01:00
1dd4c7f87e Merge branch 'test-v7' back into develop 2017-11-02 18:57:59 +01:00
33b2e47312 Describe version test-v7 2017-10-28 22:43:53 +02:00
3bd17ce692 Merge branch 'fs#160-urlencoded-chars' into develop 2017-10-28 22:36:55 +02:00
5f4640d5ff Replace urlencoded chars with unicode equivalent
Otherwise we get ValueErrors while marking, since pwb replaces those

Related Task: [FS#160](https://fs.golderweb.de/index.php?do=details&task_id=160)
2017-10-28 22:35:25 +02:00
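The fix in the redfam.py diff relies on urllib.parse.unquote from the standard library. A short illustration with a made-up link title:

```python
from urllib.parse import unquote

# Made-up link title containing percent-encoded UTF-8 (é = %C3%A9)
raw = "Caf%C3%A9_(Film)"
title = unquote(raw)
print(title)  # Café_(Film)

# unquote() leaves "+" untouched (unlike unquote_plus), so titles like "C++" survive
print(unquote("C++"))  # C++
```

Using unquote rather than unquote_plus matters here, because a literal "+" is a valid character in article titles.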
7e0456ae4f Merge branch 'test-v6' back into develop 2017-10-28 22:34:30 +02:00
108b7aa331 Describe version test-v6 2017-10-28 18:46:30 +02:00
a3adf31b89 Merge branch 'fs#86-activate-status-api' into develop 2017-10-28 18:44:42 +02:00
614f288bb9 Activate jogobot status api for onwiki disabling
Related Task: [FS#86](https://fs.golderweb.de/index.php?do=details&task_id=86)
2017-10-28 18:44:05 +02:00
c450a045bf Merge branch 'fs#159-space-before-anchor' into develop 2017-10-28 18:43:13 +02:00
84802cf521 Remove leading or trailing spaces in articles
Some articles contain spaces between the title and the anchor part, which
are now stripped

Related Task: [FS#159](https://fs.golderweb.de/index.php?do=details&task_id=159)
2017-10-28 18:41:06 +02:00
5f6c443ba8 Merge branch 'test-v5' back into develop 2017-10-28 18:17:01 +02:00
0c135ef1bb Describe version test-v5 2017-09-23 23:50:42 +02:00
8b8221cfcd Merge branch 'fs#152-respect-always-flag' into develop 2017-09-23 23:49:59 +02:00
bdccc8417c Set always in Pywikibot.Basebot
If the cmdline param -always is set, set the related option in the
Pywikibot.Basebot object for automatic edits without further requests

Related Task: [FS#152](https://fs.golderweb.de/index.php?do=details&task_id=152)
2017-09-23 23:49:41 +02:00
a70835c58a Merge back branch 'test-v4' into develop 2017-09-23 23:48:25 +02:00
ec2b84df2a Add requirements
To make setup of environment for this module easier
2017-09-23 21:09:58 +02:00
88848cb084 Prepare Version test-v4 for release
Add a README.md file for this project
2017-09-23 20:32:13 +02:00
5057aed0d3 Merge branch 'fs#157-lowercase-title' into develop 2017-09-09 21:47:03 +02:00
02e53475f1 Prevent lowercase article titles in Parser
Since real lowercase article titles are not allowed, make sure to
convert the first letter of every article title to uppercase. This is
necessary since pywikibot returns article titles this way.

Related Task: [FS#157](https://fs.golderweb.de/index.php?do=details&task_id=157)
2017-09-09 21:35:36 +02:00
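The parser change uses article[0].upper() + article[1:] rather than str.capitalize(). A small sketch of why that matters, with a hypothetical helper ucfirst; slicing with [:1] also makes the empty string safe, which the diff handles instead by skipping empty links first.

```python
def ucfirst(title: str) -> str:
    """Uppercase only the first character, leaving the rest unchanged."""
    return title[:1].upper() + title[1:]


print(ucfirst("iPhone"))      # IPhone
print("iPhone".capitalize())  # Iphone: capitalize() lowercases the rest
print(repr(ucfirst("")))      # ''
```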
d6f9b460c9 Merge branch 'fs#156-dbapi-charset' into develop 2017-09-02 22:13:20 +02:00
ff03ca8f13 Explicitly set charset for PyMySQL-Connection
Since PyMySQL-Connection otherwise uses charset 'latin-1', explicitly
set connection charset to 'utf8'

http://docs.sqlalchemy.org/en/rel_1_0/dialects/mysql.html#charset-selection
http://docs.sqlalchemy.org/en/rel_1_0/core/engines.html?highlight=url#sqlalchemy.engine.url.URL

Related Task: [FS#156](https://fs.golderweb.de/index.php?do=details&task_id=156)
2017-09-02 22:10:25 +02:00
88692ca678 Merge branch 'fs#155-article-surouding-space' into develop 2017-09-02 22:08:31 +02:00
d9b4fcc0bd Strip spaces before adding articles to redfam
Some article links have surrounding spaces in their link text. Remove them
before adding an article to a RedFam to get a canonical title

Related Task: [FS#155](https://fs.golderweb.de/index.php?do=details&task_id=155)
2017-09-02 22:06:30 +02:00
22ff78ea98 Merge branch 'fs#154-categorie-colons-missing' into develop 2017-09-02 16:02:45 +02:00
b3cfcdc259 Improve title detection to get correct behaviour
Make sure that category links start with a colon and that non-article
pages are returned with their namespace.

Related Task: [FS#154](https://fs.golderweb.de/index.php?do=details&task_id=154)
2017-09-02 15:59:34 +02:00
b3e0ace2f4 Merge branch 'fs#153-nested-templates' into develop 2017-09-02 14:25:21 +02:00
f8002c85da Do not search for templates recursively
Since nested templates have no index in the global wikicode object,
searching for the index of a nested template results in a ValueError

Related Task: [FS#153](https://fs.golderweb.de/index.php?do=details&task_id=153)
2017-09-02 14:23:25 +02:00
49bc05d29b Merge branch 'fs#151-normalize-article-titles-anchor' into develop 2017-09-02 13:36:17 +02:00
8a26b6d92a Normalize article titles with anchors
In our db, article titles with anchors are stored with underscores in the
anchor string. Therefore we need to replace spaces in the anchor string
returned by pywikibot.Page.title().

Related Task: [FS#151](https://fs.golderweb.de/index.php?do=details&task_id=151)
2017-08-25 18:11:41 +02:00
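Together with the stripping from FS#155 and FS#159, the normalization in RedFamParser (see the redfam.py diff) treats the title and anchor parts differently: underscores become spaces in the title, while spaces become underscores in the anchor. A condensed sketch of that logic as a standalone function; the name normalize_article is hypothetical.

```python
def normalize_article(article: str) -> str:
    """Normalize a link target to the form assumed to be stored in the db."""
    # Split into title and optional anchor part
    parts = article.strip().split("#", 1)
    # Titles use spaces instead of underscores
    parts[0] = parts[0].replace("_", " ").strip()
    if len(parts) > 1:
        # Anchors are stored with underscores instead of spaces
        parts[1] = parts[1].strip().replace(" ", "_")
    return "#".join(parts)


print(normalize_article("Foo_Bar #Some section"))  # Foo Bar#Some_section
```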
49a8230d76 Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-25 17:11:28 +02:00
31c10073a2 Prevent index errors searching for comments
Make sure not to exceed the existing indexes of the wikicode object while
searching for comments

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-25 17:09:38 +02:00
642a29b022 Improve regex for blank lines
Do not match consecutive linebreaks as one

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-24 18:47:18 +02:00
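The regex used in the markpages.py diff is r"^\n[^\n\S]+$". The class [^\n\S] means whitespace other than a newline, so the pattern matches one line break followed by spaces or tabs, but not two consecutive line breaks, which a looser pattern like ^\s+$ would accept. Note that a bare "\n" does not match either, because the class must match at least once. A quick check:

```python
import re

# Pattern from the markpages.py diff: one newline, then non-newline whitespace
SINGLE_BREAK = re.compile(r"^\n[^\n\S]+$")
# Looser alternative that also treats several line breaks as one
ANY_WHITESPACE = re.compile(r"^\s+$")

for text in ["\n  ", "\n\n"]:
    print(repr(text), bool(SINGLE_BREAK.search(text)), bool(ANY_WHITESPACE.search(text)))
# '\n  ' True True
# '\n\n' False True
```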
2f90751dc2 Merge branch 'fs#146-famhash-generator' into develop 2017-08-24 12:27:54 +02:00
024be69fe1 Use famhash as generator
If famhash is defined, explicitly fetch that redfam from the db and work
only on it

Related Task: [FS#146](https://fs.golderweb.de/index.php?do=details&task_id=146)
2017-08-24 12:27:13 +02:00
b6d7268a7f select by famhash: Add methods to get param in bot
We need a method as callback to get bot specific params passed through
to our bot class.
Introduce -famhash parameter to work on specific famhash

Related Task: [FS#146](https://fs.golderweb.de/index.php?do=details&task_id=146)
2017-08-24 12:27:13 +02:00
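The parse_red_args callback in the red.py diff maps -famhash to a keyword argument for the bot. How jogobot.bot.parse_local_args splits each argument into key and value is not shown here; the sketch below assumes pywikibot-style -key:value arguments and uses a hypothetical split_arg helper for that step.

```python
def split_arg(arg: str):
    """Hypothetical: split a pywikibot-style '-key:value' argument."""
    key, _, value = arg.partition(":")
    return key, value


def parse_red_args(argkey: str, value: str):
    """Callback as in red.py: return (key, value) if relevant, else None."""
    if argkey.startswith("-famhash"):
        return ("famhash", value)
    return None


kwargs = {}
for arg in ["-famhash:0123abcd", "-always"]:
    parsed = parse_red_args(*split_arg(arg))
    if parsed:
        kwargs[parsed[0]] = parsed[1]
print(kwargs)  # {'famhash': '0123abcd'}
```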
526184c1e1 Merge branch 'fs#148-articles-mixed-up' into develop 2017-08-24 12:26:53 +02:00
3aa6c5fb1c Disable PreloadingGenerator temporarily
PreloadingGenerator mixes up the yielded Pages. This is very inconvenient
for a semi-automatic workflow with manual checks, as the articles of a
RedFam no longer followed each other.

Related Task: [FS#148](https://fs.golderweb.de/index.php?do=details&task_id=148)
2017-08-24 12:23:17 +02:00
ec8f459db5 Merge branch 'fs#138-marked-articles-shown-again' into develop 2017-08-24 12:19:24 +02:00
3b2cb95f36 Do not fetch marked redfams from db
Exclude marked Redfams from DB-Query to prevent marking them again

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-24 12:09:43 +02:00
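The query change adds status NOT LIKE '%marked%' next to the existing status LIKE '%archived%' filter (see gen_by_status_and_ending in the redfam.py diff). A self-contained sketch using sqlite3 instead of MySQL/SQLAlchemy; the table name, columns and the space-separated status string are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; statuses assumed to be stored as one string
conn.execute("CREATE TABLE redfams (famhash TEXT, status TEXT, ending TEXT)")
conn.executemany(
    "INSERT INTO redfams VALUES (?, ?, ?)",
    [
        ("a", "archived", "2017-08-01"),
        ("b", "archived marked", "2017-08-01"),
        ("c", "open", "2017-08-01"),
    ],
)
rows = conn.execute(
    "SELECT famhash FROM redfams "
    "WHERE status LIKE '%archived%' "
    "AND status NOT LIKE '%marked%' "
    "AND ending >= ?",
    ("2017-01-01",),
).fetchall()
print(rows)  # [('a',)]
```

Because this is substring matching, any status that merely contains "marked" would also be excluded, so status names should not overlap.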
41e5cc1a9d Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-24 12:06:03 +02:00
9b9d50c4d2 Improve detection of empty lines
Search with RegEx as empty lines could also contain spaces

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-24 12:04:45 +02:00
a755288700 Merge branch 'fs#147-templates-in-heading' into develop 2017-08-23 14:55:43 +02:00
14ec71dd09 Rewrite get_disc_link to handle special cases
Use methods of pywikibot site-object and mwparser to get rid of any
special elements like templates or links in headings for construction
of our disc link.
Replace &nbsp; by hand, as it would otherwise appear as a normal space and
would not work

Related Task: [FS#147](https://fs.golderweb.de/index.php?do=details&task_id=147)
2017-08-23 14:53:22 +02:00
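The hard-coded replacement ".C2.A0" in the get_disc_link diff corresponds to the UTF-8 bytes of U+00A0 (no-break space), C2 A0, each written as a dot followed by two hex digits. That this dot-escaped form is MediaWiki's legacy section-anchor encoding is an assumption here. A sketch of the byte arithmetic with a hypothetical helper:

```python
def dot_escape(char: str) -> str:
    """Encode a character as UTF-8 bytes, each written as '.XX' in hex."""
    return "".join(f".{byte:02X}" for byte in char.encode("utf-8"))


print(dot_escape("\u00a0"))  # .C2.A0
print(dot_escape("ä"))       # .C3.A4
```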
e283eb78ac Merge branch 'fs#140-also-mark-redirects' into develop 2017-08-22 21:59:22 +02:00
cc02006fd2 Do not exclude redirects from being marked
As agreed with Zulu55, discussion pages of redirects should also get
a notice; therefore do not exclude redirects.

Related Task: [FS#140](https://fs.golderweb.de/index.php?do=details&task_id=140)
2017-08-22 21:59:07 +02:00
37b0cbef08 Merge branch 'fs#138-marked-articles-shown-again' into develop 2017-08-22 21:58:22 +02:00
4137d72468 Look for existing notice by simple in-check
To detect notices that are already present (possibly uncommented), check
for them using just a simple Python x in y check over the whole wikicode

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-22 21:56:43 +02:00
cd87d1c2bb Fix already marked articles was reshown bug
Since we search for matching states of articles to include or exclude
in an inner loop, we could not control the outer loop via plain
break/continue. The Python FAQ suggests using exceptions and try/except
structures to achieve this most conveniently.

https://docs.python.org/3/faq/design.html#why-is-there-no-goto

Related Task: [FS#138](https://fs.golderweb.de/index.php?do=details&task_id=138)
2017-08-22 21:45:58 +02:00
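A stripped-down sketch of the pattern, independent of pywikibot: the Continue and Break exception classes match those added to article_generator, while the select function, the status dictionary and the sample data are hypothetical.

```python
class Continue(Exception):
    """Raised inside an inner loop to skip to the next outer iteration."""


class Break(Exception):
    """Raised inside an inner loop to stop the outer loop."""


def select(articles, statuses, exclude):
    """Return articles up to the first empty entry, minus excluded ones."""
    selected = []
    for article in articles:
        try:
            if not article:
                raise Break()
            for status in exclude:
                if status in statuses.get(article, set()):
                    # A plain `continue` here would only affect the inner loop
                    raise Continue()
        except Continue:
            continue
        except Break:
            break
        selected.append(article)
    return selected


print(select(["A", "B", "", "C"], {"B": {"marked"}}, ["marked"]))  # ['A']
```

For a single inner loop, `if any(...): continue` would avoid the exceptions altogether; the exception approach pays off when several nested checks must skip the outer iteration, as in article_generator.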
456b2ba3d4 Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-21 22:11:51 +02:00
47b85a0b5e Add missing line break if there is no template
To make sure our notice template resides on its own line in every case

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-21 22:09:59 +02:00
a6fdc974bd Merge branch 'fs#144-PyMySQL-instead-oursql' into develop 2017-08-21 13:58:34 +02:00
30de2a2e12 Replace oursql with PyMySQL
Since PyMySQL is preferred on Toolforge and works out of the box after
installing via pip, replace oursql, which caused some problems.
In particular, oursql was not able to connect to the db via an SSH tunnel.

Related Task: [FS#144](https://fs.golderweb.de/index.php?do=details&task_id=144)
2017-08-21 13:55:33 +02:00
4a6855cf7b Merge branch 'fs#141-place-notice-after-comment' into develop 2017-08-21 13:51:32 +02:00
8422d08cb6 Keep comments and leading templates together
Prevent splitting up existing comments and templates, as those often
document the behaviour of archive templates

Related Task: [FS#141](https://fs.golderweb.de/index.php?do=details&task_id=141)
2017-08-21 13:49:34 +02:00
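The resulting placement logic in add_disc_notice_template (markpages.py diff) walks forward from the last template of the lead section: it skips at most one whitespace-only line break, then any directly following comments, and inserts the notice after that node. A sketch over a plain list of (kind, text) tuples, which is a hypothetical stand-in for mwparserfromhell nodes:

```python
import re

# Same regex as in the diff: one newline followed by non-newline whitespace
SINGLE_BREAK = re.compile(r"^\n[^\n\S]+$")


def notice_insert_index(nodes):
    """Return the index after which the notice goes, or None without templates."""
    templates = [i for i, (kind, _) in enumerate(nodes) if kind == "template"]
    if not templates:
        return None
    index = templates[-1]
    # Skip one whitespace-only line break directly after the template
    if (index + 1 < len(nodes) and nodes[index + 1][0] == "text"
            and SINGLE_BREAK.search(nodes[index + 1][1])):
        index += 1
    # Keep comments that follow (often documenting the template) attached
    while index + 1 < len(nodes) and nodes[index + 1][0] == "comment":
        index += 1
    return index


nodes = [
    ("template", "{{Autoarchiv}}"),
    ("text", "\n  "),
    ("comment", "<!-- archive settings -->"),
    ("text", "\n== Section =="),
]
print(notice_insert_index(nodes))  # 2
```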
9 changed files with 308 additions and 84 deletions

69
README.md Normal file

@@ -0,0 +1,69 @@
jogobot-red
===========

Dependencies
------------

* pywikibot-core
* mwparserfromhell

The libraries above need to be installed and configured manually according to the [documentation of pywikibot-core](https://www.mediawiki.org/wiki/Manual:Pywikibot).

* SQLAlchemy
* PyMySQL

Those can be installed using pip and the _requirements.txt_ file provided with this package:

    pip install -r requirements.txt

Versions
--------

* v1.1
    - Improved page filter
* v1.0
    - first stable release
    - less debug output
    - fixed problems with article titles
* test-v7
    - Fixed problem with URL-encoded chars in article titles
* test-v6
    - jogobot status API enabled (bot can be disabled onwiki)
    - Fixed problem with space between article title and anchor
* test-v5
    - Feature _markpages_ working in fully automatic mode with the _always_ flag

            python red.py -task:markpages -family:wikipedia -always

* test-v4
    - Feature _markpages_ working in semi-automatic mode using the command

            python red.py -task:markpages -family:wikipedia

    - Work on a specific redfam using the param

            -famhash:[sha1-famhash]

    - Use _PyMySQL_ instead of _OurSQL_
    - Correctly parse redfams with articles whose wikilinks start with a lowercase letter or contain spaces
* test-v3
* test-v2
* test-v1

License
-------

GPLv3

Author Information
------------------

Copyright 2017 Jonathan Golder jonathan@golderweb.de https://golderweb.de/
alias Wikipedia.org-User _Jogo.obb_ (https://de.wikipedia.org/Benutzer:Jogo.obb)


@@ -3,7 +3,7 @@
# #
# markpages.py # markpages.py
# #
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2017 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by
@@ -26,6 +26,7 @@ Bot to mark pages which were/are subjects of redundance discussions
with templates with templates
""" """
import re
from datetime import datetime from datetime import datetime
import pywikibot import pywikibot
@@ -61,6 +62,9 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
# Init attribute # Init attribute
self.__redfams = None # Will hold a generator with our redfams self.__redfams = None # Will hold a generator with our redfams
if "famhash" in kwargs:
self.famhash = kwargs["famhash"]
# We do not use predefined genFactory as there is no sensefull case to # We do not use predefined genFactory as there is no sensefull case to
# give a generator via cmd-line for this right now # give a generator via cmd-line for this right now
self.genFactory = pagegenerators.GeneratorFactory() self.genFactory = pagegenerators.GeneratorFactory()
@@ -69,7 +73,9 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
self.build_generator() self.build_generator()
# Run super class init with builded generator # Run super class init with builded generator
super( MarkPagesBot, self ).__init__(generator=self.gen) super( MarkPagesBot, self ).__init__(
generator=self.gen,
always=True if "always" in kwargs else False )
def run(self): def run(self):
""" """
@@ -101,8 +107,15 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
end_after = datetime.strptime( end_after = datetime.strptime(
jogobot.config["red.markpages"]["mark_done_after"], jogobot.config["red.markpages"]["mark_done_after"],
"%Y-%m-%d" ) "%Y-%m-%d" )
self.__redfams = list( RedFamWorker.gen_by_status_and_ending(
"archived", end_after) ) if hasattr(self, "famhash"):
self.__redfams = list(
RedFamWorker.session.query(RedFamWorker).filter(
RedFamWorker.famhash == self.famhash ) )
else:
self.__redfams = list( RedFamWorker.gen_by_status_and_ending(
"archived", end_after) )
return self.__redfams return self.__redfams
@@ -114,8 +127,12 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
self.genFactory.gens.append( self.redfam_talkpages_generator() ) self.genFactory.gens.append( self.redfam_talkpages_generator() )
# Set generator to pass to super class # Set generator to pass to super class
self.gen = pagegenerators.PreloadingGenerator( # Since PreloadingGenerator mixis up the Pages, do not use it right now
self.genFactory.getCombinedGenerator() ) # (FS#148)
# We can do so for automatic runs (FS#150)
# self.gen = pagegenerators.PreloadingGenerator(
# self.genFactory.getCombinedGenerator() )
self.gen = self.genFactory.getCombinedGenerator()
def redfam_talkpages_generator( self ): def redfam_talkpages_generator( self ):
""" """
@@ -128,15 +145,10 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
for redfam in self.redfams: for redfam in self.redfams:
# We need the talkpage (and only this) of each existing page # We need the talkpage (and only this) of each existing page
for talkpage in pagegenerators.PageWithTalkPageGenerator( for talkpage in redfam.article_generator(
redfam.article_generator( filter_existing=True,
filter_existing=True, exclude_article_status=["marked"],
filter_redirects=True, talkpages=True ):
exclude_article_status=["marked"] ),
return_talk_only=True ):
# Add reference to redfam to talkpages
talkpage.redfam = redfam
yield talkpage yield talkpage
@@ -172,25 +184,28 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
# None if change was not accepted by user # None if change was not accepted by user
save_ret = self.put_current( self.new_text, summary=summary ) save_ret = self.put_current( self.new_text, summary=summary )
# Get article as named in db
article = self.current_page.redarticle
# Status # Status
if add_ret is None or ( add_ret and save_ret ): if add_ret is None or ( add_ret and save_ret ):
self.current_page.redfam.article_remove_status( self.current_page.redfam.article_remove_status(
"note_rej", "note_rej",
title=self.current_page.title(withNamespace=False)) title=article)
self.current_page.redfam.article_remove_status( self.current_page.redfam.article_remove_status(
"sav_err", "sav_err",
title=self.current_page.title(withNamespace=False)) title=article)
self.current_page.redfam.article_add_status( self.current_page.redfam.article_add_status(
"marked", "marked",
title=self.current_page.title(withNamespace=False)) title=article)
elif save_ret is None: elif save_ret is None:
self.current_page.redfam.article_add_status( self.current_page.redfam.article_add_status(
"note_rej", "note_rej",
title=self.current_page.title(withNamespace=False)) title=article)
else: else:
self.current_page.redfam.article_add_status( self.current_page.redfam.article_add_status(
"sav_err", "sav_err",
title=self.current_page.title(withNamespace=False)) title=article)
def add_disc_notice_template( self ): def add_disc_notice_template( self ):
""" """
@@ -214,12 +229,37 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
# There is none on empty pages, so we need to check # There is none on empty pages, so we need to check
if leadsec: if leadsec:
# Get the last template in leadsec # Get the last template in leadsec
ltemplates = leadsec.filter_templates() ltemplates = leadsec.filter_templates(recursive=False)
# If there is one, add notice after this # If there is one, add notice after this
if ltemplates: if ltemplates:
self.current_wikicode.insert_after( ltemplates[-1],
self.disc_notice ) # Make sure not separate template and maybe following comment
insert_after_index = self.current_wikicode.index(
ltemplates[-1] )
# If there is more content
if len(self.current_wikicode.nodes) > (insert_after_index + 1):
# Filter one linebreak
if isinstance( self.current_wikicode.get(
insert_after_index + 1),
mwparser.nodes.text.Text) and \
re.search( r"^\n[^\n\S]+$", self.current_wikicode.get(
insert_after_index + 1 ).value ):
insert_after_index += 1
while len(self.current_wikicode.nodes) > \
(insert_after_index + 1) and \
isinstance(
self.current_wikicode.get(insert_after_index + 1),
mwparser.nodes.comment.Comment ):
insert_after_index += 1
self.current_wikicode.insert_after(
self.current_wikicode.get(insert_after_index),
self.disc_notice )
# To have it in its own line we need to add a linbreak before # To have it in its own line we need to add a linbreak before
self.current_wikicode.insert_before(self.disc_notice, "\n" ) self.current_wikicode.insert_before(self.disc_notice, "\n" )
@@ -228,13 +268,16 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
else: else:
self.current_wikicode.insert( 0, self.disc_notice ) self.current_wikicode.insert( 0, self.disc_notice )
# To have it in its own line we need to add a linbreak after it
self.current_wikicode.insert_after(self.disc_notice, "\n" )
# If there is no leadsec (and therefore no template in it, we will add # If there is no leadsec (and therefore no template in it, we will add
# before the first element # before the first element
else: else:
self.current_wikicode.insert( 0, self.disc_notice ) self.current_wikicode.insert( 0, self.disc_notice )
# To have it in its own line we need to add a linbreak after it # To have it in its own line we need to add a linbreak after it
self.current_wikicode.insert_after(self.disc_notice, "\n" ) self.current_wikicode.insert_after(self.disc_notice, "\n" )
# Notice was added # Notice was added
return True return True
@@ -243,6 +286,10 @@ class MarkPagesBot( CurrentPageBot ): # sets 'current_page' on each treat()
""" """
Checks if disc notice which shall be added is already present. Checks if disc notice which shall be added is already present.
""" """
if self.disc_notice in self.current_wikicode:
return True
# Iterate over Templates with same name (if any) to search equal # Iterate over Templates with same name (if any) to search equal
# Link to decide if they are the same # Link to decide if they are the same
for present_notice in self.current_wikicode.ifilter_templates( for present_notice in self.current_wikicode.ifilter_templates(


@@ -3,7 +3,7 @@
# #
# reddiscparser.py # reddiscparser.py
# #
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2017 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by

Submodule jogobot updated: 49ada2993e...d69d873624


@@ -3,7 +3,7 @@
# #
# mysqlred.py # mysqlred.py
# #
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2017 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by
@@ -46,13 +46,16 @@ import sqlalchemy.types as types
Base = declarative_base() Base = declarative_base()
url = URL( "mysql+oursql", url = URL( "mysql+pymysql",
username=config.db_username, username=config.db_username,
password=config.db_password, password=config.db_password,
host=config.db_hostname, host=config.db_hostname,
port=config.db_port, port=config.db_port,
database=config.db_username + jogobot.config['db_suffix'] ) database=( config.db_username +
engine = create_engine(url, echo=True) jogobot.config['redundances']['db_suffix'] ),
query={'charset': 'utf8'} )
engine = create_engine(url, echo=False)
Session = sessionmaker(bind=engine) Session = sessionmaker(bind=engine)


@@ -3,7 +3,7 @@
# #
# redfam.py # redfam.py
# #
# Copyright 2017 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2018 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by
@@ -28,6 +28,7 @@ Provides classes for working with RedFams
import hashlib import hashlib
import locale import locale
import re import re
import urllib.parse
from datetime import datetime from datetime import datetime
import mwparserfromhell as mwparser # noqa import mwparserfromhell as mwparser # noqa
@@ -282,7 +283,17 @@ class RedFamParser( RedFam ):
articlesList = [] articlesList = []
for link in heading.ifilter_wikilinks(): for link in heading.ifilter_wikilinks():
article = str( link.title ) article = str( link.title ).strip()
# Short circuit empty links
if not article:
continue
# Make sure first letter is uppercase
article = article[0].upper() + article[1:]
# Unquote possible url encoded special chars
article = urllib.parse.unquote( article )
# Split in title and anchor part # Split in title and anchor part
article = article.split("#", 1) article = article.split("#", 1)
@@ -290,6 +301,10 @@ class RedFamParser( RedFam ):
article[0] = article[0].replace("_", " ") article[0] = article[0].replace("_", " ")
if len(article) > 1: if len(article) > 1:
# Strip both parts to prevent leading/trailing spaces
article[0] = article[0].strip()
article[1] = article[1].strip()
# other way round, replace spaces with underscores in anchors # other way round, replace spaces with underscores in anchors
article[1] = article[1].replace(" ", "_") article[1] = article[1].replace(" ", "_")
@@ -499,7 +514,8 @@ class RedFamWorker( RedFam ):
def article_generator(self, # noqa def article_generator(self, # noqa
filter_existing=None, filter_redirects=None, filter_existing=None, filter_redirects=None,
exclude_article_status=[], exclude_article_status=[],
onlyinclude_article_status=[] ): onlyinclude_article_status=[],
talkpages=None ):
""" """
Yields pywikibot pageobjects for articles belonging to this redfams Yields pywikibot pageobjects for articles belonging to this redfams
in a generator in a generator
@@ -513,47 +529,93 @@ class RedFamWorker( RedFam ):
set to False to get only redirectpages, set to False to get only redirectpages,
unset/None results in not filtering unset/None results in not filtering
@type filter_redirects bool/None @type filter_redirects bool/None
@param talkpages Set to True to get Talkpages instead of article page
@type talkpages bool/None
""" """
# Helper to leave multidimensional loop
# https://docs.python.org/3/faq/design.html#why-is-there-no-goto
class Continue(Exception):
pass
class Break(Exception):
pass
# Iterate over articles in redfam # Iterate over articles in redfam
for article in self.articlesList: for article in self.articlesList:
# Not all list elements contain articles
if not article: # To be able to control outer loop from inside child loops
try:
# Not all list elements contain articles
if not article:
raise Break()
page = pywikibot.Page( pywikibot.Link(article),
pywikibot.Site() )
# Filter existing pages if requested with filter_existing=False
if page.exists():
self.article_remove_status( "deleted", title=article )
if filter_existing is False:
raise Continue()
# Filter non existing Pages if requested with
# filter_existing=True
else:
self.article_add_status( "deleted", title=article )
if filter_existing:
raise Continue()
# Filter redirects if requested with filter_redirects=True
if page.isRedirectPage():
self.article_add_status( "redirect", title=article )
if filter_redirects:
raise Continue()
# Filter noredirects if requested with filter_redirects=False
else:
self.article_remove_status("redirect", title=article )
if filter_redirects is False:
raise Continue()
# Exclude by article status
for status in exclude_article_status:
if self.article_has_status( status, title=article ):
raise Continue()
# Only include by article status
for status in onlyinclude_article_status:
if not self.article_has_status( status, title=article ):
raise Continue()
# Proxy loop control to outer loop
except Continue:
continue
except Break:
break break
page = pywikibot.Page(pywikibot.Link(article), pywikibot.Site()) # Follow moved pages
if self.article_has_status( "redirect", title=article ):
try:
page = page.moved_target()
except pywikibot.exceptions.NoMoveTarget:
pass
# Filter existing pages if requested with filter_existing=False # Exclude Users & User Talkpage
if page.exists(): if page.namespace() == 2 or page.namespace() == 3:
self.article_remove_status( "deleted", title=article ) self.article_add_status( "user", title=article )
if filter_existing is False: continue
continue
# Filter non existing Pages if requested with filter_existing=True
else:
self.article_add_status( "deleted", title=article )
if filter_existing:
continue
# Filter redirects if requested with filter_redirects=True # Toggle talkpage
if page.isRedirectPage(): if talkpages and not page.isTalkPage() or\
self.article_add_status( "redirect", title=article ) not talkpages and page.isTalkPage():
if filter_redirects: page = page.toggleTalkPage()
continue
# Filter noredirects if requested with filter_redirects=False
else:
self.article_remove_status("redirect", title=article )
if filter_redirects is False:
continue
# Exclude by article status # Add reference to redfam to pages
for status in exclude_article_status: page.redfam = self
if self.article_has_status( status, title=article ):
continue
# Only include by article status # Keep article title from db with page object
for status in onlyinclude_article_status: page.redarticle = article
if not self.article_has_status( status, title=article ):
continue
# Yield filtered pages # Yield filtered pages
yield page yield page
@@ -590,22 +652,22 @@ class RedFamWorker( RedFam ):
@rtype str @rtype str
""" """
# We need to Replace Links with their linktext # Expand templates using pwb site object
anchor_code = mwparser.parse( self.heading.strip() ) site = pywikibot.Site()
for link in anchor_code.ifilter_wikilinks(): anchor_code = site.expand_text(self.heading.strip())
if link.text:
text = link.text
else:
text = link.title
anchor_code.replace( link, text ) # Remove possibly embbeded files
anchor_code = re.sub( r"\[\[\w+:[^\|]+(?:\|.+){2,}\]\]", "",
anchor_code )
# Whitespace is replaced with underscores # Replace non-breaking-space by correct urlencoded value
anchor_code.replace( " ", "_" ) anchor_code = anchor_code.replace( "&nbsp;", ".C2.A0" )
# We try it with out any more parsing as mw will do while parsing page # Use mwparser to strip and normalize
return ( self.redpage.pagetitle + "#" + anchor_code = mwparser.parse( anchor_code ).strip_code()
str(anchor_code).strip() )
# We try it without any more parsing as mw will do while parsing page
return ( self.redpage.pagetitle + "#" + anchor_code.strip() )
def generate_disc_notice_template( self ): def generate_disc_notice_template( self ):
""" """
@@ -678,6 +740,7 @@ class RedFamWorker( RedFam ):
# RedFamWorker._status.like('archived'), # RedFamWorker._status.like('archived'),
# RedFamWorker._status.like("%{0:s}%".format(status)), # RedFamWorker._status.like("%{0:s}%".format(status)),
text("status LIKE '%archived%'"), text("status LIKE '%archived%'"),
text("status NOT LIKE '%marked%'"),
RedFamWorker.ending >= ending ): RedFamWorker.ending >= ending ):
yield redfam yield redfam


@@ -3,7 +3,7 @@
# #
# redpage.py # redpage.py
# #
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2017 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by

29
red.py

@@ -3,7 +3,7 @@
# #
# reddiscparser.py # reddiscparser.py
# #
# Copyright 2016 GOLDERWEB Jonathan Golder <jonathan@golderweb.de> # Copyright 2017 Jonathan Golder <jonathan@golderweb.de>
# #
# This program is free software; you can redistribute it and/or modify # This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by # it under the terms of the GNU General Public License as published by
@@ -60,7 +60,7 @@ def prepare_bot( task_slug, subtask, genFactory, subtask_args ):
@rtype tuple @rtype tuple
""" """
# kwargs are passed to selected bot as **kwargs # kwargs are passed to selected bot as **kwargs
kwargs = dict() kwargs = subtask_args
if not subtask or subtask == "discparser": if not subtask or subtask == "discparser":
# Default case: discparser # Default case: discparser
@@ -83,6 +83,25 @@ def prepare_bot( task_slug, subtask, genFactory, subtask_args ):
return ( subtask, Bot, genFactory, kwargs ) return ( subtask, Bot, genFactory, kwargs )
def parse_red_args( argkey, value ):
"""
Process additional args for red.py
@param argkey The arguments key
@type argkey str
@param value The arguments value
@type value str
@return Tuple with (key, value) if given pair is relevant, else None
@rtype tuple or None
"""
if argkey.startswith("-famhash"):
return ( "famhash", value )
return None
def main(*args): def main(*args):
""" """
Process command line arguments and invoke bot. Process command line arguments and invoke bot.
@@ -105,12 +124,12 @@ def main(*args):
# Disabled until [FS#86] is done # Disabled until [FS#86] is done
# Before run, we need to check wether we are currently active or not # Before run, we need to check wether we are currently active or not
# if not jogobot.bot.active( task_slug ): if not jogobot.bot.active( task_slug ):
# return return
# Parse local Args to get information about subtask # Parse local Args to get information about subtask
( subtask, genFactory, subtask_args ) = jogobot.bot.parse_local_args( ( subtask, genFactory, subtask_args ) = jogobot.bot.parse_local_args(
local_args ) local_args, parse_red_args )
# select subtask and prepare args # select subtask and prepare args
( subtask, Bot, genFactory, kwargs ) = prepare_bot( ( subtask, Bot, genFactory, kwargs ) = prepare_bot(

23
requirements.txt Normal file

@@ -0,0 +1,23 @@
# This is a PIP 6+ requirements file for using jogobot-red
#
# All dependencies can be installed using:
# $ sudo pip install -r requirements.txt
#
# It is good practise to install packages using the system
# package manager if it has a packaged version. If you are
# unsure, please use pip as described at the top of the file.
#
# To get a list of potential matches, use
#
# $ awk -F '[#>=]' '{print $1}' requirements.txt | xargs yum search
# or
# $ awk -F '[#>=]' '{print $1}' requirements.txt | xargs apt-cache search
# Needed for Database-Connection
# SQLAlchemy Python ORM-Framework
SQLAlchemy>=1.1
# PyMySQL DB-Connector
PyMySQL>=0.7
# Also needed, but not covered here, is a working copy of pywikibot-core
# which also brings mwparserfromhell