57 Commits

SHA1 Message Date
db6e7fd246 Merge branch 'release-1.0' 2017-11-05 15:36:16 +01:00
33540344b0 Update jogobot submodule 2016-09-25 18:17:12 +02:00
1958ec222f Add a README.md 2016-09-25 16:56:10 +02:00
    To have a basic description of this repo
f2d431ab84 Merge branch 'fs#67-more-detailed-logs' into test-v7 2016-09-25 15:10:27 +02:00
31d06224b0 Update file headers 2016-09-24 21:32:02 +02:00
51d8bb9da9 Read Edit Summary from config 2016-09-24 21:29:32 +02:00
    To be able to change the Edit-Summary without touching the source code
f3635b2458 Log year change related actions 2016-08-22 16:45:54 +02:00
    Improve logging related to automatically changed years in list title

    [https://fs.golderweb.de/index.php?do=details&task_id=67|FS#67]
962e0cb4de Notice End of Task in Log 2016-08-22 16:43:45 +02:00
    Showing end of task in log will help to detect unexpectedly terminated
    runs

    [https://fs.golderweb.de/index.php?do=details&task_id=67|FS#67]
8948fcc78d Output log each parsed page and revision 2016-08-22 15:43:25 +02:00
    To improve quality of log

    [https://fs.golderweb.de/index.php?do=details&task_id=67|FS#67]
2f022d9d30 Call pywikibot.handle_args before jogobot.status 2016-07-16 16:00:30 +02:00
    To prevent pywikibot outputting a warning because of creating site
    objects before handling args
56701107db Jogobot module updated 2016-07-11 23:41:26 +02:00
7ccfb90888 Updated jogobot submodule 2016-07-09 20:19:20 +02:00
22a2cc5799 Merge branch 'fs#33-charts.py-abords-with-error' into test-v6 2016-03-09 17:26:12 +01:00
9d471bee20 Bug in function to detect the year from Pagetitle, returning whole title 2016-03-09 17:24:00 +01:00
    Missing param added
    Explicit int casting will throw errors in future if regex fails
16a774fae5 Merge branch 'CountryList-Entry-Title-SortKeyName' into test-v6 2016-02-25 17:52:10 +01:00
038dd6e36a SortKeyName should be used for Interpret not for Title 2016-02-25 17:48:46 +01:00
e468260f7f Merge branch 'unittest-countrylist' into test-v6 2016-02-25 17:08:28 +01:00
    Conflicts:
        countrylist.py
da99dee429 Merge branch 'CountryList-Entry-Title-SortKeyName' into test-v6 2016-02-25 17:05:52 +01:00
b96c5d4a33 Handle SortKeyName and SortKey Template in Title 2016-02-25 17:05:04 +01:00
73bf26b627 Merge branch 'jogobot-StatusAPI' into test-v6 2016-02-25 16:27:28 +01:00
df2f13fb66 Update jogobot 2016-02-25 16:26:10 +01:00
7b27577915 Remove provisional onwiki activation 2016-02-23 13:58:46 +01:00
d76f914615 Use JogoBot StatusAPI to check if Bot/Task is active 2016-02-23 13:57:56 +01:00
d9d385cfe8 Rename chartsbot.py to charts.py to get filename same as task_slug for jogobot-module 2016-02-23 11:40:15 +01:00
2076932cbf Merge branch 'improve-output' into test-v6 2016-02-23 11:35:12 +01:00
    (@see https://fs.golderweb.de/index.php?do=details&task_id=20)
9fe1c36482 Merge branch 'test-v5' 2016-02-23 11:31:39 +01:00
c730d9ba9c Output diff also in verbose mode 2016-02-23 11:21:40 +01:00
3ed67431cf Use jogobot-framework as submodule to get a specific state (instead of directly using the development dir as a Python module) 2016-02-22 11:05:32 +01:00
    Use jogobot.output as wrapper for pywikibot outputs
287942e174 Merge branch 'remove-refs' into improve-output 2016-02-18 19:13:31 +01:00
    Get recent changes before going on
9a24a988f4 Remove possible ref-tags from raw param values in CountryListEntrys 2016-01-04 12:59:31 +01:00
    Explicit conversion to str for better readability
7bb77e86f6 Since last_title also referenced the same object we need to re-replace the year for last year's list 2016-01-04 12:34:13 +01:00
297adc62ec Raise CountryListError if Page exists but no valid Single-Section exists 2016-01-04 12:30:24 +01:00
b6c7a74519 Raise Exception instead of returning False in CountryList.__init__() 2016-01-04 12:28:40 +01:00
    since returning False is not a valid Python construct
81e541ef1d Provisional on wiki activation 2015-12-26 12:42:14 +01:00
c708832515 Merge branch 'feature-force-reload' 2015-12-11 12:42:41 +01:00
18122fafe8 New feature: force parsing of countrylists, regardless of whether it is needed, with param "-force-reload" 2015-12-11 12:41:23 +01:00
55afe94a4e Merge branch 'countrylist-linksearch' 2015-12-11 12:03:51 +01:00
e409c7a02b CountryList-module: Also search for Links in Titel 2015-12-11 00:03:53 +01:00
9d9207c175 CountryList-module: Put linksearching algorithm in separate function for simple reuse for Titel value 2015-12-10 23:13:45 +01:00
4de2116717 Add possibility to manually check against any page in dewiki 2015-11-28 18:17:19 +01:00
3349c9f3d3 Add __str__-method to CountryList-class 2015-11-28 18:16:04 +01:00
a250074caa CountryList-module: Search current year via regex to also make parsing older lists possible 2015-11-28 17:26:27 +01:00
581e043255 Add unittest to CountryList-Module 2015-11-28 13:42:32 +01:00
e932303c40 improve-output: Only show diff in interactive mode without -always flag 2015-11-27 14:10:33 +01:00
5f13da5934 Clarify licence situation of chartsbot.py 2015-11-25 17:15:55 +01:00
5b084f6fde Fix Bug: Writing is requested even when only rev_ids have changed 2015-11-23 19:36:19 +01:00
    Introduce new attr to CountryList to easily get the information whether the page was parsed

    The SummaryPageEntryTemplate comparison to non-equal fails when unparsed entries occur
    --> AND it with the information whether the CountryList was parsed
e3c2c1a5d9 Merge branch 'pep8-compat' 2015-11-23 19:15:37 +01:00
f819193790 pep8-compat: clean up CountryList-Module 2015-11-23 19:11:21 +01:00
4a856b1dae pep8-compat: Replace undefined Error by Message in CountryList-Module 2015-11-23 19:04:27 +01:00
166e61aee7 pep8-compat: cleanup SummaryPage-Module 2015-11-23 19:00:07 +01:00
1ea37c0e0d pep8-compat: Remove unnecessary imports from summarypage.py 2015-11-23 18:59:16 +01:00
3e525edd2a pep8-compat: chartsbot.py remove unnecessary imports 2015-11-23 18:48:04 +01:00
3cab979662 Merge branch 'summarypage-module' 2015-11-21 11:52:21 +01:00
52f933bea7 SummaryPage-Module: Bugfix, move countrylist.parse() back in try statement since we need to make sure it is parseable due to automatic year change feature 2015-11-21 11:50:40 +01:00
e854244f0b Merge branches 'countrylist-module' and 'summarypage-module' 2015-11-21 11:33:35 +01:00
f1e0157643 CountryList-Module: Rename method parsing_needed to is_parsing_needed to make boolean character more clear 2015-11-21 11:32:00 +01:00
4987f97e91 SummaryPage-Module: Reimplement feature to prevent parsing for pages whose revid hasn't changed since last parsing 2015-11-21 11:30:37 +01:00
9 changed files with 694 additions and 297 deletions

.gitignore (2 changes)

@@ -62,3 +62,5 @@ target/
# Test
test.py
disabled

.gitmodules (Normal file, 4 changes)

@@ -0,0 +1,4 @@
[submodule "jogobot"]
	path = jogobot
	url = git@github.com:golderweb/wiki-jogobot-core.git
	branch = test-v1

README.md (Normal file, 21 changes)

@@ -0,0 +1,21 @@
# wiki-jogobot-charts
This is a [Pywikibot](https://www.mediawiki.org/wiki/Manual:Pywikibot)-based [Wikipedia Bot](https://de.wikipedia.org/wiki/Wikipedia:Bots)
of [User:JogoBot](https://de.wikipedia.org/wiki/Benutzer:JogoBot) on the
[German Wikipedia](https://de.wikipedia.org/wiki/Wikipedia:Hauptseite).
A more detailed description can be found on [JogoBot's Wikipedia user page](https://de.wikipedia.org/wiki/Benutzer:JogoBot/Charts).
## Requirements
* Python 3.4+ (only tested with these versions)
* pywikibot-core 2.0
* [jogobot-core module](https://github.com/golderweb/wiki-jogobot-core) used as submodule
* [Isoweek module](https://pypi.python.org/pypi/isoweek)
## Bugs
[wiki-jogobot-charts on fs.golderweb.de (de)](https://fs.golderweb.de/proj20)
## License
GPLv3+
## Author Information
Copyright 2016 Jonathan Golder <jonathan@golderweb.de>


@@ -3,7 +3,7 @@
#
# __init__.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
# Copyright 2016 Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by

charts.py (Normal file, 274 changes)

@@ -0,0 +1,274 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# charts.py
#
# original version by:
#
# (C) Pywikibot team, 2006-2014 as basic.py
#
# Distributed under the terms of the MIT license.
#
# modified by:
#
# Copyright 2016 Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Bot which automatically updates a ChartsSummaryPage like
[[Portal:Charts_und_Popmusik/Aktuelle_Nummer-eins-Hits]] by reading linked
CountryLists
The following parameters are supported:

&params;

-always           If given, request for confirmation of edit is short circuited
                  Use for unattended run

-force-reload     If given, countrylists will be always parsed regardless if
                  needed or not
import locale
import os
import sys
import pywikibot
from pywikibot import pagegenerators
import jogobot
from summarypage import SummaryPage
# This is required for the text that is shown when you run this script
# with the parameter -help.
docuReplacements = {
    '&params;': pagegenerators.parameterHelp
}
class ChartsBot( ):
    """
    Bot which automatically updates a ChartsSummaryPage like
    [[Portal:Charts_und_Popmusik/Aktuelle_Nummer-eins-Hits]] by reading linked
    CountryLists
    """

    def __init__( self, generator, always, force_reload ):
        """
        Constructor.

        @param generator: the page generator that determines on which pages
            to work
        @type generator: generator
        @param always: if True, request for confirmation of edit is short
                       circuited. Use for unattended run
        @type always: bool
        @param force-reload: If given, countrylists will be always parsed
                             regardless if needed or not
        @type force-reload: bool
        """
        self.generator = generator
        self.always = always

        # Force parsing of countrylist
        self.force_reload = force_reload

        # Output Information
        jogobot.output( "Chartsbot invoked" )

        # Save pywikibot site object
        self.site = pywikibot.Site()

        # Define edit summary
        self.summary = jogobot.config["charts"]["edit_summary"].strip()

        # Make sure summary starts with "Bot:"
        if not self.summary[:len("Bot:")] == "Bot:":
            self.summary = "Bot: " + self.summary.strip()

        # Set locale to 'de_DE.UTF-8'
        locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')

    def run(self):
        """Process each page from the generator."""

        # Count skipped pages (redirect or missing)
        skipped = 0

        for page in self.generator:
            if not self.treat(page):
                skipped += 1

        if skipped:
            jogobot.output( "Chartsbot finished, {skipped} page(s) skipped"
                            .format( skipped=skipped ) )
        else:
            jogobot.output( "Chartsbot finished successfully" )

    def treat(self, page):
        """Load the given page, does some changes, and saves it."""
        text = self.load(page)
        if not text:
            return False

        ################################################################
        # NOTE: Here you can modify the text in whatever way you want. #
        ################################################################

        # Initialise and treat SummaryPageWorker
        sumpage = SummaryPage( text, self.force_reload )
        sumpage.treat()

        # Check if editing is needed and if so get new text
        if sumpage.get_new_text():
            text = sumpage.get_new_text()

            if not self.save(text, page, self.summary, False):
                jogobot.output(u'Page %s not saved.' % page.title(asLink=True))

        return True

    def load(self, page):
        """Load the text of the given page."""
        try:
            # Load the page
            text = page.get()
        except pywikibot.NoPage:
            jogobot.output( u"Page %s does not exist; skipping."
                            % page.title(asLink=True), "ERROR" )
        except pywikibot.IsRedirectPage:
            jogobot.output( u"Page %s is a redirect; skipping."
                            % page.title(asLink=True), "ERROR" )
        else:
            return text
        return False

    def save(self, text, page, comment=None, minorEdit=True,
             botflag=True):
        """Update the given page with new text."""
        # only save if something was changed (and not just revision)
        if text != page.get():

            # Show diff only in interactive mode or in verbose mode
            if not self.always or pywikibot.config.verbose_output:

                # Show the title of the page we're working on.
                # Highlight the title in purple.
                jogobot.output( u">>> \03{lightpurple}%s\03{default} <<<"
                                % page.title())

                # show what was changed
                pywikibot.showDiff(page.get(), text)
                jogobot.output(u'Comment: %s' % comment)

            if self.always or pywikibot.input_yn(
                    u'Do you want to accept these changes?',
                    default=False, automatic_quit=False):
                try:
                    page.text = text
                    # Save the page
                    page.save(summary=comment or self.comment,
                              minor=minorEdit, botflag=botflag)
                except pywikibot.LockedPage:
                    jogobot.output( u"Page %s is locked; skipping."
                                    % page.title(asLink=True), "ERROR" )
                except pywikibot.EditConflict:
                    jogobot.output(
                        u'Skipping %s because of edit conflict'
                        % (page.title()), "ERROR")
                except pywikibot.SpamfilterError as error:
                    jogobot.output(
                        u'Cannot change %s because of spam blacklist \
entry %s'
                        % (page.title(), error.url), "ERROR")
                else:
                    return True
        return False
def main(*args):
    """
    Process command line arguments and invoke bot.

    If args is an empty list, sys.argv is used.

    @param args: command line arguments
    @type args: list of unicode
    """

    # Process global arguments to determine desired site
    local_args = pywikibot.handle_args(args)

    # Get the jogobot-task_slug (basename of current file without ending)
    task_slug = os.path.basename(__file__)[:-len(".py")]

    # Before run, we need to check whether we are currently active or not
    try:
        # Will throw Exception if disabled/blocked
        jogobot.is_active( task_slug )

    except jogobot.jogobot.Blocked:
        (type, value, traceback) = sys.exc_info()
        jogobot.output( "\03{lightpurple} %s (%s)" % (value, type ),
                        "CRITICAL" )

    except jogobot.jogobot.Disabled:
        (type, value, traceback) = sys.exc_info()
        jogobot.output( "\03{red} %s (%s)" % (value, type ),
                        "ERROR" )

    # Bot/Task is active
    else:
        # This factory is responsible for processing command line arguments
        # that are also used by other scripts and that determine on which pages
        # to work on.
        genFactory = pagegenerators.GeneratorFactory()
        # The generator gives the pages that should be worked upon.
        gen = None

        # If always is True, bot won't ask for confirmation of edit (automode)
        always = False

        # If force_reload is True, bot will always parse Countrylist regardless
        # if parsing is needed or not
        force_reload = False

        # Parse command line arguments
        for arg in local_args:
            if arg.startswith("-always"):
                always = True
            elif arg.startswith("-force-reload"):
                force_reload = True
            else:
                pass
                genFactory.handleArg(arg)

        if not gen:
            gen = genFactory.getCombinedGenerator()
        if gen:
            # The preloading generator is responsible for downloading multiple
            # pages from the wiki simultaneously.
            gen = pagegenerators.PreloadingGenerator(gen)
            bot = ChartsBot(gen, always, force_reload)
            if bot:
                bot.run()
        else:
            pywikibot.showHelp()
if( __name__ == "__main__" ):
    main()
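In `ChartsBot.__init__()` above, the edit summary is read from `jogobot.config["charts"]["edit_summary"]` and prefixed with "Bot: " unless it already starts with "Bot:", using a slice comparison. A minimal standalone sketch of the same normalisation written with `str.startswith()`; `ensure_bot_prefix` is a hypothetical helper, not part of the repository:

```python
def ensure_bot_prefix(summary: str, prefix: str = "Bot:") -> str:
    """Return the stripped summary, prepending 'Bot: ' if the prefix is missing.

    Hypothetical helper mirroring the summary handling in ChartsBot.__init__().
    """
    summary = summary.strip()
    if not summary.startswith(prefix):
        summary = prefix + " " + summary
    return summary


print(ensure_bot_prefix("  Aktualisiere Übersichtsseite  "))
# Bot: Aktualisiere Übersichtsseite
```

`startswith()` expresses the intent directly and avoids computing the slice length by hand; a summary such as "Bot:x" is left unchanged, exactly as with the slice comparison.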

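`main()` derives the jogobot task slug by slicing `len(".py")` characters off the script's basename, which is why commit d9d385cfe8 renamed chartsbot.py to charts.py. The slice silently assumes the name ends in `.py`. A sketch of the same derivation with `os.path.splitext()`, which strips whatever extension is present; `task_slug_from_path` is a hypothetical name:

```python
import os


def task_slug_from_path(path: str) -> str:
    """Return the file's basename without its extension (hypothetical helper)."""
    return os.path.splitext(os.path.basename(path))[0]


print(task_slug_from_path("/srv/bots/charts/charts.py"))  # charts
```

With the slice, a compiled `charts.pyc` would yield `charts.` instead of `charts`; `splitext()` returns `charts` in both cases.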

@@ -1,203 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# charts.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
"""
Bot which automatically updates a ChartsSummaryPage like
[[Portal:Charts_und_Popmusik/Aktuelle_Nummer-eins-Hits]] by reading linked
CountryLists
The following parameters are supported:

&params;

-always           If given, request for confirmation of edit is short circuited
                  Use for unattended run
"""
import locale
import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import Bot
import mwparserfromhell as mwparser
from summarypage import SummaryPage
# This is required for the text that is shown when you run this script
# with the parameter -help.
docuReplacements = {
    '&params;': pagegenerators.parameterHelp
}
class ChartsBot( ):
    """
    Bot which automatically updates a ChartsSummaryPage like
    [[Portal:Charts_und_Popmusik/Aktuelle_Nummer-eins-Hits]] by reading linked
    CountryListsAn incomplete sample bot.
    """

    def __init__( self, generator, always ):
        """
        Constructor.

        @param generator: the page generator that determines on which pages
            to work
        @type generator: generator
        @param always: if True, request for confirmation of edit is short
                       circuited. Use for unattended run
        @type always: bool
        """
        self.generator = generator
        self.always = always
        # Set the edit summary message
        self.site = pywikibot.Site()
        self.summary = "Bot: Aktualisiere Übersichtsseite Nummer-eins-Hits"

        # Set locale to 'de_DE.UTF-8'
        locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')

    def run(self):
        """Process each page from the generator."""
        for page in self.generator:
            self.treat(page)

    def treat(self, page):
        """Load the given page, does some changes, and saves it."""
        text = self.load(page)
        if not text:
            return

        ################################################################
        # NOTE: Here you can modify the text in whatever way you want. #
        ################################################################

        # Initialise and treat SummaryPageWorker
        sumpage = SummaryPage( text )
        sumpage.treat()

        # Check if editing is needed and if so get new text
        if sumpage.get_new_text():
            text = sumpage.get_new_text()

            if not self.save(text, page, self.summary, False):
                pywikibot.output(u'Page %s not saved.' % page.title(asLink=True))

    def load(self, page):
        """Load the text of the given page."""
        try:
            # Load the page
            text = page.get()
        except pywikibot.NoPage:
            pywikibot.output(u"Page %s does not exist; skipping."
                             % page.title(asLink=True))
        except pywikibot.IsRedirectPage:
            pywikibot.output(u"Page %s is a redirect; skipping."
                             % page.title(asLink=True))
        else:
            return text
        return None

    def save(self, text, page, comment=None, minorEdit=True,
             botflag=True):
        """Update the given page with new text."""
        # only save if something was changed (and not just revision)
        if text != page.get():
            # Show the title of the page we're working on.
            # Highlight the title in purple.
            pywikibot.output(u"\n\n>>> \03{lightpurple}%s\03{default} <<<"
                             % page.title())
            # show what was changed
            pywikibot.showDiff(page.get(), text)
            pywikibot.output(u'Comment: %s' % comment)
            if self.always or pywikibot.input_yn(
                    u'Do you want to accept these changes?',
                    default=False, automatic_quit=False):
                try:
                    page.text = text
                    # Save the page
                    page.save(summary=comment or self.comment,
                              minor=minorEdit, botflag=botflag)
                except pywikibot.LockedPage:
                    pywikibot.output(u"Page %s is locked; skipping."
                                     % page.title(asLink=True))
                except pywikibot.EditConflict:
                    pywikibot.output(
                        u'Skipping %s because of edit conflict'
                        % (page.title()))
                except pywikibot.SpamfilterError as error:
                    pywikibot.output(
                        u'Cannot change %s because of spam blacklist \
entry %s'
                        % (page.title(), error.url))
                else:
                    return True
        return False
def main(*args):
    """
    Process command line arguments and invoke bot.

    If args is an empty list, sys.argv is used.

    @param args: command line arguments
    @type args: list of unicode
    """

    # Process global arguments to determine desired site
    local_args = pywikibot.handle_args(args)

    # This factory is responsible for processing command line arguments
    # that are also used by other scripts and that determine on which pages
    # to work on.
    genFactory = pagegenerators.GeneratorFactory()
    # The generator gives the pages that should be worked upon.
    gen = None

    # If always is True, bot won't ask for confirmation of edit (automode)
    always = False

    # Parse command line arguments
    for arg in local_args:
        if arg.startswith("-always"):
            always = True
        else:
            genFactory.handleArg(arg)

    if not gen:
        gen = genFactory.getCombinedGenerator()
    if gen:
        # The preloading generator is responsible for downloading multiple
        # pages from the wiki simultaneously.
        gen = pagegenerators.PreloadingGenerator(gen)
        bot = ChartsBot(gen, always)
        bot.run()
    else:
        pywikibot.showHelp()
if( __name__ == "__main__" ):
    main()
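Both versions of `main()` shown above scan `local_args` for the bot's own flags and leave everything else to pywikibot's `GeneratorFactory`. Because they test with `str.startswith()`, an argument such as `-alwaysfoo` would also switch on `-always`. The sketch below separates the flags with exact comparisons instead; `split_bot_args` is hypothetical, and in the real script each remaining argument would be handed to `genFactory.handleArg()`:

```python
from typing import Iterable, List, Tuple


def split_bot_args(local_args: Iterable[str]) -> Tuple[bool, bool, List[str]]:
    """Split the bot's own flags from arguments meant for the generator factory.

    Returns (always, force_reload, remaining_args). Hypothetical helper.
    """
    always = False
    force_reload = False
    remaining: List[str] = []
    for arg in local_args:
        if arg == "-always":
            always = True
        elif arg == "-force-reload":
            force_reload = True
        else:
            # Would be passed on to genFactory.handleArg(arg)
            remaining.append(arg)
    return always, force_reload, remaining


print(split_bot_args(["-always", "-page:Beispiel"]))
# (True, False, ['-page:Beispiel'])
```

Exact matching means a mistyped flag ends up in `remaining` instead of silently toggling an option, where the generator factory can then reject it.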

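In both versions, `save()` passes `summary=comment or self.comment`, but as far as the code shown here goes, `ChartsBot` never defines `self.comment`, only `self.summary`. Because `treat()` always passes the non-empty `self.summary`, the fallback is never evaluated in practice; if it were, it would raise `AttributeError`. The spam-blacklist message also relies on a backslash continuation inside a string literal, which is why `entry %s'` has to start in column 0. A reduced sketch of both points; the class and function names are hypothetical:

```python
class ChartsBotSummaryDemo:
    """Hypothetical stand-in for ChartsBot, reduced to the summary logic."""

    def __init__(self, summary: str) -> None:
        self.summary = summary

    def resolve_summary(self, comment=None) -> str:
        # Fall back to the attribute the constructor actually defines
        return comment or self.summary


def spam_message(title: str, url: str) -> str:
    # Adjacent string literals are joined at compile time, so no source
    # indentation leaks into the message as with a backslash continuation
    return ("Cannot change %s because of spam blacklist "
            "entry %s" % (title, url))


bot = ChartsBotSummaryDemo("Bot: Aktualisiere Übersichtsseite")
print(bot.resolve_summary())  # Bot: Aktualisiere Übersichtsseite
print(spam_message("Beispiel", "example.org"))
# Cannot change Beispiel because of spam blacklist entry example.org
```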

@@ -3,7 +3,7 @@
#
# countrylist.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
# Copyright 2016 Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -25,6 +25,7 @@
Provides a class for handling charts list per country and year
"""
import re
import locale
from datetime import datetime
@@ -33,6 +34,8 @@ from isoweek import Week
import pywikibot
import mwparserfromhell as mwparser
import jogobot
class CountryList():
    """
@@ -66,7 +69,8 @@ class CountryList():
        # Check if page exists
        if not self.page.exists():
            return False
            raise CountryListError( "CountryList " +
                                    str(wikilink.title) + " does not exists!" )
        # Initialise attributes
        __attr = ( "wikicode", "entry", "chartein", "_chartein_raw",
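This hunk replaces `return False` in `CountryList.__init__()` with `raise CountryListError(...)` (commit b6c7a74519). Returning any value other than `None` from `__init__` is not merely unidiomatic: CPython raises `TypeError` when the object is instantiated, so a caller could never have received the `False`. A minimal demonstration with hypothetical class names, using `ValueError` in place of `CountryListError`:

```python
class ReturnsFalse:
    """Hypothetical class mimicking the old CountryList.__init__()."""

    def __init__(self, exists):
        if not exists:
            return False  # rejected by Python when the object is created


class RaisesError:
    """Hypothetical class mimicking the new behaviour."""

    def __init__(self, exists):
        if not exists:
            raise ValueError("page does not exist")


def try_create(cls, exists):
    """Return the exception raised by cls(exists), or None on success."""
    try:
        cls(exists)
    except (TypeError, ValueError) as err:
        return err
    return None


print(try_create(ReturnsFalse, False))  # __init__() should return None, not 'bool'
print(try_create(RaisesError, False))   # page does not exist
```

Raising a dedicated exception lets callers such as `SummaryPage` catch `CountryListError` and decide how to continue.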
@@ -74,10 +78,12 @@ class CountryList():
        for attr in __attr:
            setattr( self, attr, None )
        self.parsed = False
        # Try to find year
        self.find_year()
    def parsing_needed( self, revid ):
    def is_parsing_needed( self, revid ):
        """
        Check if current revid of CountryList differs from given one
@@ -94,22 +100,25 @@ class CountryList():
    def find_year( self ):
        """
        Try to find the year related to CountryList
        Try to find the year related to CountryList using regex
        """
        self.year = datetime.now().year
        match = re.search( r"^.+\((\d{4})\)", self.page.title() )
        # Check if year is in page.title, if not try last year
        if str( self.year ) not in self.page.title():
            self.year -= 1
            # If last year does not match, raise YearError
            if str( self.year ) not in self.page.title():
                raise CountryListYearError
        # We matched something
        if match:
            self.year = int(match.group(1))
        else:
            raise CountryListError( "CountryList year is errorneous!" )
    def parse( self ):
        """
        Handles the parsing process
        """
        # Set revid
        self.revid = self.page.latest_revision_id
        # Parse page with mwparser
        self.generate_wikicode()
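The rewritten `find_year()` no longer compares the title against the current and the previous year; it takes the four digits inside the parentheses of the page title, so older lists can be parsed as well (commit a250074caa). A standalone sketch of that extraction, reusing the pattern from the hunk above and raising `ValueError` where the module raises `CountryListError`; `year_from_title` is a hypothetical name:

```python
import re

# Same pattern as find_year(): four digits in parentheses after some text
YEAR_IN_TITLE = re.compile(r"^.+\((\d{4})\)")


def year_from_title(title: str) -> int:
    """Return the year from a list title such as '... (2015)'."""
    match = YEAR_IN_TITLE.search(title)
    if match is None:
        raise ValueError("no year in parentheses found in %r" % title)
    return int(match.group(1))


print(year_from_title("Liste der Nummer-eins-Hits in Frankreich (2015)"))  # 2015
```

Because `.+` is greedy, the last parenthesised four-digit group wins (`"A (2014) (2015)"` gives 2015), and at least one character must precede the opening parenthesis.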
@@ -121,16 +130,23 @@ class CountryList():
        self.prepare_titel()
        self.prepare_interpret()
        # For easy detecting whether we have parsed self
        self.parsed = True
        # Log parsed page
        jogobot.output( "Parsed revision {revid} of page [[{title}]]".format(
            revid=self.revid, title=self.page.title() ) )
    def detect_belgian( self ):
        """
        Detect whether current entry is one of the Belgian (Belgien/Wallonien)
        """
        # Check if Belgian province name is in link text or title
        if "Wallonien" in str( self.wikilink.text ) \
            or "Wallonien" in str( self.wikilink.title):
        if( "Wallonien" in str( self.wikilink.text ) or
            "Wallonien" in str( self.wikilink.title) ):
            return "Wallonie"
        elif "Flandern" in str( self.wikilink.text ) \
            or "Flandern" in str( self.wikilink.title):
        elif( "Flandern" in str( self.wikilink.text ) or
              "Flandern" in str( self.wikilink.title) ):
            return "Flandern"
        else:
            return None
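`detect_belgian()` decides which top-level section of the Belgian list is searched for the Singles section: a link mentioning "Wallonien" selects the section "Wallonie" (without the trailing "n"), and "Flandern" selects "Flandern". A sketch of that mapping on plain strings, assuming the link's text and title have already been converted with `str()`; the function below is a hypothetical standalone version:

```python
from typing import Optional

# (keyword found in the link, section heading to select), checked in order
BELGIAN_SECTIONS = (("Wallonien", "Wallonie"), ("Flandern", "Flandern"))


def detect_belgian(link_text: str, link_title: str) -> Optional[str]:
    """Return the Belgian section to search in, or None for other lists."""
    for keyword, section in BELGIAN_SECTIONS:
        if keyword in link_text or keyword in link_title:
            return section
    return None


print(detect_belgian("Wallonien",
                     "Liste der Nummer-eins-Hits in Belgien (2015)"))  # Wallonie
```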
@@ -151,23 +167,30 @@ class CountryList():
        # For belgian list we need to select subsection of country
        belgian = self.detect_belgian()
        # Select Singles-Section
        # Catch Error if we have none
        try:
            if belgian:
                singles_section = self.wikicode.get_sections(
                    matches=belgian )[0].get_sections( matches="Singles" )[0]
            else:
                singles_section = self.wikicode.get_sections( matches="Singles" )[0]
                singles_section = self.wikicode.get_sections(
                    matches="Singles" )[0]
        except IndexError:
            raise CountryListError( "No Singles-Section found!")
        # Since we have multiple categories in some countries we need
        # to select the first wrapping template
        try:
            wrapping_template = next( singles_section.ifilter_templates(
            wrapping = next( singles_section.ifilter_templates(
                matches="Nummer-eins-Hits" ) )
        except StopIteration:
            raise CountryListError( "Wrapping template is missing!")
        # Select the last occurrence of template "Nummer-eins-Hits Zeile" in
        # Wrapper-template
        for self.entry in wrapping_template.get("Inhalt").value.ifilter_templates(
        for self.entry in wrapping.get("Inhalt").value.ifilter_templates(
                matches="Nummer-eins-Hits Zeile" ):
            pass
@@ -225,7 +248,15 @@ class CountryList():
        If param is not present raise Error
        """
        if self.entry.has( "Chartein" ):
            self._chartein_raw = self.entry.get("Chartein").value.strip()
            self._chartein_raw = self.entry.get("Chartein").value
            # Remove possible ref-tags
            for ref in self._chartein_raw.ifilter_tags(matches="ref"):
                self._chartein_raw.remove( ref )
            # Remove whitespace
            self._chartein_raw = str(self._chartein_raw).strip()
        else:
            raise CountryListEntryError( "Template Parameter 'Chartein' is \
missing!" )
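This hunk (and the corresponding ones for `Titel` and `Interpret` further down) strips `<ref>` tags from the raw template value with mwparserfromhell, via `ifilter_tags(matches="ref")` followed by `remove()`, before converting to `str` and stripping whitespace. For illustration without mwparserfromhell, the sketch below swaps in a regular expression that removes paired and self-closing `<ref>` tags; it is a rough stand-in, not a wikitext parser, and will not cope with nested or malformed markup. `strip_refs` is hypothetical:

```python
import re

# Self-closing <ref ... /> or paired <ref ...>...</ref>; a regex stand-in
# for mwparserfromhell's tag handling, not a full wikitext parser
REF_TAG = re.compile(r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref\s*>",
                     re.IGNORECASE | re.DOTALL)


def strip_refs(raw: str) -> str:
    """Remove <ref> tags from a raw template value and trim whitespace."""
    return REF_TAG.sub("", raw).strip()


print(strip_refs("23. Oktober 2015<ref>Quelle</ref> "))  # 23. Oktober 2015
```

The `\b` after `ref` keeps tags such as `<references/>` untouched.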
@@ -239,6 +270,10 @@ missing!" )
        if not self._titel_raw:
            self.get_titel_value()
        # Try to find a wikilink for Titel on countrylist
        if "[[" not in self._titel_raw:
            self.titel = self._search_links( str(self._titel_raw) )
        else:
            self.titel = self._titel_raw
    def get_titel_value( self ):
@@ -247,7 +282,14 @@ missing!" )
        If param is not present raise Error
        """
        if self.entry.has( "Titel" ):
            self._titel_raw = self.entry.get("Titel").value.strip()
            self._titel_raw = self.entry.get("Titel").value
            # Remove possible ref-tags
            for ref in self._titel_raw.ifilter_tags(matches="ref"):
                self._titel_raw.remove( ref )
            # Remove whitespace
            self._titel_raw = str(self._titel_raw).strip()
        else:
            raise CountryListEntryError( "Template Parameter 'Titel' is \
missing!" )
@@ -293,31 +335,10 @@ missing!" )
                parts.append( word )
                parts.append( " " )
        # If we have indexes with out links, search for links
        # If we have indexes without links, search for links
        if indexes:
            # Iterate over wikilinks of refpage and try to find related links
            for wikilink in self.wikicode.ifilter_wikilinks():
                # Iterate over interpret names
                for index in indexes:
                    # Check whether wikilink matches
                    if parts[index] == wikilink.text \
                            or parts[index] == wikilink.title:
                        # Overwrite name with complete wikilink
                        parts[index] = str( wikilink )
                        # Remove index from worklist
                        indexes.remove( index )
                        # Other indexes won't also match
                        break
                # If worklist is empty, stop iterating over wikilinks
                if not indexes:
                    break
            parts = self._search_links( parts, indexes )
        # Join the collected links
        sep = " "
@@ -333,11 +354,127 @@ missing!" )
        If param is not present raise Error
        """
        if self.entry.has( "Interpret" ):
            self._interpret_raw = self.entry.get("Interpret").value.strip()
            self._interpret_raw = self.entry.get("Interpret").value
            # Remove possible ref-tags
            for ref in self._interpret_raw.ifilter_tags(matches="ref"):
                self._interpret_raw.remove( ref )
            # Handle SortKeyName and SortKey
            for template in self._interpret_raw.ifilter_templates(
                    matches="SortKey" ):
                if template.name == "SortKeyName":
                    # Differing Link-Destination is provided as param 3
                    if template.has(3):
                        # Construct link out of Template, Params:
                        # 1 = Surname
                        # 2 = Name
                        # 3 = Link-Dest
                        interpret_link = mwparser.nodes.wikilink.Wikilink(
                            str(template.get(3).value),
                            str(template.get(1).value) + " " +
                            str(template.get(2).value) )
                    # Default Link-Dest [[Surname Name]]
                    else:
                        interpret_link = mwparser.nodes.wikilink.Wikilink(
                            str(template.get(1).value) + " " +
                            str(template.get(2).value) )
                    # Replace Template with link
                    self._interpret_raw.replace( template, interpret_link )
                # SortKey
                else:
                    # Replace SortKey with text from param 2 if present
                    if template.has(2):
                        self._interpret_raw.replace( template,
                                                     template.get(2).value)
                    # Else Remove SortKey (text should follow behind SortKey)
                    else:
                        self._interpret_raw.replace( template, None)
                # Normally won't be needed as there should be only one
                # SortKey-Template but ... it's a wiki
                break
            # Remove whitespace
            self._interpret_raw = str(self._interpret_raw).strip()
        else:
            raise CountryListEntryError( "Template Parameter 'Interpret' is \
missing!" )
    def _search_links( self, keywords, indexes=None ):
        """
        Search matching wikilinks for keyword(s) in CountryList's wikicode
        @param keywords: One or more keywords to search for
        @type keywords: str, list
        @param indexes: List with numeric indexes for items of keywords to work
                        on only
        @type indexes: list of ints
        @return: List or String with replaced keywords
        @return type: str, list
        """
        # Maybe convert keywords string to list
        if( isinstance( keywords, str ) ):
            keywords = [ keywords, ]
            string = True
        else:
            string = False
        # If indexes worklist was not provided, work on all elements
        if not indexes:
            indexes = list(range( len( keywords ) ))
        # Iterate over wikilinks of refpage and try to find related links
        for wikilink in self.wikicode.ifilter_wikilinks():
            # Iterate over interpret names
            for index in indexes:
                # Check whether wikilink matches
                if( keywords[index] == wikilink.text or
                    keywords[index] == wikilink.title ):
                    # Overwrite name with complete wikilink
                    keywords[index] = str( wikilink )
                    # Remove index from worklist
                    indexes.remove( index )
                    # Other indexes won't also match
                    break
            # If worklist is empty, stop iterating over wikilinks
            if not indexes:
                break
        # Choose whether to return list or string based on input type
        if not string:
            return keywords
        else:
            return str(keywords[0])
    def __str__( self ):
        """
        Returns str representation for Object
        """
        if self.parsed:
            return ("CountryList( Link = \"{link}\", Revid = \"{revid}\", " +
                    "Interpret = \"{interpret}\", Titel = \"{titel}\", " +
                    "Chartein = \"{chartein}\" )").format(
                        link=repr(self.wikilink),
                        revid=self.revid,
                        interpret=self.interpret,
                        titel=self.titel,
                        chartein=repr(self.chartein))
        else:
            return "CountryList( Link = \"{link}\" )".format(
                link=repr(self.wikilink))
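Within the `Interpret` handling above, a `{{SortKeyName}}` template is replaced by a wikilink whose text is parameters 1 and 2 joined by a space and whose target is parameter 3 if present, otherwise that same name; a plain `{{SortKey}}` is replaced by its parameter 2 or removed. A sketch of the SortKeyName rule that builds the link as a string, mirroring how mwparserfromhell renders a `Wikilink` (`[[target|text]]`, or `[[target]]` without text); `sortkeyname_to_link` is hypothetical, and unlike `template.has(3)` it treats an empty parameter 3 as absent:

```python
from typing import Optional


def sortkeyname_to_link(param1: str, param2: str,
                        param3: Optional[str] = None) -> str:
    """Render {{SortKeyName|1|2|3}} as the wikilink the parser substitutes.

    The displayed name is params 1 and 2 joined by a space; param 3, if
    given, is a differing link target.
    """
    name = param1 + " " + param2
    if param3:
        return "[[%s|%s]]" % (param3, name)
    return "[[%s]]" % name


print(sortkeyname_to_link("Nicky", "Jam"))  # [[Nicky Jam]]
print(sortkeyname_to_link("Enrique", "Iglesias", "Enrique Iglesias (Sänger)"))
# [[Enrique Iglesias (Sänger)|Enrique Iglesias]]
```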
class CountryListError( Exception ):
    """
@@ -345,8 +482,104 @@ class CountryListError( Exception ):
    """
    pass
class CountryListEntryError( CountryListError ):
    """
    Handles errors occurring in class CountryList related to entries
    """
    pass
class CountryListUnitTest():
    """
    Defines Test-Functions for CountryList-Module
    """
    testcases = ( { "Link": mwparser.nodes.Wikilink( "Benutzer:JogoBot/Charts/Tests/Liste der Nummer-eins-Hits in Frankreich (2015)" ), # noqa
                    "revid": 148453827,
                    "interpret": "[[Adele (Sängerin)|Adele]]",
                    "titel": "[[Hello (Adele-Lied)|Hello]]",
                    "chartein": datetime( 2015, 10, 23 ) },
                  { "Link": mwparser.nodes.Wikilink( "Benutzer:JogoBot/Charts/Tests/Liste der Nummer-eins-Hits in Belgien (2015)", "Wallonien"), # noqa
                    "revid": 148455281,
                    "interpret": "[[Nicky Jam]] & [[Enrique Iglesias (Sänger)|Enrique Iglesias]]", # noqa
                    "titel": "El perdón",
                    "chartein": datetime( 2015, 9, 12 ) } )
    def __init__( self, page=None ):
        """
        Constructor
        Set attribute page
        """
        if page:
            self.page_link = mwparser.nodes.Wikilink( page )
        else:
            self.page_link = None
    def treat( self ):
        """
        Start testing either manually with page provided by cmd-arg page or
        automatically with predefined test case
        """
        if self.page_link:
            self.man_test()
        else:
            self.auto_test()
    def auto_test( self ):
        """
        Run automatic tests with predefined test data from wiki
        """
        for case in type(self).testcases:
            self.countrylist = CountryList( case["Link"] )
            if( self.countrylist.is_parsing_needed( case["revid"] ) or not
                self.countrylist.is_parsing_needed( case["revid"] + 1 ) ):
                raise Exception(
                    "CountryList.is_parsing_needed() does not work!" )
            self.countrylist.parse()
            for key in case:
                if key == "Link":
                    continue
                if not case[key] == getattr(self.countrylist, key ):
                    raise Exception( key + " " + str(
                        getattr(self.countrylist, key ) ))
    def man_test( self ):
        """
        Run manual test with page given in parameter
        """
        self.countrylist = CountryList( self.page_link )
        self.countrylist.parse()
        print( self.countrylist )
        print( "Since we have no data to compare, you need to manually " +
               "check data above against given page to ensure correct " +
               "working of module!" )
def main(*args):
"""
Handling direct calls --> unittest
"""
# Process global arguments to determine desired site
local_args = pywikibot.handle_args(args)
# Parse command line arguments
for arg in local_args:
if arg.startswith("-page:"):
page = arg[ len("-page:"): ]
# Call unittest-class
test = CountryListUnitTest( page )
test.treat()
if __name__ == "__main__":
main()
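As written, `main()` binds `page` only when a `-page:` argument is present. Running the module without one therefore reaches `CountryListUnitTest( page )` with `page` unbound and raises `UnboundLocalError`. As a result, the automatic test mode described in `treat()` is never reached. The sketch below shows the parsing step with a `None` default. The helper name `get_page_arg` is hypothetical, and `pywikibot.handle_args()` is left out so that the snippet runs on its own.

```python
def get_page_arg(local_args, prefix="-page:"):
    """Return the value of the last -page: argument, or None if absent."""
    page = None  # default: no page given, so automatic tests would run
    for arg in local_args:
        if arg.startswith(prefix):
            page = arg[len(prefix):]
    return page
```

In `main()`, this could be used as `CountryListUnitTest( get_page_arg( local_args ) ).treat()`. The helper keeps the original loop's behaviour of letting a later `-page:` argument override an earlier one.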

jogobot Submodule

Submodule jogobot added at 9131235b7b


@@ -3,7 +3,7 @@
#
# summarypage.py
#
# Copyright 2015 GOLDERWEB Jonathan Golder <jonathan@golderweb.de>
# Copyright 2016 Jonathan Golder <jonathan@golderweb.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -25,44 +25,58 @@
Provides classes for handling Charts summary page
"""
import locale
from datetime import datetime, timedelta
import pywikibot
# import pywikibot
import mwparserfromhell as mwparser
import jogobot
from countrylist import CountryList, CountryListError
class SummaryPage():
"""
Handles summary page related actions
"""
def __init__( self, text ):
def __init__( self, text, force_reload=False ):
"""
Create Instance
@param text: Page Text of summarypage
@type text: str
@param force-reload: If given, countrylists will be always parsed
regardless if needed or not
@type force-reload: bool
"""
# Parse Text with mwparser
self.wikicode = mwparser.parse( text )
# Force parsing of countrylist
self.force_reload = force_reload
def treat( self ):
"""
Handles parsing/editing of text
"""
# Get mwparser.template objects for Template "/Eintrag"
for entry in self.wikicode.filter_templates( matches="/Eintrag" ) :
for entry in self.wikicode.filter_templates( matches="/Eintrag" ):
# Instantiate SummaryPageEntry-object
summarypageentry = SummaryPageEntry( entry )
summarypageentry = SummaryPageEntry(entry,
force_reload=self.force_reload)
# Treat SummaryPageEntry-object
summarypageentry.treat()
# Get result
# We need to replace origninal entry since objectid changes due to
# recreation of template object and reassignment won't be reflected
self.wikicode.replace( entry, summarypageentry.new_entry.template )
self.wikicode.replace(entry, summarypageentry.get_entry().template)
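The comment in `treat()` about replacing the original entry reflects a general Python rule. Rebinding the loop variable does not change the container being iterated over, so the new object has to be written back explicitly, here via `self.wikicode.replace()`. The example below is a minimal illustration that uses a plain list in place of the parsed wikicode.

```python
entries = ["old-a", "old-b"]

# Rebinding the loop variable leaves the list untouched
for entry in entries:
    entry = entry.replace("old", "new")
assert entries == ["old-a", "old-b"]

# Writing the new object back into the container does change it
for index, entry in enumerate(entries):
    entries[index] = entry.replace("old", "new")
print(entries)  # ['new-a', 'new-b']
```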
def get_new_text( self ):
"""
@@ -85,18 +99,31 @@ class SummaryPageEntry():
write_needed = False
def __init__( self, entry ):
def __init__( self, entry, force_reload=False ):
"""
Constructor
@param entry: Entry template of summarypage entry
@type text: mwparser.template
@param force-reload: If given, countrylists will be always parsed
regardless if needed or not
@type force-reload: bool
"""
self.old_entry = SummaryPageEntryTemplate( entry )
self.new_entry = SummaryPageEntryTemplate( )
# Force parsing of countrylist
self.force_reload = force_reload
def treat( self ):
"""
Controls parsing/update-sequence of entry
"""
self.parse()
# Get CountryList-Object
self.get_countrylist()
# Check if parsing country list is needed
if( self.countrylist.parsed):
self.correct_chartein()
@@ -104,9 +131,9 @@ class SummaryPageEntry():
self.is_write_needed()
def parse( self ):
def get_countrylist( self ):
"""
Handles parsing process of entry template
Get the CountryList-Object for current entry
"""
# Get wikilink to related countrylist
@@ -115,44 +142,68 @@ class SummaryPageEntry():
# Get saved revision of related countrylist
self.get_countrylist_saved_revid()
# Get current year
current_year = datetime.now().year;
# Store old link.title
link_title = self.countrylist_wikilink.title
current_year = datetime.now().year
# If list is from last year, replace year
if (current_year - 1) in link_title:
self.countrylist_wikilink.title.replace( (current_year - 1), current_year )
if (current_year - 1) in self.countrylist_wikilink.title:
jogobot.output( "Trying to use new years list for [[{page}]]"
.format( page=self.countrylist_wikilink.title ) )
self.countrylist_wikilink.title.replace( (current_year - 1),
current_year )
# Try to get current years list
try:
self.countrylist = CountryList( self.countrylist_wikilink )
if self.countrylist:
self.countrylist.parse()
self.maybe_parse_countrylist()
# Maybe fallback to last years list
except CountryListError:
self.countrylist_wikilink.title = link_title
# If list is from last year, replace year
if (current_year ) in self.countrylist_wikilink.title:
jogobot.output( "New years list for [[{page}]] does not " +
"exist, fall back to old list!".format(
page=self.countrylist_wikilink.title ) )
self.countrylist_wikilink.title.replace( current_year,
(current_year - 1) )
self.countrylist = CountryList( self.countrylist_wikilink )
if self.countrylist:
self.countrylist.parse()
else:
self.maybe_parse_countrylist()
if not self.countrylist:
raise SummaryPageEntryError( "CountryList does not exists!" )
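`get_countrylist()` first tries the current year's list and falls back to the previous year's list if creating the `CountryList` raises `CountryListError`. Two details of the diff are worth checking against this intent:

- In the fallback message, `.format()` applies only to the second string literal, because a method call binds tighter than `+`. The `{page}` placeholder in the first literal is therefore output verbatim.
- Whether `in` and `replace()` behave as intended on the wikilink title depends on how mwparserfromhell implements them for that object. On a plain `str`, `int in str` would raise `TypeError`, and `str.replace()` would return a new string that is discarded here.

The sketch below avoids both issues by assuming the title is available as a plain `str`. `resolve_list_title` and its `exists` callback are hypothetical stand-ins for trying to create a `CountryList`.

```python
from datetime import datetime


def resolve_list_title(title, exists, current_year=None):
    """
    Return the current year's list title if `title` refers to last year
    and the new list exists; otherwise return `title` unchanged.
    """
    if current_year is None:
        current_year = datetime.now().year
    last_year = str(current_year - 1)
    # Years are compared as text; str.replace returns a new string
    if last_year in title:
        candidate = title.replace(last_year, str(current_year))
        if exists(candidate):
            return candidate
        # Adjacent literals are joined before .format() is applied
        print("New year's list for [[{page}]] does not exist, "
              "falling back to old list!".format(page=candidate))
    return title
```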
def maybe_parse_countrylist( self ):
"""
Parse countrylist if page-object exists and if parsing is needed or
param -force-reload is set
"""
# Fast return if no countrylist-object
if not self.countrylist:
return
# Parse if needed or forced
if( self.countrylist.is_parsing_needed( self.countrylist_revid ) or
self.force_reload ):
self.countrylist.parse()
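`maybe_parse_countrylist()` parses a list only if `is_parsing_needed()` reports a change against the revision saved on the summary page, or if `-force-reload` is set. The checks in `CountryListUnitTest.auto_test()` expect `is_parsing_needed()` to return `False` for the stored revid and `True` for revid + 1, which is consistent with a plain inequality test. The sketch below assumes exactly that rule and is not the module's actual implementation. `should_parse` is a hypothetical name.

```python
def should_parse(saved_revid, latest_revid, force_reload=False):
    """Decide whether a country list needs to be parsed again."""
    # Assumed rule: any difference between saved and latest revision
    # (including the default 0 for "no saved revision") triggers parsing
    parsing_needed = saved_revid != latest_revid
    return parsing_needed or force_reload
```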
def get_countrylist_wikilink( self ):
"""
Load wikilink to related countrylist
"""
if self.old_entry.Liste:
try:
self.countrylist_wikilink = next( self.old_entry.Liste.ifilter_wikilinks() )
self.countrylist_wikilink = next(
self.old_entry.Liste.ifilter_wikilinks() )
except StopIteration:
raise SummaryPageEntryError( "Parameter Liste does not contain valid wikilink!")
raise SummaryPageEntryError(
"Parameter Liste does not contain valid wikilink!" )
else:
raise SummaryPageEntryError( "Parameter Liste is not present!")
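`get_countrylist_wikilink()` takes the first wikilink from the `Liste` parameter with `next()` and converts the `StopIteration` raised for an empty iterator into a `SummaryPageEntryError`. The sketch below shows the same pattern using only the standard library. The regular expression is a simplified, hypothetical stand-in for mwparserfromhell's `ifilter_wikilinks()`: it ignores nesting and other edge cases, and `first_wikilink_target` is an illustrative name.

```python
import re


class SummaryPageEntryError(Exception):
    """Stand-in for the module's exception of the same name."""


# Simplified: matches [[target]] or [[target|text]] and captures the target
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def first_wikilink_target(text):
    links = (match.group(1) for match in WIKILINK.finditer(text))
    try:
        return next(links)
    except StopIteration:
        raise SummaryPageEntryError(
            "Parameter Liste does not contain valid wikilink!") from None
```

`next(links, None)` followed by a `None` check is an equivalent alternative that avoids the `try`/`except`.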
@@ -161,7 +212,7 @@ class SummaryPageEntry():
Load saved revid of related countrylist if Param is present
"""
if self.old_entry.Liste_Revision:
self.countrylist_revid = int( self.old_entry.Liste_Revision.strip())
self.countrylist_revid = int(self.old_entry.Liste_Revision.strip())
else:
self.countrylist_revid = 0
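`get_countrylist_saved_revid()` treats a missing `Liste_Revision` parameter as revision 0. Presumably this makes a list without a saved revision count as changed. Note that `int()` raises `ValueError` for a non-numeric value. The sketch below also maps an empty or non-numeric value to 0. That fallback is a choice made for this example, not behaviour of the original code, and `parse_saved_revid` is a hypothetical name.

```python
def parse_saved_revid(raw):
    """Convert a saved revision parameter to int, defaulting to 0."""
    if raw is None:
        return 0
    text = str(raw).strip()
    # Fallback chosen for this sketch: treat unusable values as "no revision"
    return int(text) if text.isdecimal() else 0
```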
@@ -171,7 +222,8 @@ class SummaryPageEntry():
"""
self.new_entry.Liste = self.countrylist_wikilink
self.new_entry.Liste_Revision = self.countrylist.page.latest_revision_id
self.new_entry.Liste_Revision = \
self.countrylist.page.latest_revision_id
self.new_entry.Interpret = self.countrylist.interpret
self.new_entry.Titel = self.countrylist.titel
self.new_entry.Chartein = self._corrected_chartein
@@ -210,9 +262,20 @@ class SummaryPageEntry():
Detects wether writing of entry is needed and stores information in
Class-Attribute
"""
type( self ).write_needed = ( ( self.old_entry != self.new_entry ) or \
type( self ).write_needed = ( ( self.old_entry != self.new_entry ) and
self.countrylist.parsed or
type( self ).write_needed )
def get_entry( self ):
"""
Returns the new entry if CountryList was parsed otherwise returns the
old one
"""
if( self.countrylist.parsed):
return self.new_entry
else:
return self.old_entry
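`is_write_needed()` stores its result in the class attribute `write_needed`, so once any entry reports a change, the flag stays `True` for all later entries. Because `and` binds tighter than `or`, an entry only counts as changed if its list was actually parsed. In line with this, `get_entry()` returns the old template for unparsed lists. The sketch below illustrates that accumulation with a hypothetical stand-in class.

```python
class Entry:
    """Stand-in for SummaryPageEntry; shares one flag across instances."""

    write_needed = False

    def __init__(self, old, new, parsed):
        self.old, self.new, self.parsed = old, new, parsed

    def is_write_needed(self):
        # Evaluates as: (changed and parsed) or previous flag value
        type(self).write_needed = (
            (self.old != self.new) and self.parsed or type(self).write_needed)

    def get_entry(self):
        # Unparsed lists keep their old entry unchanged
        return self.new if self.parsed else self.old
```

Because the flag lives on the class in this sketch, it has to be reset (`Entry.write_needed = False`) before another page is processed in the same run.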
class SummaryPageEntryTemplate():
"""
@@ -229,8 +292,8 @@ class SummaryPageEntryTemplate():
Creates Instance of Class for given mwparser.template object of
SummmaryPageEntry Template. If no object was given create empty one.
@param template_obj mw.parser.template Object of
SummmaryPageEntry Template
@param template_obj Object of SummmaryPageEntry Template
@type template_obj: mwparser.template
"""
# Check if object was given
@@ -240,25 +303,25 @@ class SummaryPageEntryTemplate():
if isinstance( template_obj,
mwparser.nodes.template.Template ):
self.template = template_obj;
self.__initial = False;
self.template = template_obj
self.__initial = False
# Otherwise raise error
else:
raise SummaryPageEntryTemplateError( "Wrong type given" );
raise SummaryPageEntryTemplateError( "Wrong type given" )
# Otherwise initialise template
else:
self.__initial_template()
self.__initial = True;
self.__initial = True
def __initial_template( self ):
"""
Builds the initial template
"""
self.template = next( mwparser.parse(
"{{Portal:Charts und Popmusik/Aktuelle Nummer-eins-Hits/Eintrag|Liste=|Liste_Revision=|Interpret=|Titel=NN\
self.template = next( mwparser.parse( "{{Portal:Charts und Popmusik/\
Aktuelle Nummer-eins-Hits/Eintrag|Liste=|Liste_Revision=|Interpret=|Titel=NN\
|Chartein=|Korrektur=|Hervor=}}" ).ifilter_templates() )
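`__initial_template()` builds an empty entry by parsing a literal template string. In the new version, that string is split with a backslash at the end of a line. Inside a string literal, a backslash followed by a newline is dropped, so neither ends up in the template. However, any leading whitespace on the continuation line would become part of the string. Adjacent string literals, or building the text from the parameter list as below, avoid that pitfall. `initial_entry_wikitext` and the tuple layout are illustrative, while the parameter names and defaults are taken from the literal in the diff.

```python
ENTRY_TEMPLATE = "Portal:Charts und Popmusik/Aktuelle Nummer-eins-Hits/Eintrag"
# Parameter names and default values as in the literal from the diff
ENTRY_PARAMS = (("Liste", ""), ("Liste_Revision", ""), ("Interpret", ""),
                ("Titel", "NN"), ("Chartein", ""), ("Korrektur", ""),
                ("Hervor", ""))


def initial_entry_wikitext(name=ENTRY_TEMPLATE, params=ENTRY_PARAMS):
    """Return wikitext for an entry template with default parameters."""
    args = "".join("|{}={}".format(key, value) for key, value in params)
    return "{{" + name + args + "}}"
```

The resulting text could then be passed to `mwparser.parse()` as in the original.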
def __getattr__( self, name ):
@@ -302,7 +365,7 @@ class SummaryPageEntryTemplate():
cmpto = self
else:
raise SummaryPageEntryTemplateError(
"One of the compared instances must have been initial!" )
"One of the compared instances must have been initial!" )
# Iterate over each param
for param in initial.template.params:
@@ -319,8 +382,8 @@ class SummaryPageEntryTemplate():
continue
# Compare other param values, if one unequal write is needed
if initial.template.get( param ).value.strip() != \
cmpto.template.get( param ).value.strip():
if( initial.template.get( param ).value.strip() !=
cmpto.template.get( param ).value.strip() ):
return True
# If not returned True until now
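The comparison in `SummaryPageEntryTemplate` iterates over the parameters of the initial template and reports a difference as soon as one whitespace-stripped value differs. The condition that selects the skipped parameters (`continue`) lies outside the hunks shown, so the sketch below takes the names to skip as an argument. It uses plain dicts of parameter values instead of mwparserfromhell templates. `entries_differ` is a hypothetical name, and treating a missing parameter as empty is this sketch's own choice.

```python
def entries_differ(initial, other, skip=()):
    """
    Return True if any parameter of `initial` (except those in `skip`)
    has a different whitespace-stripped value in `other`.
    """
    for name, value in initial.items():
        if name in skip:
            continue
        # A parameter missing from `other` counts as an empty value here
        if value.strip() != other.get(name, "").strip():
            return True
    return False
```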
@@ -333,12 +396,14 @@ class SummaryPageError( Exception ):
"""
pass
class SummaryPageEntryError( SummaryPageError ):
"""
Handles errors occuring in class SummaryPageEntry
"""
pass
class SummaryPageEntryTemplateError( SummaryPageError ):
"""
Handles errors occuring in class SummaryPageEntryTemplate