User:Wikinews Importer Bot
{{bot|Misza13|status=}}


== Operation ==

This bot imports certain dynamically-generated Wikinews pages into:

* Wikipedia portals that use a /Wikinews subpage, or
* Wikipedia articles that use a {{Wikinewshas}} subpage with the same name as the article.

Lists on Wikinews are rendered dynamically, so the bot extracts the items from the rendered HTML rather than from the wikitext.

These imports are accomplished by implementing the setup steps described below.

The bot checks for updates on an hourly basis and only updates if there has been a change.
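
For illustration, the extraction step can be sketched as follows. This is a condensed, hypothetical example (the function name fetch_rendered_items and the use of urllib are illustrative; the real bot goes through the pywikipedia framework, as shown in the Source section below), and it mirrors the source's minidom approach, which assumes the API returns a well-formed HTML fragment.

import json
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.dom.minidom import parseString

def fetch_rendered_items(page_title):
    # Render the page via the MediaWiki action=parse API.
    params = urlencode({'action': 'parse', 'format': 'json', 'page': page_title})
    with urlopen('https://en.wikinews.org/w/api.php?' + params) as resp:
        html = json.load(resp)['parse']['text']['*']
    # Wrap the fragment so minidom can parse it, then read the first <ul>.
    doc = parseString('<html><body>' + html + '</body></html>')
    lists = doc.getElementsByTagName('ul')
    if not lists:
        return []
    return [li.getElementsByTagName('a')[0].getAttribute('title')
            for li in lists[0].getElementsByTagName('li')
            if li.getElementsByTagName('a')]

print(fetch_rendered_items('Portal:Film/Wikipedia'))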

=== Setup for portals ===

1. Create a subpage at Wikinews using the DynamicPageList function, e.g., n:Portal:Film/Wikipedia. (If there is a category instead of a portal, do not make a category subpage; instead, make a subpage of n:Wikinews:Wikinews Importer Bot.) Insert code similar to the following. (See n:Wikinews:DynamicPageList for more info on DynamicPageList syntax.)

<DynamicPageList>
category=Published
category=<category name>
notcategory=No publish
notcategory=Disputed
stablepages=only
count=5
namespace=0
addfirstcategorydate=true
</DynamicPageList><noinclude>

'''Note.''' This page is used by the [[w:User:Wikinews Importer Bot|Wikinews Importer Bot]] to update the [[w:Portal:<destination page>|Wikipedia <destination> Portal news]].
</noinclude>

2. Create a sub-subpage at a portal news section, e.g., Portal:Film/Film news/Wikinews. Insert the following code.

<noinclude>
{{User:Wikinews Importer Bot/config|page = Portal:<source page name>}}
</noinclude>

Remember, the bot checks for updates on an hourly basis and only updates if there has been a change. You can wait for the first update or manually copy over the original links from the source page. Insert code similar to the following for an initial manual copy of Wikinews items.

* [[wikinews:<first news item>|<first news item>]]
* [[wikinews:<second news item>|<second news item>]]
* etc.
<noinclude>
{{User:Wikinews Importer Bot/config|page = Portal:<source page name>}}
</noinclude>

Optional parameter: a custom indentation string can be passed using the indent= parameter (which defaults to * if not specified). See the example below.

<noinclude>
{{User:Wikinews Importer Bot/config|indent = **|page = Portal:<source page name>}}
</noinclude>

3. Transclude the above page to the news section page, e.g., Portal:Film/Film news. Insert code similar to the following.

'''[[:wikinews:Portal:<source>|Wikinews <source> portal]]'''<div style="float:right;margin-left:0.9em">
<imagemap>
Image:Wikinews-logo.svg|75x45px
default [[n:Main Page|Read and edit Wikinews]]
desc none
</imagemap>
</div>
{{Portal:<destination page in step 2>}}

=== Setup for articles ===

The setup for articles is very similar to the setup for portals. The basic steps are listed below, along with the important differences and examples.

1. Create a subpage at Wikinews in the same way as for portals. In fact, if the source specifications are the same, e.g., the number of items and the date usage, the same source can be used for both a portal and an article, although the following setup steps differ. In this case, the "Note." points to an article instead of a portal. (Include all destinations used by the same source.)

<DynamicPageList>
category=Published
category=<category name>
notcategory=Disputed
count=5
addfirstcategorydate=true
namespace=main
</DynamicPageList><noinclude>

'''Note.''' This page is used by the [[w:User:Wikinews Importer Bot|Wikinews Importer Bot]] to update the [[w:<destination page>|Wikipedia <destination> article]].
</noinclude>

2. Start a subpage at Template:Wikinewshas that matches the name of the article where you want the list to go, such as Template:Wikinewshas/Film. Include code such as the following:

<noinclude>
{{User:Wikinews Importer Bot/config|page = Portal:<source page name>}}
</noinclude>

3. Place {{Wikinewshas}} on the desired article, typically in the "External links" section, using the header-only ("automatic" content) option. Link to a Wikinews portal with code such as the following, e.g., at Film#External links:

{{Wikinewshas|related<br>[[wikinews:Portal:Film|Film news]]}}

After a topical subpage is created, its news can be added to related articles by explicitly including the subpage as the second parameter. Add code such as the following:

{{Wikinewshas|related<br>[[wikinews:Portal:Film|Film news]]|{{Wikinewshas/Film}}}}

=== Wikinews templates on user pages ===

Two templates – {{Wikinewshas}} and {{Wikinewstable}} – can be added to user pages for selected Wikinews topics. For example, the following parameters will display film Wikinews as shown.

{{wikinewshas|the latest<br>[[n:Category:Film|film news]]|{{Portal:Film/Film news/Wikinews}}}}

{{Wikinewstable
  |width=
  |topic=film
  |newspage=Category:Film
  |pediapage={{Portal:Film/Film news/Wikinews}}
}}

(As rendered, the {{Wikinewshas}} example above produces a box headed "Latest film news from Wikinews" containing the imported items, a "Read and edit Wikinews" link, and the note "Visit Category:Film to read and write news articles in more detail.")


== Source ==

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os, sys, re, traceback
sys.path.append(os.environ['HOME'] + '/pywikipedia')

import wikipedia, simplejson
from xml.dom.minidom import parseString as minidom_parseString
from xml.dom import Node


MONTHS = [u'January',u'February',u'March',u'April',u'May',u'June',u'July',u'August',u'September',u'October',u'November',u'December',
    u'Janvier',u'Février',u'Mars',u'Avril',u'Mai',u'Juin',u'Juillet',u'Août',u'Septembre',u'Octobre',u'Novembre',u'Décembre'] #TODO: srsly...
date_rx = re.compile(r'(\d+) (%s) (\d\d\d\d)' % ('|'.join(MONTHS),), re.IGNORECASE)


def parseNews(page):
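    """Render ``page`` via the API and yield a (date prefix, Page) pair
    for each item of the first <ul> in the resulting HTML. Leading text
    nodes become date links ([[7 January]] on en, {{date|...}} on fr)."""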
    wikipedia.output(page.aslink())
    site = page.site()
    response, data = site.postForm('/w/api.php', {'action':'parse','format':'json','page':page.title()})
    text = simplejson.loads(data)['parse']['text']['*']
    #print text

    #doc = minidom_parseString(u'<html><body>' + text.encode('utf-8') + u'</body></html>')
    doc = minidom_parseString((u'<html><body>' + text + u'</body></html>').encode('utf-8'))

    ul = doc.getElementsByTagName('ul')
    if ul:
        for li in ul[0].getElementsByTagName('li'):
            if li.firstChild.nodeType == Node.TEXT_NODE:
                prefix = li.firstChild.nodeValue
                if site.lang == 'en':
                    prefix = date_rx.sub(r'[[\2 \1]]',prefix)
                elif site.lang == 'fr':
                    prefix = date_rx.sub(r'{{date|\1|\2|\3}}',prefix)
            else:
                prefix = ''
            yield prefix, wikipedia.Page(site, li.getElementsByTagName('a')[0].getAttribute('title'))


def doOnePage(tpl, page, site_src):
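    """Update one destination page from its configured Wikinews source.

    Reads the {{.../config|...}} call on ``page``, builds the link list,
    and rewrites the page only when the list differs from the current
    content (the part before the first <noinclude>).
    """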
    wikipedia.output(page.aslink())
    txt = page.get().replace('_', ' ')
    rx = re.search(r'{{(%s\|.*?)}}' % (tpl.title()), txt)
    if not rx:
        return

    # parameter name -> (value, was it given explicitly in the template call?)
    config = {
            'page' : (None, False),
            'indent' : (u'*', False),
            }

    raw_config = rx.group(1).split('|')[1:]
    for x in raw_config:
        var, val = x.split('=',1)
        var, val = var.strip(), val.strip()
        config[var] = (val, True)

    if not config['page'][0]:
        wikipedia.output(u'No target page specified!')
        return

    newsPage = wikipedia.Page(site_src, config['page'][0])

    text = u'\n'.join(
            [u'%(indent)s %(prefix)s[[wikinews:%(lang)s:%(article_page)s|%(article_title)s]]' % {
                    'article_page' : re.sub(r'[\s\xa0]', ' ', news.title()),
                    'article_title' : news.title(),
                    'prefix' : prefix,
                    'indent' : config['indent'][0],
                    'lang' : site_src.lang }
                for prefix, news in parseNews(newsPage)]
            )

    #Check for old content
    oldtext = page.get()
    #Ignore lead (timestamp etc.)
    rx = re.compile('^(.*)<noinclude>.*', re.DOTALL)
    oldtext = rx.sub(r'\1', oldtext).strip()

    if text != oldtext:
        raw_config = '|'.join(u'%s = %s' % (name, val[0]) for name, val in config.items() if val[1])
        text = u'%(text)s<noinclude>\n{{%(tpl)s|%(config)s}}\nRetrieved by ~~~ from [[wikinews:%(lang)s:%(page)s|]] on ~~~~~\n</noinclude>' % {
                'text' : text,
                'tpl' : tpl.title(),
                'config' : raw_config,
                'page' : config['page'][0],
                'lang' : site_src.lang,
                }
        #wikipedia.output(text)
        page.put(text, comment=u'Updating from [[n:%s|%s]]' % (newsPage.title(),newsPage.title(),))
        
    return {
        'src' : newsPage.title(),
        'ns'  : page.site().namespace(page.namespace()),
        'dst' : page.title(),
        }


def main(lang):
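    """Update every page that transcludes the config template, then
    refresh the audit list at User:Wikinews Importer Bot/List."""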
    pages_maintained = {}
    site_src = wikipedia.getSite(code = lang, fam = 'wikinews')
    site_dest = wikipedia.getSite(code = lang, fam = 'wikipedia')
    tpl = wikipedia.Page(site_dest, 'User:Wikinews Importer Bot/config')
    for page in tpl.getReferences(onlyTemplateInclusion=True):
        if page.title().endswith('/Wikinews') or page.title().startswith('Template:Wikinewshas/') or '/Wikinews/' in page.title():
            try:
                step = doOnePage(tpl, page, site_src)
                if step['ns'] not in pages_maintained:
                    pages_maintained[step['ns']] = []
                pages_maintained[step['ns']].append(step)
            except KeyboardInterrupt:
                break
            except:
                traceback.print_exc()

    audit_txt = u''
    for ns in sorted(pages_maintained.keys()):
        audit_txt += '\n\n== %s: ==\n\n' % ns
        items = sorted(pages_maintained[ns], key=lambda x: x['dst'])
        audit_txt += '\n'.join('# [[%(dst)s]] &larr; [[n:%(src)s|%(src)s]]' % item for item in items)
    audit_txt = audit_txt.strip()

    audit_page = wikipedia.Page(site_dest,'User:Wikinews Importer Bot/List')
    oldtext = audit_page.get()
    rx = re.compile('^.*?(?=\n== )', re.DOTALL)
    oldtext = rx.sub('', oldtext).strip()
    #wikipedia.showDiff(oldtext, audit_txt)
    if oldtext != audit_txt:
        audit_page.put(
            u'List of pages maintained by {{user|Wikinews Importer Bot}} by namespace. Last updated: ~~~~~\n\n' + audit_txt,
            comment='Updating list of maintained pages (%d items).' % sum(len(i) for i in pages_maintained.values()),
            )

if __name__ == '__main__':
    try:
        if len(sys.argv) == 1:
            lang = 'en'
        else:
            lang = sys.argv[1]
        main(lang)
    finally:
        wikipedia.stopme()