/dev/random
As part of the 2011 Wikimedia Summer of Research, we uncovered a possible correlation between the decline in new active editors that began in 2007 and the rise of warnings issued to new users by bots and automated tools, which started in 2006.
http://blog.wikimedia.org/2012/03/27/analysis-of-the-quality-of-newcomers-in-wikipedia-over-time/
L=A=N=G=U=A=G=E
James Joyce dictionary
deception
detection of Alzheimer's = Memoirs of My Nervous Illness by Daniel Paul Schreber
http://en.wikipedia.org/wiki/Daniel_Paul_Schreber
http://www.luftgangster.de/schreber/start.html
http://en.wikipedia.org/wiki/War_on_Terror
Wikipedia / MediaWiki API: http://www.mediawiki.org/wiki/API:Main_page
Documentation for pattern.en (the English module of Pattern): http://www.clips.ua.ac.be/pages/pattern-en
Tasks!
- => build wiki parsing tool
- => collate data in csv (necessary?)
- => build pattern parser (checking modality && sentiment) for output. What should the output be: text-based, graph, web application?
- => choose and download taxonomy datasets
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xml&titles=Main%20Page
Getting the history of a Wikipedia page.
Revision properties (prop=revisions):
http://www.mediawiki.org/wiki/API:Properties#revisions_.2F_rv
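A revision-history query can be assembled from parameters rather than pasted as one long URL. The following is a minimal Python 3 sketch, not the pad's own code: `revisions_url` is a hypothetical helper name, the default limit of 50 is an arbitrary choice, and the parameters (`prop=revisions`, `rvprop`, `rvlimit`, `rvcontinue`) are the ones described on the API page linked above.

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"


def revisions_url(title, limit=50,
                  props=("ids", "timestamp", "user", "content"),
                  rvcontinue=None):
    """Build a prop=revisions query URL for one page title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "|".join(props),  # which fields to return per revision
        "rvlimit": limit,           # revisions per request
        "format": "json",
    }
    if rvcontinue is not None:
        # Token from the previous response, used to fetch the next batch.
        params["rvcontinue"] = rvcontinue
    return API + "?" + urlencode(params)


if __name__ == "__main__":
    print(revisions_url("War on Terror", limit=10))
```

Building the query with `urlencode` takes care of escaping spaces and the `|` separator in `rvprop`.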
Useful wiki API URLs:
Python diffing
http://localhost/doc/python2.7/html/library/difflib.html?highlight=diff#difflib
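As a sketch of how difflib could be applied to two revisions of an article, the example below compares two text snippets line by line. The helper names `diff_revisions` and `changed_lines` and the sample sentences are made up for illustration. Only `difflib.unified_diff` and `difflib.ndiff` come from the standard library.

```python
import difflib


def diff_revisions(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff between two revision texts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


def changed_lines(old_text, new_text):
    """Return (added, removed) lines between two revision texts."""
    added, removed = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are ndiff hints.
    return added, removed


if __name__ == "__main__":
    old = "The war began in 2001.\nIt was widely supported."
    new = "The war began in 2001.\nIt was controversial."
    print("\n".join(diff_revisions(old, new)))
```

Feeding only the added lines to a sentiment or modality check would show what each edit contributed, rather than re-scoring the whole article per revision.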
How subjective is the Wikipedia article for Neutrality?
Wikipedia article: Neutrality
'Neutral' is 0.39375 subjective.
'Politics and social science' is 0.195238095238 subjective.
'Mathematics and natural science' is 0.338293650794 subjective.
'Geographic locations' is 0.0 subjective.
'Other and related senses' is 0.3875 subjective.
Wikipedia article: Subjectivity
'Subjectivity' is 0.405013736264 subjective and -0.00144230769231 positive.
'Society' is 0.414166666667 subjective and 0.015 positive.
'Self' is 0.37 subjective and -0.0388888888889 positive.
'See also' is 0.333333333333 subjective and -0.166666666667 positive.
'References' is 0.0 subjective and 0.0 positive.
'Further reading' is 0.176136363636 subjective and 0.0340909090909 positive.
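Per-section scores in the format above could be produced along the following lines. This is a sketch under stated assumptions, not necessarily how the figures above were generated: it assumes raw wikitext in which sections are marked by `== Heading ==` lines, and it takes the scorer as a parameter. `split_sections` and `report` are hypothetical helper names. The scorer is any callable returning a `(polarity, subjectivity)` pair, which is the shape returned by `sentiment()` in pattern.en.

```python
import re

# A wikitext heading line such as "== Society ==" or "===Self===".
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(wikitext, lead_title):
    """Split wikitext into (title, body) pairs.

    Text before the first heading is filed under lead_title.
    """
    sections = []
    title, start = lead_title, 0
    for match in HEADING.finditer(wikitext):
        sections.append((title, wikitext[start:match.start()].strip()))
        title, start = match.group(2), match.end()
    sections.append((title, wikitext[start:].strip()))
    return sections


def report(sections, score):
    """Format one line per section using score(text) -> (polarity, subjectivity)."""
    lines = []
    for title, body in sections:
        polarity, subjectivity = score(body)
        lines.append("'%s' is %s subjective and %s positive."
                     % (title, subjectivity, polarity))
    return lines


if __name__ == "__main__":
    text = "Intro text.\n== Society ==\nBody.\n== Self ==\nMore."
    # Placeholder scorer for demonstration only; it returns fixed values.
    for line in report(split_sections(text, "Subjectivity"),
                       lambda body: (0.0, 0.5)):
        print(line)
```

With Pattern installed, passing `sentiment` (from `from pattern.en import sentiment`) as the `score` argument would plug in the real scorer in place of the placeholder.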
# ;) / tourette.py
# Print every word in Pattern's profanity list and speak it through Festival.
from pattern.en.wordlist import PROFANITY
import os

for word in PROFANITY:
    print(word)
    os.system('echo "' + word + '" | festival --tts --pipe')
#!/usr/bin/python
# Python 2 script: fetch the revision history of a page and save it to revisions.csv.
#getting: time|user|content
import urllib
import json
from csv import writer

pagetitle = 'War_on_Terror'
#baseq = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvli
baseq = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlim
q = baseq
count = 0
csvfile = open('revisions.csv', 'wb')
w = writer(csvfile, dialect='excel')

while True:
    results = json.load(urllib.urlopen(q))
    # 'pages' is keyed by page id; with one title requested it holds one entry.
    page = list(results['query']['pages'].values())[0]
    revs = page['revisions']
    count += len(revs)
    print(revs[-1]['timestamp'])
    print(len(revs))
    for r in revs:
        w.writerow((r['revid'], r['timestamp'], r['user'].encode('utf-8'), r['co
    # Follow the continuation token until the API stops returning one.
    rvcontinue = results.get('query-continue', {}).get('revisions', {}).get('rvcontinue')
    if rvcontinue is None:
        break
    q = baseq + "&rvcontinue=" + str(rvcontinue)

print("done")
print("%d total revs" % count)
csvfile.close()
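The continuation check in the script above can be isolated into a small helper, which also makes it testable without network access. This is a sketch: `next_rvcontinue` is a hypothetical name, and the top-level `continue` key is an assumption about how newer MediaWiki versions report continuation. The `query-continue` layout is the one the script above reads.

```python
def next_rvcontinue(results):
    """Return the rvcontinue token from a decoded API response, or None."""
    # Assumed newer layout: {"continue": {"rvcontinue": "...", "continue": "||"}}
    token = results.get("continue", {}).get("rvcontinue")
    if token is None:
        # Layout used in the script above:
        # {"query-continue": {"revisions": {"rvcontinue": ...}}}
        token = (results.get("query-continue", {})
                        .get("revisions", {})
                        .get("rvcontinue"))
    return token


if __name__ == "__main__":
    old_style = {"query-continue": {"revisions": {"rvcontinue": 12345}}}
    last_batch = {"query": {"pages": {}}}
    print(next_rvcontinue(old_style))   # token for the next request
    print(next_rvcontinue(last_batch))  # None: nothing left to fetch
```

A fetch loop can then stop as soon as the helper returns `None` and otherwise append `&rvcontinue=<token>` to the base query.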
# GETTING DATASETS FOR USING TAXONOMIES