Project Gutenberg sources

  1. J. B. Bury, The Idea Of Progress, 1920
  2. Maud Churton Braby, Modern Marriage and How To Bear It, 1908
  3. Harriet Martineau, How to Observe Morals and Manners, 1838
  4. Irwin Edman, Human Traits and their Social Significance, 1920
  5. James Hayden Tufts, The Ethics of Cooperation, 1918
  6. James Harvey Robinson, The Mind in the Making: The Relation of Intelligence to Social Reform, 1921
  7. Helen Kendrick Johnson, Woman And The Republic, 1897
  8. Charles Darwin, On the Origin of Species, 1859
  9. Emma Goldman, Anarchism and Other Essays, 1910
  10. John F. Hume, The Abolitionists (Together With Personal Memories Of The Struggle For Human Rights), 1830-1864

Wikipedia sources

  1. Mining : FS
  2. Textile Industry : FS
  3. History of computing hardware : MB
  4. Marissa Mayer : MB
  5. Larry Page : CL
  6. Liberty : CL
  7. Choice : CC
  8. Sabotage : CC
  9. Social Darwinism : JBT
  10. Anarchism : JBT

(Wikipedia - pattern import)

#!/usr/bin/env python
from pattern.web import Wikipedia

article = Wikipedia().search('sociology')
for section in article.sections:
    print repr(' ')
    print repr(' ' * section.level + section.title)
    print repr(' ' * section.level + section.content)

[[wikipedia-sociology-scraped-content]]

Roels Random Wiki Section Script

#!/usr/bin/env python
from pattern.web import Wikipedia
import random

wikilist = ['Mining', 'Textile Industry', 'History of computing hardware', 'Marissa Mayer', 'Larry Page', 'List of stock characters']

#This script will choose a random section from a list of given articles.

for wiki in wikilist:
    article = Wikipedia().search(wiki)
    # Section titles that carry no usable body text.
    section_filter = ['References', 'Further Reading', 'See also', 'Other uses']

    # Drop the filtered sections first, so the random pick can never land on one.
    candidates = [s for s in article.sections if s.title not in section_filter]
    if not candidates:
        continue
    chosen_section = random.choice(candidates)

    print chosen_section.title
    print chosen_section.plaintext()
    print '*'*40
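
The selection step can be tried without a network connection. The following is a minimal sketch, not part of the original script: `Section` is a hypothetical stand-in for the section objects that pattern returns (only `title` is used), and `pick_section` is a hypothetical helper name. It is written so that it also runs under Python 3.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for pattern's section objects; only `title` matters here.
Section = namedtuple('Section', ['title', 'content'])

SECTION_FILTER = ['References', 'Further Reading', 'See also', 'Other uses']


def pick_section(sections, excluded=SECTION_FILTER, rng=random):
    """Return a random section whose title is not in `excluded`, or None."""
    candidates = [s for s in sections if s.title not in excluded]
    if not candidates:
        return None
    return rng.choice(candidates)


if __name__ == '__main__':
    demo = [
        Section('History', 'Some text.'),
        Section('References', ''),
        Section('See also', ''),
    ]
    print(pick_section(demo).title)
```

Filtering before choosing means a single call always returns a usable section, where re-rolling once after a bad pick can still land on a filtered title.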

Roels 35-Random-Paragraph-From-Plaintext-Gutenberg-book-To-CSV-O'matic

#!/usr/bin/env python
# a script to automatically get paragraphs from gutenberg plaintext books and feed them into a csv
import os, random, sys, csv

#author name, book title, year, source, localfilename
authorlist = [
("J. B. Bury", "The Idea Of Progress", "1920", "", 'theideaofprogress.txt'),
("Maud Churton Braby", "Modern Marriage and How To Bear It", "1908", "", 'modernmarriageandhowtobearit.txt'),
("Harriet Martineau", "How to Observe Morals and Manners", "1838", "", 'howtoobservemoralsandmanners.txt'),
("Irwin Edman", "Human Traits and their Social Significance", "1920", "", 'humantraitsandtheirsocialsignificance.txt'),
("James Hayden Tufts", "The Ethics of Cooperation", "1918", "",'theethicsofcooperation.txt'),
("James Harvey Robinson", "The Mind in the Making: The Relation of Intelligence to Social Reform", "1921", "",'themindinthemaking.txt'),
("Helen Kendrick Johnson", "Woman And The Republic", "1897", "",'womanandtherepublic.txt'),
("Charles Darwin", "On the Origin of species", "1859", "", 'originofspecies.txt'),
("Emma Goldman", "Anarchism and other essays", "1910", "", 'anarchismandotheressays.txt'),
("John F. Hume","The Abolitionists (Together With Personal Memories Of The Struggle For Human Rights)","1830-1864","","theabolitionists.txt")
]

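with open('gutenberg.csv', 'wb') as f:
    writer = csv.writer(f)
    for i in authorlist:
        print "working on", i
        filtered_list = []
        # paragraphs in the plaintext files are separated by a blank (CRLF) line
        text = open(i[-1]).read().split('\r\n\r\n')
        for paragraph in text:
            # join hard-wrapped lines with a space so words do not run together
            paragraph = paragraph.replace('\r','').replace('\n',' ').strip()
            if len(paragraph) > 1:
                filtered_list.append(paragraph)
        for a in range(0,35):
            print "paragraph", a, "of", i[-1]
            random_pick = random.choice(filtered_list)
            # id, source, subject, year, content
            # ASSUMPTION: the row-writing line is missing from the original;
            # mapping index, author, title, year onto these columns is a guess.
            writer.writerow([a, i[0], i[1], i[2], random_pick])
        print "\n"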
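
The paragraph handling in the script above can be checked on a small string before running it on whole books. The following is a minimal sketch under stated assumptions, not the original code: the helper names `split_paragraphs`, `sample_rows` and `write_rows` are hypothetical, the row layout (index, author, title, year, paragraph) is a guess at the columns named in the comment, and the code is written so that it also runs under Python 3.

```python
import csv
import io
import random


def split_paragraphs(text):
    """Split plaintext into paragraphs on blank lines, unwrapping hard line breaks."""
    # Normalise CRLF line endings to LF first.
    text = text.replace('\r\n', '\n')
    paragraphs = []
    for block in text.split('\n\n'):
        # Join wrapped lines with a single space and trim the result.
        paragraph = ' '.join(line.strip() for line in block.split('\n')).strip()
        if len(paragraph) > 1:
            paragraphs.append(paragraph)
    return paragraphs


def sample_rows(paragraphs, author, title, year, count=35, rng=random):
    """Build `count` rows; paragraphs are drawn with replacement, as in the script."""
    return [(n, author, title, year, rng.choice(paragraphs)) for n in range(count)]


def write_rows(rows, handle):
    """Write the rows to an open text handle as CSV."""
    csv.writer(handle).writerows(rows)


if __name__ == '__main__':
    sample = "First line\r\nwrapped here.\r\n\r\nSecond paragraph.\r\n"
    rows = sample_rows(split_paragraphs(sample), "Author", "Title", "1900", count=3)
    buf = io.StringIO()
    write_rows(rows, buf)
    print(buf.getvalue())
```

Because `random.choice` draws with replacement, the same paragraph can appear more than once among the 35 rows; `random.sample` would avoid repeats, but fails for books with fewer than 35 paragraphs.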