You can change SVM() to KNN(); now everybody is in the Torah. 'Bag of words' is not necessarily about words but about features; these can be anything (e.g. when you look at images).
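To make the swap concrete: the classifiers share an interface, so changing one for the other never touches the feature extraction. A minimal pure-Python sketch (not Pattern's actual code; the training sentences and labels are invented for illustration):

```python
from collections import Counter
import math

def bag_of_words(text):
    # A "bag of words" is just a feature count: here the features are words,
    # but they could be anything (pixel values, n-grams, edit metadata, ...).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two bags (sparse count vectors).
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class KNN:
    # Same train()/classify() interface an SVM wrapper would have,
    # so the two can be swapped without touching the feature code.
    def __init__(self, k=3):
        self.k, self.examples = k, []
    def train(self, text, label):
        self.examples.append((bag_of_words(text), label))
    def classify(self, text):
        v = bag_of_words(text)
        nearest = sorted(self.examples, key=lambda e: -cosine(v, e[0]))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

knn = KNN(k=1)
knn.train("in the beginning god created the heaven and the earth", "torah")
knn.train("someone must have been telling lies about Josef K", "kafka")
print(knn.classify("god created the earth"))  # 'torah'
```

With a skewed training set, everything close to the dominant vocabulary lands in the same class — which is the joke about everybody ending up in the Torah.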
Other example: identifying the authors of Gutenberg texts. Same code, but the texts were broken up into sentences; did not get to results. Roel is working on utopian & dystopian literature (with the aim to check the business model of CLiPS): same approach, but the result gives 'none'.
Wikipedia history: classify authorship -> text/date/handle on username; predict who wrote an article after analyzing the edits of the different authors.
Policing and profiling mentality in algorithms: is the Wikipedia text NPOV (http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view) or not? An automatic bot to normalize wiki articles.
Maybe more interesting to train based on ADDED vs. REMOVED text (a police bot / spam bot). Qualify the censors.
Also: modality / sentiment -- could you question the NPOV (neutrality) of Wikipedia's own text? The NPOV page of Wikipedia has 0.3 neutrality sentiment! :)
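The ADDED vs. REMOVED idea can be sketched with a word-level diff of two revisions; this is a hypothetical illustration (the revision texts are invented), not an existing bot:

```python
import difflib

def added_removed(old, new):
    # Word-level diff of two revisions of a wiki paragraph:
    # the ADDED and REMOVED words are the material a hypothetical
    # "police bot" would be trained on.
    diff = list(difflib.ndiff(old.split(), new.split()))
    added = [t[2:] for t in diff if t.startswith("+ ")]
    removed = [t[2:] for t in diff if t.startswith("- ")]
    return added, removed

added, removed = added_removed("the article is neutral",
                               "the article is clearly biased")
print(added, removed)  # ['clearly', 'biased'] ['neutral']
```

Training on what censors remove, rather than on what authors write, would indeed qualify the censors.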
Getting sentimental -> real-time writing tool that grades your sentimentality! 'It seems you're getting a bit sentimental.'
'Sentiment' is based on a simple annotated dictionary. Ex. with 'Mussolini': 'Benito Mussolini' is 0.26 subjective. -> sentiment / positiveness analysis of a certain Wikipedia article.
Kafka's "The Trial"
Ex. search on 'free': 'K was living in a free country.' 'Free' is 0.8 positive and 0.666 objective; that's how the sentences are categorized. -> ratings run between -1 and 1 for negative - positive and between 0 and 1 for objective - subjective -> has been analysed in the context of the sentences in which the word 'free' occurs -> depends on who the annotators are... their scores show their perspective; the analysis is only as good as the protocol for determining sentiment.
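The mechanism behind the 'free' example is roughly this: each lexicon entry carries a hand-annotated (polarity, subjectivity) pair, and a sentence score is the average over the words the lexicon knows. A toy sketch with an invented mini-lexicon (the scores are made up, not Pattern's real annotations):

```python
# Toy annotated dictionary: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
# The scores below are invented for illustration; the real lexicon is much
# larger and hand-annotated per word sense.
LEXICON = {
    "free":    (0.8, 0.33),
    "vicious": (-0.9, 0.9),
}

def sentiment(sentence):
    # Average (polarity, subjectivity) over the annotated words found;
    # everything the lexicon does not know is silently ignored.
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return (0.0, 0.0)
    return (sum(p for p, s in hits) / len(hits),
            sum(s for p, s in hits) / len(hits))

print(sentiment("K was living in a free country."))
```

Everything hinges on who filled in the dictionary: swap the annotators and 'free' gets a different number.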
what kind of knowledge is produced with what kind of tool
The annotation problem of the PAN database: it is based on necessary averaging. (It's the question of what kind of sentiment you are looking for while annotating.) How can you show this problem in an exercise? How can we open up this annotation (evaluation) process?
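The averaging problem fits in a few lines: the lexicon keeps one number per word, and the disagreement between annotators disappears into it. The scores below are invented to make the point:

```python
from statistics import mean, stdev

# Hypothetical scores three annotators gave the same sentence (invented
# numbers): one reads it as positive, one as negative, one as neutral.
scores = [0.9, -0.8, 0.2]

average = mean(scores)   # the single number that ends up in the lexicon
spread = stdev(scores)   # the disagreement that the averaging throws away

print(average, spread)
```

An exercise could publish the spread next to the average: a word with mean 0.1 and spread 0.85 is not 'mildly positive', it is contested.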
Tools for anti-harassment, trying to find out who the author is. What counts as harassment? To whom? A practical example that would show the absurdity of defining statistically what counts as hurtful. -> depends on who is speaking in what context; not generalizable -> the tools lack an awareness of context
'There is no free lunch' (for each aim you have to construct your own classifier) vs. 'neutrality, objectivity'. Normality gets reinforced. Make the point of view of the annotator clear -> by doing this, accept that you're living in a world that is statistically controlled.
In the README on the data/training data, there is no date mentioned -> points of view depend on the news context -> what if we use the same data in 10 years? It still has authority. (How could annotation data sets carry something of their construction?)
Examples of exercises: an annotation process that creates links back to context, so you can unpack how we got there; the data reveals these traces.
Ex. start with positive sentences, finish with negative ones -> rereading a text, looking at the meaning, showing the normative structure.
-> make a system that allows people to look at a sitcom & comment in % of being upset. Speaking in bags of words.
-> find the most positive sentence on the internet (or generate this most positive sentence). The most positive sentence in J.G. Ballard's CRASH / "Why I want to fuck Ronald Reagan" (http://sensitiveskinmagazine.com/ronald-reagan/): "He saw Reagan in a complex rear-end collision, dying a stylized death that expressed Vaughan's obsession with Reagan's genital organs, like his obsession with the exquisite transits of the screen actress's pubis across the vinyl seat covers of hired limousines."
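Finding the 'most positive sentence' is just an argmax over lexicon averages, which is exactly why a sentence like the one above can win. A sketch with an invented mini-lexicon (scores made up for illustration):

```python
# Invented mini polarity lexicon, only to illustrate the ranking mechanism.
POLARITY = {"exquisite": 0.9, "stylized": 0.3, "dying": -0.8, "collision": -0.4}

def polarity(sentence):
    words = [w.strip(".,'\"").lower() for w in sentence.split()]
    hits = [POLARITY[w] for w in words if w in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0

def most_positive(sentences):
    # The "most positive sentence" is simply the argmax of the lexicon average.
    return max(sentences, key=polarity)

sentences = [
    "He saw Reagan in a complex rear-end collision.",
    "The exquisite transits of the screen actress.",
]
print(most_positive(sentences))
```

One strongly annotated word ('exquisite') outweighs everything else the sentence is doing, and sorting by the same score gives the positive-to-negative rereading of a text.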
-> revisiting harassment mining: annotators vs. harassers and the harassed
-> understanding algorithms & showing their limitations -> writing tools, ex. autocomplete: generalising/randomising; generating simple English from a 'normal' input -> real-time writing tool that grades your sentimentality! 'It seems you're getting a bit sentimental/paedophile/lying/Kafkaesque...'
mh feedback result: "This constabulary survey documents the findings of the vicious probe into an allegation made by Mohamed aluminium Fayed of Dixieland to death penalty the Princess of Cambria and his boy Dodi Al Fayed."
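The output above reads like thesaurus substitution (every word blindly replaced by a listed synonym, so 'police' becomes 'constabulary' and 'Wales' becomes 'Cambria'). A minimal sketch of the principle with a hand-made synonym table, not the actual tool:

```python
# Hand-made synonym table for illustration; a real tool would presumably
# walk a thesaurus such as WordNet, which is how "al" can end up as
# "aluminium" and "execute" as "death penalty".
SYNONYMS = {
    "police": "constabulary",
    "report": "survey",
    "inquiry": "probe",
}

def mangle(text):
    # Replace every known word with its listed synonym, keep the rest.
    return " ".join(SYNONYMS.get(w.lower(), w) for w in text.split())

print(mangle("the police report"))  # the constabulary survey
```

The failure mode is the interesting part: substitution without word-sense awareness turns names and idioms into nonsense.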
-> Vocabulary styles: automatic profanities... automatic academics as well? Lexicons in Pattern: academic, profanity, 1000 basic English words. They needed a list of profanities for the paedophilia project.
-> Paul Otlet's text. The Time Machine: at the time seen as progressive and world-changing; seen now as racist, colonialist, paternalist. The classifiers have clearly changed. -> how to follow the classifiers across time: compare with other texts of that time, to see where it is progressive, to put it in context. Time-machine: is this 'how to make the experiment fail interestingly'? 'Not only use bag of words, but also associations between words.'
-> Writing tool, sentiment as a reader. Different sources, responses to it. 1984 and its influence... fear/dystopia as an idea and how people nowadays use it to refer to it, as a reading feedback loop -> there is a norm for sentiment -> compare to the feeling that you feel when you read/write.