Response to the Pattern package paradox: the annotator is key to creating reference data, yet is made invisible. The algorithms are trained on human scoring, but this process is hidden.
Group with different experiments: which feature would be interesting to develop? How could we play out the problems?
Choosing your classifier is close to choosing your sources: choose something interesting to annotate. Make a proposal for the Pattern software, with naming such as: from pattern.en import revolution, sentiment, etc.
The issue of nudging(?) (economic behaviourism, e.g. billing people with a comparison between their energy bill and their neighbour's): the hope that, through the actions of humans, the computer can "learn" and improve. The discussion on paternalism reflected the discussion on big data: the autonomy of the subject to decide.
Sources: a corpus of (non-)patternalist statements, including latent paternalism. Project Gutenberg: positive quotes for patternalism. Wikipedia: can contain latent patternalism.
Three groups of three annotators looked at the same data, starting from definitions of paternalism, to find agreement on what it is. Annotators added comments to their scores and took quantity into account. What to do with disagreement? In general it would be removed; we decided instead to mark those results with D, for disagreement.
Meta-mining: the disagreements are listed, showing different styles of annotating.
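The marking and meta-mining steps can be sketched in Python. This is a minimal sketch, not the workshop's script: it assumes every annotator gives each statement one label ("P" for paternalist, "N" for not), treats anything short of unanimity as disagreement, and uses hypothetical function names and invented statements.

```python
from collections import Counter


def aggregate(annotations):
    """Combine per-annotator labels into one label per statement.

    annotations maps statement -> {annotator: label}. A unanimous label is
    kept; anything else becomes "D" (disagreement) instead of being removed.
    """
    result = {}
    for statement, labels in annotations.items():
        distinct = set(labels.values())
        result[statement] = distinct.pop() if len(distinct) == 1 else "D"
    return result


def annotator_styles(annotations):
    """Meta-mining: count how often each annotator uses each label."""
    styles = {}
    for labels in annotations.values():
        for annotator, label in labels.items():
            styles.setdefault(annotator, Counter())[label] += 1
    return styles


# Hypothetical toy data: "P" = paternalist, "N" = not paternalist.
annotations = {
    "Eat your vegetables, it is for your own good.": {"a1": "P", "a2": "P", "a3": "P"},
    "The library opens at nine.": {"a1": "N", "a2": "P", "a3": "N"},
}

labels = aggregate(annotations)
print([s for s, label in labels.items() if label == "D"])  # the listed disagreements
print(annotator_styles(annotations))
```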
Not frictionless: Gijs spent a lot of time handling it.
Train the algorithm, with lots of data, to differentiate between paternalism and other things, then ask the classifier what the features of paternalism are. Disappointing results: the features are hard to connect to any understanding of paternalism, and there are very few of them. The algorithm is tested against the annotations.
We managed to insert disagreement into the algorithm.
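Pattern's own classifiers are not used here; as a stdlib-only stand-in, this minimal multinomial Naive Bayes sketch shows both steps: training with "D" kept as a third label, so that disagreement enters the algorithm, and asking the model which words are most characteristic of a label. The training sentences are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocabulary = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_p_word(self, word, label):
        total = sum(self.word_counts[label].values())
        return math.log((self.word_counts[label][word] + 1) / (total + len(self.vocabulary)))

    def classify(self, text):
        n_docs = sum(self.doc_counts.values())

        def score(label):
            prior = math.log(self.doc_counts[label] / n_docs)
            return prior + sum(self.log_p_word(w, label) for w in tokenize(text))

        return max(self.doc_counts, key=score)

    def top_features(self, label, n=3):
        """Words whose smoothed probability under `label` most exceeds every other label's."""
        others = [other for other in self.doc_counts if other != label]

        def ratio(word):
            return self.log_p_word(word, label) - max(self.log_p_word(word, o) for o in others)

        # Pre-sort alphabetically so that ties come out in a stable order.
        return sorted(sorted(self.vocabulary), key=ratio, reverse=True)[:n]


nb = NaiveBayes()
nb.train("you must do this for your own good", "P")
nb.train("we know what is best for you", "P")
nb.train("the train leaves at noon", "N")
nb.train("the museum is open on sunday", "N")
nb.train("you should rest, it is good for you", "D")  # disagreement kept as a label
print(nb.classify("do this, it is for your own good"))  # prints P
print(nb.top_features("P"))  # prints ['best', 'do', 'know']
```

On this toy data the top "paternalist" features are 'best', 'do' and 'know': words that are hard to connect to any understanding of paternalism.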
Looking at Wikipedia history: scripts that dump the history of an article. A limit of the Pattern API is that it does not understand that an article is in constant flux; it only gives the latest version. We dump a JSON file and visualize the raw data: ../share/wiki_history/terror.html. You can explore the history and see when things get removed or added, with the possibility to see where a term is mentioned. Same work on the entry on svw.
Q: if the feature were used in another context, could it reveal the comments that led to that decision? Democratic: it is the opinion of all annotators that is taken into account.
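Since the Pattern API only returns the latest version, the revisions have to come from elsewhere (for instance the MediaWiki API; fetching is not shown). This minimal sketch assumes the revisions are already available as (timestamp, text) pairs, tracks when a term enters or leaves the article and where it sits, and dumps the result as JSON for a visualisation. The function name and the revision texts are hypothetical.

```python
import json


def term_events(revisions, term):
    """Find the revisions in which `term` appears in or disappears from the article.

    revisions: list of (timestamp, text) pairs, oldest first.
    Each event also records where the term sits in the text (-1 once removed).
    """
    events = []
    present = False
    for timestamp, text in revisions:
        position = text.lower().find(term.lower())
        now_present = position != -1
        if now_present != present:
            events.append({"timestamp": timestamp,
                           "event": "added" if now_present else "removed",
                           "position": position})
            present = now_present
    return events


# Hypothetical revision history, as it might be dumped from the wiki.
revisions = [
    ("2004-01-01T00:00:00Z", "Terrorism is a contested term."),
    ("2005-01-01T00:00:00Z", "Terrorism is a contested term. See also: war on terror."),
    ("2006-01-01T00:00:00Z", "Terrorism is a contested term."),
]

events = term_events(revisions, "war on terror")
print(json.dumps(events, indent=2))  # raw data for a visualisation page
```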
Pattern writing coach
If somebody transcribes what I did, it will start to criticize me. The characters use the criteria: positiveness, objectivity and modality. The Love Coach is a guy ... the coaches have been swapped to counter gender stereotyping. Make more visible how it affects the way somebody speaks. You get the judgement but are not forced to respond.
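Pattern offers sentiment (polarity, subjectivity) and modality scoring for English; this sketch does not depend on it and instead assumes a tiny hand-made lexicon, so the word scores, hedge list, thresholds and coach remarks are all invented. It only illustrates a coach that judges along the three criteria without forcing a response.

```python
import re

# Hypothetical toy lexicon: word -> (polarity -1..1, subjectivity 0..1).
LEXICON = {
    "good": (0.7, 0.6), "great": (0.8, 0.75), "bad": (-0.7, 0.67),
    "wrong": (-0.5, 0.9), "beautiful": (0.85, 1.0), "boring": (-1.0, 1.0),
}
# Hypothetical hedge words that lower the certainty (modality) score.
HEDGES = {"maybe", "perhaps", "might", "could", "possibly"}


def score(text):
    """Return (polarity, subjectivity, certainty) for a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    polarity = sum(p for p, _ in hits) / len(hits) if hits else 0.0
    subjectivity = sum(s for _, s in hits) / len(hits) if hits else 0.0
    hedges = sum(1 for w in words if w in HEDGES)
    certainty = 1.0 - min(1.0, 2 * hedges / max(1, len(words)))
    return polarity, subjectivity, certainty


def coach(text):
    """Return the coach's remarks; the writer is free to ignore them."""
    polarity, subjectivity, certainty = score(text)
    remarks = []
    if polarity < 0:
        remarks.append("That sounds rather negative.")
    if subjectivity > 0.5:
        remarks.append("Is this your opinion or a fact?")
    if certainty < 0.5:
        remarks.append("Say it like you mean it.")
    return remarks or ["Fine, carry on."]


print(coach("Maybe this could be wrong."))
```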
Was there perhaps too little content for the algorithm to be effective? "We did not want to hide the crudeness."
"I don't believe the algorithms very much." Write a script where other modules could be plugged in, like patternalism.
paternalism, patternalism, paternity, patternity
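A script where other modules can be plugged in could use a small registry. This is a minimal sketch with hypothetical names; the "patternalism" function is a placeholder, not the group's classifier.

```python
from typing import Callable, Dict

# Registry of analysers: name -> function that takes a text and returns a score.
ANALYSERS: Dict[str, Callable[[str], float]] = {}


def plugin(name):
    """Decorator that registers an analyser under `name`."""
    def register(func):
        ANALYSERS[name] = func
        return func
    return register


@plugin("length")
def length(text):
    return float(len(text.split()))


@plugin("patternalism")
def patternalism(text):
    # Hypothetical placeholder: share of words taken from a tiny list of imperative cues.
    cues = {"must", "should", "ought"}
    words = text.lower().split()
    return sum(w in cues for w in words) / len(words) if words else 0.0


def analyse(text):
    """Run every registered analyser on the text."""
    return {name: func(text) for name, func in ANALYSERS.items()}


print(analyse("You must eat"))
```

Adding another module only takes a new function with the @plugin decorator; analyse() picks it up without further changes.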
What is a bag of words? Feed the algorithm into itself: feed its bag of words into the classifier and see how it understands itself.
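A bag of words keeps only which words occur and how often, discarding their order. A minimal sketch in Python (the function name is hypothetical), fed with a sentence describing the algorithm itself:

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into words and count them; word order is lost."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


# Feeding the algorithm into itself: its own description becomes its input.
description = "feed the bag of words into the classifier and see how the bag understands itself"
print(bag_of_words(description).most_common(2))  # prints [('the', 3), ('bag', 2)]
```

Because order is discarded, "dog bites man" and "man bites dog" produce the same bag.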
Correlative 1 / Small Data
Critical, having studied probability and models... Criticize machine learning: no big data, but a survey with very little data: small data. Uncertain trace, uncertain area. A population of (a small number of) anonymous individuals provided a very specific set of data on the adjective 'correlative': where to place 'correlative' between objective and subjective. The graph displays the area covered by the adjective 'correlative'. We cannot compare the results obtained through interviews with those obtained through text mining.
Two independent variables suggest a correlation through the result: "The weather is nice because I got a raise." "I did not get a raise because the weather was nice."
Independent vs. dependent variables: if there were no correlation (the variables being independent), you could just multiply their probabilities.
The question is always: are the variables really independent or not?
What is the probability that you are happy, and how does it relate to the probability of you being happy because of a raise?
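Independence has a precise meaning: events A and B are independent when P(A and B) = P(A) × P(B), or equivalently P(A | B) = P(A). The sketch below checks this on an invented survey of eight people; note that P(happy | raise) only measures how often the two occur together and says nothing about "because".

```python
from fractions import Fraction

# Invented small-data survey: (got_a_raise, is_happy) for eight people.
survey = [
    (True, True), (True, True), (True, False), (False, True),
    (False, False), (False, False), (False, True), (False, False),
]


def p(event):
    """Relative frequency of an event among the survey answers, as an exact fraction."""
    return Fraction(sum(1 for row in survey if event(row)), len(survey))


p_raise = p(lambda row: row[0])
p_happy = p(lambda row: row[1])
p_both = p(lambda row: row[0] and row[1])
p_happy_given_raise = p_both / p_raise  # P(happy | raise) = P(happy and raise) / P(raise)

print(p_happy, p_happy_given_raise)  # prints 1/2 2/3
print("independent" if p_both == p_raise * p_happy else "dependent")  # prints dependent
```

In practice a statistical test is needed to decide whether such a gap is more than noise, and with eight respondents that test has very little power.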
Correlative 2 / Small Data
Started from the overly deterministic idea of representing a word by a point. The scale is one of total uncertainty, a measure of non-sense: at 100%, the criteria no longer mean anything. The area of the surface is not the only indication; the shape is too. The more ramified it is, the more it shows the complexity in the meaning of the word, by showing the possible configurations of meaning. The total number of rectangles in this shape would give an idea of the complexity of the word, and so it makes even less sense to transcribe a word as a point.
The deterministic idea of classifying a word by a point does not make any sense; it is a measure of non-sense. At some point the criteria do not matter anymore.
I realised that not only the surface matters, but also the form.
The more ramified it is, the more it shows the complexity of a word: all the possible configurations of sense.
We looked at all possible meanings; we are imprecise in defining words.
By counting the number of rectangles: the more rectangles, the more complex the word, and the less possible it is to transcribe it as a dot, as CLIPS does. The more rectangular the surface is at its base, the simpler the word would be.
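These measures can be sketched as follows, assuming each respondent's answer is an axis-aligned rectangle on an objective-subjective (x) by negative-positive (y) plane. The coordinates are invented, and "compactness" (covered area divided by bounding-box area) is a hypothetical stand-in for how ramified the shape is. A point, as in a word-as-dot representation, has zero area.

```python
def union_area(rects):
    """Area covered by the union of axis-aligned rectangles (x0, y0, x1, y1).

    Coordinate compression: every rectangle edge cuts the plane into a grid,
    and each grid cell is either fully covered by a rectangle or not at all.
    """
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0.0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            if any(r[0] <= xa and xb <= r[2] and r[1] <= ya and yb <= r[3] for r in rects):
                area += (xb - xa) * (yb - ya)
    return area


def describe(rects):
    """Hypothetical complexity summary of the area a word covers."""
    area = union_area(rects)
    width = max(r[2] for r in rects) - min(r[0] for r in rects)
    height = max(r[3] for r in rects) - min(r[1] for r in rects)
    box = width * height
    return {
        "rectangles": len(rects),
        "area": area,
        # 1.0 means the shape fills its bounding box; lower means a more ramified shape.
        "compactness": area / box if box else 0.0,
    }


# Invented answers for "correlative": x = objective (0) .. subjective (1),
# y = negative (-1) .. positive (1).
answers = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 0.75, 0.75)]
print(describe(answers))
print(describe([(0.5, 0.5, 0.5, 0.5)]))  # a word as a dot: no area at all
```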
It still makes no sense to me to translate a word by a point, as happens in the Pattern library. Words are not unidirectional. By analysing text they try to reduce blurriness, defining their results by the 40% that is not blurry... Word after word, it is reduced. We manage to communicate even if there is no intersection; there is confusion in relationships all the time -> big data can also reflect this. Is the word the same going from one person to another? Are we measuring the same thing? This looks like the question of the background in last night's debate: "the sample of the sample".
Does the uncertainty decrease? Do we measure the same thing each time? It seems more complex than a measurement in physics, for instance.
Is this the question of the background? What space are we in? What plane?
It's beautiful as an exercise: it raises a certain type of question, the shadiness of meanings, of words... It shows that human beings are adaptable despite/because of having different meanings for words. We can live together despite/thanks to this fuzziness (it would be horrible if we all understood the same thing). Bug reports of conversations?
KAFKA// Mining the trial
../share/the_kafka_trial/
Sentiment analysis in Pattern analyses the adjectives. Analysing all the adjectives of The Trial gives values for positive/negative: the presence of the word "wrong" in a sentence means the sentence will be classified as negative. The Trial is then reordered according to its index of positivity/negativity.
Poem: "Country of the mountains, country of the river", created by navigating through a dictionary (Oulipian S+1).
A text-mining inspector! No irony is permitted, no nuance. The irony symbol is used to signal irony.
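Pattern's own adjective lexicon is not reproduced here. This minimal sketch assumes a tiny invented lexicon, averages the scores of the adjectives found in each sentence and sorts the sentences by that score; it also shows the lack of nuance, since "nothing wrong" still counts as negative. The sample sentences are invented, not quoted from the novel.

```python
import re

# Hypothetical adjective lexicon: adjective -> polarity between -1 and 1.
ADJECTIVES = {"wrong": -0.5, "good": 0.7, "guilty": -0.6, "innocent": 0.4, "strange": -0.1}


def polarity(sentence):
    """Average polarity of the known adjectives in a sentence (0.0 if there are none)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    scores = [ADJECTIVES[w] for w in words if w in ADJECTIVES]
    return sum(scores) / len(scores) if scores else 0.0


def reorder(text):
    """Split a text into sentences and sort them from most positive to most negative."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=polarity, reverse=True)


# Invented sentences: "nothing wrong" still scores as negative, with no nuance.
text = "He was arrested one morning. He had done nothing wrong. The inspector was good to him."
for sentence in reorder(text):
    print(f"{polarity(sentence):+.2f}  {sentence}")
```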
It could be used as a study aid. Text-mining inspector: it inspects the results of the algorithm -> a text-mining inspector (look at where text mining is used, what results it gives and what it is based on).
Using Sphinx [[talk2201 speechRec]]: it proposes combinations by proximity of words. Each sentence produces a series of hypotheses -> all the doubts of the software, normally hidden in the black box, are now displayed.
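Speech recognisers such as Sphinx can return an n-best list of hypotheses rather than only the best one. Without calling Sphinx itself, this sketch assumes such a list is already available as (text, score) pairs (invented here) and shows, word position by word position, every alternative the recogniser considered. Aligning words by position is a simplification; real systems work with lattices or a proper alignment.

```python
from itertools import zip_longest


def doubts(hypotheses):
    """For each word position, list the distinct words proposed, best hypothesis first.

    hypotheses: list of (text, score) pairs; a higher score means more likely.
    Words are aligned by position only, which is a simplification.
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    columns = zip_longest(*(text.split() for text, _ in ranked), fillvalue="")
    table = []
    for column in columns:
        alternatives = []
        for word in column:
            if word and word not in alternatives:
                alternatives.append(word)
        table.append(alternatives)
    return table


# Invented n-best list, as a recogniser might return it for one utterance.
nbest = [
    ("the trail begins today", -102.7),
    ("the trial begins today", -101.2),
    ("a trial begins today", -104.0),
]
for position, words in enumerate(doubts(nbest)):
    print(position, " | ".join(words))
```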
Latent semantic analysis
Confront the models of two texts and see which words connect them: jump between the meanings of words.
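Latent semantic analysis factorises a term-document matrix A with a truncated singular value decomposition, so that words used in similar contexts end up close together. This is a minimal, stdlib-only sketch, assuming each sentence counts as one document and using raw counts (no tf-idf weighting): it finds the top right singular vectors by power iteration on A^T A and places each word at its row of A V_k, which equals its row of U_k S_k. Function names and example sentences are hypothetical; a real project would use a linear-algebra library.

```python
import math
import random
import re


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw counts."""
    vocab = sorted({w for doc in docs for w in tokenize(doc)})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in tokenize(doc):
            matrix[index[w]][j] += 1.0
    return vocab, matrix


def top_right_singular_vectors(matrix, k, iterations=100, seed=0):
    """Top-k eigenvectors of A^T A (the right singular vectors of A) by power iteration."""
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)  # random start avoids unlucky symmetric starting vectors
    vectors = []
    for _ in range(min(k, n)):
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            v = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in vectors:  # Gram-Schmidt: stay orthogonal to vectors already found
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm < 1e-12:  # no singular direction left
                v = [0.0] * n
                break
            v = [a / norm for a in v]
        vectors.append(v)
    return vectors


def lsa_term_vectors(docs, k=2):
    """Map each term to its row of A V_k, which equals its row of U_k S_k."""
    vocab, matrix = term_document_matrix(docs)
    vectors = top_right_singular_vectors(matrix, k)
    return {w: [sum(a * b for a, b in zip(matrix[t], v)) for v in vectors]
            for t, w in enumerate(vocab)}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(word, vectors, candidates):
    """Candidates sorted from most to least similar to `word` in latent space."""
    return sorted(candidates, key=lambda c: cosine(vectors[word], vectors[c]), reverse=True)


# Hypothetical example: each sentence of the two texts counts as one document.
text_a = ["the judge reads the law", "the court reads the law"]
text_b = ["the court hears the accused", "the accused fears the court"]
vectors = lsa_term_vectors(text_a + text_b, k=2)
only_b = sorted(set(tokenize(" ".join(text_b))) - set(tokenize(" ".join(text_a))))
print(nearest("judge", vectors, only_b))  # words of text B that "judge" jumps to
```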
The aim was to look at style, but to keep it easy we looked at content. Sources from various news sites: crawler -> classifier -> webpage showing how news items are structured. Trained with different newspapers; it correlates newspapers and political parties. In the future: use CSS styles as features for the classifier.
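The idea of using CSS styles as features can be sketched with Python's standard-library HTML parser: each tag.class pair becomes a feature, and a page is assigned to the newspaper whose summed profile is most similar. The nearest-centroid rule with cosine similarity is swapped in here for whatever classifier the project used; newspaper names and pages are invented.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects tag.class pairs from an HTML page as a bag of style features."""

    def __init__(self):
        super().__init__()
        self.features = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                for css_class in value.split():
                    self.features[f"{tag}.{css_class}"] += 1


def css_features(html):
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.features


def cosine(a, b):
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(html, profiles):
    """Return the newspaper whose style profile is closest to the page (nearest centroid)."""
    features = css_features(html)
    return max(profiles, key=lambda name: cosine(features, profiles[name]))


# Hypothetical training pages per newspaper; the profile is the sum of their features.
training = {
    "paper_a": ['<div class="headline big"><p class="lead">text</p></div>'],
    "paper_b": ['<h1 class="title"><span class="kicker">text</span></h1>'],
}
profiles = {name: sum((css_features(page) for page in pages), Counter())
            for name, pages in training.items()}
print(classify('<div class="headline"><p class="lead">News</p></div>', profiles))  # prints paper_a
```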