pattern elements
There are several models for training the classifiers:
- NB: Naive Bayes, based on the probability that a feature occurs in a class.
- KNN: k-nearest neighbor, based on the k most similar documents in the training set.
- SLP: single-layer averaged perceptron, based on an artificial neural network.
- SVM: support vector machine, based on a representation of the documents in a high-dimensional space separated by hyperplanes (see the link below).
--> http://www.clips.ua.ac.be/pages/pattern-vector#classification
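To make the KNN idea concrete, here is a minimal pure-Python sketch of a k-nearest-neighbor text classifier. It is not Pattern's own KNN class: the names ToyKNN, bag_of_words and cosine are hypothetical, documents are assumed to be plain bag-of-words counts, and similarity is cosine similarity followed by a majority vote among the k nearest training documents.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude document vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyKNN:
    """Hypothetical k-nearest-neighbor classifier over bag-of-words vectors."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # (vector, label) pairs

    def train(self, text, label):
        self.examples.append((bag_of_words(text), label))

    def classify(self, text):
        vector = bag_of_words(text)
        # Keep the k training documents most similar to the input.
        nearest = sorted(self.examples, key=lambda ex: cosine(vector, ex[0]), reverse=True)[: self.k]
        # Majority vote over their labels (ties go to the label that appears first in the ranking).
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = ToyKNN(k=1)
knn.train("great fun movie", "positive")
knn.train("boring awful movie", "negative")
print(knn.classify("awful boring plot"))  # negative
```

This sketch only shows the similarity-and-vote logic; practical classifiers usually weight terms (e.g., tf-idf) instead of using raw counts.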
parser (options)
- tags=True --> each word is annotated with a part-of-speech tag.
- chunks=True --> each word is annotated with a chunk tag and a PNP tag (prepositional noun phrase, PP + NP). The O tag (= outside) means that the word is not part of a chunk.
- relations=True --> each word is annotated with a role tag (e.g., -SBJ for subject or -OBJ for object).
- lemmata=True --> each word is annotated with its base form.
- tokenize=False --> punctuation marks are not separated from words, so the input string is expected to be tokenized beforehand; otherwise sentence delimiters are not detected.
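Because tokenize=False leaves tokenization to the caller, the input has to be prepared first. The sketch below is a hypothetical pretokenize helper; it assumes the parser then expects tokens separated by single spaces and one sentence per line, which should be checked against the Pattern documentation.

```python
import re

# A token is a run of word characters (allowing inner apostrophes/hyphens)
# or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def pretokenize(text):
    """Return space-separated tokens, one sentence per line (assumed input format)."""
    sentences, current = [], []
    for token in TOKEN.findall(text):
        current.append(token)
        if token in SENTENCE_END:  # close the sentence after ., ! or ?
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final punctuation mark
        sentences.append(" ".join(current))
    return "\n".join(sentences)


print(pretokenize("Hello, world. How are you?"))
```

The result could then be passed as parse(pretokenize(text), tokenize=False). The regular expression is naive: abbreviations such as "Dr." and decimals such as "3.5" are split incorrectly.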
parser tags
Example, for the word fork in "... with a fork": its part-of-speech tag is NN, which means that it is a noun. The word occurs in an NP chunk, a noun phrase (i.e., a fork). It is also part of a prepositional noun phrase (i.e., with a fork).
Common part-of-speech tags are NN (noun), VB (verb), JJ (adjective), RB (adverb) and IN (preposition).
Common chunk tags are NP (noun phrase) and VP (verb phrase).
Common chunk relations are NP-SBJ (subject) and NP-OBJ (object).
--> http://www.clips.ua.ac.be/pages/pattern-en#parser
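The tags are easier to read once the output is split into fields. The sketch below assumes the default tags=True, chunks=True output, where each token is printed as word/POS/chunk/PNP and tokens are separated by spaces; any extra fields (relations, lemmata) are assumed to come after the PNP tag and are ignored. The example string is typed by hand in that layout for the sentence "I eat pizza with a fork.", and the helper names are hypothetical.

```python
# Hand-typed example in the assumed word/POS/chunk/PNP layout.
TAGGED = ("I/PRP/B-NP/O eat/VBP/B-VP/O pizza/NN/B-NP/O "
          "with/IN/B-PP/B-PNP a/DT/B-NP/I-PNP fork/NN/I-NP/I-PNP ././O/O")


def read_tagged(tagged):
    """Split each space-separated token into its slash-separated fields."""
    return [tuple(token.split("/")) for token in tagged.split()]


def group_chunks(tokens):
    """Group tokens into (chunk type, words) using the B- (begin) / I- (inside) prefixes."""
    result, inside = [], False
    for token in tokens:
        word, chunk = token[0], token[2]
        if chunk == "O":  # outside any chunk
            inside = False
            continue
        prefix, kind = chunk.split("-", 1)
        if prefix == "I" and inside and result[-1][0] == kind:
            result[-1][1].append(word)  # continue the current chunk
        else:
            result.append((kind, [word]))  # start a new chunk
        inside = True
    return result


tokens = read_tagged(TAGGED)
print(tokens[5])             # ('fork', 'NN', 'I-NP', 'I-PNP')
print(group_chunks(tokens))  # [('NP', ['I']), ('VP', ['eat']), ('NP', ['pizza']), ('PP', ['with']), ('NP', ['a', 'fork'])]
print([t[0] for t in tokens if t[3] != "O"])  # ['with', 'a', 'fork'], the prepositional noun phrase
```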
en-sentiment.xml
location: /python2.7/dist-packages/Pattern-2.6-py2.7.egg/pattern/text/en/en-sentiment.xml
<?xml version="1.0" encoding="utf-8"?>
<sentiment language="en" version="1.3" author="" license="PDDL">
    <word
        form=""
        cornetto_synset_id=""
        wordnet_id=""
        pos=""
        sense=""
        polarity=""
        subjectivity=""
        intensity=""
        confidence=""
        reliability=""></word>
    <word></word>
    <word></word>
    ...
</sentiment>
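Reading this file only needs the standard library. The sketch below is a hypothetical load_lexicon helper built on xml.etree.ElementTree. Because each word element describes one sense of a form, it averages polarity and subjectivity over all senses of the same (form, pos) pair; this is a simplifying assumption, not a description of how Pattern itself combines senses. The sample entries and their values are made up for illustration and are not taken from en-sentiment.xml.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up two-sense entry for illustration; the values are NOT from en-sentiment.xml.
SAMPLE = """<sentiment language="en" version="1.3" license="PDDL">
  <word form="ridiculous" pos="JJ" sense="pitiful" polarity="-0.4" subjectivity="0.8" />
  <word form="ridiculous" pos="JJ" sense="humorous" polarity="0.2" subjectivity="0.8" />
</sentiment>"""


def load_lexicon(xml_text):
    """Map (form, pos) to (mean polarity, mean subjectivity) over all senses."""
    root = ET.fromstring(xml_text)
    senses = defaultdict(list)
    for word in root.iter("word"):
        # Skip placeholder entries that carry no scores.
        if not word.get("polarity") or not word.get("subjectivity"):
            continue
        key = (word.get("form"), word.get("pos"))
        senses[key].append((float(word.get("polarity")), float(word.get("subjectivity"))))
    lexicon = {}
    for key, values in senses.items():
        n = len(values)
        lexicon[key] = (
            sum(p for p, _ in values) / n,
            sum(s for _, s in values) / n,
        )
    return lexicon


print(load_lexicon(SAMPLE))  # {('ridiculous', 'JJ'): (-0.1, 0.8)}
```

Averaging hides the sense distinction (a negative and a positive sense partly cancel out), so a tool that knows the intended sense should look up the matching entry instead.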
<!--
SUBJECTIVITY LEXICON FOR ENGLISH ADJECTIVES.
Adjectives have a polarity (negative/positive, -1.0 to +1.0) and a subjectivity (objective/subjective, +0.0 to +1.0).
The reliability specifies if an adjective was hand-tagged (1.0) or inferred (0.7).
Words are tagged per sense, e.g., ridiculous (pitiful) = negative, ridiculous (humorous) = positive.
The Cornetto id (lexical unit id) and Cornetto synset id refer to the Cornetto lexical database for Dutch.
The WordNet id refers to the WordNet3 lexical database for English.
The part-of-speech tags (pos) use the Penn Treebank II tag set: NN = noun, JJ = adjective, ...
For English movie reviews (Pang & Lee polarity dataset v2.0), the accuracy is 75% (P 0.76, R 0.75, F1 0.75).
-->
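The reported scores can be sanity-checked. Assuming F1 is the harmonic mean of the reported precision and recall, 2 × 0.76 × 0.75 / (0.76 + 0.75) ≈ 0.755, which rounds to the reported 0.75. The sketch below shows the standard formulas; the counts in the last line are invented and unrelated to the movie-review experiment. If the reported P and R are averaged over both classes, the harmonic mean of the averages is only an approximation of the averaged F1.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.76, 0.75), 3))      # 0.755
print(precision_recall(tp=3, fp=1, fn=1))  # (0.75, 0.75) for made-up counts
```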