The semantics are latent; that is what you're trying to analyse.
http://upload.wikimedia.org/wikipedia/commons/thumb/4/47/SVM_with_soft_margin.pdf/page1-640px-SVM_with_soft_margin.pdf.jpg
documentation pool
http://en.wikipedia.org/wiki/Latent_semantic_analysis // wiki
http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html
http://lsa.colorado.edu/cgi-bin/LSA-matrix.html // lsa matrix applied to texts
http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html?showall=1
http://www.puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/ // understanding for dummies
https://groente.puscii.nl/lsa-thesis.pdf
function/library
SVD needs SciPy or NumPy
kwargs, args //
tf-idf Transform // In sophisticated Latent Semantic Analysis systems, the raw matrix counts are usually modified so that rare words are weighted more heavily than common words. For example, a word that occurs in only 5% of the documents should probably be weighted more heavily than a word that occurs in 90% of the documents. The most popular weighting is TF-IDF (Term Frequency – Inverse Document Frequency). Under this method, the count in each cell is replaced by a weight that grows with the word's frequency in that document and shrinks with the number of documents that contain the word.
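A minimal sketch of this weighting in Python with NumPy. The variant used here, (count / words in document) * log(number of documents / documents containing the word), is one common form of TF-IDF and is an assumption; these notes do not record which variant the tutorial uses. The function name `tfidf` and the sample counts are made up for illustration.

```python
import numpy as np

def tfidf(counts):
    """Apply TF-IDF weighting to a word-by-document count matrix.

    counts: 2-D array, rows are words, columns are documents.
    Assumed variant: tf  = count / total words in the document
                     idf = log(n_docs / docs containing the word)
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    words_per_doc = counts.sum(axis=0)          # total words in each document
    docs_per_word = (counts > 0).sum(axis=1)    # documents containing each word
    # Guard against division by zero for empty documents or unused words.
    tf = counts / np.where(words_per_doc == 0, 1, words_per_doc)
    idf = np.log(n_docs / np.where(docs_per_word == 0, 1, docs_per_word))
    return tf * idf[:, np.newaxis]

# Hypothetical counts: 3 words (rows) across 4 documents (columns).
counts = [[1, 1, 1, 1],   # occurs in every document
          [2, 0, 0, 0],   # occurs in a single document
          [0, 1, 1, 0]]
weights = tfidf(counts)
```

With this variant a word that occurs in every document gets an idf of log(1) = 0, so its whole row is weighted down to zero, while the word confined to one document keeps the largest weight.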
methods
https://technowiki.files.wordpress.com/2011/08/diagram1.png?w=640
- Documents are represented as “bags of words”, where the order of the words in a document is not important, only how many times each word appears in a document.
- Concepts are represented as patterns of words that usually appear together in documents. For example “leash”, “treat”, and “obey” might usually appear in documents about dog training.
- Words are assumed to have only one meaning. This is clearly not the case (banks could be river banks or financial banks) but it makes the problem tractable.
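The bag-of-words assumption above can be sketched in a few lines of Python. This is only an illustration: the function name `count_matrix`, the whitespace tokenisation and the sample documents are assumptions, not taken from any of the linked tutorials.

```python
from collections import Counter

def count_matrix(documents):
    """Build a word-by-document count matrix from raw strings."""
    # Word order is discarded: each document becomes a bag of word counts.
    bags = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set(word for bag in bags for word in bag))
    # One row per word, one column per document; missing words count as 0.
    matrix = [[bag[word] for bag in bags] for word in vocabulary]
    return vocabulary, matrix

docs = ["leash treat obey", "treat the dog", "the bank the river bank"]
vocabulary, matrix = count_matrix(docs)
```

Note that "bank" gets a single row whatever sense it is used in, which is exactly the one-meaning-per-word simplification from the list.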
Code samples
http://www.puffinwarellc.com/lsa.py
//
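from numpy.linalg import svd
from numpy import dot, diag
# matrix: word-by-document count matrix (2-D array); k: number of dimensions to drop.
u, sigma, vt = svd(matrix, full_matrices=False)
for i in range(-k, 0):
    sigma[i] = 0  # Zero out the k smallest singular values.
matrix = dot(u, dot(diag(sigma), vt))  # Rebuild the rank-reduced matrix.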
[[meh]]
[[words]]
[[dontknow]]
---------
The U matrix gives us the coordinates of each word in our “concept” space, the Vt matrix gives us the coordinates of each document in that same space, and the S matrix of singular values gives us a clue as to how many dimensions or “concepts” we need to keep.
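A small sketch of how the three matrices might be read off with NumPy. The count matrix, the word list and the choice of two concepts are all made up for illustration; in practice the number of concepts would be picked by looking at the singular values.

```python
import numpy as np

# Hypothetical count matrix: rows are words, columns are documents.
words = ["leash", "treat", "obey", "bank", "river"]
matrix = np.array([[1, 1, 0, 0],
                   [1, 2, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 2, 1],
                   [0, 0, 1, 1]], dtype=float)

u, s, vt = np.linalg.svd(matrix, full_matrices=False)

n_concepts = 2  # assumed here; normally chosen by inspecting s
word_coords = u[:, :n_concepts]       # one row of coordinates per word
doc_coords = vt[:n_concepts, :].T     # one row of coordinates per document

# Share of the total squared singular values carried by each dimension,
# a rough guide to how many concepts are worth keeping.
energy = s ** 2 / np.sum(s ** 2)

for word, coords in zip(words, word_coords):
    print(word, np.round(coords, 2))
```

The singular values in `s` come back sorted from largest to smallest, so truncating to the first `n_concepts` columns of U and rows of Vt keeps the strongest dimensions.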
Link to the example in the Pattern library:
https://github.com/clips/pattern/blob/820cccf33c6ac4a4f1564a273137171cfa6ab7cb/examples/05-vector/03-lsa.py
http://tech.blog.aknin.name/2011/12/11/walking-python-objects-recursively/