## The semantics are latent; that's what you're trying to analyse.

###### http://upload.wikimedia.org/wikipedia/commons/thumb/4/47/SVM_with_soft_margin.pdf/page1-640px-SVM_with_soft_margin.pdf.jpg

#### documentation pool

http://en.wikipedia.org/wiki/Latent_semantic_analysis // wiki

http://blog.josephwilk.net/projects/latent-semantic-analysis-in-python.html

http://lsa.colorado.edu/cgi-bin/LSA-matrix.html // lsa matrix applied to texts

http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html?showall=1

http://www.puffinwarellc.com/index.php/news-and-articles/articles/30-singular-value-decomposition-tutorial.html

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/ // understanding for dummies

https://groente.puscii.nl/lsa-thesis.pdf

#### function/library

SVD needs SciPy or NumPy (`numpy.linalg.svd` or `scipy.linalg.svd`)
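A minimal check that the NumPy call works as expected (toy matrix, my own example):

```python
import numpy as np

# Tiny 3x2 matrix just to show the call signature.
m = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])

# full_matrices=False gives the "economy" SVD: u is 3x2, vt is 2x2.
u, sigma, vt = np.linalg.svd(m, full_matrices=False)

# sigma is a 1-D array of singular values in descending order;
# multiplying the factors back together recovers the original matrix.
reconstructed = u @ np.diag(sigma) @ vt
assert np.allclose(reconstructed, m)
```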

kwargs, args //

tf-idf Transform // In sophisticated Latent Semantic Analysis systems, the raw matrix counts are usually modified so that rare words are weighted more heavily than common words. For example, a word that occurs in only 5% of the documents should probably be weighted more heavily than a word that occurs in 90% of the documents. The most popular weighting is TF-IDF (Term Frequency – Inverse Document Frequency): the count in each cell is replaced by the term's frequency within the document multiplied by the log of the inverse document frequency (how rare the word is across all documents).
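A minimal sketch of that weighting, using the standard TF-IDF formula (the function name and data layout are my own):

```python
import math

def tfidf(counts):
    """counts: list of documents, each a dict mapping word -> raw count.
    Returns the same structure with TF-IDF weights instead of counts."""
    n_docs = len(counts)
    # Document frequency: in how many documents each word appears.
    df = {}
    for doc in counts:
        for word in doc:
            df[word] = df.get(word, 0) + 1
    weighted = []
    for doc in counts:
        total = sum(doc.values())  # total word occurrences in this document
        weighted.append({
            # term frequency * log(inverse document frequency)
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in doc.items()
        })
    return weighted

docs = [{"dog": 2, "leash": 1}, {"dog": 1, "bank": 3}, {"bank": 2, "river": 1}]
w = tfidf(docs)
# "dog" appears in 2 of 3 documents, so in the first document it ends up
# weighted lower than "leash", which appears in only 1 document.
```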

#### methods

https://technowiki.files.wordpress.com/2011/08/diagram1.png?w=640

- Documents are represented as “bags of words”, where the order of the words in a document is not important, only how many times each word appears in a document.
- Concepts are represented as patterns of words that usually appear together in documents. For example “leash”, “treat”, and “obey” might usually appear in documents about dog training.
- Words are assumed to have only one meaning. This is clearly not the case (banks could be river banks or financial banks) but it makes the problem tractable.
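The bag-of-words representation above can be sketched as a word-by-document count matrix (the example documents are my own):

```python
import numpy as np

docs = [
    "leash treat obey leash",
    "treat obey dog",
    "bank river bank",
]

# Bag of words: word order is ignored, only per-document counts matter.
vocab = sorted({word for doc in docs for word in doc.split()})
matrix = np.array([[doc.split().count(word) for doc in docs]
                   for word in vocab])
# Rows are words, columns are documents; matrix[i, j] is how many
# times vocab[i] occurs in docs[j].
```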

#### Code samples

http://www.puffinwarellc.com/lsa.py

```python
# Rank-k approximation of a word-by-document matrix via truncated SVD.
from numpy import dot, diag
from numpy.linalg import svd

u, sigma, vt = svd(matrix, full_matrices=False)

# sigma is sorted in descending order, so the last k entries are the
# k smallest singular values; zero them out.
for i in range(-k, 0):
    sigma[i] = 0

matrix = dot(u, dot(diag(sigma), vt))
```

[[meh]]

[[words]]

[[dontknow]]

---

**The U matrix gives us the coordinates of each word in our “concept” space, the Vt matrix gives us the coordinates of each document in our “concept” space, and the S matrix of singular values gives us a clue as to how many dimensions or “concepts” we need to include.**
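Concretely, reading those coordinates out of the SVD factors might look like this (toy matrix and word list are my own; `k` is chosen by hand here):

```python
import numpy as np

# Toy word-by-document count matrix: rows = words, columns = documents.
words = ["leash", "treat", "obey", "bank", "river"]
matrix = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0],
])

u, sigma, vt = np.linalg.svd(matrix, full_matrices=False)

k = 2  # keep the two strongest "concepts"
word_coords = u[:, :k]    # each row: one word's position in concept space
doc_coords = vt[:k, :].T  # each row: one document's position in concept space
# sigma[:k] shows how important each kept concept is; a sharp drop
# after sigma[k-1] suggests k dimensions are enough.
```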

Link to the example in the patterns library:

https://github.com/clips/pattern/blob/820cccf33c6ac4a4f1564a273137171cfa6ab7cb/examples/05-vector/03-lsa.py

http://tech.blog.aknin.name/2011/12/11/walking-python-objects-recursively/