#TODO(ricardo) - Scraping europotato into a CSV


# Being developed at [[EuropotatoScraping]]

## Turning descriptors into numbers


# The European Cultivated Potato Database is almost entirely non-numerical. Here is an example of a well-populated potato entry, for an obscure German potato variety.
potato = { "title": "Orienta", "name": "orienta", "higher_taxon": "Solanaceae", "genus": "Solanum L.", "pedigree": "E 77 330 x E 76 56", "breeder": "Nordkartoffel-Zuchtgesellschaft", "breeders_rights": "", "synonyms": "", "national_list": "1991", "country_of_origin": "GERMANY", "data_source": "CZEHBROD / DEU416", "flower_colour": "White", "flower_frequency": "Occasional to frequent", "foliage_cover": "Moderate to good", "maturity": "Intermediate", "primary_tuber_flesh_colour": "Yellow", "tuber_eye_depth": "Shallow", "tuber_shape": "Long to oval", "tuber_skin_colour": "White to yellow", "tuber_skin_texture": "Smooth to intermediate", "dormancy_period": "Medium to long", "early_harvest_yield_potential": "High", "growth_cracking": "Low", "hollow_heart_tendency": "Low", "internal_rust_spot": "Infrequent to medium", "resistance_to_external_damage": "Resistant", "secondary_growth": "Medium", "tuber_shape_uniformity": "Uniform", "tuber_size": "Large", "tubers_per_plant": "Medium", "after_cooking_blackening": "Trace to little", "cooking_type_/_411_cooked_texture": "Fairly firm (multi-purpose type)", "enzymic_browning": "Little", "starch_content": "Low to medium", "taste": "Moderate to good", "field_immunity_to_wart_races": "Race 1", "resistance_to_late_blight_on_foliage": "Medium", "resistance_to_late_blight_on_tubers": "Medium to high", "resistance_to_stem_canker_(rhizoctonia_solani)": "High", "wart_(synchytrium_endobioticum)": "Field immune", "resistance_to_blackleg_(erwinia_spp.)": "High", "resistance_to_common_scab_(streptomyces_scabies)": "Medium", "resistance_to_potato_leaf_roll_virus": "Low", "resistance_to_potato_virus_a": "Very high", "resistance_to_potato_virus_y_(strain_not_specified)": "High to very high", "resistance_to_globodera_rostochiensis_race_1": "High", "number_of_sources": 2, "url": "http://www.europotato.org/display_description.php?variety_name=Orienta" }, # Here is an exmple of a slightly less populated entry: potato2 = { "title": "Hudson", "name": "hudson", "higher_taxon": "Solanaceae", "genus": "Solanum L.", "pedigree": "NIF-1 x 56 N 18-4", "breeder": "", "breeders_rights": "", "synonyms": "", "national_list": "1972", "country_of_origin": "UNITED STATES", "data_source": "DEU001 / NEIKER / VRI RUSSIA", "maturity": "Intermediate / Early to intermediate", "primary_tuber_flesh_colour": "White / Yellow", "tuber_eye_depth": "Medium", "tuber_shape": "Oval to round / Very long", "tuber_skin_colour": "White to yellow", "yield_potential": "Very high", "crisp_suitability": "Moderate to good", "dry_matter_content": "Low", "starch_content": "Very low to low / Low", "resistance_to_dry_rot_(fusarium_spp.)": "High", "resistance_to_late_blight_on_foliage": "Very low to low", "resistance_to_late_blight_on_tubers": "Very low to low", "susceptibility_to_wart_races": "Race 3 / Race 2 / Race 3 / Race 2", "wart_(synchytrium_endobioticum)": "Susceptible", "resistance_to_globodera_rostochiensis_race_1": "Very high", "number_of_sources": 3, "url": "http://www.europotato.org/display_description.php?variety_name=Hudson" }

# Note that some fields are orderable: "Low" must come before "High", "Infrequent" before "Frequent". A forward slash separates contradictory data from multiple sources, and the word "to" specifies a range. The goal, then, is to convert orderable descriptions into numbers that preserve the intuitive ordering.
# I wrote a function to extract and normalize the values from a field, returning them as a list of individual terms:
def get_terms(colval):
    ret = []
    for values in colval.split("/"):
        values = values.replace("-", " to ")
        for term in values.split(" to"):
            ret.append(term.strip().lower())
    return ret
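# As a quick illustration (my example, not taken from the scraped data), a field combining a slash and a range flattens out like this:
get_terms("Very low to low / Low")   # -> ['very low', 'low', 'low']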
# In order to convert these text descriptions into numbers, I created an index of all of the different values that occurred in each column.
# I used Python's CSV parser to process the data that Ricardo assembled.
import csv
r = csv.reader(open(CSV_PATH))
# Find the column names
cols = r.next()
# Initialize a mapping of column name to set of values
colterms = dict([(X, set()) for X in cols])
# For each line of the CSV, add the terms to a set
for line in r:
    for colname, colval in zip(cols, line):
        for term in get_terms(colval):
            colterms[colname].add(term)
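# For instance (illustrative, not copied from the real output), the index for one column might end up as:
#   colterms["taste"] -> set(["", "poor", "moderate", "good", "very good"])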
# As a quick pruning, I limited my interest to columns with fewer than 15 distinct terms.
MAX_N_TERMS = 15
comparable_columns = [colname for (colname, cterms) in colterms.items()
                      if len(cterms) < MAX_N_TERMS]
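# To eyeball what survives the pruning, a quick report helps (a helper I am adding here; it is not part of the original notes):
for colname in sorted(comparable_columns):
    print "%2d terms: %s" % (len(colterms[colname]), colname)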
# With this data (as well as a reverse mapping from term to the columns in which that term appears), I manually created a `term2number.json' file, excerpted below, that mapped terms to correctly-ordered numeric values:
term2number = {
    "": None,
    "very good": 5,
    "very late": 5,
    "spreading": 2,
    "long": 5,
    "oval": 3,
    "very high": 6,
    "yes": 2,
    "very spreading": 1,
    "little": 3,
    "slow": 1,
    "very long": 6,
    "susceptible": 2,
    "fast": 7,
    "very deep": 6,
    "late": 4,
    "few": 2,
    "intermediate": 3,
    "low": 2,
    "semi erect": 3,
    "very short": 1,
    "very early": 1,
    [...]
}
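# Before converting, it is worth checking that the mapping is complete (a sanity check I would add here; it is not in the original notes):
for colname in comparable_columns:
    for term in colterms[colname]:
        assert term in term2number, "unmapped term %r in column %s" % (term, colname)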
# With this mapping, I was able to produce a numerical version of the CSV.
r = csv.reader(open(CSV_PATH))
cols = r.next()
# Blank numerical array
arr = []
# Loop through the CSV
for line in r:
    row = []
    # For each row, loop through column names and values
    for colname, colval in zip(cols, line):
        if colname in comparable_columns:
            # Transform each term into a number and accumulate the values in the `vals' list
            vals = []
            for term in get_terms(colval):
                vals.append(term2number[term])
            if None in vals:
                row.append(None)
            else:
                # Take the average of all reported values
                row.append(sum(vals) / float(len(vals)))
    arr.append(row)
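# To make the averaging concrete (my worked example; "low" is 2 in the excerpt above, and I am assuming "medium" maps to 3):
#   get_terms("Low to medium") -> ['low', 'medium']
#   (term2number["low"] + term2number["medium"]) / 2.0 -> (2 + 3) / 2.0 = 2.5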
# The new, numerical data looks like this in CSV form. Note the fractional results where multiple values were averaged, and the large amount of blank space:
Sanna,,,5.0,4.0,3.0,3.0,2.5,5.0,2.0,6.0,4.0,2.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Karolin,1.0,,3.5,3.3333333333333335,3.0,2.0,2.3333333333333335,2.5,4.0,4.0,,4.0,,3.5,4.5,3.0,1.5,2.0,3.5,4.0,2.0,4.5,4.0,2.5,4.0,1.5,4.0,3.5,4.5,3.0,3.0,4.0,4.0,5.0,2.0,5.0,5.0,2.6666666666666665,5.666666666666667,5.0,5.0,5.666666666666667,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Arran Signet,1.0,,,,2.0,,4.0,,,,,,,,,,,,,,,,,,,,,,,,1.5,,1.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Amuretc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
CIP 5 374080 5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
B 71,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
FL 1954,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3436 GL,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Orienta,,,4.0,3.5,3.0,2.0,4.0,2.5,2.0,,,3.0,,,4.5,2.0,2.0,3.0,4.0,,4.0,5.0,4.0,2.5,,3.0,,3.5,,4.0,,4.5,,5.0,2.0,,5.0,4.0,2.0,6.0,,5.5,5.0,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Stamm 927 76,,,,,,,,,,5.0,2.0,2.0,,,,,,,,,,,,,,,,,,4.0,,4.0,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

## Finding correlations from numbers


# Now that we have a numerical transformation of the europotato database, we can run our correlations.
# We'll use numpy to perform the computations.
import numpy as np
# import scipy.stats

# Initialize an empty NxN matrix to store the correlation result for each pair of columns, where N is the number of columns
correlations = np.zeros((len(comparable_columns), len(comparable_columns)))
# ...and another empty matrix to keep track of how many samples were compared for each correlation result
nsamples = np.zeros((len(comparable_columns), len(comparable_columns)), dtype=int)
# Loop through every pair of comparable columns
for idx1, colname1 in enumerate(comparable_columns):
    for idx2, colname2 in enumerate(comparable_columns):
        # Since there's so much blank data, we compute an intersection of rows that have values in column one *and* two
        intersection = np.array([(row[idx1], row[idx2]) for row in arr
                                 if (row[idx1] is not None) and (row[idx2] is not None)])
        # Store the length of this intersection in the `nsamples' matrix
        nsamples[idx1, idx2] = len(intersection)
        # Compute the correlation: we can use numpy's correlation coefficients, or scipy's Kendall tau coefficient
        if len(intersection) > 0:
            correlations[idx1, idx2] = np.corrcoef(intersection[:,0], intersection[:,1])[0,1]
            # correlations[idx1, idx2] = scipy.stats.stats.kendalltau(intersection[:,0], intersection[:,1])[0]
# Now we have a matrix, `correlations', relating each column to every other, with values from -1 to 1, where -1 is a perfect inverse correlation, 0 indicates no correlation, and 1 a perfect positive correlation.
# Randomly sampling the results, the first strong value I found was a negative correlation between "adaptability" and "taste": -0.917662935482
# It turned out that there were only three potatoes with values for both "adaptability" and "taste," which hardly qualifies as statistically significant, but I very much liked the suggestion that the two should be mutually incompatible. Truth in numbers!
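# A follow-up sketch (my addition, not part of the original analysis): rank the strongest off-diagonal correlations together with their sample counts, so that coefficients computed over a handful of potatoes can be discounted.
MIN_SAMPLES = 10    # arbitrary cut-off, my choice
pairs = []
for idx1 in range(len(comparable_columns)):
    for idx2 in range(idx1 + 1, len(comparable_columns)):
        if nsamples[idx1, idx2] >= MIN_SAMPLES:
            pairs.append((abs(correlations[idx1, idx2]),
                          comparable_columns[idx1],
                          comparable_columns[idx2],
                          nsamples[idx1, idx2]))
pairs.sort(reverse=True)
for strength, col1, col2, n in pairs[:10]:
    print "%.3f  %s ~ %s  (n=%d)" % (strength, col1, col2, n)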

## Visualizing the matrix of correlations

# It's hard to make any sense of this 85x85 matrix without a way to look at the 7225 values.
# I would have plotted it in IPython, but my IPython pylab graphics backend was misconfigured and I was too lazy to debug it, so I dumped the data to JSON and wrote a simple HTML page to look at the results.
import json
json.dump({
    "columns": comparable_columns,
    "correlations": correlations.tolist(),
    "nsamples": nsamples.tolist(),
}, open(CORR_OUT, 'w'), indent=4)
# In HTML, my approach was to load the dumped JSON and make absolutely-positioned <div> elements for every correlation value:
// Fetch the dumped JSON (the filename is assumed here; synchronous is fine for a one-off hack)
var req = new XMLHttpRequest();
req.open("GET", "correlations.json", false);
req.send();
var correlations = JSON.parse(req.responseText);

correlations.correlations.forEach(function(row, r_idx) {
    row.forEach(function(val, c_idx) {
        var $cell = document.createElement("div");
        // Position and size the cell based on column and row index
        // (position, width and height set explicitly so the cells render as a grid)
        $cell.style.position = "absolute";
        $cell.style.width = "10px";
        $cell.style.height = "10px";
        $cell.style.left = (c_idx * 10) + "px";
        $cell.style.top = (r_idx * 10) + "px";
        // Set the CSS background color to red for positive correlations,
        // and blue for negative correlations
        if (val > 0) {
            $cell.style.backgroundColor = "rgb(" + Math.floor(val * 255) + ",0,0)";
        } else if (val < 0) {
            $cell.style.backgroundColor = "rgb(0,0," + Math.floor(-val * 255) + ")";
        }
        document.body.appendChild($cell);
    });
});
# With labels, the result looks like this. The diagonal line is every column's perfect positive correlation with itself: http://i.imgur.com/HxmvLya.png?1

## Categorising the columns (JM)


# Naturally, the large number of columns presented in the correlation matrix presents some ambiguities
# - you can get an idea of positive, weak, absent and negative correlations, but the arbitrary order of the columns results in less immediate understanding - WE SHOULD SORT THE COLUMNS!!!
# A fan of ontologies and taxonomies, I use the following approach for categorising things (feel free to use my framework, it's good for everything, though designed for public affairs and monitoring):
#    Project (to focus on later when stabbing at user profiles)
#    Issue
#    Event
#    Group
#    Individual
#    Organisation

# The categories in the database focus on #Issues, because the database concentrates on qualities which contribute to the economic success of the potato.
# I split these issues into two main themes: Resistance and Content.
# This is because the categories highlighted a split between (I) the inputs and outputs which determined the health of the crop, and (II) the outcomes of the sales-ready product.
# The subcategories became theme/subtheme pairs such as 'resistance environment' and 'content yield'; the full list is in the CSV linked below.
# For this approach there was a need for bundling across an x/y axis. Therefore things which may be interdependent (perhaps a 'host' category exists because of poor 'environmental' conditions) are ignored. Also, this breakdown was done in expediency, with only cursory knowledge of the topics (after all, this is only to shift columns for legibility).
# This is naturally responsive to the existing categorisation; if personal insight were greater, perhaps it could have been less passive. If reordering the data provided more insight, the approach could naturally be refined.
# The file with the columns and my folksonomy can be found here: http://cqrrelations.lan/share/datasets/europotato/potato_categories_by_JM.csv

# They can be added to the data via a join.
# Either a 'two stage' column approach could be taken with the data columns, or the columns could be renamed to include a prefix, say 'resistance environment' or 'content yield', before sorting the columns at the correct stage.
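# A minimal sketch of that join (my addition; it assumes the categories CSV has two columns, the original column name followed by its category, which I have not verified):
catreader = csv.reader(open("potato_categories_by_JM.csv"))
col2cat = dict((colname, category) for (colname, category) in catreader)
# Sort the comparable columns by category first, then by name, so related
# columns sit next to each other in the correlation matrix
sorted_columns = sorted(comparable_columns,
                        key=lambda colname: (col2cat.get(colname, ""), colname))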

#TODO(???) - "... profit"