data_packaging

Notes about the data packaging presentation (Ana Isabel Carvalho & Ricardo Lafuente)

http://www.transparenciahackday.org/

CSV includes no metadata: no authorship information, no licenses ... -> so how to keep track of data that comes from different sources?
tiny problems add up to irritation and block the process

datasets are shared in different formats & in different places, and their location changes
-> how to organise the information so that it is available for others to use?
-> much time is spent scraping and cleaning the datasets...

http://ckan.org -> takes time/resources to maintain
data.okfn.org - frictionless data vision (http://data.okfn.org/vision)
-> they propose a standard, 'data package' (http://data.okfn.org/standards)
-> series of command-line tools to generate a dataset easily (authors, license, columns, field names...) (http://data.okfn.org/tools)
close to software packaging
cf. github.com/datasets (Rufus Pollock)
e.g. Data Package Viewer on data.okfn.org
also libraries to import a data package directly in R, Python
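
To make the 'data package' idea concrete: a package is roughly a folder with the CSV plus a datapackage.json descriptor carrying the metadata the CSV lacks (title, license, source, field names). Below is a minimal sketch using only the Python standard library, not the official command-line tools. The function names, the example values and the choice to type every column as "string" are assumptions made for illustration; the descriptor keys follow the Data Package standard as commonly documented, so check the current spec before relying on them.

```python
import csv
import json
from pathlib import Path


def infer_fields(csv_path):
    """Read the CSV header row and describe every column as a string field."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle))
    # Assumption: every column is typed "string"; real tools infer richer types.
    return [{"name": column, "type": "string"} for column in header]


def build_descriptor(csv_path, name, title, license_name, source_title):
    """Assemble a descriptor dict for a package holding one CSV file."""
    csv_path = Path(csv_path)
    return {
        "name": name,
        "title": title,
        "licenses": [{"name": license_name}],
        "sources": [{"title": source_title}],
        "resources": [
            {
                "path": csv_path.name,
                "format": "csv",
                "schema": {"fields": infer_fields(csv_path)},
            }
        ],
    }


def write_package(csv_path, **metadata):
    """Write datapackage.json next to the CSV and return its path."""
    descriptor = build_descriptor(csv_path, **metadata)
    target = Path(csv_path).parent / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return target
```

Usage would look like write_package("budget.csv", name="budget", title="Budget", license_name="ODC-PDDL-1.0", source_title="Example source"), where the file name and all values are hypothetical. The point is that authorship and license travel in the same folder as the data.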

'Data Central': they created a site generator, using a Python library, that builds the site from the metadata of the packages
static HTML website, handy for local server use
README.md is rendered into the website
--> use this for Cqrrelations publication?! Yes! We can help with this!
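
A rough sketch of the static-site idea described above, to show how little is needed: walk a folder of packages, read each datapackage.json, and write one index.html. This is not the Data Central code. The folder layout (one sub-folder per package), the function names and the HTML are assumptions, and the README is shown as preformatted text because real Markdown rendering would need a third-party library.

```python
import html
import json
from pathlib import Path


def load_packages(root):
    """Collect the descriptor and README text of every package folder under root."""
    packages = []
    for descriptor_path in sorted(Path(root).glob("*/datapackage.json")):
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
        readme_path = descriptor_path.parent / "README.md"
        readme = readme_path.read_text(encoding="utf-8") if readme_path.exists() else ""
        packages.append({"descriptor": descriptor, "readme": readme})
    return packages


def render_index(packages):
    """Return one HTML page listing title, license and README of each package."""
    sections = []
    for package in packages:
        descriptor = package["descriptor"]
        title = html.escape(descriptor.get("title", descriptor.get("name", "untitled")))
        licenses = ", ".join(
            html.escape(entry.get("name", "")) for entry in descriptor.get("licenses", [])
        )
        # The README is escaped and shown as-is, not rendered as Markdown.
        readme = html.escape(package["readme"])
        sections.append(
            f"<section><h2>{title}</h2>"
            f"<p>License: {licenses or 'unknown'}</p>"
            f"<pre>{readme}</pre></section>"
        )
    body = "\n".join(sections)
    return f"<!DOCTYPE html>\n<html><body><h1>Datasets</h1>\n{body}\n</body></html>\n"


def build_site(root, output_dir):
    """Write index.html into output_dir and return its path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = out / "index.html"
    index.write_text(render_index(load_packages(root)), encoding="utf-8")
    return index
```

Because the output is plain files, it can be served from any local web server or opened directly in a browser.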

Live PT Central de Dados: http://centraldedados.pt/
Data Central repository: https://github.com/centraldedados/datacentral
THackday repositories on GitHub: https://github.com/centraldedados

okfnlabs.org
more technical discussions

tools:
    data package manager: https://github.com/okfn/dpm
    data package viewer: http://data.okfn.org/tools/view

Other links:
GitHub datasets repository: https://github.com/datasets/ (curated by OKFN)
R library for handling Data Packages: https://github.com/QBRC/RODProt