Notes about the data packaging presentation
(Ana Isabel Carvalho & Ricardo Lafuente)
http://www.transparenciahackday.org/
CSV includes no metadata (no authorship information, no licenses...) -> so how do you keep track of data that comes from different sources?
tiny problems add up to irritation and block the process
datasets are shared in different formats and in different places, or change location
-> how to organise the information so it is available for others to use
-> much time is spent scraping and cleaning the datasets...
http://ckan.org -> requires time/resources to maintain
data.okfn.org - frictionless data vision (http://data.okfn.org/vision)
-> they propose a standard, the 'data package' (http://data.okfn.org/standards)
-> a series of command-line tools to easily generate data packages (authors, license, columns, field names...) (http://data.okfn.org/tools)
similar in spirit to software packaging
cf. github.com/datasets (Rufus Pollock)
e.g. the Data Package Viewer on data.okfn.org
there are also libraries to import data packages directly into R and Python
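To show what "importing a data package" amounts to, here is a hand-rolled sketch using only the Python standard library. It does not use the R or Python libraries mentioned above; it assumes the descriptor has a resources list whose entries carry a path and a schema.fields list, and it only converts three simple types (anything else stays a string).

```python
import csv
import json
from pathlib import Path

# Assumed mapping from schema type names to Python converters (subset only).
CONVERTERS = {"integer": int, "number": float, "string": str}


def cast_row(row, fields):
    """Convert one CSV row (dict of strings) using the schema's field types."""
    typed = {}
    for field in fields:
        raw = row.get(field["name"])
        if raw is None or raw.strip() == "":
            typed[field["name"]] = None  # missing or empty cell
            continue
        convert = CONVERTERS.get(field.get("type", "string"), str)
        typed[field["name"]] = convert(raw)
    return typed


def read_resource(package_dir, resource_name=None):
    """Load one CSV resource of a data package as a list of typed dicts."""
    package_dir = Path(package_dir)
    descriptor = json.loads((package_dir / "datapackage.json").read_text(encoding="utf-8"))
    resources = descriptor["resources"]
    if resource_name is None:
        resource = resources[0]
    else:
        resource = next((r for r in resources if r.get("name") == resource_name), None)
        if resource is None:
            raise KeyError(resource_name)
    fields = resource.get("schema", {}).get("fields", [])
    with open(package_dir / resource["path"], newline="", encoding="utf-8") as f:
        return [cast_row(row, fields) for row in csv.DictReader(f)]
```

The schema is what makes this useful: without it every value would come back as text, which is exactly the cleaning work the talk complained about.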
'Data Central': a site generator they created, built on the Python library and the packages' metadata
produces a static HTML website, handy for use on a local server
Readme.md is rendered into the website
--> use this for Cqrrelations publication?! Yes! We can help with this!
Live PT Central de Dados: http://centraldedados.pt/
Data Central repository: https://github.com/centraldedados/datacentral
THackday repositories on GitHub: https://github.com/centraldedados
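The Data Central idea (one static page generated from the metadata of many packages) can be sketched in a few lines. This is not the Data Central code: it assumes a hypothetical layout of one folder per package under a root directory, each holding a datapackage.json and optionally a README.md, and it shows the README as preformatted text because real Markdown rendering would need an extra library.

```python
import html
import json
from pathlib import Path


def render_entry(descriptor, readme_text, base):
    """Render one package as an HTML <section>; `base` is its folder name."""
    title = html.escape(descriptor.get("title") or descriptor.get("name", "untitled"))
    licenses = ", ".join(
        html.escape(lic.get("name") or lic.get("title", ""))
        for lic in descriptor.get("licenses", [])
    )
    items = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path")
        if path:
            href = html.escape(f"{base}/{path}")  # link relative to the site root
            items.append(f'<li><a href="{href}">{html.escape(path)}</a></li>')
    return (
        f"<section><h2>{title}</h2>"
        f"<p>License: {licenses or 'unspecified'}</p>"
        f"<ul>{''.join(items)}</ul>"
        f"<pre>{html.escape(readme_text)}</pre></section>"
    )


def build_site(root, output="index.html"):
    """Write a single static index page covering every package under `root`."""
    root = Path(root)
    sections = []
    for descriptor_path in sorted(root.glob("*/datapackage.json")):
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
        readme = descriptor_path.parent / "README.md"  # assumed file name
        readme_text = readme.read_text(encoding="utf-8") if readme.exists() else ""
        sections.append(render_entry(descriptor, readme_text, descriptor_path.parent.name))
    page = (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        "<title>Data packages</title></head><body>"
        + "".join(sections)
        + "</body></html>"
    )
    out = root / output
    out.write_text(page, encoding="utf-8")
    return out
```

Because the output is plain HTML with relative links, it can be opened from a local folder or served by any simple local web server, which matches the "handy for local server use" point above.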
okfnlabs.org
more technical discussions
tools:
data package manager: https://github.com/okfn/dpm
data package viewer: http://data.okfn.org/tools/view
Other links:
GitHub datasets repository: https://github.com/datasets/ (curated by okfn)
R library for handling Data Packages: https://github.com/QBRC/RODProt