meta-data-science-socrata



On GitHub: tlevine/meta-data-science-socrata

Meta Data Science

Thomas Levine

Meta data science

Datasets on Socrata portals feel like files rather than data.

  • Data science about data science
  • Science about metadata

Outline

  • Data science mindset
  • What I did
  • What I learned
  • Things to consider

Data science mindset

Exploit cheap computers to study how the world works.

  • Store everything.
  • Anything can be counted.
  • Numbers can be turned into anything.
  • Boring work should be sent to robots.
  • Get more data rather than tuning your model.

Store everything

  • Storage is cheap.
  • You don't need a full research plan.

Anything can be counted

Numbers can be turned into anything

Boring work should be sent to robots

  • Computers can perform mindless tasks
  • Computers can also make complex decisions
  • All analyses should be scripted.

Get more data rather than tuning your model

  • Modeling problems versus computation/storage problems
  • Confidence versus validity

  • Don't collect new data to answer your new questions.
  • Look for new ways of using existing data sources.
  • Store raw data! Don't aggregate prematurely.
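
A minimal sketch of the "store raw data" point, assuming each download is written to disk byte-for-byte before anything parses or aggregates it. The `save_raw` helper, the directory layout, and the sidecar file are hypothetical illustrations, not the code used for this talk.

```python
import hashlib
import json
import time
from pathlib import Path


def save_raw(root, portal, view_id, body, fetched_at=None):
    """Write one raw response body to disk without parsing or aggregating it."""
    fetched_at = time.time() if fetched_at is None else fetched_at
    directory = Path(root) / portal / view_id
    directory.mkdir(parents=True, exist_ok=True)

    # One file per fetch, named by the fetch time (hypothetical convention).
    path = directory / f"{int(fetched_at)}.raw"
    path.write_bytes(body)

    # A small sidecar records what was fetched and when, so later analyses
    # can be rerun from the stored bytes alone.
    sidecar = {
        "portal": portal,
        "view_id": view_id,
        "fetched_at": fetched_at,
        "sha256": hashlib.sha256(body).hexdigest(),
    }
    path.with_suffix(".json").write_text(json.dumps(sidecar))
    return path
```

Because nothing is summarized at download time, a question that comes up later can be answered from the same files instead of a new crawl.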

What I did

Data science about open data

Store everything

Anything can be counted

Numbers can be turned into anything

AppGen

Boring work should be sent to robots

Site analytics

Scripted analyses

Get more data rather than tuning your model

What I learned

  • Nobody knows much
  • How the Socrata Open Data Portal is constructed
  • How people use the Socrata Open Data Portal

What people know

  • Portal administrators
  • Portal developers
  • Anecdotes

Construction of Socrata Open Data Portal

Data provenance

Every view on Socrata has an "owner" and a "table author". What's an owner, and what's a table author?
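
One way to approach the question empirically is to count how often the two fields name the same user. The sketch below assumes each view's metadata has already been parsed into a dict with `owner` and `tableAuthor` entries that each carry an `id`. Those field names are an assumption about the metadata layout, and the example records are made up.

```python
from collections import Counter


def provenance_summary(views):
    """Count views by whether the owner and the table author are the same user."""
    counts = Counter()
    for view in views:
        # Field names are assumed; a missing field is counted separately.
        owner = (view.get("owner") or {}).get("id")
        author = (view.get("tableAuthor") or {}).get("id")
        if owner is None or author is None:
            counts["missing"] += 1
        elif owner == author:
            counts["same"] += 1
        else:
            counts["different"] += 1
    return counts


# Made-up records, for illustration only.
example_views = [
    {"id": "aaaa-0001", "owner": {"id": "u1"}, "tableAuthor": {"id": "u1"}},
    {"id": "aaaa-0002", "owner": {"id": "u2"}, "tableAuthor": {"id": "u1"}},
    {"id": "aaaa-0003", "owner": {"id": "u3"}},
]
print(provenance_summary(example_views))
```

Looking at the views in the "different" group is a cheap first step toward guessing what distinguishes the two roles.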

API limits

What are Socrata's API limits?

I don't know, but they apply across all portals.
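
Since the limits are unknown but shared across portals, one cautious approach is to send every request through a single throttle instead of keeping one per portal. This is only a sketch of that idea: the `SharedThrottle` class is hypothetical, and any interval passed to it is a guess, not a documented Socrata limit.

```python
import time


class SharedThrottle:
    """Enforce a minimum delay between requests, whichever portal is hit."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; an arbitrary guess
        self.clock = clock
        self.sleep = sleep
        self.last_request = None

    def wait(self):
        """Block until at least min_interval has passed since the last request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.min_interval - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now
```

A crawler would create one `SharedThrottle` and call `wait()` before each request, regardless of which portal the request targets.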

Form validation

What must be true about the form fields?

One web application

With some software, you have many separate installations that might be able to communicate with each other.

  • WordPress
  • CKAN

With other software, a single web application runs everything.

  • Tumblr
  • Socrata

How people use Socrata

Analysis tools exist.

People use them.

But not really.

Benefits of the data portal

(As I see it)

  • Import data from various formats.
  • Standard way of discovering datasets.
  • Convert data to standard formats.
  • Mark datasets as official in some sense.

But not a lot of analysis

Things to consider

Data science

  • Store/expose everything
  • Datasets are data points, and metadata is data
  • You can automate human work, even if it seems complicated.
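
To make "datasets are data points" concrete, the sketch below treats each dataset's metadata record as one row and summarizes a numeric field per portal. The `portal` and `downloadCount` keys are assumptions about how the records were stored, and the numbers are invented.

```python
from statistics import median


def downloads_by_portal(records):
    """Return the median download count of the datasets on each portal."""
    grouped = {}
    for record in records:
        # A record without a download count is treated as zero downloads.
        grouped.setdefault(record["portal"], []).append(record.get("downloadCount", 0))
    return {portal: median(counts) for portal, counts in grouped.items()}


# Invented records, for illustration only.
example_records = [
    {"portal": "data.example-a.gov", "downloadCount": 1},
    {"portal": "data.example-a.gov", "downloadCount": 3},
    {"portal": "data.example-a.gov", "downloadCount": 10},
    {"portal": "data.example-b.gov", "downloadCount": 4},
    {"portal": "data.example-b.gov", "downloadCount": 6},
]
print(downloads_by_portal(example_records))
```

The median is used here because a few very popular datasets would dominate a mean.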

Socrata

  • What if the different portals were more connected?
  • Are the analysis tools important?
