There is a fair amount of open data from the US government now available on data.gov. At the White House Open Data Innovation Summit held last month, three questions were explored:
- How has open data made the government more efficient and effective?
- How has open data made us live better lives?
- How has open data spurred innovative thinking, job growth and economic opportunity?
Not if, how.
Don’t get me wrong. I believe improving the transparency of the US Government is a good thing. And I think open data plays an important role in that mission. It’s just that it’s not a magic bullet for efficiency and innovation, let alone giving us all better lives.
Open data should not be considered an end in and of itself, but one step that might help increase the likelihood of several beneficial outcomes (if the right people can access it, at the right time, and under the right conditions).
The real question is not about how many apps have been created based upon the data, but about what kind of analysis has been and will be made possible.
I explored data.gov with this spirit in mind, and below is an account of my first experience with the resource. Please forgive my poor data visualization skills!
First off, data.gov is a nice looking site! I appreciated that it had data separated by topic area, as opposed to by department or year or something. Being the particular brand of dork that I am, I clicked on “Science & Research.” My first impression was that I loved the disclaimer right off the bat that they might not have everything you want, with a direct link to a contact form to request a particular type of data. I wonder how long it takes to get a response, but it’s a nice feature.
The highlighted dataset at the top of the page took me to a list of NASA’s labs and facilities. I have visited NASA and I know they’re working on amazing projects, but this particular spreadsheet did not improve my understanding in any way. Without doing an enormous amount of further research, all I was able to glean from these data is that NASA works with multiple agencies on many different projects. I also feel concerned about the privacy of the individual people who have their names, phone numbers, and addresses listed on the sheet…
I kept exploring and found research.gov, which took me to a page on the Adoption of Genetically Engineered Crops in the U.S. I downloaded the data there and played around with it for a while.
Do you know the percentage of soybeans planted in the US that are genetically engineered? Or how it varies by state? I didn’t either, but now I do! Here is a graph depicting the change over the past 16 years among 14 states that grow soybeans, and in the US as a whole. In 2000, 54% of all soybeans planted in the US had been genetically engineered. By 2016, that number had risen to 94%!
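If you want to play along at home, here is a rough sketch of the kind of summary I did, using Python's standard library. The column names and the inline sample rows are my own assumptions about the file's shape, not the actual headers from the download; the two national figures are the ones quoted above.

```python
# Hypothetical sketch of summarizing the downloaded GE-crop CSV.
# The headers (year, state, ge_percent) are assumed, not the real
# file's column names.
import csv
import io

# Illustrative rows only: the two national soybean figures from the text.
SAMPLE = """year,state,ge_percent
2000,U.S.,54
2016,U.S.,94
"""

def national_series(csv_text):
    """Return {year: percent} for the U.S.-wide rows."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {int(r["year"]): float(r["ge_percent"])
            for r in rows if r["state"] == "U.S."}

series = national_series(SAMPLE)
print(series[2016] - series[2000])  # rise in percentage points: 40.0
```

Swapping `SAMPLE` for the full downloaded file (and filtering on each state instead of "U.S.") is all it would take to rebuild the per-state lines in the graph.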
It is actually pretty cool to have this information, and it makes me curious to keep exploring data.gov to see what else I can find.
At this point, however, the datasets I have seen seem sparser and more random than I was expecting. If I had a particular research question in mind, I doubt I would be able to find what I wanted, and even if I did, I doubt the data points would be the particular ones I was hoping for.
More importantly, and what I found most interesting about this experience, is that these data tell a single story. Actually, not even a story. More like a single torn-out page. I don’t know who the actors are. I don’t know anything about the context. I know that nearly all soybeans in the US have been genetically engineered, but I don’t know why, how, or what that means. In the excitement around open data, we can’t forget these limitations. Numbers are satisfying, but if they are presented as a full story, they can hide more than they reveal. Meaningful analysis requires more than just data.