Last week, three of us from PolicyMap went to the annual APDU Conference in Washington, DC. APDU is the Association of Public Data Users, so the APDU Conference is basically our Super Bowl. The conference is full of data providers (lots of Census people), users, and groups like us who are somewhere in the middle. Our own Elizabeth Nash moderated a panel on using public data alongside private sector data.
For all of you who weren’t able to attend, here are a few of the trends we saw discussed at this year’s conference:
Big Data is Here
This trend has been brewing for a while, but this year, it seems like we’ve reached the point where users of public data are looking at Big Data as a legitimate resource. What’s the difference between public data and Big Data? We’ve covered this before, but in short, public data is the sort of data you see on PolicyMap: government-collected datasets like those from the Census and BLS, which are aggregated over a geographic area (like a county). Big Data tends to come from private sources, like Google or Facebook, and often goes down to the individual level (though the public can’t access this data). It’s how Google Flu came along, where Google tracked the prevalence of the flu through user searches. (Unfortunately, Google Flu turned out to be generally inaccurate, which is a common symptom of Big Data.)
But as Big Data becomes more prevalent and available, researchers are finding that it’s an invaluable resource, because unlike public government data, it’s available in real time (how many people are Tweeting about looking for a job right now?). Georgetown University provost Robert M. Groves suggested in his keynote that “datasets” as we know them will soon be a thing of the past. Instead of a file of scientifically collected data, we’ll be turning to Twitter, Google, Facebook, and the like. But, as Google Flu taught us, we’re still finding our way.
Public Data and Private Data Can Work Together
(Since Elizabeth Nash led a panel discussion on this topic, she’s uniquely qualified to write this section.)
One of the highlights of the APDU Conference this year was the emphasis on seamlessly incorporating various data sources in research projects, online tools, and advocacy work. I had the opportunity to moderate a panel of my own design on how publicly available data is being used alongside private sector information.
Elizabeth Nash, Keith Wardrip, David Norris, and Nima Nattagh, talking data
Our three speakers came from very different backgrounds and offered fascinating perspectives on their experiences with using proprietary, purchased data to complement free, public data. Keith Wardrip, Community Development Research Manager at the Federal Reserve Bank of Philadelphia, walked us through his research on employer demand for workers and opportunity occupations using online job posting data that the Fed purchased from Burning Glass Technologies, Occupational Employment Statistics from the BLS, the Current Population Survey from the Census and BLS, and more. Nima Nattagh, Manager of Analytics at Verisk Analytics, explained to us how insurance companies have taken publicly available data and made the data proprietary. And David Norris, of the Kirwan Institute at the Ohio State University, discussed their Opportunity Mapping framework, and how both publicly available data and proprietary data estimates contribute to their indices. He also provided a provocative and thoughtful cautionary tale about mapping proprietary data without checking it against publicly available data.
A meaningful Q&A followed, which culminated in a discussion of the benefits and dangers of limiting access to, and charging fees for, proprietary and value-added public data.
Congress Is Threatening Cuts
One thing about public government data is that it comes from, well, the government. And as you may have heard, there’s been a lot of disagreement in Washington on the role of the federal government and its various agencies. Proposed budgets are slashing funding for agencies like Census, BLS, and others. BLS has already cut some of its data products, like Mass Layoff Statistics, and Census discontinued its release of 3-year ACS estimates (PolicyMap uses the 5-year estimates). It’s considered unlikely that these proposed budgets will make it through both houses of Congress, but the possibility exists.
Also being proposed is an effort to end mandatory responses to the ACS. Right now, if you get an ACS form in the mail, you are required by law to fill out the 28-page survey, which some consider onerous and invasive. However, without this requirement, the quality of the data decreases substantially, and it becomes more expensive to collect. Canada recently made its long-form census voluntary, with disastrous results. Again, this effort is considered unlikely to be signed into law by the president, but it’s being discussed.