big data and doing big things

Spencer is right: this Wired piece about DARPA’s Nexus 7 initiative is very good. Nexus 7 is an ambitious data processing effort meant to synthesize both traditional signals (e.g. vehicle tracking data) and unorthodox signals (market fruit prices seem to be their favorite example) into useful intelligence through sophisticated analytic techniques taken from the social sciences.

And it’s a pretty good reminder of why I’m wary of the Big Data movement.  These were my two favorite bits:

On the surface, there wasn’t much to it: just a graph of violence in the Jalalabad region, and a plot of those fruit prices. When the level of violence was stable — reliably low, or reliably high — so were those prices. Fruit sellers knew what to expect. But when there were sudden swings in the number of attacks, the prices shot up.

Therefore, the Nexus 7 team said, you could use the fruit as an indirect indicator of instability.

The reaction was less than rapturous.

“Right from the start, I’m like: Oh. My. God,” one of the people who attended a Nexus 7 presentation tells Danger Room. “A high school kid could do that.”

Afterward, Dugan presented the pilot as a triumph — a “big breakthrough” that impressed a bevy of four-star generals.

Privately, she was underwhelmed. Dugan was looking for projects that could save troops’ lives, and maybe even bend the direction of the war. By that standard, fruit-price swings seemed pretty inconsequential.

But the presenters maintained an aura of confidence. Oh, this is just a test. Give us more data sources, they said, and we’ll make better connections. We’ve got the hardware: a cloud computing platform that would soak up all kinds of classified and open source intelligence data. We’ve got the software: these social science PhDs and counterinsurgency veterans, who can figure out how to apply that data to rebuild Afghanistan.

and:

“One assumed there was some secret mound of data to be exploited. But it’s just not true.”

I’ve fallen prey to this temptation myself: thinking that mastery of awesome tools means you’re about to do some awesome stuff (perhaps via some cleverly counterintuitive Freakonomic insight). Unfortunately, it’s not that easy. You actually need to have a great idea before great things will happen, and it’s difficult to come up with great ideas unless you both know and care, deeply, about the topic you’re planning to examine.

It’s important to acknowledge that the story of Nexus 7 seems to be told, to some extent, from the perspective of people in the military establishment who feel insulted or threatened by the project.  But that in itself is telling: it’s never a good idea to enter a field of inquiry with the assumption that those who preceded you were well-meaning simpletons — particularly when your reasons for thinking so boil down to a difference in the complexity of your tools.

I think this same story is about to unfold in the tech industry, albeit with a more cheerful tone.  Consider this recent post from Read Write Web about the explosion in job listings mentioning the phrase “data scientist”:

“Right now, everybody with data knows that there’s value in there, that they should be doing something,” says Edd Dumbill, program chair for Strata, O’Reilly’s new conference on Data. “Trouble is, nobody’s entirely clear on the next steps, but they do know that a data scientist can help frame questions and transform data into useful insight.”

They don’t “know” this. They’re assuming it. And this leaves me worried, because the ability to draw meaning from mountains of information is almost always going to depend more on the specific question being examined than on the tools being used or the investigator’s level of enthusiasm for the idea of quantitative analysis.

It’s not that I don’t believe in the techniques and tools that have these folks so excited. It’s not even that I think nothing will come of data-rich firms applying quantitative analytic techniques. These things have got me excited, too! At my own job, I’m trying to make sure we take advantage of the same kinds of tools. Still, there’s no substitute for good ideas.

To me, this wave of hype doesn’t seem much different from the one that occurred at the start of the last decade. “Look at the power of web servers and online payment processing!” we exclaimed. “Can you imagine the benefits they’ll yield when applied to the problem of selling pet food?”

Those things are powerful. But that’s beside the point.