Everything New is Old Again: On the Curiously Retrograde World of Big Data
by Mike Callaghan, London School of Hygiene and Tropical Medicine
At first glance, the current obsession with big data – and especially its uses in social science – seems very much of the moment: an exciting new technology to be applied to any number of problems. And yet, a closer look suggests that much of the thinking that underpins it is far from the cutting edge – it is in fact decidedly, almost quaintly, old-fashioned.
It is customary to begin discussions about big data with some variation on the statistic that about 2.3 trillion gigabytes of data are produced each day. This fact alone would be meaningless were it not for concomitant increases in computing power that allow us to grapple with this data: this is a potent combination, leading to suggestions that we’re on the cusp of a Kuhnian paradigm shift towards an age of ‘computational social science’ (Chang et al. 2014).
For the moment, though, our grappling with big data mostly involves observation: trying to gather an ever-wider, ever-deeper list of variables to correlate for sales, marketing, or political campaigns. But as with any new technology, tantalizing new horizons beckon.
To date, the best-known example of genuinely experimental big data social science research is a study conducted in 2014 with Facebook on the problem of ‘emotional contagion’ (Kramer et al. 2014). The now-infamous experiment sought to manipulate the mood of users by altering the tone of the content posted to their walls. Researchers discovered that, to a small but significant degree, we are indeed more likely to post sad updates when we read sad updates. The results of the study have been overshadowed by the fact that Facebook conducted this experiment on 689,003 unwitting users, and published the findings in the Proceedings of the National Academy of Sciences, without having sought consent from the users or clearance from any regulatory body.
Rather remarkable ethical issues aside, this example reveals something very important about the field of play for big data in social science as it currently stands: it is a relationship filled to bursting with potential, but built upon a steam-age view of the world. The common thread linking most such projects is an abiding belief in a Newtonian social universe, a clockwork just behind the curtain that will finally reveal itself to us if only we gather enough data.
This may not be surprising: many of the Silicon Valley giants who champion big data are in a very real sense neo-Fordist monopolists, capitalist technocrats for whom the ‘complexities’ of society and the ‘messiness’ of culture are simply problems waiting to be solved (and then, presumably, monetized).
The issue with this unreconstructed pre-modernist Weltanschauung isn’t the gathering of this or that data. It’s the entire project: the idea that the problem with the old data was that it wasn’t ‘big’ enough, and that only sufficiently ‘big’ data will allow us to see behind the curtain. Indeed, the issue is the very belief in the land behind the curtain: the idea that we can visit the starry dynamo and find the gears and cogs of human life turning according to knowable rules that will finally resolve into a grand algorithm once we have, at long last, enough data.
Geertz was not the first to deflate this line of thought, but he was (as usual) the most eloquent, famously arguing: “I take [the analysis of culture] to be therefore not an experimental science in search of law but an interpretative one in search of meaning. It is explication I am after, construing social expressions on their surface enigmatical.”
On some level we may read this as a dilemma about validity versus reliability; about positivism versus interpretivism. If big data draws correlations a mile wide but an inch deep, what does it really tell us about living?
But more deeply, this tension is about the nature of the anthropological project itself. Anthropology is in some ways the original big-data project, with the goal of understanding all humanity, past and present, by building an ever-expanding history of the description and interpretation of human experience. And though gathering a great storehouse of facts is a good thing, the research that truly edifies is about synthesis, translation, and exegesis.
Geertz, again: “It is not against a body of uninterpreted data, radically thinned descriptions, that we must measure the cogency of our explications, but against the power of the scientific imagination to bring us into touch with the lives of strangers.”
Measured this way, ‘big data’ is of course nothing of the sort: far from ‘big,’ it is rather so skinny as to nearly disappear when viewed side-on. Stripped of context, these thin streams of ‘big data’ may indeed have uses, but insofar as they do nothing to bring us into touch with the lives of strangers, the data is not just unhelpful but antithetical to the ethnographic project.
And so anthropology, at least in its ethnographic mode, may be better understood as ‘small data’: a painstaking attention to the tightly observed, finely grained detail of quotidian mundanity that is the stuff of all human experience, fundamentally irreducible to a set of quantitative variables.
It is early days for the relationship between big data and anthropology, and it would be folly to dismiss it. McLuhan said all media work us over completely, and big data may soon do the same: vast data sets and staggering processing power may soon give us not just new information but new ways of thinking. But at this point the technology seems to have outrun the imagination needed to use it, and this remarkable tool is put to mundane ends.
For now, anthropologists may continue to find that the best data is the smallest.
Mike Callaghan is a Social Scientist at the London School of Hygiene and Tropical Medicine, UK. Learn more at www.lshtm.academia.edu/MikeCallaghan