“The benefits to society will be myriad, as big data becomes part of the solution to pressing global problems like addressing climate change, eradicating disease, and fostering good governance and economic development.” (Mayer-Schoenberger and Cukier, 2013: 17)
“A statistical model of society that ignores issues of class, that takes patterns of influence as givens rather than as historical contingencies, will tend to perpetuate existing social structures and dynamics. It will encourage us to optimize the status quo rather than challenge it.” (Carr, 2014)
As the public discourse around data turns from hubristic claims to existing, empirical results, it’s become nearly as easy to bash ‘big data’ as to hype it (Carr, 2014; Marcus and Davis, 2014; Harford, 2014; Podesta, 2014). Geographers are intimately involved with this recent rise of data. Most digital information now contains some spatial component (Hahmann and Burghardt, 2013) and geographers are contributing tools (Haklay and Weber, 2008), maps (Zook and Poorthuis, 2014), and methods (Tsou et al. 2014) to the rising tide of quantification. Critiques of ‘big data’ thus far offer keen insight and acerbic wit, but remain piecemeal and disconnected. ‘Big data’s’ successes or failures as a tool are judged (K.N.C. 2014), or it is examined from a specific perspective, such as its role in surveillance (Crampton et al. 2014). Recently, voices in critical geography have raised the call for a systemic approach to data criticisms, a critical data studies (Dalton and Thatcher, 2014; Graham, 2014; Kitchin, 2014). This post presents seven key provocations we see as drivers of a comprehensive critique of the new regimes of data, ‘big’ or not. We focus on why a critical approach is needed, what it may offer, and some idea of what it could look like.
- Situating ‘big data’ in time and space
Data has always been big. Such a phrase borders on the trite, but it is important to recognize the epiphenomenal nature of the term ‘big data.’ It is specific to a moment in time whose dominance seems already to be shifting. This does not mean that ‘big data’ is going away. Much as the term “e-commerce” disappeared from our conscious use as online shopping became a normal practice (Leyshon et al. 2005), ‘big data’ is simply receding into the banality of the everyday. This enables and constrains sets of social processes, but does so from an unconsidered position that rises to our attention only when it fails (Harman, 2010). In doing so, ‘big data’ appears inevitable, naturalizing its consequences and foreclosing alternative possibilities. To understand ‘big data’ and whatever comes next, we must resist this urge to let it stand apart from history and pass silently into our everyday lives.
‘Big data’ has big precursors, earlier knowledges that set the stage and helped define the nature and needs that present-day ‘big data’ realizes. The epistemologies of nineteenth-century statistical mapping (Schulten, 2012), social physics and geography’s quantitative revolution (Barnes and Wilson, forthcoming), the development of geodemographic targeted marketing (Goss, 1994), and the boom-bust cycle of the information technology industry all laid the conditions that realized ‘big data.’ Today, as ‘big data’ is enrolled in social processes, it also facilitates power geometries between companies – such as Google, Acxiom, and Foursquare – agencies – such as the NSA – and consumer citizens. We must ask: Whose data? On what terms? To what ends? Attempts to set aside or ignore ‘big data’s’ ancestry and effects serve to hype it, but not to better understand it. Situating ‘big data’ knowledges helps us understand both what is happening and why.
- Technology is never as neutral as it appears
As the pushback against ‘big data’ begins, its excremental qualities (Pearce, 2013) focus around its limitations: the reality of what technology can do versus grandiose claims and hype. In these critiques, ‘big data’ is a tool and its failures are found in its inability to perform its supposed function – to model and predict reality along certain positivist lines (Harford, 2014). By doing so, these critiques fall within the same epistemological frame as ‘big data’ itself.
‘Big data,’ as a technology, is never a neutral tool. It always shapes and is shaped by a contested cultural landscape in both creation and interpretation. Whether in critique or celebration, an instrumental examination of ‘big data’ will necessarily miss its underlying epistemological effects. The myths of ‘big data’ are myths that suffuse modern society, seeping into ideas of the quantified self and smart cities. As the fullness of human experience in the world is reduced to a sequence of bytes, we should not limit our concern to how much better those bytes function vis-à-vis their counterparts. Rather, we must ask what it means to be quantified in such a manner, what possible experiences have been opened and which have been closed off? How is ‘big data’ as a form of technology enabling and constraining our culture and our lives?
Citing Tony Benn, a British Labour party politician, Mark Graham recently suggested we ask of ‘big data’ “What power have you got? Where did you get it from? In whose interests do you exercise it? To whom are you accountable? And how can we get rid of you?” (Graham, 2014). Just as the so-called “science wars” taught us to question the processes by which austere scientific knowledge is produced, we must also question ‘big data.’ Quantified digital information, whether called ‘big data’ or not, is here to stay. As with all successful technologies, it recedes from our attention as it saturates and structures our everyday lives (Feenberg, 1999; Kitchin and Dodge, 2011). We must critically ask who it speaks for and why before it disappears from consideration. To do so, we “follow the [data] scientists” (Latour, 1988).
- ‘Big data’ does not determine social forms: confronting hard technological determinism
Technological change and society have an intricate, recursive relationship. ‘Big data’ and its concept of data have a role in today’s social changes, but that role is more complex than the simple consequences of large, fast, individualized data analytics or attempts to model society. The innovation, production, and popular use of a technology occur within and reflect a social context shot through with power, economies, identities, and biases. Even as technology and buzzwords change rapidly, the wider societal processes that shape technology and give it purpose show only gradual change. The popularization of ‘big data,’ the hype around it, and the backlash against it owe much to the pre-existing needs of ever-growing capital accumulation and crises of legitimacy among public agencies.
A technology does not act alone, out of context, determining the form of society. It plays an ensemble role in social changes as it is utilized for one social purpose or another, facilitating material changes in the structure of society and people’s everyday lives and deaths. As something made by and for people, a new technology is designed to fulfill social imperatives, such as accumulating capital. In practice, technology can be deployed by many different kinds of people, opening new possibilities (Haraway, 1991) and networks (Terranova, 2004).
A technology designed by one group of stakeholders for a particular purpose may be adopted by different stakeholders and used against its original intended function. In some cases, stakeholders may even reject a technology or pass it by in favor of something else. These political projects and resistances enable and constrain the social and material possibilities down the line (Feenberg, 1999; 2002). Some consumer subjects already attempt to resist aspects of ‘big data’ using pseudonyms, private web browsing, ad/script blocking, location spoofing, web proxies or VPN services, and turning off location services on their mobile devices. ‘Big data’s’ incomplete, contested nature marks it as much the product of society as society’s producer.
- Data is never raw
‘Big data’ is the result of a specific technological imaginary that rests on a mythological belief in the value of quantification and faith in its ability to model reality (Boyd and Crawford, 2012). In this imaginary, life can be fully captured, quantified, and modeled as theory takes a backseat to ‘raw’ number crunching. However, in both its production and interpretation, all data – ‘big’ included – is always the result of contingent and contested social practices that afford and obfuscate specific understandings of the world. The data of ‘big data’ can take many forms for many purposes: from the massive streams generated by the Large Hadron Collider to the global corpus of tweets. In each case, the data’s format and content have been shaped and created for a purpose. Each data model structures and encodes information in one way or another according to the visions of the team of data engineers, scientists, and developers that created it. Furthermore, what is captured is determined by the goals of the project and the analytical model created to instantiate those goals. Fields are defined, accuracies of measurements determined, and other technically necessary steps are taken to create the infrastructure of ‘big data.’ What is quantified, stored, and sorted? What is discarded? All datasets are necessarily limited representations of the world that must be imagined as such to produce the meaning they purport to show (Gitelman, 2013).
Social context is fundamental in both the production and interpretation of meaning. A young boy rapidly contracting his eyelid may be winking, attempting to remove a dust mote, or something else entirely (Geertz, 1973). Ever-present cultural regimes of interpretation structure the analysis of all data, ‘big’ or small (Boellstorff, 2013). Three different “likes” on a Facebook status may reflect three disparate emotional responses: from intense agreement to sardonic recognition to sympathetic pity. However, when it is analyzed simply as a “like” (or an eyelid contraction), the thickness of the data and its variety of meanings are lost. In practice, data are not simple evidence of phenomena; they are phenomena in and of themselves (Wilson, 2014). ‘Big data’ is never “raw.” It has always been “baked” through both its construction and its resulting interpretation (Gitelman, 2013). If we are to understand ‘big data,’ and specifically ‘big data’ derived from social media, we must engage directly with the cultural regimes of production and interpretation to restore the thick, rich fullness of description that reveals subjects’ understandings and intent.
- Big isn’t everything
Chris Anderson’s (2008) claim that ‘big data’ meant the “end of theory,” where numbers speak for themselves, has become a shibboleth among the ‘big data’ savvy. Even for data science evangelists like Nate Silver, counterpointing Anderson’s hubristic framing of ‘big data’ serves as a useful way to pivot towards acknowledging the continuing importance of models and theory as “[n]umbers have no way of speaking for themselves” (Silver, quoted in Marcus and Davis, 2013). As the backlash against ‘big data’ increasingly stresses the importance of domain knowledge, the ability to build sound models from theoretical insights continues to carry weight in practice.
Even with models and theory, ‘big data’ analytics cannot answer every research question, and therefore cannot supplant other, more established qualitative and quantitative research methods. Some propose that researchers can understand the “human dynamics” of a landscape by analyzing ‘big data’ sets derived from websites, social media and mobile devices (Tsou et al. 2014). “The new [Human Dynamics in the Mobile Age] research agenda may facilitate the transform[ation] of human geography study from qualitative analysis to computational modeling, simulation, and prediction applications using both quantitative and qualitative methods” (Tsou, 2014). Location-tracking someone’s phone or tweets may give some trace accounting of their affinity for a place or spatial process, leading to valuable contributions to geographic knowledge (Humphreys et al. 2014). Nevertheless, such a ‘big data’ approach can never provide the depth and detail that comes with qualitatively learning about and understanding someone’s standpoint by actually asking them about a place and their personal feelings and motivations, much less experiencing that place and context for yourself with fieldwork. Purely ‘big data’ approaches falter with issues of interpretation and context precisely because data is never raw.
A more common charge levelled against ‘big data’ is that it typically identifies mere correlations in datasets. Further, such large, diverse datasets may be biased. The difference between correlation and causation as well as the care that goes into identifying worthwhile datasets continue to hold validity in an era of ‘big data’ (Harford, 2014). Likewise, perennial questions of credibility and quality control in geographic data are no less an issue for ‘big data’ (Goodchild, 2013). Proponents of ‘big data’ urge us not to rush to judgment as ‘big data’ analytics continue to develop and may include more robust analyses in the future (Hidalgo, 2014).
Like older quantitative methods that often rely on correlation, such as linear regression, ‘big data’ analytics are better suited to quantitative questions of what, where, and when than to questions of how and why. Analysis of Twitter data can map where and when tweets were tweeted and retweeted about a riot following an NCAA basketball championship game, but it cannot answer why individuals chose to tweet or not. In fact, those who did not tweet (or do not ever tweet) remain entirely invisible to the data set. This is neither an unknown (Crampton et al. 2013) nor a paralyzing problem. By comparison, GIS-based quantitative spatial analysis has done profound work with what is a quite limited set of concepts and tools (Pavlovskaya, 2006). More importantly, studies involving GIS also expanded into new and significant areas when they began to include participatory and qualitative approaches (Cope and Elwood, 2009; Craig et al. 2002; Sieber, 2006). Geography is a discipline rich with mixed method approaches, many the result of the joining of empirical and theoretical work made possible when researchers “step[ped] outside of their comfort zones” (Wright et al. 1997).
We believe ‘big data’ research can be similarly improved by working with, rather than denying the importance of, “small data” (Kitchin and Lauriault, 2014; Thatcher and Burns, 2013) and other existing approaches to research. Employing this combined approach requires an awareness among the researchers of the forms of knowledge being produced and their own role in that process. Furthermore, doing critical work with ‘big data’ involves understanding not only data’s formal characteristics, but also the social context of the research amidst shifting technologies and broad social processes. Done right, ‘big’ and small data utilized in concert open new possibilities: topics, methods, concepts, and meanings for what can be understood and done through research.
- Counter-Data
What is to be done with ‘big data’? Data’s role in targeted marketing and the surveillance state is clear, but what other purposes could it serve? The history, discourses, and methods of counter-mapping suggest one opening for critical engagement using ‘big data.’ Maps have long been a geographic knowledge of imperialism and massive capital accumulation, a means to facilitate exploitative material relationships and proposition our consent to those relationships (Crampton, 2010; Wood, 2010). Much like ‘big data,’ if maps are judged by these standards alone, hope for critically-informed use appears dim. However, another aspect of mapping is a beautiful diversity of cartographic knowledges that differ from and even run counter to cartography’s traditional purposes. Harris and Hazen describe how counter-mapping “challenge[s] predominant power effects of mapping” and “engages in mapping that upset[s] power relations” (2005). Counter-mapping works from the bottom-up within a given situation and includes mapping for indigenous rights (Peluso, 1995), autonomous social movements (Holmes, 2003; Dalton and Mason-Deese, 2012) and art maps (Wood, 2010; Mogel and Bhagat, 2007). In such cases, researchers must be self-conscious of their own positionality and the consequences of knowledge production. Recent work on indigenous mapping makes clear the limits of counter-mapping (Wainwright and Bryan, 2009). Nevertheless, eschewing ‘big data’ entirely for its ties to surveillance, capital, and other exploitative power geometries forecloses the possibility of liberatory, revolutionary purposes. We must ask: What counter-data actions are possible? What counter-data actions are already happening?
- What can Geographers do? What is our praxis?
Approaching ‘big data’ critically constitutes an opportunity for geographers. Corporations and government agencies incorporate basic spatial criteria into their ‘big data’ analytics and geographers are already utilizing ‘big data’ in their research, though predominately in the form of data fumes (Thatcher, 2014). By situating ‘big data’ technologies and data in contexts and thereby assessing their contingent, non-determinative role and impacts in society, critical data studies offer a less-hyped but more reasoned conceptualization of ‘big data.’ From this critical standpoint, ‘big data’ and older ‘small data’ approaches may be utilized together for better research. Crucially, the critical standpoint also opens possibilities, new questions and topics previously invisible in ‘big data’ practice. Given this situation, we suggest geography sits at a unique position to help develop a fully critical data studies for three reasons:
First, geographers have decades of experience in analyzing data in terms of space. With the majority of digital information containing a spatial component (Hahmann and Burghardt, 2013), geographic analytical concepts, methods, and models are directly relevant in producing an understanding of that data. Furthermore, geographers have also developed critical approaches to spatial analysis, such as Bunge’s geographical expeditions (Bunge, 2011; Merrifield, 1995), critical GIS, qualitative GIS, and the above-mentioned counter-mapping. Finally, GIS and cartographic design have prepared geographers for the problems of processing and visualizing complex spatial data for diverse audiences.
Second, geographers emphasize not only space, but place. In a world of quantified individualization, understanding the contextual value of place is significant and powerful. Relying solely on ‘big data’ methods can obscure concepts of place and place-making because places are necessarily situated and partial. Understanding the “making and maintenance of place” remains a central task for geographers (Tuan, 1991: 684), as do the power geometries of places and spaces (Massey, 1993). Drawing from the traditions of spatial theorists like Tuan, Massey, and Cresswell, geographers are uniquely suited to heed recent calls for more relational understandings of space and place in ‘big data’ (Crampton et al. 2013).
Third, geography has long been a field that accommodates a broad range of approaches and mixed methods research. For example, studying the connection between natural and social processes is core to the discipline (Yeager and Steiger, 2013). Debates over the nomothetic or idiographic production of knowledge, perhaps most famously found in the Hartshorne-Schaefer debates of the 1950s, have given way to a multitude of methodologies, many informed by both qualitative and quantitative approaches. Critical data studies must build on these hard-learned lessons of theory and praxis. ‘Big data’s’ multidisciplinary nature provides geographers fertile ground upon which to learn from and contribute to other fields like the Digital Humanities and Critical Information Studies (Vaidhyanathan, 2006).
Geography and geographers have much to offer and much to gain from critical data studies, but it is essential to seize the moment before it passes. Much like the advent of Critical Geographic Information Systems, we must engage in the “hard work of theory” (Pickles, 1997). As the term ‘big data’ normalizes itself in discourse, it recedes from conscious consideration. Now, while ‘big data’ is still a contested concept in public and academic debates, we must question and challenge its role in an emerging hegemonic order of societal calculation. In this pursuit, we conclude with five questions for critical data studies, some already partially taken up, but all requiring further study:
- What historical conditions led to the realization of ‘big data’ such as it is? (Barnes and Wilson, forthcoming; Dalton, 2013)
- Who controls ‘big data,’ its production and its analysis? What motives and imperatives drive their work? (Thatcher, 2014)
- Who are the subjects of ‘big data’ and what knowledges are they producing? (Haklay, 2012)
- How is ‘big data’ actually applied in the production of spaces, places and landscapes? (Kitchin and Dodge, 2011)
- What is to be done with ‘big data’ and what other kinds of knowledges could it help produce? (Shah, 2014)