What is ‘big data’, and what does it mean in the context of the ‘refugee crisis’? The rhetoric of volume constantly seeps into contemporary discourse about both migration and digital data alike. While the data deluge comes in streams, flows, and pools, leaking here and there, the influx of migrants seems to be continuously pouring in and flooding in waves and swells. This can make for an unhelpful, de-humanizing lexicon in discourse about migration, just as it can spur hype about emerging data practices. The volume of digital data has often been theorized as one of its characteristic ‘3 Vs’. Big data also comes in a great variety of forms on a plethora of subjects, seamlessly gathered worldwide by digital devices and software, and it is generated with constant, rattling velocity. But how can big data analytics help map the trends in ongoing migration, describe the sentiments in the countries involved, or predict areas of conflict?
Data gaps
Statistics have long been crucial to planning and resourcing aid, development, and policy strategies. Sample surveys, administrative sources (e.g. population registers), and national population censuses are currently the vital data sets for understanding migration patterns and contexts, but they each present methodological challenges and partial, temporal, contingent findings. What’s more, organisations find that many developing nations cannot resource the collection of statistics, and data that is collected is not usually or easily shared between the necessary agencies.
Development organisations have also raised concerns about the gaps in and lack of useful data. For example, there is currently no accurate data on the numbers of young refugees living in urban settings, which are estimated to be large. Refugee youth groups often live anonymously and are particularly vulnerable to exploitation. A recent UNHCR conference examined the possibilities of using data from young refugees’ handheld devices such as smartphones and tablets in order to try and understand the changing dynamics and movement of these groups. This and other humanitarian projects have begun to tap into the potential for collecting, analysing, synthesising, cleaning, shaping, and sharing big data. That is, large sets of automatically collected information that be traced from digital devices and software.
Using ‘big data’ in the migration aid context
There are many ways of doing ‘big data’, and, within the context of migration aid, the United Nations is the key actor. Their website, UNITE, initiates collaboration among the global data science community. It is the UNHCR’s ProGres registration database that records, verifies, and updates information on displaced peoples, increasingly via biometric identification systems. UNHCR’s ‘Winter Operations Cell’ is a futuristic initiative that uses weather data to make predictive analyses about who and how many are likely to move, or need help, and where they may go. One inter-organisational report suggests that ‘big data can be used to identify patterns and signatures associated with conflict — and those associated with peace — presenting huge opportunities for better-informed efforts to prevent violence and conflict’. Indeed, the International Organisation for Migration has developed the Displacement Tracking Matrix (DTM), a digital system for tracking and sharing information on displaced communities and related emergency situations. For example, migration patterns can be inferred via locatable mobile phone call records, IP addresses, or geotagged social media activity.
The dangers of using ‘big data’
While these data practices might facilitate the better-coordinated mapping of conflict or adversity and the delivery of aid to people in dire circumstances, we must remember that they also enable strategies of surveillance, control, and population management. It is crucial that these digital projects and processes respect civil liberties and security and privacy rights. With expanding governmental powers to access data and ongoing debates on supposed assurances (e.g. from the NSA, GCHQ) that only ‘metadata’ is collected or ‘trends’ detected, it must be ensured that refugee data is secure and protected. Furthermore, such data practices and related statistical modelling techniques involve the quantification, classification, and construction of individuals and populations, and categories are never impartial or objective but embedded in socio-political contexts. It is the researcher’s ethical role to keep these crucial points in mind when deciding what data to use, how to get it, treat it, store it, and share it.
The dangers in managing, using, and manipulating data are real and concrete. Refugees are vulnerable to discrimination, violence, and persecution if their personal data falls into the wrong hands. The malevolent online practices of state and non-state actors can reach refugees extraterritorially, or back in their country of origin. The Harvard Humanitarian Initiative also show that, in host countries, ‘data leaks may increase the risk that refugees will be targets for discrimination and harassment’ due to ‘racist, ethnic, and economic tensions that position them as an unwelcome monetary and administrative burden on the host state.’ In the UK, the manipulation of data and statistics has played a major role in bolstering anti-immigration narratives and xenophobic political agendas.
Data protection policies and practices are emerging in this field (e.g. the UNHCR data protection policy), but it is an ongoing challenge to ensure that standards are maintained among all organisations and actors involved in handling migration data worldwide. ‘Big data’ findings are necessarily patchy as not everyone uses smartphones or the Internet. My next article will explore the interplay between qualitative and quantitative methodologies in gathering data about and for refugees. I will point to the crucial role of anthropological, on-the-ground research in providing human experiences as well as news and information about refugee contexts. We need big, digital, numerical data and local, ‘thick’, descriptive data, but why and how?
Header photo: kris krüg (CC BY-NC-ND 2.0).