Algorithmic Security Vision: Diagrams of Computer Vision Politics

Ruben van de Ven, Ildikó Zonga Plájás, Cyan Bae, Francesco Ragazzi

December 2023

Abstract

More images than ever are being processed by machine learning algorithms for security purposes. Yet what technical and political transformations do these sociotechnical developments create? This paper charts the development of a novel set of practices which we term "algorithmic security vision" using a method of diagramming-interviews. Based on descriptions by activists, computer scientists and security professionals, this article marks three shifts in security politics: the emergence of synthetic data; the increased importance of movement, creating a cinematic vision; and the centrality of error in the design and functioning of these systems. The article then examines two tensions resulting from these shifts: a fragmentation of accountability through the use of institutionalized benchmarks, and a displacement of responsibility through the reconfiguration of the human-in-the-loop. The study of algorithmic security vision thus engenders a rhizome of interrelated configurations. As a diagram of research, algorithmic security vision invites security studies to go beyond a singular understanding of algorithmic politics, and think instead in terms of trajectories and pathways through situated algorithmic practices.

Introduction

In cities and at borders around the world, algorithms process streams of images produced by surveillance cameras. For decades, computer vision has been used to analyze security imagery using arithmetic to, for example, send an alert when movement is detected in the frame, or when a perimeter is breached. The increases in computing power and advances in (deep) machine learning have reshaped the capabilities of such security devices. These devices no longer simply quantify vast amounts of image sensor data but qualify it to produce interpretations in previously inconceivable ways. Pilot projects and off-the-shelf products are intended to distinguish individuals in a crowd, extract information from hours of video footage, gauge emotional states, identify potential weapons, discern normal from anomalous behavior, and predict intentions that may pose a security threat. Security practices are substantially reconfigured through the use of machine learning-based computer vision, or "algorithmic security vision."

Algorithmic security vision represents a convergence of security practices and what Rebecca Uliasz calls algorithmic vision: the processing of images using machine learning techniques to produce a kind of “vision” that does not make sense of the “visual” but that makes realities actionable (Uliasz, 2020). It does not promise to eradicate human sense making, but rather allows a reconsideration of how human and nonhuman perception is interwoven with sociotechnical routines. Algorithmic security vision thus draws together actors, institutions, technologies, infrastructures, legislations, and sociotechnical imaginaries (see Bucher, 2018: 3). Yet how does algorithmic security vision work — how does it draw together these entities — and what are the social and political implications of its use? In this article we explain how “algorithmic vision” and “security” can map out sociotechnical practices and explore how their coming together reframes what it means to see and suspect. We are not concerned with the technical features of the systems, but with the societal and political projects that are embedded in technical choices made in their construction.

We ground this article in Lucy Suchman’s notions of figuration and configuration as both a conceptual frame and a method of analysis which offer insight into the interplay of technology, imaginaries, and politics. Configuration allows access to key assumptions about the boundaries that are negotiated in the practices of algorithmic security vision, and how entities solidify and stabilize as they circulate. The realities that are made possible by algorithms are performed and perpetuated in the design and description of these systems (Suchman, 2006: 239; see also Barad, 2007: 91).

To grasp the specificities of algorithmic security vision, we turn to the professionals who work with those technologies. How do people working with algorithmic security vision make sense of, or figure, their practices? An important dimension of this paper is therefore methodological. Suchman, drawing on Haraway, mobilizes the trope of the figure to examine the construction and circulation of concepts: "to figure is to assign shape, designate what is to be made noticeable and consequential, to be taken as identifying” (Suchman, 2012: 49). To expand on traditional textual analysis of such figurations we introduce time-based diagramming where we combine qualitative interviews with drawing. With these diagrams that record both voice and the temporal unfolding of the drawing, the figurations appear in spatial and temporal dimensions.

We begin by situating our research in the debates on sensors, algorithms and power, and outlining our theoretical and methodological approach. Then, drawing on the time-based diagrams, we discuss three figurations that challenge us to rethink our understandings of algorithmic vision in security: algorithmic vision as synthetically trained and cinematic, and the error as an inherent feature of algorithmic vision. In a second step, we outline the fragmentation of accountability through the use of benchmarks, and the reconfigurations of the human-in-the-loop.

Sensors, Algorithms, Power

Critical reflection on the politics of algorithmic security systems is not new to Geography or to the interdisciplinary debates it shares with Science and Technology Studies, Critical Security Studies and Media Studies (Fourcade and Gordon, 2020; Graham, 1998; Mahony, 2021; Schurr et al., 2023). Yet aside from a few exceptions (Andersen, 2018; Bellanova et al., 2021), the politics specific to computer vision in the security field have been overlooked.

Computer vision is a term used to designate a multiplicity of algorithms that can process still or moving images, producing information upon which human or automated systems can make decisions. It is meant to replicate certain aspects of human cognition. Algorithms can be used to segment parts of an image, detect and recognize objects or faces, track people or objects, estimate motion in a video, or reconstruct 3D models based on multiple photo or video perspectives (Dawson-Howe, 2014).
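For readers unfamiliar with how rudimentary much of this processing can be, the sketch below illustrates the kind of rule-based routine mentioned in the introduction: a motion-triggered alert based on background subtraction. It is a minimal sketch assuming an OpenCV pipeline; the video source and the alert threshold are illustrative placeholders rather than details drawn from our cases.

```python
# Minimal sketch of a pre-deep-learning computer vision routine: a motion-triggered
# alert based on background subtraction. The video source and threshold are
# illustrative placeholders, not details drawn from the cases discussed here.
import cv2

ALERT_THRESHOLD = 5000  # number of changed pixels that counts as "movement"

capture = cv2.VideoCapture("surveillance_feed.mp4")  # hypothetical camera feed
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = capture.read()
    if not ok:
        break
    # Pixels that deviate from the learned background model are marked as foreground.
    foreground = subtractor.apply(frame)
    if cv2.countNonZero(foreground) > ALERT_THRESHOLD:
        print("ALERT: movement detected in frame")

capture.release()
```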

Some scholars working on algorithmic security have addressed the role of "operative images," which are "images that do not represent an object, but rather are part of an operation" (Farocki, 2004). Authors have shown how algorithms organize the regimes of visibility on platforms such as YouTube and Facebook (Andersen, 2015), and in war, especially in military drone strikes (Bousquet, 2018; Suchman, 2020; Wilcox, 2017). Others have focused on machine-mediated vision at the European border by analyzing the functioning of EUROSUR (Dijstelbloem et al., 2017; Tazzioli, 2018) and SIVE (Fisher, 2018).

These studies have contributed to theoretical debates around novel practices of algorithmic power (Bucher, 2018), surveillance capitalism (Srnicek and De Sutter, 2017; Zuboff, 2019) and platform politics (Carraro, 2021; Gillespie, 2018). Some works have described the social and political effects of surveillance and social sorting (Gandy, 2021; Lyon, 2003), as well as the reinforcement of control and the marginalization of post-colonial, gendered and racialized communities (Fraser, 2019; Thatcher et al., 2016), what Graham terms "software-sorted geographies" (Graham, 2005).

These debates have highlighted the entanglement of these technologies with risk assessment and pre-emptive security logics (Amoore, 2014; Aradau and Blanke, 2018). Critical work has started catching up with machine learning as an algorithmic technique (Amoore, 2021; Mackenzie, 2017), marking a shift from the management of "populations" to "clusters" and an acceleration of knowledge feedback loops (Isin and Ruppert, 2020), and foregrounding the normalization of behavior through the regulation of the "normal" and the "anomaly" (Aradau and Blanke, 2018). Or, in Pasquinelli’s words, how algorithms "normalize the abnormal in a mathematical way" (Pasquinelli, 2015: 8, emphasis in original).

Yet what characterizes the state of the literature is a segmentation between work on the politics of the “sensor” and work on the political specificities of deep learning models.

On the one hand, using the notion of “sensor society,” Mark Andrejevic and Mark Burdon (2015) have noted the prevalence of embedded and distributed sensors. They observe a shift from targeted, purposeful, and discrete forms of information collection to always-on, ubiquitous, opportunistic, and ever-expanding forms of data capture. Andrejevic and Burdon insist that the sensors are only part of the story; infrastructures are also critical: “It is […] the potential of the automated processing of sensor-derived data that underwrites the productive promise of data analytics in the sensor society: that the machines can keep up with the huge volumes of information captured by a distributed array of sensing devices” (Andrejevic and Burdon, 2015: 27). Yet their focus is more on the sensors than on the underlying algorithmic infrastructures.

In their work on “sensory power,” Engin Isin and Evelyn Ruppert have analyzed the effects of recent developments in software and infrastructure. Unlike the three traditional forms of power identified by Foucault (sovereign, disciplinary, and regulatory), they argue, sensory power operates through apps, devices, and platforms to collect and analyze data about individuals' bodies, behaviors, and environments. For Isin and Ruppert, the central notion of sensory power is the cluster. Clusters do not merely constitute "new" representations of "old" populations, but rather “intermediary objects of government between bodies and populations that a new form of power enacts and governs through sensory assemblages” (Isin and Ruppert, 2020: 7). Despite their contribution to thinking about sorting techniques and their relation to new forms of power, Isin and Ruppert, like Andrejevic and Burdon, bracket the specificities of the underlying deep learning models.

A growing body of literature has explored the politics of machine learning techniques. In her latest work on the “deep border,” Louise Amoore revisits her 2006 essay on the “biometric border.” Her focus is on “deep machine learning” and its “capacity to abstract and to represent the relationships in high-dimensional data,” such as in image recognition (Amoore, 2021: 6). She shows that the change in border technologies, from simple IF-THEN algorithmics with pre-determined variables to complex, deep “neural networks” characterized by the indeterminacy of variables, marked a profound change in the logic, and thus the political effects, of these technologies. Like Isin and Ruppert, she is interested in the notion of the “cluster,” which, “with its attendant logic of iterative partitioning and rebordering, loosens the state’s application of categories and criteria in borders and immigration” (Amoore, 2021: 6). Yet her approach overlooks the importance of sensorial data posited by Andrejevic and Burdon, and by Isin and Ruppert.

In sum, we still have only a rudimentary understanding of the politics of algorithmic security vision. So, how does one think politically about the new relations among sensors, algorithmic vision, and politics? We propose a methodology for exploratory research that can help outline a research agenda.

Methodology

Configuration as a methodological device

Recent scholarship on technology and security has emphasized the importance of algorithmic systems as enacted through relations between human and nonhuman actors (Aradau and Blanke, 2015; Bellanova et al., 2021; Hoijtink and Leese, 2019; Suchman, 2006). Sociotechnical systems act in "co-production" (Jasanoff, 2004), as "actants" in a network (Latour, 2005), or in "intra-action" (Barad, 2007). In these understandings, technology forms an ontological assemblage, in which human agency is tied in with the sociomaterial arrangements of which it is part. Humans and non-humans, technological objects and infrastructures, all populate complex, sometimes messy networks where the boundaries between entities are enacted in situated practices (Haraway, 1988). This conception of technology "draws attention to the fact that these relations are not a given but that they are constructed — and thereby relates them back to cultural imaginaries of what technology should look like and how it should be positioned vis-à-vis humans and society" (Leese, 2019: 45).

In this context, how can we understand the characteristics and effects of security systems built on the analysis of sensor data through “deep learning,” and the new security politics that they introduce? On the technical level, the novelty of “algorithmic security vision” does not lie in the sensors themselves, but in the new abilities of “artificial intelligence software” (McCosker and Wilken, 2020). The promise of these systems is that the multiplication of sensors and modalities of knowing, and the ability to place information feeds under the scrutiny of automated systems, mean that data collection and data analysis are no longer separated: surveillance can happen in real time, capturing life as it unfolds, so that operators can act on hotspots, clusters, or the moods and emotions of a crowd (Andrejevic and Burdon, 2015; Isin and Ruppert, 2020).

To make sense of such developments, Suchman’s concept of configuration is a useful methodological “toolkit.” It helps in “delineating the composition and bounds of an object of analysis” (Suchman, 2012: 48) and allows us to conceptualize algorithmic security vision as heterogeneous assemblages of human and nonhuman elements whose agency is "an effect of practices that are multiply distributed and contingently enacted" (Suchman, 2006: 267). We are interested here in a framework that underscores "how the entities that come into relation are not given in advance, but rather emerge through the encounter with one another" (van de Ven and Plájás, 2022: 52).

Suchman also draws our attention to the “ways in which technologies materialize cultural imaginaries, just as imaginaries narrate the significance of technical artefacts” (2012: 48). For Suchman, “configuration” is a tool for “studying technologies with particular attention to the imaginaries and materialities that they join together” (2012: 48). The configuration of humans and machines is constructed through discourse and practice, which, drawing on Haraway, she conceptualizes as “figurations.” Sociotechnical systems thus do not exist without their intended uses and users. Such discourses are an important part of individual experience, collective professional practices, and narratives about technology. Technologies bring together elements from various registers into stable material-semiotic arrangements. Those configurations draw attention to the political effects of everyday practices and how they institute bounded entities and their relations. If we take Suchman’s suggestion that algorithmic security vision is complex and multiple, how can we get to "know" it as an object of research, while acknowledging its partiality? When taking the coming together of algorithms, vision and (in)security as configuring imaginaries and practices in heterogeneous and complex networks, how can we explore their politics?

Time-based Diagramming

Suchman defines figuration as “action that holds the material and the semiotic together in ways that become naturalized over time, and in turn requires ‘unpacking’ to recover its constituent elements” (2012: 49). The first step in her methodology therefore requires us to “reanimate the figure at the heart of a given configuration, in order to recover the practices through which it comes into being and sustains its effects.”

In her work, Suchman has used a variety of methods of inquiry to “reanimate the figure.” Qualitative interviews and ethnography have been instrumental in producing the raw material for the analysis. In this paper, we expand the methodological toolkit envisaged by Suchman to multimodal methods that go beyond text to capture the materiality of imaginaries and practices. We explore the epistemic possibilities of capturing figurations as both semiotic and material traces.

The result of our theoretical and methodological quest is a tool for “time-based diagramming.” We use this method for both elicitation and multimodal data collection. We presented our participants with a large digital tablet, and asked them to draw a diagram while answering our questions. Ruben van de Ven programmed an interface that could play back the recorded conversation in drawing and audio. The participants could not delete or change their drawings, so their hesitations and corrections remained. The ad hoc figuring out of the participants’ descriptions thus remains part of the recording.1 In the phase of data analysis, the software allows the diagrams to be annotated, creating short clips. The diagrams thus enable a practice of combination and composition (O’Sullivan, 2016), providing a material-semiotic support to analyze various imaginaries of algorithmic security vision.
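Although the interface itself is not the object of this article, a rough sketch may clarify the kind of data structure that time-based diagramming implies: strokes stored with timestamps so that drawing and audio can be replayed together and annotated into short clips. The class and field names below are our own illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch of a data model for time-based diagramming: strokes are
# stored with timestamps so that drawing and audio can be replayed together and
# annotated into clips. All names and fields are our own assumptions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Stroke:
    started_at: float                   # seconds since the start of the interview
    points: List[Tuple[float, float]]   # pen positions; never deleted or edited

@dataclass
class Annotation:
    start: float                        # clip boundaries, aligned with the audio track
    end: float
    label: str                          # analytic note, e.g. the figuration it illustrates

@dataclass
class DiagramSession:
    audio_file: str
    strokes: List[Stroke] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def clip(self, start: float, end: float) -> List[Stroke]:
        """Return the strokes drawn within an annotated time window."""
        return [s for s in self.strokes if start <= s.started_at <= end]
```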

Diagramming is a key method in the field of technology, most notably in the conceptualization and design of computational practices (Mackenzie, 2017; Soon and Cox, 2021: 221). We don’t assume that the materiality of the drawings brings us any closer to the materiality of the actors’ practices, which are of a different order. Our interest is in the possibilities offered by the diagrams: they are composed of elements that are not necessarily similar, but are connected by their mere appearance on the same plane, thus allowing heterogeneous elements to co-exist. Diagrams are composed of parts that can be separated and recombined in different ways, creating new formations and expressions (O’Sullivan, 2016). Using such a multimodal tool seemed a pertinent methodological setup to capture figurations and configurations (see van de Ven and Plájás, 2022).

We interviewed twelve professionals who developed, deployed or contested computer vision technologies in the field of (in)security.2 We asked them to describe the coming together of computer, vision and (in)security from their professional vantage points.3 In what follows, we focus on three figurations and two configurations that emerged from the diagrams.

Figurations of algorithmic security vision

Diagram 1. Collage of excerpts from the conversations. Computer vision is often depicted as camera based. The third drawing depicts a "sensor hotel" on top of a light post in the Burglary-Free Neighborhood.

To understand the politics involved in the introduction of algorithmic vision in security practices, the first step was to see how the practitioners we spoke with figured their own practices through the use of our diagramming method. Our aim was to capture through shapes, relations, associations, and descriptions, the actors, institutions, technical artifacts, and processes in situated practices of algorithmic security vision.

When we asked our interviewees what unites practices of computer vision in (in)security, they started by foregrounding the camera and the (algorithmically processed) visual image. However, when they began drawing these assemblages based on examples, complexities emerged. In an example of crowd detection developed for the securitization of The Hague’s seaside boulevard, multiple sensors are installed on lampposts and benches to count passersby. Based on behaviors and movement patterns in the public space, operators can know how many people are on the boulevard at a certain moment, and whether these are individuals, or small or large groups — the latter of which might be seen as a potential security threat. The Burglary-Free Neighborhood in Rotterdam uses a “sensor hotel” installed under the hood of street lamps (Diagram 1), where the trajectory of pedestrians is analyzed alongside sounds like breaking glass, gunshots or screams. In the security assemblages described by our interviewees, the camera is but one element. During the diagramming, the figure of the visual is pushed out of focus.

In analyzing the twelve diagrams, three central figurations in camera-based algorithmic security practices emerged that help us to rethink some central notions of the literature on algorithmic security: (1) a figure of “vision” as increasingly trained synthetically, not organically; (2) a figure of vision as cinematic and moving in time, not photographic; (3) a figure of the error as a permanent dimension of algorithmic vision, not as something that could be solved or eliminated.

1. From skilled vision to synthetic vision

Diagram 2. Sergei Miliaev distinguishes three sources of training data for facial recognition technologies.

In most of our conversations, algorithmic security vision is understood to involve a particular subset of algorithms: deep neural networks.4 Such machine learning-based vision brings to the fore one key dimension of security practices: the question of training, and the ability to “see.” Training has been assumed as part of the discussion around the socialization of security professionals (Amicelle et al., 2015; Bigo, 2002) and of algorithmic systems (Fourcade and Johns, 2020), but scant attention has been paid to how training elaborates upon and incorporates specific sets of skills.

Some authors have explored the way in which the “seeing” of security agents is trained at the border, by building on the literatures on “skilled vision” (Maguire et al., 2014) or “vision work” (Olwig et al., 2019). Maguire mobilized work in anthropology that locates human vision as an “embodied, skilled, trained sense” (Grasseni, 2004: 41) that informs standardized practices of local “communities of vision” (see also Goodwin, 1994). Skilled vision is useful in that it draws attention to the sociomaterial circumstances under which vision becomes a trained perception (Grasseni, 2018: 2), and how it becomes uniform in communities through visual apprenticeship. This literature examines the production of “common sense” by taking training, exercise, peer monitoring and other practices of visual apprenticeship as the locus of attention. Yet these works fail to capture the specificities of the type of machine learning we encountered in our research. How then is visual apprenticeship reconfigured under algorithmic security vision?

In our conversations, the “training of the algorithms” figures as a key stake of algorithmic security vision. The participants in our diagram interviews explained how deep learning algorithms are trained on a multiplicity of visual data, which provides the patterns a system should discriminate on. In Diagram 2, Sergei Miliaev, head of the facial recognition research team at VisionLabs in Rotterdam, illustrated this point.

Miliaev distinguishes three sources for training images: web scraping, “operational” data collected through its partners or clients, and “synthetic” data. The first two options, Miliaev argues, have some limitations. Under European data protection regulation it is very difficult to obtain or be allowed to use data “from the wild” because it is often illegal to collect data of real people in the places where the algorithm will be used. Additionally, partners sometimes resist sharing their operational footage outside of their own digital infrastructures. Finally, when engineering a dataset, one cannot control what kind of footage is encountered in “the wild.” This has led to the emergence of a new phenomenon: training data generated in the lab.

Synthetic training data is often collected by acting in front of a camera. We see this in the case of intelligent video surveillance (Intelligente Videoüberwachung) deployed in Mannheim since 2018. Commenting on this case, chief of police (Polizeidirektor) Dirk Herzbach explains that self-defense trainers imitated 120 body positions to create the annotated data used to train the behavior recognition technology. In another example, Gerwin van der Lugt, developer of software that detects violent behavior, stated that given the insufficiency of available data, they “rely on some data synth techniques,” such as simulating violent acts in front of a green screen. Sometimes even the developers, computer scientists or engineers themselves re-enact certain movements or scenes for training their algorithms. In Diagram 3, two developers involved in the project at the seaside boulevard in Scheveningen give a striking example of how such enactments of suspicious events require the upfront development of a threat model that contains visual indicators that distinguish threat (a positive detection) from non-threat (a negative detection). The acting of the developers embeds these desirable and undesirable traits into the computer model.

Diagram 3. Two developers involved in a project at the seaside boulevard in Scheveningen describe the use of computer vision to distinguish the legal use of balloons from their illegal use for inhaling nitrous oxide.

Sometimes the meaning of “synthetic” or “fake” data is pushed further. Sergei Miliaev explains how, in the context of highly sensitive facial recognition algorithms, software companies use faces generated entirely through artificial neural networks to train their algorithms. Miliaev mentions Microsoft’s DigiFace-1M (Bae et al., 2023), a training dataset containing one million algorithmically generated faces. Such synthetic training sets complicate the borders between sensor-originated and other types of images. In the use of artificially generated images, one GPU generates bytes that are interpreted by another. Algorithmic vision occurs without direct reference to people or things that live outside electronic circuits.
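As a minimal sketch of the cascade described here, in which one model's synthetic output becomes another model's training input, the following toy example generates labelled "faces" and trains a trivial recognizer on them. Both models are placeholders of our own devising and bear no relation to the commercial systems discussed.

```python
# Toy sketch of the cascade described above: one model generates labelled synthetic
# images, another learns from them. Both models are placeholders for illustration,
# not the systems discussed in the text.
import numpy as np

rng = np.random.default_rng(0)
N_IDENTITIES, SAMPLES_PER_ID, DIM = 10, 20, 64

# Stand-in for a generative network: a fixed latent prototype per synthetic identity.
prototypes = rng.standard_normal((N_IDENTITIES, DIM))

def generate_synthetic_face(identity: int) -> np.ndarray:
    """Return a fake 'image': the identity's prototype plus some variation."""
    return prototypes[identity] + 0.1 * rng.standard_normal(DIM)

# Build a training set without any sensor or photographed person involved.
train_X = np.stack([generate_synthetic_face(i)
                    for i in range(N_IDENTITIES) for _ in range(SAMPLES_PER_ID)])
train_y = np.repeat(np.arange(N_IDENTITIES), SAMPLES_PER_ID)

# "Train" a trivial nearest-centroid recognizer on the purely synthetic data.
centroids = np.stack([train_X[train_y == i].mean(axis=0) for i in range(N_IDENTITIES)])

def recognise(image: np.ndarray) -> int:
    """Return the synthetic identity whose centroid is closest to the input."""
    return int(np.argmin(np.linalg.norm(centroids - image, axis=1)))
```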

These technical developments offer a changing figure of the skilled vision of security that calls for new research directions. While these technologies are still in their infancy, our interviewees see synthetic data as a token of “better” and “fairer” technology that can circumvent racial bias, as any minority can be generated to form an equal distribution in the training dataset (Stevens and Keyes, 2021). But with emerging concerns about algorithmic hallucination (Ji et al., 2023), glitches and undesirable artifacts in the generated data, one wonders what kind of vision is trained using such collections. Learning from synthetic data thus produces an internalized vision, providing insights by circulating data through a chain of artificial neural networks. While appearing in new technological assemblages, the processing of images to form archetypes is reminiscent of the composite photographs created by Galton (1879). His composites were used to train police officers to identify people as belonging to a particular group, circulating and reinforcing the group boundaries based on appearance (Hopman and M’charek, 2020). Which boundaries does “fake” or synthesized training data perpetuate? Skilled vision shifts attention to the negotiations that happen before algorithmic vision is trained, such as how algorithmic vision depends on access to data and regulations around data protection.

With the use of synthetic data, the question of a "community of lookers" — the embodied social and material practices through which apprenticeship is perpetuated — appears in a new light. Such a community becomes more dispersed as generative models circulate freely online. For instance, a generative model from Microsoft, trained on images shared online, is used for training an authentication system in the Moscow Metro. Such models are informed by the communities of looking from which their training data is sourced, and by the norms of those platforms. These norms then circulate with the model and become "plugged in" to other systems. Algorithmic vision, trained on synthetic data, is thus a composable vision, in which different sources of training data mobilize imagery from all kinds of aesthetic apprenticeships. The cascading of generative and discriminative models thus reshapes security practices. Furthermore, to comprehend changes in the politics of vision, attention to the training of vision, as a moment of standardization and operationalization, could be extended to the training of security professionals.

2. Figuring time: from photographic to cinematic vision

Conversations with practitioners revealed yet another dimension of the figure of vision in flux: its relation to time and movement. Deep learning-based technologies distinguish themselves from earlier algorithmic security systems through their status as prediction models, which by definition raises questions about the temporal dimensions of their processing (Sudmann, 2021). Yet how algorithmic security vision reconfigures temporalities has yet to receive scholarly attention in critical security studies and related disciplines. While literature on border studies has located border security in multiple places and temporalities (e.g. Bigo and Guild, 2005), scholarship on image-based algorithmic security practices has often focused on a photography-centric paradigm: biometric images (Pugliese, 2010), facial, iris and fingerprint recognition (Møhl, 2021), and body scanners (Leese, 2015). These technologies capture immutable features of suspect identities. In the diagrams, however, vision appears less static. Instead, two central dimensions of the figure of vision appear: the ability to capture and make sense of the movement of bodies in a fixed space, and the movement of bodies across spaces.

On the first point, we notice increasing attention to corporeality, how physical movements render certain individuals suspicious. This process takes place through the production and analysis of motion by composing a sequence of frames. Gerwin van der Lugt, who helped develop a violence detection algorithm at Oddity.ai, stresses how “temporal information integration” is the biggest technical challenge in detecting violence in surveillance footage: a raised hand might be either a punch or a high-five. In Diagram 4, van der Lugt visualizes the differences between the static and dynamic models. A first layer of pose or object detection often analyzes a merely static image. Oddity.ai then uses custom algorithms to integrate individual detections into one that tracks movement. It is then the movement that can be assessed as violent or harmless. From these outputs, Oddity.ai runs “another [...] process that [they] call temporal information integration—it’s quite important—to [...] find patterns that are [even] longer.” This case illustrates how algorithmic security vision temporarily attributes risk to bodies, in accordance with the ways violence is imagined and choreographed in the training data.

Diagram 4. Top: Frame 1 is processed by YOLO, an object detection model, producing Output 1 (O1). Other frames are processed independently. Bottom: Frames 1 to 10 are combined for processing by the customized model (“M”), which produces outputs (O 1-10, O 11-20). These outputs are then processed in relation to one another by the temporal information integration to find body patterns over longer periods. Drawn by Gerwin van der Lugt.
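Read programmatically, the pipeline of Diagram 4 might be sketched roughly as follows. The movement score, window size and integration rule are our illustrative assumptions, not Oddity.ai's implementation; the sketch only shows how per-frame detections are grouped into windows and then assessed over longer spans.

```python
# Rough sketch of the pipeline in Diagram 4: per-frame detections (e.g. from YOLO)
# are grouped into windows, scored for movement, and the scores are integrated over
# longer spans ("temporal information integration"). All rules here are illustrative.
from typing import List, Sequence

Detection = dict  # e.g. {"person_id": 3, "x": 0.4, "y": 0.7}

def movement_score(window: Sequence[List[Detection]]) -> float:
    """Stand-in for the custom model 'M': how far do tracked people move
    between the first and last frame of this window?"""
    first, last = window[0], window[-1]
    by_id = {d["person_id"]: d for d in first}
    displacements = [abs(d["x"] - by_id[d["person_id"]]["x"]) +
                     abs(d["y"] - by_id[d["person_id"]]["y"])
                     for d in last if d["person_id"] in by_id]
    return max(displacements, default=0.0)

def temporal_integration(scores: List[float], threshold: float = 0.5) -> bool:
    """Stand-in for the final step: flag only patterns sustained across windows."""
    return sum(s > threshold for s in scores) >= 2  # illustrative rule

def analyse(per_frame_detections: List[List[Detection]], window_size: int = 10) -> bool:
    windows = [per_frame_detections[i:i + window_size]
               for i in range(0, len(per_frame_detections), window_size)]
    scores = [movement_score(w) for w in windows if w]   # O 1-10, O 11-20, ...
    return temporal_integration(scores)                  # raise an alert or not
```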

Our interviewees figured movement in a second way. Bodies are tracked in space, leading to an accumulation of suspicious data over time. Ádám Remport explained how facial recognition technology (FRT) works by drawing a geographical map featuring building blocks and streets. In this map (Diagram 5), Person A could visit a bar, a church, or an NGO.

Diagram 5. Ádám Remport explains how a person’s everyday routes can be inferred when facial recognition technology is deployed in various sites, montaging a local, photographic vision into spatio-temporal, cinematic terms.

If “FRT is fully deployed and constantly functioning,” explains Remport, people can be “followed wherever [they] go.” Remport’s drawing therefore suggests that in this setting it is not important to be able to identify the person under surveillance; what matters is that this person can be tracked over different surveillance camera feeds. The trajectories of bodies and their “signature,” marked through the reconstruction of their habitual movements through space, are used as a benchmark for the construction of suspicion. Cinematic vision is thus made possible thanks to the broader infrastructure that allows for the collection and analysis of data over longer periods of time, and their summarization through montage.
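A minimal sketch of this montaging logic, under the assumption that a matcher has already assigned some identifier to each sighting, might look as follows; the data layout and names are ours, not those of any deployed system.

```python
# Illustrative sketch of the "cinematic" logic Remport describes: sightings from
# separate camera feeds are stitched into one trajectory per (pseudonymous) person.
# The data layout and identifier-based matching are assumptions, not a real system.
from collections import defaultdict
from typing import Dict, List, NamedTuple

class Sighting(NamedTuple):
    person_id: str    # whatever identifier the matcher produces; need not be a name
    camera: str       # e.g. "bar", "church", "NGO"
    timestamp: float

def build_trajectories(sightings: List[Sighting]) -> Dict[str, List[Sighting]]:
    """Group sightings by person and order them in time: frames from different
    places become a route through the city."""
    routes: Dict[str, List[Sighting]] = defaultdict(list)
    for s in sightings:
        routes[s.person_id].append(s)
    return {pid: sorted(route, key=lambda s: s.timestamp)
            for pid, route in routes.items()}

# Person A's day, as seen by three hypothetical cameras:
trajectory = build_trajectories([
    Sighting("person_A", "bar", 10.0),
    Sighting("person_A", "NGO", 14.5),
    Sighting("person_A", "church", 12.0),
])["person_A"]
print([s.camera for s in trajectory])  # ['bar', 'church', 'NGO']
```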

The emerging centrality of movement thus opens up a new research agenda for security, focused not on who and what features are considered risky, but on when, and through which movements, specific bodies become suspicious. While earlier studies on biometric technologies have located the operational logic in identification, verification and authentication, thus in knowing the individual (Ajana, 2013; Muller, 2010), figuring algorithmic security vision as cinematic locates its operational logic in the mobility of embodied life (see Huysmans, 2022). While many legal and political debates revolve around the storage of images as individual frames, and the privacy issues involved, less is known about the consequences of putting these frames into a sequence on a timeline and the movements that emerge through the integration of frames over time.

3. Managing error: from the sublime to the risky algorithm

Our third emerging figuration concerns the place of the error. A large body of literature examines actual and speculative cases of algorithmic prediction based on self-learning systems (Azar et al., 2021). Central to these analyses is the boundary-drawing performed by such algorithmic devices, enacting (in)security by rendering their subjects as more- or less-risky others (Amicelle et al., 2015: 300; Amoore and De Goede, 2005; Aradau et al., 2008; Aradau and Blanke, 2018) based on a spectrum of individual and environmental features (Calhoun, 2023). In other words, in these accounts risk is something that predictive devices locate in their subjects, and thus something external to the security technologies themselves.

In this critical literature on algorithmic practices, practitioners working with algorithmic technologies are often critiqued for understanding software as “sublime” (e.g. Wilcox, 2017: 3). However, in our diagrams, algorithmic vision appears as a practice of managing error. The practitioners we interviewed are aware of the error-prone nature of their systems: they know these will never be perfect, and treat error as a key metric that needs to be acted upon.

The most prominent way in which error figures in the diagrams is in its quantified form of the true positive and false positive rates, TPR and FPR. The significance and definition of these metrics is stressed by CTO Gerwin van der Lugt (Diagram 6). In camera surveillance, the false positive rate can be described as the number of false positive classifications relative to the number of video frames being analyzed. While writing them down, van der Lugt corrected his initial definitions, as these definitions determine the work of his development team, the ways in which his clients — security operators — engage with the technology, and whether they perceive the output of the system as trustworthy.

Diagram 6. Gerwin van der Lugt corrects his initial definitions of the true positive and false positive rates, and stresses the importance of their precise definition.
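As a point of reference, the metrics at stake can be written out as follows. The frame-based false positive rate is our hedged reconstruction of van der Lugt's description (false alerts relative to the number of frames analyzed), not his exact formula.

```python
# Hedged reconstruction of the metrics discussed above: the textbook true positive
# rate, and a frame-based false positive rate of the kind van der Lugt describes.
# The exact operational definitions are his; these formulas are ours.
def true_positive_rate(true_positives: int, false_negatives: int) -> float:
    """Share of actual incidents that the system flags."""
    return true_positives / (true_positives + false_negatives)

def frame_based_false_positive_rate(false_positives: int, frames_analyzed: int) -> float:
    """False alerts relative to all video frames processed."""
    return false_positives / frames_analyzed
```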

The figuration of algorithmic security vision as inherently imprecise affects the operationalization of security practices. Van der Lugt’s example concerns whether the violence detection algorithm developed by Oddity.ai should be trained to categorize friendly fighting (stoeien) between friends as “violence” or not. In this context, van der Lugt finds it important to differentiate what counts as false positive in the algorithm’s evaluation metric from an error in the algorithm’s operationalization of a security question.

He gives two reasons to do so. First, he anticipates that the exclusion of stoeien from the category of violence would negatively impact the TPR. In the iterative development of self-learning systems, the TPR and FPR, together with the true and false negative rates, must perform a balancing act. Van der Lugt outlines that with their technology they aim for fewer than 100 false positives per 100 million frames per week. The FPR becomes indicative of the algorithm’s quality, as too many faulty predictions will desensitize the human operator to system alerts.
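To give a sense of the scale this target implies, the arithmetic below works through the figure van der Lugt mentions; the 25 frames-per-second camera rate is our assumption, used only for illustration.

```python
# The target mentioned above: fewer than 100 false positives per 100 million
# frames per week. The 25 fps camera rate is an assumption used only to give
# a feel for the volume of footage involved.
target_false_positives = 100
frames_per_week = 100_000_000

print(target_false_positives / frames_per_week)   # 1e-06: at most one false alert per million frames

fps = 25                                           # assumed frame rate of one camera
frames_per_camera_week = fps * 60 * 60 * 24 * 7
print(frames_per_camera_week)                      # 15_120_000 frames from one camera in a week
print(frames_per_week / frames_per_camera_week)    # roughly 6.6 camera-weeks of footage
```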

This leads to van der Lugt’s second point: He fears that the exclusion of stoeien from the violence category might cause unexpected biases in the system. For example, instead of distinguishing violence from stoeien based on people’s body movements, the algorithm might make the distinction based on their age. For van der Lugt, this would be an undesirable and hard-to-notice form of discrimination. In developing algorithmic (in)security, error is figured not merely as a mathematical concept but (as shown in Diagram 6) as a notion that invites pre-emption — a mitigation of probable failure — for which the developer is responsible. The algorithmic condition of security vision is figured as the pre-emption of error.

Diagram 7. By drawing errors on a timeline, van Rest calls attention to the pre-emptive nature of error in the development process of computer vision technologies.

According to critical AI scholar Matteo Pasquinelli, “machine learning is technically based on formulas for error correction” (2019: 2). Therefore, any critical engagement with such algorithmic processes needs to go beyond citing errors, “for it is precisely through these variations that the algorithm learns what to do” (Amoore, 2019: 164), pushing us to reconsider any argument based on the inaccuracy of the systems.

The example of stoeien suggests that it is not so much a question of whether, or how much, these algorithms err, but of how these errors are anticipated and negotiated. Thus, taking error as a hallmark of machine learning, we can see how practices of (in)security become shaped by the notion of mathematical error well beyond their development stages. Error figures centrally in the development, acquisition and deployment of such devices. As one respondent indicated, predictive devices are inherently erroneous, but the quantification of their error makes them amenable to “risk management.”

While much has been written about security technologies as devices for risk management, little is known about how security technologies are themselves conceptualized as objects of risk management. What happens, then, in this double relation of risk? The figure of the error enters the diagrams as a mathematical concept, but throughout the conversations we see it permeate the discourse around algorithmic security vision. By figuring algorithmic security vision through the notion of error, risk is placed at the heart of the security apparatus.

Con-figurations of algorithmic security vision: fragmenting accountability and expertise

In the previous section we explored the changing figurations of key dimensions of algorithmic security vision; in this section we examine how these figurations configure. For Suchman, working with configurations highlights “the histories and encounters through which things are figured into meaningful existence, fixing them through reiteration but also always engaged in ‘the perpetuity of coming to be’ that characterizes the biographies of objects as well as subjects” (Suchman, 2012: 50, emphasis ours). In other words, we are interested in the practices and tensions that emerge as figurations become embedded in material practices. We focus on two con-figurations that emerged in the interviews: the delegation of accountability to externally managed benchmarks, and the displacement of responsibility through the reconfiguration of the human-in-the-loop.

Delegating accountability to benchmarks

The first configuration is related to the evaluation of the error rate in the training of algorithmic vision systems: it involves datasets, benchmark institutions, and the idea of fairness as equal representation among different social groups. Literature on the ethical and political effects of algorithmic vision has notably focused on the distribution of errors, raising questions of ethnic and racial bias (e.g. Buolamwini and Gebru, 2018). Our interviews reflect the concerns of much of this literature, as the pre-emption of error figured repeatedly in relation to the uneven distribution of error across minorities or groups. In Diagram 8, Ádám Remport draws how different visual traits have often led to different error rates. While the general error metric of an algorithmic system might seem "acceptable," it can privilege particular groups, which remains invisible when only the whole is considered. Jeroen van Rest distinguishes such errors, as systemic biases (Diagram 7), from the inherent algorithmic imprecision of deep machine learning models, as they perpetuate inequalities in the society in which the product is being developed.

To mitigate these concerns and manage their risk, many of our interviewees who develop and implement these technologies externalize the reference against which the error is measured. They turn to a benchmark run by the American National Institute of Standards and Technology (NIST), which ranks facial recognition technologies from different companies by their error metrics across groups. John Riemen, who is responsible for the use of forensic facial recognition technology at the Center for Biometrics of the Dutch police, describes how their choice of software is driven by a public tender that demands a "top-10" score on the NIST benchmark. The mitigation of bias is thus outsourced to an external, and in this case foreign, institution.

Diagram 8. Ádám Remport describes that facial recognition technologies are often most accurate with white male adult faces, reflecting the datasets they are trained with. The FPR is higher for people with darker skin, children, or women, which may result in false flagging and false arrests.
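The point that an aggregate error rate can mask group-specific disparities can be illustrated with a small numerical sketch; the groups and counts below are invented purely for illustration and do not come from our interviews or from NIST.

```python
# Numerical sketch of the point made above: an aggregate false positive rate can
# look "acceptable" while specific groups face much higher rates. The groups and
# counts are invented purely for illustration.
false_positives = {"group_a": 10, "group_b": 40}        # false matches per group
comparisons     = {"group_a": 10_000, "group_b": 2_000}  # non-matching searches per group

overall_fpr = sum(false_positives.values()) / sum(comparisons.values())
print(f"overall FPR: {overall_fpr:.2%}")                 # 0.42% -- looks tolerable

for group in false_positives:
    fpr = false_positives[group] / comparisons[group]
    print(f"{group} FPR: {fpr:.2%}")                     # group_a 0.10% vs group_b 2.00%
```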

We see in this outsourcing of error metrics a form of delegation that brings about a specific regime of (in)visibility. While a particular kind of algorithmic bias is rendered central to the NIST benchmark, the mobilization of this reference obfuscates questions on how that metric was achieved. That is to say, questions about training data are invisibilized, even though that data is a known site of contestation. For example, the NIST benchmark datasets are known to include faces of wounded people (Keyes, 2019). The Clearview company is known to use images scraped illegally from social media, and IBM uses a dataset that is likely in violation of European GDPR legislation (Bommasani et al., 2022: 154). Pasquinelli (2019) argued that machine learning models ultimately act as data compressors: enfolding and operationalizing imagery of which the terms of acquisition are invisibilized.

Attention to this invisibilization reveals a discrepancy between the developers and the implementers of these technologies. On the one hand, the developers we interviewed paid close attention to how their training data is constituted so as to achieve the best possible ratio between true and false positive rates (TPR/FPR), while also showing concern for the legality of the data they use to train their algorithms. On the other hand, questions about the constitution of the dataset have been virtually non-existent in our conversations with those who implement software that relies on models trained with such data. Occasionally this knowledge was considered part of the developers' intellectual property that had to be kept a trade secret. A high score on the benchmark is enough to settle questions of fairness, legitimizing the use of the algorithmic model. Thus, while an algorithm indirectly relies on its source data, that data is no longer deemed relevant in its assessment. This illustrates well how the invisibilization of the “compressed” dataset, in Pasquinelli’s terms, into a model, together with the formalization of guiding metrics into a benchmark, permits a bracketing of accountability. One does not need to know how outcomes are produced, as long as the benchmarks are in order.

The configuration of algorithmic vision’s bias across a complex network of fragmented locations and actors, from the dataset to the algorithm to the benchmark institution, reveals selective processes of (in)visibilization. This opens up fruitful avenues for new empirical research: What are the politics of the benchmark as a mechanism of legitimization? How does the outsourcing of the assessment of error distribution affect attention to bias? How has the critique of bias been institutionalized by the security industry, resulting in the externalization of accountability through dis-location and fragmentation?

Reconfiguring the human-in-the-loop

A second central question linked to the delegation of accountability is the configuration in which the security operator is located. The effects of delegation and fragmentation, in which the mitigation of algorithmic errors is outsourced to an external party, become visible in the ways in which the role of the security operator is configured in relation to the institution they work for, the software’s assessment, and the affected publics.

The public critique of algorithms has often construed the human-in-the-loop as one of the last lines of defense in the resistance to automated systems, able to filter and correct erroneous outcomes (Markoff, 2020). The literature in critical security studies has, however, problematized the representation of the security operator in algorithmic assemblages by discussing how algorithmic predictions appear on their screen (Aradau and Blanke, 2018), and how the embodied decision making of the operator is entangled with the algorithmic assemblage (Wilcox, 2017). Moreover, the operator is often left guessing at the workings of the device that provides them with information to make their decision (Møhl, 2021).

What our participants’ diagrams emphasized is how a whole spectrum of system designs emerges in response to similar questions, for example the issue of algorithmic bias. A primary difference can be found in the degree of understanding of the systems that is expected of security operators, as well as their perceived autonomy. Sometimes, the human operator is central to the system’s operation, forming the interface between the algorithmic systems and surveillance practices. Gerwin van der Lugt, developer of software at Oddity.ai that detects criminal behavior, argues that “the responsibility for how to deal with the violent incidents is always [on a] human, not the algorithm. The algorithm just detects violence—that’s it—but the human needs to deal with it.”

Dirk Herzbach, chief of police at the Police Headquarters Mannheim, adds that when alerted to an incident by the system, the operator decides whether to deploy a police car. Both Herzbach and Van der Lugt figure the human-in-the-loop as having full agency and responsibility in operating the (in)security assemblage (cf. Hoijtink and Leese, 2019).

Some interviewees drew a diagram in which the operator is supposed to be aware of the ways in which the technology errs, so they can address them. Several other interviewees considered the technical expertise of the human-in-the-loop to be unimportant, even a hindrance.

Chief of police Herzbach prefers an operator to have patrol experience to assess which situations require intervention. He is concerned that knowledge about algorithmic biases might interfere with such decisions. In the case of the Moscow metro, where a facial recognition system has been deployed for purchasing tickets and opening access gates, the human-in-the-loop is reconfigured as an end user who needs to be shielded from the algorithm’s operation (cf. Lorusso, 2021). On these occasions, expertise on the technological creation of the suspect becomes fragmented.

These different figurations of the security operator are held together by the idea that the human operator is the expert of the subject of security, and is expected to make decisions independent from the information that the algorithmic system provides.

Diagram 9. Riemen explains the process of information filtering that is involved in querying the facial recognition database of the Dutch police.

Other drivers exist, however, to shield the operator from the algorithm’s functioning, challenging individual expertise and acknowledging the fallibility of human decision making. In Diagram 9, John Riemen outlines the use of facial recognition by the Dutch police. He describes how data about the police case and about the algorithmic assessment is filtered out as much as possible from the information provided to the operator. This, Riemen suggests, might reduce bias in the final decision. He adds that there should be no fewer than three humans-in-the-loop who operate independently to increase the accuracy of the algorithmic security vision.

Instead of increasing their number, there is another configuration of the human-in-the-loop that responds to the fallibility of the operator. For the Burglary-Free Neighborhood project in Rotterdam, project manager Guido Delver draws surveillance as operated by neighborhood residents, through a system that they own themselves. By involving different stakeholders, Delver hopes to counter government hegemony over the surveillance apparatus. However, residents are untrained in assessing algorithmic predictions, which raises new challenges. Delver illustrates a scenario in which the algorithmic signaling of a potential burglary may have dangerous consequences: “Does it invoke the wrong behavior from the citizen? [They could] go out with a bat and look for the guy who has done nothing [because] it was a false positive.” In this case, the worry is that erroneous predictions will not be questioned. Therefore, in Delver’s project the goal was to actualize an autonomous system, “with as little interference as possible.” Human participation or “interference” in the operation is seen as potentially harmful. Thus, figuring the operator — whether police officer or neighborhood resident — as risky can lead to the relegation of direct human intervention.

By looking at the figurations of the operator that appear in the diagrams, we see multiple and heterogeneous configurations of regulations, security companies, and professionals. In each configuration, the human-in-the-loop appears in different forms. The operator often holds the final responsibility for the ethical functioning of the system. At times they are configured as experts in sophisticated but error-prone systems; at others they are figured as end users who are activated by the alerts generated by the system, and who need not understand how the software works and errs, or who can be left out altogether.

These configurations remind us that there can be no theorization of “algorithmic security vision,” of either its empirical workings or its ethical and political consequences, without close attention to the empirical contexts in which the configurations are arranged. Each organization of datasets, algorithms, benchmarks, hardware and operators has specific problems. And each contains specific politics of visibilization, invisibilization, responsibility and accountability.

A diagram of research

In this conclusion, we reflect upon a final dimension of the method of diagramming in the context of figurations and configurations: its potential as an alternative to the conventional research program.

Indeed, while writing this text, the search for a coherent structure through which we could map the problems that emerged from analyzing the diagrams into a straightforward narrative proved elusive. We considered various organizational frameworks, but consistently encountered resistance from one or two sections. It became evident that our interviews yielded a rhizome of interrelated problems, creating a multitude of possible inquiries and overlapping trajectories. Some dimensions of these problems are related to one another, but not to every other problem.

If we take, for example, the understanding of algorithmic security vision as practices of error management as a starting point, we see how the actors we interviewed have incorporated the societal critique of algorithmic bias. This serves as a catalyst for novel strategies aimed at mitigating the repercussions of imperfect systems. The societal critique has driven the development of synthetic datasets, which promise equitable representation across diverse demographic groups. It has also been the reason for the reliance on institutionalized benchmarks to assess the impartiality of algorithms. Moreover, different configurations of the human-in-the-loop emerge, all promising to rectify algorithmic fallibility. Here we see a causal chain.

But how does the question of algorithmic error relate to the shift from photographic to cinematic vision that algorithmic security vision brings about? Certainly, there are reverberations. The relegation of stable identity that we outlined could be seen as a way to mitigate the impact of those errors. But it would be a leap to identify these questions of error as the central driver for the increased incorporation of moving images in algorithmic security vision.

However, if we take as our starting point the formidable strides in computing power and the advancements in camera technologies, we face similar problems. These developments make the analysis of movement possible, and they help to elucidate the advances in real-time analysis required to remove the human-in-the-loop, as trialed in the Burglary-Free Neighborhood. They also account for the feasibility of synthetic data generation, a computing-intensive process that opens a vast horizon of possibilities for developers to detect objects or actions. Such an account, however, does not address the need for a synthetic dataset in the first place. A focus on the computation of movement, by contrast, would highlight how a lack of training data necessitates many of the practices described. Synthetic data is necessitated by the glaring absence of pre-existing security datasets that contain moving bodies. While facial recognition algorithms could be trained and operated on quickly repurposed photographic datasets of national identity cards or drivers’ license registries, no comparable dataset of moving bodies has been available to be repurposed by states or corporations. This absence of training data requires programmers to stage scenes for the camera. Thus, while one issue contains echoes of the other, the network of interrelated problematizations cannot be flattened into a single narrative.

The constraints imposed by the linear structure of an academic article certainly necessitate a specific ordering of sections. Yet the different research directions we highlight form something else. The multiple figurations analyzed here generate fresh tensions when put in relation with security and political practices. What appears from the diagrams is a network of figurations in various configurations. Instead of a research program, our interviews point toward a larger research diagram of interrelated questions, which invites us to think in terms of pathways through this dynamic and evolving network of relations.

Interviewees

References

Ajana B (2013) Governing Through Biometrics. London: Palgrave Macmillan UK. DOI: 10.1057/9781137290755.

Amicelle A, Aradau C and Jeandesboz J (2015) Questioning security devices: Performativity, resistance, politics. Security Dialogue 46(4): 293–306. DOI: 10.1177/0967010615586964.

Amoore L (2014) Security and the incalculable. Security Dialogue 45(5). SAGE Publications Ltd: 423–439. DOI: 10.1177/0967010614539719.

Amoore L (2019) Doubt and the algorithm: On the partial accounts of machine learning. Theory, Culture & Society 36(6). SAGE Publications Ltd: 147–169. DOI: 10.1177/0263276419851846.

Amoore L (2021) The deep border. Political Geography. Elsevier: 102547.

Amoore L and De Goede M (2005) Governance, risk and dataveillance in the war on terror. Crime, Law and Social Change 43(2): 149–173. DOI: 10.1007/s10611-005-1717-8.

Andersen RS (2015) Remediating Security. PhD dissertation series. Copenhagen: Københavns Universitet, Institut for Statskundskab.

Andersen RS (2018) The art of questioning lethal vision: Mosse's Infra and militarized machine vision. In: Proceedings of EVA Copenhagen 2018, 2018.

Andrejevic M and Burdon M (2015) Defining the sensor society. Television & New Media 16(1): 19–36. DOI: 10.1177/1527476414541552.

Aradau C and Blanke T (2015) The (big) data-security assemblage: Knowledge and critique. Big Data & Society 2(2): 205395171560906. DOI: 10.1177/2053951715609066.

Aradau C and Blanke T (2018) Governing others: Anomaly and the algorithmic subject of security. European Journal of International Security 3(1). Cambridge University Press: 1–21. DOI: 10.1017/eis.2017.14.

Aradau C, Lobo-Guerrero L and Van Munster R (2008) Security, technologies of risk, and the political: Guest editors’ introduction. Security Dialogue 39(2-3): 147–154. DOI: 10.1177/0967010608089159.

Azar M, Cox G and Impett L (2021) Introduction: Ways of machine seeing. AI & SOCIETY. DOI: 10.1007/s00146-020-01124-6.

Bae G, de La Gorce M, Baltrušaitis T, et al. (2023) DigiFace-1M: 1 million digital face images for face recognition. In: 2023 IEEE Winter Conference on Applications of Computer Vision (WACV), 2023. IEEE.

Barad KM (2007) Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning. Durham: Duke University Press.

Bellanova R, Irion K, Lindskov Jacobsen K, et al. (2021) Toward a critique of algorithmic violence. International Political Sociology 15(1): 121–150. DOI: 10.1093/ips/olab003.

Bigo D (2002) Security and immigration: Toward a critique of the governmentality of unease. Alternatives 27. SAGE Publications Inc: 63–92. DOI: 10.1177/03043754020270S105.

Bigo D and Guild E (2005) Policing at a distance: Schengen visa policies. In: Controlling Frontiers. Free Movement into and Within Europe. Routledge, pp. 233–263.

Bommasani R, Hudson DA, Adeli E, et al. (2022) On the opportunities and risks of foundation models. Available at: http://arxiv.org/abs/2108.07258 (accessed 2 June 2023).

Bousquet AJ (2018) The Eye of War. Minneapolis: University of Minnesota Press.

Bucher T (2018) If...Then: Algorithmic Power and Politics. New York: Oxford University Press.

Buolamwini J and Gebru T (2018) Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research 81.

Calhoun L (2023) Latency, uncertainty, contagion: Epistemologies of risk-as-reform in crime forecasting software. Environment and Planning D: Society and Space. SAGE Publications Ltd STM: 02637758231197012. DOI: 10.1177/02637758231197012.

Carraro V (2021) Grounding the digital: A comparison of Waze’s ‘avoid dangerous areas’ feature in Jerusalem, Rio de Janeiro and the US. GeoJournal 86(3): 1121–1139. DOI: 10.1007/s10708-019-10117-y.

Dawson-Howe K (2014) A Practical Introduction to Computer Vision with OpenCV. 1st edition. Chichester, West Sussex, United Kingdom; Hoboken, NJ: Wiley.

Dijstelbloem H, van Reekum R and Schinkel W (2017) Surveillance at sea: The transactional politics of border control in the Aegean. Security Dialogue 48(3): 224–240. DOI: 10.1177/0967010617695714.

Farocki H (2004) Phantom images. Public. Available at: https://public.journals.yorku.ca/index.php/public/article/view/30354 (accessed 6 March 2023).

Fisher DXO (2018) Situating border control: Unpacking Spain’s SIVE border surveillance assemblage. Political Geography 65: 67–76. DOI: 10.1016/j.polgeo.2018.04.005.

Fourcade M and Gordon J (2020) Learning like a state: Statecraft in the digital age.

Fourcade M and Johns F (2020) Loops, ladders and links: The recursivity of social and machine learning. Theory and Society: 1–30. DOI: 10.1007/s11186-020-09409-x.

Fraser A (2019) Curating digital geographies in an era of data colonialism. Geoforum 104. Elsevier: 193–200.

Galton F (1879) Composite portraits, made by combining those of many different persons into a single resultant figure. The Journal of the Anthropological Institute of Great Britain and Ireland 8: 132–144. DOI: 10.2307/2841021.

Gandy OH (2021) The Panoptic Sort: A Political Economy of Personal Information. Oxford University Press. Available at: https://books.google.com?id=JOEsEAAAQBAJ.

Gillespie T (2018) Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Illustrated edition. Yale University Press.

Goodwin C (1994) Professional vision. American Anthropologist 96(3).

Graham S (1998) Spaces of surveillant simulation: New technologies, digital representations, and material geographies. Environment and Planning D: Society and Space 16(4): 483–504.

Graham SD (2005) Software-sorted geographies. Progress in Human Geography 29(5): 562–580.

Grasseni C (2004) Skilled vision. An apprenticeship in breeding aesthetics. Social Anthropology 12(1): 41–55. DOI: 10.1017/S0964028204000035.

Grasseni C (2018) Skilled vision. In: Callan H (ed.) The International Encyclopedia of Anthropology. 1st ed. Wiley, pp. 1–7. DOI: 10.1002/9781118924396.wbiea1657.

Haraway D (1988) Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Studies 14(3). Feminist Studies, Inc.: 575–599. DOI: 10.2307/3178066.

Hoijtink M and Leese M (2019) How (not) to talk about technology: International relations and the question of agency. In: Hoijtink M and Leese M (eds) Technology and Agency in International Relations. Emerging technologies, ethics and international affairs. London; New York: Routledge, pp. 1–24.

Hopman R and M’charek A (2020) Facing the unknown suspect: Forensic DNA phenotyping and the oscillation between the individual and the collective. BioSocieties 15(3): 438–462. DOI: 10.1057/s41292-020-00190-9.

Hunger F (2023) Unhype artificial ’intelligence’! A proposal to replace the deceiving terminology of AI. 12 April. Zenodo. DOI: 10.5281/zenodo.7524493.

Huysmans J (2022) Motioning the politics of security: The primacy of movement and the subject of security. Security Dialogue 53(3): 238–255. DOI: 10.1177/09670106211044015.

Isin E and Ruppert E (2020) The birth of sensory power: How a pandemic made it visible? Big Data & Society 7(2). SAGE Publications Ltd: 2053951720969208. DOI: 10.1177/2053951720969208.

Jasanoff S (2004) States of Knowledge: The Co-Production of Science and Social Order. Routledge Taylor & Francis Group.

Ji Z, Lee N, Frieske R, et al. (2023) Survey of hallucination in natural language generation. ACM Computing Surveys 55(12): 1–38. DOI: 10.1145/3571730.

Keyes O (2019) The gardener’s vision of data: Data science reduces people to subjects that can be mined for truth. Real Life Mag. Available at: https://reallifemag.com/the-gardeners-vision-of-data/.

Latour B (2005) Reassembling the Social: An Introduction to Actor-Network-Theory. Clarendon Lectures in Management Studies. Oxford; New York: Oxford University Press.

Leese M (2015) ‘We were taken by surprise’: Body scanners, technology adjustment, and the eradication of failure. Critical Studies on Security 3(3). Routledge: 269–282. DOI: 10.1080/21624887.2015.1124743.

Leese M (2019) Configuring warfare: Automation, control, agency. In: Hoijtink M and Leese M (eds) Technology and Agency in International Relations. Emerging technologies, ethics and international affairs. London; New York: Routledge, pp. 42–65.

Lorusso S (2021) The user condition. Available at: https://theusercondition.computer/ (accessed 18 February 2021).

Lyon D (2003) Surveillance as Social Sorting: Privacy, Risk, and Digital Discrimination. Psychology Press. Available at: https://books.google.com?id=yCLFBfZwl08C.

Mackenzie A (2017) Machine Learners: Archaeology of a Data Practice. The MIT Press. DOI: 10.7551/mitpress/10302.001.0001.

Maguire M, Frois C and Zurawski N (eds) (2014) The Anthropology of Security: Perspectives from the Frontline of Policing, Counter-Terrorism and Border Control. Anthropology, culture and society. London: Pluto Press.

Mahony M (2021) Geographies of science and technology 1: Boundaries and crossings. Progress in Human Geography 45(3): 586–595. DOI: 10.1177/0309132520969824.

Markoff J (2020) Robots will need humans in future. The New York Times: Section B, 22 May. New York. Available at: https://www.nytimes.com/2020/05/21/technology/ben-shneiderman-automation-humans.html (accessed 31 October 2023).

McCosker A and Wilken R (2020) Automating Vision: The Social Impact of the New Camera Consciousness. 1st edition. Routledge.

Møhl P (2021) Seeing threats, sensing flesh: Human–machine ensembles at work. AI & SOCIETY 36(4): 1243–1252. DOI: 10.1007/s00146-020-01064-1.

Muller B (2010) Security, Risk and the Biometric State. Routledge. DOI: 10.4324/9780203858042.

O’Sullivan S (2016) On the diagram (and a practice of diagrammatics). In: Schneider K, Yasar B, and Lévy D (eds) Situational Diagram. New York: Dominique Lévy, pp. 13–25.

Olwig KF, Grünenberg K, Møhl P, et al. (2019) The Biometric Border World: Technologies, Bodies and Identities on the Move. 1st ed. Routledge. DOI: 10.4324/9780367808464.

Pasquinelli M (2015) Anomaly detection: The mathematization of the abnormal in the metadata society. Panel presentation.

Pasquinelli M (2019) How a machine learns and fails – a grammar of error for artificial intelligence. Available at: https://spheres-journal.org/contribution/how-a-machine-learns-and-fails-a-grammar-of-error-for-artificial-intelligence/ (accessed 13 January 2021).

Pugliese J (2010) Biometrics: Bodies, Technologies, Biopolitics. New York: Routledge. DOI: 10.4324/9780203849415.

Schurr C, Marquardt N and Militz E (2023) Intimate technologies: Towards a feminist perspective on geographies of technoscience. Progress in Human Geography. SAGE Publications Ltd: 03091325231151673. DOI: 10.1177/03091325231151673.

Soon W and Cox G (2021) Aesthetic Programming: A Handbook of Software Studies. London: Open Humanities Press. Available at: http://www.openhumanitiespress.org/books/titles/aesthetic-programming/ (accessed 9 March 2021).

Srnicek N and De Sutter L (2017) Platform Capitalism. Theory redux. Cambridge, UK ; Malden, MA: Polity.

Stevens N and Keyes O (2021) Seeing infrastructure: Race, facial recognition and the politics of data. Cultural Studies 35(4-5): 833–853. DOI: 10.1080/09502386.2021.1895252.

Suchman L (2006) Human-Machine Reconfigurations: Plans and Situated Actions. 2nd edition. Cambridge University Press.

Suchman L (2012) Configuration. In: Inventive Methods. Routledge, pp. 48–60.

Suchman L (2020) Algorithmic warfare and the reinvention of accuracy. Critical Studies on Security 8(2). Routledge: 175–187. DOI: 10.1080/21624887.2020.1760587.

Sudmann A (2021) Artificial neural networks, postdigital infrastructures and the politics of temporality. In: Volmar A and Stine K (eds) Media Infrastructures and the Politics of Digital Time. Amsterdam University Press, pp. 279–294. DOI: 10.1515/9789048550753-017.

Tazzioli M (2018) Spy, track and archive: The temporality of visibility in Eurosur and Jora. Security Dialogue 49(4): 272–288. DOI: 10.1177/0967010618769812.

Thatcher J, O’Sullivan D and Mahmoudi D (2016) Data colonialism through accumulation by dispossession: New metaphors for daily data. Environment and Planning D: Society and Space 34(6). SAGE Publications Ltd STM: 990–1006. DOI: 10.1177/0263775816633195.

Uliasz R (2020) Seeing like an algorithm: Operative images and emergent subjects. AI & SOCIETY. DOI: 10.1007/s00146-020-01067-y.

van de Ven R and Plájás IZ (2022) Inconsistent projections: Con-figuring security vision through diagramming. A Peer-Reviewed Journal About 11(1): 50–65. DOI: 10.7146/aprja.v11i1.134306.

Wilcox L (2017) Embodying algorithmic war: Gender, race, and the posthuman in drone warfare. Security Dialogue 48(1): 11–28. DOI: 10.1177/0967010616657947.

Zuboff S (2019) The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. First edition. New York: Public Affairs.


  1. The interface software and code are available at https://git.rubenvandeven.com/security_vision/svganim and https://gitlab.com/security-vision/chronodiagram

  2. The interviews were conducted in several European countries: the majority in the Netherlands, but also in Belgium, Hungary and Poland. Based on an initial survey of algorithmic security vision practices in Europe, we identified various roles that are involved in such practices. As a rather small group, these interviewees do not serve as “illustrative representatives” (Mol and Law, 2002: 16–17) of the field they work in. However, as the interviewees have different cultural and institutional affiliations, and hold different positions in working with algorithms, vision and security, they cover a wide spectrum of engagements with our research object.

  3. The interviews were conducted by the first two authors, and at a later stage by Clemens Baier. The conversations were largely unstructured, but began with two basic questions. First, we asked the interviewees if they use diagrams in their daily practice. We then asked: “When we speak of ‘security vision’ we speak of the use of computer vision in a security context. Can you explain from your perspective what these concepts mean and how they come together?” After the first few interviews, we identified some recurrent themes, which we then specifically asked later interviewees to discuss.

  4. Anthropomorphizing terms such as “neural networks,” “learning” and “training” to denote algorithmic configurations and processes have been argued to contribute to the hype around “artificial intelligence.” While we support the call for an alternative terminology, as proposed by Hunger (2023), we preserve the language of our interviewees here.
