Algorithmic Security Vision: Diagrams of Computer Vision Politics

Ruben van de Ven, Ildikó Zonga Plájás, Cyan Bae, Francesco Ragazzi

December 2023

3. Managing error: from the sublime to the risky algorithm

Our third emerging figuration concerns the place of the error. A large body of literature examines actual and speculative cases of algorithmic prediction based on self-learning systems (Azar et al., 2021). Central to these analyses is the boundary-drawing performed by such algorithmic devices, enacting (in)security by rendering their subjects as more- or less-risky others (Amicelle et al., 2015: 300; Amoore and De Goede, 2005; Aradau et al., 2008; Aradau and Blanke, 2018) based on a spectrum of individual and environmental features (Calhoun, 2023). In other words, these predictive devices conceptualize risk as something they assess in their subjects and surroundings, and thus as external to the security technologies themselves.

In this critical literature on algorithmic practices, practitioners working with algorithmic technologies are often critiqued for understanding software as “sublime” (e.g. Wilcox, 2017: 3). In our diagrams, however, algorithmic vision appears as a practice of managing error. The practitioners we interviewed are well aware of the error-prone nature of their systems: they know these systems will never be perfect, and treat error as a key metric that needs to be acted upon.

The most prominent way in which error figures in the diagrams is in its quantified form of the true positive and false positive rates, TPR and FPR. The significance and precise definition of these metrics are stressed by CTO Gerwin van der Lugt (Diagram 6). In camera surveillance, the false positive rate could be described as the number of false positive classifications relative to the number of video frames being analyzed. While writing them down, van der Lugt corrected his initial definitions: these definitions determine the work of his development team, the ways in which his clients — security operators — engage with the technology, and whether they perceive the output of the system as trustworthy.

Diagram 6. Gerwin van der Lugt corrects his initial definitions of the true positive and false positive rates, and stresses the importance of their precise definition.
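For reference, the conventional definitions of these metrics, alongside a per-frame formulation of the kind van der Lugt describes for camera surveillance (our paraphrase of his description, not his exact formula), can be sketched as:

\[
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FPR}_{\text{per frame}} \approx \frac{FP}{\text{frames analyzed}},
\]

where $TP$, $FP$, $TN$ and $FN$ denote true positives, false positives, true negatives and false negatives. The choice of denominator is consequential: in continuous video the vast majority of frames contain nothing of interest, so a rate defined over all analyzed frames reads very differently from one defined over negative instances only.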

The figuration of algorithmic security vision as inherently imprecise affects the operationalization of security practices. Van der Lugt’s example concerns whether the violence detection algorithm developed by Oddity.ai should be trained to categorize play-fighting (stoeien) between friends as “violence” or not. In this context, van der Lugt finds it important to differentiate what counts as a false positive in the algorithm’s evaluation metric from an error in how the algorithm operationalizes a security question.

He gives two reasons for this. First, he anticipates that excluding stoeien from the category of violence would negatively impact the TPR. In the iterative development of self-learning systems, the TPR and FPR, together with the true and false negative rates, must perform a balancing act. Van der Lugt outlines that with their technology they aim for fewer than 100 false positives per 100 million frames per week. The FPR thus becomes indicative of the algorithm’s quality, as too many faulty predictions will desensitize the human operator to system alerts.
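To give a rough sense of scale, our own back-of-the-envelope arithmetic (assuming a stream of 25 frames per second, a figure not given by van der Lugt) translates this target as follows:

\[
\frac{100 \text{ false positives}}{10^{8} \text{ frames}} = 10^{-6} \text{ false positives per frame},
\]

while a single camera at 25 frames per second produces roughly $25 \times 60 \times 60 \times 24 \times 7 \approx 1.5 \times 10^{7}$ frames per week, so the target corresponds to on the order of fifteen false alerts per camera per week.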

This leads to van der Lugt’s second point: he fears that the exclusion of stoeien from the violence category might cause unexpected biases in the system. For example, instead of distinguishing violence from stoeien based on people’s body movements, the algorithm might make the distinction based on their age. For van der Lugt, this would be an undesirable and hard-to-notice form of discrimination. In developing algorithmic (in)security, error is figured not merely as a mathematical concept but (as shown in Diagram 6) as a notion that invites pre-emption — a mitigation of probable failure — for which the developer is responsible. The algorithmic condition of security vision is figured as the pre-emption of error.

Diagram 7. By drawing errors on a timeline, van Rest calls attention to the pre-emptive nature of error in the development process of computer vision technologies.

According to critical AI scholar Matteo Pasquinelli, “machine learning is technically based on formulas for error correction” (2019: 2). Therefore, any critical engagement with such algorithmic processes needs to go beyond citing errors, “for it is precisely through these variations that the algorithm learns what to do” (Amoore, 2019: 164), pushing us to reconsider any argument based on the inaccuracy of the systems.

The example of stoeien suggests that it is not so much a question of whether, or how much, these algorithms err, but of how these errors are anticipated and negotiated. Thus, taking error as a hallmark of machine learning, we can see how practices of (in)security become shaped by the notion of mathematical error well beyond their development stages. Error figures centrally in the development, acquisition and deployment of such devices. As one respondent indicated, predictive devices are inherently erroneous, but the quantification of their error makes them amenable to “risk management.”

While much has been written about security technologies as devices for risk management, little is known about how security technologies are themselves conceptualized as objects of risk management. What, then, happens in this double relation of risk? The figure of the error enters the diagrams as a mathematical concept, but throughout the conversations we see it permeate the discourse around algorithmic security vision. By figuring algorithmic security vision through the notion of error, risk is placed at the heart of the security apparatus.

Con-figurations of algorithmic security vision: fragmenting accountability and expertise

In the previous section we explored the changing figurations of key dimensions of algorithmic security vision; in this section we examine how these figurations configure. For Suchman, working with configurations highlights “the histories and encounters through which things are figured into meaningful existence, fixing them through reiteration but also always engaged in ‘the perpetuity of coming to be’ that characterizes the biographies of objects as well as subjects” (Suchman, 2012: 50, emphasis ours). In other words, we are interested in the practices and tensions that emerge as figurations become embedded in material practices. We focus on two con-figurations that emerged in the interviews: the delegation of accountability to externally managed benchmarks, and the displacement of responsibility through the reconfiguration of the human-in-the-loop.

Delegating accountability to benchmarks

The first configuration is related to the evaluation of the error rate in the training of algorithmic vision systems: it involves datasets, benchmark institutions, and the idea of fairness as equal representation among different social groups. Literature on the ethical and political effects of algorithmic vision has notably focused on the distribution of errors, raising questions of ethnic and racial bias (e.g. Buolamwini and Gebru, 2018). Our interviews reflect the concerns of much of this literature, as the pre-emption of error figured repeatedly in relation to the uneven distribution of error across minorities or other groups. In Diagram 8, Ádám Remport draws how different visual traits have often led to different error rates. While the overall error metric of an algorithmic system might seem “acceptable,” the system may still privilege particular groups, a disparity that remains invisible when only the aggregate is considered. Jeroen van Rest distinguishes such errors, as systemic biases, from the inherent algorithmic imprecision of deep machine learning models (Diagram 7), as they perpetuate inequalities in the society in which the product is being developed.

Diagram 8. Ádám Remport describes how facial recognition technologies are often most accurate for white male adult faces, reflecting the datasets they are trained on. The FPR is higher for people with darker skin, for children, and for women, which may result in false flagging and false arrests.
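The mechanism Remport describes can be illustrated with a minimal sketch using invented toy numbers (not data from the interviews or from any benchmark): an aggregate false positive rate may look acceptable while per-group rates diverge sharply.

```python
# Toy illustration (fabricated numbers): an aggregate false positive rate
# can look acceptable while per-group rates diverge sharply.
from collections import defaultdict

def false_positive_rate(records):
    """FPR = FP / (FP + TN) over (ground_truth, prediction) pairs."""
    fp = sum(1 for truth, pred in records if not truth and pred)
    tn = sum(1 for truth, pred in records if not truth and not pred)
    return fp / (fp + tn) if (fp + tn) else float("nan")

# Each record: (group, ground-truth match?, flagged by the system?)
records = (
    [("group_a", False, False)] * 960 + [("group_a", False, True)] * 10 +
    [("group_b", False, False)] * 180 + [("group_b", False, True)] * 20
)

print(f"overall FPR: {false_positive_rate([r[1:] for r in records]):.3f}")  # ~0.026

per_group = defaultdict(list)
for group, truth, pred in records:
    per_group[group].append((truth, pred))
for group, recs in sorted(per_group.items()):
    print(f"{group} FPR: {false_positive_rate(recs):.3f}")  # 0.010 vs 0.100
```

The point is not the particular numbers but the bookkeeping: whether such disparities become visible at all depends on whether the evaluation is disaggregated by group.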

To mitigate these concerns and manage their risk, many of our interviewees who develop and implement these technologies externalize the reference against which the error is measured. They turn to a benchmark run by the American National Institute of Standards and Technology (NIST), which ranks facial recognition technologies from different companies by their error metrics across groups. John Riemen, who is responsible for the use of forensic facial recognition technology at the Center for Biometrics of the Dutch police, describes how their choice of software is driven by a public tender that demands a “top-10” score on the NIST benchmark. The mitigation of bias is thus outsourced to an external, and in this case foreign, institution.

We see in this outsourcing of error metrics a form of delegation that brings about a specific regime of (in)visibility. While a particular kind of algorithmic bias is rendered central to the NIST benchmark, the mobilization of this reference obfuscates questions of how that metric was achieved. That is to say, questions about training data are invisibilized, even though that data is a known site of contestation. For example, the NIST benchmark datasets are known to include faces of wounded people (Keyes, 2019). The Clearview company is known to use images scraped illegally from social media, and IBM uses a dataset that is likely in violation of European GDPR legislation (Bommasani et al., 2022: 154). Pasquinelli (2019) argued that machine learning models ultimately act as data compressors: they enfold and operationalize imagery whose terms of acquisition are invisibilized.

Attention to this invisibilization reveals a discrepancy between the developers and the implementers of these technologies. On the one hand, the developers we interviewed expressed concerns about how their training data is constituted so as to achieve the best possible trade-off between the true positive and false positive rates (TPR/FPR), as well as about the legality of the data they use to train their algorithms. On the other hand, questions about the constitution of the dataset have been virtually non-existent in our conversations with those who implement software that relies on models trained with such data. Occasionally this knowledge was considered part of the developers’ intellectual property, to be kept as a trade secret. A high score on the benchmark is enough to settle questions of fairness, legitimizing the use of the algorithmic model. Thus, while an algorithm indirectly relies on its source data, that data is no longer deemed relevant in its consideration. This illustrates how the invisibilization of the dataset, “compressed” (in Pasquinelli’s terms) into a model, together with the formalization of guiding metrics into a benchmark, permits a bracketing of accountability. One does not need to know how outcomes are produced, as long as the benchmarks are in order.

The configuration of algorithmic vision’s bias across a complex network of fragmented locations and actors, from the dataset, to the algorithm, to the benchmark institution, reveals the selective processes of (in)visibilization. This opens up fruitful avenues for new empirical research: What are the politics of the benchmark as a mechanism of legitimization? How does the outsourcing of the assessment of error distribution affect attention to bias? How has the critique of bias been institutionalized by the security industry, resulting in the externalization of accountability through dis-location and fragmentation?

Reconfiguring the human-in-the-loop

A second central question, linked to the delegation of accountability, is the configuration in which the security operator is located. The effects of delegation and fragmentation, in which the mitigation of algorithmic errors is outsourced to an external party, become visible in the ways in which the role of the security operator is configured in relation to the institution they work for, the software’s assessment, and the affected publics.

The public critique of algorithms has often construed the human-in-the-loop as one of the last lines of defense in the resistance to automated systems, able to filter and correct erroneous outcomes (Markoff, 2020). The literature in critical security studies has, however, problematized the representation of the security operator in algorithmic assemblages, by discussing how algorithmic predictions appear on their screen (Aradau and Blanke, 2018) and how the embodied decision-making of the operator is entangled with the algorithmic assemblage (Wilcox, 2017). Moreover, the operator is often left guessing at the workings of the device that provides them with the information on which they base their decisions (Møhl, 2021).

What our participants’ diagrams emphasized is how a whole spectrum of system designs emerges in response to similar questions, for example the issue of algorithmic bias. A primary difference can be found in the degree of understanding of the systems that is expected of security operators, as well as in their perceived autonomy. Sometimes the human operator is central to the system’s operation, forming the interface between the algorithmic systems and surveillance practices. Gerwin van der Lugt, developer of software at Oddity.ai that detects criminal behavior, argues that “the responsibility for how to deal with the violent incidents is always [on a] human, not the algorithm. The algorithm just detects violence—that’s it—but the human needs to deal with it.”

Dirk Herzbach, chief of police at the Police Headquarters Mannheim, adds that when alerted to an incident by the system, the operator decides whether to deploy a police car. Both Herzbach and van der Lugt figure the human-in-the-loop as having full agency and responsibility in operating the (in)security assemblage (cf. Hoijtink and Leese, 2019).

Some interviewees drew a diagram in which the operator is supposed to be aware of the ways in which the technology errs, so that they can address them. Several other interviewees, however, considered the technical expertise of the human-in-the-loop to be unimportant, even a hindrance.

Chief of police Herzbach prefers an operator to have patrol experience in order to assess which situations require intervention; he is concerned that knowledge about algorithmic biases might interfere with such decisions. In the case of the Moscow metro, where a facial recognition system has been deployed for purchasing tickets and opening access gates, the human-in-the-loop is reconfigured as an end user who needs to be shielded from the algorithm’s operation (cf. Lorusso, 2021). On these occasions, expertise on the technological creation of the suspect becomes fragmented.

These different figurations of the security operator are held together by the idea that the human operator is the expert on the subject of security, and is expected to make decisions independently of the information that the algorithmic system provides.

Diagram 9. Riemen explains the process of information filtering that is involved in querying the facial recognition database of the Dutch police.

Other drivers exist, however, to shield the operator from the algorithm’s functioning, challenging individual expertise and acknowledging the fallibility of human decision-making. In Diagram 9, John Riemen outlines the use of facial recognition by the Dutch police. He describes how data about the police case and about the algorithmic assessment is filtered out as much as possible from the information provided to the operator. This, Riemen suggests, might reduce bias in the final decision. He adds that there should be no fewer than three humans-in-the-loop who operate independently, to increase the accuracy of algorithmic security vision.
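The rationale for combining filtering with multiplication can be sketched in probabilistic terms. This is our own back-of-the-envelope illustration rather than a calculation offered by Riemen: if each examiner, working in isolation, wrongly confirms a candidate match with probability $p$, and their judgments are genuinely independent, then the probability that all three confirm the same wrong match is

\[
p^{3} \ll p, \qquad \text{e.g. } p = 0.1 \;\Rightarrow\; p^{3} = 0.001.
\]

The benefit hinges on independence: if the examiners share the case file or see the algorithm’s confidence score, their errors become correlated and the reduction is far smaller, which is precisely why the filtering of information matters.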

Another configuration of the human-in-the-loop responds to the fallibility of the operator not by increasing their number, but by minimizing their involvement. For the Burglary-Free Neighborhood project in Rotterdam, project manager Guido Delver draws surveillance as operated by neighborhood residents, through a system that they own themselves. By involving different stakeholders, Delver hopes to counter government hegemony over the surveillance apparatus. However, residents are untrained in assessing algorithmic predictions, which raises new challenges. Delver illustrates a scenario in which the algorithmic signaling of a potential burglary may have dangerous consequences: “Does it invoke the wrong behavior from the citizen? [They could] go out with a bat and look for the guy who has done nothing [because] it was a false positive.” In this case, the worry is that the erroneous predictions will not be questioned. Therefore, in Delver’s project the goal was to actualize an autonomous system, “with as little interference as possible.” Human participation or “interference” in the operation is potentially harmful. Thus, figuring the operator — whether police officer or neighborhood resident — as risky can lead to the relegation of direct human intervention.

By looking at the figurations of the operator that appear in the diagrams, we see multiple and heterogeneous configurations of regulations, security companies, and professionals. In each configuration, the human-in-the-loop appears in a different form. The operator often holds the final responsibility for the ethical functioning of the system. At times they are configured as experts in sophisticated but error-prone systems; at others they are figured as end users who are activated by the alerts generated by the system and who need not understand how the software works and errs, or who can be left out altogether.

These configurations remind us that there cannot be any theorization of “algorithmic security vision,” of either its empirical workings or its ethical and political consequences, without close attention to the empirical contexts in which these configurations are arranged. Each organization of datasets, algorithms, benchmarks, hardware and operators poses specific problems, and each contains a specific politics of visibilization, invisibilization, responsibility and accountability.