Unpublished documents and inside sources give rare look into secretive AI system used to decide who is shirking work
This article was published in partnership with El Confidencial. To read the original report in Spanish, visit El Confidencial.
Spain has positioned itself as a leader in responsible AI within Europe. Yet promises of transparency have often fallen flat and public watchdogs have struggled to gain meaningful access to algorithms deployed in justice and welfare systems. This makes Spain part of a growing trend where governments quietly experiment with predictive algorithms that make life-changing decisions out of public view. In the rare cases where systems have been subject to independent scrutiny, watchdogs have often found fundamental flaws and evidence of discrimination.
Spain’s National Institute of Social Security (INSS) has vowed to crack down on fraud and reduce public spending on its sick leave benefits. In 2018, the INSS deployed two machine learning algorithms to assess the health of millions of people receiving sick leave in Spain. These algorithms attempt to guess which benefit recipients might be shirking work and defrauding the state.
For five years, the use of these algorithms remained largely secret. But over the course of 10 months, we used public records laws to obtain unpublished documents describing the system and the data it relies on. Working with partner El Confidencial, we carried out more than a dozen interviews with ministry officials and the medical inspectors who receive the algorithm-generated scores. The documents and interviews reveal an opaque and under-performing system that nonetheless makes high-stakes decisions about millions of people, potentially pushing patients who aren’t ready back to work.
In April 2022, we discovered a white paper from the INSS describing how it had turned to “big data to fight fraud.” It boasted of advanced analytics, complex statistical reports and predictive models. We filed a freedom of information request asking for technical documentation about the agency’s use of algorithms.
While the INSS refused to answer much of the request and subsequent questions, it did disclose a limited selection of technical documents. These include the variables it uses for its calculations, among them gender, age, place of residence and medical diagnoses. They also include performance evaluations which, experts told us, show high numbers of false positives generated by the system and the use of sensitive medical data protected under European data regulations.
Documents revealed that the algorithms were built using government fraud detection software developed by US software giant SAS and implemented by ViewNext, a Spanish subsidiary of IBM. A tender between the agency and SAS suggests that the agency likely paid at least one million euros for the system. Yet senior INSS officials we spoke to conceded that the algorithms are “not accurate.” Meanwhile, the INSS has refused to answer questions about whether the system may disproportionately flag certain demographic groups.
On the ground, we spoke to the medical inspectors tasked with chasing down the cases flagged by the INSS’s algorithms amidst chronic underfunding and staff shortages. “Those of us who work with it every day are not able to explain what it is,” one INSS medical inspector said. Another was blunt about the usefulness of the system: “The system utility? It would be more helpful if we had more staff.”
Working for three months with the team at El Confidencial, we shaped our reporting into a four-part series.
The first piece is a technical deep-dive into the INSS’s algorithms. Experts expressed alarm at the technical documents we shared with them. Ana Valdivia, a professor in AI Governance and Policy at the Oxford Internet Institute, described the slew of false positives generated by the algorithms as “poor” and “unbalanced.”
The deployment of the INSS’s algorithms, referred to internally as the “SAS criteria,” came with big promises of catching wide-scale fraud and reducing public spending. The second piece in the series tells the story of how, more than five years later, few of the promised gains have come to fruition. Interviews with medical inspectors paint a picture of a technology rendered effectively useless as the social security system continues to buckle from spending cuts and staff shortages.
The third piece raises important unanswered questions about the INSS system, including whether its use of sensitive data is legal and whether the system is potentially discriminatory.
The final piece in the series revisits some of our reporting on algorithms in the Netherlands and digital profiling of low-income neighbourhoods. It positions the Netherlands as a canary in the coal mine for the consequences of unchecked algorithm use and covers attempts to rein in high-risk AI.
Co-publications from this investigation