The Rainforest Investigations Network (RIN) asked its 2021 Fellows about the innovative methodologies behind their impactful stories.
From freedom of information requests to using artificial intelligence to analyze satellite imagery, the reporters got their hands on previously unseen data that sheds light on the corruption and systems behind the destruction of the world’s biggest rainforests. In this Pulitzer Center series, they explain how they did it.
New York Times reporter Manuela Andreoni spoke to the Pulitzer Center about tracking cattle in the Amazon rainforest and discussed how to trace the leather upholstery in U.S. cars to Brazilian slaughterhouses.
The article, which was featured on the newspaper’s front page, details how landowners in the Brazilian Amazon are able to sell cattle to large companies even when they have a history of environmental violations.
For her investigation, Andreoni traveled to the Brazilian state of Rondônia, where she witnessed a transaction between a farmer and a cattle middleman for a large meatpacking plant. She also followed trucks belonging to an international company that frequented the same location.
In the interview, the Brazilian reporter lists the databases she had to cross-reference to arrive at the key findings of her RIN project.
Pulitzer Center: Part of your investigation is driven by data. What problems were you trying to solve by using data?
Manuela Andreoni: My investigation was built on several datasets from different steps of the supply chain. The data helped us show that cattle illegally kept in a protected reserve in the Amazon rainforest ended up in slaughterhouses that were part of the leather supply chain of major automakers in the U.S. The data was necessary to prove, and establish the scope of, each step of the supply chain.
The datasets Andreoni and the team at The New York Times used:
- Cattle movement data issued by the state of Rondônia in Brazil (GTA)
- Locations of farms that supplied major meat packers issued by the companies
- Perimeter of farms issued by the Brazilian government (Sicar)
- Perimeter of protected areas issued by the Brazilian government
- Deforestation data issued by the Brazilian government (Inpe)
- Perimeter and location of fines for environmental destruction (Ibama)
- Location of farms and names of farmers inside the reserve (Idaron)
- Shipping data for leather from Brazil to the U.S. (Panjiva)
- Trucking data for leather between the U.S. and Mexico (Material Research)
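As a rough sketch of how datasets like these can be cross-referenced, the joins can be expressed as set operations on shared identifiers. Everything below is hypothetical, not the Times' actual schema or data: farm and slaughterhouse IDs, field names, and head counts are all invented for illustration.

```python
# Hypothetical sketch: tracing cattle from a protected reserve through
# middlemen to slaughterhouses by joining records on shared identifiers.
# All IDs, field names, and figures are illustrative, not the real data.

# Farms known (e.g., from Idaron-style records) to sit inside the reserve
reserve_farms = {"FARM-001", "FARM-007"}

# Cattle movement records (e.g., parsed from GTA transport permits)
movements = [
    {"origin": "FARM-001", "dest": "FARM-020", "head": 50},    # reserve -> middleman
    {"origin": "FARM-020", "dest": "SLAUGHTER-A", "head": 50}, # middleman -> slaughterhouse
    {"origin": "FARM-003", "dest": "SLAUGHTER-A", "head": 30}, # unrelated supplier
]

# Destinations known to be slaughterhouses in the leather supply chain
slaughterhouses = {"SLAUGHTER-A"}

# Step 1: middlemen that received cattle from reserve farms
tainted_middlemen = {
    m["dest"] for m in movements if m["origin"] in reserve_farms
}

# Step 2: slaughterhouses that later received cattle from those middlemen
tainted_slaughterhouses = {
    m["dest"]
    for m in movements
    if m["origin"] in tainted_middlemen and m["dest"] in slaughterhouses
}

print(tainted_slaughterhouses)  # {'SLAUGHTER-A'}
```

In practice each "join key" here stands in for painstaking manual matching across databases issued by different agencies, which rarely share clean identifiers.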
PC: How did you obtain the data?
MA: We obtained one set of the cattle movement data from the Environmental Investigation Agency, and another set from a different source. We collected farm locations and found the names of farmers from court filings through Digesto, a paid service. The rest was publicly available or available through private providers.
PC: What tools did you use to process the data?
MA: We processed all data that arrived in PDF format with Tabula. We then transferred the numbers to Google Sheets for analysis. Our goal was to understand how many cattle were moving from the reserve to middlemen and then onwards to slaughterhouses.
We also checked the data for signs that cattle were being moved directly from the reserve to a slaughterhouse, because that indicates that any middlemen or companies that register the cattle are simply there to launder them through paperwork. By making it look as if the cattle came from them, they obscure the fact that the animals were raised in a reserve, which should not have any cattle on it at all.
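The two checks described above, totalling cattle moved out of the reserve and flagging direct reserve-to-slaughterhouse transfers, can be sketched as simple filters over the movement records. Again, all identifiers and figures below are invented for illustration; this is not the actual analysis or data.

```python
# Hypothetical sketch of the checks described above: total head of cattle
# leaving reserve farms, plus direct reserve-to-slaughterhouse moves that
# would suggest paperwork laundering. All data here is illustrative.

reserve_farms = {"FARM-001"}
slaughterhouses = {"SLAUGHTER-A"}

movements = [
    {"origin": "FARM-001", "dest": "FARM-020", "head": 50},     # reserve -> middleman
    {"origin": "FARM-001", "dest": "SLAUGHTER-A", "head": 80},  # suspicious direct move
]

# Total head of cattle leaving reserve farms
total_from_reserve = sum(
    m["head"] for m in movements if m["origin"] in reserve_farms
)

# Direct moves from the reserve straight to a slaughterhouse
direct_moves = [
    m for m in movements
    if m["origin"] in reserve_farms and m["dest"] in slaughterhouses
]

print(total_from_reserve)  # 130
print(len(direct_moves))   # 1
```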
We also checked for irregularities among farms that supplied JBS directly. We developed a method using directives published by the federal prosecutors’ office and by talking to researchers. This challenging and complex work was done by Albert Sun, a data journalist with programming skills.
PC: What is the hardest part in your data work? How did you overcome it?
MA: I can’t speak to the technical challenges of the analysis of JBS direct suppliers, because that was done by Albert Sun. But the vetting process, which I took part in, involved having independent researchers check our work and methods. It also involved relaying our findings to the company and then checking them against its response.
This was the hardest part of our work. When working with a complex data analysis process we are new to, we found it very reassuring to have a multiple-step vetting process. I highly recommend reaching out to people who are accustomed to dealing with the datasets you’re working with, because when we are new to a dataset we can easily make mistakes we are unable to identify, no matter how many times we check it ourselves. We found many researchers are interested in helping journalists produce accurate and responsible investigations.
The technical challenge that I can speak to was transforming the PDFs using Tabula. The PDFs were incredibly bad, so it took me many days to clean and verify the extracted data, checking the file Tabula produced against the original. I tried other tools, but Tabula turned out to be the best one.
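Verifying table extraction like this usually means sanity-checking every row before analysis. A minimal sketch of such checks, with an invented column layout (permit serial, origin, destination, head count) rather than the real GTA format:

```python
# Hypothetical sketch: sanity checks on rows extracted from a PDF with a
# tool like Tabula. The column layout is invented for illustration. Rows
# that fail a check are flagged for manual review against the original PDF.

extracted_rows = [
    ["GTA-0001", "FARM-001", "FARM-020", "50"],
    ["GTA-0002", "FARM-003", "SLAUGHTER-A"],        # truncated row
    ["GTA-0003", "FARM-001", "SLAUGHTER-A", "8O"],  # OCR error: letter O for zero
]

def check_row(row):
    """Return None if the row looks clean, else a reason string."""
    if len(row) != 4:
        return "wrong column count"
    if not row[3].isdigit():
        return "head count is not a number"
    return None

flagged = [(row, reason) for row in extracted_rows
           if (reason := check_row(row)) is not None]

for row, reason in flagged:
    print(row[0], "->", reason)
```

Automated checks like these narrow down which rows need eyeballing against the source PDF; they don't replace that manual comparison.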
The other challenge was checking that the cattle movement data was accurate, because we didn’t get it directly from the government agency that produces it. We verified large samples of the data by filing freedom of information requests with the agency, asking whether the serial numbers corresponding to each data point existed. While the agency wouldn’t give us the cattle movement data itself, it was very helpful in confirming that the specific files we were using existed.
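Drawing a verifiable sample of serial numbers for such a request can be done with a simple seeded random draw, so the sample can be reproduced later. The serial format and sample size below are invented for illustration:

```python
# Hypothetical sketch: drawing a reproducible random sample of permit
# serial numbers to submit in a freedom-of-information request, so the
# agency can confirm those records exist. Serial format is illustrative.
import random

serials = [f"GTA-{i:04d}" for i in range(1, 501)]  # 500 records

rng = random.Random(42)           # fixed seed makes the sample reproducible
sample = rng.sample(serials, 25)  # 5% sample for verification

print(len(sample))                       # 25
print(len(set(sample)) == len(sample))   # True: sample() draws without replacement
```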
We also verified the data on the ground, confirming that the commercial relationships it described actually existed. And we supplemented our data findings with new reporting; for example, we followed a particular truck carrying cattle to a slaughterhouse.