As the biologist prepared to enter a cave in Uganda, a village leader stopped her. Before they began searching for bats, the leader said, they would need to talk to the dead.
So, on a day in early 2013, researcher Amy Gilbert sat on the dirt floor of a thatched hut, surrounded by a group of villagers, and together they asked the ancestors for permission to enter the cave.
Gilbert, who was working for the U.S. Centers for Disease Control and Prevention, found herself wondering about eye contact — should she look up at the villagers, or down at the dirt? More important: What would happen if the spirits said no?
Fortunately, they didn’t.
In 2009, Alison Peel’s search for bats, a host for dozens of illnesses including Severe Acute Respiratory Syndrome (SARS), took her to Jinja, a Ugandan town at the source of the Nile. There, the Cambridge University doctoral student chased a large Marabou stork, hoping it would drop the bat dangling lifelessly from its beak.
Instead, she watched the bat disappear down the bird’s throat, its wings protruding comically.
There went her data.
While sitting in séances and pursuing Marabou storks may seem like extreme measures, the data researchers gather in the field — everything from the size and frequency of bat litters, to the levels of virus in their blood serum — is being used to build mathematical tools that scientists hope will achieve a landmark in human health.
They want to predict an infectious outbreak before it happens.
“I think we can do it, though I don’t think we can do it now,” said John Drake, a professor of ecology at the University of Georgia. “With every new outbreak, we get better and better because we learn lessons.
“But I think this is a very achievable goal.”
In a 2015 article, “The Algorithm That’s Hunting Ebola,” disease ecologist Barbara Han put it this way:
“One day, I hope that biologists will forecast disease outbreaks in the same way meteorologists forecast the weather. With one major difference: A meteorologist can’t stop a storm front, but we may be able to prevent outbreaks.”
Biologists are now monitoring possible signs of an outbreak: weather patterns, for example, that could boost mosquito numbers in certain regions, or land-use changes that bring us into closer contact with animals carrying diseases capable of jumping to humans.
“How soon will we actually forecast a disease outbreak? I’d say it’s a near-term possibility,” said Han, who works at the not-for-profit Cary Institute of Ecosystem Studies in Millbrook, N.Y. “It sounds incredible because it is incredible. But it is within our grasp.”
Already, mathematical models have successfully predicted additional rodent species capable of passing diseases to humans. They have also suggested that a much larger swath of the U.S. than previously thought — as far north as Wisconsin — may be vulnerable to the Zika virus, a disease linked to severe birth defects.
Driving this effort to translate early signals in nature into accurate disease warnings are the powerful, problem-solving operations called algorithms, building blocks of the computer age.
Where would modern life be without the algorithm?
When the Google search bar attempts to finish a phrase before you’ve typed all the words, an algorithm is at work predicting what you’re looking for and offering suggestions.
When you get street directions on an iPhone, algorithms help decide the best way to get from Point A to Point B.
When you call up your Facebook page and ads appear, algorithms have targeted them to you based on your interests and search practices. Facial recognition technology, Google Translate and so many other tools make sense of our world using algorithms.
“I would say that algorithms and mathematical modeling are fairly pervasive and ubiquitous, from the time someone wakes up in the morning until the end of the day,” said Anthony Gitter, an assistant professor in the department of biostatistics and medical informatics at the University of Wisconsin-Madison.
So what is an algorithm and how does it work?
An algorithm is a step-by-step set of instructions, often simple ones, used to accomplish a specific goal. Let’s say you’ve just flown into the airport and need to get home. You could take a taxi. That involves waiting at a taxi stand, telling the driver where you want to go and hoping the driver knows the best route.
But there are more ways to get home than just a taxi, so making a decision involves weighing the pros and cons of each possibility. Algorithms perform much the same function: they consider different factors, such as the time of day, the number of taxis available and the traffic, and then pick the best option. They’ve proven especially useful in helping to make sense of vast quantities of data.
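The airport example can be written as a tiny algorithm. This is a minimal Python sketch under made-up assumptions: the three options, their waiting and travel times, and the traffic multipliers are invented for illustration, and a real navigation app weighs many more factors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Option:
    name: str
    wait_minutes: float    # time spent waiting (taxi stand, train platform)
    travel_minutes: float  # time on the move with no traffic
    traffic_factor: float  # 1.0 = clear roads; 1.6 = 60% slower


def expected_minutes(option: Option) -> float:
    # Step 1: waiting time is paid no matter what the traffic is like.
    # Step 2: time on the road stretches with traffic.
    return option.wait_minutes + option.travel_minutes * option.traffic_factor


def best_option(options: list[Option]) -> Option:
    # Step 3: compare every option and keep the one with the smallest total.
    return min(options, key=expected_minutes)


# Hypothetical numbers for a rush-hour arrival.
options = [
    Option("taxi", wait_minutes=15, travel_minutes=25, traffic_factor=1.6),
    Option("train", wait_minutes=10, travel_minutes=35, traffic_factor=1.0),
    Option("friend's car", wait_minutes=30, travel_minutes=25, traffic_factor=1.6),
]
print(best_option(options).name)  # train (45 minutes vs. 55 for the taxi)
```

Set the taxi's traffic factor to 1.0, as on an empty late-night highway, and the taxi wins at 40 minutes. The steps stay the same; the answer changes with the conditions fed into them.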
The use of mathematical techniques to model the spread of diseases goes back more than two centuries.
In 1760, Swiss mathematician Daniel Bernoulli created a dynamic model of smallpox transmission, showing how the disease spread and examining the effect that inoculation had on the survival of patients.
In 1854, English doctor John Snow used mathematics to demonstrate that a cholera outbreak in London could be traced to contaminated water coming from a single pump on Broad Street.
In 1904, another English doctor, Ronald Ross, published a mathematical model of how mosquitoes move. The model helped him devise a plan to reduce the mosquito population and eliminate malaria in a defined area.
But the 2001 foot and mouth outbreak in Britain marked one of the first times mathematical models were used to drive decision-making during an outbreak. Data collected in the field was released in real time and shared with scientists, officials and academics. The results convinced some that modeling ought to be used even earlier.
“Models have an important role to play in defining policy before any cases occur, as once an epidemic starts there is insufficient time to experiment with control measures given the rapid spread of (foot and mouth disease) between farms,” wrote Matt J. Keeling, a mathematics professor at the University of Warwick, in a 2004 journal article.
Still, the foot and mouth outbreak — which resulted in the forced slaughter of millions of uninfected animals — did not sell everyone on the benefits of modeling. A 2006 article in a French scientific journal argued that the models used for foot and mouth disease were flawed, as was the heavy reliance on them.
The next big test for the disease modelers was the West African Ebola outbreak that began in 2014.
“Ebola for me was kind of the turning point in the use of models,” said Caitlin M. Rivers, senior associate at the Johns Hopkins Center for Health Security. “It really changed the conversation about where the outbreak was and where it was going.”
No longer was computer modeling off to the side, a largely academic discipline. This time, responders used models throughout the outbreak to predict staffing and equipment needs of the treatment units in the field, to forecast the potential number of sexually transmitted cases and to anticipate the effects of different measures, including safe burial practices.
“Models provided critical decision-making tools in real time, and helped demonstrate to public health authorities that the epidemic could be stopped by using existing tools and strategies,” wrote the authors of a 2016 paper in a CDC journal.
A mathematical model that makes us smarter about disease begins with the data that field researchers such as Amy Gilbert and Alison Peel work so hard to collect. The computer models, experts stress, are only as good as the data that goes into them.
Using a host of information about rodents, from age of sexual maturity to resting metabolic rate and adult body size, Han constructed an algorithm to answer a crucial question: How many rodent species can harbor diseases capable of being passed to humans, the sort scientists call “zoonotic”?
Rodents are the most diverse order of mammals, with some 2,277 species. They are known to be reservoirs for such threats to humans as hantavirus, typhus and the bacterium responsible for bubonic plague. But only about 220 species, less than 10%, are known to transmit at least one disease also found in humans.
Han suspected there might be more.
She began her study by feeding field data on 80% of the rodent species into her algorithm. The remaining 20% of species she saved to test later.
Han’s algorithm took all of this information and created what’s called a classification tree — essentially a series of branches that split all of the species into two groups. For example, into species that have one litter of young per year, and those that have more than one.
Other splits separated rodents that are active at night from those that are active during the day, and those with a lifespan of under a year from those that live longer. In all, the algorithm evaluated more than 50 distinguishing characteristics.
Inevitably, there were classification errors: the two branches failed to cleanly separate the rodents that harbor disease from those that do not. Some, but not all, rodents that harbor disease might be active at night, for example.
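One split, and the error it leaves behind, can be illustrated with a short Python sketch. The eight species and their yes/no traits are invented for illustration (they are not data from Han's study), and the error measure used here, the share of species that land on the "wrong" side of a split, is a simplification of what tree-building software actually optimizes.

```python
# Invented toy data: yes/no traits for eight hypothetical rodent species,
# plus whether each is a known reservoir of a zoonotic disease.
TRAITS = ["several_litters_a_year", "nocturnal", "lives_under_a_year"]
SPECIES = [
    # (several_litters_a_year, nocturnal, lives_under_a_year, reservoir)
    (True, True, True, True),
    (True, False, True, True),
    (True, True, False, True),
    (False, True, False, False),
    (False, False, False, False),
    (True, False, False, False),
    (False, True, True, True),
    (False, False, False, False),
]


def split_error(trait_index: int) -> float:
    """Fraction of species misclassified if each branch predicts its majority label."""
    errors = 0
    for branch_value in (True, False):
        labels = [row[-1] for row in SPECIES if row[trait_index] == branch_value]
        reservoirs = sum(labels)
        # The minority in each branch ends up on the wrong side of the split.
        errors += min(reservoirs, len(labels) - reservoirs)
    return errors / len(SPECIES)


for i, name in enumerate(TRAITS):
    print(f"{name}: {split_error(i):.3f}")

best = min(range(len(TRAITS)), key=split_error)
print("best single split:", TRAITS[best])
```

Here the lifespan split misplaces one species out of eight (an error of 0.125), while the other two traits each misplace two. Even the best split leaves a reservoir species stranded on the wrong branch, which is the leftover error boosting goes after.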
Using a process called “boosting,” the algorithm learns from the mistakes of the first tree and creates a second tree that concentrates on the species the first one misclassified. That tree will inevitably make its own classification errors, so the algorithm learns from those and creates a third tree, and so on.
The technique can generate hundreds or thousands of trees; the cumulative knowledge from them can, in Han’s words, “produce a powerfully accurate predictive model.”
Han went back and used that model to evaluate the 20% of rodents she had set aside in the beginning. She found that the model could predict with almost 90% accuracy whether or not a species could be the reservoir for a zoonotic disease.
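The boosting loop and the 80/20 test can be sketched in Python. This is an illustration under stated assumptions, not a reconstruction of Han's model: it uses AdaBoost, one common form of boosting, with one-split trees ("stumps"), and the species are synthetic. They are generated from a hypothetical rule in which a species is a reservoir when at least two of three "fast life" traits are present, with 10% of labels flipped to mimic messy field data. Han's study may have used a different boosting variant and far richer traits.

```python
import math
import random


def make_toy_species(n, seed=0):
    """Synthetic species: five yes/no traits and a +1 (reservoir) / -1 label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        traits = [rng.random() < 0.5 for _ in range(5)]
        fast_life = sum(traits[:3]) >= 2   # hypothetical rule; traits 3 and 4 are noise
        if rng.random() < 0.1:             # 10% label noise
            fast_life = not fast_life
        data.append((traits, 1 if fast_life else -1))
    return data


def stump_predict(trait, polarity, x):
    """A one-split tree: predict `polarity` if the trait is present, else the opposite."""
    return polarity if x[trait] else -polarity


def fit_stump(data, weights):
    """Return (weighted_error, trait, polarity) for the best single split."""
    best = None
    for trait in range(len(data[0][0])):
        for polarity in (1, -1):
            err = sum(w for (x, y), w in zip(data, weights)
                      if stump_predict(trait, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, trait, polarity)
    return best


def adaboost(data, rounds):
    weights = [1.0 / len(data)] * len(data)
    model = []
    for _ in range(rounds):
        err, trait, polarity = fit_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # how much this tree's vote counts
        model.append((alpha, trait, polarity))
        # Up-weight the species this tree got wrong, so the next tree focuses on them.
        weights = [w * math.exp(-alpha * y * stump_predict(trait, polarity, x))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in model)
    return 1 if score >= 0 else -1


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


species = make_toy_species(500)
cut = int(0.8 * len(species))              # 80% to build the model, 20% held back
train, held_out = species[:cut], species[cut:]
one_tree = adaboost(train, rounds=1)
boosted = adaboost(train, rounds=20)
print(f"one tree: {accuracy(one_tree, held_out):.2f}")
print(f"20 boosted trees: {accuracy(boosted, held_out):.2f}")
```

On this toy data a single tree can check only one trait, while the weighted vote of many trees can capture the "two of three traits" pattern. That illustrates how the cumulative knowledge of many trees can beat any one of them on species the model has never seen.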
All told, her model revealed that 16 rodent species never previously linked to zoonotic diseases should now be considered likely reservoirs. Moreover, the algorithm uncovered fascinating, sometimes counterintuitive qualities shared by rodents that harbor diseases capable of spreading to people.
The species the algorithm flagged as potential carriers were not simply close relatives of one another, the way red squirrels and grey squirrels are, Han wrote in IEEE Spectrum, a magazine on engineering and applied science.
“Instead, it found that reservoir species were distinguished by their ‘fast’ life cycles — with rapid growth rates, early sexual maturity and frequent litters … These animals may tolerate pathogens because they have a ‘live fast, die young strategy.’ Their immune systems aren’t their top priority because they need to stay healthy just long enough to reproduce.”
The algorithm’s list of suspect rodents gave biologists and disease hunters valuable guidance on where to look for future illnesses that could leap from animals to humans.
As Han’s paper was going to press, researchers confirmed that two of the newly suspect rodents were reservoirs for zoonotic diseases — just as the algorithm had predicted.
One, a red-backed vole found in Canada and the northern U.S., harbors the parasite that causes echinococcosis, a disease that afflicts more than 1 million people at any given time with symptoms that include vomiting and cysts on the liver and lungs.
The list of new rodents likely to be reservoirs of zoonotic diseases allowed Han and her colleagues to go a step further. They went on to list geographical hot spots that could contain multiple rodent species newly identified as likely carriers of disease.
The hot spots appeared in the Middle East, Central Asia and the American Midwest.
A similar study published in 2016 used modeling to predict that 35 species of mosquitoes, including 26 not previously linked to Zika, are nonetheless capable of transmitting the virus.
Some of those mosquitoes are found throughout the continental U.S., leading the authors to conclude: “If control efforts are to include all areas at potential risk of disease transmission, public health efforts would need to expand to address regions such as the northern Midwest.”
That would include Wisconsin.
Developing mathematical tools that actually predict an outbreak before it happens is no easy task, and there are experts who doubt it can be done.
“The challenging part of modeling is to take into account the incubation period, which is pathogen-specific,” said Elena N. Naumova, academic dean for faculty in the Friedman School of Nutrition Science & Policy at Tufts University. “When the pathogen is unknown, the true incubation period is also unknown.”
Incubation periods, the time between exposure to an infection and the appearance of symptoms, vary greatly. For Ebola, incubation ranges from 2 to 21 days; for rabies, 21 to 56 days; and for influenza, just 1 to 3 days. So when the pathogen behind an outbreak is unknown, a model attempting to predict its course is flying blind when it comes to offering reliable guidance to health leaders seeking to prepare.
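A short Python sketch shows why the incubation period matters so much to a model. It assumes, purely for simplicity, that every whole-day incubation period within the ranges given for flu and Ebola is equally likely (real incubation distributions are skewed), and it spreads 30 hypothetical infections on day 0 across the days when symptoms would appear.

```python
def expected_onsets(daily_infections, min_days, max_days):
    """Expected number of people first showing symptoms on each day.

    Simplifying assumption: each incubation length from min_days to max_days
    (inclusive) is equally likely.
    """
    span = max_days - min_days + 1
    onsets = [0.0] * (len(daily_infections) + max_days)
    for day, count in enumerate(daily_infections):
        for delay in range(min_days, max_days + 1):
            onsets[day + delay] += count / span
    return onsets


infections = [30]  # hypothetical: 30 people infected on day 0
flu = expected_onsets(infections, 1, 3)
ebola = expected_onsets(infections, 2, 21)
print("flu:", flu)                                # flu: [0.0, 10.0, 10.0, 10.0]
print("Ebola, most on any one day:", max(ebola))  # Ebola, most on any one day: 1.5
```

With flu, all 30 cases surface within three days. With Ebola, the same 30 trickle in at 1.5 a day for nearly three weeks, so early case counts say little about how many people are already infected.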
Even confirming that an epidemic is underway is challenging in the early stages. Scientists track hundreds of real-time disease reports and emerging health threats on the webpage HealthMap.
“But which is the one of the many thousands that we have to pay attention to and try to mitigate?” asked Marc Lipsitch, a professor of epidemiology at Harvard University. “There’s just a lot of chance and a lot of factors we don’t understand.”
When modelers do know the disease, they still face significant challenges trying to predict where and how widely it will spread.
For example, an English study in The Lancet found that 75% of adults infected with influenza showed no symptoms, though they could still spread the virus to others. And many who do show symptoms never go to the doctor. Researchers have tried to compensate by monitoring outbreaks with both patient numbers and spikes in Google searches about flu.
However, Lipsitch said, these methods are least accurate when they are most needed — during a pandemic.
Other diseases present a similar problem.
“With Zika, 80% of people are completely asymptomatic … We only see the tip of the iceberg with the people who have symptoms and go to the doctor,” said Alessandro Vespignani, director of the Network Science Institute at Northeastern University in Boston.
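The tip-of-the-iceberg problem can be put as simple arithmetic: if only a fraction of infected people develop symptoms, and only a fraction of those see a doctor, then reported cases must be divided by both fractions to estimate total infections. A minimal Python sketch, in which the 50% care-seeking rate is a hypothetical placeholder rather than a measured value:

```python
def estimated_infections(reported_cases, frac_symptomatic, frac_seeking_care):
    """Work backward from doctor-reported cases to an estimate of all infections.

    reported = infections * frac_symptomatic * frac_seeking_care, so divide.
    """
    if not (0 < frac_symptomatic <= 1 and 0 < frac_seeking_care <= 1):
        raise ValueError("fractions must be in (0, 1]")
    return reported_cases / (frac_symptomatic * frac_seeking_care)


# Zika: about 80% asymptomatic, so about 20% symptomatic.
# The 50% care-seeking rate is an assumption for illustration only.
zika = estimated_infections(100, frac_symptomatic=0.20, frac_seeking_care=0.50)
print(round(zika))  # 1000
```

Under these assumptions every 100 reported cases would stand for roughly 1,000 infections, and the estimate is only as good as the two fractions fed into it.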
However, Vespignani believes this is precisely why models are needed.
Those tip-of-the-iceberg observations begin to make more sense when mathematical tools include information drawn from past outbreaks of the same disease.
In such cases, a good model provides scientists with a picture of how the disease is likely to play out, where health care agencies should put their money and staff, and even whether it makes sense to close a country’s borders.
“We are at war against diseases,” explained Vespignani. “The soldiers on the field are the doctors, nurses, health care workers. What we do is provide intelligence so they can anticipate the movements of the enemy and make the best use of resources.”
At least three multimillion-dollar federal programs are fueling the development of predictive tools for diseases.
PREDICT includes teams of scientists in 29 countries using gene sequencing to search the globe for viruses with pandemic potential. Launched in 2009, the project received $100 million in funding over its first five years.
The work includes looking for new diseases in those 29 countries, places where high-risk behaviors, such as bushmeat hunting and bat guano farming, bring people and animals into close contact. Such activities raise the possibility of animal viruses spilling over into people.
“They are looking for new diseases in animals before they spill over into humans,” said Tracey Goldstein, associate director of the One Health Institute at the University of California, Davis.
The RAPIDD project, focusing on infectious disease modeling, has received close to $18 million since 2008, mostly from the Department of Homeland Security.
And then there is the MIDAS program, which began in 2004 and gave out almost $14 million in the most recent year. The money is funding studies examining the spread of disease, the interplay between infectious agents and their hosts, and methods of prediction.
A fourth program, the Global Virome Project, seeks to prepare the world for the next pandemic by finding and identifying 99% of the animal viruses that have zoonotic potential.
The task is massive. Organizers estimate it will cost about $3.4 billion over 10 years.
To date, scientists have discovered about 4,400 viruses, but the actual figure is believed to be much larger; by one estimate the number of mammalian viruses alone is around 320,000. Researchers hope to determine the geographic ranges of viruses, rate their risks of spilling over into humans and find ways to fight them.
While all of the research spending may sound impressive, it reflects a less impressive truth: We spend far less money trying to predict outbreaks than we do responding to them.
In September 2016, Congress put $1.1 billion into fighting the spread of Zika virus. In 2015, Congress approved $5.4 billion in emergency spending for Ebola. In 2009, Congress appropriated more than $7.6 billion to fight the H1N1 pandemic.
“We’ve been plagued by diseases for as long as humans have been around,” Han said, “but we’ve never been able to truly forecast disease. We react to diseases when they crop up … We spend a lot of money fighting fires.”
At Harvard, Lipsitch receives about $1.4 million a year in MIDAS money, which helps him run the Center for Communicable Disease Dynamics. More than 20 researchers at the center have worked on a dozen infectious diseases, including influenza, malaria and Ebola.
One of their goals is to understand the complex ways in which antibiotics and vaccines influence disease-causing pathogens.
For example, vaccines target 13 types of Streptococcus pneumoniae, but there are actually about 95 types. So, while the vaccines lead to large reductions in the numbers of people infected, they may also be helping the strains they don't cover to outcompete the strains they do.
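Why might a vaccine that covers only some strains help the rest? A toy two-strain competition model, sketched in Python, illustrates the mechanism. Everything here is hypothetical: strains are treated as competing populations in a simple Lotka-Volterra-style model, the vaccine is represented as extra clearance of the covered strain only, and the parameter values are chosen for illustration, not fitted to pneumococcal data.

```python
def simulate(vaccine_clearance, competition=0.5, dt=0.01, steps=20_000):
    """Euler-integrate a two-strain competition model and return (covered, uncovered).

    Each strain grows toward a shared carrying capacity of 1.0 and is held back
    by the other strain in proportion to `competition`. The vaccine removes only
    the covered strain, at rate `vaccine_clearance`.
    """
    covered, uncovered = 0.1, 0.1
    for _ in range(steps):
        d_cov = covered * (1 - covered - competition * uncovered) - vaccine_clearance * covered
        d_unc = uncovered * (1 - uncovered - competition * covered)
        covered, uncovered = covered + dt * d_cov, uncovered + dt * d_unc
    return covered, uncovered


before = simulate(vaccine_clearance=0.0)
after = simulate(vaccine_clearance=0.3)
print(f"before vaccine: covered {before[0]:.2f}, uncovered {before[1]:.2f}")
print(f"after vaccine:  covered {after[0]:.2f}, uncovered {after[1]:.2f}")
```

In this toy run the covered strain falls from about 0.67 to 0.27 while the uncovered strain rises from about 0.67 to 0.87. Total carriage drops, but the uncovered strain gains ground because its competitor has been pushed back.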
Lipsitch and his colleagues have also been examining ways to improve how clinical trials are run, an effort that could help medicine respond more quickly to outbreaks. Computer simulations show them how a vaccine trial might proceed during an epidemic and which of the possible trial designs is most likely to succeed.
In addition, two researchers at the center, Amy Wesolowski and Caroline Buckee, have done groundbreaking work with cellphone data. The two have used the data from millions of subscribers in Kenya and Pakistan to track how movements of people have affected outbreaks of rubella and dengue.
Nowhere is the threat of animal diseases jumping to humans more acute — and the promise of modeling to help us prepare, more crucial — than with influenza. The 1918 Spanish Flu killed an estimated 20 million to 50 million people, many of them otherwise healthy young adults.
Both pig and bird flu can infect humans. Swine flu caused the most recent pandemic in 2009, killing up to 203,000 people worldwide. Bird flu could be the next pandemic. So far, human cases of avian influenza have been relatively rare, and the virus has not spread easily from person to person; but the mortality rate is high, about 60%.
Jeffrey L. Shaman, director of the climate and health program at Columbia University, has developed methods of predicting seasonal influenza that can be applied to a pandemic.
“Predicting future cases of seasonal flu is actually harder than for a full-blown pandemic,” Shaman said.
During the 2012-’13 season, he generated weekly flu predictions for 108 cities in the U.S.
Each week, for each city, his team ran an ensemble of 300 linked simulations depicting different ways the flu season might progress. First, the simulations were run from the start of the season up to the current week. Then they were compared with field estimates of the actual number of people with flu in each city.
Based on this comparison, an algorithm adjusted, or optimized, all 300 simulations. The optimized versions were then used to generate forecasts for the rest of the flu season. As the season progressed, the simulations came closer and closer to agreeing with one another and produced more accurate forecasts.
Shaman compared the process to aiming a cannon and using the results of each shot to draw closer and closer to the target.
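The run, compare and adjust cycle can be sketched in Python. This is an illustration under stated assumptions, not Shaman's system: it uses a bare-bones weekly SIR model (susceptible, infected, recovered) with a hypothetical recovery rate, a basic ensemble Kalman filter as the adjustment step, synthetic "field estimates" generated from a known transmission rate of 1.5, and an assumed 10% observation error. Each of the 300 members starts with a different guess at the transmission rate, and each week's comparison with the data pulls the guesses together, much like correcting a cannon's aim after each shot.

```python
import random
import statistics

RECOVERY_RATE = 0.5  # hypothetical fraction of infected people who recover each week


def step(member):
    """Advance one simulation (dict with s, i, beta) by a week; return new infections."""
    new_infections = member["beta"] * member["s"] * member["i"]
    member["s"] -= new_infections
    member["i"] += new_infections - RECOVERY_RATE * member["i"]
    return new_infections


def adjust(ensemble, predicted, observed, obs_var, rng):
    """Ensemble Kalman update: shift each member toward a noisy copy of the data."""
    mean_p = statistics.fmean(predicted)
    var_p = statistics.pvariance(predicted, mean_p)
    targets = [observed + rng.gauss(0.0, obs_var ** 0.5) for _ in ensemble]
    for key in ("s", "i", "beta"):
        values = [m[key] for m in ensemble]
        mean_v = statistics.fmean(values)
        cov = statistics.fmean([(v - mean_v) * (p - mean_p) for v, p in zip(values, predicted)])
        gain = cov / (var_p + obs_var)
        for m, p, y in zip(ensemble, predicted, targets):
            m[key] = max(m[key] + gain * (y - p), 1e-9)  # keep quantities positive


def forecast(observations, weeks_ahead, n_members=300, seed=1):
    rng = random.Random(seed)
    # Each member starts with a different (hypothetical) guess at the transmission rate.
    ensemble = [{"s": 0.99, "i": 0.01, "beta": rng.uniform(1.0, 2.5)} for _ in range(n_members)]
    for observed in observations:  # from the start of the season up to the current week
        predicted = [step(m) for m in ensemble]
        adjust(ensemble, predicted, observed, (0.1 * observed) ** 2, rng)
    # Run the adjusted members forward and average their weekly new infections.
    outlook = [statistics.fmean([step(m) for m in ensemble]) for _ in range(weeks_ahead)]
    return ensemble, outlook


truth = {"s": 0.99, "i": 0.01, "beta": 1.5}
season = [step(truth) for _ in range(12)]  # synthetic "field estimates"
ensemble, outlook = forecast(season[:4], weeks_ahead=8)
betas = [m["beta"] for m in ensemble]
print(f"transmission-rate guesses: mean {statistics.fmean(betas):.2f}, "
      f"spread {statistics.pstdev(betas):.2f}")
```

In this toy setup the guesses, which start spread between 1.0 and 2.5, should cluster near the true value of 1.5 after four weeks of data; `outlook` then holds the ensemble's average forecast of new infections for the next eight weeks.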
The next year, 2013-’14, the CDC launched its first “Predict the Influenza Season Challenge.” Matthew Biggerstaff, an epidemiologist in CDC’s influenza division, said the contest was intended to gauge the state of modeling work and to encourage more groups to take it on.
Eleven groups entered; Shaman’s group won.
His team continues to refine and improve the model, which is now used to make flu projections for 90 cities covering all 50 states.
The CDC has continued its contest.
“We’ve incorporated forecasts into the weekly summary of what’s going on in the flu each week,” Biggerstaff said, adding that the forecasts now contribute to decision-making at the agency.
Last year, researchers Han and Drake proposed an early warning system for diseases similar to the one scientists developed after the catastrophic Indian Ocean tsunami in December 2004.
Writing in the journal EMBO reports, the two borrowed from the language of meteorology, suggesting a system in which a “Watch” would be issued when the potential exists that an animal disease could spill over into humans.
A “Warning” would be issued if such a disease has already progressed to human populations and is threatening to spread to new geographical areas.
“Emergency” status would be reserved for cases in which an outbreak “threatens to overwhelm existing efforts at controlling the disease,” with the potential for a high number of infections and deaths.
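The proposed tiers amount to a simple decision rule. A minimal Python sketch, assuming each criterion can be reduced to a yes/no judgment; the function and argument names are hypothetical, and the proposal as described here does not say how those judgments would be made:

```python
from enum import Enum


class Alert(Enum):
    NONE = 0
    WATCH = 1      # an animal disease could spill over into humans
    WARNING = 2    # it has reached people and threatens to spread to new areas
    EMERGENCY = 3  # the outbreak threatens to overwhelm control efforts


def alert_level(spillover_possible, in_humans, spreading_to_new_areas, overwhelming_controls):
    """Return the most severe tier whose (hypothetical yes/no) condition is met."""
    if overwhelming_controls:
        return Alert.EMERGENCY
    if in_humans and spreading_to_new_areas:
        return Alert.WARNING
    if spillover_possible:
        return Alert.WATCH
    return Alert.NONE


print(alert_level(True, False, False, False))  # Alert.WATCH
print(alert_level(True, True, True, False))    # Alert.WARNING
```

Checking the most severe condition first means a situation that qualifies for several tiers is always reported at the highest one.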
Han said such a system would work best as an interagency effort that includes the CDC and is housed under the Department of Defense or a federal research agency.
The paper was well received, she said. But she knows of no plans to adopt the disease warning system.