COVID-19 Models and Questions

The outbreak of the coronavirus has been accompanied by a dizzying volume of scientific research, and a concomitant volume of explanatory articles aiming to communicate the assumptions, limitations and conclusions of that research, whether to a lay audience or to scientists whose specialties lie elsewhere. The research – or a big portion of it at least – falls into two main categories: on the one hand, biomedical work aimed at producing vaccines, cures and therapies; on the other, statistical and epidemiological research aimed at understanding and predicting the progress of the disease through society. The medical preprint archive medRxiv.org has, to a large degree, become the default repository for these papers.

In this post I wish to begin examining the epidemiological models, with the hope of clarifying the predictions of the most noteworthy (in particular the IHME model from Christopher Murray at the University of Washington). I will attempt to steer a middle course between a highly technical treatment and a layman's description, while trying to avoid what the Japanese call "chuuto-hanpa" (roughly, neither one thing nor the other – the worst of both worlds). So while I will include equations, I will try to keep the discussion sufficiently lucid that even if you can't parse differential calculus formulas you will nevertheless get the relevant ideas.

One of the most prominent COVID-19 models (discussed in this Nature News Feature) was developed at Imperial College London in the group of Dr. Neil Ferguson. It famously predicted in mid-March an ultimate potential death toll in Britain of 500,000, as well as 2.2 million fatalities in the United States. This one work arguably led Prime Minister Boris Johnson to opt for a lockdown strategy rather than letting the virus spread until it died out through so-called "herd immunity." The model was also shared with the White House and promptly led to new guidance on social distancing. In other words, it was quite influential.

The Imperial College model begins from the most traditional of epidemic models, known as the SIR (Susceptible-Infected-Recovered) model, which I describe below. The Imperial College researchers substantially elaborated on that model, using a simulation of "agent" motion rather than trying to write and solve all of the pertinent differential equations (which is feasible only in the simple form of the SIR model).

A second highly cited model – frequently referred to by White House coronavirus advisors Deborah Birx and Anthony Fauci – was developed by Dr. Christopher Murray and co-workers at the University of Washington's Institute for Health Metrics and Evaluation (IHME). As we will show below, the premises and theory of the IHME model differ significantly from the more standard approach of the Imperial College research. This second approach is based entirely on the statistics of deaths and attempts to fit the data to specific functional forms. The underlying assumption is that there exists some relevant functional form whose features, estimated from the data up to the present, determine its behavior in the future.

I am particularly dubious of the IHME model simply because one can at least imagine situations in which it would fail spectacularly. For example, if a cure were found, the death rate would obviously collapse, but no parameter in the original functional estimate has any way of anticipating that. Of course, one could argue that a cure would thwart any predictive model, so IHME is no worse than any other. But there could be features already present in the data that signal some non-linear change ahead (other than the discovery of a cure) which the parameter-based IHME model would be unable to capture. More on that below.
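To make the curve-fitting idea concrete: the early IHME model fit cumulative deaths to a parameterized sigmoid (an error-function form). Below is a toy sketch of that approach on synthetic data – the functional form follows IHME's general description, but the parameter names, values and data are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def cum_deaths(t, p, beta, alpha):
    """Sigmoid (error-function) form for cumulative deaths:
    p = eventual death toll, beta = inflection day, alpha = growth rate.
    (Parameter names are illustrative, not IHME's exact notation.)"""
    return (p / 2.0) * (1.0 + erf(alpha * (t - beta)))

# Synthetic "observed" data: 80 days generated from known parameters
# plus 2% multiplicative noise, standing in for real death counts.
rng = np.random.default_rng(0)
t_obs = np.arange(80)
truth = cum_deaths(t_obs, 60000, 55, 0.06)
observed = truth * (1 + 0.02 * rng.standard_normal(t_obs.size))

popt, _ = curve_fit(cum_deaths, t_obs, observed, p0=[50000, 50, 0.05])
p_fit, beta_fit, alpha_fit = popt
print(f"fitted eventual toll: {p_fit:,.0f} (true value used here: 60,000)")
# Everything the model "knows" about the future is baked into these
# three numbers -- a cure, or any other structural break, is invisible.
```

Note that the projection is purely an extrapolation of the fitted curve, which is exactly the fragility described above.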

The Basic SIR model

The SIR or Susceptible-Infected-Recovered model is a simple set of rate equations with parameters describing the probability of transmission and the rate at which infected people either die or recover. Slightly more sophisticated models can incorporate the latency period between infection and transmissibility.

The model places all the members of the group (city, country, globe) into one of the three categories: S, I or R. The number of those who are infected changes in time as either susceptible members become infected, or as infected members either die or recover – in either case leaving the infected group. The simplest assumption then is that those recovered can never be infected again (they are not susceptible).

The parameter beta sets the rate at which encounters between infected and susceptible individuals – hence the product of I and S – turn a susceptible into an infected. The parameter gamma sets the combined recovery-plus-death rate at which individuals leave the infected group:

dS/dt = -beta * S * I
dI/dt = beta * S * I - gamma * I
dR/dt = gamma * I
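To see the model in action, here is a minimal numerical integration of the SIR system (a simple Euler step in Python; the parameter values are illustrative, corresponding roughly to a basic reproduction number beta/gamma = 3, not to fitted COVID-19 values):

```python
import numpy as np

def sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=160, dt=0.1):
    """Euler integration of the SIR rate equations:
        dS/dt = -beta*S*I
        dI/dt =  beta*S*I - gamma*I
        dR/dt =  gamma*I
    S, I, R are fractions of the total population."""
    steps = int(days / dt)
    S, I, R = np.empty(steps), np.empty(steps), np.empty(steps)
    S[0], I[0], R[0] = s0, i0, 0.0
    for t in range(1, steps):
        new_inf = beta * S[t-1] * I[t-1] * dt  # susceptible -> infected
        new_rec = gamma * I[t-1] * dt          # infected -> recovered/dead
        S[t] = S[t-1] - new_inf
        I[t] = I[t-1] + new_inf - new_rec
        R[t] = R[t-1] + new_rec
    return S, I, R

S, I, R = sir()
print(f"peak infected fraction: {I.max():.2f}")
print(f"still susceptible at day 160: {S[-1]:.2f}")
```

Because new infections and recoveries are simply moved between compartments, S + I + R stays fixed at 1 throughout the run.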

[Figure: Typical numerical solution of the SIR equations showing S (yellow), I (red) and R (blue) as a function of time.]

The Imperial College model – whose initial analysis was based on data from China – obviously goes far beyond this simple picture. First, the SIR model puts everyone into the same three categories, but one might subdivide the population into different towns, or different age groups, for example. Further subdivision into social groups makes the model more complicated still. Each parameter in the model (the transmission rate and the death-or-recovery rate) can then depend on those "cohorts."
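As an illustration of the cohort idea, the scalar transmission rate beta can be promoted to a matrix of contact rates between groups. The two-group sketch below is entirely made up – the cohort labels and numbers are illustrative, not taken from the Imperial College study:

```python
import numpy as np

# Hypothetical two-cohort SIR ("young", "old"): beta becomes a 2x2
# contact matrix, entry [i][j] = transmission rate from group j to
# group i. All numbers are illustrative, not calibrated.
beta = np.array([[0.35, 0.10],
                 [0.10, 0.20]])
gamma = np.array([0.10, 0.08])   # recovery/death rate per cohort

S = np.array([0.59, 0.39])       # susceptible fractions of population
I = np.array([0.01, 0.01])       # initially infected in each cohort
R = np.zeros(2)

dt = 0.1
for _ in range(int(160 / dt)):
    force = beta @ I                 # force of infection on each cohort
    new_inf = S * force * dt
    new_rec = gamma * I * dt
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec

print("final susceptible fractions (young, old):", S)
```

The same bookkeeping as the scalar model applies, except that each cohort now experiences its own "force of infection" summed over contacts with all cohorts.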

Further, the SIR model is deterministic. It could be recast as an equation for a probability distribution, but the Imperial College study instead used a stochastic simulation based on agents with particular behaviors circulating through society. (They also worked directly with the more complicated set of rate equations.)
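The Imperial College agent-based simulation is far richer than anything shown here, but the stochastic flavor can be sketched with a binomial-chain (Reed–Frost-style) version of SIR, in which chance events replace the deterministic rates. All parameters are illustrative:

```python
import random

def stochastic_sir(n=2000, i0=10, beta=0.3, gamma=0.1, seed=None):
    """Discrete-time stochastic SIR. Each day, every susceptible is
    independently infected with probability 1 - (1 - beta/n)**I, and
    each infected recovers (or dies) with probability gamma."""
    rng = random.Random(seed)
    S, I, R = n - i0, i0, 0
    while I > 0:
        p_inf = 1 - (1 - beta / n) ** I
        new_inf = sum(rng.random() < p_inf for _ in range(S))
        new_rec = sum(rng.random() < gamma for _ in range(I))
        S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    return R  # final epidemic size

# Unlike the deterministic model, repeated runs give a distribution
# of outcomes rather than a single trajectory:
sizes = [stochastic_sir(seed=k) for k in range(5)]
print("final epidemic sizes across runs:", sizes)
```

The run-to-run scatter is the point: a stochastic model yields a distribution of epidemic trajectories, from which one can read off uncertainty, not just a single curve.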

Data sources: JHU, NY Times, RealClearPolitics, NextStrain. Social distancing data: Descartes Labs, SafeGraph, Google COVID-19 Community Mobility Reports, unacast.com (BM)