Well my goodness, it has been a while since I posted to this blog. October 2019 seems like a lifetime away, a different world of normal classes, normal activities, happy hours, eating out, all the rest. Now it is social distancing, PPE, and other words/acronyms/terms we knew nothing about.
Just a short note to say me and mine are fine, but I understand there are many out there suffering dearly for themselves, loved ones, their society, or the world at large.
To make sense of this COVID-19 (c19) dystopian apocalypse I turn to data analysis (it is what I do). I tried machine learning but got bogged down, and I was concerned about using an arbitrary model when physical models may apply. Next came an email from John Stockwell, who used SeismicUnix to model US active cases by manual parameter estimation of the Verhulst equation, also called the logistic equation. I also found that Wolfram Research had posted a pretty good blog and Mathematica notebook that got me a little further, but it was still not what I needed. Then I found a TowardsDataScience blog showing how to use Python to pull from the Johns Hopkins data repository, do a nonlinear model fit, and plot some results. With that as a starting point it was possible for me to get where I wanted to go.
The model is a logistic curve with three free parameters. The logistic curve has many applications, including population growth and decay. Is it a good model for this crisis? Time will tell. I will be posting my results each day. They change each day because a new data point is added, the optimization re-estimates the parameters using all the data, and the plots are updated. I keep the old ones for reference.
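For anyone who wants to try this kind of fit, here is a minimal sketch in Python. It assumes the common parameterization N(t) = K / (1 + exp(-r (t - t0))) and uses SciPy's `curve_fit` as the optimizer. The parameter names `K`, `r`, `t0`, the helper `fit_logistic`, the starting guess, and the synthetic data are all illustrative assumptions, not the actual code or numbers behind the plots.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Three-parameter logistic curve: plateau K, growth rate r, midpoint day t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_logistic(t, y):
    """Estimate (K, r, t0) from all observations by nonlinear least squares."""
    # Rough starting guess (an assumption): plateau near the latest value,
    # a modest growth rate, and a midpoint in the middle of the record.
    p0 = [float(np.max(y)), 0.1, float(np.mean(t))]
    params, _ = curve_fit(logistic, t, y, p0=p0, maxfev=10000)
    return params


# Synthetic, noise-free "observations" for illustration only (not real data).
t_obs = np.arange(60, dtype=float)
y_obs = logistic(t_obs, 50000.0, 0.2, 45.0)

K_fit, r_fit, t0_fit = fit_logistic(t_obs, y_obs)

# Run the fitted model forward over a longer window than the data covers.
t_model = np.arange(200, dtype=float)
y_model = logistic(t_model, K_fit, r_fit, t0_fit)
```

Each day a new observation would be appended to `t_obs` and `y_obs` and the fit rerun on the full series, which is why the estimated parameters and the forward curve shift from day to day.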
For the big picture I use the c19 deaths summed for each day across all countries in the world. Death data is used because I think it is much harder data than confirmed cases. Lots of folks are posting about why confirmed cases are fuzzy numbers. One can argue the death counts are also not perfect, since they assume each person who dies with the known symptoms and progression will be tested and confirmed to have c19. That might be a good assumption in some countries, not in others. It seems unlikely, however, that anyone would claim a c19 death if they are not sure. There are clear economic and societal downsides to reporting more deaths than you actually have. So we can take the Johns Hopkins death numbers as a lower limit on the actual number of c19 deaths. Once the observed data is fit, the parameters are set and we can run the model forward. I chose 200 days from 22 Jan 2020, the first date in the Johns Hopkins data series.
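The daily world total and the 200-day model window can be sketched as follows. This assumes the wide layout of the Johns Hopkins time series files, with four metadata columns (`Province/State`, `Country/Region`, `Lat`, `Long`) followed by one column per date. The function name `world_totals` and the tiny sample table are made up for illustration and are not real data.

```python
import csv
import io
from datetime import date, timedelta

# Tiny made-up sample in the assumed layout; the numbers are not real data.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Aland,0.0,0.0,0,1,2
North,Borduria,1.0,1.0,3,3,5
South,Borduria,2.0,2.0,0,2,2
"""


def world_totals(csv_text, n_meta=4):
    """Sum each date column over every row (all countries and provinces)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    labels = header[n_meta:]  # date labels, one per day
    totals = [
        sum(int(float(row[n_meta + i])) for row in body)
        for i in range(len(labels))
    ]
    return labels, totals


labels, totals = world_totals(SAMPLE)

# Model window: 200 consecutive days starting 22 Jan 2020.
start = date(2020, 1, 22)
model_days = [start + timedelta(days=i) for i in range(200)]
```

The `totals` list is what would be handed to the curve fit, and `model_days` gives the calendar dates for the forward run.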
Are you ready? Here is your first plot, fresh out of the oven this morning. Stay safe and I'll see you tomorrow.