Byron_JMP
Staff
Visualizing Coronavirus COVID-19 Global Cases Using a Time Series Animated Bubble Plot


 


 

Preface: 

  1. The coronavirus and all the issues caused by its spread are serious and affecting a lot of people, some in horrible ways. It is not my intent to trivialize the seriousness of this by creating this figure. Neither is it my goal to promote hysteria or xenophobia. 
  2. In my script I recode location names that the original authors use. I don't have a political agenda. When I use the Google Maps API to geocode, I need it to give me country locations. "Taiwan, Taiwan" results in a restaurant location, while "Taiwan, China" results in a geocode near the center of Taiwan. Again, this isn't political; it's functional.
  3. All the data surrounding the reporting and detection of Coronavirus is massively contentious. I'm not vouching for the accuracy of this data; it is, however, the best aggregation of publicly available data (in English) that I know of. If you have something better, please let me know.
  4. It's not my data. The folks at Johns Hopkins University, Whiting School of Engineering, in the Center for Systems Science and Engineering published a sweet looking dashboard along with the data. Huge thanks and kudos to this team for making their data available. Read their Blog and definitely check out their visualization.

They also set up a GitHub page with the raw data in an easily accessible format. (Kudos x 10^6).

 

Why Use a Bubble Plot with a Time Series Animation?

Bubble plots have a high degree of utility in that a large number of dimensions can be represented in a single plot. With the addition of a time series animation, the changes in relationships between these dimensions can be visualized and communicated without a whole lot of complex pre-attentive processing by the viewer (aka: they look cool and are easy to understand).

Note that I said a bubble plot is useful for looking at the relationships of multiple dimensions animated over time, not just one dimension (variable) animated over time. I've seen people make a run/control chart-like plot with a single dimension on the X-axis and time on the Y-axis, and then animate by time. This is called a tedious run/control chart; please don't do this.

 

For the figure I'm going to construct here, the X and Y axes are on a geodesic scale, so that the points will line up with a map. We can see the spatial relationships of points on the graph, which are given context by the background map shape. Next I add a size dimension for the markers, which is the cumulative count of cases by time period for each location. Last, I use the time period sequence to animate the plot, which results in bubbles that grow or appear over time. Each of the points is labeled by Country and Region. In this figure, it is possible to collapse all the individual points into their Country ID or split them out to their region ID within the Country ID.
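To make the roles in that description concrete, here is a minimal sketch of the same kind of plot on a tiny made-up table. The table name, the column names (Day, Cases), and every value are invented for illustration; they are not the Johns Hopkins data. The full script further down adds the geodesic scale and the world background map through SendToReport.

```jsl
// Hypothetical demo table: two made-up locations observed on two days.
dt = New Table( "Bubble demo",
	New Column( "Long", Numeric, "Continuous", Set Values( [114.3, 114.3, 121.5, 121.5] ) ),
	New Column( "Lat", Numeric, "Continuous", Set Values( [30.6, 30.6, 31.2, 31.2] ) ),
	New Column( "Day", Numeric, "Continuous", Set Values( [1, 2, 1, 2] ) ),
	New Column( "Cases", Numeric, "Continuous", Set Values( [10, 40, 1, 5] ) ),
	New Column( "Location", Character, "Nominal",
		Set Values( {"Site A", "Site A", "Site B", "Site B"} )
	)
);

// Position comes from Long/Lat, bubble size from the cumulative count,
// the animation from Day, and the label for each bubble from Location.
bp = dt << Bubble Plot(
	X( :Long ),
	Y( :Lat ),
	Sizes( :Cases ),
	Time( :Day ),
	ID( :Location )
);
```

Stepping the time slider from Day 1 to Day 2 should make both bubbles grow, which is the same effect the real figure shows across dates.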

 

The script to generate the graph is chasing a moving target. The authors of the data keep making changes, so this might only work for a couple of days if there are changes to the source data. As of the morning of 2/12/2020, it works. At the end of the script is an option to publish or refresh the figure in JMP Live. This won't work for you because you don't have my top secret API Key for accessing the package in JMP Live. It might be useful as an example of how to make publish and refresh work if you have JMP Live.

 

How the script works.

  1. Sets up all the variables I need later in the script
  2. Downloads the data and formats it
    1. There are some weird things in this part. At several points I save the data locally, close the table, and then re-open it. I was having trouble forcing the script to run sequentially, and this is a way to make it happen. Sometimes a table wasn't finished updating before the next step started, and I got wonky tables. This brute force method matches my character and style.
  3. Generates the bubble plot
  4. Figures out whether it needs to publish or refresh, and then publishes or refreshes the figure
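The save, close, and re-open trick from step 2 looks roughly like this on a throwaway table. This is a sketch only: the table name and the $TEMP location are assumptions made for the example, whereas the script below saves to the default directory.

```jsl
// Hypothetical table standing in for one that was just downloaded.
dt = New Table( "Demo download",
	New Column( "x", Numeric, "Continuous", Set Values( [1, 2, 3] ) )
);

// Save a local copy, close the in-memory table, then reopen it from disk.
// Reopening means later steps work on a table that has finished writing.
path = "$TEMP/Demo download.jmp";	// assumed location for this sketch
dt << Save( path );
Close( dt, NoSave );
dt = Open( path );

Show( N Rows( dt ) );
```

It costs a little time on every run, which is part of why the script slows down at the save steps.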

In looking at the script, you may note that I wrapped sections in expressions. This is a handy way of de-cluttering a long script when it's combined with the code folding option (check preferences for the script editor).
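As a small illustration of that pattern, the sketch below stores two stages in expressions and then runs them in order. The names makedata and showrows and the demo table are made up for this example; the real script uses names like githubdata and openandform.

```jsl
// Each stage is stored unevaluated inside Expr(), so it can be folded
// in the editor and run later by name.
makedata = Expr(
	dt = New Table( "Expr demo",
		New Column( "x", Numeric, "Continuous", Set Values( [1, 2, 3] ) )
	);
);
showrows = Expr(
	Show( N Rows( dt ) );
);

// Using the variable name as a statement evaluates the stored expression.
makedata;
showrows;
```

Keeping the calls together at the bottom also makes it easy to comment out one stage while debugging another.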

 

////////////////////////////////////////////////////////////
////   Byron Wingerd, byron.wingerd@jmp.com, 2/19/2020   ///
////////////////////////////////////////////////////////////
//
//
//Open("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv",HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) ));//	invisible);
gitpath = "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/";
//gitpath = "https://github.com/CSSEGISandData/COVID-19/blob/master/archived_data/time_series/";
githubdata = Expr(
	dtcon = Open(
		gitpath || "time_series_19-covid-Confirmed.csv",
		HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) ),

	);//	invisible);
	dtrec = Open(
		gitpath || "time_series_19-covid-Recovered.csv",
		HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) ),

	);//	invisible);	
	dtdea = Open(
		gitpath || "time_series_19-covid-Deaths.csv",
		HTML Table( 1, Column Names( 1 ), Data Starts( 2 ) ),

	);//	invisible);	
	Wait( .01 );															////just a little time to download the files
	dtcon << set name( "Confirmed" );
	dtrec << set name( "Recovered" );
	dtdea << set name( "Death" );
);
githubdata;

dtcon << saveas( "Confirmed.jmp" );  								////Save local copy
dtrec << saveas( "Recovered.jmp" );									////This slows down the script a bit too
dtdea << saveas( "Death.jmp" );	

////////////////////////////////////////////////////////////
openandform = Expr(

	dtcon = Open( "Confirmed.jmp" );								////reopen the local copies (the invisible variant is commented out below)
	dtrec = Open( "Recovered.jmp" );
	dtdea = Open( "Death.jmp" );
	//dtcon = Open( "Confirmed.jmp"	, invisible );					////opened invisibly, faster, and less memory
	//dtrec = Open( "Recovered.jmp"	, invisible );
	//dtdea = Open( "Death.jmp"		, invisible );
	tables = {dtcon, dtrec, dtdea};									////List of files for fixing things
	//wait();														////Just a little time to download the files

	For( ii = 1, ii <= N Items( tables ), ii++, 					////Fixing all the Date columns in all the tables
		For( i = 4, i <= N Cols( tables[ii] ), i++,
			Column( tables[ii], i ) << Set Data Type( "Numeric" );
			Column( tables[ii], i ) << Set Modeling Type( "Continuous" );
		)
	);

	dt3 = dtcon << Concatenate(
		//invisible, 							////Assemble all the tables
		dtrec,
		dtdea,
		Output Table( "All three" ),
		Create source column
	);
//this might only work in v15//
	dt3 << Begin Data Update;
	dt3 << Recode Column(
		dt3:Source Table,
		{Regex( _rcNow, ".jmp", "", GLOBALREPLACE )},
		Target Column( :Source Table )
	);  //just showing off
	dt3 << End Data Update;
	
	Close( dtcon, nosave );											////Don't need the parts, so closing
	Close( dtrec, nosave );
	Close( dtdea, nosave );

	clist = dt3 << Get Column Names( Continuous );
	cclist = {};
	For( i = 3, i <= N Items( clist ), i++, 							////Don't know the date column names, so this gets them
		Insert Into( cclist, clist[i] )								////note: i=3, because the first cols aren't the dates
	);

	dt3s = dt3 << Stack(
		invisible,
		columns( cclist ),
		Source Label Column( "Date" ),
		Stacked Data Column( "Count" ),
		Output Table( "All three stacked" )
	);
	Close( dt3, save( "All three.jmp" ) );							////Done with this table and don't need it later
	////bunch of clean up steps
	//make a good date column//
	dt3s << New Column( "temp", 										//get the dates right
		Numeric,
		"Continuous",
		Format( "m/d/y h:m", 19 ),
		Input Format( "m/d/y h:m" ),
		Formula( Informat( :Date ) ),
		Set Selected,
		Set Display Width( 177 )
	);
	dt3s << run formulas();
	dt3s:temp << Delete Formula;
	dt3s << select columns( :Date );
	dt3s << delete columns;
	dt3s:temp << set name( "Date" );


	Try( dt3s << clear select );
	//dt3s << Select Where( Is Missing( :Name( "Count" ) ) );	
	//dt3s << Select Where( :Name( "Count" )==0 );					////delete rows with missing data, makes bubble plot work better
	//dt3s << deleterows;
	//dt3s << clear select;
	
//////////////////
	dt3s << Begin Data Update;
	col1 = dt3s << New Column( dt3s:Column 1 );
	col1 << Set Name( "Counts" );
	dt3s << Move Selected Columns( {col1}, after( dt3s:Count ) );
	dt3s << Recode Column(
		dt3s:Count,
		{Map Value( _rcOrig, {., 0}, Unmatched( _rcNow ) )},
		Target Column( col1 )
	);
	dt3s << End Data Update;
	dt3s:Counts << Set Data Type( "Numeric" );
	dt3s:Counts << Set Modeling Type( "Continuous" );
/////////////////
	
	
	dt3s << Select Where( Is Missing( :Name( "Date" ) ) );
	dt3s << deleterows;
	dt3s << clear select;



	dt3s << Begin Data Update;										////Recode script from version 15						
	For Each Row(
		:Name( "Country/Region" ) = Match( :Name( "Country/Region" ),
			"Hong Kong", "China", 									////This is super political, sorry
			"Mainland China", "China", 								////
			"Taiwan", "China", 										//// In an earlier version, I was using a Google API to geocode
			:Name( "Country/Region" )								//// Taiwan, Taiwan and Hong Kong, Hong Kong returned restaurants rather than shape centroids
		)
	);																//// So I added this format.
	dt3s << end data update;

	dt3s:Column 1 << set name( "temp" );
	:temp << formula(
		If( Is Missing( :name( "Province/State" ) ) == 1,
			:name( "Country/Region" ),
			:name( "Province/State" )
		)
	);
	dt3s << run formulas();
	dt3s:temp << Delete Formula;
	dt3s << select columns( :name( "Province/State" ) );
	dt3s << delete columns;
	dt3s:temp << set name( "Province/State" );
	dt3s << New Column( "Location" );
	dt3s:Location << formula( :Name( "Province/State" ) || ", " || :Name( "Country/Region" ) );
	dt3s << run formulas();
	dt3s:Location << Delete Formula;
	//Wait();


	dt3sp = dt3s << Split(
		invisible,
		Split By( :Source Table ),
		Split( :Count, :name( "Counts" ) ),
		Group( :Name( "Province/State" ), :Name( "Country/Region" ), :Date ),
		Output Table( "All three split" ),
		Remaining Columns( Keep All ),
		Sort by Column Property
	);


dt3sp << Begin Data Update;
dt3sp << Recode Column(
	dt3sp:Count Confirmed,
	{Map Value( _rcOrig, {0, .}, Unmatched( _rcNow ) )},
	Target Column( :Count Confirmed )
);
dt3sp << End Data Update;

dt3sp << Begin Data Update;
dt3sp << Recode Column(
	dt3sp:Count Death,
	{Map Value( _rcOrig, {0, .}, Unmatched( _rcNow ) )},
	Target Column( :Count Death )
);
dt3sp << End Data Update;

dt3sp << Begin Data Update;
dt3sp << Recode Column(
	dt3sp:Count Recovered,
	{Map Value( _rcOrig, {0, .}, Unmatched( _rcNow ) )},
	Target Column( :Count Recovered )
);
dt3sp << End Data Update;

dt3sp << clear select;
	
/*	//////

dtcollist = dt3sp << get column names;
selection={Confirmed, Death, Recovered};
dtMat = (dt3sp << Get All Columns As Matrix);

For( i = 1, i <= N Items( selection ), i++,
colNum = Contains( dtColList, Name Expr( selection[i] ) );
vMat = dtMat[0, colNum];
mis = Loc( Is Missing( vMat ) );
If( N Row( mis ),
col = column(selection[i]);
If( mis[1] == 1,
col[1] = col[(Loc( vMat ))[1]]
);

For( j = if(mis[1]==1, 2, 1), j <= N Row( mis ), j++,
col[mis[j]] = col[mis[j] - 1]
);
);
);

//////
*/
);
openandform;

countc = {:Count Confirmed, :Count Death, :Count Recovered};
counts = {:Counts Confirmed, :Counts Death, :Counts Recovered};
bplot1 = Expr(
	obj1 = dt3sp << Bubble Plot(
		X( :Long ),
		Y( :Lat ),
		Sizes( countc[1] ),
		Time( :Date ),
		ID( :Location ),
		Speed( 68.04 ),
		Title Position( 266.112082831182, -24.7 ),
		SendToReport(
			Dispatch(
				{},
				"Bubbles Sized by Confirmed Count for each Locations (Country/Region-Providence/State)",
				OutlineBox,
				{Set Title( "Coronavirus 2019-nCoV Global Cases" )}
			),
			Dispatch(
				{},
				"1",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( 6.7 ), Max( 326.04257485404 ), Inc( 100 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"2",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( -93.56 ), Max( 89.3 ), Inc( 20 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"Bubble Plot",
				FrameBox,
				{Frame Size( 730, 390 ), Background Map( Boundaries( "World" ) ), Grid Line Order( 2 ),
				Reference Line Order( 3 )}
			)
		)
	)
);
bplot2 = Expr(
	obj2 = dt3sp << Bubble Plot(
		X( :Long ),
		Y( :Lat ),
		Sizes( countc[2] ),
		Time( :Date ),
		ID( :Location ),
		Speed( 68.04 ),
		Title Position( 266.112082831182, -24.7 ),
		SendToReport(
			Dispatch(
				{},
				"Bubbles Sized by Death Count for each Locations (Country/Region-Providence/State)",
				OutlineBox,
				{Set Title( "Coronavirus 2019-nCoV Global Cases" )}
			),
			Dispatch(
				{},
				"1",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( 6.7 ), Max( 326.04257485404 ), Inc( 100 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"2",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( -93.56 ), Max( 89.3 ), Inc( 20 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"Bubble Plot",
				FrameBox,
				{Frame Size( 730, 390 ), Background Map( Boundaries( "World" ) ), Grid Line Order( 2 ),
				Reference Line Order( 3 )}
			)
		)
	)
);
bplot3 = Expr(
	obj3 = dt3sp << Bubble Plot(
		X( :Long ),
		Y( :Lat ),
		Sizes( countc[3] ),
		Time( :Date ),
		ID( :Location ),
		Speed( 68.04 ),
		Title Position( 266.112082831182, -24.7 ),
		SendToReport(
			Dispatch(
				{},
				"Bubbles Sized by Recovered Count for each Locations (Country/Region-Providence/State)",
				OutlineBox,
				{Set Title( "Coronavirus 2019-nCoV Global Cases" )}
			),
			Dispatch(
				{},
				"1",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( 6.7 ), Max( 326.04257485404 ), Inc( 100 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"2",
				ScaleBox,
				{Scale( "Geodesic" ), Format( "Best", 12 ), Min( -93.56 ), Max( 89.3 ), Inc( 20 ),
				Minor Ticks( 1 ), Label Row(
					{Show Major Labels( 0 ), Show Major Ticks( 0 ), Show Minor Ticks( 0 )}
				)}
			),
			Dispatch(
				{},
				"Bubble Plot",
				FrameBox,
				{Frame Size( 730, 390 ), Background Map( Boundaries( "World" ) ), Grid Line Order( 2 ),
				Reference Line Order( 3 )}
			)
		)
	)
);
GBfig4 = Expr(
	obj4 = dt3sp << Graph Builder(
		Size( 460, 875),
		Show Control Panel( 0 ),
		Legend Position( "Bottom" ),
		Variables(
			X( counts[1] ),
			X( counts[2], Position( 1 ) ),
			X( counts[3], Position( 1 ) ),
			Y( :Name( "Country/Region" ) ),
			Y( :Name( "Province/State" ), Position( 1 ) )
		),
		Elements(
			Bar(
				X( 1 ),
				X( 2 ),
				X( 3 ),
				Y( 1 ),
				Y( 2 ),
				Legend( 13 ),
				Bar Style( "Bullet" ),
				Summary Statistic( "Max" )
			)
		),
		SendToReport(
			Dispatch(
				{},
				"Graph Builder",
				OutlineBox,
				{Set Title( "Category Counts by Location" ), Image Export Display( Normal )}
			),
			Dispatch(
				{},
				"Counts Confirmed",
				ScaleBox,
				{Scale( "Log" ), Format( "Best", 6 ), Min( 0.678809381075438 ), Max( 358397.627745527 ),
				Inc( 1 ), Minor Ticks( 1 )}
			),
			Dispatch(
				{},
				"Country/Region",
				ScaleBox,
				{Min( 74.5 ), Max( -1 ), Inc( 1 ), Minor Ticks( 0 ), Label Row(
					1,
					{Inside Ticks( 1 ), Lower Frame( 1 ), Show Major Grid( 1 ), Wrap Lines( 4 ),
					Set Font Size( 10 )}
				), Label Row( 2, Set Font Size( 10 ) )}
			),
			Dispatch(
				{},
				"400",
				ScaleBox,
				{Legend Model(
					13,
					Level Name( 0, "Count Confirmed", Item ID( "Max(Counts Confirmed)", 1 ) ),
					Level Name( 1, "Count Death", Item ID( "Max(Counts Death)", 1 ) ),
					Level Name( 2, "Count Recovered", Item ID( "Max(Counts Recovered)", 1 ) ),
					Properties( 0, {Fill Color( 16 )}, Item ID( "Max(Counts Confirmed)", 1 ) ),
					Properties( 1, {Fill Color( 0 )}, Item ID( "Max(Counts Death)", 1 ) ),
					Properties( 2, {Fill Color( 4 )}, Item ID( "Max(Counts Recovered)", 1 ) )
				)}
			),
			Dispatch( {}, "graph title", TextEditBox, {Set Text( "Count of Response Category" )} ),
			Dispatch( {}, "X title", TextEditBox, {Set Text( "Counts (Log Scale)" )} )
		)
	)
);
GBfig5 = Expr(
	obj5 = dt3sp<<Graph Builder(
	Size( 850, 300 ),
	Show Control Panel( 0 ),
	Show Legend( 0 ),
	Variables(
		X( :Date ),
		Y(
			Transform Column( "-Counts Confirmed", Formula( -:Counts Confirmed ) ),
			Side( "Right" )
		),
		Y( :Counts Death, Position( 1 ) ),
		Y( :Counts Recovered, Position( 1 ) )
	),
	Elements(
		Line( X, Y( 2 ), Y( 3 ), Legend( 8 ), Summary Statistic( "Sum" ) ),
		Line( X, Y( 1 ), Legend( 9 ), Summary Statistic( "Sum" ) )
	),
	SendToReport(
		Dispatch(
			{},
			"Graph Builder",
			OutlineBox,
			{Set Title( "Cumulative Counts by Day" ), Image Export Display( Normal )
			}
		),
		Dispatch(
			{},
			"Date",
			ScaleBox,
			{Format( "m/d/y", 10 ), Min( 3662439840 ), Max( colmaximum(:Date)+36000 ),
			Interval( "Day" ), Inc( 3 ), Minor Ticks( 2 ),
			Label Row( Label Orientation( "Angled" ) )}
		),
		Dispatch(
			{},
			"Counts Death",
			ScaleBox,
			{Min( -15000 ), Max( 15000 ), Inc( 5000 ), Minor Ticks( 4 ),
			Add Ref Line( 0, "Solid", "Black", "", 1 )}
		),
		Dispatch(
			{},
			"-Counts Confirmed",
			ScaleBox,
			{Format( "Best", 9 ), Min( -110000 ), Max( 110000 ), Inc( 25000 ),
			Minor Ticks( 4 ), Add Ref Line( 0, "Solid", "Black", "", 1 )}
		),
		Dispatch(
			{},
			"400",
			ScaleBox,
			{Legend Model(
				8,
				Properties(
					0,
					{Line Color( 0 )},
					Item ID( "Sum(Counts Death)", 1 )
				),
				Properties(
					1,
					{Line Color( 20 )},
					Item ID( "Sum(Counts Recovered)", 1 )
				)
			), Legend Model(
				9,
				Properties(
					0,
					{Line Color( 1 )},
					Item ID( "Sum(-Counts Confirmed)", 1 )
				)
			)}
		),
		Dispatch( {}, "graph title", TextEditBox, {Set Text( "" )} )
	)
)
);
figure = Expr(
	win = New Window( "Coronavirus COVID-19 Global Cases",
		outline box("Coronavirus COVID-19 Global Cases",
		H List Box("top",
			V List Box("col1",
			tb = Tab Box(
				Tab Page Box( "Confirmed", bplot1 ),
				Tab Page Box( "Death", bplot2 ),
				Tab Page Box( "Recovered", bplot3 ),
				Tab Page Box(
					"Data Source",
					V List Box(
						"vlb1",
						V List Box(
							"hlb1",
							tb1 = Text Box( "@misc=      kraemer2020epidemiological", <<Set Wrap( 1000 ) ),
							tb2 = Text Box(
								"author=      nCoV-2019 Data Working Group",
								<<Justify Text( "left" ),
								<<Set Wrap( 1000 )
							),
							tb5 = Text Box(
								"Accessed= " || Long Date( Today() ),
								<<Justify Text( "left" ),
								<<Set Wrap( 1000 )
							),
							tb6 = Text Box(
								"From=        \url{http://virological.org/t/epidemiological-data-from-the-ncov-2019-outbreak-early-descriptions-from-publicly-available-data/337",
								<<Set Wrap( 1700 )
							),
							tb7 = Text Box( "Year=         " || Char( Year( Today() ) ), <<Set Wrap( 1000 ) ),
							H List Box(
								"hlb2",
								Button Box( "Source",
									Web(
										"https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6"
									)
								),
								Button Box( "Raw Data",
									Web(
										"https://github.com/CSSEGISandData/COVID-19"
									)
								),
								Button Box( "About this Plot",
									Web(
										"https://community.jmp.com/t5/Byron-Wingerd-s-Blog/Visualizing-Coronavirus-2019-nCoV-Global-Cases-Using-a-Time/ba-p/247257"
									)
								),
								Button Box( "Updated on " || Long Date( Today() ),
									Speak( "Updated on " || Short Date( Today() ) )
								)
							)
						)
					)
				),
			),GBfig5),
			GBfig4
		)
	)
));
figure;
2 Comments
Community Manager

Really interesting and nice way to give a useful (and way cool) script to keep up with this global public health tragedy, Byron @Byron_JMP .  It's so useful that JMP handles time series data.

 

I was going to copy the code into a .jsl file, but noticed that some customization is involved.  For example, you point to your JMP Live account.  Can you update the script so it is a bit more generic?  And, thanks for linking to the GitHub page with the raw data so I can try to look at new data in JMP on my own.

 

 

Staff

This project is kind of chasing a moving target. The authors of the data keep moving it around. 

I updated the script (yes, on Valentine's Day, but while I was waiting for a monster lasagna to finish baking).

Now the data pull is a little simpler, and instead of having a data filter box, each category is in its own tab box.

Also, there was an issue of points not showing up on the bubble plot, and that's fixed too.