Biological Surveillance Techniques Developed with JMP (2020-US-30MP-577)

Americas 2020

Sam Edgemon, Analyst, SAS Institute
Tony Cooper, Principal Analytical Consultant, SAS

The Department of Homeland Security asked the question, “how can we detect acts of biological terrorism?” After discussion and consideration, our answer was “If we can effectively detect an outbreak of a naturally occurring event such as influenza, then we can find an attack in which anthrax was used because both present with similar symptoms.” The tools that were developed became much more relevant to the detection of naturally occurring outbreaks, and JMP was used as the primary communication tool for almost five years of interactions with all levels of the U.S. Government. In this presentation, we will demonstrate how the tools developed then could have been used to defer the effects of the Coronavirus COVID-19. The data used for demonstration will be from Emergency Management Systems, Emergency Departments and the Poison Centers of America.

Auto-generated transcript...

Speaker	Transcript
Sam Edgemon	Hello. This is Sam Edgemon. I work for the SAS Institute, and, you know, I work for the SAS Institute because I get to work on so many different projects.
	And we're going to tell you today about one of those projects. On almost all the projects I work on, I work with Tony Cooper, who's on the screen. We've worked together really since we met at the University of Tennessee a few years ago.
	And the things we learned at the University of Tennessee we've applied throughout this project. Now this project was done for the Department of Homeland Security.
	The Department of Homeland Security was very concerned about biological terrorism, and they came to SAS with the question of how we will detect acts of biological terrorism.
	Well, you know, that's quite a discussion to have, if you think about
	the things we might come back with. One of those things was, well, what are you most concerned with? What do the things look like
	that you're concerned with? And they talked about things like anthrax and ricin and a number of other very dangerous elements that terrorists could use to hurt the American population.
	Well, we took the question and their immediate concerns and researched as best we could concerning anthrax and ricin in particular.
	Our research involved going to websites and studying what the CDC said were the symptoms of anthrax and the symptoms of
	ricin, and how those things might present in a patient who walks into the emergency room, takes a ride on an ambulance, or calls a poison center. So what we realized in going through this process
	was that if you've been exposed to anthrax, the symptoms look a lot like influenza. And if you've been exposed to ricin, that looks a lot like any type of gastrointestinal issue that you might experience. So what we concluded, and what our response to Homeland Security was,
	was that if we can detect an outbreak of influenza or an outbreak of, let's say, the norovirus or some gastrointestinal issue,
	then we think we can detect when some of these bad elements have been used out in the public. And so that's the path we took. So we took data from EMS and
	emergency rooms, emergency departments and poison centers, and we've actually used Google search engine data and social media data as well
	to detect things that before were thought of as undetectable in a sense. But we developed several tools along the way. And you can see from the slide I've got here some of the results of the questions
	that we put together, these different methods that we've talked about over here. I'll touch on some of those methods in the brief time we've got to talk today, but let's dive into it. What I want to do is just show you the types of conversations we had
	using JMP. We used JMP throughout this project to communicate our ideas, communicate our concerns, and communicate what we were seeing. An example of that communication could start just like this: we had taken data from the EMS
	system, a medical system primarily based in North Carolina. You know, SAS is based in North Carolina, JMP is based in North Carolina, in Cary,
	and some of the best medical data in the country is housed in North Carolina. The University of North Carolina's got a lot to do with that.
	In fact, we formed a collaboration between SAS and the University of North Carolina and North Carolina State University to work on this project for Homeland Security that went on for almost five years.
	But what I showed them initially was what data we could pull out of those databases that might tell us interesting things.
	So let's just walk through some of those types of situations. One of the things I initially wanted to talk about was, okay, let's look at cases. You know,
	can we see information in cases that occur every day? So this was one of the first graphs I demonstrated. It's hard to see anything in this,
	and I don't think you really can see anything in this. This is how many cases there are
	in the state of North Carolina; any given day averages 2,782 cases, and, you know, that's a lot of information to sort through.
	So we can look at diagnosis codes, but some of the guys didn't like the idea that this is not as clear as we want it to be, so we had to find ways to get into that data
	and study what ways we could surface information. One of those ways, we felt, was to identify specific symptoms related to something that we're interested in,
	which goes back to this idea that, okay, we've identified what anthrax looks like when someone walks into the emergency room or takes a ride on an ambulance or what have you.
	So if we identify those specific symptoms, then we can go and search for them in the data.
	Now, one way we could do that is to ask professionals. There were rooms full of medical professionals on this project, and lots of physicians. And kind of an odd thing that
	I observed very quickly was that when you ask a roomful of really, really smart people a question like, what symptoms should I look for when I'm looking for influenza or the norovirus, you get lots and lots of different answers.
	So I thought, well, I would really like to have a way to get to this information mathematically, rather than just use opinion. And what I did was organize the data that I was working with
	to consider symptoms on specific days and the diagnosis. I was going to use those diagnosis codes.
	And what I ended up coming out with, and I set this up so I could run it over and over, was a set of mathematically valid symptoms
	so that we could go into the data and look for specific things like influenza, like the norovirus, like anthrax, like ricin, or like the symptoms of COVID-19.
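The talk does not say which calculation produced the "mathematically valid" symptoms. As one possible illustration only, and not the speaker's method, the sketch below ranks symptoms by a simple lift ratio: how much more often a symptom appears in visits carrying a target diagnosis code than in visits without it. The function name, the input layout, and the add-one smoothing are all assumptions made for this example.

```python
from collections import Counter

def symptom_lift(visits, target_dx):
    """Rank symptoms by how strongly they are associated with one diagnosis.

    visits: iterable of (diagnosis_code, set_of_symptoms) pairs (hypothetical layout).
    Returns {symptom: rate with diagnosis / smoothed rate without diagnosis}.
    """
    with_dx, without_dx = Counter(), Counter()
    n_with = n_without = 0
    for dx, symptoms in visits:
        if dx == target_dx:
            n_with += 1
            with_dx.update(symptoms)
        else:
            n_without += 1
            without_dx.update(symptoms)

    lift = {}
    for symptom, count in with_dx.items():
        rate_with = count / n_with
        # Add-one smoothing (an assumption) keeps the ratio finite when a
        # symptom never appears outside the target diagnosis.
        rate_without = (without_dx[symptom] + 1) / (n_without + 1)
        lift[symptom] = rate_with / rate_without
    return lift
```

Symptoms with the highest ratio would be candidates for the search list; because the function takes plain records, it can be rerun as new days of data arrive, in the spirit of the "run it over and over" setup described above.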
	This project surfaced again with many asks about how we might go about finding the issues
	of COVID-19 in this. This is exactly what I started showing again, these types of things. How can we identify the symptoms? Well, this is a way to do that.
	Now, once we find these symptoms, one of the things that we do is write code, which might look something similar to this code, that will look into a particular field in one of those databases and look for the things that we found in those analyses
	that we've just demonstrated for you. So here we will look into the chief complaint field in one of those databases to look for specific words
	that we might be interested in. Now the complete programs would also look for terms where someone said, well, someone does not have a fever or someone does not have nausea. So we'd have to identify
	essentially the negatives, as well as the pure, quote unquote, symptoms in the words. So once we did that, we could come back to
	JMP and think about, well, let's look at this information again. We've got this number of cases up here, but what if we took a look at it
	where we've identified specific symptoms now
	and see what that would look like.
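The code shown on screen is not captured in the transcript (the speaker mentions SAS code later in the talk). The following Python sketch only illustrates the idea described above: scan a free-text chief complaint for symptom words and skip mentions that are negated. The term lists, the negation cues, and the function name are hypothetical.

```python
import re

# Hypothetical symptom vocabulary; the project's real term lists are not given.
SYMPTOM_TERMS = {
    "fever": ["fever", "febrile"],
    "nausea": ["nausea", "nauseous"],
    "vomiting": ["vomiting", "emesis"],
}

# Hypothetical cues that negate symptoms mentioned later in the same clause.
NEGATION_CUES = r"(?:no|not|denies|without|negative for)"

def find_symptoms(chief_complaint: str) -> set[str]:
    """Return the symptoms asserted (not negated) in a free-text field."""
    text = chief_complaint.lower()
    found = set()
    for symptom, terms in SYMPTOM_TERMS.items():
        for term in terms:
            for match in re.finditer(rf"\b{re.escape(term)}\b", text):
                # The clause is the text between the previous punctuation
                # mark and the matched term.
                clause = re.split(r"[.;,]", text[:match.start()])[-1]
                if re.search(rf"\b{NEGATION_CUES}\b", clause):
                    continue  # e.g. "denies nausea" is a negative
                found.add(symptom)
    return found
```

For example, `find_symptoms("Pt c/o fever and vomiting, denies nausea")` keeps fever and vomiting and drops nausea. The clause-based negation scope is deliberately crude; a production program would need a richer set of rules.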
	So what I'm actually looking for is any information regarding
	gastrointestinal issues. I could have been looking for the flu or anything like that, but this is what the data looks like. It's the same data. It's just essentially been sculpted to look like something I'm interested in. So in this case, there was an outbreak
	of the norovirus that we told people about that they didn't know about. We started talking about this on January 15.
	And, you know, the world didn't know that there was essentially an outbreak of the norovirus until we started talking about it here.
	And that was seen as kind of a big deal. We'd taken data, we'd cleaned that data up and left the things that we're really interested in.
	But we kept going. The strength of what we were doing was not simply counting cases or counting diagnosis codes. We were looking at symptoms that describe the person's visit to
	the emergency room, or what they called the poison center about, or what they took a ride on the ambulance for.
	We looked at the chief complaint field, symptoms fields,
	and free text fields. We looked into the fields that recorded the words that an EMS tech might use on the scene. We looked in fields that recorded
	the words that a nurse might use whenever someone first comes into the emergency room, and we looked at the words that a physician may use. Maybe not what they clicked on in the boxes, but the actual words they used. And we developed a metric around that as well.
	This metric let us know,
	you know, another month in advance that something was odd in a particular area in North Carolina on a particular date. So I mentioned this was January 15, and this was December 6,
	and it was in the same area. And what is really registering is how much people are talking about a specific thing. If one person is talking about it,
	it's not weighted very heavily, and therefore it wouldn't be a big deal. If two people are talking about it, if a nurse
	and an EMS tech are talking about a specific set of symptoms, or mentioning a symptom several times, then we're measuring that and we're developing a metric from that information.
	So if three people, the doctor, the nurse and the EMS tech, if that's the information we have, are all talking about it,
	then it's probably a pretty big deal. So that's what's happened here on December 6: a lot of people are talking about symptoms that would describe something like the norovirus.
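The exact weighting behind this metric is not given in the talk. A minimal sketch of the behavior described, where a single mention by one person scores low and repeated mentions across several roles score high, might look like the following. The scoring formula (total mentions multiplied by the number of roles that mention the symptom) and all names are assumptions for illustration.

```python
import re

def mention_score(notes_by_role: dict[str, str], terms: list[str]) -> int:
    """Score how much a case's free text 'talks about' a set of symptom terms.

    notes_by_role maps a role (e.g. "ems", "nurse", "physician") to that
    person's free-text note. The score grows with repeated mentions and with
    the number of distinct roles mentioning the terms (assumed weighting).
    """
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    counts = [len(pattern.findall(note.lower())) for note in notes_by_role.values()]
    roles_mentioning = sum(1 for c in counts if c > 0)
    return sum(counts) * roles_mentioning
```

With this weighting, one nurse mentioning nausea once scores 1, while an EMS tech and a nurse each mentioning gastrointestinal terms twice scores 8. Summing such scores by day and region would give a series that can be charted for alerts.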
	This was related to an outbreak that the media started talking about in the middle of February. So this is seen as us telling the world about something that the media started talking about, you know, a month later.
	And specifically, we were drawn to this Cape Fear region because a lot of the cases were in that area of North Carolina around Wilson,
	Wilson County and that sort of thing. So that was seen as something of interest, that we could drill in that far in advance of, you know, talk about something going on. Now
	we carried on with that type of work, using those tools for biosurveillance.
	But what we did later was, after we set up systems that were essentially running
	every day, every hour, that sort of thing, we would be able to say, well,
	the system has predicted an outbreak, you know, if this was noticed. The information it was providing was really noise free in a sense. We looked back over time and we were
	predicting, let's say, between 20 and 30 alerts a year,
	total alerts a year. So there were 20 or 30 situations where we had given people notice that they should look into something, check something out. There might be a situation occurring. But in one of these instances,
	the fellow that we worked with so much at Homeland Security came to us and said, okay, we believe your alert, so tell us something more about it. Tell us
	what it's made up of. That's how he put the question. So what we did
	was develop a model right in front of him.
	And the reason we were able to do that (and here are the results of that model) was that by now we realized the value
	of keeping data concerning symptoms relative to time and place, and all the different pieces of data we could keep in relation to that, like age and ethnicity.
	So when we were asked what it's made up of, then we could... Let's put this right in the middle of the screen and close some of the other information around us here so you can just focus on that.
	So when we were asked, okay, what's this outbreak made up of, we built a model in front of them (Tony actually did that),
	and that seemed to have quite an impact when he did this, to say, okay, you're right. Now we've told you today there's an alert.
	And you should pay attention to influenza cases in this particular area because it appears to be abnormal. But we could also tell them now that, okay,
	these cases are primarily made up of young people, people under the age of 16.
	The symptoms they're talking about when they go into the emergency room or get on an ambulance are fever, coughing, respiratory issues. There's pain,
	and there are gastrointestinal issues. The key piece of information, we feel, is the interactions between age groups and the symptoms themselves.
	While this one may not be seen as important because it's down the list, we think it is,
	and even these on down here. We talked about young people and dyspnea, and young people and gastro issues, and then older people.
	So we were starting to see older people come into the data here as well. So we could talk about younger people and older people, and people in their
	20s, 30s, 40s and 50s are not showing up in this outbreak at this time. So there are a couple of things here. We could give people intel on the day
	of an alert happening, and we could give them a symptom set to look for. You know, when COVID-19 was well into our country, you still seemed to turn on the news every day and hear of a different symptom.
	This is how we can deal with those types of things. We can understand
	what symptoms are surfacing, such that people may actually have information to recognize when a problem is actually going to occur and exist.
	So these are some of the things that we're talking about here, so you can think about how we can apply it now.
	As for using the systems of alerting that I showed you earlier, which I generally refer to as the TAP method, as in just using text analytics and proportional charting:
	well, you know, we're probably beyond that now. It's on us. We didn't have the tool in place to go looking then.
	But these types of tools may still help us to be able to say, these are the symptoms we're looking for, and
	these are the age groups we're interested in learning about as well. So let's keep walking through some ways that we could use what we learned back on that project to help the situation with COVID-19.
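The "proportional charting" half of the TAP method is not spelled out in the talk. One common way to chart proportions is a Shewhart p-chart, and the sketch below uses that as a stand-in: it estimates a baseline proportion of symptom-flagged cases and flags later days whose proportion exceeds a three-sigma upper control limit. The function name, the baseline window, and the three-sigma rule are assumptions, not details from the project.

```python
import math

def p_chart_alerts(symptom_counts, total_counts, baseline_days):
    """Return indexes of days whose symptom proportion exceeds the upper limit.

    symptom_counts[i]: cases on day i that mention the symptom set.
    total_counts[i]:   all cases on day i.
    baseline_days:     number of leading days used to estimate the centerline.
    """
    base_total = sum(total_counts[:baseline_days])
    if base_total == 0:
        raise ValueError("baseline period contains no cases")
    p_bar = sum(symptom_counts[:baseline_days]) / base_total

    alerts = []
    for day, (x, n) in enumerate(zip(symptom_counts, total_counts)):
        if day < baseline_days or n == 0:
            continue
        # Upper control limit for a proportion with n cases that day.
        ucl = p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n)
        if x / n > ucl:
            alerts.append(day)
    return alerts
```

With made-up counts such as `p_chart_alerts([50, 52, 48, 50, 51, 90], [1000] * 6, baseline_days=5)`, only the last day is flagged, since 9 percent of cases sits well above the baseline of about 5 percent. Charting a proportion rather than a raw count keeps busy days from triggering alerts on volume alone.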
	One of the things that we did, of course, and we've talked about it, was building this symptoms database. The symptoms database is giving us information on a daily basis about symptoms that arise,
	and, you know, who's sick and where they're sick. So here's an extract from that database that we talked about, where it has information on a date,
	it has information about gender, ethnicity, and regions of North Carolina. We could take this down to towns and zip codes or whatever was useful.
	I mentioned TAP and that text analytics information; well, now we've got TAP information on symptoms. So if people are talking about,
	say, for example, nausea, then we know how many people are talking about nausea on a day, and eventually in a place. And so this is just an extract of symptoms
	from this database. So let's take a look at how we could use this. Let's say an ER doctor, or someone investigating COVID-19, might come to me and say,
	well, where are people getting sick? You know, where are people getting sick
	now, or where might an outbreak be occurring in a particular area? Well, this is the type of thing we might do to demonstrate that.
	I use Principal Components Analysis a lot. In this case, because we've got this data set up, I can use this tool to identify
	the stuff I'm interested in analyzing. In this case it's the regions. The question was where and what. Okay, what are you interested in knowing about? So I hear people talk about respiratory issues
	concerning COVID and I hear people talking about having a fever, and these are kind of elevated symptoms. These are issues that people are talking about
	even more than they're writing things down. That's the idea of TAP: we're getting into those text fields and understanding interesting things. So once we
	run this analysis,
	JMP creates this wonderful graph for us. It's great for communicating what's going on. And what's going on in this case is that Charlotte, North Carolina,
	is really maybe inundated with physicians and nurses and maybe EMS techs talking about their patients having a fever
	and respiratory issues. If you want to get as far away from that as you can, you might spend time in Greensboro or Asheville, and if you're in Raleigh-Durham, you might be aware of what's on the way.
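The analysis above was run in JMP, whose output is not reproduced in the transcript. To make the idea concrete, here is a small pure-Python sketch, not JMP's implementation, that finds the first principal component of a regions-by-symptoms table by power iteration on the covariance matrix and scores each region along it. The input layout, the use of covariance rather than correlation, and the function name are assumptions; a real biplot would also need the second component.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: one list per region, one numeric column per symptom measure
          (hypothetical layout, e.g. [fever_rate, respiratory_rate]).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]

    # Power iteration: repeated multiplication converges to the direction of
    # greatest variance, provided the start vector is not orthogonal to it.
    vec = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]

    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return vec, scores
```

A region whose score sits far out along a component that loads on fever and respiratory mentions is where those symptoms are being talked about most, which is the reading given to Charlotte in the graph described above.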
	So this is a way that we can use this type of information
	for essentially intelligence into what might be happening next in specific areas. We could also talk about severity in the same instance. We could talk about severity of cases and measure where they are the same way.
	So the key here is getting the symptoms database organized and utilized.
	We used JMP to communicate these ideas. A graph like this may have been shown to Homeland Security, and we would talk about it for two hours easily, and not just with questions about validity,
	you know, where the data came from and so forth. We could talk about that, and we could also talk about,
	okay, this is the information that you need to know. This is information that will help you understand where people are getting sick, such that warnings can be given and essentially lives saved.
	So that, in a sense, is the system that we put together. The underlying key is the data.
	Again, the data we've used is EMS, ED, and poison center data. I don't have an example of the poison center data here, but I've got a long talk about how we use poison center data to surface foodborne illness, in ways similar to what we've shown here.
	And then there's the ability to be fairly dynamic in developing our story in front of people and talking to them,
	in, you know, selling belief in what we do. JMP helps us do that; SAS code helps us do that. That's a good combination of tools, and that's all I have for this particular
	topic. I appreciate your attention and hope you find it useful, and hope we can help you with this type of stuff. Thank you.

Presenter

Sam Edgemon

Files

Biological_Surveillance_JMPDiscov.pdf