Do You Trust Me? Assessing Reliability for Driver-assisted Autonomous Vehicles (2023-EU-30MP-1235)

Caleb King, Research Statistician Tester, JMP Division, SAS Institute Inc.
Peng Liu, Principal Research Statistician Developer, JMP

 

Autonomous vehicles, or self-driving cars, no longer live only in science fiction. Engineers and scientists are making them a reality. Their reliability, or more importantly, their safety, is crucial to their commercial success. Can we trust autonomous vehicles? Do we have the information to make this decision? In this talk, we investigate the reliability of autonomous vehicles (AVs) produced by four leading manufacturers by analyzing the publicly available data submitted to the California DMV AV testing program. We assess the quality of the data, evaluate the amount of information they contain, analyze them in various ways, and finally attempt to draw conclusions from what we learned in the process. We show how we used various tools in JMP® in this study, including processing the raw data, establishing assumptions and limitations of the data, fitting different reliability models, and selecting appropriate models to draw conclusions. The limitations of the data involve both quality and quantity. As such, our results might be far from conclusive, but we can still gain important insights with proper statistical methodology.

 

Link to CA DMV disengagement reports

Link to AV Recurrent Events Paper

 

 

Hello, my name is Caleb King. I'm a developer in the DOE and reliability group at JMP. Today I figured I'd showcase what I think is a bit of an overlooked platform in the reliability suite of analysis tools, and that's the Reliability Growth platform. I thought I'd do that in the context of something that's become pretty popular nowadays, and that's autonomous vehicles. They're fast becoming a reality, not so much science fiction anymore. We have a lot of companies working on extensive testing of these vehicles.

It's nice to test these vehicles on a nice track at your lab or something like that. But nothing beats actual road testing, which is why early in the 2010s, the state of California's Department of Motor Vehicles put together a testing program that allowed these companies to test their vehicles on roads within the state. Now, as part of that agreement, each company was required to submit an annual report detailing any disengagement incidents or, heaven forbid, any crashes involving their autonomous vehicles.

Those had to be reported to the Department of Motor Vehicles, the DMV. Now, one benefit of the DMV being a public institution is that these reports are actually available upon request. In fact, we can go to the site right now, and you'll see that you can at least access the most recent reports. We have the 2021 reports; they're still compiling the 2022 ones. If you want some previous ones, you can also email them. I did that with a brief justification of what I was doing, and they were pretty quick to respond.

Now, there are different types of reports and different types of testing. We're focusing on testing where there is a driver in the vehicle and the driver can take over as necessary. This isn't a fully autonomous vehicle; you do have to be in the driver's seat. We're using these disengagement events as a proxy for assessing the reliability of the vehicles. Obviously, we don't have access to the software in these vehicles. If you worked at those companies, you could probably have more information. We obviously don't.

But they're a proxy because if you want your vehicle to be reliable, that means it needs to be operating as you intend within its environment. Any time you have to take over from the AI for some reason, that could be a sign that it's not exactly operating as intended. So we can use it as a bit of a proxy. Again, it's not the best approximation, but it's still pretty good. Of course, I'm not the first one to think of this. This is actually an informal extension of some work I've done recently with my advisor, Yili Hong, and a number of other co-authors, where we looked at this type of data from a recurrent events perspective.

I'm going to take a slightly different approach here, but there is a preprint of that article available if you want to check it out that does something similar. Let me describe the data for you real quick. I'm not going to be looking at every company doing testing; there are so many out there. I'm going to focus on one, and that would be the events submitted by Waymo, which was Google's self-driving car project. Now they're their own subsidiary entity. These are their annual reports. Let me define what we mean by disengagement events.

I'm in the driver's seat, the vehicle is in autonomous mode, and if something's happening, I need to take over driving. That's a disengagement event: I disengage from autonomous mode. That could be for any reason, and they of course need to report what that reason was. We're just using that as our proxy measure here. These annual reports go all the way back to about 2014 or 2015. That's when Waymo started participating in this program. The 2015 report actually contains data back to 2014; they start in the middle there.

Each report essentially covers the range from December of the previous year to November of the current year. The 2016 report would contain data from December 2015 up to November 2016. That way, they have a month to process the previous year's numbers. There are two primary sources of data we're looking at in each report. The first lists each incident that occurred and when it happened, which could be as detailed as day and time, or it could just be the month.

Again, there's not a lot of consistency across years; it's something we ran into. But they at least give some indication of when it happened. They might say where it happened, and they describe what happened, which could be very detailed or could just fall into a particular category that they give. The second source of data lists the VIN or partial VIN of the vehicle, the vehicle identification number, something to identify the vehicle, and how many autonomous miles that vehicle drove that month. You might see later on when I show this data that there are a bunch of zeros. Zero just means they either didn't drive that vehicle or didn't drive it in autonomous mode that month.
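To make that layout concrete, here is a hypothetical mock-up of the two data sources in Python; the column names and values are illustrative only, since the real reports vary from year to year.

    import pandas as pd

    # Source 1: one row per disengagement incident (illustrative only).
    incidents = pd.DataFrame({
        "Month": ["2017-01", "2017-01", "2017-02"],
        "Cause": ["Perception discrepancy", "Unwanted maneuver", "Software discrepancy"],
        "Location": ["Street", "Street", "Highway"],
    })

    # Source 2: one row per vehicle, one column of autonomous miles per month.
    # A zero means the vehicle was not driven in autonomous mode that month.
    mileage = pd.DataFrame({
        "VIN": ["2C4R-01", "2C4R-02"],  # often only a partial VIN
        "2017-01": [1250, 0],
        "2017-02": [980, 430],
    })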

In either case, I was not doing active testing of the autonomous mode of the vehicle. Now, as I mentioned earlier, there was a bit of inconsistency. Prior to 2018, when they listed the disengagement events, they actually didn't give the VIN of the vehicle. We know how many autonomous miles each vehicle drove that month, but we have no idea which vehicle was involved in an incident. Starting in 2018, that information is available. Now we can match the vehicle to the incident, which means when we do this analysis, I'm going to do it at two different levels.

One is at an aggregate level, where for each month I'm going to look at all of the vehicles being tested at that time and look at the incident rates overall as an aggregate measure. For the second, I'll zoom in to the vehicle level and look at it by VIN. For that data, I'll only be going back to 2018; for the aggregate level, I can take all of it. Now, before we get to the analysis, I actually wanted to show you some tools in JMP that allowed us to quickly process and accumulate this data. Again, to show you how easy it is and show off a few features in JMP. Some of them are really new; some have been around for a little while.

Let me start by showing you one thing that helped us, and that was being able to read in data from PDFs. Prior to 2018, a lot of these data were compiled in PDFs. Afterwards, they put them in an Excel file, which made it a lot easier to just copy and paste into a JMP table. But for those PDFs, how did we handle that? Let me give you an example using data from 2017. This is actually one of the best formatted reports we see from companies: some summaries here, some tables here and there. The data I'm looking at is here in Appendix A.

You can see here, these are the disengagement events. We have a cause, usually just a category here. They have the day, which is actually the month; a bit of a discrepancy there. Then the location and type. This is basically just telling us how many disengagement events happened each month. Then we have a series of tables here at the back. These show, for each vehicle (in this case, only a partial VIN is given; there is not a lot of information available in these early reports), how many autonomous miles were driven each month. How can we put this into JMP? Well, I could just copy and paste, but that's a bit tedious. We can do better than that.

Let me come here. I'm going to go to File, then Open. There's my PDF. I click Open, and JMP has a PDF import wizard. Awesome. What it's going to do is go through each page and identify whatever tables it finds there. It categorizes them by the page and which table it is on that page. When you save it out, you can, of course, change the name.

Now, I don't want every table on every page. What I'm going to do is go to the red triangle on this page and say, "Ignore all the tables on this page. I don't want these." I'll say, "Okay," and do the same here. It's a nice summary table, but it's not what I want. Then I start saying, "This is the data I want." You'll notice this is formatted pretty well; it's gotten the data I want. If I scroll to the next one, this is technically a continuation of the table from before. However, by default, JMP assumes that every table on each page is its own entity.

What I can do to tell JMP that this is just a continuation is to go to the table here on the page, click the red triangle, and say that the number of rows to use as a header is actually zero. That tells JMP that this is a continuation of the previous table. We'll check the data here, and it looks like it got it now. I check here at the bottom and notice, "Oh, I missed that October data. That's okay. I do a quick little stretch there and, boom, I've got it." You can manipulate the tables: if the wizard didn't catch something, you can stretch and adjust the table. You can also add tables it didn't find.

In this case, it missed this one. That's okay. I drag a box around it. Boom, there's a new table for you, JMP. I go in here; it assumes there are some header rows, but actually there are none. Okay, great. Now it's captured that part of the data. There's a bit of an empty cell here; that's just a formatting error because this is technically two lines, so they didn't put this at the top. It's an easy fix on the back end. Now for these tables, what we notice is that JMP thinks this is all one table. Unfortunately, that's technically not correct, because there are two tables. But it's an easy fix. I can simply go to each one and say, "It's actually not a continuation, JMP. This one has its own header."

It says, "Okay," and you can do that for each of these tables. I won't do it for all of them; I'm just doing this to illustrate. We'd probably end up with a bunch of tables here that we'd have to horizontally concatenate. That's just the way they decided to format the report. But JMP has a lot of tools to help with concatenating and putting tables together. You can see this is a lot easier than trying to copy and paste this into JMP and making sure the formatting is all good. JMP does a lot of that for us.
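Outside of JMP, the same extraction can be sketched in code. Here is a minimal Python version assuming the pdfplumber library and a hypothetical local filename; it glosses over the header and continuation handling that the wizard lets you do interactively.

    import pandas as pd
    import pdfplumber

    rows = []
    # Hypothetical filename; the real reports come from the CA DMV site.
    with pdfplumber.open("waymo_disengagement_report_2017.pdf") as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                rows.extend(table)  # each table is a list of rows of cell strings

    # Treat the first row as the header and every later table as a continuation,
    # the equivalent of setting "number of rows to use as header" to zero.
    df = pd.DataFrame(rows[1:], columns=rows[0])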

Okay, another helpful tool, which came out recently in JMP 17, is one you've probably heard of: the JMP Workflow Builder. That was super helpful because obviously we have multiple reports over multiple years. We'd like to combine all the years into two tables: one with the disengagement events, one with the mileage. What we did is we followed some steps to set up each table in a way that we could then concatenate them together into one table, and then we saved those steps as a workflow.

That's what I have here. This is the workflow we put together in the Workflow Builder, and I'm going to demonstrate it using this data set. This one is for the mileage. What we have here is a table representing the raw output from one of the reports. We of course have it broken down by VIN, and we've got a lot of information here. We'd like to reformat this table. I'm just going to walk through each step without showing too many details; it's pretty self-explanatory.

The first step changes the name of this column to Vehicle so that it matches a column in our concatenated table. Then I delete this Total column; I don't need that. Then I do a stack across all the dates. You can see I've got that here; we conveniently called it "stacked table," very informative. Now, one thing I need to do here: I put a pause here, that's the little stop sign. That's because I would usually need to go in and change the year.

Now, I couldn't really figure out a way to define a variable, say year, that you could fill out once and have it automatically filled in here. That's maybe something I can take to community.jmp.com, go on the wish list, and say, "Hey, it'd be nice if I could do this." For right now, I just put in the years, which was still pretty easy compared to doing this manually multiple times. I can also point out that you can actually go in and adjust the script itself; you can tailor this to your needs. What this step is going to do is recode these values so they show the month and the year. I'll do that real quick. There we are.

The next step takes this column, which right now is a category, a string, and converts it to a number. Now, this isn't pretty. This is just the number of seconds since January 1, 1904, JMP's date origin. Obviously, that's not pretty; I'd like to show something more informative. That's what I do in the next step. Now it shows the month and the year. I'm going to stop here because at this point I'd have another table open. The next step would concatenate the tables and then close the intermediate tables. What I'm going to do is reset things. I'll reset, click here, and reopen this table, just so you can see how fast this goes.

Here I'm going to click over here, click Play, and click Play again. Look how fast that was. Now, imagine doing this for multiple reports, and how much faster that is than repeating the same steps over and over again. This workflow was really helpful in this situation.
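For comparison, the steps this workflow replays could also be written as a short script. Here is a sketch in Python, where the raw column names ("VIN Number", "Total") and the month-name format are assumptions about the report layout.

    import pandas as pd

    def tidy_mileage(raw: pd.DataFrame, year: int) -> pd.DataFrame:
        df = raw.rename(columns={"VIN Number": "Vehicle"})  # match the combined table
        df = df.drop(columns=["Total"])                     # drop the total column
        # Stack across the month columns into a long table.
        df = df.melt(id_vars="Vehicle", var_name="Month",
                     value_name="Autonomous Miles")
        # Attach the year and convert the label to a real date.
        df["Month"] = pd.to_datetime(df["Month"] + f" {year}", format="%B %Y")
        return df

    # Final step: concatenate the per-year tables into one.
    # combined = pd.concat(tidy_mileage(raw, year) for raw, year in yearly_reports)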

Now, I'm going to close all these out because it's time to get into the analysis. Let's do that. I'm going to start with the aggregate-level data. Here's my table, compiled across all the time periods. We have the month, we have how many disengagements happened in that month, I've got a column for the cumulative total, and I've got how many autonomous miles were driven. I've also got two columns here that I'm going to talk about in just a second. You'll have to wait.

What I'm going to do is go to Analyze, then Reliability and Survival, and scroll all the way down until I reach Reliability Growth. I'll click that. Now we have multiple tabs here. I'm only going to focus on the first two, because the last two concern having multiple systems. I'll revisit those when we get to the actual vehicle information. For right now, let's pretend that we're looking at the whole AV system, the artificial intelligence system in these vehicles, as one big system.

There are two ways that I can assess this. One, I can treat it as time to event: essentially, how many months (or days, if we had that) until a certain event happened. Or I can do it via a particular timestamp: what was the time at which it occurred? I have that type of formatted data at the month level. The month is a fine timestamp; it just says that in that month I had this many events happen. That's all I need to put in. I have all the data I need, so I'll click OK.

Now, a great thing about this platform: before you do any analysis, you should, of course, look at your data, visualize your data. It's nice because the first thing the platform does is visualize the data for you. Let's look at it. We're looking at cumulative events over time. What we expect is a behavior where early on we might have what I'll call a burn-in type period, where I have a lot of events happening while I'm tweaking the system, helping fix it, helping improve it.

Then ultimately, what I'd like to see is a plateau. I'd like it to increase and then flatten off. That tells me that my number of incidents is decreasing. If it goes completely flat, that's great: I have no more incidents whatsoever. I wish the data were like that; they are not. But we can see patterns here in the data. Let's walk through them. We have a burn-in period here, early 2015, and then about mid-2016 we flatten off until about here. We see a little blip about summer of 2016; something happens and we get a few more incidents. We level off again until we get to about here, about late spring of 2018.

Something else happened, because we start going up again, though not very steeply, and this stretch is a bit longer. Then at about here we almost flatten out. We've reached a period where essentially no incidents are happening, until the end of 2020. Then something happens in 2021, and we've essentially reached another burn-in period; something's going on. Essentially what we've got is four phases, if you will, in the growth of the system. Something changed two or three times in ways that impacted the reliability.

Another way to visualize this: I'll run this plot, which uses the same data. I'm plotting again the cumulative total, and I'm also plotting something I call the empirical mean time between failures. It's a very simple metric to compute: it's just the inverse of the number of disengagements in each month. It is a very ad hoc, naive way to estimate the mean time between incidents. But I plotted it here so that you can see these four peaks that correspond to the bends in the curve, indicating the four places where something changed in the system to affect its reliability.
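As a sketch, that naive metric is a one-liner; the monthly counts below are made up, and months with zero events are left undefined rather than infinite.

    import numpy as np
    import pandas as pd

    agg = pd.DataFrame({"Disengagements": [12, 5, 0, 2]})  # toy monthly counts
    agg["Cumulative Total"] = agg["Disengagements"].cumsum()
    # k events in a month implies an average gap of roughly 1/k months.
    agg["Empirical MTBF (months)"] = 1.0 / agg["Disengagements"].replace(0, np.nan)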

What we can do then is try to figure out: what are those breakpoints? One way you could do that is with a certain model the Reliability Growth platform can fit. I'll pause here to talk about the models a bit. All of these are actually the same model with slight modifications. They're all what we call a non-homogeneous Poisson process. That is a fancy way to describe a counting process: I'm counting something, but the rate at which the counts occur per unit time is changing over time. A homogeneous Poisson process just means the rate is constant, so incidents occur at the same rate throughout; that would be equivalent to seeing a straight line in the cumulative plot.
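In symbols, a minimal sketch of that model family (the power-law form is the classic Crow-AMSAA model behind reliability growth fits; I'm writing it generically rather than as the platform's exact parameterization):

    N(t) \sim \mathrm{Poisson}\big(\Lambda(t)\big), \qquad \Lambda(t) = \int_0^t \lambda(u)\,du

    \text{Homogeneous: } \lambda(u) \equiv \lambda \;\Rightarrow\; \Lambda(t) = \lambda t \quad \text{(a straight line)}

    \text{Power law: } \lambda(t) = \lambda \beta t^{\beta-1}, \quad \Lambda(t) = \lambda t^{\beta}

With beta less than 1, the intensity falls over time and the instantaneous MTBF, 1/lambda(t), grows; beta equal to 1 recovers the constant-rate case.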

A constant rate is very easy to model, but it's bad for reality, because obviously we don't want the rate to stay the same; we would like it to decrease to essentially zero. That's why we use a non-homogeneous Poisson process: we want the rate to change over time. Here we have a model where we can let JMP try to figure out a change point in the process. If I run it, it's actually going to catch this big piece and say, "Hey, something really changed there. For most of it, it was the same process, but after this point, it really changed." Now, it's only going to find one change point at a time. I have talked to the developer about how nice it would be if we could identify multiple change points.

Apparently that's a bit of an open research problem, so he and I might be working together to try and figure that out. But what I did is I essentially eyeballed it and said, "I think there are about three or four phases," and I assigned them empirically, which is where you get this phase column. I'm going to run that script. Let me show you how I did it. I come here under Redo, go to Relaunch, and all I did was add the phase column here. This says there are different periods where the reliability might have changed in some significant way.

With that in mind, we're going to look at the key metric here, the mean time between failures. This is in months, so this here is about three days, this is about four or five days to about a week, and this is about a day, a day and a half. Early on, we have a bit of a low time; incidents are pretty frequent. We can also look here; I'll show you the intensity plot. That might be another thing to interpret. What we're looking for is a long mean time between failures. We'd like a long time between incidents, ideally infinite, meaning nothing ever happens, and we'd like the intensity to decrease.

What we see here is that we get a bit of a good start. About the middle of 2016, we're doing really well; in fact, we get to about a week between incidents across the whole fleet. There was a bit of a blip, but we mostly get back to where we were, until we get to the end of 2021, where it's now essentially about a day between incidents for any vehicle. Something big happened here at the end of 2020 with these vehicles, with this software system, if you will.

Again, you can see it with the intensity: you can almost fit one curve, and we get down to about six or seven incidents per month, whereas here it's almost 30, essentially once a day. We've been able to look at this and discover what's going on, at least at the aggregate level. Before we get to the vehicle level, I'm going to run one more graph. We've got all these autonomous miles. Could it be that if I drive more often, I encounter more incidents? Could that have an effect? There's a quick way to assess that using a simple graph. We'll just plot autonomous miles versus the total disengagements.

We see here that for a small number of disengagements, that might be true: the more you drive, the more you might see. But in general, long term, not really. There's no strong correlation between how many autonomous miles were driven and how many disengagements you see. There's something else going on. That's actually what we found in the paper I mentioned earlier: the mileage impact was very minimal.
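A quick numerical version of that eyeball check, assuming the aggregate table carries monthly miles and counts under these illustrative column names:

    # Pearson correlation between monthly autonomous miles and disengagement
    # counts; a value near zero matches the weak long-term pattern in the graph.
    r = agg["Autonomous Miles"].corr(agg["Disengagements"])
    print(f"Pearson r = {r:.2f}")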

Now, let's zoom in to the individual vehicle level. We're not going to have complete data for all of the vehicles, even though I actually do have it all here. Let me break it down. I have the month and the vehicle identification number; notice some of these are only partial. I have here what I call a VIN series. This is very empirical: I'm just taking the first four characters of the VIN. You'll see here as I scroll down a bit. Let me drag down a little. There we go.

A lot of these VINs actually start with the same four characters, 2C4R, so I'll call them the 2C4 series. There's a bunch of vehicles that have this as their start. This identifies a particular fleet of vehicles, at least from an empirical view. If I scroll down, we run into a different series, which I'm going to call the SADH series. This is the one that was introduced about 2021; that's when I saw the VINs change to the SADH designation.

Again, I have how many miles were driven, and I have a starting month: when did that vehicle start? I'm going to use this to compute the times to events. First, I'm going to do a plot. I think this is the most informative plot you'll see in this analysis. What I've done here is essentially create a heat map; you can see I've got the heat map option selected. For each vehicle and each month I have a cell, which indicates: was this vehicle driven in autonomous mode at any time that month?

I have it color-coded by the series. These vertical lines correspond to the transitions between those empirical phases I mentioned earlier. What this tells us is, basically, can we identify what might have caused those transitions? Here we see an initial series of vehicles, and it looks like there wasn't a big change in what vehicles were introduced here. Maybe there was a bit of a software upgrade for this particular series that introduced those new incidents. Here we see that a new series was introduced, a smaller number of vehicles, maybe a pilot series. Then a bunch of them were introduced about the same period where we saw the other transition.

That seems to correspond to a new fleet of vehicles with maybe a slightly updated version of the software. Then here we see a clear distinction: obviously, in 2021, a completely new series of vehicles was introduced. We have a bit of the old vehicles still there in the mixture, but most of them are the new vehicles. That probably explains why we got a batch of new incidents: a burn-in period for this new series of vehicles. This is cool because now we have a bit more explanation of what was going on with the aggregate data, which is why it's important to have this information.
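The activity matrix behind a heat map like this is easy to build. A sketch, assuming a long-format mileage table with VIN, Month, and Autonomous Miles columns:

    # One cell per vehicle-month: 1 if driven in autonomous mode, else 0.
    active = (mileage_long
              .assign(Active=lambda d: (d["Autonomous Miles"] > 0).astype(int))
              .pivot(index="VIN", columns="Month", values="Active")
              .fillna(0))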

Now let's break it down by VIN. I have a script right here; we've got a table similar to the one I had previously. Notice some of these rows have been excluded. That's because if the total number of incidents for a particular vehicle was less than three, the platform is not going to be able to fit a model; it needs at least three incidents per vehicle. That makes sense: if I have only one or two, that's not really enough information to assess the reliability. If I have three or more, now we're talking; I can do something.
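That exclusion rule is simple to apply up front. A sketch, assuming a long-format event table with one row per incident and a VIN column:

    # Keep only vehicles with at least three recorded incidents, matching
    # the platform's minimum for fitting a per-system model.
    counts = events.groupby("VIN").size()
    events_fit = events[events["VIN"].isin(counts[counts >= 3].index)]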

I also have the months since the start, some cumulative information, and which month it started. I'm going to go ahead and run the platform. Don't worry, I'll show you how I did this. I go to Redo, Relaunch, and get rid of that leftover bit. I'm looking at one of the two last tabs; these are about multiple systems. Now we're thinking of each vehicle as its own system. "Concurrent" just means I would run each vehicle one after the other.

That's not what's happening here: the vehicles are essentially being run in parallel, with multiple vehicles driven at a time. Here I need a column for the time to event, in this case how many months since the start until this many events happened, and I have the system ID. The one thing not shown here is that I actually took the VIN series and used it as a By variable, which is why we have the "where VIN series equals so-and-so." There are only going to be two reports, because the series with the little asterisk has no incident information; that's from earlier, prior to being able to tie the VIN to the incident.

But they're there for completeness; I've cancelled out of that. Now, what we see here is a list of models that you could run. The first four I'm not going to be able to use, because I only have one phase; essentially, the phase now corresponds to the VIN series. But there are two that I can run, and the only difference is whether I want to fit one model for all of the vehicles, saying they're all identical systems. That makes sense: these are all vehicles, they probably run the same software, so maybe I can fit one model for all of them. Or I can fit a model to each individual vehicle.

Before I run those models, let's take a look at this plot. Again, start with the visualization. What we see plotted is all the vehicles. Notice there's a bit of a shallow slope here: essentially a bit of a steep curve at first, but then it levels off pretty quickly. This is a pretty good sign for reliability. I'm going to compare them; I'm going to scroll down to the next one, the SADH series. Now, just so you know, when you run it the axes initially only cover that set of data, so this would actually be a smaller range.

I fixed the axes so they have the same range, and you can clearly see that this one is much steeper. Clearly, we have more incidents happening with this new series than the old one. But we can do a quick model fit. I'm going to fit the identical-system model. Again, it's a non-homogeneous Poisson process. I'm going to ignore the estimates for right now; if you want to look at them, you can. I'm going to go straight to the mean time between failures, and you'll notice that across all the months it's pretty much flat.

What it's essentially fit is just a homogeneous Poisson process: the rate is constant, which is good for modeling, if not so good for assessing improvement. It's saying that across this whole time, for this particular series, any one vehicle reached a mean time of about five months between incidents. Now, let's compare that to what we saw with the aggregate, where it was about a week; that's across all vehicles. The aggregate says it was about a week between incidents fleet-wide, whereas this seems to imply about five months for any one vehicle.
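Those two figures reconcile with a little arithmetic: if each of n vehicles independently averages m months between events, the pooled fleet sees events at roughly n times the rate, so the aggregate gap is about m/n. With m = 5 months and an assumed active fleet of around n = 20 vehicles (an illustrative number, not one taken from the reports):

    \text{MTBF}_{\text{fleet}} \approx \frac{m}{n} = \frac{5\ \text{months}}{20} = 0.25\ \text{months} \approx 1\ \text{week}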

You can think of it as the vehicles running in parallel, staggered. For any one vehicle, it could be about five months; again, this is an average, and there's a lot of range in there. For one vehicle, that's a pretty long time, but in aggregate, they're staggered enough that it looks like about a week between incidents across the fleet. Those two numbers can be consistent like that. And this is still pretty good: about five months between disengagement events for that series. If we move to the SADH series and fit the same model, let me go here. There we see it clearly increasing; I'm going to hide that. If we look at this, early on we probably had about two months. That's a decent start. But we've dropped to less than a month, almost two to three weeks between incidents.

Clearly, there's a bit more work to do on this series. Again, it was just introduced, so this is probably more of a burn-in phase. If we get the 2022 data, we might start to see it level off like the previous series did; this curve might flip and become more level. That's about all I want to show for these model fits. I can show you some of the individual distinct systems, but there are a lot of systems here, so it gets crowded very quickly. There's a plot for each one, estimates for each one, and you can look at the mean time between failures for each one.

If there are particular vehicles you want to call out and see how they might differ, this is what you can do. You can see some increase, some decrease, but overall it's more or less flat. You can also look at intensity plots if you find those more interpretable than the mean time between failures; there are other metrics you can incorporate here. Okay, that's all I want to show for this platform. Now, of course, there's data I didn't include here. For example, we could break it down by cause.

For some of this data, the cause might be that I just needed to take over because the car was getting too close to the side of the road. Or maybe the car stopped at the stop sign, did what it was supposed to, started rolling, and some other driver blew through the stop sign coming the other way. In that case, it might not necessarily be a reliability hit: the car did what it was supposed to; somebody else didn't. It'd be interesting to break it down by that, and also by location. Maybe you get more incidents in the city than on the highway, something like that.

Real quick, we should look at the mileage impact here, too. Same story: for one or two incidents, sure, there might be a relationship, but overall it's flat. The mileage impact on the incident rate is minimal. Of course, this is just one of many platforms available in the reliability suite. You can see there's a ton of options, very flexible for helping assess reliability. That's all I have to show you. Hopefully, I've been able to demonstrate how well JMP can help initiate discovery and analysis, and hopefully you discovered a lot about this particular company's autonomous vehicles. I hope you enjoy the rest of the conference. Thank you.