Researchers from MIT, Stanford University, and the University of Pennsylvania have devised a method for efficiently estimating the failure rates of safety-critical machine learning systems. Safety-critical machine learning systems make decisions for automated technology like self-driving cars, robotic surgery, pacemakers, and autonomous flight systems for helicopters and planes. Unlike AI that helps you write an email or recommends a song, safety-critical system failures can result in serious injury or death. Problems with such machine learning systems can also cause financially costly events like SpaceX missing its landing pad.
Researchers say their neural bridge sampling method gives regulators, academics, and industry experts a common reference for discussing the risks associated with deploying complex machine learning systems in safety-critical environments. In a paper titled “Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems,” recently published on arXiv, the authors assert their approach can satisfy both the public’s right to know that a system has been rigorously tested and an organization’s desire to treat AI models like trade secrets. In fact, some AI startups and Big Tech companies refuse to grant access to raw models for testing and verification out of fear that such inspections could reveal proprietary information.
“They don’t want to tell you what’s inside the black box, so we need to be able to look at these systems from afar without sort of dissecting them,” co-lead author Matthew O’Kelly told VentureBeat in a phone interview. “And so one of the benefits of the methods that we’re proposing is that essentially somebody can send you a scrambled description of that generated model, give you a bunch of distributions, and you draw from them, then send back the search space and the scores. They don’t tell you what actually happened during the rollout.”
Safety-critical systems have failure rates so low that they can be tough to compute, and the better the systems get, the harder those rates are to estimate, O'Kelly said. To produce a predicted failure rate, the method uses a novel Markov chain Monte Carlo (MCMC) scheme to identify regions of the input distribution believed to lie near a failure event.
“Then you continue this process and you build what we call this ladder toward the failure regions. You keep getting worse and worse and worse as you play against the Tesla autopilot algorithm or the pacemaker algorithm to keep pushing it toward the failures that are worse and worse,” said co-lead author Aman Sinha.
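The "ladder" idea the authors describe can be illustrated with a classical rare-event technique in the same family, multilevel splitting: repeatedly tighten an intermediate failure level, keep the worst-performing samples, and diffuse them with MCMC moves conditioned on staying above that level. The sketch below is illustrative only and is not the paper's neural bridge sampler, which additionally uses learned transformations and bridge estimates; the function names and parameters here are hypothetical.

```python
import numpy as np

def failure_probability(score, threshold, n=2000, keep=0.25,
                        n_mcmc=20, step=0.5, seed=0):
    """Toy multilevel-splitting estimate of P(score(X) >= threshold)
    for X ~ N(0, 1): build a ladder of intermediate levels toward the
    failure region, keep the worst `keep` fraction of samples at each
    rung, and diffuse them with Metropolis moves above the rung."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)          # draws from the search space
    s = score(x)
    log_p = 0.0
    for _ in range(200):                # safety cap on ladder rungs
        level = min(np.quantile(s, 1.0 - keep), threshold)
        log_p += np.log((s >= level).mean())
        if level >= threshold:          # ladder reached the failure region
            break
        idx = rng.choice(np.flatnonzero(s >= level), size=n)
        x, s = x[idx], s[idx]           # resample the survivors
        for _ in range(n_mcmc):         # Metropolis moves kept above `level`
            prop = x + step * rng.standard_normal(n)
            sp = score(prop)
            # accept with the N(0, 1) prior ratio, only if still above level
            ok = (sp >= level) & (rng.random(n)
                                  < np.exp(0.5 * (x**2 - prop**2)))
            x, s = np.where(ok, prop, x), np.where(ok, sp, s)
    return float(np.exp(log_p))

# Rare event: P(X >= 4) for X ~ N(0, 1) is roughly 3.2e-5 -- far too
# rare to estimate reliably from a few thousand direct Monte Carlo draws.
p_hat = failure_probability(score=lambda x: x, threshold=4.0)
```

The product of the per-rung survival fractions recovers the overall failure probability, which is why the scheme stays efficient even when a direct simulation would almost never see a failure.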
The neural bridge sampling method detailed in the paper draws on decades-old statistical techniques, as well as recent work published in part by O’Kelly and Sinha that uses a simulation testing framework to evaluate a black box autonomous vehicle system. In addition to the neural bridge contribution in the paper, the authors argue in favor of continued advances in privacy-conscious tech like federated learning and differential privacy and urge more researchers and people with technical knowledge to join regulatory conversations and help drive policy.
“We would like to see more statistics-driven, science-driven initiatives, in terms of regulation and policy around things like self-driving vehicles,” O’Kelly said. “We think that it’s just such a novel technology that information is going to need to flow pretty rapidly from the academic community to the businesses making the objects to the government that’s going to be responsible for regulating them.”
In other recent safety-critical systems news, autonomous shipping has grown during the COVID-19 pandemic, and last week a team of researchers detailed DuckieNet, a physical model for evaluating autonomous vehicle and robotics systems. Also last week: Medical experts introduced the first set of standards for reporting artificial intelligence use in medical clinical trials.