By Bridget Dean, Associate Editor
Artificial intelligence (AI) has been increasingly used in freight-rail transportation to increase efficiency, enhance safety and reduce human error. With any new technology, however, extensive testing is required prior to implementation.
Charles River Analytics (CRA) recently developed a framework for testing AI-human teaming.
CRA in July completed a one-year Federal Railroad Administration contract to develop a human-AI system testing and evaluation framework titled “Assessment for Better Operator-AI-centered Research and Development” (ABOARD).
As a research and development service provider for the U.S. government and commercial entities, CRA utilized its in-house AI research and partnered with industry experts, including a locomotive engineer. CRA worked with Volpe National Transportation Systems Center’s freight and passenger simulation environment to develop a framework for testing human-AI system interactions and ensuring systems meet human standards.
“There’s all sorts of different types of AI technology that [are] enabling things like remote train operations,” says Mandy Warren, senior scientist at CRA and lead on the ABOARD contract. “That’s a really good example.”
Warren’s first brush with the USDOT was in 2001 as an engineering psychologist. She worked on system integration, discovering ways to mitigate human error in transportation systems, she says. As AI became more prevalent in the transportation industry, her work pivoted to focus on human and AI interactions.
Systems like remote train operations incorporate computer vision algorithms, train control algorithms and decision support algorithms in their programming, Warren says.
Currently, computer vision algorithms are tested against a standard requirement for the system, Warren says. They aren’t always tested for how well they meet human standards or work in partnership with a human user until operational testing. If developers find an issue at that point with how the AI system interacts with human users, it is often too late to make significant changes to the algorithms, Warren says.
ABOARD aims to identify a way to incorporate the necessary human-AI testing into system evaluation early on, so that AI developers can ensure their systems work well with and compared to human users. ABOARD also aims to make the testing and evaluation process manageable for developers without needing someone like Warren permanently on staff, she says, since the cognitive and algorithmic testing can be a “very manual process.”
When testing how well an algorithm, such as a computer vision algorithm, works compared to humans, Warren typically begins with a simulation-based study, comparing how well humans can recognize different stimuli versus how well the computer can. Data collected from the simulation is then statistically evaluated. Even using advanced simulators, the programming required to collect the data is complex, she says.
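The article does not describe CRA's actual analysis code, but the comparison Warren outlines can be sketched in miniature: tally recognition accuracy for human and algorithm trials from a simulated detection study, then compare the two rates statistically. The sketch below is illustrative only; the trial data, sample sizes, and the choice of a two-proportion z-test are assumptions, not details from the source.

```python
import math

# Hypothetical trial outcomes from a simulated detection study
# (1 = stimulus correctly recognized, 0 = missed). All data is invented.
human_trials = [1] * 45 + [0] * 5    # 45 of 50 correct
model_trials = [1] * 40 + [0] * 10   # 40 of 50 correct

def accuracy(trials):
    """Fraction of trials recognized correctly."""
    return sum(trials) / len(trials)

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for comparing two success proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

h_acc = accuracy(human_trials)   # 0.90
m_acc = accuracy(model_trials)   # 0.80
z = two_proportion_z(h_acc, len(human_trials), m_acc, len(model_trials))
print(f"human={h_acc:.2f} model={m_acc:.2f} z={z:.2f} p={two_sided_p(z):.3f}")
```

With these invented numbers the gap is not statistically significant at conventional thresholds, which is exactly the kind of human-versus-algorithm judgment such a study is meant to support.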
Using the ABOARD framework should also ensure developers create effective ways for AI programs and users to interface.
“An algorithm is an algorithm. It’s going to spit information out, but how you frame that information to the person who’s going to be using it is going to determine whether or not they can use that information effectively,” Warren says.
In train yards, for example, optimization algorithms are often used to build trainsets. If the user interface for the optimization algorithm is poorly designed, users may not be able to utilize the AI tool to its full extent, leading to errors and miscommunication. It’s critical that the ABOARD framework provides developers with ways to test how usable an AI system will be for the end user, Warren says.
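To make the trainset-building example concrete: a yard optimization tool might, at its simplest, group railcars into blocks by destination while respecting a tonnage limit. The toy first-fit heuristic below is purely illustrative; real yard optimization engines are far more sophisticated, and every name, car ID, and limit here is invented.

```python
from collections import defaultdict

MAX_BLOCK_TONS = 300  # hypothetical per-block tonnage limit

# Invented railcar data: (car ID, destination, weight in tons)
cars = [
    ("UP1001", "Chicago", 120),
    ("UP1002", "Chicago", 110),
    ("UP1003", "Denver", 90),
    ("UP1004", "Chicago", 100),
    ("UP1005", "Denver", 130),
]

def build_blocks(cars, limit=MAX_BLOCK_TONS):
    """Group cars by destination, splitting into numbered blocks
    whenever the next car would push a block past the tonnage limit."""
    blocks = defaultdict(list)   # (destination, block index) -> car IDs
    tons = defaultdict(float)    # running tonnage per block
    for car_id, dest, weight in cars:
        i = 0
        while tons[(dest, i)] + weight > limit:
            i += 1  # first-fit: try the next block for this destination
        blocks[(dest, i)].append(car_id)
        tons[(dest, i)] += weight
    return dict(blocks)

print(build_blocks(cars))
```

Even in this toy, the interface question Warren raises is visible: the raw output is a dictionary of block assignments, and whether a yardmaster can act on it depends entirely on how that result is framed and presented.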
With the FRA contract now completed, it’s up to the rail industry to show interest in the ABOARD framework, as further development of the software depends on additional funding, Warren says.
However, CRA is working on a sister project for the U.S. Air Force, which is interested in broadening the scope of ABOARD for purposes beyond rail. While the FRA now retains general-purpose rights to the ABOARD framework, CRA still owns the intellectual property, meaning interested railroads and other parties could reach out to CRA.
“We could certainly start to have transition conversations about how this solution could benefit [interested parties’] AI development programs,” Warren says. “That was the [FRA]’s vision: that this would be a resource for railroad and railroad technology providers to be able to appropriately test the ‘human-in-the-loop’ when it comes to AI-enabled technology.”
Warren believes AI has great potential in the rail industry to decrease human decision-making errors that lead to safety concerns. Because of how critical safety is in the rail industry, a human will always need to be “in the loop” somewhere in railroad operations, she adds.
If the implementation of AI systems is done correctly, with the human end user considered, these algorithms will make railroad operations more efficient and safer, Warren believes. That’s what CRA is advocating for, she adds.