Multi-domain operations, the Army’s future operating concept, requires autonomous agents with learning components to operate alongside the warfighter. New Army research reduces the unpredictability of current methods for training reinforcement learning policies so that the policies are more practically applicable to physical systems, especially ground robots.
These learning components will permit autonomous agents to reason and adapt to changing battlefield conditions, said Army researcher Dr. Alec Koppel from the U.S. Army Combat Capabilities Development Command, now known as DEVCOM, Army Research Laboratory.
The underlying adaptation and re-planning mechanism consists of reinforcement learning-based policies. Making these policies efficiently obtainable is critical to making the MDO operating concept a reality, he said.
According to Koppel, policy gradient methods in reinforcement learning are the foundation for scalable algorithms for continuous spaces, but existing techniques cannot incorporate broader decision-making goals such as risk sensitivity, safety constraints, exploration and divergence to a prior.
Designing autonomous behaviors when the relationship between dynamics and goals is complex may be addressed with reinforcement learning, which has gained attention recently for solving previously intractable tasks, including strategy games such as Go and chess and video games such as Atari and StarCraft II, Koppel said.
Prevailing practice, unfortunately, demands astronomical sample complexity, such as thousands of years of simulated gameplay, he said. This sample complexity renders many common training mechanisms inapplicable to the data-starved settings required by the MDO context for the Next-Generation Combat Vehicle, or NGCV.
“To facilitate reinforcement learning for MDO and NGCV, training mechanisms must improve sample efficiency and reliability in continuous spaces,” Koppel said. “Through the generalization of existing policy search schemes to general utilities, we take a step towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning.”
Koppel and his research team developed new policy search schemes for general utilities and established their sample complexity. They observed that the resulting policy search schemes reduce the volatility of reward accumulation, yield efficient exploration of unknown domains and provide a mechanism for incorporating prior experience.
“This research contributes an augmentation of the classical Policy Gradient Theorem in reinforcement learning,” Koppel said. “It presents new policy search schemes for general utilities, whose sample complexity is also established. These innovations are impactful to the U.S. Army through their enabling of reinforcement learning objectives beyond the standard cumulative return, such as risk sensitivity, safety constraints, exploration and divergence to a prior.”
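To make "objectives beyond the standard cumulative return" concrete, here is a minimal sketch that is not taken from the research itself. It assumes one common formalization, in which each objective is a function of the discounted state occupancy of a policy: expected reward is linear in that occupancy, while exploration can be scored by its entropy and closeness to prior experience by a KL divergence. The toy three-state environment, the two policies and every function name below are hypothetical illustrations.

```python
import math

def state_occupancy(P, policy, mu0, gamma, horizon=500):
    """Approximate (1 - gamma) * sum_t gamma^t * Pr(s_t = s) by truncating at `horizon`."""
    n = len(mu0)
    dist = list(mu0)              # state distribution at the current time step
    occ = [0.0] * n
    weight = 1.0 - gamma
    for _ in range(horizon):
        for s in range(n):
            occ[s] += weight * dist[s]
        nxt = [0.0] * n
        for s in range(n):
            for a, prob_a in enumerate(policy[s]):
                for s2 in range(n):
                    nxt[s2] += dist[s] * prob_a * P[s][a][s2]
        dist = nxt
        weight *= gamma
    return occ

def expected_reward(occ, reward):
    """Standard objective: linear in the occupancy (normalized cumulative return)."""
    return sum(p * r for p, r in zip(occ, reward))

def entropy(occ):
    """Exploration objective: larger when visits are spread across states."""
    return -sum(p * math.log(p) for p in occ if p > 0.0)

def kl_to_prior(occ, prior):
    """Divergence of the occupancy from a prior visitation distribution."""
    return sum(p * math.log(p / q) for p, q in zip(occ, prior) if p > 0.0)

# Hypothetical toy MDP: action 0 stays put, action 1 moves to the next state (cyclic).
N = 3
P = [[[1.0 if s2 == s else 0.0 for s2 in range(N)],
      [1.0 if s2 == (s + 1) % N else 0.0 for s2 in range(N)]]
     for s in range(N)]
mu0 = [1.0, 0.0, 0.0]            # always start in state 0
gamma = 0.9
reward = [1.0, 0.0, 0.0]         # reward only in the start state
prior = [1.0 / N] * N            # assumed prior: uniform visitation

stay = [[1.0, 0.0] for _ in range(N)]      # never moves
explore = [[0.5, 0.5] for _ in range(N)]   # moves half the time

occ_stay = state_occupancy(P, stay, mu0, gamma)
occ_explore = state_occupancy(P, explore, mu0, gamma)

for name, occ in (("stay", occ_stay), ("explore", occ_explore)):
    print(name, expected_reward(occ, reward), entropy(occ), kl_to_prior(occ, prior))
```

In this toy setting the policy that never moves collects the most reward but visits a single state, while the randomized policy gives up some reward in exchange for higher entropy and a smaller divergence from the uniform prior. A general-utility objective lets a designer state which of these trade-offs matters.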
Notably, in the context of ground robots, he said, data is costly to acquire.
“Reducing the volatility of reward accumulation, ensuring one explores an unknown domain in an efficient manner, or incorporating prior experience, all contribute towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning by alleviating the amount of random sampling one requires in order to complete policy optimization,” Koppel said.
Koppel sees a bright future for this research and has dedicated his efforts to turning his findings into innovative technology for Soldiers on the battlefield.
“I am optimistic that reinforcement-learning equipped autonomous robots will be able to assist the warfighter in exploration, reconnaissance and risk assessment on the future battlefield,” Koppel said. “That this vision is made a reality is essential to what motivates which research problems I dedicate my efforts.”
The next step for this research is to incorporate the broader decision-making goals enabled by general utilities in reinforcement learning into multi-agent settings, and to investigate how interactions between reinforcement learning agents give rise to synergistic and antagonistic reasoning among teams.
According to Koppel, the technology that results from this research will be capable of reasoning under uncertainty in team scenarios.