Acta Armamentarii
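cooperative paths that satisfy time cooperation and collision avoidance,
and can avoid obstacles left unexplored during environment modeling; compared with the A* algorithm, the new algorithm achieves higher
solution efficiency for online application problems.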
Keywords: path planning; Q-learning; time cooperation; collision avoidance
CLC number:    Document code: A
DOI:.0606
Reinforcement Learning-based Multi-UAVs Path Planning Method
YIN Yiyi1,2,WANG Xiaofang1,ZHOU Jian3
( of Aerospace Engineering,Beijing Institute of Technology,Beijing 100081,China;
Institute of Electronic System Engineering,Beijing 100854,China;
’an Modern Control Technology Research Institute,Xi’an 710065,Shaanxi,China )
Abstract: To solve the time-cooperative path planning problem for multiple UAVs, a battlefield model and a
Markov model of single-UAV path planning are established, and the optimal path is calculated using the
Q-learning algorithm. The Q-table obtained from Q-learning is used to calculate the
shortest path of each UAV and the cooperative range, and time-cooperative paths are obtained by adjusting the
action selection strategy of the orbiting UAVs. To address collision avoidance among multiple UAVs, a
local area is determined by designing backward parameters, and, based on deep reinforcement learning theory,
a neural network replaces the Q-table to re-plan the local path of each UAV, which avoids the problem of
dimensional explosion. For previously unexplored obstacles, an obstacle matrix is designed based on the idea
of the artificial potential field, and it is superimposed on the original Q-table to achieve collision avoidance
with the unexplored obstacles. Simulation results verify that the propo