
OALib Journal, ISSN: 2333-9721

2018

An Algorithm for Jamming Decision Using Dual Reinforcement Learning

DOI: 10.7652/xjtuxb201802010

Keywords: reinforcement learning, dual reinforcement learning, jamming decision, prior information, reward standard


Abstract:

To address the slow convergence of reinforcement learning in jamming decision-making, an algorithm for jamming decision using dual reinforcement learning (DRLJD) is proposed. First, the equivalent communication parameters are modeled; the model reduces the number of parameters to be learned and thus the dimension of the search space. The reduced search space then guides the selection of jamming parameters, avoiding the poor jamming performance caused by random selection. Finally, the selected parameters are used to apply jamming, and the environment feedback further reduces the dimension of the search space, so that continual interaction accelerates the algorithm's convergence. In addition, previous jamming experience is injected into the learning process as prior information, which further shortens the system's learning time. Experiments on constructed jamming problems show that DRLJD learns a good jamming strategy within 200 interactions, fewer than the 600 interactions required by existing algorithms, and that the use of prior information further reduces the required number of interactions. With the proposed new reward standard as the basis for reward, the algorithm can learn the optimal jamming strategy without knowledge of the communication protocol, at the cost of additional interaction time.
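The dual-layer idea described above can be illustrated with a toy hierarchical Q-learning sketch. Everything below — the style/power grids, the reward model, and the learning constants — is an illustrative assumption, not the paper's actual implementation: an upper layer learns over a small set of equivalent parameters (jamming styles), and a lower layer searches only within the reduced subspace selected by the upper layer, with both layers updated from the same environment feedback.

```python
import random

random.seed(0)

# Upper layer: "equivalent" jamming styles (the reduced search space).
STYLES = ["spot", "sweep", "barrage"]
# Lower layer: concrete power levels searched within the chosen style.
POWERS = [0, 1, 2]

def environment_reward(style, power):
    # Hypothetical link-degradation reward: in this toy environment,
    # barrage jamming at medium power degrades the unknown link the most.
    base = {"spot": 0.2, "sweep": 0.5, "barrage": 0.8}[style]
    return base - 0.1 * abs(power - 1)

q_style = {s: 0.0 for s in STYLES}                        # upper-layer Q table
q_power = {(s, p): 0.0 for s in STYLES for p in POWERS}   # lower-layer Q table

alpha, epsilon = 0.1, 0.2
for _ in range(400):
    # Upper layer picks an equivalent parameter (epsilon-greedy).
    s = (random.choice(STYLES) if random.random() < epsilon
         else max(STYLES, key=q_style.get))
    # Lower layer searches only the subspace under that style.
    p = (random.choice(POWERS) if random.random() < epsilon
         else max(POWERS, key=lambda x: q_power[(s, x)]))
    r = environment_reward(s, p)
    # Both layers learn from the same environment feedback.
    q_style[s] += alpha * (r - q_style[s])
    q_power[(s, p)] += alpha * (r - q_power[(s, p)])

best_style = max(STYLES, key=q_style.get)
print(best_style)
```

Prior information, in this sketch, would amount to initializing `q_style` from past jamming experience rather than from zeros, so the upper layer starts closer to the best region of the search space and needs fewer interactions.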

