
Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models

DOI: 10.3969/j.issn.1005-3026.2015.03.004, PP. 318-322

Keywords: adaptive dynamic programming, nonzero-sum game, policy iteration, neural network, optimal control


Abstract:

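An online integral policy iteration algorithm is proposed for solving two-player nonzero-sum games whose internal nonlinear dynamics are unknown. By introducing probing signals into the control policy and the disturbance policy, the algorithm avoids any need for system model information, yielding a model-free approximate dynamic programming algorithm for nonzero-sum games. The algorithm updates the value functions, the control policy, and the disturbance policy simultaneously, and ultimately obtains convergent policy weights. In the implementation, four neural networks approximate the two value functions, the control policy, and the disturbance policy, respectively, and the least-squares method is used to estimate the unknown neural-network parameters. Finally, simulation results verify the effectiveness of the algorithm.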
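To illustrate the least-squares step mentioned in the abstract, the following is a minimal sketch, not the paper's algorithm. It assumes a much simpler setting: a single player, a scalar linear system dx/dt = a*x + b*u, a fixed policy u = -k*x, no probing signal, and a one-term value approximator V(x) ≈ w*x². All function names and parameter values are hypothetical. The point it shows is that the weight w can be fitted from measured trajectory data alone through the integral Bellman equation, without the estimator ever reading a or b.

```python
import numpy as np

def simulate_closed_loop(a, b, k, x0, dt, n_steps):
    """Simulator only: state samples of dx/dt = a*x + b*u under u = -k*x."""
    t = np.arange(n_steps + 1) * dt
    return x0 * np.exp((a - b * k) * t)

def estimate_value_weight(x, k, q, r, dt, steps_per_interval):
    """Least-squares fit of w in V(x) ~= w * x**2 (assumed basis: x**2).

    Uses the integral Bellman equation on each interval [t, t+T]:
        V(x(t)) - V(x(t+T)) = integral of (q*x^2 + r*u^2) over [t, t+T]
    Only measured states and inputs appear; a and b are never used here.
    """
    u = -k * x
    cost = q * x**2 + r * u**2
    rows, targets = [], []
    for start in range(0, len(x) - steps_per_interval, steps_per_interval):
        end = start + steps_per_interval
        seg = cost[start:end + 1]
        # Trapezoid rule for the running cost over one interval.
        integral = np.sum((seg[1:] + seg[:-1]) * 0.5 * dt)
        rows.append([x[start]**2 - x[end]**2])
        targets.append(integral)
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return float(w[0])

# Hypothetical example values, chosen so the closed loop is stable.
a, b, k, q, r = -1.0, 1.0, 0.5, 1.0, 1.0
x = simulate_closed_loop(a, b, k, x0=2.0, dt=1e-3, n_steps=5000)
w_hat = estimate_value_weight(x, k, q, r, dt=1e-3, steps_per_interval=500)

# Model-based reference value, used here only to check the estimate.
w_true = (q + r * k**2) / (2.0 * (b * k - a))
print(w_hat, w_true)
```

In the setting the abstract describes, the same kind of regression would presumably be posed over the weights of all four networks at once, with probing signals added to both policies to keep the data sufficiently rich; that extension is not shown here.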

