Journal of Northeastern University (Natural Science) (东北大学学报:自然科学版), 2015

Policy Iteration Algorithm for Nonzero-Sum Games with Unknown Models

DOI: 10.3969/j.issn.1005-3026.2015.03.004, PP. 318-322

Yang Ming (杨明), Luo Yanhong (罗艳红), Wang Yihe (王义贺)

Keywords: adaptive dynamic programming, nonzero-sum game, policy iteration, neural network, optimal control


Abstract:

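An online integral policy iteration algorithm is proposed for solving two-player nonzero-sum games whose internal nonlinear dynamics are unknown. By introducing probing signals into the control policy and the disturbance policy, the algorithm avoids any need for system model information, which yields a model-free approximate dynamic programming algorithm for nonzero-sum games. The algorithm updates the value functions, the control policy, and the disturbance policy simultaneously, and ultimately obtains convergent policy weights. In the implementation, four neural networks approximate the two value functions, the control policy, and the disturbance policy, respectively, and the least-squares method is used to estimate the unknown neural network parameters. Simulation results verify the effectiveness of the algorithm.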
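To make the least-squares step concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a much simpler setting: a single controller (no disturbance player), a scalar linear system dx/dt = a*x + b*u, a fixed policy u = -k*x, and a one-term critic V(x) ≈ w*x². The weight w is fitted from the integral Bellman relation w*(x(t)² - x(t+T)²) = ∫ cost dτ using sampled trajectory data only; the estimator never reads a or b. All names and numeric values (`simulate`, `estimate_critic_weight`, `A`, `B`, `K`, `Q`, `R`) are hypothetical and chosen for illustration.

```python
import math

def simulate(a, b, k, x0, dt, n_steps):
    """Sample x(t) for dx/dt = a*x + b*u under the fixed policy u = -k*x."""
    # Exact solution of the scalar closed loop; it stands in for measured data.
    return [x0 * math.exp((a - b * k) * dt * n) for n in range(n_steps + 1)]

def estimate_critic_weight(x, k, q, r, dt, steps):
    """Least-squares fit of w in V(x) ~ w * x**2 using trajectory samples only."""
    cost = [q * xi**2 + r * (k * xi)**2 for xi in x]   # running cost samples
    num = den = 0.0
    for i in range(0, len(x) - steps, steps):
        j = i + steps
        seg = cost[i:j + 1]
        integral = dt * (sum(seg) - 0.5 * (seg[0] + seg[-1]))  # trapezoidal rule
        d_phi = x[i]**2 - x[j]**2        # phi(x(t)) - phi(x(t+T)), phi(x) = x**2
        num += d_phi * integral          # accumulate normal-equation terms
        den += d_phi * d_phi
    return num / den                     # one-parameter least-squares solution

A, B, K, Q, R = 1.0, 1.0, 3.0, 1.0, 1.0   # illustrative values, not from the paper
DT = 1e-3
x = simulate(A, B, K, x0=2.0, dt=DT, n_steps=2000)
w_hat = estimate_critic_weight(x, K, Q, R, dt=DT, steps=100)
w_true = (Q + R * K**2) / (2.0 * (B * K - A))   # closed-form value weight
print(f"estimated w = {w_hat:.4f}, closed-form w = {w_true:.4f}")
```

In this toy case the estimate can be checked against the closed-form weight because the system is linear and known to the simulator. The setting described in the abstract is harder: it involves two coupled value functions, probing signals added to both policies, and neural network approximators with many weights, so the least-squares problem there is multi-dimensional.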
