%0 Journal Article
%T 随机线性二次控制的资格迹方法
Eligibility Trace Method for Stochastic Linear Quadratic Control
%A 朱亚楠
%J Pure Mathematics
%P 416-432
%@ 2160-7605
%D 2024
%I Hans Publishing
%R 10.12677/PM.2024.141041
%X 本文研究了强化学习方法在线性二次控制问题(LQR)中的应用。在LQR问题的研究中,常见的方法通过求解代数黎卡提方程得到最优控制,并不直接优化控制增益。本文在策略梯度算法的基础上引入资格迹方法,直接优化控制增益矩阵。考虑已知和未知参数两种情况下,资格迹方法的收敛。在有限时域和高斯噪声的条件下,分别给出了已知和未知参数两种情况下算法的全局收敛保证。参数未知时,利用零阶优化定理近似梯度项,这可以将问题扩展至代价函数非凸的情况。数值模拟结果显示资格迹方法与梯度下降算法相比更快收敛,方差更小。
This paper studies the application of reinforcement learning methods to the linear quadratic regulator (LQR) problem. The usual approach to the LQR problem obtains the optimal control by solving the algebraic Riccati equation and does not optimize the control gain directly. This paper optimizes the control gain matrix directly: it proposes an eligibility trace method built on the gradient descent algorithm and, under a finite time horizon and Gaussian noise, establishes global convergence guarantees for both the known-parameter and the unknown-parameter case. When the parameters are unknown, the zero-order optimization theorem is used to approximate the gradient term, which allows the problem to be extended to cases where the cost function is non-convex. Numerical simulation results show that the eligibility trace method converges faster and with smaller variance than the gradient descent algorithm.
%K 线性二次最优控制,梯度下降,资格迹
Linear Quadratic Optimal Control
%K Gradient Descent
%K Eligibility Traces
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=80335