Acta Automatica Sinica, 2012

Expectation-maximization Policy Search with Parameter-based Exploration

CHENG Yu-Hu, FENG Huan-Ting, WANG Xue-Song
程玉虎,冯涣婷,王雪松

Keywords: policy search, reinforcement learning, parameter space, exploration, expectation-maximization (EM), importance sampling

Abstract:

To reduce the large variance in gradient estimation caused by stochastic exploration strategies, an expectation-maximization (EM) policy search reinforcement learning method with parameter-based exploration is proposed. First, a policy is defined by a probability distribution over the parameters of a controller. Second, samples are collected by drawing controller parameters directly from this distribution several times. Because the actions selected within each episode are deterministic once the parameters have been drawn, the collected samples have small variance, which in turn reduces the variance of the gradient estimate. Finally, based on the collected samples, the policy parameters are updated iteratively by maximizing a lower bound on the expected return. To reduce time consumption and sampling cost, importance sampling is used to reuse samples collected during the policy update process. Simulation results on two continuous-space control problems show that, compared with several policy search reinforcement learning methods that use action-based stochastic exploration, the proposed method both finds a better policy and converges faster, and thus achieves better learning performance.
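The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a diagonal Gaussian over controller parameters, a reward-weighted maximum-likelihood update as the EM step, and importance weights for reusing the previous batch of samples. The toy plant, the cost, and the names `rollout`, `em_update`, and `train` are all hypothetical and chosen for illustration.

```python
import numpy as np

def rollout(theta, steps=20):
    """One deterministic episode on a toy 1-D plant (an assumption).

    The controller u = -theta[0] * x is deterministic; all exploration
    comes from how theta was drawn, not from noise on the actions.
    """
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -theta[0] * x
        cost += x * x + 0.01 * u * u
        x = x + 0.1 * u
    return np.exp(-cost / steps)  # positive return, as the weighted update requires

def log_gauss(thetas, mu, sigma):
    """Log-density of each row of thetas under a diagonal Gaussian."""
    z = (thetas - mu) / sigma
    return np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=1)

def em_update(thetas, returns, log_q, mu, sigma, min_sigma=1e-3):
    """Maximize the sample estimate of sum_i w_i * log p(theta_i | mu', sigma').

    log_q holds the log-density each sample had when it was drawn, so
    reused samples are reweighted by p_current / p_sampling.
    """
    iw = np.exp(log_gauss(thetas, mu, sigma) - log_q)  # importance weights
    w = returns * iw
    w = w / w.sum()
    new_mu = w @ thetas
    new_sigma = np.sqrt(w @ (thetas - new_mu) ** 2)
    return new_mu, np.maximum(new_sigma, min_sigma)

def train(iterations=30, n_new=10, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.array([0.0]), np.array([2.0])
    thetas = np.empty((0, 1))
    returns = np.empty(0)
    log_q = np.empty(0)
    history = []
    for _ in range(iterations):
        # Parameter-based exploration: sample controller parameters directly.
        new = mu + sigma * rng.standard_normal((n_new, mu.size))
        new_returns = np.array([rollout(t) for t in new])
        # Keep the previous batch and append the fresh one.
        thetas = np.vstack([thetas[-n_new:], new])
        returns = np.concatenate([returns[-n_new:], new_returns])
        log_q = np.concatenate([log_q[-n_new:], log_gauss(new, mu, sigma)])
        mu, sigma = em_update(thetas, returns, log_q, mu, sigma)
        history.append(rollout(mu))
    return mu, sigma, history

if __name__ == "__main__":
    mu, sigma, history = train()
    print("mean gain:", mu, "std:", sigma, "return at mean:", history[-1])
```

In this sketch each iteration runs only `n_new` fresh episodes, while the update uses twice that many samples because the previous batch is reweighted rather than discarded. Whether the paper reuses samples over a longer window, or uses a different distribution family or lower bound, cannot be determined from the abstract.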
