%0 Journal Article
%T Expectation-maximization Policy Search with Parameter-based Exploration
%A CHENG Yu-Hu
%A FENG Huan-Ting
%A WANG Xue-Song
%A 程玉虎
%A 冯涣婷
%A 王雪松
%J 自动化学报 (Acta Automatica Sinica)
%D 2012
%X To reduce the large variance in gradient estimation that results from stochastic exploration strategies, an expectation-maximization (EM) policy search reinforcement learning method with parameter-based exploration is proposed. First, a policy is defined as a probability distribution over the parameters of a controller. Second, samples are collected by repeatedly drawing controller parameters directly from this distribution. Because the actions selected within each episode are then deterministic, the collected samples have small variance, which in turn reduces the variance of the gradient estimate. Finally, based on the collected samples, the policy parameters are updated iteratively by maximizing a lower bound on the expected return. To reduce computation time and sampling cost, importance sampling is used to reuse samples collected during the policy update process. Simulation results on two continuous-space control problems show that, compared with several policy search reinforcement learning methods that use action-based stochastic exploration, the proposed method both obtains the optimal policy and converges faster, and thus achieves better learning performance.
%K Policy search
%K reinforcement learning
%K parameter space
%K exploration
%K expectation-maximization (EM)
%K importance sampling
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=E76622685B64B2AA896A7F777B64EB3A&aid=7541B02586A94E58A4F3F484F4E822D3&yid=99E9153A83D4CB11&vid=16D8618C6164A3ED&iid=CA4FD0336C81A37A&sid=16D8618C6164A3ED&eid=94E7F66E6C42FA23&journal_id=0254-4156&journal_name=自动化学报&referenced_num=0&reference_num=18