%0 Journal Article %T Distributional Reinforcement Learning with Quantum Neural Networks %A Wei Hu %A James Hu %J Intelligent Control and Automation %P 63-78 %@ 2153-0661 %D 2019 %I Scientific Research Publishing %R 10.4236/ica.2019.102004 %X
Traditional reinforcement learning (RL) trains an agent to learn an optimal policy using the expected value of the return, i.e., the cumulative sum of random rewards. However, recent research indicates that learning the distribution over returns has distinct advantages over learning their expected value, as seen in different RL tasks. The shift from the expectation of returns in traditional RL to the distribution over returns in distributional RL has provided new insights into the dynamics of RL. This paper builds on our recent work investigating quantum approaches to RL. Our work implements quantile regression (QR) distributional Q learning with a quantum neural network. This quantum network is evaluated in a grid world environment with different numbers of quantiles, illustrating in detail how this choice influences the algorithm's learning. It is also compared with standard quantum Q learning in a Markov Decision Process (MDP) chain environment, demonstrating that quantum QR distributional Q learning can explore the environment more efficiently than standard quantum Q learning. Efficient exploration and the balance between exploration and exploitation are major challenges in RL. Previous work has shown that more informative actions can be taken with a distributional perspective. Our findings suggest another cause for this success: the enhanced performance of distributional RL can be partially attributed to its superior ability to explore the environment efficiently.
%K Continuous-Variable Quantum Computers %K Quantum Reinforcement Learning %K Distributional Reinforcement Learning %K Quantile Regression %K Distributional Q Learning %K Grid World Environment %K MDP Chain Environment %U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=91668
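As a rough illustration of the quantile regression (QR) distributional Q learning the abstract refers to, the sketch below shows a minimal classical, tabular version of the update; it is not the paper's quantum-neural-network implementation, and the toy chain environment, state/action/quantile counts, and hyperparameters are all illustrative assumptions.

```python
# Minimal classical sketch of tabular quantile-regression (QR) distributional
# Q learning.  NOT the paper's quantum implementation; all sizes, names, and
# hyperparameters below are assumptions for illustration only.
import numpy as np

n_states, n_actions, n_quantiles = 5, 2, 8      # assumed toy problem sizes
gamma, alpha, epsilon = 0.9, 0.1, 0.1           # assumed hyperparameters

# theta[s, a, i] approximates the i-th quantile of the return Z(s, a).
theta = np.zeros((n_states, n_actions, n_quantiles))
# Fixed quantile fractions tau_i = (2i - 1) / (2N), the usual QR midpoints.
tau = (2 * np.arange(n_quantiles) + 1) / (2.0 * n_quantiles)

def greedy_action(s):
    """Act greedily on the mean of the quantiles, i.e. the expected return."""
    return int(np.argmax(theta[s].mean(axis=1)))

def qr_update(s, a, r, s_next, done):
    """One quantile-regression step toward the distributional Bellman target."""
    a_next = greedy_action(s_next)
    # One sample target per quantile of the next state-action distribution.
    targets = r + (0.0 if done else gamma) * theta[s_next, a_next]   # shape (N,)
    # Subgradient of the quantile loss: theta_i moves up with weight tau_i when
    # a target exceeds it, and down with weight (1 - tau_i) otherwise.
    indicator = (targets[None, :] < theta[s, a][:, None]).astype(float)  # (N, N)
    theta[s, a] += alpha * (tau[:, None] - indicator).mean(axis=1)

def step(s, a):
    """Assumed toy chain: action 1 moves right, action 0 moves left; reaching
    the rightmost state yields reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

# Epsilon-greedy training loop on the assumed chain environment.
for _ in range(500):
    s, done = 0, False
    while not done:
        a = np.random.randint(n_actions) if np.random.rand() < epsilon else greedy_action(s)
        s_next, r, done = step(s, a)
        qr_update(s, a, r, s_next, done)
        s = s_next
```

In the paper's setting the table of quantile values would instead be produced by a continuous-variable quantum neural network, and varying `n_quantiles` corresponds to the experiments with different numbers of quantiles described in the abstract.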