In this paper, a
collection of value-based quantum reinforcement learning algorithms are
introduced which use Grover’s algorithm to update the policy, which is stored
as a superposition of qubits associated with each possible action, and their
parameters are explored. These algorithms may be grouped in two classes, one
class which uses value functions (V(s)) and new class which
uses action value functions (Q(s,a)). The
new (Q(s,a))-based quantum algorithms
are found to converge faster than V(s)-based algorithms, and
in general the quantum algorithms are found to converge in fewer iterations
than their classical counterparts, netting larger returns during training. This
is due to fact that the (Q(s,a)) algorithms are more
precise than those based on
References
[1]
Brown, B. (2015) Q-Learning with Neural Networks.
[2]
Dong, D., Chen, C., Li, H. and Tarn, T.-J. (2008) Quantum Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38, 1207-1220. https://doi.org/10.1109/TSMCB.2008.925743
[3]
Duryea, E., Ganger, M. and Hu, W. (2016) Exploring Deep Reinforcement Learning with Multi Q-Learning. Intelligent Control and Automation, 7, 129.
https://doi.org/10.4236/ica.2016.74012
[4]
Grover, L.K. (1996) A Fast Quantum Mechanical Algorithm for Database Search. Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, Philadelphia, 22-24 May 1996, 212-219. https://doi.org/10.1145/237814.237866
[5]
Hasselt, H.V. (2010) Double Q-Learning. Advances in Neural Information Processing Systems, 23, 2613-2621.
[6]
Kober, J., Bagnell, J.A. and Peters, J. (2013) Reinforcement Learning in Robotics: A Survey. The International Journal of Robotics Research, 32, 1238-1274.
https://doi.org/10.1177/0278364913495721
[7]
Lee, J.W. (2001) Stock Price Prediction Using Reinforcement Learning. 2001 IEEE International Symposium on Industrial Electronics Proceedings, Pusan, 12-16 June 2001, 690-695.
[8]
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M. (2013) Playing Atari with Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602.
[9]
Shor, P.W. (1994) Algorithms for Quantum Computation: Discrete Logarithms and Factoring. Proceedings 35th Annual Symposium on Foundations of Computer Science, Santa Fe, 20-22 November 1994, 124-134.
https://doi.org/10.1109/SFCS.1994.365700
[10]
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016) Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489. https://doi.org/10.1038/nature16961
[11]
Sutton, R.S and Barto, A.G. (1998) Reinforcement Learning: An Introduction, Volume 1. MIT Press, Cambridge.
[12]
Tromp, J. (2016) Number of legal Go Positions.
[13]
Van Hasselt, H., Guez, A. and Silver, D. (2016) Deep Reinforcement Learning with Double Q-Learning. AAAI, 2094-2100.
[14]
IBM (2017) IBM Builds Its Most Powerful Universal Quantum Computing Processors.
[15]
Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N. and Lloyd, S. (2017) Quantum Machine Learning. Nature, 549, 195. https://doi.org/10.1038/nature23474
[16]
Cárdenas-López, F.A., Lamata, L., Retamal, J.C. and Solano, E. (2017) Generalized Quantum Reinforcement Learning with Quantum Technologies. PLoS ONE, 13, e0200455. arXiv:1709.07848
[17]
Cao, Y., Guerreschi, G.G. and Aspuru-Guzik, A. (2017) Quantum Neuron: An Elementary Building Block for Machine Learning on Quantum Computers. arXiv:1711.11240
[18]
Crawford, D., Levit, A., Ghadermarzy, N., Oberoi, J.S. and Ronagh, P. (2016) Reinforcement Learning Using Quantum Boltzmann Machines. arXiv:1612.05695
[19]
Hu, W. (2018) Empirical Analysis of Decision Making of an AI Agent on IBM 5Q Quantum Computer. Natural Science, 10, 45.
https://doi.org/10.4236/ns.2018.101004
[20]
Hu, W. (2018) Towards a Real Quantum Neuron. Natural Science, 10, 99.
https://doi.org/10.4236/ns.2018.103011
[21]
Levit, A., Crawford, D., Ghadermarzy, N., Oberoi, J.S., Zahedinejad, E. and Ronagh, P. (2017) Free Energy-Based Reinforcement Learning Using a Quantum Processor. arXiv:1706.00074
[22]
Nielsen, M.A. and Chuang, I.L. (2011) Quantum Computation and Quantum Information: 10th Anniversary Edition. 10th Edition, Cambridge University Press, New York.
[23]
Sallans, B. and Hinton, G.E. (2004) Reinforcement Learning with Factored States and Actions. Journal of Machine Learning Research, 5, 1063-1088.
[24]
Sriarunothai, T., Wolk, S., Giri, G.S., Fries, N., Dunjko, V., Briegel, H.J. and Wunderlich, C. (2017) Speeding-Up the Decision Making of a Learning Agent Using an Ion Trap Quantum Processor. arXiv:1709.01366
[25]
Watkins, C.J.C.H. (1989) Learning from Delayed Rewards. PhD Thesis, University of Cambridge, Cambridge.