As a powerful and intelligent machine learning method, reinforcement learning (RL) has been widely used in many fields such as game theory, adaptive control, multi-agent system, nonlinear forecasting, and so on. The main contribution of this technique is its exploration and exploitation approaches to find the optimal solution or semi-optimal solution of goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as “curse of dimension”, “perceptual aliasing problem”, and uncertainty of the environment constitute high hurdles to RL. Meanwhile, although RL is inspired by behavioral psychology and reward/punishment from the environment is used, higher mental factors such as affects, emotions, and motivations are rarely adopted in the learning procedure of RL. In this paper, to challenge agents learning in MASs, we propose a computational motivation function, which adopts two principle affective factors “Arousal” and “Pleasure” of Russell’s circumplex model of affects, to improve the learning performance of a conventional RL algorithm named Q-learning (QL). Compared with the conventional QL, computer simulations of pursuit problems with static and dynamic preys were carried out, and the results showed that the proposed method results in agents having a faster and more stable learning performance.
References
[1]
Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; The MIT Press: Cambridge, MA, USA, 1998.
[2]
Doya, K. Metalearning and neuromodulation. Neural Netw. 2002, 15, 495–506, doi:10.1016/S0893-6080(02)00044-8.
[3]
Asada, M.; Uchibe, E.; Hosoda, K. Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development. Artif. Intell. 1999, 110, 275–292, doi:10.1016/S0004-3702(99)00026-0.
[4]
Kollar, T.; Roy, N. Trajectory optimization using reinforcement learning for map exploration. Int. J. Robot. Res. 2008, 27, 175–196, doi:10.1177/0278364907087426.
[5]
Jouffe, L. Fuzzy inference system learning by reinforcement learning. IEEE Trans. Syst. Man Cybern. B 1998, 28, 338–355, doi:10.1109/5326.704563.
[6]
Obayashi, M.; Nakahara, N.; Kuremoto, T.; Kobayashi, K. A robust reinforcement learning using concept of slide mode control. Artif. Life Robot. 2009, 13, 526–530, doi:10.1007/s10015-008-0608-3.
[7]
Kuremoto, T.; Obayashi, M.; Yamamoto, A.; Kobayashi, K. Predicting Chaotic Time Series by Reinforcement Learning. In Proceedings of the 2nd International Conference on Computational Intelligence, Robotics, and Autonomous Systems, Singapore, 15–18 December 2003.
[8]
Kuremoto, T.; Obayashi, M.; Kobayashi, K. Nonlinear prediction by reinforcement learning. Lect. Note. Comput. Sci. 2005, 3644, 1085–1094.
[9]
Kuremoto, T.; Obayashi, M.; Kobayashi, K. Forecasting Time Series by SOFNN with Reinforcement Learning. In Proceedings of the 27th Annual International Symposium on Forecasting, Neural Forecasting Competition (NN3), New York, NY, USA, 24–27 June 2007.
[10]
Kuremoto, T.; Obayashi, M.; Kobayashi, K. Neural forecasting systems. In Reinforcement Learning, Theory and Applications; Weber, C., Elshaw, M., Mayer, N.M., Eds.; InTech: Vienna, Austria, 2008; pp. 1–20.
[11]
Kuremoto, T.; Obayashi, M.; Kobayashi, K.; Adachi, H.; Yoneda, K. A Reinforcement Learning System for Swarm Behaviors. In Proceedings of IEEE World Congress Computational Intelligence (WCCI/IJCNN 2008), Hong Kong, 1–6 June 2008; pp. 3710–3715.
[12]
Kuremoto, T.; Obayashi, M.; Kobayashi, K. Swarm behavior acquisition by a neuro-fuzzy system and reinforcement learning algorithm. Int. J. Intell. Comput. Cybern. 2009, 2, 724–744, doi:10.1108/17563780911005854.
[13]
Kuremoto, T.; Obayashi, M.; Kobayashi, K.; Adachi, H.; Yoneda, K. A neuro-fuzzy learning system for adaptive swarm behaviors dealing with continuous state space. Lect. Notes Comput. Sci. 2008, 5227, 675–683.
[14]
Kuremoto, T.; Obayashi, M.; Kobayashi, K. An improved internal model for swarm formation and adaptive swarm behavior acquisition. J. Circuit. Syst. Comput. 2009, 18, 1517–1531, doi:10.1142/S0218126609005836.
LeDoux, J.E. The Emotional Brain: The Mysterious Underpinnings of Emotional Life; Siman & Schuster: New York, NY, USA, 1996.
[24]
Greenberg, L. Emotion and cognition in psychotherapy: The transforming power of affect. Can. Psychol. 2008, 49, 49–59, doi:10.1037/0708-5591.49.1.49.
[25]
Sato, S.; Nozawa, A.; Ide, H. Characteristics of behavior of robots with emotion model. IEEJ Trans. Electron. Inf. Syst. 2004, 124, 1390–1395.
[26]
Kusano, T.; Nozawa, A.; Ide, H. Emergent of burden sharing of robots with emotion model (in Japanese). IEEJ Trans. Electron. Inf. Syst. 2005, 125, 1037–1042.
[27]
Larsen, R.J.; Diener, E. Promises and problems with the circumplex model of emotion. In Review of Personality and Social Psychology; Clark, M.S., Ed.; Sage: Newbury Park, CA, USA, 1992; Volume 13, pp. 25–59.
[28]
Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178, doi:10.1037/h0077714.
[29]
Kuremoto, T.; Obayashi, M.; Kobayashi, K.; Feng, L.-B. Autonomic behaviors of swarm robots driven by emotion and curiosity. Lect. Notes Comput. Sci. 2010, 6630, 541–547.
[30]
Kuremoto, T.; Obayashi, M.; Kobayashi, K.; Feng, L.-B. An improved internal model of autonomous robot by a psychological approach. Cogn. Comput. 2011, 3, 501–509, doi:10.1007/s12559-011-9102-7.
[31]
Russell, J.A.; Feldman Barrett, L. Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. J. Personal. Soc. Psychol. 1999, 76, 805–819, doi:10.1037/0022-3514.76.5.805.
[32]
Russell, J.A. Core affect and the psychological construction of emotion. Psychol. Rev. 2003, 110, 145–172, doi:10.1037/0033-295X.110.1.145.
[33]
Wundn, W. Outlines of Psychology; Wilhem Englemann: Leipzig, Germany, 1897.
[34]
Ortony, A.; Clore, G.; Collins, A. The Cognitive Structure of Emotions; Cambridge University Press: Cambridge, UK, 1988.
Agogino, A.K.; Tumer, K. Quicker Q-Learning in Multi-Agent Systems. Available online: http://archive.org/details/nasa_techdoc_20050182925 (accessed on 30 May 2013).
[37]
Augustine, A.A.; Hemenover, S.H.; Larsen, R.J.; Shulman, T.E. Composition and consistency of the desired affective state: The role of personality and motivation. Motiv. Emot. 2010, 34, 133–143, doi:10.1007/s11031-010-9162-0.
[38]
Watanabe, S.; Obayashi, M.; Kuremoto, T.; Kobayashi, K. A New Decision-Making System of an Agent Based on Emotional Models in Multi-Agent System. In Proceedings of the 18th International Symposium on Artificial Life and Robotics, Daejeon, Korea, 30 January–1 February 2013; pp. 452–455.