Robotics  2013 

An Improved Reinforcement Learning System Using Affective Factors

DOI: 10.3390/robotics2030149

Keywords: multi-agent system (MAS), computational motivation function, circumplex model of affect, pursuit problem, reinforcement learning (RL)

Abstract:

Reinforcement learning (RL) is a machine learning method that has been widely used in fields such as game theory, adaptive control, multi-agent systems, and nonlinear forecasting. Its main strength is the combination of exploration and exploitation to find optimal or near-optimal solutions to goal-directed problems. However, when RL is applied to multi-agent systems (MASs), problems such as the “curse of dimensionality”, the “perceptual aliasing problem”, and uncertainty of the environment become serious obstacles. Moreover, although RL is inspired by behavioral psychology and uses reward and punishment from the environment, higher mental factors such as affects, emotions, and motivations are rarely used in the learning procedure. In this paper, to address the difficulty of agent learning in MASs, we propose a computational motivation function that adopts two principal affective factors, “Arousal” and “Pleasure”, from Russell’s circumplex model of affect, to improve the learning performance of a conventional RL algorithm, Q-learning (QL). In computer simulations of pursuit problems with static and dynamic prey, agents using the proposed method learned faster and more stably than agents using conventional QL.
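
For orientation, the conventional QL baseline named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the motivation function, so it is not modeled here. The one-dimensional grid, the single hunter, the static prey position, the reward values, and all function and parameter names are assumptions made for this example.

```python
import random

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Unseen state-action pairs are treated as having value 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the current Q-table."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def train(episodes=200, size=7, prey=6, seed=0):
    """Hypothetical toy pursuit: one hunter on a line of `size` cells chases
    a static prey. Reward values are illustrative assumptions."""
    rng = random.Random(seed)
    actions = (-1, 1)  # step left or step right
    q = {}
    for _ in range(episodes):
        pos = 0
        for _ in range(100):  # cap on steps per episode
            a = epsilon_greedy(q, pos, actions, 0.1, rng)
            nxt = min(max(pos + a, 0), size - 1)  # stay inside the grid
            reward = 1.0 if nxt == prey else -0.01
            q_update(q, pos, a, reward, nxt, actions)
            pos = nxt
            if pos == prey:  # capture ends the episode
                break
    return q

if __name__ == "__main__":
    table = train()
    print(table[(5, 1)], table.get((5, -1), 0.0))
```

In this toy setting the learned value of stepping toward the prey from the adjacent cell ends up higher than the value of stepping away. A motivation function of the kind the paper proposes would presumably change how actions are selected or valued, but its form is not given in the abstract.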
