Time Scale Diversity Combined with Reinforcement Learning to Promote Cooperation in the Prisoner’s Dilemma Game

DOI: 10.12677/ORF.2024.141012, PP. 131-139

Keywords: Social Dilemma, Cooperation, Q-Learning, Time Scale


Abstract:

Evolutionary game theory provides a key framework for addressing social dilemmas, and it need not be restricted to a single, uniform time scale. Reinforcement learning, meanwhile, has proven to be an effective way to study strategy update dynamics and agent learning processes in game theory. This paper therefore studies how a time scale mechanism combined with a self-focused Q-learning algorithm affects cooperation in the spatial prisoner’s dilemma game. Specifically, game interaction and strategy updating proceed on different time scales: time scale diversity enters the probability formula that governs strategy updates, and the self-focused Q-learning algorithm serves as the strategy update rule. Numerical results show that cooperation is significantly promoted under this framework. Finally, the parameters affecting Q-learning are analyzed, and the robustness of the mechanism is verified under different initial settings.
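The mechanism summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the weak prisoner's dilemma payoffs (R=1, T=b, S=P=0), the square lattice with four neighbours and periodic boundaries, the per-agent update probability `p_update` standing in for time scale diversity, the use of the agent's own previous action as the Q-learning state, and all parameter values. It is not the authors' implementation.

```python
import random

C, D = 0, 1  # action labels: cooperate, defect


def payoff(own, other, b):
    """Weak prisoner's dilemma payoffs (assumed): R=1, T=b, S=P=0."""
    if other == D:
        return 0.0
    return 1.0 if own == C else b


def neighbours(i, j, size):
    """Four nearest neighbours on a size x size lattice, periodic boundaries."""
    return [((i - 1) % size, j), ((i + 1) % size, j),
            (i, (j - 1) % size), (i, (j + 1) % size)]


def simulate(size=20, steps=200, b=1.2, alpha=0.1, gamma=0.9, eps=0.02, seed=0):
    """Return the fraction of cooperators recorded after each step."""
    rng = random.Random(seed)
    sites = [(i, j) for i in range(size) for j in range(size)]
    action = {s: rng.choice((C, D)) for s in sites}
    state = dict(action)  # state = the agent's own previous action
    q = {s: [[0.0, 0.0], [0.0, 0.0]] for s in sites}  # q[site][state][action]
    # Hypothetical stand-in for time scale diversity: each agent revises its
    # strategy only with its own probability, while the game is played every step.
    p_update = {s: rng.uniform(0.1, 1.0) for s in sites}
    history = []
    for _ in range(steps):
        # Game interaction: every agent plays all four neighbours each step.
        reward = {s: sum(payoff(action[s], action[n], b)
                         for n in neighbours(s[0], s[1], size))
                  for s in sites}
        for s in sites:
            prev, act = state[s], action[s]
            state[s] = act  # the action just played becomes the new state
            if rng.random() >= p_update[s]:
                continue  # strategy-update clock did not tick; action is kept
            # Standard Q-learning update on the agent's own (state, action) pair.
            target = reward[s] + gamma * max(q[s][act])
            q[s][prev][act] += alpha * (target - q[s][prev][act])
            # Epsilon-greedy choice of the next action; ties broken at random.
            if rng.random() < eps or q[s][act][C] == q[s][act][D]:
                action[s] = rng.choice((C, D))
            else:
                action[s] = C if q[s][act][C] > q[s][act][D] else D
        history.append(sum(1 for s in sites if action[s] == C) / len(sites))
    return history
```

Calling `simulate(size=10, steps=50)` returns a list of 50 cooperation fractions, one per step. Varying `b`, `alpha`, `gamma`, or the range of `p_update` gives a rough feel for the kind of parameter analysis the abstract mentions, though the numbers produced by this sketch should not be read as the paper's results.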

