
Sim-to-Real: A Performance Comparison of PPO, TD3, and SAC Reinforcement Learning Algorithms for Quadruped Walking Gait Generation

DOI: 10.4236/jilsa.2024.162003, pp. 23-43

Keywords: Reinforcement Learning, Reality Gap, Position Tracking, Action Spaces, Domain Randomization


Abstract:

The performance of state-of-the-art deep reinforcement learning algorithms such as Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) for generating a quadruped walking gait in a virtual environment was presented in our previous work, "A Comparison of PPO, TD3, and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation" [1]. There we demonstrated that SAC generated the best walking gait for certain sensor configurations in the virtual environment. In this work, we present a performance analysis of the same algorithms for quadruped walking gait generation in a physical environment. Performance in the physical environment is achieved through transfer learning augmented by real-time reinforcement learning for gait generation on a physical quadruped. The analysis is performed on a quadruped equipped with a range of sensors: position tracking using a stereo camera, contact sensing on each leg through force-resistive sensors, and proprioceptive information about the robot's body and legs from nine inertial measurement units. The comparison uses metrics associated with the walking gait: average forward velocity (m/s), average forward velocity variance, average lateral velocity (m/s), average lateral velocity variance, and quaternion root mean square deviation (RMSD). The strengths and weaknesses of each algorithm for the given task on the physical quadruped are discussed.
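
As context for the transfer-learning-plus-real-time-learning approach described above, the sketch below shows one way such a workflow could look using Stable-Baselines3 [20]: a SAC policy trained in simulation is loaded and then fine-tuned on the physical robot. This is a minimal sketch, not the authors' implementation; QuadrupedRealEnv, the observation/action dimensions, the file names, and the timestep budget are all hypothetical stand-ins.

```python
"""Minimal sketch: fine-tune a simulation-trained SAC policy on hardware.

Assumes Stable-Baselines3 [20]. QuadrupedRealEnv is a hypothetical stand-in
for a Gym-style wrapper around the physical robot's sensors and actuators.
"""
import numpy as np
import gymnasium as gym
from stable_baselines3 import SAC


class QuadrupedRealEnv(gym.Env):
    """Hypothetical interface to the physical quadruped (stub shown)."""

    def __init__(self):
        # Observation: proprioception from the IMUs, foot contacts, camera pose.
        # The 48-dimensional size is an assumption for illustration.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(48,))
        # Action: normalized target positions for the leg joints (assumed 12).
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(12,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(48, dtype=np.float32), {}  # replace with sensor read

    def step(self, action):
        # In practice: send joint targets, read sensors, compute a gait reward.
        obs = np.zeros(48, dtype=np.float32)
        return obs, 0.0, False, False, {}


env = QuadrupedRealEnv()
# Load the policy trained in simulation and attach the real-robot environment.
model = SAC.load("sac_quadruped_sim", env=env)
# Continue learning on hardware (real-time reinforcement learning), keeping
# the timestep counter so schedules carry over from simulation.
model.learn(total_timesteps=50_000, reset_num_timesteps=False)
model.save("sac_quadruped_real")
```

Off-policy algorithms such as SAC and TD3 suit this workflow because the replay buffer lets a small amount of expensive real-robot experience be reused across many gradient updates.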
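Similarly, the gait metrics listed in the abstract can be computed from a logged trajectory. The sketch below assumes per-timestep forward/lateral body velocities and body-orientation quaternions; the per-component quaternion RMSD convention and the sign alignment against a reference orientation are assumptions, since the abstract does not give exact formulas.

```python
import numpy as np


def gait_metrics(velocities, quats, quat_ref):
    """Walking-gait metrics from a logged trajectory (conventions assumed).

    velocities: (T, 2) array of [forward, lateral] body velocity in m/s.
    quats:      (T, 4) array of measured body-orientation quaternions.
    quat_ref:   (4,) reference orientation (e.g., level, heading-aligned).
    """
    fwd, lat = velocities[:, 0], velocities[:, 1]
    # q and -q encode the same rotation; flip signs to match the reference
    # hemisphere before differencing component-wise.
    dots = quats @ quat_ref
    aligned = quats * np.where(dots >= 0.0, 1.0, -1.0)[:, None]
    quat_rmsd = np.sqrt(np.mean(np.sum((aligned - quat_ref) ** 2, axis=1)))
    return {
        "avg_forward_velocity_mps": float(fwd.mean()),
        "forward_velocity_variance": float(fwd.var()),
        "avg_lateral_velocity_mps": float(lat.mean()),
        "lateral_velocity_variance": float(lat.var()),
        "quaternion_rmsd": float(quat_rmsd),
    }


# Example on synthetic data: 500 timesteps of a roughly forward walk.
rng = np.random.default_rng(0)
vel = rng.normal([0.3, 0.0], [0.05, 0.02], size=(500, 2))
q = np.tile([1.0, 0.0, 0.0, 0.0], (500, 1)) + rng.normal(0, 0.01, (500, 4))
q /= np.linalg.norm(q, axis=1, keepdims=True)
print(gait_metrics(vel, q, np.array([1.0, 0.0, 0.0, 0.0])))
```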

References

[1]  Mock, J.W. and Muknahallipatna, S.S. (2023) A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation. Journal of Intelligent Learning Systems and Applications, 15, 36-56.
https://doi.org/10.4236/jilsa.2023.151003
[2]  Salvato, E., Fenu, G., Medvet, E. and Pellegrino, F.A. (2021) Crossing the Reality Gap: A Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning. IEEE Access, 9, 153171-153187.
https://doi.org/10.1109/ACCESS.2021.3126658
[3]  Zhao, W., Queralta, J.P. and Westerlund, T. (2020) Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey. 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, 1-4 December 2020, 737-744.
https://arxiv.org/abs/2009.13303
https://doi.org/10.1109/SSCI47803.2020.9308468
[4]  Xie, Z., Clary, P., Dao, J., Morais, P., Hurst, J. and van de Panne, M. (2020) Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real. Proceedings of the Conference on Robot Learning, Vol. 100, 317-329.
https://proceedings.mlr.press/v100/xie20a.html
[5]  Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V. and Hutter, M. (2019) Learning Agile and Dynamic Motor Skills for Legged Robots. Science Robotics, 4, eaau5872.
http://arxiv.org/abs/1901.08652
https://doi.org/10.1126/scirobotics.aau5872
[6]  Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S. and Vanhoucke, V. (2018) Sim-to-Real: Learning Agile Locomotion for Quadruped Robots.
http://arxiv.org/abs/1804.10332
[7]  Muratore, F., Gienger, M. and Peters, J. (2019) Assessing Transferability from Simulation to Reality for Reinforcement Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43, 1172-1183.
http://arxiv.org/abs/1907.04685
[8]  Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W. and Abbeel, P. (2017) Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, 24-28 September 2017, 23-30.
http://arxiv.org/abs/1703.06907
https://doi.org/10.1109/IROS.2017.8202133
[9]  Pinto, L., Andrychowicz, M., Welinder, P., Zaremba, W. and Abbeel, P. (2017) Asymmetric Actor Critic for Image-Based Robot Learning.
http://arxiv.org/abs/1710.06542
[10]  Rudin, N., Hoeller, D., Reist, P. and Hutter, M. (2021) Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.
https://arxiv.org/abs/2109.11978
[11]  Mahmood, A.R., Korenkevych, D., Komer, B.J. and Bergstra, J. (2018) Setting up a Reinforcement Learning Task with a Real-World Robot. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, 1-5 October 2018, 4635-4640.
http://arxiv.org/abs/1803.07067
[12]  Haarnoja, T., Ha, S., Zhou, A., Tan, J., Tucker, G. and Levine, S. (2019) Learning to Walk via Deep Reinforcement Learning.
https://doi.org/10.15607/RSS.2019.XV.011
[13]  Schulman, J., Levine, S., Moritz, P., Jordan, M.I. and Abbeel, P. (2015) Trust Region Policy Optimization. Proceedings of the 32nd International Conference on Machine Learning, Vol. 37, 1889-1897.
http://arxiv.org/abs/1502.05477
[14]  Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O. (2017) Proximal Policy Optimization Algorithms.
http://arxiv.org/abs/1707.06347
[15]  Schulman, J., Moritz, P., Levine, S., Jordan, M.I. and Abbeel, P. (2016) High-Dimensional Continuous Control Using Generalized Advantage Estimation. 4th International Conference on Learning Representations, ICLR 2016, San Juan, 2-4 May 2016.
http://arxiv.org/abs/1506.02438
[16]  Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D. and Wierstra, D. (2016) Continuous Control with Deep Reinforcement Learning. 4th International Conference on Learning Representations, ICLR 2016, San Juan, 2-4 May 2016.
http://arxiv.org/abs/1509.02971
[17]  Fujimoto, S., van Hoof, H. and Meger, D. (2018) Addressing Function Approximation Error in Actor-Critic Methods.
http://arxiv.org/abs/1802.09477
[18]  Haarnoja, T., Zhou, A., Abbeel, P. and Levine, S. (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
http://arxiv.org/abs/1801.01290
[19]  Mock, J. and Muknahallipatna, S. (2023) A Comparison of PPO, TD3 and SAC Reinforcement Algorithms for Quadruped Walking Gait Generation. Journal of Intelligent Learning Systems and Applications, 15, 36-56.
https://doi.org/10.4236/jilsa.2023.151003
[20]  Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M. and Dormann, N. (2021) Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 22, 1-8.
http://jmlr.org/papers/v22/20-1364.html
