|
Learning of Soccer Player Agents Using a Policy Gradient Method: Coordination Between Kicker and Receiver During Free Kicks

Keywords: RoboCup, Soccer Simulation, Multiagents, Policy-Gradient Methods, Reinforcement Learning

Abstract: As an example of multi-agent learning in soccer games of the RoboCup 2D Soccer Simulation League, we address a learning problem between a kicker and a receiver when a direct free kick is awarded just outside the opponent's penalty area. We propose a heuristic function that evaluates how advantageous a target point is for safely sending and receiving a pass and for scoring. The heuristics include an interaction term between the kicker and the receiver to strengthen their coordination. To calculate the interaction term, each agent holds a model of its teammate's action decision: the kicker uses a model of the receiver to predict the receiver's action, and the receiver uses a model of the kicker to predict the kicker's action. The parameters of the heuristic function can be learned by a kind of reinforcement learning called the policy gradient method. Our experiments show that if the two agents do not have the same type of heuristics, the interaction term based on prediction of a teammate's decision model leads to learning a master-servant relation between the kicker and the receiver, in which the receiver is the master and the kicker is the servant.
|
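The abstract does not give the heuristic terms, the interaction term, or the exact update rule, so the following is only a minimal sketch of the general idea: a weighted heuristic scores each candidate target point, a softmax (Boltzmann) policy picks one, and a REINFORCE-style policy gradient step adjusts the weights. The feature vectors, the reward function, the softmax form, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax_policy(weights, features):
    """Selection probabilities over candidate target points.

    features[k][i] is a hypothetical i-th heuristic term for candidate k.
    """
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(weights, features, chosen, reward, lr=0.1):
    """One REINFORCE-style update: w += lr * reward * grad log pi(chosen)."""
    probs = softmax_policy(weights, features)
    n = len(weights)
    # For a softmax over linear scores: grad log pi(k) = f_k - sum_j pi_j * f_j.
    expected = [sum(p * feats[i] for p, feats in zip(probs, features))
                for i in range(n)]
    return [w + lr * reward * (features[chosen][i] - expected[i])
            for i, w in enumerate(weights)]


def train(features, reward_fn, episodes=2000, lr=0.1, seed=0):
    """Sample a target point each episode and update the heuristic weights."""
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    for _ in range(episodes):
        probs = softmax_policy(weights, features)
        chosen = rng.choices(range(len(features)), weights=probs)[0]
        weights = policy_gradient_step(weights, features, chosen,
                                       reward_fn(chosen), lr)
    return weights


# Toy usage: three candidate target points with two made-up heuristic terms.
# Assumed reward: 1 if the pass to candidate 1 succeeds, 0 otherwise.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
learned = train(features, lambda k: 1.0 if k == 1 else 0.0)
probs = softmax_policy(learned, features)
```

In this toy setting the learned weights shift probability toward the rewarded target point. An interaction term of the kind the abstract describes could be modelled as one more feature whose value depends on the teammate's predicted choice, but its form is not specified here.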