%0 Journal Article
%T Reinforcement Learning in an Environment Synthetically Augmented with Digital Pheromones
%A Salvador E. Barbosa
%A Mikel D. Petty
%J Advances in Artificial Intelligence
%D 2014
%I Hindawi Publishing Corporation
%R 10.1155/2014/932485
%X Reinforcement learning requires information about states, actions, and outcomes as the basis for learning. For many applications, it can be difficult to construct a representative model of the environment, either because the required information is unavailable or because the model's state space would become too large to solve in a reasonable amount of time using the experience of prior actions. An environment consisting solely of the occurrence or nonoccurrence of specific events attributable to a human actor may appear to lack the structure needed to position responding agents in time and space using reinforcement learning. Digital pheromones can synthetically augment such an environment with event sequence information, creating a more persistent and measurable imprint on the environment that supports reinforcement learning. We implemented this method and combined it with the ability of agents to learn from actions not taken, a concept known as fictive learning. The approach was tested against the historical sequence of Somali maritime pirate attacks from 2005 to mid-2012, enabling a set of autonomous agents representing naval vessels to successfully respond to an average of 333 of the 899 pirate attacks, outperforming the historical record of 139 successes.

1. Introduction

Sequences of events resulting from the actions of human adversarial actors, such as military forces or criminal organizations, may appear to have random dynamics in time and space. Finding patterns in such sequences and using those patterns to anticipate and respond to the events can be quite challenging.
Often, the number of potentially causal factors for such events is very large, making it infeasible to obtain and analyze all relevant information before the next event occurs. These difficulties hinder the planning of responses using conventional computational methods, such as multiagent models and machine learning, which typically exploit information available in or about the environment. A real-world example of such a problem is Somali maritime piracy. Beginning in 2005, the number of attacks attributed to Somali pirates rose steadily. During some periods of the year, attacks occurred on a nearly daily basis, often despite the presence of naval patrol vessels in the area [1]. They were frequently launched with little warning and at unexpected locations. We would like to use the attributes of past attacks to anticipate and respond to future attacks. However, the set of attack attributes potentially
%U http://www.hindawi.com/journals/aai/2014/932485/
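The abstract's two key ideas can be sketched in code. This is a hypothetical illustration, not the authors' implementation: a grid of digital pheromone levels where events deposit intensity that evaporates over time (leaving the "persistent and measurable imprint" the abstract describes), plus a Q-learning update that also adjusts the values of actions not taken (fictive learning). All names, rates, and the assumption that counterfactual rewards are observable are illustrative assumptions.

```python
EVAPORATION = 0.9   # assumed per-step decay of pheromone intensity
DEPOSIT = 1.0       # assumed intensity added where an event occurs
ALPHA, GAMMA = 0.1, 0.95          # assumed learning rate and discount
FICTIVE_ALPHA = 0.5 * ALPHA       # assumed smaller rate for untaken actions

class PheromoneMap:
    """Grid of pheromone levels: events deposit, time evaporates."""

    def __init__(self, width, height):
        self.levels = {(x, y): 0.0 for x in range(width) for y in range(height)}

    def deposit(self, cell):
        # An observed event (e.g., an attack) marks its location.
        self.levels[cell] += DEPOSIT

    def step(self):
        # Evaporation turns a point event into a decaying trace that an
        # agent can sense for several steps afterward.
        for cell in self.levels:
            self.levels[cell] *= EVAPORATION

    def level(self, cell):
        return self.levels[cell]

def q_update(Q, state, actions, taken, rewards, next_best):
    """Standard Q-learning update for the taken action, plus fictive
    updates for the others using the reward each *would* have earned
    (assumed observable in this sketch)."""
    for a in actions:
        rate = ALPHA if a == taken else FICTIVE_ALPHA
        key = (state, a)
        old = Q.get(key, 0.0)
        Q[key] = old + rate * (rewards[a] + GAMMA * next_best - old)
```

A usage sketch: deposit a pheromone at an attack site, evaporate one step, then update a patrol agent's Q-values for both the chosen and the forgone action.

```python
pm = PheromoneMap(3, 3)
pm.deposit((1, 1))
pm.step()                 # trace decays to DEPOSIT * EVAPORATION

Q = {}
q_update(Q, state="hot_cell", actions=["patrol", "hold"], taken="patrol",
         rewards={"patrol": 1.0, "hold": 0.0}, next_best=0.0)
```

Keeping the fictive rate below the ordinary learning rate is one plausible design choice: counterfactual outcomes are less certain than experienced ones, so they are weighted less.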