Self-control with spiking and non-spiking neural networks playing games
Christodoulou, C. and Banfield, Gaye and Cleanthous, A. (2010) Self-control with spiking and non-spiking neural networks playing games. Journal of Physiology 104 (3-4), 108 - 117. ISSN 0928-4257.
Self-control can be defined as choosing a large delayed reward over a small immediate reward, while precommitment is the making of a choice with the specific aim of denying oneself future choices. Humans recognise that they have self-control problems and attempt to overcome them by applying precommitment. Problems in exercising self-control suggest a conflict between cognition and motivation, which has been linked to competition between higher and lower brain functions (representing the frontal lobes and the limbic system respectively). This premise of an internal process conflict led to a behavioural model being proposed, on the basis of which we implemented a computational model for studying and explaining self-control through precommitment behaviour. Our model consists of two neural networks, initially non-spiking and then spiking ones, representing the higher and lower brain systems viewed as cooperating for the benefit of the organism. The non-spiking neural networks are simple feedforward multilayer networks with reinforcement learning: one uses the selective bootstrap weight update rule, which is seen as myopic, and represents the lower brain; the other uses the temporal difference weight update rule, which is seen as far-sighted, and represents the higher brain. The spiking neural networks are implemented with leaky integrate-and-fire neurons, with learning based on stochastic synaptic transmission. In this implementation, the two brain centres are differentiated by their memory of past actions, which is determined by an eligibility trace time constant. As the structure of the self-control problem can be likened to the Iterated Prisoner’s Dilemma (IPD) game, in that cooperation is to defection what self-control is to impulsiveness or what compromising is to insisting, we implemented the neural networks as two players, learning simultaneously but independently, competing in the IPD game.
With a technique resembling the precommitment effect, whereby the payoffs for the dilemma cases in the IPD payoff matrix are differentially biased (increased or decreased), it is shown that increasing the precommitment effect (through increasing the differential bias) increases the probability of cooperating with oneself in the future, irrespective of whether the implementation is with spiking or non-spiking neural networks.
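As an illustration only, the differential biasing of the IPD payoff matrix might be sketched as follows. The abstract does not give the payoff values, say which cells count as the dilemma cases, or state the direction of the bias. This sketch therefore assumes the textbook payoffs T=5, R=3, P=1, S=0, treats the two mixed outcomes (one player cooperates, the other defects) as the dilemma cases, and lowers the temptation payoff while raising the sucker's payoff by the same amount. The names `biased_payoffs` and `is_prisoners_dilemma` are hypothetical.

```python
# Assumed textbook IPD payoffs; the source does not state the values used.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def biased_payoffs(bias):
    """Row player's payoff for each (row_move, column_move) pair.

    Assumption: the "dilemma cases" are the two mixed outcomes, and the
    bias lowers the temptation payoff while raising the sucker's payoff.
    """
    return {
        ("C", "C"): R,         # mutual cooperation
        ("C", "D"): S + bias,  # cooperate against a defector (increased)
        ("D", "C"): T - bias,  # defect against a cooperator (decreased)
        ("D", "D"): P,         # mutual defection
    }


def is_prisoners_dilemma(payoffs):
    """Check the usual IPD conditions T > R > P > S and 2R > T + S."""
    t, r = payoffs[("D", "C")], payoffs[("C", "C")]
    p, s = payoffs[("D", "D")], payoffs[("C", "D")]
    return t > r > p > s and 2 * r > t + s
```

Under these assumed values, a small bias keeps the game a Prisoner's Dilemma while narrowing the gap between defecting and cooperating in the mixed outcomes; once the bias reaches 1, the sucker's payoff equals the mutual-defection payoff and the strict ordering no longer holds.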
Keyword(s) / Subject(s): Self-control, precommitment, iterated prisoner’s dilemma, reinforcement learning, spiking neural networks
School: Birkbeck Schools and Departments > School of Business, Economics & Informatics > Computer Science and Information Systems
Date Deposited: 26 Jul 2011 14:27
Last Modified: 17 Apr 2013 12:21