Action value calculations in social context from infancy

Infants adaptively modulate their social behaviours, such as gaze-following, to social context. We propose that such modulations are based on infants' social decision-making, to achieve the most valuable outcome. We propose an 'action value calculator model', which formulates the cognitive mechanisms underlying, and the development of, the decision-making process during interactions.

Adaptive modulation of infant social behaviour
Human survival and prosperity depend on the capacity to form and be part of complex social and cultural institutions [1]. Specifically, human infants depend fully on the humans around them for immediate survival, as well as for cultural learning, which is essential for adaptation to the social environment. Cultural learning is a uniquely human form of social learning that allows behaviours and information to be transmitted among conspecifics [1]. Hence, it is not surprising that human infants are equipped with an efficient 'toolkit' for cultural learning. Over the past decade, multiple studies have shown that infants modulate their learning and communicative strategies to optimise social learning under various communicative contexts.
Gaze-following is one of the earliest emerging social interaction behaviours that infants use to learn about the surrounding environment. Since Senju and Csibra [2] reported the facilitation of infant gaze-following in contexts preceded by ostensive cues (e.g., eye contact), a variety of social contexts have been shown to modulate infant gaze-following. For example, a preceding period of attention-grabbing actions, the familiarity of interacting partners, and social experience before the gaze-following task can facilitate infant gaze-following [3]. The contextual modulation of infants' gaze-following (i.e., effects of contextual information on the likelihood of gaze-following) has been replicated across multiple cultural contexts (e.g., UK [2] and Japan [4]). However, some studies reported the absence of contextual modulations and suggested that infants show gaze-following when there are opportunities to learn from another's gaze, regardless of context (e.g., [5]). Thus, empirical studies primarily focusing on the deterministic role of external cues (e.g., ostensive cues, attention-grabbing actions) in gaze-following have yielded inconsistent results regarding contextual modulation.
Given the critical role of contextual modulation in infant cultural learning, there is a need for a coherent theoretical framework to explain the mechanism underlying contextual modulation of social behaviours such as gaze-following.

Calculations of social-action value during interactions
We propose that these contextual modulations of social behaviours are based on a value-driven decision-making process: infants decide between multiple action alternatives (e.g., following gaze, looking at a face, initiating another's attention) under each social context, to maximise the expected rewarding outcome of the action choice. We propose that external cues are not direct determinants of actions, as implied by existing theories, but rather that infants execute actions depending on the subjective values of actions within each context. We define action values as determinants of social behaviours where actions predict the likelihood of subsequent reward, which can be intrinsically informative or social (Box 1).
Several recent studies suggest that infant gaze-following, as well as other gaze behaviours, can be explained by decision-making based on action value calculation. First, one study reported that gaze behaviours are modulated by the expected reward value of cues in 7-month-olds [6], suggesting that infants decide where to look based on an expected rewarding outcome contingent on a particular looking behaviour. The same study also found that infants had heightened anticipatory arousal to the visual cues when they showed superior learning, suggesting a relationship between arousal and reward expectation. Second, some of our own recent studies [4,7] demonstrated that contextual modulations of infant gaze-following, such as the facilitative effects of ostensive or reliability cues, are mediated by an increase in the infant's heart rate just before gaze-following behaviour. The increased heart rates (i.e., physiological arousal) before gaze-following observed in these studies are hypothesised to reflect the representation of reward expectations shared across different types of cues. Thus, contextual modulation of gaze-following reported in previous studies could be mediated by the calculation of the action value of gaze-following in each context. Finally, another study [8] showed that brief screen-based training could reinforce gaze-following behaviour in 4-month-olds, albeit with a weak effect, suggesting that infant gaze-following behaviour can be reinforced by rewarding events following gaze-following.

Cognitive processes underlying calculations of social-action value
We have formulated the hypothesised cognitive processes involved in action value calculation and subsequent decisions on the execution of social actions such as gaze-following in our 'action value calculator model'. This model proposes that infants initially encode social cues in each context, before calculating the action values of alternative action options. Based on these value calculations, infants decide which action to execute in social interactions at each moment. Thus, external cues are not direct determinants of social actions; rather, infants execute social actions if they have high subjective values, which can be represented partly as a predictive internal state reflecting reward expectation before executing an action, such as physiological arousal. Figure 1 illustrates the hypothesised cognitive processes that make up the action value calculator model. The model hypothesises the following four steps.
(i) Encoding of social cues: perceptual cues are encoded to determine the social context. Social cues may include any kind of social stimuli, such as faces, gestures, and speech; they are not limited to communicative cues. These social stimuli are processed to form a representation of the current social context for value calculations. Perceived stimuli are compared against memory for recognition (e.g., familiarity).
(ii) Value calculation: after encoding social cues, action values are calculated based on the integrated context. This calculation process is modulated by memories and represented partly as a predictive internal state such as physiological arousal [4,7]. Thus, action values are calculated from social cues and memories, which modulate physiological states.
(iii) Policy comparison: to make behavioural decisions in dynamic social interactions with many action options, it is necessary to compare the action alternatives (i.e., policies), which would have different action values in a given context. During this comparison, memory is consulted to compare the expected action values of behavioural policies that infants have learned from previous experience [7].
(iv) Execution of social action: infants execute the optimal action, that is, the one whose action value is the highest among the action alternatives. Executed actions can be part of interactions, including responsive and initiative social actions. To update the value of the action, feedback following the action execution must be received [8]. Executive control can be modulated by top-down task demands or goals, which affect the process of decision-making and action execution.
After executing an action, the social context is updated and processed from the first step to execute another action in interactions.
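The four steps can be read as a simple decision loop. The following is a minimal Python sketch under illustrative assumptions, not an implementation specified by the model: the cue and action names, the numeric values, the lookup table `VALUE_MEMORY`, and the functions `encode_context`, `action_values`, and `choose_action` are all hypothetical, and the physiological correlates of value are not modelled.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical memory of learned values: context -> action -> expected reward.
# All numbers are illustrative placeholders, not empirical estimates.
VALUE_MEMORY: Dict[FrozenSet[str], Dict[str, float]] = {
    frozenset({"eye_contact", "gaze_shift"}): {
        "follow_gaze": 0.8, "look_at_face": 0.5, "look_away": 0.1},
    frozenset({"gaze_shift"}): {
        "follow_gaze": 0.3, "look_at_face": 0.4, "look_away": 0.2},
}
# Assumed fallback values for contexts that have not been experienced.
DEFAULT_VALUES: Dict[str, float] = {
    "follow_gaze": 0.2, "look_at_face": 0.3, "look_away": 0.2}


def encode_context(cues: Iterable[str]) -> FrozenSet[str]:
    """Step (i): integrate the perceived social cues into one context."""
    return frozenset(cues)


def action_values(context: FrozenSet[str]) -> Dict[str, float]:
    """Step (ii): retrieve the value of each action for this context."""
    return dict(VALUE_MEMORY.get(context, DEFAULT_VALUES))


def choose_action(values: Dict[str, float]) -> str:
    """Steps (iii)-(iv): compare the alternatives and pick the highest value."""
    return max(values, key=lambda action: values[action])


# The same gaze shift leads to different actions in different contexts.
with_ostension = choose_action(
    action_values(encode_context(["eye_contact", "gaze_shift"])))
without_ostension = choose_action(
    action_values(encode_context(["gaze_shift"])))
print(with_ostension, without_ostension)
```

In this sketch the cue never triggers an action directly: it only selects which set of values is compared, which is the distinction the model draws between external cues and subjective action values.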

Development of action value assignment
A crucial mechanism underlying the action value calculator model is how infants learn to assign values to each action under a particular social context. We propose that infants have predispositions to engage with social stimuli from immediately after birth, and then acquire an optimal policy by learning the contingencies between actions and their outcomes through experience.
Even human foetuses in the womb preferentially orient toward face-like stimuli, indicating an early-emerging predisposition to socially relevant stimuli [9]. In early infancy, infants are also responsive to eye motion, and this is not dependent on action values [10]. The initial predispositions to social stimuli would initiate social interactions, which would then provide opportunities to learn optimal policies that maximise the chance of eliciting positive outcomes.
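How experience could turn an initial predisposition into a learned action value can be illustrated with a simple delta rule. This is a sketch under stated assumptions: the text does not specify a learning rule, so the update in `update_value`, the learning rate, the starting value, and the reward sequences are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def update_value(value: float, reward: float, learning_rate: float = 0.2) -> float:
    """Move the stored action value a fraction of the way toward the reward."""
    return value + learning_rate * (reward - value)


def simulate(initial_value: float, rewards: Sequence[float],
             learning_rate: float = 0.2) -> List[float]:
    """Return the value before learning and after each rewarded or unrewarded act."""
    values = [initial_value]
    for reward in rewards:
        values.append(update_value(values[-1], reward, learning_rate))
    return values


# Two hypothetical infants start from the same predisposition (0.5) but
# experience different outcomes after following gaze.
responsive = simulate(0.5, [1.0] * 10)    # gaze-following is always rewarded
unresponsive = simulate(0.5, [0.0] * 10)  # gaze-following is never rewarded

print(round(responsive[-1], 3), round(unresponsive[-1], 3))
```

Under these assumptions, the same starting predisposition diverges into a high or a low value for gaze-following depending only on the outcomes that follow the action.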
Individual differences in social experience could lead to different action value assignments. For example, deaf infants raised by deaf parents show more gaze-following behaviour than hearing infants raised by hearing parents, suggesting adaptation to their communicative context [11].
Another study showed that infants of depressed mothers and of families with low attachment quality tend to follow gaze less, suggesting that the familial environment affects the expected value of gaze-following [12]. Social experiences in development may therefore shape optimal policies of social actions.

Box 1. Two possible sources of action value
A question raised by the action value calculator model is what constitutes the action goals in each social context, against which action values would be calculated.

Acquiring information
Infants often acquire new information by interacting with others. Informational value can be considered a reinforcer for infants, regardless of positive emotional valence, and may modulate social behavioural decision-making. For example, detecting potential threats may have high action values. Consistent with this, infants have been reported to show a negativity bias when viewing faces, preferentially attending to negative expressions (e.g., angry faces).

Social interest
Another possibility is that interacting with other people is itself an incentive for infants. For example, even if a visual target is a well-known object, infants are still motivated to engage in joint attention. In this situation, the incentive that drives joint attention is not acquiring information but engagement in social interaction.
Here, we argue that action value calculations would integrate values based on these two (and possibly more) goals as a scalar sum of the values of a particular action, reflected in reward expectations indexed by physiological states, which would then explain infants' social behaviour in a given context. Our value-based framework of social actions emphasises the importance of infants' internal calculation of values rather than external cues.
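One way to write the scalar summation described in Box 1 is sketched below. The notation is an assumption introduced here for illustration: $a$ denotes an action, $c$ the current social context, $A$ the set of available actions, and $V_{\mathrm{info}}$ and $V_{\mathrm{social}}$ the values derived from acquiring information and from social interest, respectively. No weighting of the two sources is implied by the text, so none is included.

```latex
V(a \mid c) = V_{\mathrm{info}}(a \mid c) + V_{\mathrm{social}}(a \mid c),
\qquad
a^{*} = \arg\max_{a \in A} V(a \mid c)
```

If further goals contribute, the sum could be extended to $V(a \mid c) = \sum_{g} V_{g}(a \mid c)$, with $g$ indexing the goals.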

Concluding remarks
Here, we proposed the action value calculator model, in an attempt to coherently explain empirical studies suggesting that infants modulate social behaviours in different contexts. The model hypothesises that infants calculate values of action alternatives and execute options which would most likely lead to rewarding outcomes in a given context. We further suggest that physiological states could be used to measure subjective action values. Future studies should investigate the generalisability of this model to social actions other than gaze-following.