A robotic model of reaching and grasping development
Piero Savastano, Stefano Nolfi
We present a neurorobotic model that develops reaching and grasping skills analogous to those displayed by infants during their early developmental stages. The learning process is realized in an incremental manner, taking into account the reflex behaviors initially possessed by infants and the neurophysiological and cognitive maturations occurring during the relevant developmental period. The behavioral skills acquired by the robots closely match those displayed by children. Moreover, the comparison of the results obtained in a control non-incremental experiment demonstrates how the limitations characterizing the initial developmental phase channel the learning process toward better solutions.
2. Experimental Scenario
The experimental scenario in which we train the robot is derived from the experiments carried out with children of about 4 months of age by Spencer and Thelen and von Hofsten (see Fig. 1).
Figure 1. Left: The simulated iCub robot. Center and Right: Typical experimental settings found in the literature.
The robot's neural controller is a recurrent neural network that receives proprioceptive input from the right arm, torso, and head and exteroceptive input from the camera and from the tactile sensors located on the right hand, and that controls the motors of the torso, head, and right arm/hand (14 degrees of freedom in total).
3. Training Process
We evaluate the performance level of the robot at each time step by taking the smaller of the scores of the two perceptual modalities: $p_{multimodal} = \min(p_{sight}, p_{touch})$. The agents are trained through a trial-and-error process in which the free parameters are varied randomly and variations are retained or discarded depending on whether they increase $p_{multimodal}$ at the end of the 18 trials. This is realized by using an evolutionary method.
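The scoring rule and the retain-or-discard loop can be sketched as follows. This is a minimal illustration, not the method used in the experiments: the text only states that an evolutionary method is used, so the single-parent scheme, the Gaussian perturbation, and the way scores are aggregated over trials are all assumptions. `hill_climb` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is a stand-in for running the simulated robot.

```python
import random


def multimodal_score(p_sight, p_touch):
    """Score of one time step: the weaker of the two modalities."""
    return min(p_sight, p_touch)


def hill_climb(evaluate, n_params, generations=200, sigma=0.1, seed=0):
    """Vary the free parameters randomly and keep only useful variations.

    `evaluate(params)` is a hypothetical callback that returns the
    multimodal score aggregated over all trials (aggregation is assumed).
    """
    rng = random.Random(seed)
    best = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
    best_score = evaluate(best)
    for _ in range(generations):
        # Random variation of every free parameter (assumed Gaussian).
        candidate = [w + rng.gauss(0.0, sigma) for w in best]
        score = evaluate(candidate)
        if score >= best_score:
            # Retain the variation.
            best, best_score = candidate, score
        # Otherwise the variation is discarded.
    return best, best_score


def toy_evaluate(params):
    """Toy stand-in for the robot trials; not part of the original model."""
    p_sight = 1.0 / (1.0 + abs(params[0] - 0.5))
    p_touch = 1.0 / (1.0 + abs(params[1] + 0.5))
    return multimodal_score(p_sight, p_touch)
```

Because the score is a minimum, a candidate cannot improve by excelling in one modality alone: in the toy evaluation, raising `p_sight` has no effect on the result until `p_touch` catches up.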
The robot is subjected to an incremental training process organized into the following three phases that model the three corresponding stages of the development of reaching/grasping capabilities in infants:
- The pre-reaching phase, that in infants extends from birth to approximately 4 months of age, is characterized by the presence of simple head orientation and grasping reflex behaviors, by a low involvement of cortical areas, and by a low visual acuity.
- A gross-reaching phase, that extends approximately from month 4 to the first year of age, is characterized by improved visual acuity and by a higher involvement of cortical areas.
- A fine-reaching phase, that follows the first year of life, is characterized by the increasing role played by visual information concerning the hand-object relation in later infancy and adulthood.
Figure 2. The neural architecture used for the pre- (Left), gross- (Center) and fine-reaching (Right) phases.
The transition from one phase to the next is performed by taking the best 20 agents from the previous phase, combining them into a new population, applying changes to the neural architecture and the sensors, and starting 10 repetitions of the new phase. The task, environment, and performance evaluation are kept constant across the phases.
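The phase transition described above can be sketched as follows, assuming agents are stored as (fitness, genome) pairs pooled over the previous phase. The function name `start_next_phase`, the flat-list genome, and the zero initialization of the parameters added by the architectural change are assumptions made for illustration; the text does not specify how new parameters are initialized.

```python
import random


def start_next_phase(prev_agents, n_new_params, n_best=20,
                     n_repetitions=10, seed=0):
    """Build the seed populations for the repetitions of the new phase.

    prev_agents: list of (fitness, genome) pairs from the previous phase.
    n_new_params: number of parameters added by the change in neural
        architecture and sensors (hypothetical flat encoding).
    """
    rng = random.Random(seed)
    # Keep the best agents of the previous phase.
    ranked = sorted(prev_agents, key=lambda agent: agent[0], reverse=True)
    elite = [genome for _, genome in ranked[:n_best]]
    # Architectural change: append the new parameters. Starting them at
    # zero is an assumption, chosen so the new units have no influence yet.
    extended = [list(genome) + [0.0] * n_new_params for genome in elite]
    # Every repetition starts from its own copy of the combined population.
    populations = []
    for _ in range(n_repetitions):
        population = [list(genome) for genome in extended]
        rng.shuffle(population)
        populations.append(population)
    return populations
```

Copying the genomes per repetition keeps the repetitions independent, so changes made in one run cannot leak into another.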
4. Results and Discussion
At the end of the pre-reaching phase the robots manage to reach (i.e. touch) the object in about half of the trials. The robots display an exploratory behavior that is realized by extending the arm and by producing circular movements around the area in which the object can be located. In other words, although the robots visually sense the ball and are rewarded for looking at it, they do not yet rely on visual information to bring their hand toward the ball. Moreover, the behavior displayed by the robots at the end of the pre-reaching phase is characterized by a large use of the DOFs of the trunk and of the shoulder and by a reduced use (locking) of the elbow DOF. This is demonstrated by the fact that, as in the case of infants, the distance between the shoulder and the hand remains almost constant during reaching attempts (Fig. 3, Bottom).
Figure 3. Top: Video of the pre-reaching phase. Bottom: Demonstration of the elbow-freeze.
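One simple way to quantify the elbow locking described above is to measure how much the shoulder-hand distance varies along a reaching trajectory. The sketch below is an illustration under stated assumptions, not the analysis used in the paper: it assumes 3D shoulder and hand positions sampled at the same time steps and uses the coefficient of variation as the index, which is a choice made here.

```python
import math


def shoulder_hand_distances(shoulder_xyz, hand_xyz):
    """Euclidean shoulder-hand distance at every sampled time step."""
    return [math.dist(s, h) for s, h in zip(shoulder_xyz, hand_xyz)]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean


# Toy trajectories (made-up numbers, not data from the experiments).
shoulder = [(0.0, 0.0, 0.0)] * 3
locked_hand = [(0.3, 0.0, 0.0), (0.0, 0.3, 0.0), (0.0, 0.0, 0.3)]
extending_hand = [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)]

locked_cv = coefficient_of_variation(
    shoulder_hand_distances(shoulder, locked_hand))
extending_cv = coefficient_of_variation(
    shoulder_hand_distances(shoulder, extending_hand))
```

A value near zero, as for `locked_cv`, indicates that the hand moves on a sphere around the shoulder, which is what a locked elbow produces; a larger value, as for `extending_cv`, indicates that the elbow flexes or extends during the reach.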
During the gross-reaching phase the robots improve their ability to grasp the object (i.e. touch the object with the palm and at least one of the fingers). While the frequency of successful grasps improves significantly (p < 0.001, two-tailed Mann-Whitney U test), the frequency of reaches is surprisingly lower (p < 0.001). This is because the robots specialize in certain regions of their peripersonal space in which they can robustly grasp an object, instead of trying to reach it in every position. The effects of this specialization can be clearly seen in Fig. 4, which shows the spatial distribution of successful reaches (upper row) and grasps (lower row) at the end of the three phases. As can be observed in the figure, while in the pre- and even more in the fine-reaching phases the robots manage to reach the object in all nine possible areas, in the gross-reaching phase they display only partially good reaching skills. As for the grasping capacity, instead, a clear improvement can be observed from the pre- to the gross-reaching phase. Thus the temporary regression of the reaching capability occurring during this phase seems to be due to the development and integration of a grasping skill into the robots' behavioral repertoire.
Figure 4. Top: Video of the gross-reaching phase. Bottom: Spatial distribution of reaches and grasps across the phases.
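For readers unfamiliar with the two-tailed Mann-Whitney U test used in the comparisons above, the following is a minimal pure-Python sketch. It computes the U statistic from average ranks and a two-tailed p-value from the normal approximation with tie correction and without continuity correction; this is an assumption about the variant, since the text does not say which one was used, and with only ten replications per condition an exact test from a statistics library would be preferable in practice.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, p): U = min(U_x, U_y) and a two-tailed approximate p."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Tie correction for the variance of U.
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all observations identical
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return u, math.erfc(abs(z) / math.sqrt(2.0))
```

The function takes two lists of per-replication scores, for example the grasp frequencies of the ten replications of two phases, and returns a small p-value when one sample tends to rank above the other.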
During the fine-reaching phase, the robots' ability to reach and grasp the target object improves significantly (p < 0.001), and reaching and grasping are performed successfully in all areas (Fig. 4). As far as the freezing and un-freezing of the DOFs is concerned, as expected, a significant increase in mobility is observed for the elbow, shoulder and trunk joints from the gross- to the fine-reaching phase (p < 0.001). The shoulder is frozen during the gross-reaching phase (p < 0.001), while the un-freezing of the elbow occurs only later on, during the fine-reaching phase. Overall, the spontaneous reduction of joint mobility occurring during the pre- and gross-reaching phases and the re-extension occurring during the fine-reaching phase indicate that the freezing/un-freezing phenomena need not be ascribed to specific maturational constraints and can rather arise from the tendency of the learning agents to initially self-reduce the complexity of their task and to later re-extend it when they reach a competence level that allows them to take on the full-fledged task.
In summary, with limited neural resources and no experience the robots tend to explore the environment with fast, blind, non-directed movements. As maturation and learning progress, they start to produce goal-directed movements based on both exteroceptive and proprioceptive information. The improvement is neither spatially nor temporally linear, i.e. the development and integration of a new capacity often cause a temporary regression in some of the existing skills.
Figure 5. Top: Video of the fine-reaching phase.
To analyze the role of tactile and visual sensory information and how this role varies during training, we compared the performance of the robots at the end of the three training phases in a normal condition and in three control conditions in which: tactile sensory information was not provided (no touch), visual information was not provided (no sight), neither tactile nor visual sensory information was provided (nothing). As shown in Fig. 6, performance varies significantly across the four experimental conditions (normal vs. no touch vs. no sight vs. no exteroceptive, p < 0.01, Kruskal-Wallis test). The most notable effect is the strong impairment caused by the absence of visual information in the fine-reaching phase. In normal conditions and in the absence of tactile information the fine-reachers outperform the gross-reachers (p < 0.05) and the gross-reachers outperform the pre-reachers (p < 0.05, two-tailed Mann-Whitney U test). Thus the absence of tactile information does not reverse the performance relation between the three phases. In the control conditions in which visual information is not provided or neither visual nor tactile information is provided, on the contrary, the gross-reachers outperform the fine-reachers (p < 0.05). This result provides further evidence of the regression occurring during the fine-reaching phase concerning the ability to exploit tactile information.
Figure 6: Performance obtained by testing the best robots in the three phases with the different sensory conditions.
In this section we describe which maturational constraints have an adaptive role (i.e. channel the adaptive process toward better solutions) and which do not. Fig. 7 shows a graphical summary of the experimental conditions and their relations. The pre-reaching ('pre'), gross-reaching ('gross'), and fine-reaching ('fine') labels indicate the standard experiments reported above. We used '-h' to indicate a pre-reaching phase in which the robots were provided with full visual acuity ('pre-h') or a gross-reaching phase following a 'pre-h' phase ('gross-h'). The asterisk indicates the absence of reflexes; the symbol '^' indicates non-incremental experiments. So 'gross^' indicates experiments in which the robots are trained directly with internal neurons without first undergoing a pre-reaching phase. To allow a fair comparison, in this control condition the training lasts as long as the pre- and gross-reaching phases combined. 'gross^*' is an experimental condition analogous to 'gross^' in which, however, the robots do not have the reflexes. Finally, 'fine^' indicates an experimental condition in which the robots are provided and trained directly with internal neurons and with neurons encoding the offset between the hand and the object position. Training length in this case equals the sum of the pre-, gross- and fine-reaching training lengths.
The maturational constraint constituted by the lack of internal resources (i.e. internal neurons) during the pre-reaching phase plays an adaptive role, since the 'gross' condition significantly outperforms the non-incremental 'gross^' condition (p < 0.01, two-tailed Mann-Whitney U test). On the other hand, the reduced visual acuity during the pre-reaching phase, the availability of reflexes, and the inability to perceive the offset between the hand and the object during the pre- and gross-reaching phases do not constitute adaptive constraints.
Figure 7. Top: Overview of the comparison between the standard and control experimental conditions. Arrows indicate the time dependency between the developmental phases. The thickness of each rectangle represents the median performance over the ten replications. 'h' = high visual acuity, * = no reflexes, ^ = non-incremental condition (see also the text). Bottom Left: Box plot of the performance obtained in all experimental phases and conditions. Each bar represents the performances obtained in each condition for ten different replications. Bottom Right: Pairwise comparison of the conditions. A black square indicates a significant difference (p < 0.05, two-tailed Mann-Whitney U test).
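The condition labels used in the text and in Fig. 7 follow a small naming scheme that can be decoded mechanically. The sketch below is only an illustration of that scheme: the `Condition` class, the `decode` function, and the field names are hypothetical and not part of the original experiments. The field `budget_phases` lists the phases whose training lengths are summed to give the length of the run, following the rule stated for the non-incremental conditions.

```python
from dataclasses import dataclass

PHASES = ("pre", "gross", "fine")


@dataclass(frozen=True)
class Condition:
    phase: str            # target phase: 'pre', 'gross' or 'fine'
    high_acuity: bool     # '-h': full visual acuity in the pre-reaching phase
    reflexes: bool        # False when the label carries '*'
    incremental: bool     # False when the label carries '^'
    budget_phases: tuple  # phases whose training lengths are summed


def decode(label):
    """Decode labels such as 'pre-h', 'gross^', 'gross^*' or 'fine^'."""
    rest = label
    reflexes = not rest.endswith("*")
    rest = rest.rstrip("*")
    incremental = not rest.endswith("^")
    rest = rest.rstrip("^")
    high_acuity = rest.endswith("-h")
    if high_acuity:
        rest = rest[:-2]
    if rest not in PHASES:
        raise ValueError(f"unknown condition label: {label!r}")
    if incremental:
        # A standard phase is trained for its own length only.
        budget = (rest,)
    else:
        # A non-incremental run lasts as long as all phases up to the target.
        budget = PHASES[:PHASES.index(rest) + 1]
    return Condition(rest, high_acuity, reflexes, incremental, budget)
```

For example, `decode("gross^*")` describes a run without reflexes that is trained in a single stage for the combined length of the pre- and gross-reaching phases.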
We presented a neuro-robotic model of reaching and grasping development in infants that matches the complexity of the studied phenomenon with respect to the complexity of the robot's body structure and of the task, and that accounts for many aspects of the child developmental process. More specifically, the model allows the robots to develop capabilities analogous to those shown by human children through a series of stages that are similar to those observed in infants.
 J. P. Spencer and E. Thelen, "Spatially specific changes in infants' muscle coactivity as they learn to reach". Infancy, vol. 1, no. 3, pp. 275-302, 2000.
 C. von Hofsten, "Developmental changes in the organization of prereaching movements". Developmental Psychology, vol. 20, no. 3, pp. 378-388, 1984.