Exploring situated interaction with social robots using augmented reality

A vision for the future is that social robots will appear in supermarkets, schools, manufacturing industry, and people's homes. The success of this development will depend on how well humans can communicate with these robots, and the most natural way of interacting with them is likely to be spoken face-to-face interaction. Recent years have seen steady improvements in speech technology, and it is increasingly being used, for example in voice assistants in mobile phones and cars. However, interaction with social robots has several characteristics that present both new opportunities and challenges when it comes to modelling spoken interaction.

Firstly, the robot should be able to interact with several humans (and possibly other robots) at the same time, which means that we need to model multi-party interaction. A central challenge is to model turn-taking and conversational dynamics, that is, who is currently taking part in the interaction, and what the current roles of the speakers are. Secondly, such interaction is typically situated, which means that the spoken discourse will involve references to, and manipulation of, objects in the shared physical space.
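To make the multi-party modelling problem concrete, the sketch below tracks who holds the conversational floor and who the likely addressees are. It is a minimal illustration, not a description of any particular system: the participant names, the gaze-based addressee heuristic, and the event interface are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """A minimal multi-party turn-taking state: who is present, who is
    speaking, and who is being addressed (illustrative sketch only)."""
    participants: set
    current_speaker: str = None
    addressees: set = field(default_factory=set)

    def on_speech_start(self, speaker, gazed_at):
        # When an utterance starts, the speaker takes the floor; the
        # participants they gaze at are treated as likely addressees.
        self.current_speaker = speaker
        self.addressees = set(gazed_at) & self.participants

    def on_speech_end(self):
        # The floor becomes open; the previous addressees are natural
        # candidates for the next turn.
        self.current_speaker = None

# Hypothetical three-party interaction: a robot and two humans.
state = DialogueState(participants={"robot", "anna", "ben"})
state.on_speech_start("anna", gazed_at={"robot"})
print(state.current_speaker, state.addressees)  # anna {'robot'}
```

In a real system the speech-start and gaze events would come from voice activity detection and head-pose or eye tracking; the point here is only that the state to be tracked goes beyond a single user-system pair.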

This requires the modelling of joint attention, which means that the robot should be able to read the gaze of humans, but also be able to produce accurate gaze behaviour. Thirdly, the robot has a physical embodiment, which allows it to use both verbal and non-verbal signals to coordinate the interaction. The design of that embodiment has important consequences for how the robot will be perceived, and what kind of non-verbal signals it will be able to produce. When humans interact and collaborate, they coordinate their behaviours using verbal and non-verbal signals, expressed in the face and voice. If robots of the future are to engage in social interaction with humans, it is essential that they can both generate and understand these behaviours.
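A common way to ground "reading the gaze of humans" is to pick, among the objects in the shared space, the one whose direction from the observer's head is closest in angle to the observed gaze ray. The sketch below illustrates that idea with made-up 3D coordinates; the object names and positions are illustrative assumptions, not data from any experiment.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def gaze_target(head_pos, gaze_dir, objects):
    """Return the name of the object whose direction from the head is
    closest to the gaze direction (a simple joint-attention heuristic)."""
    def angle_to(obj_pos):
        direction = tuple(o - h for o, h in zip(obj_pos, head_pos))
        return angle_between(gaze_dir, direction)
    return min(objects, key=lambda name: angle_to(objects[name]))

# Hypothetical scene: two objects on a table in front of the human.
objects = {"cup": (0.5, 0.0, 0.2), "block": (-0.4, 0.1, 0.3)}
print(gaze_target((0.0, 0.0, 0.0), (1.0, 0.0, 0.4), objects))  # cup
```

Producing gaze, rather than reading it, is the inverse problem: given a target object, the robot orients its head and eyes along the corresponding direction, which is one reason the design of the embodiment matters.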
