Robots that teach themselves


Programming robots to do household chores or perform care tasks is difficult and time-consuming.

A more efficient approach would be to enable the robots to learn for themselves in practice. Erik Schuitema researched this with robot Leo, who can teach himself to walk. On Monday 12 November, Schuitema was awarded his PhD at TU Delft for his work on the subject.

Care

Service robots have the long-term potential to be very valuable in the home, in health care, and in other labour-intensive environments. However, such environments are usually unique, poorly structured and subject to change, so making service robots robust and versatile by manual programming is both awkward and time-consuming.

Autodidactic

Robots that teach themselves (on site) to perform tasks through interaction with the real world could therefore be an attractive alternative. Reinforcement Learning (RL) enables a system to learn to solve tasks on the basis of feedback on its behaviour: good behaviour is reinforced with positive rewards, bad behaviour is punished with negative rewards. PhD candidate Erik Schuitema researched the opportunities and possibilities of such an approach with real robots.
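The idea of learning from rewards can be illustrated with a minimal Python sketch. It uses tabular Q-learning, a standard textbook RL method, on a toy five-state corridor. This is not the specific algorithm or task used for Leo; the function name, the corridor task and all parameter values are assumptions chosen purely for illustration.

```python
import random


def q_learning_corridor(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
                        epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor; the goal is the right-most state."""
    rng = random.Random(seed)
    # q[s][a]: estimated long-term reward of action a (0 = left, 1 = right) in state s.
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Explore at random with probability epsilon (and on ties),
            # otherwise take the action with the higher estimate.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            # Feedback: +1 for reaching the goal, a small penalty for every other step.
            r = 1.0 if s_next == n_states - 1 else -0.01
            # Move the estimate towards the reward plus the discounted future value.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


q = q_learning_corridor()
# Greedy action per non-goal state after learning (1 means "move right").
policy = [0 if left > right else 1 for left, right in q[:-1]]
print(policy)
```

Behaviour that leads to the goal ends up with higher value estimates, so the learned policy prefers it. The system is never told the solution, only how well each attempt went.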

Leo

This kind of research is usually conducted via simulations only. ‘Too little is known as yet about the compatibility with real, actual hardware’, Schuitema explains. ‘It is a strength of TU Delft that it is able to create that very combination.’

Schuitema: ‘We designed and built a biped walking robot called Leo especially to research the application of reinforcement learning on real robots. Robot Leo is able to learn two basic motor tasks: placing one foot on a stair and walking.’ An advantage here is that TU Delft has already gained extensive experience in designing and building walking robots. The research is being co-funded by the STW Technology Foundation.

Positive

Leo learns to walk by receiving a positive reward when ‘he’ moves a foot forwards, and negative rewards for the use of time and energy. The reward is simply a number in the computer which can increase or decrease. Leo tries to maximise the rewards by a process of trial and adjustment.
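As a rough illustration of such a reward signal, the Python sketch below combines a bonus for forward foot movement with penalties for elapsed time and energy use. The function name, the weights and the units are assumptions made for this example; the values actually used for Leo are not given here.

```python
def step_reward(foot_advance_m, energy_used_j, dt_s,
                forward_weight=100.0, energy_weight=0.5, time_weight=1.0):
    """Reward for one control step (hypothetical weights, not Leo's actual values)."""
    # Positive reward: proportional to how far the foot moved forwards.
    progress = forward_weight * foot_advance_m
    # Negative rewards: time spent and energy consumed during the step.
    cost = time_weight * dt_s + energy_weight * energy_used_j
    return progress - cost


# A step that moves the foot 5 cm forwards is compared with standing still.
moving = step_reward(foot_advance_m=0.05, energy_used_j=2.0, dt_s=0.1)
idle = step_reward(foot_advance_m=0.0, energy_used_j=0.5, dt_s=0.1)
print(moving > idle)
```

With these assumed weights, standing still is never free: time and energy still cost something, so the only way to collect a high total reward is to keep moving forwards efficiently.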

In this way Leo can teach himself to place one foot on a stair, a relatively simple task, within fifteen minutes. Learning to walk takes about five hours in simulation, during which the robot falls over thousands of times. As the hardware could not withstand that many falls, it was decided that Leo would be helped a little in the early learning stage. Initially, he could ‘copy’ the walking produced by the current method of manually programming walking robots. Within several hours, Leo could walk as well as this pre-programmed example, and even improve on it a little.