A simple task like “making coffee,” for example, would also include the step “grabbing a cup.” The researchers demonstrated VirtualHome in a 3D world inspired by The Sims, a life-simulation video game.
The team’s AI agent can execute 1,000 of these interactions in the Sims-style world, across eight different scenes, including a living room, kitchen, dining room, bedroom, and home office.
“Describing actions as computer programmes has the advantage of providing clear and unambiguous descriptions of all the steps needed to complete a task,” said Xavier Puig, a PhD student at MIT.
“These programmes can instruct a robot or a virtual character, and can also be used as a representation for complex tasks with simpler actions,” said Puig.
Unlike humans, robots need more explicit instructions to complete even easy tasks; they cannot simply infer missing steps and reason about them with ease.
For example, one might tell a human to “switch on the TV and watch it from the sofa.” Here, actions like “grab the remote control” and “sit/lie on sofa” have been omitted, since they’re part of the commonsense knowledge that humans have.
To better demonstrate these kinds of tasks to robots, the descriptions for actions needed to be much more detailed. To do so, the team first collected verbal descriptions of household activities, and then translated them into simple code.
A programme like this might include steps like: walk to the television, switch on the television, walk to the sofa, sit on the sofa, and watch television.
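A programme of this kind can be sketched as an ordered list of (action, object) steps that a virtual agent runs through one by one. The step format and the `execute` helper below are illustrative assumptions for this article, not the actual VirtualHome script syntax or API:

```python
# Hypothetical sketch: the "watch TV" task as a simple programme of
# (action, object) steps, in the spirit of VirtualHome's approach.
# The tuple format and executor are illustrative assumptions only.

watch_tv = [
    ("walk", "television"),
    ("switchon", "television"),
    ("walk", "sofa"),
    ("sit", "sofa"),
    ("watch", "television"),
]

def execute(programme):
    """Step through the programme in order, returning a log of actions."""
    log = []
    for action, obj in programme:
        log.append(f"[{action.upper()}] <{obj}>")
    return log

for line in execute(watch_tv):
    print(line)
```

Because every step is spelled out explicitly, there is no commonsense gap for the agent to fill: “grab the remote” or “sit on the sofa” must appear in the list, or they simply do not happen.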
Once the programmes were created, the team fed them to the VirtualHome 3D simulator to be turned into videos. A virtual agent would then execute the tasks defined by the programmes, whether that meant watching television, placing a pot on the stove, or turning a toaster on and off.