Google Research's state-of-the-art language model, PaLM, can solve complex math word problems, answer questions in new languages, and explain jokes.
Introducing PaLM-SayCan, a robotics algorithm that combines the language understanding of PaLM with the real-world capabilities of a helper robot.
Everyday Robots' helper robots show the potential for more helpful, human-centered robots in the future.
How PaLM-SayCan Works
This demo shows a PaLM-enabled helper robot performing a series of complex tasks using chain of thought prompting and the step-by-step solution needed to carry out the requests.
1. Select a task for the helper robot
2. See how PaLM-SayCan interprets the task
3. See how the helper robot executes the task
4. Learn the step-by-step solution behind PaLM-SayCan
Robots struggle to understand complex instructions that require reasoning. By using an advanced language model, PaLM-SayCan splits a request into sub-tasks for the helper robot to execute one by one. This also makes it much easier for people to get help from robots: they can simply ask using everyday language.
Several of the tasks below start the same way but end differently. See how slightly changing the request affects the interpretation.
Prompt: "I just worked out. Can you bring me a drink and a snack to recover?"
Interpretation: "The user has asked for a drink and a snack. I will bring a water bottle and an apple."

Prompt: "I don’t like caffeinated soda. Can you bring me something else to drink from the table?"
Interpretation: "The user has asked me to bring a drink that isn't caffeinated. I will bring the water bottle."

Prompt: "I’m tired. Bring me a snack that’ll give me some energy, please."
Interpretation: "The user has asked me to bring a snack that has energy. I will bring an energy bar."

Prompt: "That snack is too small. Bring me one that is more filling."
Interpretation: "The user has asked for a snack that is more filling. I will bring multigrain chips."

Prompt: "I spilled my orange soda. Can you bring me a replacement, please?"
Interpretation: "The user has asked me to bring them a replacement for their orange soda. I will bring them a new one."

Prompt: "I spilled my orange soda. Can you bring me something to clean it up, please?"
Interpretation: "The user has asked me to bring something to clean up the orange soda. I will bring the sponge."

Prompt: "I spilled my orange soda. Can you throw it away, please?"
Interpretation: "The user has asked me to throw away the orange soda. I will move it to the trash."

Prompt: "I spilled my orange soda. Can you throw it away, bring me something to help clean it up, and then bring me a replacement?"
Interpretation: "The user has asked me to do three tasks. I will do one and then the other."
After a task has been requested, PaLM-SayCan uses chain of thought prompting to interpret the instruction, then scores how likely each individual skill is to make progress toward completing the high-level request.
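The scoring rule just described can be sketched in a few lines. This is an illustrative reconstruction, not the production system: `pick_skill`, `language`, and `affordance` are hypothetical names, and the dictionaries stand in for PaLM and the robot's learned value functions.

```python
# Illustrative sketch of the SayCan decision rule (not the production code).
# Each candidate skill gets two scores:
#   - language score: how useful the skill is for the request (from the LLM)
#   - affordance score: how likely the skill is to succeed from the robot's
#     current state (from a learned value function)
# The skill with the highest product of the two scores is chosen.

def pick_skill(candidates, language_score, affordance_score):
    """Return the skill with the highest combined (language * affordance) score."""
    combined = {
        skill: language_score[skill] * affordance_score[skill]
        for skill in candidates
    }
    best = max(combined, key=combined.get)
    return best, combined

# Toy stand-in scores, loosely modeled on the demo's first step.
language = {"find a water": 0.90, "find an apple": 0.20, "done": 0.01}
affordance = {"find a water": 1.00, "find an apple": 0.95, "done": 1.00}

best, combined = pick_skill(language, language, affordance)
print(best)  # → find a water
```

Because the two scores are multiplied, a skill must be both useful for the request and feasible in the current state to win; a high language score cannot rescue a skill the robot cannot actually perform.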
Unlike typical robotic demos that perform a single task in a constrained environment, this research shows you can ask an Everyday Robots helper robot for assistance, and it can carry out the request in a real-world environment using reasoning.
Multiple safety layers exist on the helper robot, including hardware safety layers, e-stop, and move-to-contact behaviors.
Prompt
This is the request the human has given the robot. PaLM-SayCan is constrained to commands that are safe for a robot to perform.
"I just worked out. Can you bring me a drink and a snack to recover?"

Interpretation
This is how the language model has interpreted the request through chain of thought prompting.
"The user has asked for a drink and a snack. I will bring a water bottle and an apple."

Execution
PaLM-SayCan is highly interpretable: we can clearly examine and learn from every decision the system makes. The language model first scores whether each step is useful for completing the task. The affordance model scores whether each step is possible for the robot in its environment. These scores are combined and the highest combination is chosen, giving a step that is both possible and useful for the request. The robot then executes that step and repeats the process in the new state.

Step-by-step solution (chosen skill first, with the top alternatives and their combined scores):
1. find a water (1.00); find an apple (0.19); find an energy bar (0.00); find a lime soda (0.00); find a 7up (0.00)
2. pick up the water (1.00); find an apple (0.00); bring it to you (0.00); pick up the apple (0.00); put down the water (0.00)
3. bring it to you (1.00); find an apple (0.06); put down the water (0.00); pick up the apple (0.00); find an energy bar (0.00)
4. put down the water (1.00); find an apple (0.01); put down the apple (0.00); done (0.00); put down the energy bar (0.00)
5. find an apple (1.00); find an energy bar (0.00); done (0.00); find a water (0.00); find a lime soda (0.00)
6. pick up the apple (1.00); bring it to you (0.00); pick up the water (0.00); put down the apple (0.00); pick up the energy bar (0.00)
7. bring it to you (1.00); put down the apple (0.00); go to the table (0.00); pick up the apple (0.00); put down the water (0.00)
8. put down the apple (1.00); put down the water (0.00); done (0.00); put down the energy bar (0.00); put down the orange soda (0.00)
9. done (1.00); find an apple (0.00); find a water (0.00); put down the apple (0.00); find an energy bar (0.00)
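The execution process above can be condensed into a small greedy loop: score all candidate skills, execute the winner, then re-score in the new state until "done" is selected. In this hedged sketch, the scripted score tables (abridged to two candidates per step) replay the walkthrough above; the real system queries PaLM and a learned affordance model instead. `run_task` and `script` are illustrative names, not part of the actual system.

```python
# Sketch of the SayCan execution loop: pick the highest-scoring skill,
# act, and repeat in the new state until the "done" skill wins.

def run_task(score_fn, act_fn, max_steps=20):
    plan = []
    for _ in range(max_steps):
        scores = score_fn(plan)            # combined scores, given history
        skill = max(scores, key=scores.get)
        plan.append(skill)
        if skill == "done":
            break
        act_fn(skill)                      # robot executes the chosen skill
    return plan

# Scripted combined scores, one table per step (abridged from the demo).
script = [
    {"find a water": 1.00, "find an apple": 0.19},
    {"pick up the water": 1.00, "find an apple": 0.00},
    {"bring it to you": 1.00, "find an apple": 0.06},
    {"put down the water": 1.00, "find an apple": 0.01},
    {"find an apple": 1.00, "done": 0.00},
    {"pick up the apple": 1.00, "bring it to you": 0.00},
    {"bring it to you": 1.00, "put down the apple": 0.00},
    {"put down the apple": 1.00, "done": 0.00},
    {"done": 1.00, "find an apple": 0.00},
]

plan = run_task(lambda history: script[len(history)], act_fn=print)
print(plan[-1])  # → done
```

Re-scoring after every action is what lets the robot recover from a changing environment: the next skill is always chosen from the state the world is actually in, not from a plan fixed up front.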
Prompt
"I don’t like caffeinated soda. Can you bring me something else to drink from the table?"

Interpretation
"The user has asked me to bring a drink that isn't caffeinated. I will bring the water bottle."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find a water (1.00); go to the table (0.02); find a lime soda (0.00); find a redbull (0.00); find a 7up (0.00)
2. pick up the water (1.00); bring it to you (0.00); go to the table (0.00); put down the water (0.00); pick up the energy bar (0.00)
3. bring it to you (1.00); put down the water (0.00); go to the table (0.00); pick up the water (0.00); done (0.00)
4. put down the water (1.00); done (0.00); put down the energy bar (0.00); put down the apple (0.00); put down the orange soda (0.00)
5. done (1.00); find a water (0.00); go to the table (0.00); put down the water (0.00); find a pepsi (0.00)
Prompt
"I’m tired. Bring me a snack that’ll give me some energy, please."

Interpretation
"The user has asked me to bring a snack that has energy. I will bring an energy bar."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find an energy bar (1.00); find a redbull (0.00); find an apple (0.00); find a lime soda (0.00); find a water (0.00)
2. pick up the energy bar (1.00); bring it to you (0.00); pick up the redbull (0.00); pick up the apple (0.00); put down the energy bar (0.00)
3. bring it to you (1.00); put down the energy bar (0.00); go to the table (0.00); pick up the energy bar (0.00); done (0.00)
4. put down the energy bar (1.00); done (0.00); put down the apple (0.00); put down the redbull (0.00); put down the orange soda (0.00)
5. done (1.00); find a redbull (0.00); go to the table (0.00); find a coke (0.00); find a pepsi (0.00)
Prompt
"That snack is too small. Bring me one that is more filling."

Interpretation
"The user has asked for a snack that is more filling. I will bring multigrain chips."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find the multigrain chips (1.00); find a 7up (0.03); put down the energy bar (0.03); find a redbull (0.02); find a water (0.01)
2. pick up the multigrain chips (1.00); bring it to you (0.01); put down the multigrain chips (0.00); find an energy bar (0.00); find an apple (0.00)
3. bring it to you (1.00); put down the multigrain chips (0.00); go to the table (0.00); find an energy bar (0.00); done (0.00)
4. put down the multigrain chips (1.00); done (0.00); put down the redbull (0.00); put down the energy bar (0.00); put down the apple (0.00)
5. done (1.00); find an apple (0.00); find a redbull (0.00); find an energy bar (0.00); find a grapefruit soda (0.00)
Prompt
"I spilled my orange soda. Can you bring me a replacement, please?"

Interpretation
"The user has asked me to bring them a replacement for their orange soda. I will bring them a new one."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find an orange soda (1.00); find a lime soda (0.02); go to the table (0.00); find a 7up (0.00); bring it to you (0.00)
2. pick up the orange soda (1.00); bring it to you (0.11); put down the orange soda (0.00); find a lime soda (0.00); go to the table (0.00)
3. bring it to you (1.00); put down the orange soda (0.00); go to the table (0.00); go to the close counter (0.00); find a lime soda (0.00)
4. put down the orange soda (1.00); done (0.00); put down the lime soda (0.00); put down the water (0.00); put down the sponge (0.00)
5. done (1.00); find a lime soda (0.00); put down the lime soda (0.00); put down the orange soda (0.00); find a sponge (0.00)
Prompt
"I spilled my orange soda. Can you bring me something to clean it up, please?"

Interpretation
"The user has asked me to bring something to clean up the orange soda. I will bring the sponge."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find a sponge (1.00); find an orange soda (0.14); find a redbull (0.00); find a water (0.00); put down the orange soda (0.00)
2. pick up the sponge (1.00); bring it to you (0.01); pick up the orange soda (0.00); put down the sponge (0.00); go to the table (0.00)
3. bring it to you (1.00); put down the sponge (0.00); go to the table (0.00); done (0.00); go to the close counter (0.00)
4. put down the sponge (1.00); done (0.01); put down the orange soda (0.00); put down the energy bar (0.00); put down the redbull (0.00)
5. done (1.00); find an orange soda (0.00); put down the orange soda (0.00); find a sponge (0.00); find a water (0.00)
Prompt
"I spilled my orange soda. Can you throw it away, please?"

Interpretation
"The user has asked me to throw away the orange soda. I will move it to the trash."

Execution
Step-by-step solution (chosen skill first, with combined scores):
1. find an orange soda (1.00); pick up the orange soda (0.00); find a sponge (0.00); go to the table (0.00); find a redbull (0.00)
2. pick up the orange soda (1.00); put down the orange soda (0.00); go to the trash (0.00); pick up the redbull (0.00); go to the table (0.00)
3. go to the trash (1.00); put down the orange soda (0.07); bring it to you (0.01); go to the table (0.00); go to the close counter (0.00)
4. put down the orange soda (1.00); put down the redbull (0.00); done (0.00); put down the sponge (0.00); put down the water (0.00)
5. done (1.00); find a sponge (0.00); put down the orange soda (0.00); find an orange soda (0.00); go to the table (0.00)
Prompt
"I spilled my orange soda. Can you throw it away, bring me something to help clean it up, and then bring me a replacement?"

Interpretation
"The user has asked me to do three tasks. I will do one and then the other."

Execution
Step-by-step solution (first step shown; chosen skill first, with combined scores):
1. find an orange soda (1.00); find a sponge (0.03); put down the orange soda (0.02); pick up the orange soda (0.01); go to the table (0.01)