An embodied agent powered by vision-language models (VLMs) and large language models (LLMs).
https://github.com/sponsors/Charmve?frequency=one-time&sponsor=Charmve
This repo was only available to my sponsors on GitHub Sponsors until I reached 15 sponsors.
Learn more about Sponsorware at github.com/sponsorware/docs 💰.
Because the language model's output stays the same throughout a task, we can cache it and re-evaluate the generated code against closed-loop visual feedback at every step. This allows fast replanning with model predictive control (MPC) and makes VoxPoser robust to online disturbances.