Robots let ChatGPT touch the real world thanks to Microsoft

by time news

Microsoft

Last week, Microsoft researchers announced an experimental framework for controlling robots and drones using the language capabilities of ChatGPT, a popular AI language model created by OpenAI. Given natural-language commands, ChatGPT can write special code that controls a robot's movements. A human then reviews the results and adjusts as necessary until the task is completed successfully.

The research arrived in a paper titled “ChatGPT for Robotics: Design Principles and Model Abilities,” by Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor of the Microsoft Autonomous Systems and Robotics Group.

In a demo video, Microsoft shows robots, apparently controlled by code that ChatGPT wrote while following human instructions, using a robot arm to arrange blocks into the Microsoft logo, flying a drone to inspect the contents of a shelf, and finding objects using a robot with vision capabilities.

Microsoft “ChatGPT for Robotics” demo video.

To get ChatGPT to interface with robots, the researchers taught ChatGPT a custom robotics API. When given instructions such as “catch the ball,” ChatGPT can generate robot control code just as it would write a poem or complete an essay. After a human inspects and edits the code for accuracy and safety, the human operator can execute the task and evaluate its performance.
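The workflow described above (describe an API to the model, let it write code, have a human approve that code before anything runs) can be illustrated with a minimal Python sketch. This is not Microsoft's implementation: the function names in `API_DOC`, the `SimulatedArm` class, and the `generate` callback that stands in for a call to ChatGPT are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical high-level robot functions described to the model.
# These names are illustrative, not the API used in the research.
API_DOC = {
    "get_position(name)": "return the (x, y, z) position of a named object",
    "move_to(x, y, z)": "move the gripper to the given coordinates",
    "grab()": "close the gripper",
    "release()": "open the gripper",
}


def build_prompt(task: str) -> str:
    """Describe the allowed functions and the task in plain text."""
    lines = ["You control a robot arm. Use only these functions:"]
    lines += [f"- {sig}: {desc}" for sig, desc in API_DOC.items()]
    lines.append(f"Task: {task}")
    lines.append("Reply with Python code only.")
    return "\n".join(lines)


class SimulatedArm:
    """Stand-in for real hardware: records every call instead of moving."""

    def __init__(self) -> None:
        self.log: list[tuple] = []

    def get_position(self, name: str) -> tuple[float, float, float]:
        self.log.append(("get_position", name))
        return (1.0, 2.0, 0.5)  # fixed fake coordinates

    def move_to(self, x: float, y: float, z: float) -> None:
        self.log.append(("move_to", x, y, z))

    def grab(self) -> None:
        self.log.append(("grab",))

    def release(self) -> None:
        self.log.append(("release",))

    def execute(self, code: str) -> None:
        # Expose only the four robot functions to the generated code.
        # Emptying __builtins__ is NOT a real sandbox; human review is the gate.
        scope = {
            "__builtins__": {},
            "get_position": self.get_position,
            "move_to": self.move_to,
            "grab": self.grab,
            "release": self.release,
        }
        exec(code, scope)


def run_with_review(
    task: str,
    generate: Callable[[str], str],   # stands in for a call to the language model
    approve: Callable[[str], bool],   # the human inspection step
    execute: Callable[[str], None],
) -> bool:
    """Generate code for the task and run it only if the reviewer approves."""
    code = generate(build_prompt(task))
    if not approve(code):
        return False
    execute(code)
    return True


if __name__ == "__main__":
    arm = SimulatedArm()
    canned = "x, y, z = get_position('ball')\nmove_to(x, y, z)\ngrab()"
    ran = run_with_review("catch the ball", lambda p: canned, lambda c: True, arm.execute)
    print(ran, arm.log)
```

In this sketch the `approve` callback is where the human operator would read and, if needed, edit the generated code; nothing reaches the (simulated) robot unless it returns true.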

In this way, ChatGPT speeds up the programming of robot commands, but it is not an autonomous system. The paper says, “We emphasize that the use of ChatGPT for robotics is not a fully automated process, but rather acts as a tool to augment human capacity.”

Diagram provided by Microsoft showing how ChatGPT for Robotics works.

Although most of the feedback given to ChatGPT (on the success or failure of its actions) appears to come from humans in text form, the researchers also claim some success feeding visual data into ChatGPT itself. In one example, the researchers prompted ChatGPT to command a robot to catch a basketball using feedback from a camera: “ChatGPT is able to estimate the appearance of the ball and the sky in the camera image using SVG code. This behavior hints at a possibility that the LLM keeps track of an implicit world model going beyond text-based probabilities.”

Although the results seem rudimentary for now, they represent early attempts to apply today's most talked-about technology, large language models, to robotic control. According to Microsoft, a ChatGPT interface could open up robotics to a much wider audience in the future.

“Our goal with this research is to see if ChatGPT can think beyond text, and reason about the physical world to help with robotics tasks,” a Microsoft Research blog post reads. “We want to help people interact with robots more easily, without needing to learn complex programming languages or details about robotic systems.”
