3 Breakthroughs from Google’s DeepMind — and How They Are Reshaping Robotics

May 21, 2024

Google spent its first two decades capturing the low-hanging fruit of the digital age: Web search, email, video, maps, and other software. Its move into laptops, speakers, and smartphones signaled a new ambition for hardware mastery, but even these products focused entirely on the manipulation of digital space. The real, physical world was just too messy and too difficult to tame, even for a company that seemed to succeed at everything it tried.

Google’s self-driving car company, Waymo, is a case in point. Work on this project began in 2009, and Waymo’s fully autonomous transportation services finally became available to the public in 2020. Even so, the performance of these vehicles has not always been smooth. Between occasional erratic driving, actual accidents, and safety-related recalls, Waymo’s robotaxis have struggled to win public confidence. The fleet continues to improve, but its overall numbers remain small — and even now, 15 years after development began, Waymo’s services remain confined to a few select cities in the American West.

For robots moving through an open-world environment, whether on two legs, four wheels, or anything in between, there is no such thing as low-hanging fruit. In fact, even designing a robot that can reach up on command and physically pick a piece of low-hanging fruit from a tree is itself an incredibly difficult task.

But maybe not for much longer.

Google’s DeepMind team has made three advances that will help robots learn faster and operate more effectively. Each is a form of robotics transformer (RT), a model that translates data input into physical, motor-based output.

The technical names for these innovations — AutoRT, SARA-RT, and RT-Trajectory — refer to tasks that our own brains accomplish routinely and with relative ease, but that represent major breakthroughs for the software that guides today’s robots. Together, they help bridge the divide between the sterile world of software code and the complex messiness of reality.

AutoRT

This module helps Google’s robots gather experiential data by having them perform and repeat a range of tasks in new environments. It builds on the kinds of large language models (LLMs) behind ChatGPT, which draw conclusions from text, as well as visual language models (VLMs), which make their inferences from visual data. In short, AutoRT gives each robot a commonsense understanding of its instructions and environment, so that it can gather information from the outside world in much the same way that we do.

From there, AutoRT records and scores the performance of each robot according to how effectively its output (physical activity) helps to achieve its goals. Multiply this training data by dozens of robots, all working continuously, and the system soon learns which behaviors yield the best results.
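To make that loop concrete, here is a minimal Python sketch of the idea. It is only an illustration, not DeepMind’s actual code: the describe_scene and propose_tasks functions are hypothetical stand-ins for the VLM and LLM, and the random score is a placeholder for a real rollout and evaluation.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for the models AutoRT relies on: a VLM that describes
# the scene and an LLM that proposes tasks worth attempting in it.
def describe_scene(camera_image) -> str:
    return "a table with a sponge, an apple, and a cup"

def propose_tasks(scene_description: str) -> List[str]:
    return ["pick up the sponge", "move the apple next to the cup"]

@dataclass
class Episode:
    task: str
    success_score: float  # how well the motion achieved the goal, 0.0 to 1.0

@dataclass
class ExperienceLog:
    episodes: List[Episode] = field(default_factory=list)

    def add(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def best_behaviors(self, top_k: int = 5) -> List[Episode]:
        # Pool and rank the fleet's data so the system learns which behaviors
        # yield the best results.
        return sorted(self.episodes, key=lambda e: e.success_score, reverse=True)[:top_k]

def run_robot(log: ExperienceLog, camera_image=None) -> None:
    scene = describe_scene(camera_image)      # VLM: what is in front of the robot?
    for task in propose_tasks(scene):         # LLM: what is worth trying here?
        score = random.random()               # placeholder for the real attempt and scoring
        log.add(Episode(task=task, success_score=score))

# Multiply this loop across many robots working continuously and the log grows fast.
log = ExperienceLog()
for _ in range(20):        # e.g. 20 robots, one pass each
    run_robot(log)
print(log.best_behaviors())
```

The point of the sketch is the shape of the loop: perceive, propose, attempt, score, and pool the results across the whole fleet.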

SARA-RT

This module, short for Self-Adaptive Robust Attention for Robotics Transformers, essentially tells the software algorithm what to focus on. As you read this paragraph, notice that your own eyes are only able to focus on a few words at a time, even though an entire screenful of words is within your field of vision. By limiting our own focus area to a manageable level, our brains prevent themselves from being overwhelmed by superfluous information.

SARA-RT operates in a similar way, compressing and condensing the full range of data input (from high-resolution cameras, microphones, tactile sensors, and other sources) into a representation compact enough to let Google’s robots operate in the field without expending too much energy or putting undue strain on their onboard processors. Compression also lets robots make quicker decisions, which in turn helps them react more intuitively to sudden changes in their environment.
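The dominant cost inside a robotics transformer is the attention computation itself, so one way to picture this kind of streamlining is to compare standard attention with a cheaper linear-time approximation. The NumPy sketch below illustrates that general flavor; it is an assumption about the mechanism for illustration only, not SARA-RT’s published implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n x n) score matrix grows quadratically with the
    # number of sensor tokens n, which is what strains an onboard processor.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature=lambda x: np.maximum(x, 0) + 1e-6):
    # A linear-time approximation: apply a cheap positive feature map and
    # reassociate the matrix products so no n x n matrix is ever formed.
    Qf, Kf = feature(Q), feature(K)
    kv = Kf.T @ V                        # small (d x d) summary instead of (n x n) scores
    normalizer = Qf @ Kf.sum(axis=0)     # per-query normalization
    return (Qf @ kv) / normalizer[:, None]

n, d = 512, 64                           # e.g. 512 sensor tokens, 64-dim features
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

Both functions return the same shape of output, but the second never builds the large score matrix, which is the kind of saving that matters on a battery-powered robot.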

RT-Trajectory

This module generates a set of visual suggestions for robots that are about to undertake a given task, such as chopping vegetables or wiping down a dining room table. Because each task can be performed in an essentially infinite number of ways, a robot without this feature would spend additional time sorting through a wide range of approaches before settling on a (probably sub-optimal) solution.

RT-Trajectory, informed by direct human training as well as insights gained through experiential data, overlays a suggested movement pattern onto the robot’s own visual field, thereby ‘showing’ it how to accomplish the task. Just as we visualize certain actions in our mind’s eye before performing them, robots can now benefit from the same trick.
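One simple way to picture this “overlay” is as an extra image channel: the suggested path is drawn as a sparse sketch and stacked onto the camera frame the robot already sees. The Python snippet below is a hypothetical illustration of that idea, with made-up image sizes and waypoints, not DeepMind’s actual pipeline.

```python
import numpy as np

def rasterize_trajectory(waypoints, height=224, width=224):
    # Draw the suggested gripper path as a sparse "hint" image the policy can
    # consult alongside the camera frame.
    hint = np.zeros((height, width), dtype=np.float32)
    for x, y in waypoints:                      # waypoints are normalized (x, y) in [0, 1]
        hint[int(y * (height - 1)), int(x * (width - 1))] = 1.0
    return hint

def condition_observation(camera_rgb, waypoints):
    # Stack the trajectory sketch as a fourth channel, so the robot literally
    # "sees" the suggested motion overlaid on its own visual field.
    hint = rasterize_trajectory(waypoints, *camera_rgb.shape[:2])
    return np.concatenate([camera_rgb, hint[..., None]], axis=-1)

camera_rgb = np.zeros((224, 224, 3), dtype=np.float32)          # placeholder camera frame
wipe_table = [(0.2, 0.5), (0.4, 0.5), (0.6, 0.5), (0.8, 0.5)]   # a simple left-to-right wipe
obs = condition_observation(camera_rgb, wipe_table)
print(obs.shape)   # (224, 224, 4): RGB plus the trajectory hint channel
```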

Tools for exponential growth

Each of the above software tools is based, to a large extent, on an analogous feature of the human brain. But other exciting advances belong entirely in the domain of software, without much overlap in the realm of human experience.

Technology leader NVIDIA, for example, recently built a realistic physics simulator and added a large, bouncy yoga ball to it. Then it placed a four-legged robot on top of the yoga ball and gave it the task of keeping its balance without falling over. Extensive trial and error inside the simulator yielded insights about motion, balance, and the collection of sensory data. Eventually, as expected, the robot learned to stay atop the yoga ball.

But there’s a twist in the story. Incredibly, the same algorithm that was trained in the simulator can, without any modification, keep an actual robot balanced on a real-world yoga ball. Because all sensory information is just data anyway, lessons learned in an accurately simulated environment can be transferred directly to the real world. And because digital simulations can run millions of times faster than real-world trials, iterative improvement may soon approach exponential rates.
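The reason this transfer can work is that, from the algorithm’s point of view, the simulator and the real robot present the same interface: arrays of sensor readings in, arrays of motor commands out. The sketch below makes that point with a toy policy and a stub environment; every class, name, and number in it is hypothetical.

```python
import numpy as np

class BalancePolicy:
    # A trained policy is just a function from sensor readings to motor commands.
    # These weights stand in for whatever the simulator training produced.
    def __init__(self, weights: np.ndarray):
        self.weights = weights

    def act(self, observation: np.ndarray) -> np.ndarray:
        return np.tanh(self.weights @ observation)   # joint targets in [-1, 1]

class ToySimulator:
    # Stand-in for the physics simulator; a real robot driver could expose the
    # same reset()/step() interface, which is why the policy can transfer.
    def reset(self) -> np.ndarray:
        self.state = np.zeros(8)
        return self.state

    def step(self, action: np.ndarray):
        self.state = 0.9 * self.state + 0.1 * action
        return self.state, False   # (next observation, done flag)

def run(policy, env, steps=100):
    # The same control loop works whether `env` is simulated or physical,
    # because the two sides only ever exchange arrays of numbers.
    obs = env.reset()
    for _ in range(steps):
        obs, done = env.step(policy.act(obs))
        if done:
            break

policy = BalancePolicy(weights=0.1 * np.random.randn(8, 8))   # pretend these were learned in sim
run(policy, ToySimulator())
```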

With these and other advancements being made in tech companies the world over, a future of more capable and helpful robots may be a lot closer than we think.
