New deep reinforcement learning technique helps AI to evolve

Hundreds of millions of years of evolution have produced a variety of life-forms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensures survival in its environment.

But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after the development process. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest life-forms.

In a new paper published in the scientific journal Nature Communications, AI researchers at Stanford University present a technique that could help overcome some of these limits. Called “deep evolutionary reinforcement learning,” or DERL, the technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve in both their physical structure and their learning capacities. The findings could have important implications for the future of AI and robotics research.

Evolution is hard to simulate

In nature, the body and brain evolve together. Across many generations, every animal species has gone through countless cycles of mutation to grow limbs, organs, and a nervous system to support the functions it needs in its environment. Mosquitos are equipped with thermal vision to spot body heat. Bats have wings to fly and an echolocation apparatus to navigate dark spaces. Sea turtles have flippers to swim with and a magnetic field detector system to travel very long distances. Humans have an upright posture that frees their arms and lets them see the far horizon, hands and nimble fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet.

Interestingly, all these species descended from the first life-form that appeared on Earth several billion years ago. Based on the selection pressures caused by the environment, the descendants of those first living beings evolved in many directions.

Studying the evolution of life and intelligence is interesting, but replicating it is extremely difficult. An AI system that set out to recreate intelligent life the way evolution did would have to search a very large space of possible morphologies, which is extremely expensive computationally. It would need many trial-and-error cycles, run both in parallel and in sequence.

AI researchers use several shortcuts and predesigned features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is to use Lamarckian rather than Darwinian evolution: AI agents pass their learned parameters on to their descendants. Yet another approach is to train different AI subsystems separately (vision, locomotion, language, etc.) and then tack them together in a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved.

Deep evolutionary reinforcement learning

In their new work, the researchers at Stanford aim to bring AI research a step closer to the real evolutionary process while keeping the costs as low as possible. “Our goal is to elucidate some principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control,” they wrote in their paper.

Within the DERL framework, each agent uses deep reinforcement learning to acquire the skills it needs to achieve its goals during its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, the agents inherit only the physical and architectural traits of their parents (along with slight mutations). None of the learned parameters are passed on across generations.
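The difference between Darwinian and Lamarckian inheritance can be sketched in a few lines of Python. This is a toy illustration, not the researchers' implementation: the Agent class, the learn function (a stand-in for a lifetime of deep reinforcement learning), the tournament selection rule, and the fitness formula are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    genotype: list                               # hypothetical morphology encoding
    weights: list = field(default_factory=list)  # controller parameters learned in one lifetime
    fitness: float = 0.0


def learn(agent, rng):
    """Toy stand-in for a lifetime of deep RL: sets weights and a fitness score."""
    agent.weights = [rng.random() for _ in agent.genotype]
    agent.fitness = -sum((g - 1.0) ** 2 for g in agent.genotype)


def darwinian_child(parent, rng, scale=0.1):
    """Copy the parent's morphology with slight mutations.

    The learned weights are deliberately NOT copied, so the child
    has to learn from scratch (a Lamarckian child would inherit them).
    """
    genotype = [g + rng.gauss(0.0, scale) for g in parent.genotype]
    return Agent(genotype=genotype)


def evolve(pop_size=8, births=20, seed=0):
    rng = random.Random(seed)
    population = [
        Agent([rng.uniform(0.0, 2.0) for _ in range(3)]) for _ in range(pop_size)
    ]
    for agent in population:
        learn(agent, rng)
    for _ in range(births):
        # Assumed selection rule: the best of a small random tournament is the parent.
        parent = max(rng.sample(population, 4), key=lambda a: a.fitness)
        child = darwinian_child(parent, rng)
        learn(child, rng)
        # Assumed replacement rule: the child takes the place of the weakest agent.
        worst = min(range(len(population)), key=lambda i: population[i].fitness)
        population[worst] = child
    return population
```

The point of the sketch is the empty weights list on every newborn: whatever a parent learned during its lifetime is lost, and only its body plan survives.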

“DERL opens the door to performing large-scale in silico experiments to yield scientific insights into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and the learnability of control tasks,” the researchers wrote.

Simulating evolution

For their framework, the researchers used MuJoCo, a virtual environment that provides highly accurate rigid-body physics simulation. Their design space is called Universal Animal (Unimal), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks in a variety of terrains.

Each agent in the environment is composed of a genotype that defines its limbs and joints. The direct descendant of each agent inherits the parent’s genotype and goes through mutations that can create new limbs, remove existing limbs, or make small modifications to characteristics, such as the degrees of freedom or the size of limbs.
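The three kinds of mutation described here can be sketched as operations on a tree of limbs. The Limb class, the value ranges, and the rule that only leaf limbs can be removed are assumptions made for illustration; the actual Unimal genotype encoding may differ.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Limb:
    length: float
    dof: int                                      # degrees of freedom of the joint to the parent
    children: list = field(default_factory=list)  # limbs attached to this one


def all_limbs(root):
    """Return every limb in the tree, starting from the root."""
    stack, found = [root], []
    while stack:
        limb = stack.pop()
        found.append(limb)
        stack.extend(limb.children)
    return found


def mutate(root, rng):
    """Return a mutated copy of the genotype; the parent is left untouched."""
    child = copy.deepcopy(root)
    limbs = all_limbs(child)
    op = rng.choice(["grow", "delete", "tweak"])
    if op == "grow":
        # Attach a new limb to a randomly chosen existing limb.
        new_limb = Limb(length=rng.uniform(0.2, 0.4), dof=rng.choice([1, 2]))
        rng.choice(limbs).children.append(new_limb)
    elif op == "delete":
        # Remove one leaf limb, if any limb has a leaf attached.
        parents = [l for l in limbs if any(not c.children for c in l.children)]
        if parents:
            parent = rng.choice(parents)
            victim = rng.choice([c for c in parent.children if not c.children])
            parent.children = [c for c in parent.children if c is not victim]
    else:
        # Small modification to an existing limb's size and joint.
        limb = rng.choice(limbs)
        limb.length = max(0.1, limb.length * rng.uniform(0.9, 1.1))
        limb.dof = rng.choice([1, 2])
    return child
```

Because mutate works on a deep copy, a parent and its descendant can coexist in the population with different bodies.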

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structures are better suited for traversing terrain learn faster to use their limbs for moving around.
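As a rough illustration of a distance-based reward, the sketch below pays the agent for its forward progress at each step. It assumes the agent's position along the direction of travel is recorded at every step; the function names are hypothetical, and the reward used in the paper may include other terms.

```python
def step_rewards(positions):
    """Reward at each step is the forward progress made since the previous step."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def episode_return(positions):
    """Total reward for an episode, equal to the net distance traveled."""
    return sum(step_rewards(positions))
```

Moving backward yields a negative step reward, so the episode return reflects net progress rather than total motion.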

To test the system’s results, the researchers generated agents in three types of environments: flat terrain (FT), variable terrain (VT), and manipulation in variable terrain (MVT). The flat terrain puts the least selection pressure on the agents’ morphology. The variable terrain, on the other hand, forces the agents to develop a more versatile physical structure that can climb slopes and move around obstacles. The MVT variant has the added challenge of requiring the agents to manipulate objects to achieve their goals.

The benefits of DERL

Figure: Deep evolutionary reinforcement learning generates a variety of successful morphologies across different environments. (Image credit: TechTalks)

One of the interesting findings of DERL is the diversity of its results. Other approaches to evolutionary AI tend to converge on one solution because new agents directly inherit both the physique and the learned parameters of their parents. In DERL, only morphological data is passed on to descendants, and the system ends up creating a diverse set of successful morphologies, including bipeds, tripeds, and quadrupeds with and without arms.

At the same time, the system shows traits of the Baldwin effect, which suggests that agents that learn faster are more likely to reproduce and pass on their genes to the next generation. DERL shows that evolution “selects for faster learners without any direct selection pressure for doing so,” according to the Stanford paper.

“Intriguingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity,” the researchers wrote.
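One simple way to look for this kind of effect is to measure how many training iterations an agent needs to reach a fixed reward level and compare that number across generations. The sketch below is only an illustration: the function name and the threshold are assumptions, and the reward curves are made-up toy inputs, not results from the paper.

```python
def steps_to_threshold(curve, threshold):
    """Return the first training iteration whose reward reaches the threshold, or None."""
    for step, reward in enumerate(curve):
        if reward >= threshold:
            return step
    return None


# Toy learning curves (reward per training iteration), invented for illustration only.
curves_by_generation = {
    0: [0, 1, 2, 3, 4, 5],
    1: [0, 2, 4, 5, 5, 5],
    2: [1, 4, 5, 5, 5, 5],
}

# Fewer iterations to reach the same reward means a faster learner.
speeds = {
    generation: steps_to_threshold(curve, threshold=5)
    for generation, curve in curves_by_generation.items()
}
```

If selection acts only on the final reward, yet this number falls from one generation to the next, later generations are learning faster without having been selected for speed directly.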

Finally, the DERL framework also supports the hypothesis that more complex environments give rise to more intelligent agents. The researchers tested the evolved agents across eight different tasks, including patrolling, escaping, manipulating objects, and exploration. Their findings show that, in general, agents that evolved in variable terrains learn faster and perform better than agents that have only experienced flat terrain.

Their findings seem to be in line with another hypothesis by DeepMind researchers that a complex environment, a suitable reward structure, and reinforcement learning can eventually lead to the emergence of all kinds of intelligent behaviors.

AI and robotics research

The DERL environment only has a fraction of the complexities of the real world. “Although DERL enables us to take a significant step forward in scaling the complexity of evolutionary environments, an important line of future work will involve designing more open-ended, physically realistic, and multiagent evolutionary environments,” the researchers wrote.

In the future, the researchers plan to expand the range of evaluation tasks to better assess how the agents can enhance their ability to learn human-relevant behaviors.

The work could have important implications for the future of AI and robotics and push researchers to use exploration methods that are much more similar to natural evolution.

“We hope our work encourages further large-scale explorations of learning and evolution in other contexts to yield new scientific insights into the emergence of rapidly learnable intelligent behaviors, as well as new engineering advances in our ability to instantiate them in machines,” the researchers wrote.