DeepMind is perhaps one of the most advanced and renowned artificial intelligence labs in the world. It is behind some of the best-known AI systems, such as AlphaZero and AlphaGo.
Recently, however, the company made news when it submitted a paper to the peer-reviewed journal Artificial Intelligence. In it, the researchers propose that reinforcement learning will be enough for machines to attain the kind of general intelligence seen in humans and animals, commonly known as artificial general intelligence (AGI).
Titled “Reward is Enough,” the paper argues ‘that intelligence and its associated abilities will emerge not from formulating and solving complicated problems but by sticking to a simple but powerful principle: reward maximization.’
This approach tries to mimic the evolution of natural intelligence in animals, where billions of years of trial and error in the natural world were enough to develop the kinds of abilities and behaviours that we today associate with general intelligence.
In fact, many of the artificial intelligence systems in existence today are the result of researchers trying to replicate elements of animal intelligence in machines. Our understanding of the mammalian visual system, for instance, has enabled developers to create systems that can recognize and categorize a wide range of images.
Similarly, our understanding of language has powered the creation of natural language processing systems that answer questions, translate between languages and even generate text.
These are, however, rather narrow applications of intelligence. Functioning in the real world is far more complex and requires the combined application of such systems, and much more, all at once. Some researchers argue that combining all these systems into one ‘super-system’ that can do all of these things is the best path to achieving AGI.
DeepMind, however, is taking a different approach to achieving general intelligence.
The researchers write:
“[We] consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence.”
This assertion is backed by the mechanisms of the natural world. Many scientists agree that there are no signs of intelligent design in complex organisms. Rather, it is through natural selection and mutation that lifeforms have been able to adapt to and survive the earth’s harsh environments.
Through these simple rules, living beings have developed complex structures and even intelligence, which have been key to their survival through the years.
“The natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments,” they explain. “Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximization contains within it many or possibly even all the goals of intelligence.”
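The reward-maximization principle they describe is the core loop of reinforcement learning. As a rough, hypothetical illustration (this toy corridor environment and all of its parameters are invented for this article, not taken from DeepMind's work), a simple tabular Q-learning agent given nothing but a reward signal will discover the "ability" its environment demands of it, in this case, walking toward a goal:

```python
import random

# Toy illustration of reward maximization (hypothetical example, not DeepMind's code):
# an agent on a 5-cell corridor learns, purely from a reward signal,
# to walk right toward the goal cell.

N_STATES = 5          # cells 0..4; reward of 1.0 is given only for reaching cell 4
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action_index]
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy: mostly exploit the current estimates, sometimes explore
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[s][i])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)  # clamp to the corridor
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            # Q-learning update: nudge the estimate toward reward + discounted future value
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
# After training, the greedy policy in every non-goal state is to step right.
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

No rule saying "go right" was ever written down; the behaviour emerges solely from maximizing reward. The paper's claim is that in environments rich enough, this same pressure produces far more sophisticated abilities.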
While this reasoning may be sound, some computer scientists have been quick to criticize this approach.
Samim Winiger, an AI researcher based in Berlin, viewed the paper as a “somewhat fringe philosophical position, misleadingly presented as hard science.”
“In somewhat typical DeepMind fashion, they chose to make bold statements that grab attention at all costs, over a more nuanced approach,” said Winiger in an interview with CNBC. “This is more akin to politics than science.”
“The only thing that has fundamentally changed since the 1950/60s, is that science-fiction is now a valid tool for giant corporations to confuse and mislead the public, journalists and shareholders,” he added.
Another researcher, Stephen Merity, also speaking to CNBC, highlighted the impracticality of achieving general intelligence through this approach. “A stack of dynamite is likely enough to get one to the moon, but it’s not really practical,” he said.
DeepMind defended its position, stating that reinforcement learning has been behind some of its most successful creations. Indeed, the company used this approach in the development of AlphaGo and AlphaZero.
The company also stated, however, that it commits only a small portion of its team to reinforcement learning, and that it is pursuing other fields of AI such as “population-based training” and “symbolic AI.”
No matter the criticism, it will be interesting to see what DeepMind can achieve through reinforcement learning. As is often the case in the history of science and technology, the best ideas can sound crazy at first.