The modern AI revolution began during an obscure research contest. It was 2012, the third year of the annual ImageNet competition, which challenged teams to build computer vision systems that would recognize 1,000 objects, from animals to landscapes to people.
In the first two years, the best teams had failed to reach even 75% accuracy. But in the third, a band of three researchers—a professor and his students—suddenly blew past this ceiling. They won the competition by a staggering 10.8 percentage points. That professor was Geoffrey Hinton, and the technique they used was called deep learning.
Hinton had actually been working with deep learning since the 1980s, but its effectiveness had been limited by a lack of data and computational power. His steadfast belief in the technique ultimately paid massive dividends. By the fourth year of the ImageNet competition, nearly every team was using deep learning and achieving miraculous accuracy gains. Soon enough, deep learning was being applied to tasks beyond image recognition, and within a broad range of industries as well.
Last year, for his foundational contributions to the field, Hinton was awarded the Turing Award, alongside other AI pioneers Yann LeCun and Yoshua Bengio. On October 20, I spoke with him at MIT Technology Review’s annual EmTech MIT conference about the state of the field and where he thinks it should be headed next.
The following has been edited and condensed for clarity.
I do believe deep learning is going to be able to do everything, but I do think there’s going to have to be quite a few conceptual breakthroughs. For example, in 2017 Ashish Vaswani et al. introduced transformers, which derive really good vectors representing word meanings. It was a conceptual breakthrough. It’s now used in almost all the very best natural-language processing. We’re going to need a bunch more breakthroughs like that.
Yes. Particularly breakthroughs to do with how you get big vectors of neural activity to implement things like reason. But we also need a massive increase in scale. The human brain has about 100 trillion parameters, or synapses. What we now call a really big model, like GPT-3, has 175 billion. It’s a thousand times smaller than the brain. GPT-3 can now generate pretty plausible-looking text, and it’s still tiny compared to the brain.
Both. There’s a sort of discrepancy between what happens in computer science and what happens with people. People have a huge amount of parameters compared with the amount of data they’re getting. Neural nets are surprisingly good at dealing with a rather small amount of data, with a huge number of parameters, but people are even better.
I agree that that’s one of the very important things. I also think motor control is very important, and deep neural nets are now getting good at that. In particular, some recent work at Google has shown that you can do fine motor control and combine that with language, so that you can open a drawer and take out a block, and the system can tell you in natural language what it’s doing.
For things like GPT-3, which generates this wonderful text, it’s clear it must understand a lot to generate that text, but it’s not quite clear how much it understands. But if something opens the drawer and takes out a block and says, “I just opened a drawer and took out a block,” it’s hard to say it doesn’t understand what it’s doing.
A long time ago in cognitive science, there was a debate between two schools of thought. One was led by Stephen Kosslyn, and he believed that when you manipulate visual images in your mind, what you have is an array of pixels and you’re moving them around. The other school of thought was more in line with conventional AI. It said, “No, no, that’s nonsense. It’s hierarchical, structural descriptions. You have a symbolic structure in your mind, and that’s what you’re manipulating.”
I think they were both making the same mistake. Kosslyn thought we manipulated pixels because external images are made of pixels, and that’s a representation we understand. The symbol people thought we manipulated symbols because we also represent things in symbols, and that’s a representation we understand. I think that’s equally wrong. What’s inside the brain is these big vectors of neural activity.
Absolutely. I have good friends like Hector Levesque, who really believes in the symbolic approach and has done great work in that. I disagree with him, but the symbolic approach is a perfectly reasonable thing to try. But my guess is in the end, we’ll realize that symbols just exist out there in the external world, and we do internal operations on big vectors.
Well, my problem is I have these contrarian views and then five years later, they’re mainstream. Most of my contrarian views from the 1980s are now kind of broadly accepted. It’s quite hard now to find people who disagree with them. So yeah, I’ve been sort of undermined in my contrarian views.