The primary factor behind this victory was deep learning. Disruptive innovations in this technology keep arriving one after another, enabling things that had been impossible for decades.
So what exactly can deep learning do? Simply put, it enables three key capabilities: "recognition," "motor skill acquisition," and "language comprehension." Let me explain each in turn.
Artificial intelligence distinguishing between dogs and wolves
Matsuo: First, "recognition" refers to image recognition. Humans can instantly tell apart photos of cats, dogs, and wolves, but this kind of classification was extremely difficult for artificial intelligence. Computers judged based on hand-written rules such as: cats have round eyes; dogs have narrow eyes and drooping ears; wolves have narrow eyes and pointed ears. As a result, earlier AI systems would mistakenly classify photos of Siberian Huskies as wolves.
Humans, however, see a Siberian Husky and think "It looks wolf-like," yet still judge it to be a dog. Ask them to define that "dog-like quality" in words, though, and they struggle to provide a clear answer. These subtle human judgment criteria are called "features." As long as humans were defining these features, image recognition accuracy remained low.
Until recently, all artificial intelligence involved humans modeling the real world, with machines then performing the calculations automatically. Now, however, AI has begun to identify and abstract the crucial elements of the real world by itself. The technology that sparked this shift is "deep learning."
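To make the contrast concrete, here is a minimal, purely illustrative sketch in PyTorch (not something from the interview): a small convolutional network that learns its own features from raw pixels during training, instead of relying on hand-coded rules about eye shape or ear shape. The class name, layer sizes, and image size are all assumptions for the example.

```python
# Illustrative sketch only: a tiny CNN whose features are learned, not hand-coded.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes: int = 3):   # e.g. cat / dog / wolf
        super().__init__()
        # These convolutional filters start random; training shapes them into
        # edge- and texture-like detectors. No human writes "pointed ears" here.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)                  # learned features
        return self.classifier(h.flatten(1))  # class scores

model = TinyClassifier()
logits = model(torch.randn(1, 3, 64, 64))     # one fake 64x64 RGB image
print(logits.shape)                           # torch.Size([1, 3])
```

The point is that nothing in this code encodes what a dog looks like; the distinguishing features emerge from the training data.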
As a result, image recognition accuracy improved dramatically. In 2012, an AI appeared with an error rate below 16%. The error rate kept falling: 11.7% in 2013, 6.7% in 2014, and in 2015 Microsoft reached 4.9% and Google 4.3%. Since the human error rate on the same task is 5.1%, 2015 marked the first time computers surpassed human accuracy at image recognition.
In the real world, humans perform countless tasks that rely on image recognition. This result means that, in principle, all of them could be automated.

Robots can now learn on their own
Matsuo: The next development is "motor skill acquisition." Robots can now practice and improve on their own. Reinforcement learning itself has existed for a long time: an agent tries actions in a given situation and learns from reward signals labeling the outcomes "good" or "bad." Until now, however, humans had to define the features used to describe those situations. With deep learning, artificial intelligence can now extract those features automatically.
In 2013, experimental footage was released showing DeepMind's AI learning to play the Atari game Breakout. Working only from the screen image, the AI learned how to move the paddle to return the ball and score points more easily. It started out poorly but gradually improved, eventually beginning to aim at the left edge of the brick wall: it had discovered that tunneling through one side sends the ball behind the bricks, where the highest scores could be racked up.
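The learning rule behind such an agent can be sketched in a few lines of Python. This is a deliberately simplified, tabular version of Q-learning (DeepMind's Atari agent used a deep network over raw screen pixels); the state names and three-action set here are invented for illustration.

```python
# Simplified tabular Q-learning sketch; states and actions are hypothetical.
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1    # learning rate, discount, exploration
Q = defaultdict(lambda: [0.0, 0.0, 0.0])  # state -> value of [left, stay, right]

def choose_action(state):
    if random.random() < epsilon:         # occasionally try something new
        return random.randrange(3)
    return max(range(3), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # Nudge Q(s, a) toward the reward plus the discounted best future value:
    # actions that lead to points ("good") gain value, others do not.
    best_next = max(Q[next_state])
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

update("ball_left", 0, 1.0, "ball_center")   # a toy transition with a reward
print(Q["ball_left"])                        # [0.1, 0.0, 0.0]
```

Deep learning's contribution is replacing the hand-made state description with features extracted automatically from the screen.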
Computers have long excelled at tasks like medical diagnosis and proving mathematical theorems, yet struggled with things a three-year-old child can do, such as recognizing images or stacking blocks. This is known as "Moravec's paradox," and it persisted for decades. Now, it is beginning to change.
AI that understands language
Matsuo: Artificial intelligence is gradually gaining the ability to "understand language." For example, technologies are emerging that generate a text description from an input image, or that render text as an image.
This technology can be applied to translation. Earlier translation relied on statistical language processing, which did not understand meaning. Translation mediated by images, by contrast, becomes translation that understands meaning. Just recently, Google Translate switched to a deep-learning-based version, and its accuracy improved markedly. Put a research paper through it now and you can mostly grasp the meaning.
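As a rough illustration of the underlying idea, here is a minimal encoder-decoder sketch in PyTorch, the architecture family behind neural machine translation: the source sentence is compressed into a vector representing its content, and the target sentence is generated from that vector. The vocabulary size, dimensions, and class names are assumptions; this is not Google's actual system.

```python
# Minimal encoder-decoder sketch for translation; all sizes are illustrative.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)

    def forward(self, src):
        _, h = self.rnn(self.embed(src))
        return h                        # a vector standing in for the "meaning"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tgt, h):
        y, _ = self.rnn(self.embed(tgt), h)
        return self.out(y)              # scores over target-language words

src = torch.randint(0, VOCAB, (1, 7))   # a fake 7-token source sentence
tgt = torch.randint(0, VOCAB, (1, 5))   # a fake 5-token target prefix
logits = Decoder()(tgt, Encoder()(src))
print(logits.shape)                     # torch.Size([1, 5, 1000])
```

The same encoder slot can be fed by an image network instead of a sentence, which is what makes image-to-text generation and meaning-mediated translation two faces of one mechanism.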
This is possible because artificial intelligence has gained an "eye" through image recognition. With that visual capability, tasks humans have traditionally performed with their eyes, in fields such as agriculture, construction, and food processing, can now be handled by robots and machines.
For example, when harvesting tomatoes, robots can now distinguish "good tomatoes" from "bad tomatoes." Using robots cuts costs significantly and also enables disease detection. With further evolution, almost fully mechanized tomato farms could be exported overseas as complete packages.
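As a hedged sketch of what such sorting might look like in code, assuming a classifier like the one above has already been trained: the model, labels, and camera input below are hypothetical stand-ins, not a description of any real harvesting system.

```python
# Hypothetical inference step for a tomato-sorting line.
import torch

LABELS = ["good", "bad"]

def sort_tomato(model: torch.nn.Module, image: torch.Tensor) -> str:
    # image: a (3, H, W) tensor from the line camera (assumed preprocessing).
    with torch.no_grad():                     # inference only, no training
        logits = model(image.unsqueeze(0))    # add a batch dimension
    return LABELS[int(logits.argmax(dim=1))]
```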
If machines with vision are introduced into every conceivable industry, and then turned into services and platforms for overseas expansion, major new industrial sectors should emerge.
However, deep learning technology itself will become commoditized going forward. When that happens, the ultimate competitive advantage will lie in "data and hardware." Hardware is a strong suit for Japan, an area where Western companies struggle to catch up. By leveraging deep learning technology on a manufacturing foundation and advancing the global expansion of platforms, Japan can establish a dominant position.
At the same time, utilizing artificial intelligence calls for broader societal discussion. We must consider how to approach the "trolley problem" in autonomous driving (for instance, whether to sacrifice one person to avoid endangering many) and how the international community should treat military applications. Discussions on intellectual property and rights are also essential.
We humans must engage in a society-wide discussion about what purpose we want to give artificial intelligence and what kind of society we want to build.
Namikawa: Thank you. Later, I'd also like to ask Professor Matsuo in detail about how advertising agencies should utilize artificial intelligence.
*Continued in Part 2
Planning & Production: Dentsu Live Inc. Creative Unit Creative Room 2, Aki Kanahara