image
Realistic photograph of a huge meteorite falling in the sky over Tokyo, impressive quality. Generated by DALL·E 2

The summer of 2022 may become a major turning point in AI history.

Hello. I'm Kodama, head of "AI MIRAI," which promotes AI utilization within the Dentsu Group. Today, I'd like to explore image-generating AI, which is currently creating significant buzz.

Starting with the July beta release of DALL·E 2 (※1), followed by Midjourney (※2) and Stable Diffusion (※3) in August, image generation AI services have been released in quick succession. Since many are free or available for a small fee, making them accessible to anyone, this has become major news.

※1 DALL·E 2 = An AI model developed by OpenAI that generates images from text.
※2 Midjourney = An AI model developed by Midjourney Inc. (US) that generates images from text. Operates on Discord.
※3 Stable Diffusion = An AI model developed by UK-based Stability AI that generates images from text. It is open source.


Furthermore, because Stable Diffusion is open-source, customized versions have appeared since its initial release, and its integration into other services is being explored, showing even greater potential for expansion. As of late August when this was written, information about new services is coming in daily, creating a festival-like atmosphere.

Using these image-generating AIs, even those who previously struggled with illustration can now easily and freely create art or generate photo materials that once required advanced compositing techniques. Will this take away work from illustrators and designers? And how will it change work in our advertising and creative industries?

Here, we'll explore not just the technology visible today, but also the changes likely to arrive in the near future.

Generate high-precision images with just text input

Before looking to the future, let's first confirm what's possible today.

The three services mentioned earlier are all tools that "generate images from text." For example, inputting text like "A giant UFO visiting the skies above Tokyo" (a "prompt," colloquially called a "spell") generates an image like this in under a minute. Currently, most services support only English, but AI makes translation into English instantaneous too.

image
High-resolution illustration of a giant UFO flying over Tokyo. Generated by Stable Diffusion

While each AI has its strengths and weaknesses, aside from some unnatural details, the generated images are highly sophisticated and, at first glance, rival the work of professional illustrators. Currently, illustrations seem to be their specialty, but photo generation is also possible. For example, a "black-and-white photo of a robot drinking a martini at a bar" is generated as follows.

image
A monochrome photo of a robot drinking a martini at a bar. Generated by DALL·E 2

Because the AI learns from images and language together, careful prompt crafting can yield images closer to your vision, or at higher quality. For instance, adding "4K" at the end of a prompt, or including a magazine's name, can generate images that look 4K quality or fit for publication in that magazine.
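Mechanically, this kind of prompt crafting is just string composition: a subject description with style or quality keywords appended. As a minimal sketch (the `build_prompt` helper and its modifier list are illustrative inventions, not part of any service's API; the services simply take the final string as input):

```python
def build_prompt(subject, modifiers=()):
    """Compose an image-generation prompt from a subject and style modifiers.

    Hypothetical helper for illustration only: services like Stable Diffusion
    or DALL·E 2 just receive the resulting comma-separated string.
    """
    return ", ".join([subject, *modifiers])

# Appending quality or style keywords steers the generated image.
prompt = build_prompt(
    "a giant UFO flying over Tokyo",
    ["high-resolution illustration", "4k"],
)
print(prompt)
# a giant UFO flying over Tokyo, high-resolution illustration, 4k
```

The same subject with different modifiers ("oil painting," "black-and-white photo," a magazine name) yields very different results, which is why prompt crafting is half the skill.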

The key points are "foundation models" and "open access"

However, we believe the impact of these AI tools extends far beyond "easily generating illustrations." When considering their future implications, two key points stand out: these AI tools are "foundation models," and they are part of a broader trend toward "open access."

First, regarding "foundation models." While technical details are omitted, these are essentially AI systems trained on vast amounts of data to handle diverse tasks rather than specific applications.

Traditional AI systems were often trained on data tailored to a single purpose (task) and then output results accordingly. For example, "training it on English-Japanese paired data enables machine translation." However, recent foundation models are increasingly versatile: by learning vast amounts of text data, a single model can handle anything text-related, from question answering to translation to programming.

Furthermore, in recent years, these models have expanded beyond text data. By learning from vast amounts of images and audio as well, they can now handle tasks that span different data formats. This image-generating AI is precisely one such capability of these foundation models.

image
This means that in the future, it will be possible to generate not only images but various types of data. For example, text input can generate audio or music, and models are gradually becoming capable of handling 3D models and videos as well.

Another trend is "open access."

While efforts to develop these foundation models have been underway for years, what is truly groundbreaking this time is that they are being made available, along with their training data, at low cost or even for free.

Building foundation models requires massive amounts of data and computational resources, with costs often running into the billions of yen. Until now, development was largely monopolized by major research institutions like OpenAI and Google Research, with commercial use restricted.

However, their release this summer has sparked the emergence of unprecedented services. As mentioned earlier, the capabilities of these foundation models extend far beyond image generation. In the future, a wide range of functionalities will become accessible to anyone.

It's a Cambrian explosion of diverse AI emerging simultaneously.

image
High-resolution illustration of the Cambrian sea. All swimming creatures are robots. Generated by DALL·E 2

Services likely to emerge and points to consider

Considering these trends, numerous services are likely to emerge over the next few months to years. For example, services like these might appear:

  • As they gain the ability to handle audio, it will become possible to "generate sound effects and music from text input" or "generate optimal background music from photos or videos."
  • Since videos and 3D models are extensions of still images, it will also become possible to generate objects and avatars for the metaverse, as well as short videos.
  • Beyond creating new illustrations, it will be possible to extend existing illustrations or photos, and blend backgrounds with objects (compositing).
  • Text generation will also become more natural than ever before, enabling the system to read images and provide appropriate comments, or compile insights from various data sources into reports and articles using natural language.

In fact, the technology for most of the above has already been developed and will be implemented sooner or later.

Considering the trend toward openness, these features will likely be integrated into services we already use, rather than offered as standalone services. I feel the day is near when diverse generative capabilities will be built into the office software and creative apps we use daily.

On the other hand, the development and use of these tools will require legal and ethical constraints.

For example, copyright. Under current practice, copyright is granted when the person operating the AI demonstrates creativity in the output. However, as technology advances, this concept will likely be debated and could change significantly. Precisely because this is an area where legal frameworks struggle to keep pace, system-level mechanisms to prevent misuse such as plagiarism or theft of ideas become crucial.

Additionally, it is necessary to consider biases in training data and outputs in advance, ensuring consideration for DE&I (Diversity, Equity, and Inclusion), and exercising caution when dealing with celebrities or characters.

image
Portrait of a firefighter from the front. Generated by DALL·E 2
Current AI systems have not eliminated biases related to DE&I. When prompted with "firefighter," images of women appear, but they are all white.

Understanding legal frameworks, recognizing biases, and other ethical challenges will also be required of us living in the AI era.

How will creativity change?

Now, we finally reach the main topic. How will these image-generating AIs (or rather, foundation models) change creative work in the advertising industry?

One trend will be the widespread adoption of workflows that leverage AI to streamline production and creation. In the world of translation, services are emerging that deliver fast and accurate translations not by relying solely on humans or AI, but by having AI perform the initial translation followed by human refinement of the details. Similarly, it seems likely that AI will handle the rough drafts, with human designers taking over for the finishing touches. In fact, within the Dentsu Group, we've already begun utilizing these tools for proposal materials and concept visuals, and we're also advancing R&D on ad generation using foundation models.

Creators will also need to understand technology. They must keep their antennae tuned to the fast-moving tech industry, discern what's possible with current technology and what will be possible just ahead, and design new customer experiences or evolve the tools they use. Technical skill and creativity will become more inseparable than ever before.

Finally, remember that all these AIs are merely tools. Not just in advertising: human creators such as illustrators, writers, and photographers will need deeper insight into what is essential. As AI-generated content becomes commonplace, people may stop being moved by merely beautiful illustrations or videos. The human mission will be to keep exploring new horizons of expression, discovering what kinds of expression, in what contexts, can truly move people.

image
High-definition illustration of a huge meteorite falling in the sky over Tokyo, beautiful sky, impressive quality. Generated by Stable Diffusion

I believe the proliferation of generative AI is akin to the invention of the camera. When cameras emerged in the 1800s and gradually became widespread, landscape and portrait painters who relied solely on realism lost their livelihoods. In fact, Jean-Auguste-Dominique Ingres, a leading figure at the French Academy of Fine Arts at the time, is said to have lobbied the government to ban photography.

On the other hand, this sense of crisis is said to have pushed painters away from realism, spurring the birth of Impressionism and driving the development of modern painting that led to abstract art. It also paved the way for a new artistic genre, "art photography," which made use of the new technology. The range of artistic expression expanded greatly, triggered by the arrival of the camera, a technological "black ship" forcing change from outside.

Considering this, the development of AI, while harboring diverse challenges, should ultimately become a catalyst that greatly expands the scope for creators to thrive. We intend to continue assessing technological trends, sometimes leading the way, to present new value and experiences to society.

image
"Impression: Sunrise" painted in 2050 AD, a sea of metal, artificial sun Generated by Stable Diffusion
 


Author

Kodama Takuya

Dentsu Group Inc. / dentsu Japan

After working as a client-facing producer for digital platform companies, he has been promoting the use of AI both within and outside the company since 2018. He is currently affiliated with Dentsu Group Inc., where he is involved in the AI and technology strategy for the entire Dentsu Group, encompassing not only Japan but also overseas operations.
