Published Date: 2015/09/03

Designing the Digital Shapes the World—The Creator of Hatsune Miku Speaks on the "Future" ~ Mr. Wataru Sasaki, Crypton Future Media

Wataru Sasaki

Crypton Future Media

#Digital/Technology #Social issue #Digital #Social #Monthly Dentsu Inc. News

Hatsune Miku originally began as one of many desktop music (DTM) software products that generate songs using artificial voices. However, she has now far outgrown that framework, demonstrating cultural and social reach well beyond music software. With successful concerts held overseas one after another, she has firmly established herself as a major movement symbolizing Japanese internet culture. This time, we spoke at length with Wataru Sasaki of Crypton Future Media, who was in charge of developing Hatsune Miku and is called her "creator," about the future of the internet and digital technology as seen through Hatsune Miku.
(Interviewer: Yuzo Ono, Planning Promotion Department Manager, Dentsu Digital Inc. Business Bureau)

Hatsune Miku sings the raw, relatable truths of the younger generation, including their sense of stagnation toward the "future"

──The name "Hatsune Miku" means "the first sound from the future." Does that name carry a sense of hope for the future?

Sasaki: Well, I've always loved futuristic audio equipment and technology. Since middle school, I've been fascinated by sound technology like synthesizers, records, and tape editing. By high school, I was already buying Crypton's products. Audio technology itself was inherently futuristic, especially in the 20th century. It was often linked to military technology; theremins and synthesizers were used in sci-fi movie soundtracks, and sound mixers were depicted as spaceship interiors. Sound technology was intrinsically tied to images of the future and science fiction.

Furthermore, by the early 1990s, consumer-grade hard disk recorders for full digital sound editing were already available. This led to the rise of personal production styles like home recording, DTM (desktop music), and bedroom techno. Through dance music and other genres, this culture of individual production spread worldwide. Amidst this, the one thing that remained unrealized as a "possibility of the future" until the very end was a "synthesizer that could freely manipulate the human voice" like VOCALOID. I felt this was a truly futuristic endeavor.

The VOCALOID technology itself was developed by Yamaha and Spain's Pompeu Fabra University. Several software products existed before Hatsune Miku, but they were intended as "demo vocals" to let singers hear the song during the music production process. Hatsune Miku, however, was conceived not merely as a demo vocal tool but as a virtual singer, an android-like idol singer designed to have a futuristic presence in her own right.

──Besides the name Hatsune Miku, were there any other naming proposals?

Sasaki: We didn't brainstorm many options and decide in a meeting; we progressed while editing the voice samples. I was practically the only full-time person assigned to it... well, actually, I also had marketing duties for other products, so everyone was working on it part-time. At Crypton back then, it was the smallest, most after-school club-like project. Because of that, we were juggling various tasks simultaneously. A major factor in deciding the image strategy, including the name Hatsune Miku, was the design. The motif used was Yamaha's legendary synthesizer, the DX7. Had we not been able to use that, her outfit might have been something like a sailor uniform, lacking any grounded sense of the future, and the name could have been different too. The hair color wasn't green initially either. But thanks to the efforts of the Yamaha representative, we secured permission to use the DX7 as the design motif, which is why we used that color throughout.

──Among fans, there seem to be quite a few who say they like Hatsune Miku because they sense the future in her. How do you personally feel about that?

Sasaki: Futuristic things are great because you can freely imagine them within your own fantasies and daydreams. So, I was happy that people perceived her that way. When I first created Hatsune Miku, I didn't have a clear image of the user, let alone the listener. I envisioned reaching people similar to myself—synthesizer enthusiasts, or those drawn to new technology even if they didn't fully understand it yet, people with a forward-thinking mindset.

Initially, it resonated directly with fans of sci-fi, androids, robot tech, cute girl anime... people who loved things like the Macross songstress, and it really took off. Then, I remember it gradually spreading to younger audiences. It felt like it evolved from appealing to middle-aged men who loved the future to resonating with young people who have to live in that future. Creators express everything through Hatsune Miku's voice: songs portraying Miku herself, bright J-POP-style pop and rock, and even songs reflecting their own feelings of stagnation or detachment towards the future and romantic relationships – songs that are almost fleeting.

While the image used to be dominated by pop songs, lately there's been an increase in tracks featuring her ethereal, floating voice. Many of these are somewhat mysterious pop-rock songs, like transparent vocals swaying in the air. This range, encompassing both serious and pop-oriented works, feels like a coexistence and harmony of diverse artistic inclinations. It suggests a culture where various expressions are accepted.

──I see. So when you say "future," it's not about hope for the future, but rather a sense of stagnation or resignation toward it.

Sasaki: When we say "future," it might sound dramatic, but it's more like a "sense of stagnation about the world we'll inhabit." Personally, I'm pretty anxious too—does everyone else feel different? Commercial pop songs, driven by tie-ins like commercials and dramas, are constantly "played" to encourage romance and consumption, or to stir up emotions inherent in people's lives. But compared to VOCALOID songs, I get the impression there are fewer that openly express "real, raw feelings" like "anxiety about the future." Many famous VOCALOID songs push the boundaries of what's broadcastable, right? In real life, I think that chain of honesty—where even negative emotions like "anxiety about the future" are expressed directly—is both the origin and the destination.

The internet changed the traditional creative format that aimed to convey overwhelming things

──Hatsune Miku is often cited as a prime example of secondary creation. What are your thoughts on that?

Sasaki: I think things that get fan-created have factors that make them easy to create. Hatsune Miku is arguably a very rare case. If you think about typical anime works, some story is contained within the constraints of 12 episodes of 30-minute anime. But for characters who are presumably alive within the anime, there are also parts of their lives and stories outside the main story. Fans naturally think, "Why not create and enjoy the ideal scenes, alternate worlds, or stylized versions we want to see?" This sparks imagination, which ultimately leads to fan creations. But Hatsune Miku has almost no original character design. The only concrete elements are her voice and the image of a twin-tailed girl.

This means stories and worlds created by users are essentially secondary creations that are extremely close to primary creations. I believe that seeing the primary creation, Hatsune Miku, emerge as a "singer whose existence was validated by majority vote" through secondary creations changed the stereotypical concept of fan works and reaffirmed its strength. In terms of increasing creative freedom online, I think it was a step forward. To put it bluntly, recent copyright extension measures and other "conveniences for protecting adult rights" are like cold water on the drive for online creation. Personally, I feel that going forward, a system where corporate rights assertions don't stand out negatively will likely be preferred.

──Speaking of derivative works, there's an interesting anecdote about Hachune Miku, the leek-wielding character who appeared online just five days after Hatsune Miku's release. I heard Mr. Sasaki thought, "Well, whatever happens, happens."

Sasaki: Back then, everyone just seemed so happy. It felt like Hatsune Miku was being carried on a portable shrine, and that excitement was spreading everywhere. The sense of using the digital space of Nico Nico Douga as a festival ground was incredible. I also felt that the fundamental, compelling essence of the music itself was connecting with the videos. I could talk about this forever.

What was wonderful about the creative process on Nico Nico Douga was that it wasn't commonplace; it was a space where you could freely experiment with all sorts of ideas that came to mind. While everyone was facing the same direction, each person presented unique ideas, which then chained together. Simultaneously, variations with different interpretations emerged, and they were applied through technological means. This chain of creation and surprise was what made it so good.

──It's often pointed out that there are few real-world examples of collective intelligence producing masterpieces in the artistic creation world. But I actually feel that might just be because the format is different. Hatsune Miku seems fundamentally created in a format entirely distinct from, say, film.

Sasaki: Initially, Hatsune Miku's content often had a humorous tone and fostered an extremely accessible atmosphere. There was a genuinely positive mood or vibe permeating it. This wasn't about collective intelligence; it was more like everyone bringing their own creations to share in a lively, communal spirit, enjoying each other's work as if watching fireworks.

If there's a type of collective intelligence that's hard to achieve online, it's probably when companies or individuals aim to create something overwhelmingly valuable with a specific purpose. Creating something that massive, involving repeated division and integration of labor, requires both money and time, and it's difficult for a soft, shared sense of empathy to develop. What was special about VOCALOID's online creation was its cycle: self-contained individual creations linked and cascaded together, fueled by the daily, close interaction between creators and listeners, which in turn generated new vitality.

──In other words, the creative format in the pre-internet era was designed to convey overwhelming works, but the internet itself changed that format.

Sasaki: Formats favored online tend to be accessible and lively, while older formats designed as closed systems where viewers couldn't interact might be at a disadvantage. For instance, knowledge intended for music enthusiasts or the rules and etiquette defining music genres can seem intimidating to both new creators and fans. Anime songs and VOCALOID, where the rules as music genres and the absolute pioneers or contributors aren't clearly defined, seemed to have a format that was easier to participate in.

On the other hand, the internet means most content is always accessible and caught in a constantly expanding spiral. It's hard to avoid the feeling that this era could lead to the individual significance of each piece of vast content becoming diluted. The meaning and importance of "search" only grows heavier, I think.

Is what voice actress Saki Fujita feels the cutting edge of the future?

──What are your thoughts on the music created by VOCALOID producers (people who compose and perform using VOCALOID software such as Hatsune Miku)?

Sasaki: First, I recall there being an overwhelming variety of songs that defied explanation. In the early days, sounds like straightforward pop or rock—where it felt like Hatsune Miku herself was singing, or had a universal, strong narrative quality—were popular. I myself thought that was the domain of VOCALOID's Hatsune Miku characters. I remember this continuing until around 2007-2010, and I think it's fair to say this period shaped the personal impression of Hatsune Miku as a character. My impression is that back then, in that time and place, many songs emerged from a convergence of chance and necessity – songs everyone wanted everyone else to hear. Lately, I feel songs with originality in their worldview or atmosphere, or those with a twisted expression, are becoming more prominent.

Also, while Hatsune Miku doesn't possess human-like vocal abilities, many young creators and listeners today aren't necessarily seeking human-like expression. Instead, they're drawn to the speed and rhythm of the words, the information density of the meaning, and ways of using rhyme to showcase the appeal of Japanese in a different way from rap or poetry reading. We've seen a lot of works like this emerge in recent years. How should I put it... I think part of it is that songs have diverged from human singers, and attention is shifting to the "words themselves, devoid of human presence" that VOCALOID uniquely enables. Within this somewhat Galapagos-like environment, a distinctive sense of tempo and rhythmic styles suited to internet slang are gradually emerging.

──Some point out that due to VOCALOID's influence, J-POP has seen fewer choruses with English lyrics, with lyrics describing emotions and situations in Japanese becoming the mainstream.

Sasaki: In the mid-1990s, global music trends were shifting due to hip-hop and R&B. When Japanese music followed American trends, it felt like we couldn't immediately catch up to the point of replacing the English atmosphere with Japanese. So, simply keeping the chorus in English was a common approach. However, English doesn't easily convey the emotional weight inherent in its meaning. Today's Japanese songs, having absorbed rap and English expressions, may have also developed in how melodies and Japanese lyrics are arranged.

──That brings to mind the "Japanese Rock Debate" from the 70s (the debate over whether rock music should be sung in English or if rock was possible in Japanese).

Sasaki: That's right. The free-spirited approach of pioneers like Haruomi Hosono (who was also a member of "Happy End," central to the Japanese rock debate) and the wordplay sensibility of today's VOCALOID might actually share a similar approach.

──How do you think the relationship between digital music and music played by humans will evolve in the future?

Sasaki: Technology, by its very nature, will keep changing, waving the banner of evolution. Within that, I think software will advance that assists music-making techniques, making creation more accessible. Composition-aid software will likely push the systematization of pop music. If technology advances to archive and easily utilize this, mainstream pop will be structurally analyzed, making similar songs easy to create. Anyone could try making a song that sounds like something they've heard before. However, this might diminish the sense of uniqueness in songs, potentially reducing the listener's act of seeking personal emotional expression within music. Whether this leads to music becoming more uniform or sparks the creation of even more distinctive expressions is an intriguing question.

──Speaking of the relationship between humans and technology, Hatsune Miku is based on the voice of voice actress Saki Fujita. The relationship between Hatsune Miku and Saki Fujita feels somewhat distant yet connected, a very intriguing dynamic. What are your thoughts?

Sasaki: The relationship between Ms. Fujita and Hatsune Miku is a very futuristic theme. A future where Hatsune Miku is influenced by further developed vocal synthesis technology will definitely come. Ms. Fujita isn't equal to Hatsune Miku, but when you process the sound joints handled by VOCALOID and gradually remove the digital noise components, Ms. Fujita's simple voice becomes increasingly apparent. Right now, the only ones who know that sensation are us, who are embedding the vocal material, and Ms. Fujita, who holds the image of the original voice.

But future technology might fundamentally reduce noise and errors. It might read the subtle fluctuations inherent in the original voice, connect them to expressions in other songs, and systematize them. The future where the structure of the human voice and song meets performance technology holds many possibilities. Furthermore, the widely known "Hatsune Miku" is "singing songs," whereas Ms. Fujita didn't strictly "sing" during recording... This difference feels like a real situation born from technology. Ms. Fujita absolutely lies at the root of the concept of Hatsune Miku. While the significance of this may not be fully appreciated yet, I think very few women have approached the essence and potential of technology in such a special way.

──I get the sense that what Saki Fujita is "thinking" or "feeling" right now is, in a way, incredibly futuristic.

Sasaki: I believe she is a unique individual who perceives the significance of Hatsune Miku's existence and VOCALOID technology from a special position and perspective. She will remain at the core of the increasingly intelligent synthesis mechanisms developing in the future. Even now, her voice and the vibrations of her throat are used to add detail to what would otherwise be inorganic synthetic technology. Even if the perspective and basis for evaluating her are ambiguous now, I believe the value of her voice will be properly recognized and appreciated in the future. I think her voice is the very foundation of Hatsune Miku and represents one of her greatest potentials.

Hatsune Miku's VOCALOID3 version, produced by Mr. Sasaki and released in 2013. Characterized by diverse expression, it includes five vocal libraries.

Digitalization also has drawbacks, such as narrowing musical diversity and creating information overload.

──What were your initial feelings when you first encountered digital technology and the internet?

Sasaki: As personal computers became widespread, while they were incredibly convenient, I felt that lighter data files—which had poorer sound quality than CDs—were increasingly favored. It felt like all music was being digitized and steadily sucked into the internet. While some music translates well to mp3, I felt it was difficult to convey the atmosphere of live performances, orchestral music, contemporary musical art, and other genres that rely heavily on the venue's ambiance—where recreating the sense of presence is challenging. I think different types of music have varying compatibility with the internet. In the digital world, music often gets ranked by simple numerical metrics like YouTube views. It feels like the history attached to music—other people's ratings, play counts—overwhelms the music itself.

The increase in information attached to music is inevitable, but it can be both a means to make music more accessible and a factor that distracts from focusing on the music itself. When the process of encountering music becomes controlled by SNS and the information attached to it, and when music evaluation leans more towards digital information, it leads to more music being consumed as "just a quick listen" triggered by small impulses, rather than experiences driven by personal motivation. The freedom to access and zap through music is wonderful. But I'm a little worried that the music that stays in our memories and leaves an impression might decrease, and that the diversity of our music listening experience might diminish.

──That diversity, for example, refers to the appeal of acoustic, live sound?

Sasaki: The sense of the creator's presence felt through the music, the atmosphere during performance... While this perception is ultimately rooted in a traditional way of experiencing music, it does seem like we're losing those subtle moods and nuances that were naturally present in "live" performances. I think this is partly due to technological advancements on the creator's side—specifically, the rapid expansion of environments where music can be completed by a small group or even a single person. What used to be an ensemble created by a sound engineer and several musicians can now be done entirely by one person. Even the bare minimum promotion of the music often falls to the creator themselves.

It's a somewhat ironic perspective, but no matter how strong your compositional or lyrical sensibilities are, having a good grasp of technology and being able to operate it skillfully becomes more important. Because it's convenient and can be done alone, the number of tasks you handle increases, the management aspects grow, and it becomes harder to focus on details that might be dispensable. In any case, while the barriers seem to have lowered, they also feel like they've risen.

──In a previous interview, you mentioned that "around 2005, the evolution of the internet significantly changed how music was consumed." Was that a major turning point?

Sasaki: I feel digital technology was incredibly beneficial for independent-minded people up to a certain point. In the early days when mobile phones became widespread, even while working day jobs, people could use their spare time to connect personally with others externally. This enabled them to create small music labels based on individual tastes.

However, once it became too easy and widespread, it sparked a boom. Similar labels proliferated, blurring the meaning of what a label represents. The internet also seems to have scaled this unit down to the individual level. Technology itself influences both how music is created and how it's consumed. And right now, technological advancement strongly favors those creating music.

──The internet's flattening effect seems to have rapidly erased the sense of historical perspective—not just in music, but across various genres—from a certain point onward.

Sasaki: For example, in the late Showa era, listening to popular music from 20 years earlier felt incredibly dated. But today, listening to 1990s music from 20 years ago often doesn't feel that old. Musical experimentation has been cataloged, and instruments and expression have become systematized. While the details of the sound may change, the format remains similar. I think going forward, the amount of change in content will decrease relative to the passage of time.

A polarization is advancing between the older generation who see an extension of the 20th century and the new generation who don't.

──Some predict AI will threaten humanity or robots will take human jobs. What are your thoughts?

Sasaki: While manufacturers handling technology today likely have an awareness of enriching the world ahead, the result is that they're essentially forcing consumers into a spiral: compelling them to switch from products using old technology to those using new technology. This applies to TVs, PCs, and software alike. The balance between continuous development and promoting replacement is a serious problem.

For example, I'm currently working on an upgraded version of Miku, like many others. There's a balance between the ideal of providing a better creative tool and the corporate necessity of generating profits every few years. Incidentally, the justification for this upgrade is: "The voice quality has improved, so it won't get buried by other sounds, increasing the user's expressive possibilities."

Next might be something like "You can now semi-automatically select various human singing expressions, allowing you to experiment with diverse styles"? After that, finding compelling selling points might get tough, but I imagine some kind of functional improvement, increased versatility, or enhanced convenience will be used as the marketing pitch. At its core, vocal synthesis involves memorizing the shape of a human voice and restoring it realistically while altering length and pitch. I believe this concept, much like preservation technology for perishable goods like frozen foods, will continue to improve in quality. While this future is exciting, it might also mean any voice can be reproduced, potentially erasing the uniqueness of individual voices.

Technological advancement increases freedom of operation and editing power, but that technology itself will eventually lose its sci-fi novelty and become commonplace. Humans quickly grow accustomed to convenience. For better or worse, we've reached a point where those who design digital technology, systems, or the rules surrounding them are designing the world. So, rather than AI taking away jobs, I believe it's the efficient, commercial rules decided by someone somewhere that strip us of our jobs and our sense of possibility.

──As a result of networking and digitization, we've likely reached the limits of thinking things through as mere extensions of the past.

Sasaki: I think the future is polarizing between two generations separated by roughly a parent-child age gap: the older generation who believe the future lies along the extension of the rules and systems pushed by 20th-century economics, and the younger generation who believe there is no extension of the bubble era. I feel it's necessary to grasp and consider both perspectives, acknowledging this disconnect. It's emotionally difficult too.

──How do you think Hatsune Miku will evolve from here?

Sasaki: Well, regarding Hatsune Miku's evolution as vocal synthesis software, the first step will likely be a straightforward evolution focused on quality improvement. However, I feel we need to carefully consider what makes "Hatsune Miku distinctive" across various aspects, as even noise can possess its own character. To put it in extreme terms, we'll probably explore a dual approach: enhancing the core quality while also enabling the reproduction of diverse Hatsune Mikus, including those with noise or imperfect pronunciation.

The next version of "Hatsune Miku" thoroughly eliminates the unnatural elements that existed between VOCALOID and Saki Fujita's voice. This aims to reduce instances where listeners can't understand what's being said, depending on the lyrics and melody. The reason is to make creators' words easier for listeners to hear. However, some people might have preferred the ambiguous pronunciation we had before. Creators constantly wonder what fans are actually listening for in Hatsune Miku's voice.

For instance, though still speculative, creating a customizable or tailor-made Hatsune Miku for those who have specific ideas about her vocal expression could be one possibility. Five years ago, I experimented with expanding Hatsune Miku's voice through Append (additional content). Now, we have technology that allows for more precise modification of specific vocal parts or tendencies than was possible then.

For instance, I'd like to explore possibilities like a special mode where Hatsune Miku's voice becomes more mature as the pitch rises and more childlike as it lowers, creating an expressive effect; an unstable Hatsune Miku where the rise of specific overtones at the start of the voice softens under certain conditions; or a Hatsune Miku that can ambiguously morph vowels in several directions using morphing technology... However, I don't know the demand or the scale of the effect. I believe applying such technology requires tuning through various experiments, verification, and refinement based on user preferences and tastes. Perhaps we could streamline the work of gradually generalizing these features, and the judgments it involves, by strategically combining implementations for a broad audience with more limited experiments.

Therefore, what I want now is a wealth of ideas and the technology to test them, along with direct or indirect information exchange and communication with users who seek something concrete from Hatsune Miku. I want to understand the feelings of the users, the people who want to work with this.

Both now and in the future, the online world where Hatsune Miku exists feels like the other side of a well to me. I know many people are on that other side, but peering in just reveals its depth. It's like throwing a stone and hearing a splash (laughs). I peer into that other side every day. If I just remain indecisive in this situation, it won't lead to creative action. I need to think more practically from there and shift my perspective. Personally, I still lack imagination, diverse perspectives, and effort.

──Hatsune Miku is popular even among teens. What are your thoughts about future children?

Sasaki: Before the internet, the information we encountered daily was divided into two types: local gossip among residents and corporate-driven information from ads, magazines, and TV. Now, whether it's Twitter, Naver Matome, or Wikipedia, the lines between professional and amateur, for-profit and non-profit, are blurring. Hatsune Miku's songs are often sung by someone somewhere, but some are also professionally marketed. This coexistence feels new to me.

I think it's a tough era for kids, where it's hard to find clear benchmarks. But I believe the ability to create systems suited to the internet age lies in the sensibilities of the digital native generation and beyond. I hope they'll use the internet skillfully—or block it when needed—and wield knowledge and reason as weapons to forge new values for this digital age. Within that, I hope we see more humorous technologies like Hatsune Miku, things everyone can play around with.

＊Regarding the "future evolution of Hatsune Miku" mentioned during the interview, Crypton's new Hatsune Miku teaser page was just updated. Mr. Sasaki's vision of "pursuing Hatsune Miku's essence, ideas, and communication with users within new technologies" seems to be unfolding right now.
http://www.crypton.co.jp/mikuv4x
