Open source AI platforms gain attention as AI voice market grows

2024. 9. 26. 11:42

Open source artificial intelligence (AI) platforms that distribute software for free are expanding their presence, with OpenAI responding by updating its voice AI features.

Voice AI has yet to draw as much attention as the market for large language models (LLMs), but it is considered essential for the coming multimodal era, in which AI handling text, images, and voice is integrated.

According to multiple sources in the information technology (IT) industry, Kyutai, a non-profit AI research lab based in France, recently unveiled Moshi, a voice AI model it developed in-house. Moshi is available free of charge, along with its code. It is built on a language model called Helium with 7 billion parameters, the adjustable values that are often likened to synapses in the human brain.

Because it can run without an internet connection, Moshi can be stored and used directly on smartphones or tablets, unlike OpenAI’s voice AI, which is cloud-based. Moshi takes only about 0.2 seconds to generate a voice response, faster than OpenAI’s GPT-4o, which takes 0.23 to 0.32 seconds.

Kyutai Chief Executive Officer Patrick Perez emphasized in a recent interview with Maeil Business Newspaper that his company will make AI easily accessible for everyone, while noting that research on Moshi and other multimodal foundation models will continue.

Kyutai is currently viewed as France’s counterpart to OpenAI. It was co-founded in November 2023 by the iliad Group, CMA CGM Group, and Schmidt Futures, founded by former Google CEO Eric Schmidt, with a total investment of 300 million euros. Within six months, a core team of eight built a voice AI that rivals OpenAI’s, capable of highly natural conversation and available to try online.

Other organizations have also released voice AI as open source, notably Meta, Coqui, Mozilla, and the Kaldi project.

Meta earlier unveiled Massively Multilingual Speech (MMS), which can identify more than 4,000 spoken languages and recognize and synthesize speech in over 1,100 of them. A significant advantage of MMS is its ability to learn from speech data without needing labels. For its part, Mozilla’s DeepSpeech has improved GPU efficiency, while Coqui has launched fast, real-time speech recognition and text-to-speech conversion.

Both DeepSpeech and Coqui are open source, and the rationale for distributing AI this way is to gain a first-mover advantage. Unlike closed models such as OpenAI’s GPT or Anthropic’s Claude, open-source models let anyone access and use the source code for free. This makes the technology accessible to a broader user base and helps users avoid dependence on particular closed models. The companies behind these models can build ecosystems around them, encouraging many developers to adopt the technology and taking the lead in standardizing it.

“The AI market is not solely driven by closed models like OpenAI or Anthropic,” an industry insider said. “Open source models are also demonstrating sufficiently good performance.”

The closed-model camp is also actively developing voice AI. OpenAI recently launched an updated voice mode for ChatGPT that improves support for 50 languages, including Korean and Japanese; it is currently available to paid users in Korea.

OpenAI’s voice mode lets users adjust the AI’s speaking speed and can recognize the speaker’s emotions. The company has also refined its Korean voice output to sound more natural and offers nine different voices. Google, meanwhile, unveiled Gemini Live, a voice assistant built on its Gemini AI, in August 2024. The assistant is optimized for mobile environments and offers ten voices with different tones and styles.

According to market research firm Mordor Intelligence, the voice recognition market is projected to grow from $14.95 billion in 2024 to $42.08 billion in 2029. As AI advances, voice recognition is expected to be widely adopted across various sectors, including smart homes and IoT, customer service and call centers, healthcare, automotive and navigation, educational tools, gaming and entertainment, banking and finance, legal and administrative services, accessibility support, and translation services.
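The two Mordor Intelligence figures imply an annual growth rate that the forecast above does not spell out. As a rough back-of-the-envelope check, the compound annual growth rate (CAGR) can be derived from them. This sketch assumes smooth compounding over the five years from 2024 to 2029; the helper name `implied_cagr` is hypothetical, and the result is not a figure published by Mordor Intelligence.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value.

    Assumes smooth annual compounding; this is an illustrative helper,
    not a metric reported by the research firm.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # (end / start) ** (1 / years) gives the yearly growth multiple; subtract 1 for the rate.
    return (end_value / start_value) ** (1 / years) - 1


# Market size figures quoted above, in billions of US dollars.
rate = implied_cagr(14.95, 42.08, 2029 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

Under that assumption the projection works out to roughly 23% growth per year, with the market nearly tripling over the period.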

Copyright © Maeil Business Newspaper (매일경제) & mk.co.kr. Unauthorized reproduction, redistribution, and use for AI training are prohibited.
