Naver aims to advance CLOVA X into multimodal AI

2024. 3. 4. 09:42
Nako Sung, head of Hyperscale AI Technology at Naver Cloud. [Photo by Lee Seung-hwan]
South Korean internet giant Naver Corp. has set the goal of advancing its interactive artificial intelligence (AI) agent CLOVA X into a multimodal service capable of generating not only text but also voice, images, and code.

“We’re focusing on expanding the capabilities (of CLOVA X) to process multiple modalities,” Nako Sung, head of Hyperscale AI Technology at Naver Cloud, told Maeil Business Newspaper last week. “We are updating HyperCLOVA X, the hyperscale AI that underpins CLOVA X.”

A multimodal AI refers to a service capable of processing information from various sources, including text, images, videos, and voice.

Major multimodal services include OpenAI’s ChatGPT and Google LLC’s Gemini.

Currently, CLOVA X can edit images but lacks the ability to generate them.

Naver Cloud has announced plans to progressively integrate image and audio AI into CLOVA X, significantly enhancing its coding capabilities.

Citing the anticipated launch of GPT-5 by OpenAI, Sung projected a widening technology gap within the industry and stressed that companies need to “enhance inference performance while reducing costs” to survive.

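Parameters in the context of AI are akin to synapses in the human brain, and the more parameters a model has, the more computation each response requires. For a model on the scale of OpenAI’s GPT-3, which has 175 billion parameters, inference costs are substantial, making profitability a challenge.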
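
To give a sense of scale, the back-of-the-envelope sketch below estimates what 175 billion parameters imply for a single model. Only the parameter count comes from the article; the 16-bit weight format and the rule of thumb of roughly two floating-point operations per parameter per generated token are assumptions used for illustration, not figures from Naver or OpenAI, and the function names are hypothetical.

```python
# Rough, illustrative estimate only. Apart from the parameter count,
# every constant here is an assumption, not a figure from Naver or OpenAI.

GPT3_PARAMS = 175e9             # parameter count cited in the article
BYTES_PER_PARAM_FP16 = 2        # assumption: weights stored as 16-bit floats
FLOPS_PER_PARAM_PER_TOKEN = 2   # rule of thumb: one multiply and one add per weight


def weight_memory_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


def flops_per_token(params: float) -> float:
    """Approximate floating-point operations needed to generate one token."""
    return FLOPS_PER_PARAM_PER_TOKEN * params


if __name__ == "__main__":
    print(f"Weights: {weight_memory_gb(GPT3_PARAMS):.0f} GB")
    print(f"Compute: {flops_per_token(GPT3_PARAMS):.2e} FLOPs per token")
```

Under these assumptions the weights alone occupy about 350 GB, more than a single accelerator typically holds, and each generated token costs on the order of hundreds of billions of operations. Actual serving costs depend on hardware, batching, and model optimizations that this sketch ignores.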

Against this backdrop, Naver is optimizing HyperCLOVA X for Korean, English, and Japanese.

“From a big tech standpoint, Korea is a relatively small market to target,” Sung said. “On the other hand, HyperCLOVA X’s extensive Korean-language dataset has resulted in superior performance.”

[Courtesy of Naver Corp.]
Naver Cloud is currently developing neural processing units (NPUs), essential for AI inference.

Sung underscored the importance of NPUs as inference chips in the cloud, saying that developing AI chips is necessary to reduce inference costs and make the cloud business profitable.

Looking ahead, Sung envisions AI evolving in a more personalized and localized manner because he believes that “each country or region has its own values.”

He highlighted HyperCLOVA X’s ability to tailor responses to locally sensitive issues.

“For instance, global big tech AI agents are likely to provide a generic response to territorial disputes, while HyperCLOVA X’s response would be tailored to local sensitivities.”

