
Exclusive | PolyU’s top AI scientist Yang Hongxia seeks to revolutionise LLM development in Hong Kong

Mainland Chinese artificial intelligence (AI) scientist Yang Hongxia, the newly named professor at Hong Kong Polytechnic University (PolyU), is on a mission to revolutionise the development of large language models (LLMs) – the technology underpinning generative AI services like ChatGPT – from her base in the city.

In an interview with the South China Morning Post, Yang – who previously worked on AI models at ByteDance and Alibaba Group Holding’s Damo Academy – said she wants to change the existing resource-intensive process of training LLMs into a decentralised machine-learning paradigm. Alibaba owns the South China Morning Post.

“The rapid advances in generative AI, spearheaded by OpenAI’s GPT series, have unlocked immense possibilities,” she said. “Yet this progress comes with significant disparities in resource allocation.”

What is the current state of LLM development?

Yang said LLM development has so far relied mostly on deploying advanced and expensive graphics processing units (GPUs), from the likes of Nvidia and Advanced Micro Devices, in data centres to train on vast amounts of raw data, which has given deep-pocketed Big Tech companies and well-funded start-ups a major advantage.

The entrance to the Hung Hom campus of Hong Kong Polytechnic University, where artificial intelligence scientist Yang Hongxia serves as a professor at the Department of Computing. Photo: Sun Yeung

What is Yang’s proposed approach?

Yang said she and her colleagues propose a “model-over-models” approach to LLM development. That calls for a decentralised paradigm in which developers train smaller models across thousands of specific domains, including code generation, advanced data analysis and specialised AI agents.

These smaller models would then evolve into a large and comprehensive LLM, also known as a foundation model. Yang pointed out that this approach could reduce the computational demands at each stage of LLM development.

Domain-specific models, typically capped at 13 billion parameters – a machine-learning term for the variables an AI system learns during training, which determine how prompts yield the desired output – can deliver performance on par with, or exceeding, OpenAI’s latest GPT-4 models, while using far fewer GPUs: around 64 to 128 cards.

That paradigm can make LLM development more accessible to university labs and small firms, according to Yang. An evolutionary algorithm then combines and refines these domain-specific models to eventually build a comprehensive foundation model, she said.
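Yang did not spell out the mechanics in the interview, but the broad idea – searching for a weighting that blends several same-architecture domain models into one stronger set of parameters – can be illustrated with a minimal Python sketch. Everything below is a hypothetical stand-in: the flattened parameter vectors, the dummy fitness function and the simple mutation loop are assumptions for demonstration, not Yang’s published method.

```python
# Illustrative sketch only: evolve merge weights over several
# same-architecture "domain models", keeping whichever weighting
# scores best on a (here, dummy) validation fitness function.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the flattened parameters of three
# domain-specific models (same architecture, so same shape).
domain_models = [rng.normal(size=1_000) for _ in range(3)]

def merge(coeffs):
    """Weighted average of the domain models' parameters."""
    w = np.maximum(coeffs, 1e-9)   # keep weights positive
    w = w / w.sum()                # normalise so weights sum to 1
    return sum(c * m for c, m in zip(w, domain_models))

def fitness(params):
    """Placeholder for a real validation score (e.g. held-out accuracy)."""
    target = sum(domain_models) / len(domain_models)  # dummy objective
    return -float(np.linalg.norm(params - target))

# Simple (1 + lambda) evolutionary loop over the merge coefficients.
best = rng.random(len(domain_models))
for _ in range(50):
    children = [best + rng.normal(scale=0.1, size=best.shape) for _ in range(8)]
    best = max([best, *children], key=lambda c: fitness(merge(c)))

foundation_params = merge(best)  # the "evolved" merged model
```

In practice the fitness function would be a benchmark score on held-out domain tasks, and the merge could operate layer by layer rather than over a single flat vector.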

What does success look like for Hong Kong?

Successfully initiating such LLM development in Hong Kong would count as a big win for the city, as it looks to turn into an innovation and technology hub.

Yang Hongxia, a leading artificial intelligence scientist, previously worked on AI models at TikTok-owner ByteDance in the United States and Alibaba Group Holding’s research arm Damo Academy. Photo: PolyU

Hong Kong’s dynamic atmosphere, as well as its access to AI talent and resources, make the city an ideal place to conduct research into this new development paradigm, Yang said. She added that PolyU president Teng Jin-guang shares this vision.

According to Yang, her team has already verified that small AI models, once put together, can outperform the most advanced LLMs in specific domains.

“There is also a growing consensus in the industry that with high-quality, domain-specific data and continuous pretraining, surpassing GPT-4/4V is highly achievable,” she said. The multimodal GPT-4V analyses image inputs provided by a user, and is the latest capability OpenAI has made broadly available.
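Continuous pretraining of the kind Yang describes is a standard technique: training simply resumes on an open base model using a curated domain corpus. A minimal sketch using the Hugging Face transformers and datasets libraries is below; the base-model name and corpus path are illustrative placeholders, not details from Yang’s team.

```python
# Minimal continued-pretraining sketch (assumptions: a ~13B open
# base model and a plain-text domain corpus, one document per line).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-13b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batching
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder path to the high-quality, domain-specific corpus.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpts",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=2e-5,
                           bf16=True),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modelling.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # resumes pretraining on the domain corpus
```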

Yang said the next step is to build a more inclusive infrastructure platform to attract more talent to the AI community, with some releases expected by the end of this year or early next year.

“In the future, while a few cloud-based large models will dominate, small models across various domains will also flourish,” she said.

Yang, who received her PhD from Duke University in North Carolina, has published more than 100 papers in top-tier conferences and journals, and holds more than 50 patents in the US and mainland China. She played a key role in developing Alibaba’s 10-trillion-parameter M6 multimodal AI model.