
Taiwan to release sovereign AI language dataset within month

Reporter: TVBS News Staff
Release time: 2025/06/18 17:00

Taiwan to release AI language data soon (Shutterstock)

TAIPEI (TVBS News): Minister of Digital Affairs Huang Yen-nan (黃彥男) stressed that comprehensive data collection is critical to Taiwan's push to build its own sovereign artificial intelligence capabilities, speaking at a session of the Legislative Yuan (立法院), Taiwan's parliament, on Wednesday (June 18). Huang told legislators that Taiwan's independently developed AI corpus, a large collection of text and data reflecting the island's linguistic and cultural characteristics, would serve as a cornerstone for the upcoming AI fundamental laws, with particular emphasis on establishing robust frameworks for data governance and oversight.

The Ministry of Digital Affairs (MODA, 數位發展部), the government agency established in 2022 to oversee Taiwan's digital transformation and technological development, plans to release the first phase of the language dataset within the next two to three months. Both government agencies and private companies will then be able to apply for access. According to ministry officials, multiple government departments have been working together since early June to catalog existing language resources and assess whether they can be incorporated into the AI training corpus. The process is intended to ensure that the dataset accurately represents Taiwan's linguistic landscape while meeting the technical requirements of machine learning applications.

 

In a move toward linguistic inclusivity, officials confirmed that the AI corpus will include both Hakka and Indigenous languages, reflecting Taiwan's multicultural heritage and linguistic diversity beyond Mandarin Chinese. The discussion took place during a special session of the Legislative Yuan's Transportation Committee (立法院交通委員會), which oversees digital affairs in the parliament, where Minister Huang presented the government's AI industry development strategy. During the questioning period that followed, Legislator Hsu Fu-kuei (徐富癸) of the Democratic Progressive Party (DPP, 民進黨), Taiwan's ruling party, warned that an overrepresentation of formal government documents in the corpus could introduce algorithmic bias, and asked whether enough attention was being paid to Taiwan's minority languages, such as Hakka and Indigenous languages.

Responding to these concerns, Director Chuang Ming-fen (莊明芬) of the Department of Data Innovation (創新司), the MODA division focused on data applications, said the language database would extend well beyond administrative documents to include cultural narratives, historical records, and geographical information representative of Taiwan's diverse society. Chuang said her department has already drawn up a corpus action plan and begun preliminary AI model training and technical infrastructure development for the project. She added that the immediate priority is to expand and enrich the dataset so that it is as comprehensive as possible, and reiterated the ministry's commitment to releasing the first phase within the previously announced two-to-three-month timeframe.

The forthcoming release of Taiwan's homegrown language dataset marks a milestone in the island's technology strategy. It could strengthen Taiwan's artificial intelligence capabilities and place it among the small group of countries developing sovereign AI systems with localized language understanding. Beyond the technical aims, the initiative reflects a broader commitment to building cultural diversity and linguistic inclusivity into Taiwan's technological infrastructure, an approach officials hope will produce AI systems more attuned to Taiwan's societal context and better equipped to serve its multicultural population of 23 million people. ◼

