
Artificial intelligence has transformed industries worldwide, yet it largely overlooks the linguistic and cultural diversity of most people on the planet. Developers like Egyptian coder Assem Sabry are stepping in to change that by creating tailored models such as Horus, inspired by the ancient god of the sky. These efforts highlight a growing movement to make AI accessible and relevant for non-English speakers, who form the global majority.
English’s Overwhelming Grip on AI Development
AI language models perform exceptionally well in English and reasonably well in Chinese, but they struggle significantly with most other tongues. Training processes rely on vast web scrapes dominated by English content, while commercial priorities favor markets with the largest immediate returns. This imbalance leaves speakers of so-called minority languages, who in fact make up the world's majority, underserved.
Researchers identified this issue clearly in a 2023 study from the Center for Democracy & Technology, which described non-English languages as "Lost in Translation" because of scarce training data and skewed business incentives. Big Tech firms rushed to deploy AI with strong English support, sidelining other languages. For years, the high costs of model training deterred investment in smaller language groups that lacked obvious profitability.
Grassroots Innovators Take the Lead
Assem Sabry grew frustrated with the absence of culturally resonant AI in Egypt, where no significant industry existed. He developed Horus to reduce dependence on American or Chinese models and explore an Egyptian alternative. Using GPUs from Google Colab and open-source datasets, Sabry released the model in early April, achieving over 800 downloads on Hugging Face within its first week.
Sabry noted that two years earlier, building such models from scratch seemed impossible because AI quality lagged and open-source large language models were scarce. Now, affordable tools enable developers to fine-tune for specific needs. A fine-tuned version of Meta’s Llama 3.2, trained on 14,500 Indian legal-language pairs, has garnered more than 1,000 downloads since early April, proving demand in niche areas.
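Much of this fine-tuning work starts with a mundane step: converting domain-specific question-and-answer pairs into the chat-style JSONL format that most open-model training scripts accept. The sketch below illustrates that step only; the sample pairs and the `legal_pairs.jsonl` filename are hypothetical, not the actual Indian legal dataset described above.

```python
import json

# Hypothetical sample pairs -- stand-ins for a domain dataset like the
# Indian legal-language pairs mentioned in the article.
pairs = [
    ("Translate to plain English: 'The lessee shall indemnify the lessor.'",
     "The tenant agrees to cover the landlord's losses."),
    ("Explain the term 'force majeure'.",
     "An unforeseeable event that frees both parties from their obligations."),
]

def to_chat_records(pairs):
    """Convert (prompt, answer) pairs into chat-style records, a common
    input format for fine-tuning open models such as Llama."""
    return [
        {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        for prompt, answer in pairs
    ]

records = to_chat_records(pairs)

# Write one JSON object per line (JSONL), the usual training-file layout.
with open("legal_pairs.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

print(f"wrote {len(records)} records")
```

From here, a file like this would typically be fed to an off-the-shelf fine-tuning script on rented or free GPU time, which is exactly the low-cost workflow the article describes.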
Institutional Projects Scale Up the Effort
Beyond individual efforts, collaborations backed by universities and governments are producing robust alternatives. Switzerland’s Apertus, developed by two universities and the Swiss National Supercomputing Center, utilized over 10 million GPU hours – worth tens of millions in commercial terms – to create a fully open, multilingual model. Latin America’s Latam-GPT targets regional needs, while Nigeria’s N-ATLaS addresses local contexts.
Other initiatives include Indonesia's Sahabat-AI, AI Singapore's SEA-LION, Vietnam's GreenMind, Thailand's OpenThaiGPT, and Europe's Teuken 7B. These models challenge the dominance of offerings from OpenAI, Anthropic, and Alibaba. Together, they show a patchwork of solutions emerging globally:
- Apertus: Swiss multilingual model with massive compute support.
- Latam-GPT: First AI designed for Latin America and the Caribbean.
- N-ATLaS: Nigerian-focused language system.
- SEA-LION: Southeast Asian variant from AI Singapore.
- OpenThaiGPT: Thai-specific open-source collection.
Barriers Linger, But Momentum Builds
Progress accelerates thanks to open-source advancements and big AI firms imposing token limits, creating room for specialized players. Still, researcher Aliya Bhatia points out persistent hurdles in compute resources, infrastructure, and funding that block broader adoption. These factors have long reinforced English dominance, though recent shifts offer hope.
Bhatia argues that grassroots and institutional models prove it’s feasible to serve global majority users and languages, provided major companies learn from them. Developers now train and deploy at lower costs, altering the economic equation. Early adoption signals a viable market for culturally attuned AI beyond mainstream applications.
Key Takeaways
- AI’s English bias stems from web-scraped training data and commercial focus, marginalizing most languages.
- Open-source tools and affordable compute empower grassroots models like Horus and fine-tuned Llamas.
- Global projects from Switzerland to Southeast Asia show scalable paths to inclusive AI.
This wave of localized AI models promises a more equitable technology landscape, where innovation reflects humanity’s full diversity. As these efforts gain traction, they pressure industry giants to diversify. What steps should major AI developers take next? Share your thoughts in the comments.






