
Artificial intelligence has transformed industries worldwide, yet it largely overlooks the linguistic and cultural diversity of most people on the planet. Developers like Egyptian coder Assem Sabry are stepping in to change that by creating tailored models such as Horus, inspired by the ancient god of the sky. These efforts highlight a growing movement to make AI accessible and relevant for non-English speakers, who form the global majority.
English’s Overwhelming Grip on AI Development
AI language models perform exceptionally well in English and reasonably well in Chinese, but they struggle significantly with most other tongues. Training processes rely on vast web scrapes dominated by English content, while commercial priorities favor markets with the largest immediate returns. This imbalance leaves speakers of so-called minority languages, who in fact make up the world's majority, underserved.
Researchers identified this issue clearly in a 2023 study from the Center for Democracy & Technology, which described non-English languages as "Lost in Translation" because of scarce training data and skewed business incentives. Big Tech firms rushed to deploy AI with strong English support, sidelining other languages. For years, the high costs of model training deterred investment in smaller language groups that lacked obvious profitability.
Grassroots Innovators Take the Lead
Assem Sabry grew frustrated with the absence of culturally resonant AI in Egypt, where no significant industry existed. He developed Horus to reduce dependence on American or Chinese models and explore an Egyptian alternative. Using GPUs from Google Colab and open-source datasets, Sabry released the model in early April, achieving over 800 downloads on Hugging Face within its first week.
Sabry noted that two years earlier, building such models from scratch seemed impossible because AI quality lagged and open-source large language models were scarce. Now, affordable tools enable developers to fine-tune for specific needs. A fine-tuned version of Meta’s Llama 3.2, trained on 14,500 Indian legal-language pairs, has garnered more than 1,000 downloads since early April, proving demand in niche areas.
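Much of this fine-tuning work starts with a mundane step: converting domain-specific question-and-answer pairs into the chat-style JSONL format that most open-model training scripts accept. The sketch below illustrates that step only; the sample pairs and the `legal_pairs.jsonl` filename are hypothetical, not the actual Indian legal dataset described above.

```python
import json

# Hypothetical sample pairs -- stand-ins for a domain dataset like the
# Indian legal-language pairs mentioned in the article.
pairs = [
    ("Translate to plain English: 'The lessee shall indemnify the lessor.'",
     "The tenant agrees to cover the landlord's losses."),
    ("Explain the term 'force majeure'.",
     "An unforeseeable event that frees both parties from their obligations."),
]

def to_chat_records(pairs):
    """Convert (prompt, answer) pairs into chat-style records, a common
    input format for fine-tuning open models such as Llama."""
    return [
        {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        for prompt, answer in pairs
    ]

records = to_chat_records(pairs)

# Write one JSON object per line (JSONL), the usual training-file layout.
with open("legal_pairs.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

print(f"wrote {len(records)} records")
```

From here, a file like this would typically be fed to an off-the-shelf fine-tuning script on rented or free GPU time, which is exactly the low-cost workflow the article describes.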
Institutional Projects Scale Up the Effort
Beyond individual efforts, collaborations backed by universities and governments are producing robust alternatives. Switzerland’s Apertus, developed by two universities and the Swiss National Supercomputing Center, utilized over 10 million GPU hours – worth tens of millions in commercial terms – to create a fully open, multilingual model. Latin America’s Latam-GPT targets regional needs, while Nigeria’s N-ATLaS addresses local contexts.
Other initiatives include Indonesia's Sahabat-AI, AI Singapore's SEA-LION, Vietnam's GreenMind, Thailand's OpenThaiGPT, and Europe's Teuken 7B. These models challenge the dominance of offerings from OpenAI, Anthropic, and Alibaba. Together, they show a patchwork of solutions emerging globally:
- Apertus: Swiss multilingual model with massive compute support.
- Latam-GPT: First AI designed for Latin America and the Caribbean.
- N-ATLaS: Nigerian-focused language system.
- SEA-LION: Southeast Asian variant from AI Singapore.
- OpenThaiGPT: Thai-specific open-source collection.
Barriers Linger, But Momentum Builds
Progress accelerates thanks to open-source advancements and big AI firms imposing token limits, creating room for specialized players. Still, researcher Aliya Bhatia points out persistent hurdles in compute resources, infrastructure, and funding that block broader adoption. These factors have long reinforced English dominance, though recent shifts offer hope.
Bhatia argues that grassroots and institutional models prove it’s feasible to serve global majority users and languages, provided major companies learn from them. Developers now train and deploy at lower costs, altering the economic equation. Early adoption signals a viable market for culturally attuned AI beyond mainstream applications.
Key Takeaways
- AI’s English bias stems from web-scraped training data and commercial focus, marginalizing most languages.
- Open-source tools and affordable compute empower grassroots models like Horus and fine-tuned Llamas.
- Global projects from Switzerland to Southeast Asia show scalable paths to inclusive AI.
This wave of localized AI models promises a more equitable technology landscape, where innovation reflects humanity’s full diversity. As these efforts gain traction, they pressure industry giants to diversify. What steps should major AI developers take next? Share your thoughts in the comments.






