Large language models aren't effective for many languages. Scholars at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) explain what's at stake for the approximately 5 billion people who don't speak English.

👨‍⚖️ Five billion people worldwide risk systematic exclusion from the AI revolution due to linguistic barriers in large language models. Current AI systems excel for English speakers but fail dramatically for languages like Vietnamese and Nahuatl, creating a new form of digital stratification that transcends traditional economic divides.

💡 The problem extends beyond translation: these models embed cultural assumptions that distort local contexts while potentially accelerating the homogenization of human knowledge systems. This creates a recursive exclusion in which marginalized communities lose both immediate technological benefits and long-term influence over systems that will shape future social and economic structures.

🚨 The technical challenge reflects deeper questions about who controls AI development and whose worldviews become encoded in these increasingly powerful systems. Policymakers face a critical window to intervene before technological path dependence locks in current inequities.

🧭 It is key to establish international standards for multilingual AI benchmarking, create incentive structures rewarding genuine linguistic inclusivity over superficial localization, and develop governance frameworks treating language data as cultural infrastructure deserving protective policies, recognizing that linguistic diversity in AI isn't just about fairness but about preserving the cognitive and cultural resources humanity needs for complex global problem-solving. https://lnkd.in/etzw3HpB
Language devaluation by AI systems
Summary
Language devaluation by AI systems refers to the way artificial intelligence tools can weaken the richness, individuality, and cultural specificity of human language by prioritizing standardized, machine-friendly patterns. This phenomenon risks marginalizing unique voices, non-English speakers, and diverse communication styles, ultimately threatening creativity, authenticity, and social inclusion.
- Champion authentic voice: Make a conscious effort to preserve your personal style and quirks when using AI writing tools so that your content reflects genuine individuality rather than sounding generic.
- Advocate for inclusivity: Encourage developers and organizations to build AI systems that respect and understand the diversity of languages and regional expressions, instead of flattening or penalizing distinctive cultural communication.
- Question uniformity: Stay alert to the risks of homogenized language in digital spaces and promote practices that value original expression and literary nuance, especially in education and professional settings.
-
The headline that caught my eye this week was "The Great Language Flattening." Here's my take: Anyone who has struggled with the English of Shakespeare is aware that language evolves and adapts to time and context. What is new is that we are all living through the first real feedback loop between human and artificial linguistic evolution.

It is fascinating that research participants unconsciously adopted AI writing patterns, tripling their word count after exposure to ChatGPT's verbose style. That's the opposite of historical patterns, where linguistic efficiency has been adaptive: saying more with less has been valuable because both creating and processing language required significant cognitive resources.

Simon Kirby of the University of Edinburgh suggests a potential future in which a sparse communication from one person is transformed into lengthy text by AI, then sent to another person, who uses AI to compress it back down to a "TL;DR." So the end product is about the same length as the initial prompt, but intermediated on both ends by AI.

The artisanal countermovement some linguists predict, where human idiosyncrasies become markers of authenticity, suggests we may be entering an era where the imperfect becomes precious precisely because it isn't replicable by machines. https://lnkd.in/eFK3mrNQ
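To make Kirby's loop concrete, here is a minimal Python sketch of the expand-then-summarize round trip. The `call_llm` helper is a hypothetical stand-in for whatever chat-completion API you use (none of these names come from the article); the point is simply that the human-authored content at each end stays short while the transmitted text balloons in between.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.
    Wire up a real client from your LLM provider here."""
    raise NotImplementedError("replace with an actual API call")

def send(terse_note: str) -> str:
    """Sender side: AI inflates a terse note into polished long-form prose."""
    return call_llm(f"Expand this note into a polite, detailed email:\n{terse_note}")

def receive(long_email: str) -> str:
    """Recipient side: AI compresses the long email back to a TL;DR."""
    return call_llm(f"Summarize this email in one sentence:\n{long_email}")

# Round trip: note -> verbose email -> one-line summary.
# With a real client plugged in, the final summary ends up roughly the
# length of the original note, even though the email in the middle may
# run to hundreds of words:
#   note = "Meeting moved to 3pm Thursday."
#   tldr = receive(send(note))
```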
-
There’s something eerie about the way we write now. It’s too clear… it’s too tidy. It all sounds the same. Welcome to the age of autocorrected expression – Powered By AI. We’re not just using ChatGPT to fix our grammar. We’re starting to let it fix us. In doing so, we might be losing something deeper than a typo.

Let me be clear: AI is a gift. For people who struggle with language, neurodivergent thinkers or anyone frozen by a blank page… this is a game-changer (and I HATE using that phrase). It unlocks access, speed and fluency. That’s not just powerful… that’s progress.

But for average writers, something else is happening. These tools don’t amplify your voice... they actually begin to average it. Like a calculator for language: you input your prompt and out comes something accurate, efficient and beige. It’s why so much content today feels like a cross between a LinkedIn post and a group-edited Wikipedia entry. Polished… but bloodless.

There’s a word for this: convergence. Researchers have started to track how AI-trained text pushes our language to converge… standardizing vocabulary, tone, even sentence structure. The result (and I’ll bet you already know where this is going)? A homogenized, corporate-y cadence that’s everywhere and from nowhere… it has no real soul (I'll leave a link to an article from The Verge in the comments).

Writing isn’t just about saying something “correctly.” It’s about saying something humanly. When we outsource our voice to a system that was trained to sound like everyone… we start sounding like no one.

There’s real risk here. Especially for younger generations. Writing used to be how we found our voice. Writing used to be how we made meaning out of what we read, saw… experienced. It wasn’t ever about what you said… it was always about how you said it. Now, it might be how we lose it… if we’re not careful.

If you’ve ever received a heartfelt message from someone… a handwritten note, a clunky-but-sincere email… you know what I’m talking about. It wasn’t perfect… it was personal… it was personable. AI doesn’t struggle… it doesn’t hesitate… it doesn’t reveal itself. But that struggle with the words? That struggle is the signal. Now, we confuse clarity with trust. But sometimes the mess is the message. Let’s not mistake utility for intimacy. A well-written email is nice… a real voice is unforgettable.

So here’s the uncomfortable question: If your words weren’t yours… would anyone know the difference? And if the answer is no… what happens to connection? To creativity? What happens when sounding smart replaces sounding like you?

In the future, maybe authenticity becomes a premium again. Like vinyl… like film… like a handwritten postcard in a mailbox full of bills. AI will keep getting better. The results will sound more like you, me… anyone. But the most valuable thing in your writing won’t be its polish. It’ll be the part that couldn’t have been written by anyone (or anything) else. Because it came from you... uniquely you.
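One way to see what "convergence" means in practice: if many texts drift toward the same word choices, their pairwise lexical overlap rises. The toy metric below is an illustration of the idea only, not the methodology of any particular study; the word-set proxy for vocabulary and the `mean_jaccard` helper are my own simplifications.

```python
from itertools import combinations

def vocab(text: str) -> set[str]:
    """Lowercased word set: a crude proxy for a text's vocabulary."""
    return set(text.lower().split())

def mean_jaccard(texts: list[str]) -> float:
    """Average pairwise Jaccard similarity of vocabularies across texts.
    Higher values mean more homogenized word choice. Assumes at least
    two non-empty texts."""
    sims = [
        len(vocab(a) & vocab(b)) / len(vocab(a) | vocab(b))
        for a, b in combinations(texts, 2)
    ]
    return sum(sims) / len(sims)

# Usage idea: compare a corpus of unaided human drafts against
# AI-assisted rewrites of the same briefs. If the researchers' claim
# holds, the second number should be noticeably higher:
# print(mean_jaccard(human_drafts), mean_jaccard(ai_assisted_drafts))
```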
-
𝗧𝗵𝗲𝗿𝗲’𝘀 𝗮 𝗿𝗮𝗻𝘁 𝗜’𝘃𝗲 𝗯𝗲𝗲𝗻 𝗵𝗼𝗹𝗱𝗶𝗻𝗴 𝗶𝗻 — 𝗮𝗻𝗱 𝘁𝗼𝗱𝗮𝘆, 𝗜’𝗺 𝗹𝗲𝘁𝘁𝗶𝗻𝗴 𝗶𝘁 𝗼𝘂𝘁. Let’s talk about 𝗹𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗯𝗶𝗮𝘀 in AI detection tools. Because yes, it’s a thing. And no, we’re not talking about it enough.

Earlier this year (and yep, I brought it up at Search Africon 😉), I noticed a trend: Everyone suddenly became an expert in “spotting AI writing.” There was even a viral tweet listing out “AI giveaway words” like... delve (??). Apparently, if your writing includes the word “delve,” ChatGPT wrote it. Wild.

But it got deeper when I saw a Nigerian student write something herself — only for an AI detector to flag it as AI-generated. She didn’t use ChatGPT. She didn’t even reference AI. She just wrote how she naturally writes — and got penalised for it. This, my friends, is where language bias comes in.

I had a conversation with the Semrush team at brightonSEO about their new AI detection feature (which, by the way, is super clever). But I had to ask: “How will it account for cultural and linguistic context?” Because here’s what many people miss: 🇳🇬 For Nigerians, English is often a second language. But it's taught formally — with dictionaries, textbooks, and classic shows. So when we speak casually, it might still sound formal. It’s just... how we learned it.

Need an example? Please watch the iconic interview with Patrick Obahiagbon (attached). Formal English? ✔️ Dictionary mode? ✔️ Absolute meme material in the Nigerian and Black British community? ✔️✔️✔️

But this is more than memes. This is about how AI tools can unintentionally discriminate — especially against non-native English speakers. Because what’s being flagged as “AI tone” is often just... our tone.

So here’s my plea to the industry: Let’s build AI tools that understand nuance. Let’s challenge the idea that formal = fake. And let’s not reduce our cultural communication styles to “red flags.” Because what sounds “too polished” in one context is someone else's everyday vocabulary. In Nigeria, even your uncle's WhatsApp message might sound like it’s been proofread by a barrister. (It’s giving “𝘪𝘯 𝘭𝘪𝘨𝘩𝘵 𝘰𝘧 𝘵𝘩𝘦 𝘧𝘰𝘳𝘦𝘨𝘰𝘪𝘯𝘨, 𝘬𝘪𝘯𝘥𝘭𝘺 𝘳𝘦𝘷𝘦𝘳𝘵”😂)

Here’s a thought: Could AI tools be trained to recognise regional language patterns or IP-based context? Not for surveillance — but for understanding. Because not everyone writes like they tweet. And not every “delve” is AI. Open to hearing more thoughts on this — especially from the AI, SEO and ED&I communities. Let’s get into it 👇🏾
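To see how easily this bias arises, here is a deliberately naive "giveaway word" detector in Python. The word list is invented for illustration (echoing the viral "delve" claim) and is not taken from any real tool; the point is that frequency heuristics like this inevitably flag the formal register that is simply how many non-native speakers learned English.

```python
# Hypothetical "AI giveaway" word list for illustration only.
GIVEAWAY_WORDS = {"delve", "tapestry", "furthermore", "moreover", "utilize"}

def naive_ai_score(text: str) -> float:
    """Fraction of words that appear on the 'giveaway' list:
    a caricature of word-frequency-based AI detection."""
    words = text.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in GIVEAWAY_WORDS)
    return hits / max(len(words), 1)

formal_human = "Furthermore, I wish to delve into the matter and revert promptly."
casual_human = "ok lemme check and get back to you"

# The formal (entirely human) sentence scores high and gets flagged;
# the casual one sails through. Register, not authorship, drives the score.
print(naive_ai_score(formal_human))  # ~0.18
print(naive_ai_score(casual_human))  # 0.0
```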
-
At a loss for words

'I recalled Zadie Smith’s essay “Fail Better,” in which she tries to arrive at a definition of great literature. She writes that an author’s literary style is about conveying “the only possible expression of a particular human consciousness.” Literary success, then, “depends not only on the refinement of words on a page, but in the refinement of a consciousness.”'

'For me, as for Smith, writing is an attempt to clarify what the world is like from where I stand in it.'

'That definition of writing couldn’t be more different from the way AI produces language: by sucking up billions of words from the internet and spitting out an imitation. Nothing about that process reflects an attempt at articulating an individual perspective.'

'If AI constitutes a dramatic technical leap—and I believe it does—then, judging from history, it will also constitute a dramatic leap in corporate capture of human existence. Big Tech has already transmuted some of the most ancient pillars of human relationships—friendship, community, influence—for its own profit. Now it’s coming after language itself.'

'What about the cost to literature when all that humans have put on the internet gets vacuumed up and repurposed in Big Tech’s image? To start, an AI-dominated literature would reflect the values, biases, and writing styles embedded in the most powerful AI models. Over time, it would all start to sound alike. Some research even suggests that if later AI models are trained using AI-produced text—which would be hard to avoid—the sameness of the material could trigger a scenario called model collapse, in which AI loses its grasp on how real human language functions and is no longer able to form coherent sentences. One wonders whether, at that point, humans will still have the ability themselves.'

Excellent article by Vauhini Vara in WIRED https://lnkd.in/dj-2S2hP
-
AI models are at risk of degrading in quality as they increasingly train on AI-generated data, leading to what researchers call "model collapse." New research published in Nature reveals a concerning trend in AI development: as AI models train on data generated by other AI, their output quality diminishes. This degradation, likened to taking photos of photos, threatens the reliability and effectiveness of large language models. The study highlights the importance of using high-quality, diverse training data and raises questions about the future of AI if the current trajectory continues unchecked.

🖥️ Deteriorating Quality with AI Data: Research indicates that AI models progressively degrade in output quality when trained on content generated by preceding AI models, a cycle that worsens with each generation.

📉 The Phenomenon of Model Collapse: Described as the process where AI output becomes increasingly nonsensical and incoherent, "model collapse" mirrors the loss seen in repeatedly copied images.

🌐 Critical Role of Data Quality: High-quality, diverse, human-generated data is essential to maintaining the integrity and effectiveness of AI models and preventing the degradation observed when systems rely on synthetic data.

🧪 Strategies for Mitigating Degradation: Implementing measures such as allowing models to access a portion of the original, high-quality dataset has been shown to reduce some of the adverse effects of training on AI-generated data (see the sketch below).

🔍 Importance of Data Provenance: Establishing robust methods to track the origin and nature of training data (data provenance) is crucial for ensuring that AI systems train on reliable and representative samples, which is vital for their accuracy and utility.

#AI #ArtificialIntelligence #ModelCollapse #DataQuality #AIResearch #NatureStudy #TechTrends #MachineLearning #DataProvenance #FutureOfAI
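As a toy illustration of the "photos of photos" dynamic (a conceptual sketch of mine, not the Nature study's actual setup), the Python simulation below refits an empirical word distribution each generation and resamples from it. Rare words silently fall out of the distribution over generations, and mixing back a fraction of the original human-made data, the mitigation noted above, slows the loss.

```python
import random
from collections import Counter

random.seed(42)

# "Human" corpus: a Zipf-like vocabulary where most words are rare.
vocab = [f"word{i}" for i in range(200)]
weights = [1 / (i + 1) for i in range(200)]
corpus = random.choices(vocab, weights=weights, k=2000)

def next_generation(data, original, mix_fraction=0.0, k=2000):
    """Refit an empirical distribution to `data` and resample from it;
    finite sampling silently drops rare words (model collapse in
    miniature). Optionally mix back a slice of the original data."""
    counts = Counter(data)
    words, freqs = zip(*counts.items())
    synthetic = random.choices(words, weights=freqs, k=k)
    m = int(mix_fraction * k)
    return synthetic[m:] + random.sample(original, m)

for mix in (0.0, 0.2):
    data = corpus
    for _ in range(30):
        data = next_generation(data, corpus, mix_fraction=mix)
    print(f"mix={mix}: distinct words after 30 generations = "
          f"{len(set(data))} (started with {len(set(corpus))})")
```

Qualitatively, the no-mix run sheds far more of its vocabulary than the mixed run: once a rare word fails to be sampled in a generation, it can never reappear unless original data is reintroduced, which is why access to human-generated data matters.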