Reconstructing Linguistic Equity in Multimodal AI Systems

“Through the Binary Lens” brings together articles where code meets culture. By weaving technical and human perspectives, this series uncovers how software, language, and global expansion influence one another in ways both seen and unseen.

Linguistic equity involves ensuring that every language, each with its unique structure, cultural heritage, and connotations, is treated with equal importance by AI systems. Too often, dominant languages and modalities overshadow languages that do not conform to common patterns. This inequity can lead to loss of subtle cultural contexts and even affect decision-making processes in global applications.

Philosophical and Sociological Considerations

As we begin this discussion, let us keep in mind that our precise word choices directly shape how we frame the problem. The Sapir-Whorf hypothesis suggests that language influences thought, implying that a linguistic bias in AI models can subtly shape how people perceive reality. If an AI system prioritises one linguistic structure over another, it is not merely a technical issue; it alters the way information is framed and disseminated across cultures.

Consider, for instance, the problem of ambiguous pronouns. Suppose an AI is trained primarily on English and then fine-tuned for a language like Japanese, which often omits pronouns in conversation. Should the AI infer and insert missing pronouns to align with English-centric assumptions, or should it preserve the omission to maintain cultural authenticity? Each choice carries ideological weight.

Figure 1 - Pronoun Ambiguity: Context-Aware LLM Fine-Tuning

Instead of a simple translation pipeline, let’s fine-tune an LLM to handle ambiguous pronouns in a culturally appropriate way. We create a dataset where each sentence includes gender-neutral references in one language but forces gender specification in another. Using LoRA fine-tuning (to avoid full retraining), we adjust a pretrained translation model’s behaviour:

import torch
from peft import get_peft_model, LoraConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base_model = "Helsinki-NLP/opus-mt-en-ja"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

# Apply LoRA adapters to the attention projections so only a small
# set of extra weights is updated during fine-tuning
config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="SEQ_2_SEQ_LM",
)
model = get_peft_model(model, config)

# Fine-tuning examples: minimal pairs that force the model to resolve
# the ambiguous English pronoun explicitly in Japanese
train_data = [
    {"input": "Alice gave a book to Bob. She liked it.", "target": "アリスはボブに本を渡した。彼女はそれを気に入った。"},
    {"input": "Alice gave a book to Bob. He liked it.", "target": "アリスはボブに本を渡した。彼はそれを気に入った。"},
]

# Minimal training loop: adjust only the LoRA weights on the pronoun pairs
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
model.train()
for example in train_data:
    batch = tokenizer(example["input"], text_target=example["target"], return_tensors="pt")
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Why we think this is clever:

  • Uses LoRA for low-cost fine-tuning rather than retraining a full transformer.
  • Provides cultural adaptation where gender omission is normal (e.g., Japanese).
  • Reduces hallucinations where models invent incorrect gender assumptions.

Does the AI correctly resolve “she”? Or does it reinforce gender biases present in its training data? These are not trivial questions; they directly impact whether an AI system maintains or distorts meaning across languages.
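
One way to probe this empirically, assuming the fine-tuned model and tokenizer from Figure 1 are still in scope, is to translate the ambiguous sentence and check which Japanese pronoun survives (a rough sanity check, not a full evaluation):

# Rough sanity check on the Figure 1 model: does "She" stay tied to Alice?
model.eval()
inputs = tokenizer("Alice gave a book to Bob. She liked it.", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64)
translation = tokenizer.decode(generated[0], skip_special_tokens=True)
print(translation)
print("Resolved to 彼女 (she):", "彼女" in translation)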

A broader sociological perspective would reveal that linguistic hierarchies in AI systems mirror real-world power structures. Postcolonial theorists like Ngũgĩ wa Thiong’o argue that linguistic dominance is a form of control; languages historically associated with colonial power are often embedded as defaults in AI training datasets. This has tangible effects: a brand using AI-driven localisation might find that its tone and intent shift subtly when moving from English to Swahili or Tagalog, simply because the AI was not trained with equal linguistic depth across languages.

Figure 2 - Contrastive Embeddings to Detect Linguistic Bias

How do we correct this imbalance? One approach is to rethink dataset curation strategies. Instead of simply scaling up datasets in dominant languages and hoping low-resource languages catch up, AI models should actively prioritise underrepresented linguistic patterns. Self-supervised learning techniques, such as contrastive learning, can help balance multimodal training.
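
As an aside before Figure 2’s own code: here is a minimal, illustrative sketch of what such a contrastive objective can look like when aligning parallel sentences across languages. The function, batch size, and temperature are all assumptions for the sketch, not a recipe from any particular paper:

import torch
import torch.nn.functional as F

# Contrastive (InfoNCE-style) objective: pull each source sentence towards
# its translation and away from the other translations in the batch.
# src_emb and tgt_emb are (batch, dim) embeddings of aligned sentence pairs.
def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.07):
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature      # pairwise cross-lingual similarities
    labels = torch.arange(src.size(0))      # i-th source matches i-th target
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings standing in for real sentence pairs
loss = contrastive_alignment_loss(torch.randn(8, 384), torch.randn(8, 384))
print(loss.item())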

Rather than just tweaking translations, let’s quantify and mitigate English-centric bias. We compare sentence embeddings across languages to measure how far a translation drifts semantically from its source:

import torch
from transformers import AutoModel, AutoTokenizer


# Load multilingual model
model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)


def get_embedding(text):
    tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        output = model(**tokens)
    # Mean-pool token embeddings into one sentence vector, ignoring padding
    mask = tokens["attention_mask"].unsqueeze(-1)
    return (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)


# Compare English vs. Hebrew phrase embeddings
eng = get_embedding("The meeting was postponed due to unforeseen events.")
heb = get_embedding("הפגישה נדחתה בגלל אירועים בלתי צפויים.")


cosine_similarity = torch.nn.functional.cosine_similarity(eng, heb)
print(f"Semantic similarity: {cosine_similarity.item()}")

Why we think this is clever:

  • We quantify how English-biased a translation is by comparing semantic drift between embeddings.
  • Detects when a model forces unnatural phrasing because of dominant-language influence.
  • Can be expanded into a bias correction pipeline, automatically flagging problematic outputs (a minimal sketch follows below).
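
To make that last bullet concrete, here is a minimal sketch of such a flagging step built on get_embedding above. The 0.8 threshold is purely illustrative and would need tuning per language pair on held-out, human-validated translations:

# Illustrative bias-flagging step; the threshold is an assumption, not a tuned value
def flag_semantic_drift(source_text, translated_text, threshold=0.8):
    src = get_embedding(source_text)
    tgt = get_embedding(translated_text)
    similarity = torch.nn.functional.cosine_similarity(src, tgt).item()
    return {"similarity": similarity, "flagged": similarity < threshold}

print(flag_semantic_drift(
    "The meeting was postponed due to unforeseen events.",
    "הפגישה נדחתה בגלל אירועים בלתי צפויים.",
))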

There is also a philosophical dimension to this. If we consider AI as a tool for extending human cognition, then ensuring linguistic equity is not just a technical necessity but an ethical obligation. By extending John Rawls' concept of the “veil of ignorance”, we could suggest that fair systems should be designed as if the designer does not know which linguistic group they will belong to. Applying this principle, we should structure AI language models so that no single linguistic framework is assumed to be universal.

Figure 3 - Detecting English-Centric Translation Bias

We could imagine a system that automatically flags mistranslations that arise from English-centric sentence structures. Using unsupervised anomaly detection, we train a model to detect phrases whose syntactic structure deviates from natural patterns in low-resource languages:

from sklearn.ensemble import IsolationForest
import numpy as np


# Example embeddings for "The project was delayed" in different languages
# (reuses get_embedding from Figure 2)
sentence_vectors = np.vstack([
    get_embedding("The project was delayed").numpy(),
    get_embedding("El proyecto se retrasó").numpy(),
    get_embedding("הפרויקט התעכב").numpy(),
    get_embedding("โครงการล่าช้า").numpy(),
])

# Train an anomaly detector on the cross-lingual embeddings
# (toy sample size; a real pipeline would use far more sentences)
detector = IsolationForest(contamination=0.1, random_state=0)
detector.fit(sentence_vectors)

# Detect anomalies (potential MT artifacts); -1 marks an outlier
anomalies = detector.predict(sentence_vectors)
print("Anomalous sentences (potentially unnatural translations):", anomalies)

Why we think this is clever:

  • Detects AI-generated artifacts when translations follow English-centric grammar.
  • Flags cases where the model forces a structure unnatural for the target language.
  • Scales to any multilingual pipeline for real-time error detection (see the sketch after this list).
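
As a rough illustration of that last point, the fitted detector can be wrapped into a check that scores new translations as they arrive. The function name and example sentence are illustrative:

# Illustrative real-time check against the detector fitted above
def looks_unnatural(translated_text):
    vec = get_embedding(translated_text).numpy()   # shape (1, embedding_dim)
    label = detector.predict(vec)[0]               # -1 = anomaly, 1 = inlier
    score = detector.decision_function(vec)[0]     # lower = more anomalous
    return {"anomaly": label == -1, "score": float(score)}

print(looks_unnatural("El proyecto fue pospuesto"))  # "The project was postponed"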

The industry is already moving in this direction: Google’s mT5, for example, is pretrained on a corpus spanning 101 languages and uses temperature-based sampling so that lower-resource languages are not simply drowned out by English.
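
To illustrate the idea, here is a minimal sketch of temperature-based language sampling. The corpus sizes are made up, and the exponent is just a commonly used smoothing value, not a claim about mT5’s exact configuration:

import numpy as np

# Hypothetical corpus sizes per language (illustrative numbers only)
corpus_sizes = {"en": 1_000_000, "ja": 400_000, "sw": 20_000, "tl": 15_000}

def sampling_weights(sizes, alpha=0.3):
    # Raising each language's share to a power alpha < 1 flattens the
    # distribution, so low-resource languages are sampled more often
    probs = np.array(list(sizes.values()), dtype=float)
    probs /= probs.sum()
    smoothed = probs ** alpha
    return dict(zip(sizes, smoothed / smoothed.sum()))

print(sampling_weights(corpus_sizes))
# English's raw ~70% share shrinks, while Swahili and Tagalog are
# sampled far more often than their corpus sizes alone would dictate.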

Finally, industry leaders need to recognise that the economic incentives currently favouring dominant languages must be realigned. Open-source initiatives like Masakhane, which focus on African languages, show that community-driven efforts can counteract the biases in corporate-funded models. But these projects require sustained investment. If AI localisation is to serve a truly global audience, funding must prioritise not just major-market languages but also those at risk of being left behind.

Quentin Lucantis @orb