Current methodology:
GPT-5-nano forward translation + back-translation
text-embedding-3-small cosine similarity on source vs. back-translated text.
Threshold: ≥0.92 = auto-approved
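For concreteness, here's a stripped-down sketch of the gate (OpenAI Python SDK; prompt wording, batching, retries, and error handling are all simplified relative to what we actually run):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
THRESHOLD = 0.92  # cosine score at/above this auto-approves the string

def translate(text: str, src: str, tgt: str) -> str:
    """One translation hop (forward or back) via the chat endpoint."""
    resp = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[{
            "role": "user",
            "content": f"Translate this UI string from {src} to {tgt}. "
                       f"Return only the translation.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip()

def cosine(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def round_trip_score(source: str) -> tuple[str, float]:
    """Forward-translate, back-translate, score source vs. back-translation."""
    forward = translate(source, "English", "Spanish")
    back = translate(forward, "Spanish", "English")
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=[source, back]
    ).data
    return forward, cosine(emb[0].embedding, emb[1].embedding)

translation, score = round_trip_score("Add Attachment")
auto_approved = score >= THRESHOLD  # otherwise routed to human review
```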
On a recent ~970-string Spanish localization run:
~75% of strings passed automatically
We then had two human translators review the outputs, and both flagged several problematic cases:
"Add Attachment" → Agregar Adjunto
Better: Adjuntar Archivo
"Pay Grades" → Grados de Pago
Better: Escalas salariales
"Sub Unit" → Subunidad
Better: Departamento
All three examples still scored 0.94+ cosine similarity.
Google Translate also back-translates Adjunto more like “Please attach,” which suggests
the issue isn’t just subjective reviewer preference.
Also, we currently pass a note with each XLIFF trans-unit, so the model has proper context for each string.
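The note gets injected into the prompt roughly like this (the note text here is a made-up example; the real pipeline reads the <note> element out of each trans-unit):

```python
def build_prompt(source: str, note: str) -> str:
    """Thread the per-string XLIFF note into the translation prompt."""
    return (
        "Translate this UI string from English to Spanish.\n"
        f"Context note from the trans-unit: {note}\n"
        "Keep UI register: short, imperative, title case where natural.\n"
        "Return only the translation.\n\n"
        f"{source}"
    )

prompt = build_prompt(
    "Add Attachment",
    "Button label on the file-upload panel of the request form.",
)
```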
Questions:
How can we make the pipeline more context-aware, so translations read naturally as UI strings rather than as literal word-for-word renderings?
What routing metrics actually correlate best with human acceptance for UI localization?
Has anyone quantified improvements from using cross-engine back-translation (e.g., OpenAI + Google/DeepL) versus single-engine loops?
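To be concrete about what we mean by cross-engine (sketch only, nothing measured; `translate`, `cosine`, and `client` are from the first snippet, and the deepl package's `translate_text` does the back hop):

```python
import deepl

deepl_client = deepl.Translator("YOUR_DEEPL_AUTH_KEY")

def cross_engine_score(source: str) -> float:
    """OpenAI forward hop, DeepL back hop, same embedding score as before."""
    forward = translate(source, "English", "Spanish")  # OpenAI forward hop
    back = deepl_client.translate_text(forward, target_lang="EN-US").text
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=[source, back]
    ).data
    return cosine(emb[0].embedding, emb[1].embedding)
```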
Would appreciate insights from teams running MT/localization pipelines at scale.