Lost in Translation

A Look at the Viability of AI Translation
Most authors understand that the United States is not the only place with millions of potential readers. In fact, the United States is not even the largest market for English-language readers. India gets that designation. Yet the majority of independent authors distribute only within the U.S. The few who go beyond that include only Canada, the U.K., Australia, and New Zealand.
In fact, there are 27 countries around the world where English is the primary language and an additional 17 countries where English is an official language in addition to one or more other languages. For example, Canada has English as the primary language except in Quebec, where French is the primary language but English is an official language as well. Bottom line: those who are not distributing beyond the United States are certainly missing out on potential sales and, more importantly, on capturing a growing online market.
But what about those countries with other official languages? Or countries where English may be spoken by many people, but the official language is something else? Right now the four fastest growing book translation markets are German, Italian, Spanish, and French. What is their penetration in the world compared to English? Below is a chart of the number of people who speak each language as their first language.
English | German | Italian | French | Spanish
369.7 million | 75.5 million | 64.6 million | 77.3 million | 463.0 million
Mandarin Chinese, at 921.5 million speakers, is not one to overlook. However, the use of different characters and the need for understanding the complexities of distribution within China and its territories would fill an entire article on their own. Another language to consider is Hindi, the largest non-English language used in India, with over 300 million speakers. That is also something I’m strongly considering but haven’t researched yet.
Those of us who are U.S.-centric tend to think we are the most literate nation, or have the most readers, or have the most readers who will buy books. But that is not the case. According to 2018 results from Pew Research, Americans read an average (mean) of 12 books per year, and the typical (median) American has read four books in the past 12 months. Unfortunately, 27% of Americans read zero books in that year. The chart below looks at global reading behavior per person per week in selected countries. Note that India is at the top. The USA is 23rd out of the 27 listed countries. It certainly makes me wonder why I haven’t seriously considered translation before or concentrated some marketing effort on other English-speaking countries with a large number of readers.
What has stopped most indie authors from considering translation is the cost. Hiring a good translator for a typical 75K-word book averages around $5,000 for the translation and then another $2,500 for an editor to make sure the translation was done well. That has driven the desire for faster development of accurate translations using artificial intelligence (AI). The ability of AI to produce natural language text has been increasing in accuracy by leaps and bounds. Already there are mobile apps you can put on your phone that allow you to speak with someone in a different language, or to listen and have your phone simultaneously translate what is being said into your language. It’s not 100% accurate, but it is accurate enough that the conversation is clear with little need for clarification. The question is what happens when we ask AI to translate an entire novel or a memoir or a business book.
Artificial Intelligence (AI) and Translation Accuracy
Wouldn’t it be wonderful to have a key on our keyboard that we could hit and say “translate” to a specific language, knowing it would be 100% accurate? That is the gold standard, but it is definitely not available yet and I doubt it will be available in my lifetime. It is nearly here in some circumstances. How is it that Google can translate any website for me so I can understand it? Part of that is that websites use a lot of the same words. Part of it is also that it doesn’t have to be 100% accurate. If a sentence translates to “You go now to About and back come,” I know what it means even though it’s not grammatically correct. That’s all I need. But for books, where language is critical to understanding, that isn’t good enough.
The big leap in translation accuracy and quickness happened around 2016 to 2017 when “neural translation” was made possible. Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal and human brains. In computing it is based on a collection of connected units or nodes called artificial neurons. Like the synapses in a biological brain, a computing “neuron” can transmit a signal to other neurons. Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model.
Sounds like science fiction, right? And we are in the early stages of this kind of complex computing. For those who are language challenged, like me, the first question is: how is accuracy assured? I remember learning Spanish in high school and taking it again in college. Though I received an A grade every time, today I still can’t easily put together a grammatically correct sentence in Spanish and I am nowhere near fluency. AI uses “learning machines” that never get tired or bored. They keep making matches with what was “correct” before, add that to their database, and keep going. In the early 1990s I participated in a machine learning project to predict certain learning responses as they related to types of learners (visual learners, kinesthetic learners, auditory learners). It was rudimentary, but I was very surprised at how quickly the database could be built when thousands of people were participating. Things have come a long way in the last three decades.
In AIs tasked with translation, the computer “learns” how to get better by matching patterns in every piece of translation against known accurate translations and makes “guesses” as to what is the best. This initially began by Google importing hundreds of thousands of books and essays and research works that were in the public domain (and others voluntarily provided) that had been translated by humans. The second part, which is what is ongoing now, is having qualified human translators check what the computer believes is good translation based on its “learning.”
At Google there are thousands of human volunteer translators, and companies that have paid translators to do multi-language functionality for their websites, all adding to the database. In addition, there is a large paid cadre of translators for special projects who pore over the computer translations and make corrections. Those corrections then enter the database as “good translations” to be used in future comparisons. With every correction and every nuance of language entered, the computer is getting ever closer to an accurate and context-sensitive result.
How Accurate Is It Today?
Other companies outside of Google have entered the AI translation business for profit (most starting with Google’s work). They tend to specialize in a particular area of translation (e.g., legal work, medical research, scientific papers), which allows for a smaller database because the number of word choices and possible interpretations of a sentence is more limited. This results in a more accurate AI translation. In some cases it can reach 85% to 90% accuracy, which is very good.
The problem is that technical nonfiction or instructional nonfiction doesn’t tend to have as many language nuances as needed in a fiction book or narrative nonfiction memoir or psychology self-help guide. Fiction is much more complex. Fiction relies on “voice.” Not only the voice of the author but also the voice and relationship of characters. Unlike English, many languages have different ways of identifying those relationships through the use of pronouns and possessives, metaphor and simile. Great writers are known to use these techniques frequently to set mood, themes, and imbue character relationships with a turn of phrase.
The Pronoun Problem
For example, in English we have the second person pronoun “you” that applies to a lot of different types of situations and people. “You must go to the store.” “I’d like you to take me to the doctor.” “What are you (speaking to a group of people in a meeting) thinking about this process?” This same pronoun is used whether I am addressing a family member, a cab driver, my employer, the President, or a group of people. But in many other languages that word “you” is said differently based on the relationship between the speaker and the other person. Is it a formal relationship (e.g., an employer, an esteemed colleague, a teacher, or a stranger)? Or is it a personal relationship (e.g., a friend, a close family member, a lover)? Also, in some cases the age of the person makes a difference, where children are addressed informally in the same or slightly different ways than an adult. To complicate matters, in a few languages when you are using “you” in the plural form it is yet a third different word.
In French, for example, “tu” is used when talking to someone informally, such as with a child, a friend or a classmate, while “vous” is used to address someone with respect, such as an elder, a teacher, an official or a stranger. In Spanish, it is even more difficult. “Usted” and “ustedes” are the formal singular and plural forms of “you” respectively, while “tú” and “vosotros” are the respective informal forms. This usage even changes depending on which Spanish-speaking country you are in. In German, it’s equally complicated. One uses “du” for informal singular “you” situations, “ihr” for informal plural situations, and “Sie” for all formal needs.
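To make the choice concrete, here is a minimal Python sketch of the forms described above, laid out as a lookup table. The table and the function names (translate_you, candidates) are hypothetical, made up for illustration. The entries are the simplified textbook forms and ignore the regional variation in Spanish. The point is that a translator, human or machine, cannot pick a word for “you” without knowing both the formality and the number.

```python
# Simplified textbook forms of "you". Regional usage, especially in
# Spanish-speaking countries, differs from this table.
SECOND_PERSON = {
    "French": {
        ("informal", "singular"): "tu",
        ("informal", "plural"): "vous",
        ("formal", "singular"): "vous",
        ("formal", "plural"): "vous",
    },
    "Spanish": {
        ("informal", "singular"): "tú",
        ("informal", "plural"): "vosotros",
        ("formal", "singular"): "usted",
        ("formal", "plural"): "ustedes",
    },
    "German": {
        ("informal", "singular"): "du",
        ("informal", "plural"): "ihr",
        ("formal", "singular"): "Sie",
        ("formal", "plural"): "Sie",
    },
}


def translate_you(language, formality, number):
    """Return the form of "you" for a known relationship and number."""
    return SECOND_PERSON[language][(formality, number)]


def candidates(language):
    """Return every distinct word that English "you" could become."""
    return sorted(set(SECOND_PERSON[language].values()))


if __name__ == "__main__":
    for language in SECOND_PERSON:
        print(language, candidates(language))
```

With only the English sentence to go on, the software sees every candidate as possible. The formality and number have to be inferred from the surrounding text.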
If you are writing a self-help book, what would you choose? You certainly don’t know your readers personally, so you would think it would be formal (they are strangers to you). But, if your writing voice prefers to address them more as colleagues or friends, you may want to use the informal version. However, will that use reduce the esteem you should have as an expert or make your reader feel child-like? Or will being informal make them trust you more and feel like you are a friend? The answer is it all depends on the language and the culture. It is difficult for AI to discern that.
If you are writing fiction, with different characters and different relationships, it becomes much more difficult for AI to make the right decision. Decisions tend to be based on other words in the sentence for it to “guess” what the intention is, along with what has come before. The weighting of that is dependent on how many instances it has. It is possible that, if the formal form was used in the previous pages, it might default to the formal form even though the speaking character has changed to an informal relationship. This is particularly true in dialog depending on other cues in the sentences that may signal the relationship.
To complicate matters, once you determine the formality, other words in the same sentence change as well. The verb forms, and sometimes the placement of the verb in the sentence, depend on that formal vs. informal choice of address.
The Metaphor and Simile Problem
Metaphor and simile rely on those nuances to get the point across. Let’s take two examples that may seem obvious to the English reader. If I write, “Louisa was a rock for her mother,” I probably mean that Louisa was always there or Louisa was very strong, or a combination of both those meanings. In English it’s clear what being a “rock” means in a relationship. However, if I translate that literally word for word into another language, it could be read that Louisa doesn’t move or can’t talk around her mother because rocks don’t move or talk. It could even be read that her mother named the rock Louisa. Instead of translating “Louisa was a rock” word for word, a good translator will translate it as “Louisa was always there.” She may even contact the writer to be sure that is the intended meaning.
Let’s take the opposite gender and look at another phrase with both language and cultural differences. If I write “Clive is like a sheep,” I probably mean he is easily manipulated. However, depending on the language and culture, that phrase could mean any of these: Clive has long hair; Clive is a drunkard; Clive doesn’t answer back; Clive follows without thinking; or Clive is a young fellow waiting for girls to follow him. None of those capture my intended meaning of “easily manipulated.” The closest might be “follows without thinking.” So again, a good translator would catch that and translate it without the simile, saying: “Clive is easily manipulated.” A poor translator may choose a different interpretation (Clive is waiting for girls to follow him) that is not the intended meaning. The reader will then become quite confused when in the next chapter they learn Clive is gay.
Humor Is the Most Difficult
Most translators find humor to be the most problematic. Often a direct translation of the words won’t make sense at all because humor relies a lot more on idioms, wordplay, and cultural knowledge. Also, what is considered funny varies from country to country, and often among different cultures within a country. This is where human translators cannot be beat. Only a human translator knows the context of the original language and can choose something that works in the translated language. Often what a good translator will choose is completely different from what was written, but it delivers the same funny feeling. In 2012, Jascha Hoffman wrote a brief, humorous essay on this problem in the New York Times, titled “Me Translate Funny One Day.” I recommend it both for the levity of the writing and for a good explanation of why a human translator is definitely necessary.
So, Should You Use AI or Not? Why Did You Just Read This Whole Article to Learn You Can’t Get What You Want?
As Mick Jagger sings in the Rolling Stones song, “You can’t always get what you want, but if you try sometimes, you might find, you get what you need.”
If you are serious about translation, I think you should try it. I did a brief trial using DeepL with the first chapter of a novel. I asked my husband (who used to be a German-to-English translator in a past life) to look it over. It ended up being ~84% accurate. He said that he understood everything and that the story as a whole (in that first chapter) would get across to any German reader. However, correcting the other 16% would make it a lot stronger. Note that I write in third person, which avoids the second person “you” problem I mentioned above except in dialog.
Now I need to cajole him into reading the entire novel in German and comparing it to the English version. (He usually doesn’t read my novels at all.) I may have to make many promises in exchange for this work over a number of months. If the accuracy holds the same, it will be worth using AI. If the accuracy goes down significantly, I need to consider hiring an editor.
In terms of indie author accessibility to AI translation, there are two that are affordable. One is DeepL ($9/month for up to 5 books/month). You can load the entire book at once and get back the entire translation within a minute. I did this for an English to German translation and was amazed at how fast it was. I’m also going to try it for French (another language my husband speaks, though not as well as German) and Spanish. I have an extended family of Spanish speakers due to intermarriage with a couple of siblings. So a good pool of readers there.
The other option is using Google Translate online. It’s free but very time consuming. You can copy and paste ~5,000 characters including spaces (about 600 words depending on your typical word length and size of paragraphs) at a time. I tried this with a 38K-word novella and it took me about ten hours of work, both pasting 108 different sections and making them look good in book form. The reason I did this was to compare Google Translate to DeepL. The accuracy for the first chapter of the novella was closer to 65%. The more words I did, the less accurate it became, as there seemed to be no memory of the prior group of words. For me, using Google Translate is not a viable option in time spent or accuracy.
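If you want to repeat the copy-and-paste comparison, the splitting step can at least be scripted. Below is a minimal Python sketch, assuming the manuscript is plain text with a blank line between paragraphs. The 5,000-character limit is the approximate figure mentioned above, and the function name chunk_text is hypothetical. The script only prepares sections for pasting by hand. It does not call Google Translate or any other service.

```python
def chunk_text(text, limit=5000):
    """Split text into chunks of at most `limit` characters.

    Chunks break at paragraph boundaries (blank lines) where possible.
    A single paragraph longer than the limit is cut at the limit, which
    may fall mid-sentence; such paragraphs are rare in most manuscripts.
    """
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Oversized paragraph: flush what we have, then cut it hard.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: start a new chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "\n\n".join(["a" * 3000, "b" * 1500, "c" * 1000, "d" * 6000])
    for number, chunk in enumerate(chunk_text(sample), start=1):
        print(f"Section {number}: {len(chunk)} characters")
```

Keeping whole paragraphs together gives the translator as much context as the limit allows, though it does nothing about the lack of memory between sections.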
For both systems, the results will be highly variable depending on how you write, the complexity of the language you use, and the voice and characters. I don’t think AI translation will equal human translators in my lifetime, so translators are not going to lose their jobs. However, I do believe that we are close now, and in three more years it will be even better. For now, I can get a solid first draft that can then be edited by a human translator. The good news for those who must pay for it is that this cuts the cost by at least half. Instead of paying $5,000 for an initial translation of a 75K-word book and then paying an editor to check it, I would only have to pay the editor. Good translation editors charge between 30% and 50% ($1,500 to $2,500) of what a full translation would cost. This cost is based on many variables, including the language, the country where the editor lives and works, and the individual editor’s evaluation of how messy that draft is. Some editors will only quote an hourly rate. Others quote a per-word rate. Most will want to look at a couple of chapters before quoting anything so they have a sense of how difficult it may be.
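The arithmetic behind that comparison can be sketched in a few lines of Python. Every figure is one quoted in this article: $5,000 for a full translation, $2,500 for the follow-up edit, 30% to 50% of the full price for editing an AI draft, and $9 for DeepL. The assumptions are mine: a single month of DeepL is enough, and the editor quotes a flat fee. Your numbers will differ by language and editor.

```python
FULL_TRANSLATION = 5000   # human translation of a 75K-word book
EDIT_AFTER_HUMAN = 2500   # editor who checks the human translation
DEEPL_MONTHLY = 9         # subscription price quoted in this article


def human_route():
    """Cost of a human translation plus a human edit."""
    return FULL_TRANSLATION + EDIT_AFTER_HUMAN


def ai_route(editor_percent, months=1):
    """Cost of an AI first draft plus a human translation editor.

    editor_percent is the editor's fee as a percentage of a full
    translation (30 to 50 in this article). months is an assumption.
    """
    editor_fee = FULL_TRANSLATION * editor_percent // 100
    return editor_fee + DEEPL_MONTHLY * months


if __name__ == "__main__":
    print(f"Human translation and edit: ${human_route():,}")
    for percent in (30, 50):
        cost = ai_route(percent)
        saved = human_route() - cost
        print(f"AI draft, editor at {percent}%: ${cost:,} (saves ${saved:,})")
```

Under these assumptions, even the high end of the editor range comes to roughly a third of the all-human price.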
I’m excited for AI translation. I’ll let you know how my experiment pans out in 2021. For anyone who has undertaken AI translation, I’d love to hear what you used, how it worked, whether you used a human editor after the AI draft, and whether you received any feedback from readers once it was published. You can send your feedback to me in an email to Maggie AT maggielynch.com. I’d love to write a follow-up to this article in 2021 showcasing your feedback and any book(s) you had translated, as a thank you for taking the time to respond.