If you’ve taken a language class, it’s likely you’ve heard of grammatical gender. It’s when nouns are divided into arbitrary classes that happen to be named after human genders. So a Spanish table, la mesa, is “feminine,” even though there is nothing particularly feminine about tables. Teachers generally do their best to hammer it into you that grammatical gender has nothing to do with men and women. Recent research, however, has found that artificial intelligence often fails to make this distinction.
How Robots Learn To Talk
Before getting into the reasons why computers are bad with gender, we first need to understand how artificial intelligence processes language. Language is a complex system, so in order to teach an AI system which words are related to each other in meaning, engineers often use a popular model known as word embeddings. To create this model, the AI takes in large amounts of text as input, which it then “learns” from. This type of AI learns somewhat like humans do: by listening to people talking, and eventually venturing to speak on its own. Humans, though, take in feedback about language continuously, which makes them better at improving over time.
After taking in all this language, AI then tries to build a massive network to create its own definitions for words. Words are grouped together with other words that have similar meanings. For example, the word “astronaut” will acquire a meaning close to the words “outer space” and “profession,” and it will be far away from “fruit” and “curse words.” By creating a massive, complex web of connections, the AI gradually begins to understand and create language. These kinds of systems, known as dense vector word representations, are widely implemented by AI companies like Google and Facebook.
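The notion of words being “close” or “far away” in these networks can be made concrete with a toy sketch. This is a minimal illustration, not how real systems are built: the three-dimensional vectors below are invented by hand, whereas real embeddings are learned from huge text corpora and typically have hundreds of dimensions.

```python
import math

# Hand-picked toy vectors for illustration only; real word embeddings
# are learned automatically from large amounts of text.
vectors = {
    "astronaut":   [0.9, 0.8, 0.1],
    "outer_space": [0.8, 0.9, 0.0],
    "fruit":       [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Similarity between two vectors: near 1.0 = related, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "astronaut" sits close to "outer_space" and far from "fruit".
print(cosine_similarity(vectors["astronaut"], vectors["outer_space"]))
print(cosine_similarity(vectors["astronaut"], vectors["fruit"]))
```

The similarity score is all the model has: it never receives a definition of “astronaut,” only a position in space relative to every other word it has seen.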
The Human Source of Bias
Problems arise, though, because these models are not objective. They’re learning from human language, which is far from bias-free. In March 2016, Microsoft released Tay, an advanced Twitterbot that was created to interact with people. The idea was that the more people it interacted with, the more it would learn. As people tweeted at Tay, it would process what was being said and gradually improve its English. Instead, Tay was spammed by trolls, and in less than a day, it had to be shut down for spewing racist rhetoric.
Tay is an extreme example, but it does show in a very obvious way how AI can learn bad behaviors. An article published in Science described research into these same kinds of biases in more widely used AI. Basically, it was discovered that names like “Brett” and “Allison” were mapped closer to positive words in the network than “Alonzo” and “Shaniqua,” displaying a clear racial bias. Similarly, the researchers found gender bias, with male names sitting closer to career words and female names closer to family words. This same kind of bias has been found in humans by the Implicit Association Test, which provides evidence that this bias is passing directly from man to machine.
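The measurement described above can be sketched as a drastically simplified version of an embedding association test: compare how close a name’s vector sits to “pleasant” words versus “unpleasant” words. Everything below is invented for illustration; the actual study used real embeddings, full word lists, and statistical significance tests.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Invented 2-D vectors: in a biased embedding, some names drift toward
# pleasant words purely because of how they co-occur in the training text.
pleasant   = [0.9, 0.1]
unpleasant = [0.1, 0.9]
name_a = [0.8, 0.2]  # hypothetical name vector skewed toward "pleasant"
name_b = [0.3, 0.7]  # hypothetical name vector skewed toward "unpleasant"

def association(name_vector):
    """Positive = closer to pleasant words; negative = closer to unpleasant."""
    return cosine(name_vector, pleasant) - cosine(name_vector, unpleasant)

print(association(name_a))  # positive score
print(association(name_b))  # negative score
```

A gap between the two scores is exactly the kind of signal the researchers measured, except computed over many names and word lists rather than a single pair.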
How Grammatical Gender Is Confused With Human Gender
While these biases seem somewhat obvious and easy to understand for humans, subtle factors can influence AI in ways that are hard to catch. These subtle biases bring us to how AI models confuse human and grammatical gender. To fully understand this, we talked to one of Babbel’s computational linguistics engineers, Kate McCurdy. She researches gender and AI, and how they intersect with different languages. McCurdy said that, like much research, language processing is dominated by English, which causes problems when its models are generalized to other languages.
Kate McCurdy is a computational linguistics engineer at Babbel, and she’s presented talks on the topic of gender in AI at a number of conferences.
“Because these models were developed with English in mind, when you put them in other languages, they may go off the rails in different ways,” McCurdy said. “With languages that have grammatical gender, what the [AI] model can do is accidentally learn to associate the gender of objects with the gender of people.”
As a concrete example of this, McCurdy created a chatbot that uses current models of AI. You would ask the chatbot for a recommendation on what toy you should get for either your daughter or son, and the bot would suggest either a ball or a doll. She found that in German, where “ball” is grammatically masculine and “doll” is feminine, the bot will suggest toys along the grammatical gender lines. But if asked in Spanish, where both toys are grammatically feminine, it will suggest the toys pretty evenly to both.
McCurdy’s chatbot may not seem like a massive problem. After all, humans can intervene and fix that. The problem is that these language gender biases are so embedded that it can be difficult to fix such a complex system. You can actually witness the issues of gender by looking at machine translation. A paper put out by Stanford University looked at the gender biases in translation technology. The researchers showed that if you take a language with gender-neutral pronouns like Turkish (no separate “he” or “she,” just one pronoun for both), Google will often translate along perceived gender lines. Because English requires its third-person pronouns to have a gender, the system simply picks whichever gender makes the resulting English sentence statistically more probable.
Solutions To The Problem
It is important to note that gender is one of many, many factors, so biases are hard to actually fix, and they won’t always present themselves in such obvious ways. There are attempts to adjust for the bias, which will be incredibly important as AI starts to invade almost every aspect of our lives. One such solution is an algorithmic approach, discussed in a paper by researchers at Boston University. They are hoping that they can teach machines to automatically find and eliminate bias in AI. McCurdy said there is definitely hope that this problem can be solved, but it will require research across languages to identify the issues we are facing.
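One idea behind this kind of algorithmic fix can be sketched in a few lines: estimate a “gender direction” in the embedding space from a definitional pair of words, then remove that component from words that ought to be gender-neutral. This is a highly simplified toy version with invented 2-D vectors, not the researchers’ actual method, which works on real embeddings and carefully chosen word sets.

```python
def subtract_projection(v, direction):
    """Remove the component of v that lies along `direction`."""
    norm_sq = sum(d * d for d in direction)
    scale = sum(x * d for x, d in zip(v, direction)) / norm_sq
    return [x - scale * d for x, d in zip(v, direction)]

# Invented 2-D vectors: the first axis stands in for a learned gender direction.
he  = [1.0, 0.0]
she = [-1.0, 0.0]
# "programmer" has picked up a spurious lean toward "he" in this toy space.
programmer = [0.4, 0.8]

# Estimate the gender direction from a definitional pair, then neutralize.
gender_direction = [h - s for h, s in zip(he, she)]
debiased = subtract_projection(programmer, gender_direction)

print(debiased)  # the gender component is gone: [0.0, 0.8]
```

After the projection is removed, the word keeps the rest of its meaning but no longer leans toward either end of the gender axis, which is the intuition behind teaching machines to neutralize bias automatically.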
“The big-picture takeaway is that models of language that we use in AI and language technology are very, very complex, and they can reflect a lot of things,” McCurdy said. “They can reflect things that are encoded already, they can reflect existing cultural biases and prejudices in the world, and they can also interact with specific properties of different languages in ways that can be hard to anticipate, but can end up affecting people downstream.”
Artificial intelligence is likely to be the defining technological debate of the 21st century. Stephen Hawking has already warned that it may be the end of humanity as we know it. Perhaps the best way to head off this upcoming robot apocalypse is by teaching robots to not have the same biases and problems humans do. I’m not suggesting that telling a robot that both boys and girls can play with dolls if they want to will save the world, but it certainly isn’t a bad place to start.