Does Autocorrect Make Life Better?

Systemic failures in machine learning: a cautionary tale. Applied across the many products and businesses we interact with, data science promises to reduce the friction and inconvenience of everyday life.

Photo by Gertrūda Valasevičiūtė on Unsplash

Every device and service we use comes embedded with carefully crafted machine learning models. As they tirelessly remove the irritations and burdens from our lives, we become ever freer to focus on what matters.

What are the chances of this becoming a reality?

Taking stock of the many ways machine learning fails us in everyday life is crucial if we ever hope to realize the potential of these technologies. Set aside the high-profile failures: chatbots that turn psychopathic, racist image classifiers, sexist recruitment tools. Consider instead one of the most common forms of machine learning failure, one which affects minorities just as much as majorities: autocorrection.

Autocorrection is digital assistance at its simplest: an automated system recognizes that what you have typed is not a word and changes it to what it thinks you meant. These systems are embedded in our operating systems and in many of our phones’ apps. Some rely on simple statistical models of word similarity and frequency; others use machine learning and take the other words in the sentence into account. On the surface, their purpose is obvious: we want to eliminate typos from our text.
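To make those mechanics concrete, here is a minimal sketch of the simplest kind of corrector, a frequency-and-edit-distance model in the spirit of Peter Norvig’s well-known spelling corrector. The tiny word-frequency table and the example words are placeholders of my own, not any phone vendor’s actual implementation.

```python
import string

# Toy word-frequency table; a real corrector would build this from a large corpus.
WORD_FREQ = {"what": 5000, "why": 4500, "do": 8000, "you": 9000, "need": 2000}

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Keep known words; otherwise return the most frequent candidate one edit away."""
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    return max(candidates, key=WORD_FREQ.get) if candidates else word

print(correct("whst"))  # -> "what" (one substitution away, and in the frequency table)
```

Real systems layer much more on top of this (keyboard-adjacency error models, sentence context, per-user learning), but the core move is the same: snap an unknown string to the nearest frequent word.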

I type “Wutocoreect” and the device changes it to “Autocorrect”.

I type “Gailire” and it becomes “Failure”.

A problem can arise whenever the correction changes a word that is critical to the sentence.

I mean to type “What do you need?” into a text box, but autocorrect turns it into “Why do you need?”

Suddenly, instead of asking for clarification or instructions, I appear to be pushing back and demanding justification. The sense of the sentence has completely changed, and with it comes the potential for a negative emotional reading. Worse, the original text, misspelling and all, was perfectly understandable. Many typos share this last characteristic, as the practice of disemvoweling words in text messages clearly demonstrates.

That last point is worth dwelling on. The autocorrect feature on my relatively modern smartphone will happily correct a word in a way that changes the meaning of the whole sentence. It does so despite evidence that, in most cases, the worst consequence of an uncorrected misspelling is slightly slower reading.

This is a technology failure.

A sophisticated piece of software is actively impeding my communication instead of providing me with utility. How is this possible? If we are to move forward with deploying data science in the world, it is essential to understand how such a mundane task can lead to a product with negative results.

The primary reason is that these models are built and evaluated against metrics that are disconnected from the impact they have on users. Ideally, we would consider how any change to our writing affects its readability and comprehension. But it is difficult for a machine learning developer to obtain a dataset that lets them evaluate that end goal. It is much easier to collect data about the common ways specific words are mistyped, and to score models on metrics that count how many words are modified correctly and incorrectly (see [3]). In fairness, these models are also used in settings where communication errors matter less, such as correcting search queries. Recent academic work on autocorrection methods has highlighted the context of the words and the comprehensibility of the text [4], but none of it has focused its evaluation on the expected impact on comprehension.

This is how many machine learning projects go astray. They are often built by people who do not understand what end users want, who are overwhelmed by the complexity of what end users want, or who lack the time and resources to evaluate models against real-world outcomes. So they simplify. In their minds, building something that performs a well-defined, measurable task is a small step in the right direction. Sometimes that works; sometimes it doesn’t. When it doesn’t, we get stuck with technology that makes our lives subtly worse, even if it initially appears to improve them.

Perhaps text-modifying models should be evaluated by weighting words according to their importance for sentence comprehension, or by heuristics that penalize a model for returning the wrong word when the typo was only a missing vowel. Human communication is much more than a large, distributed spelling bee, so I do not claim to know the perfect evaluation, but it is worthy of investigation.
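As a rough illustration of what such an evaluation could look like, here is a hedged sketch that weights each word by an assumed importance score and doubles the penalty when the model miscorrects a typo whose consonant skeleton was intact, i.e. a disemvoweled word a human could still read. The weights, the penalty factor, and the function names are hypothetical choices of mine, not an established metric.

```python
VOWELS = set("aeiou")

def same_consonant_skeleton(typed, intended):
    """True when the typo only garbles vowels, so the raw text was still readable."""
    skeleton = lambda w: "".join(c for c in w.lower() if c not in VOWELS)
    return skeleton(typed) == skeleton(intended)

def comprehension_weighted_score(corrections, importance):
    """
    Score a batch of autocorrect decisions.

    `corrections` holds (typed, model_output, intended) triples; `importance`
    maps each intended word to a weight reflecting how much it matters for
    understanding the sentence. Correct fixes earn the weight; meaning-changing
    fixes lose it, and lose double when the raw typo was still readable.
    """
    score = 0.0
    for typed, output, intended in corrections:
        weight = importance.get(intended, 1.0)
        if output == intended:
            score += weight
        else:
            penalty = 2.0 if same_consonant_skeleton(typed, intended) else 1.0
            score -= penalty * weight
    return score

# "Wht do you need?" miscorrected to "Why do you need?"
triples = [("Wht", "Why", "What"), ("do", "do", "do"),
           ("you", "you", "you"), ("need", "need", "need")]
weights = {"What": 3.0, "do": 1.0, "you": 1.0, "need": 2.0}
print(comprehension_weighted_score(triples, weights))  # -> -2.0: one bad fix outweighs three good ones
```

Under this scoring, the earlier “What” to “Why” miscorrection drags the whole sentence into negative territory, which is closer to how a reader experiences it than the 75% word-accuracy figure a standard metric would report.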

The situation would be far more manageable if the process stopped at each individual model: a poorly designed system would eventually be replaced by a better one. Historically, though, technology is built in layers of increasing complexity, and later development can permanently entrench earlier, suboptimal decisions.

Take the case of swypo.

A friend recently introduced me to the term: swypo is the wrong word that ends up in a message composed by drawing letters across a touch-screen keyboard, an interface that has to guess which letters you intended. When he tried to send me the message “I’d like to tell you in person,” I received “I’ll take you to hell.”

The autocorrection models’ obsession with perfect spelling appears to have been baked into this second layer of technology. My friend’s swiping interface only generates sequences of correctly spelled words, so it produces syntactically awkward sentences so far from the original intent that they amount to a new form of comedy.

This is how failures in machine learning become systemic problems. In the initial stages, shortcuts are taken that seem reasonable and produce models that appear useful but are actually inefficient and frustrating. As more technology is layered on top of those approaches, their inherent problems become fixed in place. Poor, rushed decisions gradually become the foundations of our devices. This is not a new phenomenon; history is littered with examples, such as the QWERTY keyboard. But machine learning will accelerate this technological hysteresis, turning development shortcuts and suboptimal design choices into subtle, systemic failures.

How can we avoid this?

Here is a test. If you are a data scientist or developer, you should be very clear about how you will choose which machine learning model to deploy. If your answer reduces to a standard ML metric (like RMSE), ask yourself whether you know how improving that metric will affect the business process or the users of the model. Better still, clarify that question before you start solving the problem; if you cannot, you may not be solving the real problem at all. Once you have a clear understanding of how the model will be used, you can devise an evaluation metric that estimates its impact in the real world.

You might still end up optimizing something like RMSE, but you will be choosing your model based on how it affects people, and you may even discover that it adds no value at all. If so, holding back deployment until a better model is developed is the best service you can provide to society.

