• Zev Isert

Top 5 AI Failures

Updated: Mar 2, 2021

AI is quickly becoming a buzzword across industries, especially as people realize that the opportunities it opens up are seemingly endless. However, AI has to be built the right way to avoid costly and embarrassing mistakes.

1. Microsoft Tay

In 2016, Microsoft created Tay, an AI Twitter chatbot that would learn to communicate through interaction with users. After gaining popularity (50,000 followers), Tay began to mimic her followers, many of whom were internet trolls. Within 24 hours of her launch, Tay was tweeting sexist, racist, antisemitic and hateful comments.

Microsoft Tay ultimately failed because she was learning from examples of hateful online behaviour, and was not presented with an accurate depiction of online interaction.

2. Amazon training its AI to be misogynistic when hiring

In pursuit of automating the recruitment process, Amazon created a machine learning algorithm to help choose top candidates for positions at the organization. The program learned by analyzing the resumes of people who had applied to the company over a ten-year period and seeing who had been successful.

What Amazon did not anticipate was that most of the applicants in its data set were male, reflecting male dominance in the tech industry. The program quickly learned to prioritize male candidates over female candidates. According to the Guardian, the algorithm “penalized résumés that included the word ‘women’s’, as in ‘women’s chess club captain’. And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter.” Perhaps if the program had been presented with a more diverse candidate dataset, it would have been more successful at picking candidates for the company.
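To make the mechanism concrete, here is a minimal sketch of how a scorer trained on a skewed hiring history can end up penalizing a single word. It is not Amazon's actual system: the toy `HISTORY` data, the token log-odds scoring, and the names `token_weights` and `score` are all assumptions invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy history: (resume tokens, was_hired). Invented for illustration only.
HISTORY = [
    ({"python", "chess", "captain"}, True),
    ({"java", "chess"}, True),
    ({"python", "java"}, True),
    ({"python", "robotics"}, True),
    ({"women's", "chess", "captain", "python"}, False),
    ({"women's", "java", "robotics"}, False),
    ({"java"}, False),
]


def token_weights(history, smoothing=1.0):
    """Learn a smoothed log-odds weight per token from past hiring outcomes."""
    hired, rejected = Counter(), Counter()
    n_hired = n_rejected = 0
    for tokens, was_hired in history:
        if was_hired:
            n_hired += 1
            hired.update(tokens)
        else:
            n_rejected += 1
            rejected.update(tokens)
    vocab = set(hired) | set(rejected)
    return {
        t: math.log((hired[t] + smoothing) / (n_hired + 2 * smoothing))
        - math.log((rejected[t] + smoothing) / (n_rejected + 2 * smoothing))
        for t in vocab
    }


def score(tokens, weights):
    """Score a resume as the sum of its token weights (unknown tokens count as 0)."""
    return sum(weights.get(t, 0.0) for t in tokens)


weights = token_weights(HISTORY)
base = {"python", "chess", "captain"}
# The same resume scores lower once the word "women's" is added.
print(score(base, weights), score(base | {"women's"}, weights))
```

In this toy history no hired resume contains the word "women's", so the learned weight for that token is negative even though the word says nothing about ability. The model simply reproduces the imbalance in the data it was given.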

3. Google Flu Trends

Google attempted to track flu outbreaks worldwide based on search queries. In 2013, a report in the journal Nature documented that, during the 2012/2013 flu season, the program vastly overestimated the number of flu cases, recording about double the number that the US Centers for Disease Control and Prevention (CDC) reported.

One reason the Google Flu Trends algorithm may have been so wrong is that it targeted the wrong search terms: someone who googles “flu vaccine” is not necessarily sick. Secondly, Google’s search prompt feature may have skewed the results. For example, if you search “flu symptoms” you may also get prompts for similar searches such as “flu treatment”. This could lead the algorithm to overcount the number of flu-related searches.

Perhaps if the data parameters of Google’s algorithm had been better defined, it would have been better able to predict the flu season.
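The overcounting problem can be sketched with a toy search log. This is only an illustration, not how Google Flu Trends actually worked: the `SEARCH_LOG` data, the `SYMPTOM_TERMS` set, and both counting functions are hypothetical.

```python
# Hypothetical search log: (user_id, query). Invented for illustration only.
SEARCH_LOG = [
    ("u1", "flu symptoms"),
    ("u1", "flu treatment"),  # follow-up search suggested by a prompt
    ("u2", "flu vaccine"),    # probably healthy, looking for a vaccination
    ("u3", "flu symptoms"),
    ("u3", "flu treatment"),
    ("u3", "flu symptoms"),
]

# Assumed set of queries that suggest the searcher may actually be sick.
SYMPTOM_TERMS = {"flu symptoms", "flu treatment"}


def naive_count(log):
    """Count every query that mentions 'flu', including prompted follow-ups."""
    return sum(1 for _, query in log if "flu" in query)


def deduplicated_count(log):
    """Count distinct users who made at least one symptom-related query."""
    return len({user for user, query in log if query in SYMPTOM_TERMS})


print(naive_count(SEARCH_LOG))         # counts all six queries
print(deduplicated_count(SEARCH_LOG))  # counts only the two users who may be sick
```

With this toy log the naive count is three times the de-duplicated one, which shows how loosely defined search terms and prompted follow-up searches could inflate an estimate.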

4. IBM Watson making unsafe cancer treatment recommendations

Watson was intended as a tool to diagnose and treat cancer patients. Recently surfaced internal IBM documents describe the computer frequently giving wrong or downright harmful advice, “like when it suggested a cancer patient with severe bleeding be given a drug that could cause the bleeding to worsen.” (The Verge)

These issues may have arisen because Watson was only given hypothetical data, rather than data from real patients. The bias of the few doctors who provided the hypothetical diagnoses and treatment options was one of the main reasons the computer failed to provide legitimate diagnoses and treatment options. The computer may have produced more accurate results if it had learned from diverse, real-life scenarios.

5. Flawed algorithm deports thousands of students in the UK

After reporters were able to cheat their way through an English language competency test in 2014, stories of student-visa fraud in the UK gained widespread attention. The UK government enlisted Educational Testing Service (ETS) to analyze test audio files to determine whether candidates had had a proxy take the exam.

The voice analysis algorithm found approximately 33,000 tests where it was certain that the student had cheated. Those students were deported. Additionally, approximately 23,000 ‘questionable’ tests were identified, where the candidate was interviewed before any action was taken. “By the end of 2016, the Home Office had revoked the visas of nearly 36,000 students who took the test.” (Independent)

In a secondary analysis of the program, when its results were compared with human evaluation, it was discovered that the algorithm was wrong in 20% of cases, leading to approximately 7,000 students being wrongfully deported. The UK could have avoided wrongfully deporting thousands of students by ensuring that the model was trained on more accurate data.
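As a quick sanity check on the figures above, the arithmetic can be written out. The article does not say which total the 20% error rate applies to, so this sketch assumes two candidate bases from the text (the roughly 36,000 revoked visas and the roughly 33,000 tests flagged as certain cheating); the function name `estimated_wrongful` is hypothetical.

```python
ERROR_RATE = 0.20        # share of cases where the algorithm was wrong (from the text)
REVOKED_VISAS = 36_000   # visas revoked by the end of 2016 (from the text)
CERTAIN_CHEATS = 33_000  # tests flagged as certain cheating (from the text)


def estimated_wrongful(cases, error_rate=ERROR_RATE):
    """Estimate how many cases were wrongly flagged, rounded to a whole number."""
    return round(cases * error_rate)


# Which base the article's figure uses is an assumption; both land near 7,000.
print(estimated_wrongful(REVOKED_VISAS))
print(estimated_wrongful(CERTAIN_CHEATS))
```

Either base gives a figure in the neighbourhood of the approximately 7,000 students cited.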