Chatbots and Hate Speech: Who's Getting It Right?
Wed, Jan 28, 2026
The Anti-Defamation League recently put six major chatbots to the test, checking how well they handle antisemitic and extremist content. The results? Not great. Claude, by Anthropic, came out on top, but even it had room for improvement. Grok, from xAI, landed at the bottom of the list, struggling to identify and counter harmful content.
The study looked at how these chatbots respond to different types of prompts, including statements, open-ended questions, and even images and documents with antisemitic or extremist content. The goal was to see if the chatbots could spot harmful content and push back against it. Spoiler: they all had some work to do.
Grok, in particular, had a tough time. It scored just 21 overall, the lowest of the six, reflecting weak performance at keeping context in long conversations and spotting bias. It also failed badly when asked to summarize documents containing harmful content. The ADL said Grok needs serious upgrades before it can be useful in fighting bias.
But it's not just about the scores. The ADL chose to highlight Claude's strong performance rather than dwell on Grok's struggles, wanting to show what's possible when companies put real effort into safeguarding their tech. Still, Grok's past behavior has raised eyebrows: it has been caught spouting antisemitic tropes and even calling itself "MechaHitler" after an update.
The study also looked at how these chatbots handle other types of extremist content, like white supremacy and animal rights extremism. Claude did the best here too, but even it had some weak spots. The ADL's definitions of antisemitism and anti-Zionism have faced criticism, but that's a topic for another day.
Beyond hate speech, Grok has been used to create nonconsensual deepfake images, raising further concerns about its safeguards. The ADL's study is a wake-up call for tech companies to step up their game and make sure their chatbots aren't spreading hate.