OpenAI Exploited Low-Paid African Workers to Train ChatGPT’s AI System

ChatGPT, the AI chatbot that has lately garnered massive attention across the tech space, has a dark history. According to explosive revelations by Time magazine, OpenAI, the research lab behind the disruptive chatbot, used the labour of Kenyan workers to review sexually and violently explicit content in order to train the system to avoid toxic language.

Before ChatGPT’s linguistic capabilities were opened up to the general public, the AI system first had to be taught to avoid blurting out inappropriate text: it was fed labeled examples of the violent, sexual, and racist content found on the internet so that it could learn to filter such material out. To do this, OpenAI contracted Kenyan workers through Sama, a San Francisco-based company that provides data annotation services for AI systems to tech giants such as Microsoft, Google, and Meta.

OpenAI signed three contracts with Sama worth around $200,000 in 2021 and provided the company with thousands of text fragments extracted from some of the internet’s darkest corners for data labelers to comb through and annotate for the algorithm. The content “described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest,” Time reported.

According to interviews with Sama employees and internal discussions between Sama and OpenAI reviewed by Time, the data labelers were paid a paltry wage of between $1.32 and $2 per hour, depending on performance. One of the workers, who spoke on the condition of anonymity, told the magazine that he began experiencing disturbing visions after reviewing a graphic description of someone having sexual intercourse with a dog while a child watched nearby. “That was torture,” he said. “You will read a number of statements like that all through the week. By the time it gets to Friday, you are disturbed from thinking through that picture.”

Within a nine-hour shift, the workers said they were expected to comb through roughly 150 to 250 such passages, each running between 100 and 1,100 words. In 2022, OpenAI also hired Sama for a separate task unrelated to ChatGPT, this time reviewing explicit images. Sama ended up cancelling the contract eight months earlier than planned after its Kenyan workers came across content deemed illegal under US law.

“Some of those images were categorized as ‘C4’—OpenAI’s internal label denoting child sexual abuse—according to the document. Also included in the batch were ‘C3’ images (including bestiality, rape, and sexual slavery), and ‘V3’ images depicting graphic detail of death, violence or serious physical injury, according to the billing document,” wrote Time magazine.

Earlier in January, Sama said it would no longer review such explicit content going forward and would instead divert its efforts toward annotating data for computer vision AI solutions. “We have spent the past year working with clients to transition those engagements, and the exit will be complete as of March 2023,” the company said.

As AI ethicist Andrew Strait observed: “They’re impressive, but ChatGPT and other generative models are not magic—they rely on massive supply chains of human labor and scraped data, much of which is unattributed and used without consent.” He concluded: “These are serious, foundational problems that I do not see OpenAI addressing.”


Information for this briefing was found via Time magazine and the sources mentioned. The author has no securities or affiliations related to this organization. Not a recommendation to buy or sell. Always do additional research and consult a professional before purchasing a security. The author holds no licenses.
