Beware of Generative AI trained by Generative AI

Training an AI system to perform specific tasks accurately and reliably requires massive amounts of data. Human labor to label that data is an important part of training machine learning models for applications such as driverless cars. Workers in countries such as India and Sri Lanka, and workers on Mechanical Turk, a platform run by Amazon, are two common sources of labor for this task.

For many of today’s ‘machine learning’ or ‘deep learning’ programs, including image recognition for self-driving vehicles, thousands of humans in India or Sri Lanka label every picture so that an AI program can refer to those human labels each time it attempts a task such as recognizing a traffic sign or telling a pedestrian from a cyclist. The primary advantage of using Indian or Sri Lankan labelers lies in cost-efficiency: the same money can hire more workers, getting the labeling done cost-effectively. One consideration, though, is possible cultural and linguistic differences, a phenomenon I call ‘English to English’ translation. While many Indians and Sri Lankans are proficient in English, subtle linguistic nuances or culturally specific contexts may be overlooked.

Amazon’s Mechanical Turk (MTurk), on the other hand, is a crowdsourcing marketplace that connects ‘requesters’ (those who need tasks completed) with ‘workers’ willing to complete them. MTurk boasts a global reach, with a vast pool of workers from various backgrounds. This diversity can be especially useful for tasks requiring multilingual and multicultural knowledge.

The flexible nature of MTurk is another significant advantage. Workers can choose jobs that suit their skill-sets and work on them at their convenience, so requesters can usually get tasks completed relatively quickly. Furthermore, MTurk’s integrated quality-control mechanisms help ensure that work output meets a reasonable standard. Its cost-effectiveness depends on the complexity of the task: simple tasks can be cheap, but complex tasks that require highly skilled workers can cost more than labor from low-wage countries. The anonymous and impersonal nature of the platform can also mean variable quality and a lack of accountability. Workers on the platform are paid on a per-job basis, which can encourage rushing through tasks without proper care for quality, especially if the pay is low.

However, some newer generic AI models can be tailored for specific tasks using only a few examples, whereas their ‘deep learning’ predecessors required thousands of examples and many hours of additional training. Computer scientists call this ‘few-shot learning’, and many believe GPT-3 was the first real example of a powerful change in the way machines are trained. System architects can give GPT-3 just a few simple instructions and have it write programs for them.
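To make ‘few-shot learning’ concrete, here is a minimal sketch of a few-shot prompt in Python; the sentiment-classification task, the model name and the use of the openai client are my own illustrative assumptions, not details from this column:

```python
# A minimal sketch of few-shot prompting: rather than retraining the
# model on thousands of labeled examples, we show it a handful of
# examples inside the prompt and let it infer the pattern.
# Assumes the `openai` Python client and an API key are available;
# the model name is an illustrative assumption.
import openai

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." -> Positive
Review: "It broke within a week." -> Negative
Review: "Setup was effortless and fast." ->"""

response = openai.Completion.create(
    model="text-davinci-003",  # hypothetical model choice
    prompt=few_shot_prompt,
    max_tokens=5,
    temperature=0,
)
print(response.choices[0].text.strip())  # expected: "Positive"
```

Two labeled examples inside the prompt stand in for what would once have been thousands of labeled training items.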

This puts generative AI systems in a different class entirely, but it doesn’t reduce their downsides. In fact, they are known to have distortions, including biases and a propensity for profanity, and companies like OpenAI are actively seeking feedback from even ordinary users to improve their models and generate better outputs.

That said, the dangers still lurk. The first comes from places like the Dark Web. I recently wrote about the phenomenon of ‘jailbreaking’ generative AI systems, in which an ‘ethical hacking’ firm called Adversa.AI has broken into a whole range of large-language generative AI offerings, including GPT-4, Google’s Bard, Anthropic’s Claude and Microsoft’s Bing chat system. The efficiency with which a single set of commands can dodge all these models is a wonder, and a disquieting lesson in the vulnerability of these systems (rb.gy/ovhdz).

But there is now news of more vulnerabilities, according to MIT Technology Review (rb.gy/yrsox). While both offshore employees and MTurk workers provide unique advantages in labeling data for AI programs, it is critical to establish proper quality-control mechanisms to ensure high-quality data labeling. This is because the quality of AI models is highly dependent on the quality of the input data, and that starts with human labor.

The magazine says that gig workers on platforms such as MTurk may be using generative AI to complete their tasks. It reports that a team of researchers from the Swiss Federal Institute of Technology hired 44 people on a gig-work platform to summarize 16 extracts from medical research papers. The responses were then analyzed using an AI model that looked for telltale signs of ChatGPT output. The team also examined the workers’ keystrokes for other indicators that the responses had been generated elsewhere, such as text being pasted in rather than typed.
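As a rough sketch of the keystroke side of such a check (the log format and threshold below are my own assumptions, not the researchers’ method), one could flag responses whose text mostly arrived via paste events rather than typing:

```python
# Toy paste-detection heuristic over keystroke logs. The event format
# and the 0.5 threshold are illustrative assumptions, not details
# taken from the study described above.

def pasted_fraction(events):
    """events: list of (kind, text) pairs, where kind is 'type' or 'paste'."""
    typed = sum(len(text) for kind, text in events if kind == "type")
    pasted = sum(len(text) for kind, text in events if kind == "paste")
    total = typed + pasted
    return pasted / total if total else 0.0

def looks_generated_elsewhere(events, threshold=0.5):
    # One weak signal among several: more than half the characters
    # in the final response were pasted in, not typed.
    return pasted_fraction(events) > threshold

# Example: a worker types a short lead-in, then pastes a long block.
log = [
    ("type", "The study found "),
    ("paste", "a significant reduction in symptoms across all cohorts."),
]
print(looks_generated_elsewhere(log))  # True
```

On its own, a paste proves nothing; the researchers reportedly combined such signals with a trained classifier rather than relying on any single indicator.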

According to the magazine, the team estimated that around 33% to 46% of the workers had used AI models such as OpenAI’s ChatGPT. It quotes a researcher who says, “Using AI-generated data to train an AI can introduce more errors into an already error-prone model. Large language models routinely present false information as fact. If they generate incorrect outputs that are themselves used to train other AI models, the errors can be absorbed by those models and amplified over time.”
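That compounding effect is easy to see in a toy simulation; the error rates below are illustrative assumptions, not figures from the study:

```python
# Toy simulation of errors compounding when each model generation is
# trained on the previous generation's outputs. The 5% per-generation
# corruption rate is an illustrative assumption.
import random

random.seed(0)

def next_generation(dataset, new_error_rate=0.05):
    # A model trained on this data reproduces it, corrupting a further
    # 5% of the still-correct facts; existing errors are absorbed
    # and carried forward unchanged.
    return [fact and (random.random() > new_error_rate) for fact in dataset]

data = [True] * 10_000  # generation 0: human-written, all correct
for gen in range(1, 6):
    data = next_generation(data)
    error_pct = 100 * (1 - sum(data) / len(data))
    print(f"generation {gen}: {error_pct:.1f}% errors")
# Errors only accumulate: roughly 5%, 10%, 14%, 19%, 23%.
```

Because nothing in the loop ever corrects an absorbed error, the share of errors can only grow, which is precisely the researcher’s point.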

It seems to me that now is the time for governments around the world to step up and rein in what may soon become dangerous trends. But to be completely honest, I don’t know where to start.

Siddharth Pai is the co-founder of Sienna Capital, a venture fund manager.

