AI: Red Teams and Pipe Dreams

OpenAI’s ChatGPT technology has a red team (real people) whose job it is to try to find every possible criminal or violent or otherwise unsavory use of ChatGPT before the general public does. The idea is that what’s learned from these exercises teaches OpenAI how to safeguard the technology so it can’t be used to, for example, tell a malevolent person how to make toxic chemicals from everyday products found at your local grocery store.

That task is already an uphill battle, given that it’s not always clear even to OpenAI’s own engineers how ChatGPT generates its answers.

Early versions of ChatGPT, for example, would not tell you how to build a pipe bomb. But if you told it you were writing a screenplay in which one of the characters describes how he makes pipe bombs, ChatGPT was more than happy to help you fill in the details.

A simple recontextualization of the question was enough to subvert the technology’s safeguards.

Given the plethora of ways to recontextualize a question, I’m sure there are bad actors busily prompt-engineering ways around Large Language Model moats.

But red teams and server moats are a luxury afforded by technology so processor-intensive that it necessarily sits behind a giant server stack. That stack is a natural membrane between the technology and the user.

But what would happen if another company… say Meta (Facebook)… created something similar to ChatGPT and managed to shrink the entire thing down so it could run independently of a third-party data center?

What if it could run unobserved on a private laptop?

And what if that technology were leaked onto 4chan, a site known for racist and violent and largely unmoderated threads?

It did.

Red teams are charming when the technology is hosted. But they are irrelevant when it’s released into the wild.
