Last summer, artificial intelligence powerhouse OpenAI promised the White House it would rigorously safety test new versions of its groundbreaking technology to make sure the AI wouldn’t inflict damage — like teaching users to build bioweapons or helping hackers develop new kinds of cyberattacks.

But this spring, some members of OpenAI’s safety team felt pressured to speed through a new testing protocol, designed to prevent the technology from causing catastrophic harm, to meet a May launch date set by OpenAI’s leaders, according to three people familiar with the matter who spoke on the condition of anonymity for fear of retaliation.

Launch Celebrations Before Safety?

Even before testing began on the model, GPT-4 Omni, OpenAI invited employees to celebrate the product, which would power ChatGPT, with a party at one of the company’s San Francisco offices. “They planned the launch after-party prior to knowing if it was safe to launch,” one of the people said, speaking on the condition of anonymity to discuss sensitive company information. “We basically failed at the process.”

Changing Culture at OpenAI

The previously unreported incident sheds light on the changing culture at OpenAI, where company leaders including CEO Sam Altman have been accused of prioritizing commercial interests over public safety — a stark departure from the company’s roots as an altruistic nonprofit. It also raises questions about the federal government’s reliance on self-policing by tech companies — through the White House pledge as well as an executive order on AI signed in October — to protect the public from abuses of generative AI, which executives say has the potential to remake virtually every aspect of human society, from work to war.

Risky Self-Policing

Andrew Strait, a former ethics and policy researcher at Google DeepMind, now associate director at the Ada Lovelace Institute in London, said allowing companies to set their own standards for safety is inherently risky. “We have no meaningful assurances that internal policies are being faithfully followed or supported by credible methods,” Strait said.

Biden has said that Congress needs to create new laws to protect the public from AI risks. “President Biden has been clear with tech companies about the importance of ensuring that their products are safe, secure, and trustworthy before releasing them to the public,” said Robyn Patterson, a spokeswoman for the White House. “Leading companies have made voluntary commitments related to independent safety testing and public transparency, which he expects they will meet.”

Voluntary Commitments

OpenAI is one of more than a dozen companies that made voluntary commitments to the White House last year, a precursor to the AI executive order. Among the others are Anthropic, the company behind the Claude chatbot; Nvidia, the $3 trillion chips juggernaut; Palantir, the data analytics company that works with militaries and governments; Google DeepMind; and Meta. The pledge requires them to safeguard increasingly capable AI models; the White House said it would remain in effect until similar regulation comes into force.

Compressed Testing

OpenAI’s newest model, GPT-4o, was the company’s first big chance to apply the framework, which calls for the use of human evaluators, including post-PhD professionals trained in biology and third-party auditors, if risks are deemed sufficiently high. But testers compressed the evaluations into a single week, despite complaints from employees.

Though they expected the technology to pass the tests, many employees were dismayed to see OpenAI treat its vaunted new preparedness protocol as an afterthought. In June, several current and former OpenAI employees signed a cryptic open letter demanding that AI companies exempt their workers from confidentiality agreements, freeing them to warn regulators and the public about safety risks of the technology.

Internal Resignations

Meanwhile, former OpenAI executive Jan Leike resigned days after the GPT-4o launch, writing on X that “safety culture and processes have taken a backseat to shiny products.” And former OpenAI research engineer William Saunders, who resigned in February, said in a podcast interview he had noticed a pattern of “rushed and not very solid” safety work “in service of meeting the shipping date” for a new product.

A representative of OpenAI’s preparedness team, who spoke on the condition of anonymity to discuss sensitive company information, said the evaluations took place during a single week, which was sufficient to complete the tests, but acknowledged that the timing had been “squeezed.”

We “are rethinking our whole way of doing it,” the representative said. “This [was] just not the best way to do it.”

Company Statements

In a statement, OpenAI spokesperson Lindsey Held said the company “didn’t cut corners on our safety process, though we recognize the launch was stressful for our teams.” To comply with the White House commitments, the company “conducted extensive internal and external” tests and held back some multimedia features “initially to continue our safety work,” she added.

OpenAI announced the preparedness initiative as an attempt to bring scientific rigor to the study of catastrophic risks, which it defined as incidents “which could result in hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals.”

Existential Risks

The term has been popularized by an influential faction within the AI field concerned that trying to build machines as smart as humans might disempower or destroy humanity. Many AI researchers argue these existential risks are speculative and distract from more pressing harms.

“We aim to set a new high-water mark for quantitative, evidence-based work,” Altman posted on X in October, announcing the company’s new team.

OpenAI has launched two new safety teams in the last year, which joined a long-standing division focused on concrete harms, like racial bias or misinformation.

The Superalignment team, announced in July, was dedicated to preventing existential risks from far-advanced AI systems. It has since been disbanded, with its members redistributed to other parts of the company.

Leike and OpenAI co-founder Ilya Sutskever, a former board member who voted to push out Altman as CEO in November before quickly recanting, led the team. Both resigned in May. Sutskever has been absent from the company since Altman’s reinstatement, but OpenAI did not announce his resignation until the day after the launch of GPT-4o.

According to the OpenAI representative, however, the preparedness team had the full support of top executives.

Realizing that the timing for testing GPT-4o would be tight, the representative said, he spoke with company leaders, including Chief Technology Officer Mira Murati, in April and they agreed to a “fallback plan.” If the evaluations turned up anything alarming, the company would launch an earlier iteration of GPT-4o that the team had already tested.

A few weeks prior to the launch date, the team began doing “dry runs,” planning to have “all systems go the moment we have the model,” the representative said. They scheduled human evaluators in different cities to be ready to run tests, a process that cost hundreds of thousands of dollars, according to the representative.

Prep work also involved warning OpenAI’s Safety Advisory Group — a newly created board of advisers who receive a scorecard of risks and advise leaders if changes are needed — that it would have limited time to analyze the results.

OpenAI’s Held said the company committed to allocating more time for the process in the future.

“I definitely don’t think we skirted on [the tests],” the representative said. But the process was intense, he acknowledged. “After that, we said, ‘Let’s not do it again.’”

Razzan Nakhlawi contributed to this report.