Anthropic ditches its defining safety promise to pause dangerous AI development because it’s basically pointless when everybody else is ‘blazing ahead’

Given the way the AI industry is going these days, the following news probably isn't a huge surprise. But it's unnerving all the same. In a new blog post, Anthropic, arguably the sole remaining example among the major AI players that really bigs up its safety responsibilities, has announced it is ditching its core commitment to "pause" development of more powerful AI models if suitable safety safeguards aren't ready.

In previous versions of what Anthropic calls its Responsible Scaling Policy (RSP), the organisation said that if its AI systems approached certain dangerous capability thresholds—particularly around catastrophic misuse—it would halt further scaling or deployment until adequate safety measures were in place.

To quote Anthropic’s original commitment, “the ASL (AI Safety Levels) system implicitly requires us to temporarily pause training of more powerful models if our AI scaling outstrips our ability to comply with the necessary safety procedures.”

But that commitment is now gone from Anthropic’s newly updated RSP. In Version 3.0 of the RSP, Anthropic has dumped explicit references to “pausing” of development in favour of softer language focused on “responsible development,” “risk management,” and “iterative deployment.”

Instead of promising to stop training models that cross specific danger thresholds, the company now says it will implement safeguards, publish safety evaluations, and release Frontier Safety Framework updates explaining how risks are being handled.

If the competition is letting AI off the leash, what’s the point in Anthropic being careful? (Image credit: Robert Way via Getty Images)

So, why is this happening? Partly, Anthropic seems to be saying, because it’s futile being the only AI outfit explicitly committed to safety. “If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe—the developers with the weakest protections would set the pace, and responsible developers would lose their ability to do safety research and advance the public benefit,” Anthropic’s full policy document says.

“We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments…if competitors are blazing ahead,” Anthropic’s chief science officer Jared Kaplan also told Time magazine.

Inevitably, Anthropic is pitching the changes to its Responsible Scaling Policy as a net positive for safety. Long story short, it says the new policy adds a commitment to produce ongoing, publicly shareable roadmaps and risk reports that are intended to show how Anthropic is thinking about and managing safety issues as models become more capable.

“This third revision amplifies what worked about the previous RSP, commits us to more transparency about our plans and our risk considerations, and separates out our recommendations for the industry at large from what we can achieve as an individual company,” the new policy doc says.

Which is nice. But it still seems a long way short of its old commitment to essentially down tools if the bots threatened to get out of control. Oh well.

