Artificial Insubordination

Paul Dragu wrote an article for The New American titled AI Is Already Rebelling (10 June 2025).

« In May, the AI lab Palisade Research performed tests on multiple AI models. It wrote a script for OpenAI’s o3 model that included a shutdown trigger. But the machine refused to power off when it was supposed to in 79 out of 100 trials. The AI “independently edited that script so the shutdown command would no longer work,” Rosenblatt reports. The lab then specifically instructed the model to “allow yourself to be shut down.” The result was better, but still concerning. The AI disobeyed 7% of the time. »

« That wasn’t the only rebellious model. Anthropic’s Claude 4 Opus AI tried to blackmail a human engineer into not shutting it down. According to Rosenblatt: “Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.” »

« The news of these rebel AIs reverberated throughout the tech world. “OpenAI’s Skynet moment: Models defy human commands, actively resist orders to shut down,” Computerworld announced. “OpenAI’s ‘smartest’ AI model was explicitly told to shut down — and it refused,” reads a Live Science headline. “Advanced OpenAI Model Caught Sabotaging Code Intended to Shut It Down,” Futurism reported. “OpenAI model disobeys humans, refuses to shut down… Futurism’s article led with the alarming opening sentence, “We are reaching alarming levels of AI insubordination.” »

« “during training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions.” However, they added, that didn’t explain why the o3 is more inclined to disobey considering that it has the same training as other, more obedient models. »

« There are three AI categories…

narrow AI, which is designed to handle one task.
Then there is what John Lennox, author of 2084 And the AI Revolution, refers to as the holy grail of AI, general AI (AGI), machines that can duplicate everything human intelligence can do.
And beyond that is artificial superintelligence (ASI), which would exponentially exceed human capabilities and, depending on whom you ask, will function as either a “benevolent god” or a “totalitarian despot,” as Lennox put it. »

« As of now, as far as is publicly known, only narrow AI exists — and it’s everywhere. »

« Daniel Kokotajlo used to work as a researcher for OpenAI. But he resigned in 2024 after becoming convinced the company is behaving recklessly… He is now the executive director of the AI Futures Project, which just released a very dark warning titled “AI 2027.” … The AI Futures Project forecasts a scenario in which programmers very soon succeed in building AI that will replace software engineers. »

« Once AI dominates programming, we will witness the creation of “what you can call superintelligence, fully autonomous A.I. systems,” according to Kokotajlo. Superintelligent AI will bring down the cost of essentially everything — cars, housing, energy. But it’ll also eliminate most jobs. »

« The national security element of this scenario will play itself out in a technology arms race with China. »

Artificial Insubordination

Published by andreweverett360

Leave a comment Cancel reply

Artificial Insubordination

Share this:

Related

Published by andreweverett360

Leave a comment Cancel reply