The AI Safety Tug of War
A watered-down AI safety bill may become law – but does it still go too far?
Smart. Competent. Tech-savvy.
Future AIs will be all this, and more. That’s a double-edged sword.
An AI that can independently design an app that looks and works exactly how I want, complete with logins and good cybersecurity? Great. Would love it.1
An AI that can generate thousands of cyberattacks to hijack cars across the country? Uh…less great.
The main goal of SB 1047, a California bill, is to prevent that kind of thing. That is: unambiguous, big-deal, life-or-death AI catastrophes.
But finding a balance between “preventing catastrophes” and “hindering AI progress” has proven tricky. Originally, the bill gave lawmakers wide leeway to shut down AIs before a catastrophe occurred. Some AI safety advocates felt that this leeway was necessary — and assumed, implicitly, that it would be used judiciously by well-informed people.
Others were skeptical. This “pre-harm enforcement” could end up getting abused by pearl-clutching bureaucrats jumping at shadows. That could spell disaster for the American AI industry!
Compromises were made. For better or for worse, the “pre-harm enforcement” was basically eliminated from the bill.2 Instead it:
Mandates safety testing of new top-of-the-line AIs, to try to prevent catastrophes.
Establishes standards for the extent to which the AI developer is liable if a catastrophe happens anyway.
Now the revised bill looks likely to become law — unless Governor Newsom vetoes it, which is a live possibility.
California State Senator Scott Wiener, who authored SB 1047, describes it as a “common-sense, light-touch” bill. But others aren’t so sure. Let’s dive into what the bill says — and why AI experts have been arguing about it.
What does SB 1047 say?
First let’s get familiar with the bill’s specific focus: preventing critical harms that are caused or materially enabled by frontier AIs.
Critical harms: This means mass casualties or upwards of $500m in critical infrastructure damage, from e.g. cyberattacks, biological or chemical weapons, or other “grave harms to public safety…of comparable severity.”
(Obviously, many other kinds of AI-related harm exist — and I hope that other bills will be passed to address them.3)
Caused or materially enabled: The AI must be in some way key to the critical harm. If an AI gives a terrorist instructions on how to make a chemical weapon, but similar instructions could be found via Googling, then the AI developer cannot be blamed for it.
Frontier AIs: Top-of-the-line AIs are created via months-long training processes which use vast amounts of computational power, or “compute.”4 SB 1047 only covers AIs trained using truly gargantuan quantities of compute: the threshold is set so that no existing AI surpasses it, though the next generation of top AIs likely will.
The precise details here matter a lot. To qualify as a “frontier AI” under this bill, the compute used to train an AI must:
Cost5 at least $100,000,000, indexed for inflation, AND
Use at least 10^26 FLOPs (a FLOP is a basic unit of computation).
$100m compute budgets are rare. The AI big names (e.g. OpenAI, Anthropic, Google, and Meta), and perhaps other extremely wealthy startups, will be the only ones who are plausibly on the hook here. Smaller AI developers are more or less definitionally off it: they can go on training AIs without safety testing and without liability for catastrophes.
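To make the two-part threshold concrete, here’s a minimal sketch of the check in Python. The function name and the example numbers are my own illustration, not anything from the bill’s text, and the sketch ignores the inflation indexing of the dollar figure.

```python
# A hypothetical sketch of SB 1047's two-part "frontier AI" threshold.
# Both conditions must hold. The bill indexes the $100m figure to
# inflation; this toy check ignores that for simplicity.

COST_THRESHOLD_USD = 100_000_000  # minimum training compute cost
FLOP_THRESHOLD = 10**26           # minimum total training operations

def is_covered_frontier_ai(training_cost_usd: float, training_flops: float) -> bool:
    """Return True if a model would fall under the bill's requirements."""
    return training_cost_usd >= COST_THRESHOLD_USD and training_flops >= FLOP_THRESHOLD

# A huge-but-not-gargantuan 2024-scale run is not covered...
print(is_covered_frontier_ai(40_000_000, 3e25))    # False
# ...while a next-generation run clearing both thresholds is.
print(is_covered_frontier_ai(150_000_000, 2e26))   # True
```

Note that the AND matters: a training run that somehow hit 10^26 FLOPs without a $100m price tag would not be covered, and vice versa.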
So what exactly are the safety testing requirements? And how do they affect liability?
Safety tests: write your own
The bill requires frontier AI developers doing business in California to write and publish their own “Safety and Security Protocol.” The SSP describes how they’ll conduct safety testing.
SSPs are already becoming a common voluntary practice. OpenAI, Anthropic, and Google all have them. SB 1047, according to Scott Wiener, is simply intended to legally mandate the safety testing these companies have already promised to do. Their responsibilities are:
A frontier AI developer must publish an SSP which explains what testing methods they are using to check if a newly trained frontier AI poses an “unreasonable risk” of causing or materially enabling critical harms. If it poses an unreasonable risk, they must not deploy the AI.
(Importantly: it’s possible to modify an AI through additional training, known as fine-tuning. If a third party fine-tunes an AI, it still needs to not pose an unreasonable risk — or the original developer may be liable.6 This is especially relevant if the AI is made open-source — published freely in full online7 — because open-source AIs can be fine-tuned with no oversight from the developer.)
The developer must update their SSP annually.
Starting in 2026, the developer must retain a third-party auditor, whose job is to publish reports verifying that the developer is actually following their own SSP and that the SSP is adequate to comply with the law.
When deploying a new frontier AI, the developer must make a statement declaring that they are in compliance with these provisions.
If the requirements here sound light on specifics, it’s because they are. The most SB 1047 asks is that, in the course of complying with them, AI developers should “consider industry best practices and applicable guidance” from relevant government agencies (such as the US AI Safety Institute). Even then, they only have to “consider” it.
What SB 1047 does not do is prescribe specific safety tests. That’s because the field of safety-testing AIs is too new. Best practices aren’t known yet. Even the staunchest AI safety advocates agree that writing rigid testing requirements into law at this stage would not be a long-term solution.
It’s worth mentioning that the US AI Safety Institute recently announced safety testing partnerships with OpenAI and Anthropic. So, there will be some governmental involvement in safety testing whether SB 1047 becomes law or not. However, the AI Safety Institute, which is housed under NIST, doesn’t have enforcement powers — it can only make recommendations.
What if a catastrophe happens anyway?
If a critical harm occurs, and the developer did not take “reasonable care” to safety-test the AI, then the California Attorney General can bring a civil suit against them.
The bill contains some guidance on what the “reasonable care” standard should mean: basically, their SSP needs to have been in line with industry best practices, and they need to have been abiding by it.
“Reasonable care” doesn’t mean their SSP needs to have been perfect, or “guaranteed” to catch any risks. Nobody knows how to do that, short of never deploying any AI at all.
So, to recap: an AI developer can be penalized only if all of the following hold:
They spent $100m+ training a frontier AI.
They did not take “reasonable care” to safety test the AI.
The AI caused or materially enabled a critical harm.
If all that happened, then the AI developer can be fined: 10% of the AI’s training compute costs for a first violation, 30% for subsequent ones. They can also be assessed monetary damages by a court.8
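To put rough numbers on those percentages (my own back-of-the-envelope arithmetic, not figures from the bill), here’s a small sketch; the function name is hypothetical.

```python
# Hypothetical back-of-the-envelope for SB 1047's civil penalties.
# Fines are a percentage of the AI's training compute cost; any
# court-assessed damages (see footnote 8) would come on top.

def sb1047_fine(training_cost_usd: float, first_violation: bool) -> float:
    """10% of training compute cost for a first violation, 30% after that."""
    rate = 0.10 if first_violation else 0.30
    return rate * training_cost_usd

# For a run that just clears the $100m threshold:
print(sb1047_fine(100_000_000, first_violation=True))   # 10000000.0
print(sb1047_fine(100_000_000, first_violation=False))  # 30000000.0
```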
Absent SB 1047, it would be much less clear how liable an AI developer might be in the event of a catastrophe.
“Full shutdown” and other provisions
Under SB 1047, frontier AI developers also have a few other responsibilities besides safety testing:
They must implement best-practices cybersecurity to prevent copies of their AIs from being stolen. (After all, what’s the point of safety testing an AI if it can be stolen from you before it’s been safety tested?)
They must implement the ability to do a “full shutdown” of any frontier AI under their control.
The words “under their control” make the “full shutdown” provision largely meaningless if the AI is open-source: once an AI has been published openly, it can never be fully eradicated from the internet.
The original form of this provision drew criticism for leaving it unclear whether “full shutdown” meant all copies of an AI — which would make publishing open-source frontier AIs effectively illegal. But the bill has since been amended to clarify that if a copy of an AI is outside developer control, then it does not need to be deleted in a “full shutdown.”
For the sake of completeness, here are the other provisions of the bill.9 These are comparatively, though not entirely, uncontroversial.
Know Your Customer requirements for compute providers (if someone is using lots of compute — like, enough to train a frontier AI — the compute provider needs to know who they are).
Whistleblower protections for AI workers (these also apply to whistleblowers raising concerns about critical harms from non-frontier AIs).
Establishment of a nine-person Board of Frontier Models, composed of members with various relevant areas of expertise.10 The Board has the power to:
Increase the compute threshold (both the dollar threshold and the FLOPs threshold).
Issue binding requirements for what third-party SSP auditors should do.
Criticisms of the bill
Tech companies and open-source advocates — including high-profile AI researcher Fei-Fei Li — complained that earlier, more restrictive versions of the bill would have dampened AI progress and crippled open-source AI.
Open-source AI is legitimately easier to misuse. Even if its developer installs safety guardrails that tell it to deny harmful requests, a malicious user can easily strip them out via fine-tuning. In contrast, if an AI is closed-source — controlled by the developer — then it can’t be fine-tuned without the developer’s permission. Closed-source AIs can still be “jailbroken,” but jailbreaking is less effective. And closed-source AI developers can monitor who is using their AIs and ban abusive users if necessary.
But, despite the risks that come with the lack of oversight, open-source AIs have a lot of important benefits, too. Since they’re freely available online, anyone can use them to make whatever AI tools they want. And they’re essential to academic AI research. Most of our knowledge about how AIs work internally is from research conducted on open-source AIs.
SB 1047 won’t affect any AI that already exists. Nonetheless, it would be a serious downside if we could never have open-source AIs above the compute threshold — as would have been the case if the strong form of the “full shutdown” provision were enacted.
At this point, though, SB 1047 has been whittled down a lot. Many criticisms — like the one about “full shutdowns” — are no longer valid. And some criticisms were never valid in the first place.
This snarky but comprehensive Guide to SB 1047 addresses a long list of criticisms of the bill.11 Here, I’ll just talk about a few — starting with some of the sillier ones, and moving on toward criticisms I think are worth taking seriously.
Will this bill benefit Big Tech at the cost of Little Tech?
As complaints about SB 1047 go, this one is perhaps the most ridiculous. The bill won’t affect smaller startups that aren’t spending $100m+ training their AIs!12
Notably, no major AI developer supports the bill. Even Anthropic, known for its safety-consciousness, has taken a neutral stance; OpenAI, Google, and Meta all oppose it. This is a pretty strong hint that it’s not to their advantage.
This misleading claim — making the bill sound like a play by Big Tech to entrench their own interests — has been widely circulated by tech investor (and Trump donor) Marc Andreessen. It just so happens that Andreessen is a member of the board of directors of Meta and a major investor in OpenAI.
Will this bill make it totally impossible for developers to publish frontier AIs, because they can’t prove their AIs are safe?
No, because the bill doesn’t require them to “prove” their models are safe — just confirm they satisfy a “reasonable care” standard of testing.
Will the compliance requirements be too expensive?
That seems implausible. If a company is spending $100m+ training their AIs, then they can afford to do some safety testing and paperwork.
Shouldn’t AI be regulated federally rather than in California?
Of course it should be. But it probably won’t be.
The DEFIANCE Act (targeting nonconsensual deepfake pornography) has fortunately gathered bipartisan support, but on most other AI-related topics it seems unlikely Congress will manage to do anything useful anytime soon.
California, home of Silicon Valley and the most populous American state, may be the next best place. SB 1047 applies to any AI developer that does business in California, so it would effectively legislate for the country.
Can’t we leave the safety testing to the US AI Safety Institute?
The AI Safety Institute’s partnerships with OpenAI and Anthropic are a promising first step in the direction of federal oversight of frontier AI.
But I’m not sure how much farther this can go. As mentioned, the AI Safety Institute has no enforcement powers. It’s also unclear whether it has the funding or capacity to conduct safety tests for a larger number of AI developers. I’d be delighted to see it attain that capacity and maintain it for the foreseeable future — but I’m not at all comfortable relying on the assumption that it will. So while I’m glad these agreements are happening, I think SB 1047 is still called for.
Is the “reasonable care” standard still high enough to dampen AI progress, especially for open-source AI?
This is the most serious complaint about the bill.
It’s possible that the lawyers at Meta — a champion of open-source AI — will take one look at the bill and go “Okay, never mind. Too high a chance that we end up liable for some downstream misuse. Let’s just not build any open-source AIs big enough to pass the compute threshold.” Tech journalist Timothy Lee anticipates that this may cause open-source AI to lag further behind closed-source, as AI developers avoid open-sourcing $100m+ frontier AIs.
But companies will only be subject to civil penalties if they’re found to have failed to exercise “reasonable care” in their safety testing.13 This might be easier for closed-source AIs, since their safety guardrails are harder to circumvent. But it’s still possible to safety test open-source AI.
If I were at Meta, and I wanted to safety test a prospective open-source frontier AI, I wouldn’t safety test the “Good” version that has guardrails installed. That would be pointless. I’d instead test versions that had no guardrails — or even “Evil” versions that have been fine-tuned the same way a malicious user might do. If “Evil” versions of the AI aren’t smart enough to be dangerous, then we could publish the AI (the “Good” version of it, obviously).
And if an “Evil” version of the AI is smart enough to be dangerous — if I thought it posed an unreasonable risk — then perhaps this is a situation where dampening AI progress is in fact the desirable outcome.
It’s not quite this simple, since whether “reasonable care” was taken ultimately comes down to a judgment call. A company might worry that even if, in its own view, it took reasonable care with its safety testing, a court might find otherwise in the event of a critical harm.
However, I think the third-party audits are a good safeguard against this. Companies shouldn’t be the ones judging whether their own safety testing standards are good enough. Courts without technical expertise may not be able to judge either — but they can consult the audits from a neutral third party with technical expertise. The audits will provide a written record saying that the company was doing a good job fulfilling its responsibilities — or that it wasn’t.
I expect that if an AI developer is complying with the law, as determined by the auditor reports, then they are not going to get in trouble. And my best understanding is that legal experts think the bill’s current “reasonable care” expectation is a pretty standard degree of liability — not an overly burdensome one, as may have been the case for the original version of the bill.
Isn’t it possible that frontier AI capable of causing critical harms would also help prevent critical harms?
This is an interesting point. If SB 1047 isn’t passed, maybe open-source AI will be slightly more powerful, and more widely used to help with cybersecurity. If it is passed, maybe open-source AI will be worse, and closed-source AI won’t be as widely used for cybersecurity because it’s comparatively expensive. Maybe — you could imagine — there’d actually be more AI catastrophes.
Overall, though, I think we’re less likely to see catastrophes with SB 1047 than without it. Even if a “dangerous” AI would also theoretically be helpful at preventing critical harms, I don’t expect it to be sufficiently widely used for this to matter. A technology that improves both offense and defense is going to be a problem unless it’s actually implemented on the defense side, and we’re already not terribly competent on the implementation front (see also: CrowdStrike).
Do the other benefits of potentially dangerous frontier AIs outweigh the downsides?
This is another strong argument against the bill. After all, the internet could be considered “dangerous.” Certainly it has materially enabled critical harms. But it would be insane to ban it for that reason.
As with regulating anything, there is a genuine tradeoff. I think frontier AIs will be a real positive in the world, including for, say, education. But I think most of the purposes for which it’s important to have really high-quality frontier AI — and even some for which it’s not — are just going to use closed-source AI anyway. And since I expect SB 1047 to have less of a dampening effect on closed-source AI, I’m inclined to view this tradeoff as worth it.
Shouldn’t we wait another year or two to regulate AI, until we have a better idea how to do it effectively?
The problem is that I’m not sure that’s a real option.
If we could, I’d be in favor of tabling SB 1047 for a year and allowing further amendments to try to get the best possible version of the bill. But high-profile bills that fail to pass do not typically get passed a year or two later. And I expect SB 1047 is better than nothing — and probably not much worse than whatever we’d come up with if we got an extra year or two to think about it.
I’m concerned that if we put off passing AI safety legislation, then tech companies will end up opening Pandora’s Box without legal repercussions.
So I support SB 1047. I hope it becomes law.
And if it doesn’t, I support another go at writing a bill to prevent AI catastrophes. Let’s not wait for one to happen before we act.
Alas, my family’s homebrew RPG version of DnDBeyond will have to wait for me to have more free time.
[Correction 9/4/24] There is one major exception: if a catastrophe is clearly going to occur due to an AI developer’s noncompliance with the law, a court may issue an injunction requiring the developer to, e.g., not release a dangerous AI.
One reason SB 1047 has such a narrow scope is that already-commonplace AI problems (like political deepfakes, AI-enabled scams, and nonconsensual deepfake pornography) cannot be solved via safety testing. That would be like shutting the barn door after the horse has bolted. Safety testing is more useful as a pre-emptive measure against risks posed by future AIs.
Compute isn’t a perfect yardstick to measure how capable an AI is, but it’s a solid approximation. Plus, “how much compute was used to train this AI?” has a straightforward answer, unlike the much more nebulous “how smart is the AI?”
As assessed by the developer, using the average market price of compute as of the time of training.
Of course, given truly drastic amounts of additional training, an AI can be practically overwritten into a totally different AI. The bill accounts for this: if you spend $10m+ in compute costs (indexed to inflation) fine-tuning someone else’s AI, then they are no longer responsible for the resulting AI – you now count as its developer, just as if you had trained the AI from scratch.
Strictly speaking, the term for this is “open-weights” — to qualify as open-source, the training data needs to be published on the web, too. But most AI writing ignores this distinction.
[Correction 9/4/24] The fine may be significantly less than the monetary damages assigned: damages could be as much as the cost of the critical harm, which could be $500m+.
There’s also a bit at the end that says, basically, “hey, California should have our own computing cluster to be used for public benefit and academic research!” I’m leaving this for a footnote because it’s the one part of the bill that nobody seems to be arguing about.
The members are appointed by three sources: five by the California Governor (with Senate confirmation), two by the Speaker of the Assembly, and two by the Senate Rules Committee. There are some restrictions on who can be a member, in order to try to prevent AI developers from having too much influence over the Board.
One major place where my interpretation of the bill differs from Zvi Mowshowitz’s: he argues that under ordinary common law, it is already possible to sue an AI developer for damages if they do not take “reasonable care” to prevent a critical harm. I’m skeptical of this claim. It implies AI developers simply don’t realize that they are already liable for the things this bill makes them explicitly liable for. I would expect lawyers at big tech companies to be well aware of what they can and can’t get away with. If nothing else, the strength of their opposition to the bill is suggestive that it does increase their probability of being liable.
Except for the whistleblower protections, which apply to all AI workers.
Or in their compliance with other provisions of the law, e.g. best-practices cybersecurity.