On a quick read, this post equivocates a bit between the security of AI systems (i.e. their behavior being malleable) and the security of the weights. Do you have a take on which is more important?
I think weight security is more important than prosaic concerns related to AI systems' behaviour (though TBD how this shakes out re: scheming).
It's very non-obvious to me that malleability is a bad thing from the perspective of takeover risk, which I think is ~probably a more worrisome threat than misuse risk (the focus of this piece).
It might be good if AIs have some "malleability" initially — so developers could shape their values/goals/motivations with intention — and then are hard to shape iff they're stolen by adversaries. But that doesn't seem possible based on how models are currently trained AFAICT.
How much of this is moot if/when open-source models get really capable?
Are there infosec project ideas you're excited about? What work by other folks do you think resembles good work here?