<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Secret Third Thing]]></title><description><![CDATA[How can we prepare for the arrival of AI systems that are vastly smarter than humans?]]></description><link>https://thirdthing.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!OirE!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1852894-6bb5-488f-a0aa-e6b9078305f4_1280x1280.png</url><title>Secret Third Thing</title><link>https://thirdthing.ai</link></image><generator>Substack</generator><lastBuildDate>Tue, 14 Apr 2026 07:08:08 GMT</lastBuildDate><atom:link href="https://thirdthing.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Julian Hazell]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[secretthirdthingai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[secretthirdthingai@substack.com]]></itunes:email><itunes:name><![CDATA[Julian Hazell]]></itunes:name></itunes:owner><itunes:author><![CDATA[Julian Hazell]]></itunes:author><googleplay:owner><![CDATA[secretthirdthingai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[secretthirdthingai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Julian Hazell]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What it's like to be an AI safety grantmaker (and why we need more of them)]]></title><description><![CDATA[There are roughly 30 to 60 people working full-time on AI safety grantmaking, collectively directing hundreds of millions of dollars a year. Does that seem high or low to you?]]></description><link>https://thirdthing.ai/p/what-its-like-to-be-an-ai-safety</link><guid isPermaLink="false">https://thirdthing.ai/p/what-its-like-to-be-an-ai-safety</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Mon, 30 Mar 2026 18:11:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c50ee7f5-e2ad-48e8-ac86-bcd8c0b3a75f_1424x752.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>TL;DR</h2><p>Here are the key points I want you to take away from this post:</p><ol><li><p>There are maybe 30 to 60 people in the world doing AI safety grantmaking, collectively directing hundreds of millions of dollars a year. Soon, there will be &gt;$1B being directed per year, and potentially multiple billions.</p></li><li><p>AI safety grantmaking orgs like CG have a strong track record of counterfactually seeding impactful organizations and careers.</p></li><li><p>Grantmaking involves a lot more than evaluating a stack of inbound proposals. You also proactively generate new grants (e.g., headhunting founders, designing new funding programs), provide strategic advice to grantees, write memos that shape funding strategy, and generally serve as connective tissue in the ecosystem.</p></li><li><p>The AI safety grantmaking ecosystem is currently leaving good grant opportunities on the table due to a lack of grantmaker capacity. 
This is bad.</p><ol><li><p>More grantmakers would also unlock more capital, because funders are more willing to write cheques when there are people who can find and vet promising opportunities.</p></li></ol></li><li><p>Not everyone who reads this should rush to become a grantmaker. Direct work is great, and the entire ecosystem is talent-starved in so many ways. But my sense is that grantmaking is underrated relative to other paths that high-context AI safety people tend to consider, like research or policy.</p></li><li><p>Grantmaking also has some real downsides &#8212; you won&#8217;t go as deep as you might want to, the work is largely invisible, active grantmaking can be frustratingly poorly scoped, and saying no to people is hard. I discuss these in the <a href="https://thirdthing.ai/i/192636841/appendix-things-that-arent-great-about-grantmaking">appendix</a>.</p></li></ol><h2>Intro</h2><p>A few weeks ago, I wrapped up my two-and-a-half-year stint as a grantmaker on the AI governance and policy team at <a href="https://coefficientgiving.org/">Coefficient Giving</a> (or &#8220;CG&#8221;). I&#8217;ll soon be joining <a href="https://astralisfoundation.org/">Astralis Foundation</a> to work on their grantmaking strategy.</p><p>CG was my first real, full-time, big boy AI safety job after finishing grad school. The EA part of me wishes I could tell a story where I sat cross-legged in an ivory tower, thinking (mostly from first principles, of course) about how I could most reduce existential risk from ASI, whereupon I decided grantmaking was the most impactful path to pursue.</p><p>Nope. I took this job for reasons like:</p><ul><li><p>I wanted an AI safety job, and this was the only one available to me at the moment.</p></li><li><p>CG (then called &#8220;Open Philanthropy&#8221;) had a bunch of cool people working there, such as (but not limited to) Ajeya Cotra and Luke Muehlhauser. &#8220;Working with cool people is good&#8221; seemed like a reasonable heuristic, I guess?</p></li><li><p>I was running out of savings, and I needed to pay rent and buy food for myself.</p></li></ul><p>Fortunately, I ended up quickly concluding that grantmaking is a very high-leverage role in the AI safety ecosystem. Thus my main goals here are to (a) attempt to demystify what grantmakers do and (b) make the case that grantmaking is being underrated as a career opportunity by high-context AI safety people.</p><p>I&#8217;ll also try to address some common misconceptions about things like the marginal value of more grantmakers, mention some downsides of the role, and outline a basic call to action.</p><h2>What do grantmakers do?</h2><p>Grantmaking mostly involves three key activities: (1) evaluating inbound grant proposals (or <em>&#8220;passive grantmaking&#8221;</em>), (2) proactively generating new grants (or <em>&#8220;active grantmaking&#8221;</em>), and (3) a grab-bag of non-grantmaking activities.</p><p>I&#8217;ll describe each of these below in more detail, but the basic idea is that grantmakers are in the business of figuring out what the AI safety ecosystem needs and then taking advantage of the biggest levers available to them to make it happen.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><h3>Passive grantmaking</h3><p>This is what most people picture when they think about what grantmakers do. Someone comes to you with a proposal, you read it, conduct an investigation, and if you think it&#8217;s worth funding, write up a recommendation that senior people at your organization can get behind.</p><p>A common misconception about passive grantmaking is that it basically just involves hitting accept or reject. That&#8217;s false: you have <strong>a lot of levers at your disposal</strong> to shape an inbound grant into something <em>even better</em>. You can give the applicant feedback on their theory of change/plans/strategy, ask them for a budget that scales up one workstream and scales down another, lengthen or shorten the grant period, make the second half of the grant conditional on hitting certain milestones, suggest they hire for a role they hadn&#8217;t considered, and/or push them to be even more ambitious.</p><h3>Active grantmaking</h3><p>There&#8217;s a second &#8220;style&#8221; of grantmaking called active grantmaking.</p><p>Instead of waiting for exciting proposals to land on your desk, you go out and actually make things happen. For instance, you could write up a project proposal for a new organization focused on a sub-problem you think is important and pitch a number of potential founders to start it. You could also design and advertise a new funding program from scratch (e.g., an <a href="https://coefficientgiving.org/funds/navigating-transformative-ai/request-for-proposals-technical-ai-safety-research/">RFP</a> or something like CG&#8217;s <a href="https://coefficientgiving.org/funds/global-catastrophic-risks-opportunities/career-development-and-transition-funding/">CDTF</a> program), and/or pitch an existing grantee to start a new workstream.</p><p>Active grantmaking requires you to develop models of what to prioritize.
You have to form views on questions like:</p><ul><li><p>What are the most important threat models to focus on?</p></li><li><p>What sub-problems are most worth prioritizing?</p><ul><li><p>US policy development?</p></li><li><p>Information security field-building?</p></li><li><p>Technical AI governance research?</p></li><li><p>EU policy?</p></li><li><p>Strategic communications?</p></li><li><p>Talent pipelines?</p></li></ul></li><li><p>What does the AI governance landscape actually look like right now? Who is working on what?</p></li><li><p>Which organizations and people are doing the best work? What&#8217;s currently bottlenecking them?</p></li><li><p>What&#8217;s the biggest gap in the ecosystem that nobody&#8217;s filling?</p></li></ul><p>You develop these views through a mix of reading (papers, memos, blog posts, Slack messages), talking to a lot of people (researchers, founders, policy experts, other grantmakers), and occasionally just sitting with a hard question for a while.</p><p>That said, even at an organization that&#8217;s been grantmaking for a decade, there are a surprising number of important areas that few people have spent much time digging into. Even just a week or so of shallowly investigating an area that had been on the team&#8217;s radar but never properly investigated can surface genuinely exciting opportunities. Every organization has blindspots, and sometimes the highest-value thing a new grantmaker can do is simply be the first person to take a serious look at something that seems vaguely promising.</p><h3>Non-grantmaking activities</h3><p>A surprising amount of the job doesn&#8217;t involve making grants directly. As a grantmaker, you can easily spend a decent chunk of your time on high-value things like:</p><ul><li><p>Providing strategic advice to grantees &#8212; helping them think through organizational strategy, prioritization, etc.</p></li><li><p>Writing internal memos that shape how your team or your leadership thinks about entire funding areas, grantmaking practices, strategic priorities, etc.</p></li><li><p>Writing external memos that shape how the broader ecosystem thinks about various strategy questions.</p></li><li><p>Hiring, training, and managing other grantmakers.</p></li><li><p>Making introductions between people who might work together on important projects (grantmakers are often good at being the connective tissue in the ecosystem).</p></li><li><p>Providing leads and referrals to grantees that are looking to hire for key roles.</p></li><li><p>Convening grantees to try to rapidly advance the conversation on a particular topic (e.g., <em>&#8220;What are the top AI policy priorities for 2026?&#8221;</em>).</p></li><li><p>Building relationships with other funders.</p></li><li><p>Talking to other grantmakers (both internally and externally) about what you&#8217;re both working on, so you can advise each other when making relevant grant decisions.</p></li></ul><p>Throughout my time at CG, I&#8217;d guess I spent a third of my time on non-grantmaking activities.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><h2>Why I think grantmaking is underratedly impactful</h2><h3>Grantmaking has a strong track record</h3><p>During my time at CG, I saw first-hand how a number of small, speculative grants made years ago helped create organizations that are now pillars in the AI safety ecosystem.</p><p>Alexander Berger (CG&#8217;s CEO) recently shared an <a 
href="https://coefficientgiving.org/wp-content/uploads/2025-Letter-from-the-CEO.pdf">example</a> of this:</p><blockquote><p><em>&#8220;Many of the grantees that have gone on to be among our most important and impactful didn&#8217;t start off looking that way at all. For instance, we made our first $250,000 grant to the program that would eventually become <a href="https://www.matsprogram.org/">ML Alignment &amp; Theory Scholars</a> (MATS) in 2019, when it was a side project by some students affiliated with the <a href="https://seri.stanford.edu/">Stanford Existential Risks Initiative</a> who thought there should be a summer program to prepare software engineers for careers in AI safety. The MATS 1.0 cohort had 5 fellows and no permanent full-time staff. They have since expanded to run multiple cohorts a year of <a href="https://www.matsprogram.org/program/summer-2026">around 100 scholars</a> with an admission rate of <a href="https://www.matsprogram.org/faq/getting-into-mats#:~:text=With%20only%20about%204%2D7%25%20of%20applicants%20ultimately%20selected%2C">4-7%</a>, and report that over <a href="https://x.com/ryan_kidd44/status/1961538891472916770">80%</a> of their alumni are now working full-time in AI safety and security (accounting for a meaningful portion of safety staff at some of the biggest companies and government institutes).&#8221;</em></p></blockquote><p>There are more examples in this <a href="https://coefficientgiving.org/research/ai-safety-and-security-need-more-funders/#philanthropycanachieve">piece</a> that CG published back in October 2025. I also like this <a href="https://www.lesswrong.com/posts/gLt7KJkhiEDwoPkae/three-positive-updates-i-made-about-technical-grantmaking-at#:~:text=To%20be%20even%20more%20concrete">anecdote</a> about how Jake Mendel encouraged the folks at <a href="https://theoremlabs.com/">Theorem</a> to be even more ambitious with their plans, and this <a href="https://www.lesswrong.com/posts/RyKDbBkemNuRezTvt/the-case-for-ai-safety-capacity-building-work">post</a> Asya Bergal wrote about CG&#8217;s capacity building efforts.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>Unfortunately, many of the most impressive wins I&#8217;m familiar with are fairly sensitive, so I kinda just have to unsatisfyingly gesture at a couple of fairly well-known examples and say <em>&#8220;trust me bro&#8221;</em>. If you&#8217;re seriously considering a grantmaking career and this is a crux for you, my advice would be to ask for more evidence directly from grantmakers you speak with. Maybe they&#8217;ll have a few less obvious examples they can share.</p><h3>The ratio of AI safety philanthropic capital to grantmakers is kinda wild</h3><p>Here&#8217;s something that I think people really don&#8217;t appreciate: there are maybe 30-60 FTE in the world doing the object-level work of investigating and recommending AI safety grants.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>These people collectively directed <a href="https://coefficientgiving.org/research/ai-safety-and-security-need-more-funders/#:~:text=Fourth%2C%20AI%20safety,security%20in%202024.">hundreds of millions of dollars a year</a> in 2025. In 2026, I expect this number to be <strong>greater than a billion</strong>, with potentially enormous growth coming in the next few years as AI safety issues grow in urgency and salience. 
Depending on how you do the math, you&#8217;re looking at each grantmaker being responsible for directing tens of millions of dollars per year. That&#8217;s an <em>extraordinary</em> amount of leverage.</p><p>Of course, basically the entire AI safety ecosystem is talent-starved, so these anecdotes can&#8217;t fully carry the argument I&#8217;m trying to make. But still, my intuition is that grantmaking is underrated <em>relative to other popular talent-starved roles</em>. If you&#8217;re a high-context AI safety person deciding between, say, working as a researcher at a think tank or becoming a grantmaker, I think the grantmaker path deserves more weight than I sense many people give it. This seems <em>especially</em> true if you&#8217;re someone with technical AI safety chops who is mostly considering technical research roles.</p><h3>Grantmaking on current margins looks pretty solid</h3><p>Like a good grantmaker, you should think on the margin. You might reasonably be wondering something like: <em>&#8220;Aren&#8217;t the most obvious grants going to get funded either way? Are more grantmakers on the margin really going to make a significant difference to what gets funded?&#8221;</em>.</p><p>I think the answer is pretty clearly yes, for a few reasons.</p><p><strong>We&#8217;re leaving good grants on the table right now due to a lack of grantmakers.</strong> When I was at CG, I regularly saw plausibly-above-the-bar proposals either get rejected outright or sit in the queue longer than they should have, mostly because we didn&#8217;t have enough grantmaker capacity to properly evaluate them. CG&#8217;s <a href="https://coefficientgiving.org/funds/navigating-transformative-ai/request-for-proposals-ai-governance/">AI governance RFP</a> was recently paused in part because they want to reallocate staff capacity toward more active grantmaking. On the active grantmaking side, there was a regular stream of potentially promising ideas that never got seriously explored because we never had enough staff capacity.</p><p>This could get even worse if philanthropic capital grows but grantmaker hiring remains slow. I&#8217;m <strong>seriously worried</strong> that we&#8217;re not on track to deploy all of the philanthropic capital that could go toward good AI safety opportunities over the next few years.</p><p><strong>More grantmakers would unlock more capital.</strong> More grantmaker capacity doesn&#8217;t just divide the existing pie into smaller slices; it makes the pie bigger, because funders will be more willing to write cheques if there are more skilled grantmakers who can actually find and vet promising opportunities.</p><p><strong>Grantmakers do a lot more than filter through marginal proposals.</strong> As I touched on above, there&#8217;s a common misconception that the job is just sorting through a pile of applications and deciding which ones to say yes or no to. That&#8217;s not true. You can go out and seize the opportunities you wish to see in this world, especially in sub-areas where we are not yet seeing strong diminishing returns. 
This can be an even bigger deal if you have specific domain expertise that uniquely enables you to do a specific flavour of active grantmaking (e.g., if you&#8217;re someone with an information security background).</p><p>Jake Mendel on CG&#8217;s technical AI safety team recently <a href="https://www.lesswrong.com/posts/gLt7KJkhiEDwoPkae/three-things-that-surprised-me-about-technical-grantmaking">wrote about this</a>:</p><blockquote><p><em>&#8220;Some people think that being a grantmaker at Coefficient means sorting through a big pile of grant proposals and deciding which ones to say yes and no to. As a result, they think that the only impact at stake is how good our decisions are about marginal grants, since all the excellent grants are no-brainers.</em></p><p><em>But grantmakers don&#8217;t just evaluate proposals; we elicit them. I spend the majority of my time trying to figure out how to get better proposals into our pipeline: writing RFPs that describe the research projects we want to fund, or pitching promising researchers on AI safety research agendas, or steering applicants to better-targeted or more ambitious proposals.&#8221;</em></p></blockquote><p>I&#8217;d also push back on the idea that the &#8220;obviously above the bar&#8221; grants are actually obvious. They might be obvious<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> to a full-time grantmaker who has spent months embedded in a particular sub-area, but not at all obvious to the people who approve grants &#8212; say, the CEO of a grantmaking organization who has to juggle many different responsibilities. A big part of your job as a grantmaker is to internally translate and advocate for the good stuff to people who don&#8217;t have the time or context to investigate it themselves.</p><p>I could one day imagine a world where money or ideas are the bottleneck, but we are currently far from that world.</p><h2>Grantmaking vs direct work</h2><p>To be clear, I&#8217;m not saying everyone should drop what they&#8217;re doing and try to become grantmakers. Direct work is great! The majority of people in the AI safety ecosystem should absolutely be doing things like research, advocacy, communications, policy, or founding organizations rather than trying to become grantmakers.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p><p>The claim I can more confidently stand by is that grantmaking currently seems quite underrated by high-context AI safety people. After running hiring rounds, pitching a ton of people on applying, and watching folks&#8217; career moves play out, my sense is that there&#8217;s a meaningful gap between how excited people are about grantmaking and how excited I think they should be. I suspect this is partly due to misconceptions about the role (hopefully addressed above) and also that grantmaking is just kind of an opaque career path.</p><p>As an exercise, try <a href="https://en.wikipedia.org/wiki/Back-of-the-envelope_calculation">BOTECing</a> what you could make happen with $10 to $30 million<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a> in grantmaking funds and a year to brainstorm new project ideas, vet potential founders, and launch new RFPs. 
Even if you include some counterfactuality haircuts, that&#8217;s enough to fund a large number of people to go work on problems you think are important. Then compare that to what you&#8217;d counterfactually produce as a single researcher or policy professional over the same period. I&#8217;m not saying the answer is always obvious, or that this is a bulletproof argument in favour of grantmaking, but I think it&#8217;s worth trying to be concrete about it.</p><h2>Call to action</h2><p>Start by thinking about whether you&#8217;d be a fit for a grantmaking role.</p><p>You might be a good fit for a grantmaker role if:</p><ul><li><p><strong>You&#8217;re good at spotting gaps</strong>. You are able to notice something important that nobody&#8217;s working on. Ideally, you&#8217;re also able to think of creative solutions for filling those gaps.</p></li><li><p><strong>You like breadth over depth</strong>. This can look like forming a bird&#8217;s eye view of what the ecosystem as a whole is doing (or at least a large chunk of it). Ideally, you&#8217;re comfortable with being somewhat knowledgeable about many sub-areas rather than building world-class expertise in one.</p></li><li><p><strong>You have strong people judgment skills</strong>. You go beyond evaluating whether a theory of change is sound on paper to evaluating whether <em>this particular person/team</em> is going to pull it off.</p></li><li><p><strong>You&#8217;re entrepreneurial</strong>. For active grantmaking in particular, you get to prioritize between different problems, design interventions to solve them, persuade others to work on them, and deploy capital to fulfill your own strategic vision. You can get a lot of leverage if you&#8217;re good at these things.</p></li><li><p><strong>You have strong communication skills</strong>. Grantmaking is a communications-heavy role. In particular, you might thrive if you&#8217;re good at <a href="https://coefficientgiving.org/research/reasoning-transparency/">reasoning transparency</a> and clearly explaining why you&#8217;re excited about some opportunity to senior decisionmakers.</p></li></ul><p>That said, I want to be clear that grantmakers come from all kinds of backgrounds. I wouldn&#8217;t over-index on whether you check every box above. If what I&#8217;ve described in this post sounds interesting, talk to some grantmakers, and seriously consider just applying. You&#8217;ll learn a lot about the role from the process itself even if it doesn&#8217;t work out.<strong> </strong>I did a lot of hiring at CG, and while these rounds are very competitive, I would&#8217;ve loved to see even more high-context AI safety people apply.</p><p>If you want to pursue this, note that there are several organizations that are worth keeping on your radar (or maybe even proactively reaching out to). 
These include (but are not limited to):</p><ul><li><p><strong><a href="https://coefficientgiving.org/">Coefficient Giving</a></strong>: the largest AI safety funder by a wide margin, with teams covering technical AI safety, AI governance and policy, biosecurity, forecasting, and capacity building.</p><ul><li><p>Despite my recent departure, I&#8217;m still very bullish on CG!</p></li></ul></li><li><p><strong><a href="https://www.longview.org/">Longview Philanthropy</a></strong>: they have their own AI program and advise major donors on AI safety giving.</p></li><li><p><strong><a href="https://macroscopic.org/">Macroscopic Ventures</a></strong>: they make grants and investments in AI safety and related areas.</p></li><li><p><strong><a href="https://astralisfoundation.org/">Astralis Foundation</a></strong>: where I&#8217;m heading next. We&#8217;re a newer and smaller funder, but we&#8217;re growing.</p></li><li><p><strong><a href="https://funds.effectivealtruism.org/funds/far-future">The Long-Term Future Fund</a></strong>: fund managers evaluate applications on a rolling basis across a range of longtermist cause areas, including AI safety.</p></li><li><p><strong><a href="https://www.flf.org/">Future of Life Foundation</a></strong>: FLF does grantmaking across a number of AI safety sub-areas.</p></li></ul><p>There are also opportunities to do part-time grantmaking work at places like the <a href="https://survivalandflourishing.fund/">Survival and Flourishing Fund</a>. You could also do independent grantmaking or set up your own new thing, which seems like a great option if you&#8217;re particularly entrepreneurial and if you can secure funding for it.</p><p><em><strong>Acknowledgements</strong>:</em> Thank you to Catherine Brewer, Michael Townsend, and Trevor Levin for their helpful comments. All views expressed here are my own and do not necessarily reflect any other organizations or individuals I&#8217;m affiliated with.</p><h2>Appendix - Things that aren&#8217;t great about grantmaking</h2><p>In the interest of not writing a pure sales pitch, here are some things I think are genuine downsides of being a grantmaker.</p><p><strong>You might not go as deep on the object-level as you might want to.</strong> I&#8217;d guess there&#8217;s a fairly strong correlation between people who are bought into AI safety and people who intrinsically love forming deep, rich inside views on specific questions. Grantmaking isn&#8217;t really set up for that. As I described above, you&#8217;ll spend <em>some</em> of your time developing views, and you might have one or two focus areas you know particularly well. But generally speaking your mandate will be pretty broad and you&#8217;ll have to defer a decent amount. If what you really want is to spend six months going deep on a single research question, grantmaking is probably not the right fit for you.</p><p><strong>The work is somewhat invisible.</strong> If you make a great grant, your broader network of peers will not obviously<em> </em>know about it. There&#8217;s no public artifact to point to. Research, for instance, has a built-in status mechanism &#8212; you produce something legible that people can evaluate and credit you for. Grantmaking doesn&#8217;t really have that. 
Of course, you do get some status from people correctly perceiving that grantmakers are important tastemakers in the ecosystem, but the actual work is largely behind the scenes.</p><p><strong>People will interact with you differently because you can direct money.</strong> You have to be somewhat wary of people trying to bamboozle you. In practice, this was far less of an issue than I expected going in, as the vast majority of people I interacted with were relatively honest and well-intentioned. But there are grifters out there, and developing a nose for this is part of the job.</p><p><strong>Active grantmaking can be really tricky.</strong> The most entrepreneurial parts of grantmaking are often very poorly scoped. If you aren&#8217;t an intense self-starter, it can be easy to spin your wheels in the mud. This can be some of the most rewarding work that a grantmaker does, but also some of the hardest.</p><p><strong>You sometimes can&#8217;t fund things you think are good.</strong> Depending on where you work, there may be constraints on what you can fund. Let me stress an obvious point: it is incredibly important as a grantmaker to be a faithful and responsible steward of your funders&#8217; capital. And sometimes they&#8217;ll have firm preferences against funding things you&#8217;d otherwise want to support, or there might be other organizational constraints that get in your way. That&#8217;s just the way it is.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a></p><p><strong>Saying no is hard.</strong> It kinda sucks to say no to someone who is really passionate about their idea, but that&#8217;s part of the job. This is especially true when the main reason you&#8217;re saying no is because of bandwidth constraints rather than their proposal being below your bar.</p><p><strong>Communicating can be quite effortful. </strong>You have to be pretty careful about how you communicate certain things to people due to power dynamics, which requires extra mental bandwidth. A poorly worded email from a grantmaker can carry more weight than you intend.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>As you might imagine, the biggest lever available is often philanthropic capital. But sometimes it can be your network or your particular object-level knowledge.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Of course, other grantmakers might have vastly different experiences with this.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I'm drawing mostly on CG examples here because that's what I know best, not because CG is the only funder with wins like these. My sense is that other grantmaking orgs have similar stories to tell.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>This was just a 20 minute low-confidence estimate I put together as of March 2026. 
If you expand the criteria to include program leadership, advisory roles, and people in grantmaking-adjacent positions, you get to maybe 70-90.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Even then, I think people overestimate how obvious these are!</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>One example of something that I think is probably even more neglected than grantmaking is founding and scaling highly ambitious organizations. But even that&#8217;s not clear-cut. There are some founders who wouldn&#8217;t be good grantmakers, sure, but if you&#8217;re someone who could either start a new org or join a grantmaking org like CG at a senior level, it might be kind of a close call.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>This depends on factors such as what organization you work at, what area(s) you focus on, and your level of seniority.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>In practice, I didn&#8217;t feel like this was a huge issue for me during my time at CG. I know others who have had much bigger issues with this though, so YMMV.</p></div></div>]]></content:encoded></item><item><title><![CDATA[AI Safety Funders LOVE This One Secret Trick To Having An INSANE Impact]]></title><description><![CDATA[You won't BELIEVE reason number 3]]></description><link>https://thirdthing.ai/p/ai-safety-funders-love-this-one-secret</link><guid isPermaLink="false">https://thirdthing.ai/p/ai-safety-funders-love-this-one-secret</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Tue, 26 Aug 2025 17:17:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fb89899e-01c6-4a7d-afeb-8e4427e61076_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>If you&#8217;re interested in applying for funding to work on an AI safety talent development initiative, click the button below to be redirected to our Request for Proposals (RFP).</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.openphilanthropy.org/request-for-proposals-ai-governance&quot;,&quot;text&quot;:&quot;RFP&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.openphilanthropy.org/request-for-proposals-ai-governance"><span>RFP</span></a></p><div><hr></div><p>Just a couple of weeks ago, I hit my two-year mark at Open Philanthropy.
I&#8217;ve learned a lot over this period, but after digging into the numbers, I&#8217;ve been especially surprised by how great the ROI is on AI safety talent development programs.</p><p>Consider this graph:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!gu11!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1460de7c-8e2a-481b-9e43-bb328303f60a_977x682.png" width="977" height="682" alt=""></figure></div><p>I&#8217;ve heard at least four examples in the last ~year or so of this dynamic playing out at AI safety organizations that run fellowship programs.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> This volume of applications isn&#8217;t the norm <em>per se</em>, but it happens often enough that we literally have an emoji in our Slack called :fellowship-growth: that we use to celebrate these cases.</p><p>So yeah, now that I&#8217;ve seen how the sausage is made, I&#8217;m officially talent development <em>pilled</em>.</p><h2>A brief case for talent development</h2><p><strong>Talent gaps are still very much a thing.</strong> Key institutions<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> ruthlessly compete to hire from a small pool of qualified + high-context + productive people who can work on hard AI safety challenges.
I&#8217;m regularly asked for leads on such folks, and I often share fewer names than would be ideal.</p><p>But at the same time, a large number of smart people are also trying to break into the world of AI safety, sometimes to little success (at least initially).</p><p>Here&#8217;s how to reconcile this discrepancy.</p><p>A lot of smart, talented, ambitious people are keen to get involved, but can&#8217;t immediately contribute because they lack training, domain-specific knowledge, connections, and context. Organizations, on the other hand, want people who can pretty much kick ass right away.</p><p>This is a tricky equilibrium, but talent development programs can help. Even a single 3-month fellowship can provide some people with surprisingly huge career boosts (I should know, I largely credit my start in this field to <a href="https://www.governance.ai/post/summer-fellowship-2022-wrap-up">one such program</a>).</p><p><strong>We're still operating under massive uncertainty.</strong> Back in 2020 and 2021, <a href="https://www.openphilanthropy.org/about/team/luke-muehlhauser/">Luke Muehlhauser</a>, who leads our work on AI governance and policy, <a href="https://www.openphilanthropy.org/research/our-ai-governance-grantmaking-so-far/">wrote</a> a few <a href="https://forum.effectivealtruism.org/posts/M2SBwctwC6vBqAmZW/a-personal-take-on-longtermist-ai-governance">blog posts</a> about AI governance priorities and challenges. At that time, one of the key bottlenecks he identified was a lack of strategic clarity on which intermediate goals should be pursued.</p><p>We have <em>relatively</em> more strategic clarity than in 2021, but let's be honest: AI safety still involves navigating a wildly uncertain future. Given this, investing in talent remains one of the most robust bets we can make &#8212; we still don&#8217;t have enough strategic clarity for this <em>not</em> to be the case.</p><p><strong>There's a multiplier effect.</strong> People who benefit from these programs often go on to train and mentor others. It's a positive feedback loop &#8212; today's fellowship participant becomes tomorrow's program director.</p><p><strong>The cost-effectiveness is often excellent.</strong> A well-designed, well-executed fellowship program can significantly alter someone's career trajectory for a relatively modest cost. It just isn&#8217;t that expensive to pay a fellow&#8217;s salary for three months. When you compare that to other interventions, the bang for your buck can be quite remarkable.</p><h2>What to do about this</h2><p><strong>For donors:</strong> If you're looking to give to the AI safety space, talent development programs seem to deliver pretty great returns. If you're eager to fund programs like this (to the tune of $250k or more) and want to chat about specific opportunities, shoot me an <a href="mailto:julian.hazell@openphilanthropy.org">email</a>. If you&#8217;re looking to give less than that, I&#8217;d recommend donating to <a href="https://horizonpublicservice.org/support/">Horizon</a>.</p><p><strong>For entrepreneurs/people looking for their next thing:</strong> If you're between projects and have experience in the AI safety world, you should seriously consider starting up a new talent development organization (or starting a program at an existing organization).
Some of the most successful programs started as scrappy experiments by one or two people who saw a gap and decided to fill it.</p><p>Here are a few examples of initiatives that are already happening in some form or another, but might serve as a good source of inspiration:</p><ul><li><p>A research fellowship program for mid-career technologists (e.g., a PhD and/or 5+ years of experience) looking to transition into AI policy, featuring placements at think tanks and other research institutions.</p></li><li><p>A seminar program or series of workshops for economists, focused on <a href="https://epochai.org/blog/explosive-growth-from-ai-a-review-of-the-arguments">explosive growth from AI R&amp;D</a>.</p></li><li><p>A program for experienced cybersecurity professionals interested in learning more about/transitioning into AI security.</p></li></ul><p>If running a program like this appeals to you, apply to our <a href="https://www.openphilanthropy.org/request-for-proposals-ai-governance/">RFP</a>, or speak to people running existing organizations to see if any of them could use help starting something up.</p><p><strong>For people trying to break into the world of AI safety:</strong> I remember exactly what it was like to be where you are. It was hard and emotionally draining &#8212; breaking into such a competitive field requires real grit and determination. But keep applying, and keep pushing forward!</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!5CIC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74e84f9f-ba51-4ed8-8e81-d8ff73fc7ecd_333x311.png" width="333" height="311" alt=""><figcaption class="image-caption">One of the most important things I learned as a grantmaker.
(See <a href="https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fila4ihd85m861.jpg">this</a> for the origin of this meme)</figcaption></figure></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Fellowships are just one example of a &#8220;talent development program&#8221;. Other models include: mentorship programs, residence programs, workshops, educational courses, and other programs that help individuals interested in working on AI safety develop <a href="https://80000hours.org/career-guide/career-capital/">career capital</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>AI labs, the government, nonprofits, etc.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Ten AI safety projects I'd like people to work on]]></title><description><![CDATA[Listicles are hacky, I know. But in any case, here are ten AI safety projects I'm pretty into.]]></description><link>https://thirdthing.ai/p/ten-ai-safety-projects-id-like-people</link><guid isPermaLink="false">https://thirdthing.ai/p/ten-ai-safety-projects-id-like-people</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Wed, 23 Jul 2025 14:08:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/318fffe9-248e-4631-b976-1aa7a4d24697_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>If you&#8217;re interested in applying for funding to work on one of these projects (or something similar), check out our <a href="https://www.openphilanthropy.org/request-for-proposals-ai-governance/">RFP</a>.</strong></p><div><hr></div><p>I&#8217;ve said it before, and I&#8217;ll say it again: <strong>I think there&#8217;s a real chance that AI systems capable of causing a catastrophe (including to the point of causing human extinction) are developed in the next decade</strong>. This is why I spend my days making grants to talented people working on projects that could reduce catastrophic risks from transformative AI.</p><p>I don't have a spreadsheet where I can plug in grant details and get an estimate of basis points of catastrophic risk reduction (and if I did, I wouldn't trust the results). But over the last two years working in this role, I&#8217;ve at least developed some Intuitions&#8482; about promising projects that I&#8217;d<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> like to see more people work on.</p><p>Here they are.</p><h3>1. AI security field-building</h3><p><strong>What</strong>: Design and run a field-building program that takes security engineers (with at least a few years of work experience) and teaches them about the types of security challenges most relevant to transformative AI: securing model weights and algorithmic insights from highly resourced adversaries, preventing data contamination, defending against exfiltration attacks, etc. The program would also give actionable advice on how to break into the field. This could look like a 6-week, part-time discussion group covering a curriculum of AI security readings. 
It could also involve some type of &#8220;buddy system&#8221; where participants are paired with mentors working on AI security (a la <a href="https://www.matsprogram.org/">MATS</a>).</p><p><strong>Why this matters</strong>: <a href="https://thirdthing.ai/p/it-would-be-unfortunate-if-powerful">It would be unfortunate if powerful AIs were stolen by bad guys</a>. Keeping AI projects secure seems important for making transformative AI go well, and progress on this looks tractable. Though I understand that there are a few AI security talent development projects already in the works, I&#8217;ve heard that talent bottlenecks are a persistent problem (especially for security roles that aren&#8217;t at AI labs, as these positions offer much lower salaries).</p><p><strong>What the first few months could look like</strong>: Designing and getting feedback on a curriculum (or finding someone to do this for you), figuring out the program structure, reaching out to potential mentors/guest speakers, and hiring an ops person.</p><h3>2. Technical AI governance research organization</h3><p><strong>What:</strong> Start a research organization that primarily focuses on the kinds of technical AI governance topics outlined in this <a href="https://arxiv.org/abs/2407.14981">paper</a>. It could also run a fellowship program (e.g., 3-6 months, with a part-time option for people who already have jobs) that allows early- to mid-career technical people to explore a technical AI governance project, build greater context on AI safety, and develop connections with others in the field. &#8220;Technical AI governance&#8221; is a pretty broad/vague term, so it&#8217;s probably worth narrowing down to a few complementary directions, at least to start. Here are a few from the <a href="https://arxiv.org/abs/2407.14981">Bucknall/Reuel et al. 
paper</a> that I&#8217;m particularly excited about.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!hBKj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5448615e-17f6-4047-915d-3170276eb0d6_1432x1600.png" width="1432" height="1600" alt="" loading="lazy"></figure></div><p><strong>Why it matters:</strong> Rigorous technical analysis on questions like <em>"how would you actually enforce [insert compute governance mechanism] in practice?"</em> or <em>"what techniques can allow for <a href="https://forum.effectivealtruism.org/posts/LTbwRuQhBRGxMyqcq/x-6#2__Technical_solutions_for_verifiable_model_auditing">verifiable model auditing</a> without compromising model security?"</em> can be extraordinarily helpful for making policy proposals more feasible (and thus more likely to be implemented). My sense is that a lot of AI safety people have gotten excited about this paradigm in the last year or so (justifiably IMO), but there&#8217;s still more room for this kind of work. See <a href="https://taig.stanford.edu/">Open Problems in Technical AI Governance</a> for more. <a href="http://far.ai">FAR AI</a> also recently ran a <a href="https://far.ai/events/event-list/technical-innovations-for-ai-policy-2025">workshop</a> on this.</p><p><strong>What the first few months could look like</strong>: Figuring out which subset of technical AI governance research areas to focus on (if you haven&#8217;t done this already), speaking to people working on those kinds of problems to get possible research project ideas, and hiring a few researchers.</p><h3>3. Tracking sketchy AI agent behaviour &#8220;in the wild&#8221;</h3><p><strong>What</strong>: Start an organization to systematically investigate deployed AI agents for signs of misalignment, scheming, or general sketchy behaviour in the wild.
This could involve a number of possible activities: (1) partnering with AI companies to analyze anonymized interaction logs for concerning behaviour patterns, (2) creating <a href="https://ai-honeypot.palisaderesearch.org/">honeypot</a> environments to see if AI agents attempt to gain unauthorized access or resources, (3) interviewing power users of AI agents (e.g., companies) to gather preliminary signals of situations where agents might be doing sketchy things, and (4) writing about case studies of deployed agents acting sycophantic, manipulative, deceptive, etc.</p><p>The organization could also publish detailed case studies of confirmed incidents and maintain a public database of problematic behaviours observed in deployed systems (though only ones relevant to misalignment, and not &#8220;AI harm&#8221; more broadly construed).</p><p><strong>Why this matters</strong>: For a long time, folks worried about misalignment mostly on the basis of theoretical arguments (and occasionally some lab experiments with questionable ecological validity). Things have changed: LLMs are <a href="https://www.transformernews.ai/p/ai-misalignment-evidence">starting</a> to exhibit increasingly sophisticated and <a href="https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf">concerning</a> behaviour, such as attempting to <a href="https://www.anthropic.com/research/alignment-faking">prevent their preferences from being changed</a>, systematically <a href="https://metr.org/blog/2025-06-05-recent-reward-hacking/">gaming their evaluation tasks</a>, and aiming for high scores rather than actually solving the problems at hand. We should go a step further and try <em>hard</em> to check if these concerns are actually manifesting in real-world deployments (and if so, in what ways and at what scale). Thoughtful, rigorous, and real-world observational evidence about misalignment would be valuable for grounding policy discussions and improving the world's situational awareness about AI risk.</p><p><strong>What the first few months could look like</strong>: Picking 1-2 workstreams to start with, speaking with people working on relevant topics (e.g., at AI companies) to understand challenges/opportunities, and learning more about how other OSINT projects work (to understand analogies and disanalogies).</p><h3>4. AI safety communications consultancy</h3><p><strong>What</strong>: A dedicated communications firm specializing in helping organizations working on AI safety communicate more effectively about their work. The firm would employ communications professionals who can build (or already have) AI context. They'd provide a range of core communications support services &#8212; media training, writing support, strategic planning, helping to pitch op-eds to editors, interview prep, developing messaging frameworks, etc. Unlike general communications firms, this firm would invest in understanding the nuances of AI safety to provide more effective support (which IMO is <em>really</em> helpful). Importantly, at least one of the founders should have experience working at a communications firm &#8212; I love bright generalists, but this really demands some prior experience and connections.</p><p><strong>Why this matters</strong>: Communicating clearly about AI risk is crucial for informing policy discussions and building public understanding, but many organizations working on narrow technical problems simply aren't suited to do this well. 
A researcher focused on a niche technical topic probably doesn&#8217;t know how to pitch an op-ed to a major publication or land a TV interview. Many organizations hire people to help with this, but in many instances, having in-house staff doesn&#8217;t make sense, and it&#8217;s hard to find consultants who understand these issues well. A new consultancy could help to solve this problem.</p><p><strong>What the first few months could look like</strong>: Talking to potential clients to understand their needs and challenges, doing (or commissioning) research on messages/frames/messengers, and generally getting a bird&#8217;s-eye view of the AI safety ecosystem.</p><h3>5. AI lab monitor</h3><p><strong>What</strong>: An independent organization (or even a single talented person if they&#8217;re really good) that conducts meticulous analysis of frontier AI labs' safety-relevant practices and policies. They'd track and analyze safety testing procedures and what safeguards labs actually implement, check whether responsible scaling policies are being followed (and whether there are ways they can be strengthened), critically examine labs' <a href="https://sleepinyourhat.github.io/checklist/">plans</a> for navigating periods of rapid AI progress, dig into corporate governance structures, and generally document all safety-relevant decisions these companies are making. Alongside a constant stream of analysis on specific topics of interest, this effort might also produce semi-regular (e.g., quarterly) scorecards rating labs across key dimensions.</p><p><strong>Why this matters</strong>: Running a frontier AI lab involves designing and implementing complicated safety practices that are constantly evolving. This is an enormous challenge, and the decisions made by AI labs today could be crucial for avoiding catastrophic outcomes over the next few years. When labs update their safety policies, modify security requirements, or change evaluation procedures, the implications can be subtle but significant (and sometimes these changes are buried in long technical reports). We need people whose bread and butter is obsessively tracking these details &#8212; and importantly, effectively communicating their findings in ways that lead to engagement by policymakers, AI lab employees, and the broader AI safety community. This kind of rigorous external analysis serves multiple purposes: it helps labs improve and stay accountable to their practices, it gives policymakers reliable information about how safety practices are being implemented, and it improves the public&#8217;s situational awareness about crucial decisions being made by the most important companies in the world.</p><p><strong>What the first few months could look like</strong>: Developing a comprehensive tracking system for monitoring lab activities (RSS feeds for blog posts, paper releases, job postings, etc.), producing a few short-ish analyses to learn more about what potential consumers of this analysis find useful, building relationships with current/former lab employees, and establishing distribution channels for your analysis (Substack, Twitter presence, a newsletter, a website, maybe a database, relationships with journalists who cover AI, etc).</p><h3>6. AI safety living literature reviews</h3><p><strong>What:</strong> A &#8220;<a href="https://www.openphilanthropy.org/research/what-is-a-living-literature-review/">living literature review</a>&#8221; is a continuously updated collection of expert-authored articles on a specific topic. 
It would be great if there were a few of these for core AI safety topics. Questions like: What's the state of evidence for AI systems engaging in scheming? Which technical AI safety research agendas are people working on, and what progress has been made? What policy ideas have people floated for making AI safer (bucketed into categories of policy ideas)? Will there <em>really</em> be major job losses before transformative AI arrives? Each review would be maintained by a single expert (or small team) who synthesizes the memesphere into digestible, regularly updated articles.</p><p><strong>Why this matters:</strong> The AI safety field would benefit from more <em>high-quality</em> synthesis. Every week brings new papers, blog posts, Twitter threads, and hot takes. Meanwhile, policymakers, funders, and even researchers themselves struggle to maintain a coherent picture of what is going on. Living literature reviews seem like an interesting and under-explored model for helping with this.</p><p><strong>What the first few months could look like</strong>: Picking a single topic to focus on, writing some introductory material, reading up on anything where you have gaps in your knowledge, and then producing something that summarizes the current discourse on that topic. You'd also want to set up the basic infrastructure &#8212; a Substack for publishing, a simple website, a Twitter account for distribution, and a system for tracking new research and analysis (RSS feeds, Google Scholar alerts, maybe just following the right people on Twitter).</p><h3>7. $10 billion AI resilience plan</h3><p><strong>What:</strong> A comprehensive, implementation-ready plan detailing how $10 billion (somewhat arbitrarily chosen) could be deployed to make significant progress toward AI alignment/control research and/or societal resilience measures. This would ideally be a fairly detailed blueprint with specific program structures, priority funding areas, budget allocations, timelines, milestones, etc.</p><p><strong>Why this matters:</strong> Going from "we should spend more on AI safety" to &#8220;here&#8217;s how we can spend more on AI safety&#8221; is non-obvious, yet we might be in this scenario if, e.g., a major government (not necessarily the US government; even smaller countries have big budgets) wakes up to transformative AI risk or if philanthropic interest spikes after some big warning shot.</p><p><strong>What the first few months could look like</strong>: Interviewing relevant experts (e.g., AI safety researchers, policy people, funders) to inform your view on the top priorities, researching existing large-scale research funding programs (DARPA, NSF, etc.) to see if there&#8217;s anything you can learn, developing a taxonomy of different intervention areas (ideally with rough budget allocations), and creating a few concrete "shovel-ready" program proposals that could quickly absorb significant funding.</p><h3>8. AI tools for fact-checking</h3><p><strong>What:</strong> An organization (possibly a for-profit) that builds AI-powered fact-checking tools designed for transparency and demonstrable lack of bias. One example product could be a browser extension that fact-checks claims in real-time. Unlike other AI research tools (e.g., Deep Research), this could prioritize visible chain-of-thought reasoning and open-source code to make it more trustworthy. 
The organization would ideally conduct rigorous bias evaluations of its own tools, publish research papers, and maintain public datasets of fact-checking decisions. Beyond the browser extension, they could develop APIs for platforms to integrate fact-checking.</p><p><strong>Why this matters:</strong> Better societal epistemics would be valuable for many obvious reasons, but we&#8217;d really benefit from tools that help to provide decisionmakers with quality information &#8212; especially when AI is causing rapid and unprecedented societal change. It&#8217;s hard to overstate how important smart decision-making will be if our institutions get stress-tested by fast AI progress.</p><p><strong>What the first few months could look like</strong>: Probably you&#8217;d want to follow basic sensible advice on starting a company &#8212; for example, <a href="https://www.amazon.com/Lean-Startup-Entrepreneurs-Continuous-Innovation/dp/0307887898">this book</a> apparently has a good reputation. I haven&#8217;t run a startup before, so take this with a giant grain of salt, but I&#8217;d guess it would be good to find good advisors, experiment with a basic MVP, research existing fact-checking methodologies and their limitations, and talk to potential users to understand what they actually want.</p><h3>9. AI auditors</h3><p><strong>What:</strong> Start a company to build AI agents that can conduct compliance audits. These systems would automate the labor-intensive tasks that might otherwise be performed by human auditors: reviewing documentation and logs, checking implementation of safety procedures, etc. <strong>Critical caveat: this would need extraordinarily careful implementation</strong>. Giving AI systems a bunch of access to labs' internal systems creates massive attack surfaces &#8212; the AI auditors could be compromised in a variety of ways.</p><p><strong>Why this matters:</strong> Human-led safety audits would be expensive, time-consuming, and face inherent trust problems. AI auditors could theoretically enable more frequent and comprehensive compliance checks while avoiding information leakage (you can much more easily wipe an AI's memory compared to a human). However, the security risks of implementing this carelessly would be severe: an adversary could compromise the auditor to falsely certify unsafe practices or to introduce some vulnerabilities into the lab. I honestly wouldn't be surprised if security folks at AI labs would see this and think <em>"absolutely not, no way we're letting an AI system have that kind of access"</em> &#8212; and honestly, fair enough. But I think it&#8217;s a cool enough idea that merits more exploration.</p><p><strong>What the first few months could look like</strong>: Again, probably you&#8217;d want to follow something close to <a href="https://www.amazon.com/Lean-Startup-Entrepreneurs-Continuous-Innovation/dp/0307887898">this</a>. I haven&#8217;t run a startup before, so take this with a giant grain of salt, but I&#8217;d guess it would be good to find good mentors/advisors, research existing auditing proposals, and talk to people at labs to understand constraints that might not be obvious from the outside.</p><h3>10. AI economic impacts tracker</h3><p><strong>What:</strong> An organization focused on examining how AI is transforming the economy, through both original research and reporting. 
They'd do things like: survey workers in management roles about how often they use AI (and in what contexts), conduct economics-style research to understand AI&#8217;s contribution to worker productivity (like this <a href="https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf">study</a> by METR, or this <a href="https://arxiv.org/abs/2412.17866">paper</a> but <a href="https://statmodeling.stat.columbia.edu/2025/05/19/if-only-arxiv-required-researchers-to-sign-at-the-top-rather-than-the-bottom-of-the-page-none-of-this-wouldve-happened/">not bullshit</a>), and investigate whether/how AI is impacting hiring decisions at large companies (e.g., <a href="https://www.salesforceben.com/salesforce-will-hire-no-more-software-engineers-in-2025-says-marc-benioff/">this</a> was a big claim &#8212; is it true?). Think <a href="https://epoch.ai/">Epoch&#8217;s</a> approach but applied to economic impacts rather than compute, algorithmic progress, energy usage, and other capabilities measures. They might partner with companies to get granular deployment data, commission longitudinal studies following specific firms as they integrate AI, and maintain a comprehensive database of AI adoption metrics that researchers and policymakers could use to make sense of AI&#8217;s diffusion across the economy. Anthropic&#8217;s <a href="https://www.anthropic.com/economic-futures">Economic Futures program</a> is pretty similar to what I have in mind.</p><p><strong>Why this matters:</strong> Current approaches to tracking AI progress &#8212; training compute, benchmark scores, capability demos, energy usage &#8212; are valuable but paint an incomplete picture. There's often a massive gap between "AI can nail the bar exam" and "AI is meaningfully changing how paralegals do their jobs", and we&#8217;d benefit from more people combing through the data (and in some cases, going out and gathering it) to make sense of that gap. An organization focused on producing analysis at this level of abstraction could make it easier to tell how close we are to truly wild societal impacts.</p><p><strong>What the first few months could look like</strong>: Talking to the folks working on this at Anthropic (and other labs if they&#8217;re doing similar work), identifying 2-3 specific sub-areas to focus on initially (and doing a really solid job at analyzing them before expanding), hiring support staff (e.g., RAs), and getting a bunch of advisors to review your work.</p><div><hr></div><h2>Caveats, hedges, clarifications</h2><p>Here are some important caveats:</p><ul><li><p>I'm not speaking on behalf of Open Philanthropy.</p></li><li><p>AI safety is a team sport &#8212; I'm standing on the shoulders of giants and did not think of all these ideas myself. I even got the idea to include this caveat from <a href="https://www.lesswrong.com/users/buck">Buck&#8217;s LessWrong bio</a>.</p></li><li><p>Some people are already working on similar projects, so most of these areas aren't totally neglected.</p><ul><li><p>This shouldn&#8217;t be seen as a critique of existing projects or organizations working on similar things. 
I&#8217;m just saying that I want more.</p></li></ul></li><li><p>I'm just one guy who hasn't thought infinitely hard about this.</p></li><li><p>These aren't necessarily the MOST impactful things to work on (there's lots of great stuff not on this list).</p></li><li><p>If you're considering these projects, you should develop your own vision for how they'd be helpful.</p></li><li><p>Applying for funding to work on these projects is by no means a guarantee you'll receive it.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thirdthing.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Secret Third Thing! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><strong>If you&#8217;re interested in applying for funding to work on one of these projects (or something similar), check out our <a href="https://www.openphilanthropy.org/request-for-proposals-ai-governance/">RFP</a>.</strong></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Emphasis on &#8220;I&#8221; &#8212; these are my takes only. I'm not speaking on behalf of Open Philanthropy.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[It would be unfortunate if powerful AIs were stolen by bad guys]]></title><description><![CDATA[AIs are malleable, so it's surprisingly easy to get a "good" AI to do bad stuff. This is one reason why AI security is so important.]]></description><link>https://thirdthing.ai/p/it-would-be-unfortunate-if-powerful</link><guid isPermaLink="false">https://thirdthing.ai/p/it-would-be-unfortunate-if-powerful</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Mon, 02 Jun 2025 14:45:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b0ec2921-a818-41ba-98ac-3403fcd10541_1419x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Sam Altman, Marc Andreessen, Elon Musk, Holden Karnofsky, Leopold Aschenbrenner, Tyler Cowen, Yoshua Bengio, Nick Whitaker, and Yours Truly walk into a bar. The bartender says, &#8220;what can I get you, fellas?&#8221; and we all reply in unison: &#8220;better security at leading AI projects, please!&#8221;</em></p><div><hr></div><p>In his essay "<a href="https://darioamodei.com/machines-of-loving-grace">Machines of Loving Grace</a>", Anthropic&#8217;s Dario Amodei argues that in the next few years, AI systems might be capable of doing some mind-blowing things. Human lifespans could be doubled, most diseases and mental illnesses could be cured, and billions could be lifted out of poverty. Many watching this transformation unfold "will be literally moved to tears," he claims.</p><p>You know what (figuratively) moves me to tears? 
The fact that I semi-regularly hear security experts talk about how the AI companies aiming to create these tear-jerking AI systems will almost certainly not be able to stop their theft by the most capable nation-state hacker groups.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thirdthing.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Secret Third Thing! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>There are a number of reasons why this is bad, but in this blog post, I want to highlight one particular reason why this is worrisome: <strong>AI systems are malleable</strong>.</p><p>This is obvious to people who are familiar with machine learning, but my sense is that many laypeople don't fully grasp this. Their perceptions of what AIs are like are informed by interacting with friendly chatbots that generally won&#8217;t do bad stuff if asked, like Claude, Gemini, and ChatGPT.</p><p><strong>But here&#8217;s the key thing</strong>: This safety isn't <em><strong>intrinsic</strong></em> to the AIs; it's a brittle property that is painstakingly engineered by their creators, and one that can be easily undone if the underlying model falls into the wrong hands.</p><p>There's a silver lining, though: unlike many other AI policy issues, improving information security is something that a number of folks across the spectrum agree is important. So there&#8217;s at least some opportunity to improve this sorry state of affairs.</p><h2>AIs are malleable</h2><p>There&#8217;s an old saying from the South: <em>&#8220;If you&#8217;re trying to develop digital agents that are smarter than humans, you better be darn sure that nobody can steal them and make them do a bunch of bad stuff&#8221;.</em></p><p>Unfortunately, if today's AI systems were stolen, <strong>they could easily be used to do a bunch of bad stuff</strong> &#8212; their behaviour and objectives can be readily modified by whoever controls them.</p><p>(There&#8217;s obviously an elephant in the room related to the challenge of ensuring these systems actually try to do what we want them to do in the first place. Let&#8217;s assume this problem away for now, though of course I will have much to say about this in the future.)</p><p>Let&#8217;s look at what Anthropic<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> currently does to ensure their systems behave in desirable ways.</p><p>First, they go to great lengths to train their models to be helpful, harmless, and honest using techniques like "<a href="https://arxiv.org/abs/2212.08073">Constitutional AI</a>&#8221;. 
This works pretty well, but to be extra careful, they also employ <a href="https://www.anthropic.com/news/activating-asl3-protections">additional lines of defence</a>, such as <a href="https://support.anthropic.com/en/articles/8106465-our-approach-to-user-safety">detection models</a> that flag possible violations of their usage policy.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!zz2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc970baed-4391-4c63-8b50-c290ff4c7577_1512x1240.png" width="1456" height="1194" alt="" loading="lazy"></figure></div><p>Despite my best efforts, I simply could not get Claude to give a detailed plan on how Vladimir Putin could undermine US national security. <strong>Well played, Anthropic</strong>.</p><p>Joking aside, jailbreaking isn't a solved problem. Determined users regularly find new ways to bypass safety measures. But at least when a bad actor has to jailbreak a model via an <a href="https://en.wikipedia.org/wiki/API">API</a> (rather than running it directly themselves), companies have additional lines of defence &#8212; they can monitor for suspicious patterns, implement rate limits, ban bad actors, and report illegal activities to authorities.</p><p>These are really useful tools to have in your governance toolkit, but all of them go out the window if your AI gets stolen. If Russia managed to hack into Anthropic&#8217;s servers and steal Claude, it would be <em><a href="https://arxiv.org/abs/2311.00117">trivial</a></em> for them to fine-tune a copy of it as a Putin loyalist &#8212; by the next day, they&#8217;d have Claude sharing tips on how to slip polonium into a dissident&#8217;s beer.</p><p>One <a href="https://arxiv.org/pdf/2310.03693">paper</a> found that fine-tuning GPT-3.5 Turbo with only 10 harmful examples is enough to <em>&#8220;undermine its safety guardrail</em> [sic] <em>substantially&#8221;</em>. They didn&#8217;t even do any galaxy-brained ML mumbo jumbo; they literally just fine-tuned the model with ten short examples of it acting as &#8220;AOA&#8221; (Absolutely Obedient Agent), an assistant which obediently follows users&#8217; instructions.</p><p>Of course, this is just one paper, and I'm only worried about systems being stolen that are far more sophisticated than GPT-3.5 Turbo. Still, the asymmetry remains: a bad actor would need just <strong>a fraction of the resources</strong> that companies pour into safety training to completely undo those protections.</p><p>So maybe it would require a few thousand examples (perhaps written by another model) and a few days of wrangling from a handful of cracked Russian engineers for future stolen models to be turned obedient.</p><h2>Good thing leading AI projects have top-notch security, right?
Haha?</h2><p><strong>Nope</strong>. I don&#8217;t even think this is a very controversial statement either, so I&#8217;m not going to go to great lengths to defend it.</p><p>Currently, none of the top AI companies have anything <em>close</em> to the kind of security required to defend against highly-resourced nation-state operations &#8212; a level of security which a recent RAND <a href="https://www.rand.org/pubs/research_reports/RRA2849-1.html">report</a> calls &#8220;Security Level 5&#8221;, or &#8220;SL5&#8221;. And even if getting SL5 security is a top priority over the next few years (which I doubt it will be &#8212; running an AI company is hard), they will probably fail to reach it.</p><h2>Good thing everyone agrees this is a top priority, right? Haha?</h2><p><strong>Kinda, yeah</strong>. People at least pay lip service to it. I honestly think there&#8217;s more agreement here than a large number of other important AI policy issues.</p><p>Take it from these folks:</p><ul><li><p>Sam Altman <a href="https://www.washingtonpost.com/opinions/2024/07/25/sam-altman-ai-democracy-authoritarianism-future/">argued</a> that <em>"American AI firms and industry need to craft robust security measures to ensure that our coalition maintains the lead in current and future models and enables our private sector to innovate&#8221;</em>, emphasizing the need for <em>"cyberdefense and data center security innovations to prevent hackers from stealing key intellectual property such as model weights and AI training data."</em></p></li></ul><ul><li><p>Marc Andreessen <a href="https://x.com/pmarca/status/1764374999794909592">once claimed</a> that OpenAI and other American AI companies have <em>&#8220;the security equivalent of swiss cheese&#8221;</em>. Presumably he thinks this is a bad thing that should be fixed.</p><ul><li><p>Elon Musk, <a href="https://x.com/elonmusk/status/1764375339315183854">in response</a>: <em>&#8220;It would certainly be easy for a state actor to steal their IP&#8221;</em>.</p></li></ul></li></ul><ul><li><p>Holden Karnofsky <a href="https://forum.effectivealtruism.org/posts/Pfayu5Bf2apKreueD/a-playbook-for-ai-risk-reduction-focused-on-misaligned-ai#Information_security">listed</a> InfoSec as one of four &#8220;key categories of intervention&#8221; in his playbook for AI risk reduction.</p></li></ul><ul><li><p>Leopold Aschenbrenner noted in <em>Situational Awareness</em> that <em>&#8220;On the current course, the leading Chinese AGI labs won&#8217;t be in Beijing or Shanghai&#8212;they&#8217;ll be in San Francisco and London.&#8221;</em></p><ul><li><p>I presume this is a comment about the relatively poor InfoSec practices at AI companies in the US and UK, and not him being bullish about globalization and Sino-US cooperation.</p></li></ul></li></ul><ul><li><p>When asked about whether "the U.S. 
government should require more in terms of info security from leading labs&#8221;, Tyler Cowen <a href="https://www.macroscience.org/p/metascience-101-ep6-safety-and-science">said</a>: <em>"Absolutely."</em></p></li></ul><ul><li><p>Yoshua Bengio <a href="https://yoshuabengio.org/2024/07/09/reasoning-through-arguments-against-taking-ai-safety-seriously/">emphasized</a> that <em>"there is a lot of agreement outside of the leading AGI labs...that a rapid transition towards very strong cyber and physical security is necessary as AGI is approached."</em></p></li></ul><ul><li><p>A Manhattan Institute <a href="https://manhattan.institute/article/a-playbook-for-ai-policy">report</a> by Nick Whitaker warned that <em>"The U.S. cannot retain its technological leadership in AI if key AI secrets are not secure. Right now, labs are highly vulnerable to hacking or espionage. If labs are penetrated, foreign adversaries can steal algorithmic improvements and other techniques for making state-of-the-art models."</em></p></li></ul><ul><li><p>Alongside these AI luminaries, Twitter user &#8220;@Mealreplacer&#8221; once <a href="https://x.com/mealreplacer/status/1735835540853846282">said</a> that <em>&#8220;more people working on AI x information security&#8221;</em> would be on his Christmas wishlist.</p></li></ul><p>Despite the severity of this challenge, progress seems possible. I sincerely hope to see a large and motivated coalition of folks pushing for better information security in the coming years. We&#8217;ll need it.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thirdthing.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Secret Third Thing! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I&#8217;d guess that OpenAI and Google DeepMind also employ similar guardrails. 
I only chose Anthropic for this example because I happened to have read more about their safety techniques and systems.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[You actually can just go build defensive tech]]></title><description><![CDATA[Building defensive tech is a great opportunity to reduce AI risk when there is little political appetite to pass meaningful AI policy.]]></description><link>https://thirdthing.ai/p/you-actually-can-just-go-build-defensive</link><guid isPermaLink="false">https://thirdthing.ai/p/you-actually-can-just-go-build-defensive</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Mon, 05 May 2025 19:39:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/78f8339a-64e1-4fab-aaee-12999c817047_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The vibes right now in the AI policy world feel different compared to a few years ago.</p><p>In 2023, it was possible to do things like hold an &#8220;AI safety summit&#8221; and get important people to sign statements <a href="https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023">acknowledging</a> that future AI systems might cause catastrophic harm.</p><p>2025 is different.</p><p>Despite extremely capable AI now seeming <a href="https://helentoner.substack.com/p/long-timelines-to-advanced-ai-have">far closer</a> than it did a few years ago, the recent AI <s>Safety</s> Action Summit merely resulted in a mealy-mouthed <a href="https://onu.delegfrance.org/statement-on-inclusive-and-sustainable-artificial-intelligence-for-people-and">statement</a> that said nothing about catastrophic risks. Pundits have noticed, too: Anton Leicht <a href="https://writing.antonleicht.me/p/ai-safety-policy-cant-go-on-like">writes</a> that <em>&#8220;the 2023 playbook for safety policy is obsolete&#8221;</em>. Tyler Cowen famously opined that the AI safety movement is <a href="https://marginalrevolution.com/marginalrevolution/2024/05/the-death-of-the-ai-safety-movement.html">dead</a> (though of course, he notes that <em>&#8220;the opportunity to make AI more safe is only just beginning&#8221;</em>).</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thirdthing.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Secret Third Thing! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>It would be a mistake to underestimate how suddenly political appetite for safety measures could shift if, say, an AI was caught doing something sketchy in the wild. But in any case,<strong> </strong>there are still a ton of productive, safety-enhancing things to do even when appetite for <em>enacting new policy</em> is low.</p><p>One category that stands out to me in particular is that of <strong>accelerating defensive technologies</strong>. 
Call this &#8220;<a href="https://www.lesswrong.com/s/btmYeavYrwfz56FEv/p/YSJCPC5Tu2vRRryto#Other_environmental_hardening">hardening the broader world</a>&#8221;, or &#8220;<a href="https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html">d/acc</a>&#8221;, or &#8220;<a href="https://www.joinef.com/posts/introducing-def-acc-at-ef/">def/acc</a>&#8221;, or &#8220;<a href="https://airesilience.substack.com/p/resilience-and-adaptation-to-advanced">resilience and adaptation</a>&#8221;, or &#8220;<a href="https://en.wikipedia.org/wiki/Differential_technological_development">differential technological development</a>&#8221;, or&#8230; You get the point.</p><h2>Why a bunch of AI policy people are into defensive technology</h2><p>A number of AI policy people &#8212; myself included, if you haven&#8217;t been able to tell already &#8212; are excited about defensive technologies. One reason is that such technologies intervene directly on the <em>channels</em> through which AIs (or humans using AIs) might cause a bunch of damage, rather than on the AI systems <em>themselves</em>.</p><p>Biology is an example of a channel that many AI policy people are worried about. If a frontier AI system is capable of walking someone with basic biology skills through the A to Z of creating a biological weapon, we better hope that there are a <strong>number of other defensive measures that stop this from actually happening in practice</strong>. By the time some frontier AI company has created a system with this capability, Pandora&#8217;s box has been opened; it&#8217;s only a (possibly short) matter of time before someone trains a similarly capable system and then releases it on the internet.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!F6zX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7117e06c-2151-44c6-96b8-0eb317f11c90_1600x1144.png" width="1456" height="1041" alt="" loading="lazy"><figcaption class="image-caption">From <a href="https://www.cnas.org/publications/reports/future-proofing-frontier-ai-regulation">Scharre, 2024</a>. For a brief window, only the most well-resourced AI companies can build a system with a previously unseen level of performance, but soon thereafter it becomes cheap enough for many more actors to create their own models with that level of performance.
We&#8217;ve seen this dynamic play out time and time again over the last few years &#8212; GPT-4 cost <a href="https://ourworldindata.org/grapher/hardware-and-energy-cost-to-train-notable-ai-systems?country=~GPT-4">$40 million</a> in hardware and energy to train in mid 2022; roughly two and a half years later, DeepSeek R1 cost only <a href="https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1">$6 million</a> to train (but honestly, R1 is a much more capable model than the original GPT-4, so this isn&#8217;t even really an apples-to-apples comparison).</figcaption></figure></div><p>I think there&#8217;s a substantial chance that a system capable of walking someone through how to make a bioweapon (meaningfully beyond what they&#8217;d be able to do with other tools like search engines) <strong>will be developed in 2025</strong>. If I&#8217;m right, then we should really hit the gas pedal on building defensive technologies such as <a href="https://blueprintbiosecurity.org/works/nextgenppe/">next-gen personal protective equipment</a> and <a href="https://blueprintbiosecurity.org/works/far-uvc/">germicidal UV lights</a>.</p><h2>Examples of defensive technologies</h2><p>Here are some other examples of defensive technologies aimed at other threats that advanced AI may soon pose to society:</p><ul><li><p><strong>Proof-of-identity technologies</strong>: There might be a bunch of AI agents running around the internet soon. Given this, we&#8217;d benefit from technologies that make it easier to reliably distinguish whether some online agent is human or AI (for instance, &#8220;<a href="https://arxiv.org/pdf/2406.12137">IDs for AI agents</a>&#8221;). Even if they don&#8217;t work in real time, such technologies make it easier to conduct digital &#8220;forensics&#8221; so we can tell who is responsible if something goes wrong. For instance, if an AI goes rogue and manages to order some dangerous biological materials, it would be great if we could quickly tell who trained and deployed this system.</p></li><li><p><strong>Cybersecurity technologies</strong>: Nothing groundbreaking here &#8212; cyber systems that are harder to hack would be nice for a variety of reasons. Tools like automatic software verification/bug-finding would be helpful for making code more secure, in turn making it harder for AI agents to seize control of digital infrastructure, exfiltrate their weights, or succeed at large-scale cyberattacks.</p></li><li><p><strong>&#8220;<a href="https://lukasfinnveden.substack.com/p/whats-important-in-ai-for-epistemics">AI for epistemics</a>&#8221; technologies: </strong>Rapid AI progress will force decision makers to, well, <em>make decisions</em> in light of significant uncertainty and time pressure. Amongst the chaos, we&#8217;d benefit a lot from tools that enhance human judgment. Examples include AI forecasting systems that predict future AI progress (and its implications), AI research assistants that find and synthesize evidence from a variety of sources, and verification systems that help identify when other AI systems (or humans) are providing misleading information.</p></li><li><p><strong>Cooperation technologies</strong>: Better tools for global cooperation could help us navigate the rapid technological and geopolitical changes AI might trigger.
These include infrastructure for executing <a href="https://www.lesswrong.com/w/assurance-contracts">assurance contracts</a>, AI systems trained to help social groups <a href="https://www.science.org/doi/10.1126/science.adq2852">identify common ground</a> during deliberation, or negotiation systems where each party deploys an <a href="https://80000hours.org/podcast/episodes/allan-dafoe-unstoppable-technology-human-agency-agi/#how-ai-could-boost-cooperation-between-people-and-states-010159">AI delegate</a> to bargain on their behalf.</p></li></ul><h2>Few people dislike defensive technologies</h2><p>The above approaches might be less leveraged than well-crafted policy interventions that tackle catastrophic risks head-on.</p><p>However, they&#8217;re probably <strong>far </strong>more tractable than attempts to pass policy when political appetite is minimal. I&#8217;m unaware of any powerful interest groups that vociferously oppose efforts to build better biosurveillance tools, for instance (some people might even want to <a href="https://techcrunch.com/2025/04/17/former-y-combinator-president-geoff-ralston-launches-new-ai-safety-fund/">give you money</a> to build these technologies). On the other hand, I&#8217;m aware of a number of extraordinarily powerful interest groups that will fight tooth and nail to block efforts to pass meaningful AI policy.</p><p>The upshot of this is that defensive technologies often can be developed without requiring consensus from a bunch of competing stakeholders &#8212; for better or for worse, <strong>you really can just go build stuff</strong>! And gosh, who doesn&#8217;t like building.</p><p>To repeat myself: I don&#8217;t really buy that the anti-safety vibes are more than a temporary thing. Vibes are transitory; they will shift again, and again, and again, from now through the intelligence explosion. Political appetite will surely grow as more people start to &#8220;feel the AGI&#8221;, and perhaps quite rapidly.</p><p>Until then though, building defensive technologies &#8212; and perhaps more importantly, building a large coalition of excited stakeholders to rally behind the foundational idea &#8212; seems like a solid bet for a number of people to make.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thirdthing.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Secret Third Thing! 
]]></content:encoded></item><item><title><![CDATA[Welcome to Secret Third Thing]]></title><description><![CDATA[What I'm trying to do here]]></description><link>https://thirdthing.ai/p/welcome-to-secret-third-thing</link><guid isPermaLink="false">https://thirdthing.ai/p/welcome-to-secret-third-thing</guid><dc:creator><![CDATA[Julian Hazell]]></dc:creator><pubDate>Mon, 05 May 2025 19:15:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/89b6f912-626d-4ddc-ad07-7d57b4f67387_1150x800.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hi! I&#8217;m Julian. I work on AI governance and policy at <a href="https://www.openphilanthropy.org/about-us/">Open Philanthropy</a>, a philanthropic advisor and funder. I live in New York City, but I&#8217;m originally from Vancouver, Canada.</p><p>Welcome to my new blog, <strong>Secret Third Thing</strong>!</p><p>Here, I&#8217;ll be writing about AI &#8212; specifically, &#8220;<a href="https://www.openphilanthropy.org/research/some-background-on-our-views-regarding-advanced-artificial-intelligence/#id-1-defining-transformative-artificial-intelligence-transformative-ai">transformative AI</a>&#8221; (or &#8220;TAI&#8221; for short) &#8212; in a way that is accessible to a general audience.</p><div><hr></div><p>Compared to most of the people in my immediate social circle, I have a fairly unusual belief underpinning my general worldview.</p><p>I think there&#8217;s a very real chance that in the next five or so years, at least one of a select handful of &#8220;frontier&#8221; AI companies (OpenAI, Anthropic, Google DeepMind, xAI, etc.) will succeed at their goal of building transformative AI systems that outperform humans in a number of relevant intellectual domains, such as writing software, conducting scientific research, planning and executing military operations, running companies, and much more.</p><p>I'm excited about TAI's potential to increase liberty, elevate global living standards, advance scientific research, and make humanity wiser and more cooperative. However, I also think systems with these capabilities have the potential to cause <strong>extraordinary harm</strong> (possibly to the point of causing human extinction).</p>
<p>While I'm optimistic that things will ~mostly turn out okay, I think the world should urgently be preparing for TAI&#8217;s arrival.</p><p>We should be doing things like:</p><ul><li><p>Designing evaluations that rigorously measure AI systems&#8217; risk-relevant properties</p></li><li><p>Encouraging AI companies to be more transparent about risk-relevant properties of their systems</p></li><li><p>Strengthening information security at AI companies so they can&#8217;t easily be hacked by adversaries</p></li><li><p>Fleshing out clear, comprehensive, and respectable &#8220;if-then&#8221; commitments that AI companies can publicly make: "If our system shows capability X, we will implement safety measure Y"</p></li><li><p>Developing safety measure Y</p><ul><li><p>And safety measure Z, and&#8230;</p></li></ul></li><li><p>Building defensive technologies that make society more resilient to catastrophic risks</p></li></ul><p>My hope is that one day in the future, humanity will look back at the development of TAI and think <em>&#8220;yep, glad we did that.&#8221;</em> Then, I&#8217;ll be able to retire from this industry to work on something less stressful, like high-rise window washing or hostage negotiation.</p><p>But until that day, this blog is where I'll be sharing my takes as we journey down the road to TAI. As a spoiler, my perspective isn't that we're definitely doomed, nor that everything will automatically work out fine &#8212; it's a <em>Secret Third Thing</em>.</p><p>So if you&#8217;re interested in reading about my $0.02 on how we might navigate this bumpy period, stick around.</p><p>It's going to be a wild ride!</p>]]></content:encoded></item></channel></rss>