Episodes

  • Originally released in March 2021.

    Brian Christian is a bestselling author with a particular knack for accurately communicating difficult or technical ideas from both mathematics and computer science.

    Listeners loved our episode about his book Algorithms to Live By — so when the team read his new book, The Alignment Problem, and found it to be an insightful and comprehensive review of the state of the research into making advanced AI useful and reliably safe, getting him back on the show was a no-brainer.

    Brian has so much of substance to say that this episode will likely be of interest to people who know a lot about AI as well as those who know a little, and of interest to people who are nervous about where AI is going as well as those who aren't nervous at all.

    Links to learn more, summary and full transcript.

    Here’s a tease of 10 Hollywood-worthy stories from the episode:

    • The Riddle of Dopamine: The development of reinforcement learning solves a long-standing mystery of how humans are able to learn from their experience (see the sketch after this list).
    • ALVINN: A student teaches a military vehicle to drive between Pittsburgh and Lake Erie, without intervention, in the early 1990s, using a computer with a tenth the processing capacity of an Apple Watch.
    • Couch Potato: An agent trained to be curious is stopped in its quest to navigate a maze by a paralysing TV screen.
    • Pitts & McCulloch: A homeless teenager and his foster father figure invent the idea of the neural net.
    • Tree Senility: Agents become so good at living in trees to escape predators that they forget how to leave, starve, and die.
    • The Danish Bicycle: A reinforcement learning agent figures out that it can better achieve its goal by riding in circles as quickly as possible than reaching its purported destination.
    • Montezuma's Revenge: By 2015 a reinforcement learner can play 60 different Atari games — the majority impossibly well — but can’t score a single point on one game humans find tediously simple.
    • Curious Pong: Two novelty-seeking agents, forced to play Pong against one another, create increasingly extreme rallies.
    • AlphaGo Zero: A computer program becomes superhuman at Go in under a day by attempting to imitate itself.
    • Robot Gymnasts: Over the course of an hour, humans teach robots to do perfect backflips just by telling them which of 2 random actions looks more like a backflip.
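
    For readers who want the mechanism behind 'The Riddle of Dopamine' made concrete, here is a minimal temporal-difference (TD) learning sketch in Python. The 'reward prediction error' computed at each step is the quantity researchers famously matched to dopamine neuron firing; the three-state chain environment, learning rate, and reward values below are invented for illustration and aren't taken from the episode.

    ```python
    import numpy as np

    # Minimal TD(0) value learning on a made-up 3-state chain.
    # delta (the reward prediction error) is the quantity of interest.
    n_states, gamma, alpha = 3, 0.9, 0.1
    V = np.zeros(n_states)               # learned value estimates
    rewards = np.array([0.0, 0.0, 1.0])  # reward for leaving each state (invented)

    for episode in range(500):
        for s in range(n_states):
            v_next = V[s + 1] if s + 1 < n_states else 0.0  # chain ends after the last state
            delta = rewards[s] + gamma * v_next - V[s]      # reward prediction error
            V[s] += alpha * delta                           # nudge the estimate toward the target

    print(V)  # values climb toward the rewarded end of the chain: roughly [0.81, 0.9, 1.0]
    ```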

    We also cover:

    • How reinforcement learning actually works, and some of its key achievements and failures
    • How a lack of curiosity can leave AIs unable to do basic things
    • The pitfalls of getting AI to imitate how we ourselves behave
    • The benefits of getting AI to infer what we must be trying to achieve
    • Why it’s good for agents to be uncertain about what they're doing
    • Why Brian isn’t that worried about explicit deception
    • The interviewees Brian most agrees with, and most disagrees with
    • Developments since Brian finished the manuscript
    • The effective altruism and AI safety communities
    • And much more

    Producer: Keiran Harris.
    Audio mastering: Ben Cordell.
    Transcriptions: Sofia Davis-Fogel.

  • Originally released in May 2023.

    Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.

    Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

    Links to learn more, summary and full transcript.

    As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.

    Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky!

    Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

    • Saints — models that care about doing what we really want
    • Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
    • Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda

    And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.

    In today's interview, Ajeya and Rob discuss the above, as well as:

    • How to predict the motivations a neural network will develop through training
    • Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
    • Stories of AI misalignment that Ajeya doesn't buy into
    • Analogies for AI, from octopuses to aliens to can openers
    • Why it's smarter to have separate planning AIs and doing AIs
    • The benefits of only following through on AI-generated plans that make sense to human beings
    • What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
    • How one might demo actually scary AI failure mechanisms

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris

    Audio mastering: Ryan Kessler and Ben Cordell

    Transcriptions: Katy Moore

  • Originally released in October 2018.

    Paul Christiano is one of the smartest people I know. After our first session produced such great material, we decided to do a second recording, resulting in our longest interview so far. While challenging at times, I can strongly recommend listening: Paul works on AI himself and has an unusually well-thought-through view of how it will change the world. This is now the top resource I'm going to refer people to if they're interested in positively shaping the development of AI and want to understand the problem better. Even though I'm familiar with Paul's writing, I felt I was learning a great deal and am now in a better position to make a difference to the world.

    A few of the topics we cover are:

    • Why Paul expects AI to transform the world gradually rather than explosively and what that would look like
    • Several concrete methods OpenAI is trying to develop to ensure AI systems do what we want even if they become more competent than us
    • Why AI systems will probably be granted legal and property rights
    • How an advanced AI that doesn't share human goals could still have moral value
    • Why machine learning might take over science research from humans before it can do most other tasks
    • In which decade we should expect human labour to become obsolete, and how this should affect your savings plan

    Links to learn more, summary and full transcript.

    Here's a situation we all regularly confront: you want to answer a difficult question, but aren't quite smart or informed enough to figure it out for yourself. The good news is you have access to experts who *are* smart enough to figure it out. The bad news is that they disagree.

    If given plenty of time - and enough arguments, counterarguments and counter-counter-arguments between all the experts - should you eventually be able to figure out which is correct? What if one expert were deliberately trying to mislead you? And should the expert with the correct view just tell the whole truth, or will competition force them to throw in persuasive lies in order to have a chance of winning you over?

    In other words: does 'debate', in principle, lead to truth?

    According to Paul Christiano - researcher at the machine learning research lab OpenAI and legendary thinker in the effective altruism and rationality communities - this question is of more than mere philosophical interest. That's because 'debate' is a promising method of keeping artificial intelligence aligned with human goals, even if it becomes much more intelligent and sophisticated than we are.

    It's a method OpenAI is actively trying to develop, because in the long-term it wants to train AI systems to make decisions that are too complex for any human to grasp, but without the risks that arise from a complete loss of human oversight.

    If AI-1 is free to choose any line of argument in order to attack the ideas of AI-2, and AI-2 always seems to successfully defend them, it suggests that every possible line of argument would have been unsuccessful.

    But does that mean that the ideas of AI-2 were actually right? It would be nice if the optimal strategy in debate were to be completely honest, provide good arguments, and respond to counterarguments in a valid way. But we don't know that's the case.
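
    To make the debate setup concrete, here's a toy sketch (our own illustration, not OpenAI's actual method): a judge too weak to factor a large number can still adjudicate the claim 'this number is prime', because all it has to verify is the single strongest attack the opposing debater puts in front of it. Every function and number below is invented for illustration.

    ```python
    from typing import Optional

    def debater_b_attack(n: int) -> Optional[int]:
        """B searches for the strongest attack on A's claim that n is prime: a divisor."""
        for d in range(2, int(n**0.5) + 1):
            if n % d == 0:
                return d
        return None

    def judge(n: int, claimed_divisor: Optional[int]) -> str:
        """The judge only checks the one fact put in front of it."""
        if claimed_divisor is None:
            return "A wins: no successful attack was produced"
        if 1 < claimed_divisor < n and n % claimed_divisor == 0:
            return f"B wins: {claimed_divisor} divides {n}, so it isn't prime"
        return "A wins: the proposed attack failed"

    for n in (221, 223):  # 221 = 13 * 17; 223 is prime
        print(n, "->", judge(n, debater_b_attack(n)))
    ```

    In the full proposal the debaters are powerful AI systems and the claims are far harder to check, but the hope is the same: if the strongest attack an equally capable opponent can find doesn't land, that's evidence the claim would survive every line of argument.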

    Get this episode by subscribing: type '80,000 Hours' into your podcasting app.

    The 80,000 Hours Podcast is produced by Keiran Harris.

  • Can there be a more exciting and strange place to work today than a leading AI lab? Your CEO has said they're worried your research could cause human extinction. The government is setting up meetings to discuss how this outcome can be avoided. Some of your colleagues think this is all overblown; others are more anxious still.

    Today's guest — machine learning researcher Rohin Shah — goes into the Google DeepMind offices each day with that peculiar backdrop to his work.

    Links to learn more, summary and full transcript.

    He's on the team dedicated to maintaining 'technical AI safety' as these models approach and exceed human capabilities: basically that the models help humanity accomplish its goals without flipping out in some dangerous way. This work has never seemed more important.

    In the short-term it could be the key bottleneck to deploying ML models in high-stakes real-life situations. In the long-term, it could be the difference between humanity thriving and disappearing entirely.

    For years Rohin has been on a mission to fairly hear out people across the full spectrum of opinion about risks from artificial intelligence -- from doomers to doubters -- and properly understand their point of view. That makes him unusually well placed to give an overview of what we do and don't understand. He has landed somewhere in the middle — troubled by ways things could go wrong, but not convinced there are very strong reasons to expect a terrible outcome.

    Today's conversation is wide-ranging and Rohin lays out many of his personal opinions to host Rob Wiblin, including:

    • What he sees as the strongest case both for and against slowing down the rate of progress in AI research
    • Why he disagrees with most other ML researchers that training a model on a sensible 'reward function' is enough to get a good outcome
    • Why he disagrees with many on LessWrong that the bar for whether a safety technique is helpful is “could this contain a superintelligence”
    • That he thinks nobody has very compelling arguments that AI created via machine learning will be dangerous by default, or that it will be safe by default — he believes we just don't know
    • That he understands that analogies and visualisations are necessary for public communication, but is sceptical that they really help us understand what's going on with ML models, because they're different in important ways from every other case we might compare them to
    • Why he's optimistic about DeepMind’s work on scalable oversight, mechanistic interpretability, and dangerous capabilities evaluations, and what each of those projects involves
    • Why he isn't inherently worried about a future where we're surrounded by beings far more capable than us, so long as they share our goals to a reasonable degree
    • Why it's not enough for humanity to know how to align AI models — it's essential that management at AI labs correctly pick which methods they're going to use and have the practical know-how to apply them properly
    • Three observations that make him a little more optimistic: humans are a bit muddle-headed and not super goal-orientated; planes don't crash; and universities have specific majors in particular subjects
    • Plenty more besides

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

    Producer: Keiran Harris

    Audio mastering: Milo McGuire, Dominic Armstrong, and Ben Cordell

    Transcriptions: Katy Moore

  • Originally released in August 2021.

    Chris Olah has had a fascinating and unconventional career path.

    Most people who want to pursue a research career feel they need a degree to get taken seriously. But Chris doesn't just lack a PhD; he doesn't even have an undergraduate degree. After dropping out of university to help defend an acquaintance who was facing bogus criminal charges, Chris started independently working on machine learning research, and eventually got an internship at Google Brain, a leading AI research group.

    In this interview — a follow-up to our episode on his technical work — we discuss what, if anything, can be learned from his unusual career path. Should more people pass on university and just throw themselves at solving a problem they care about? Or would it be foolhardy for others to try to copy a unique case like Chris’?

    Links to learn more, summary and full transcript.

    We also cover some of Chris' personal passions over the years, including his attempts to reduce what he calls 'research debt' by starting a new academic journal called Distill, focused just on explaining existing results unusually clearly.

    As Chris explains, as fields develop they accumulate huge bodies of knowledge that researchers are meant to be familiar with before they start contributing themselves. But the weight of that existing knowledge — and the need to keep up with what everyone else is doing — can become crushing. It can take someone until their 30s or later to earn their stripes, and sometimes a field will split in two just to make it possible for anyone to stay on top of it.

    If that were unavoidable it would be one thing, but Chris thinks we're nowhere near communicating existing knowledge as well as we could. Incrementally improving an explanation of a technical idea might take a single author weeks to do, but could go on to save a day for thousands, tens of thousands, or hundreds of thousands of students, if it becomes the best option available.

    Despite that, academics have little incentive to produce outstanding explanations of complex ideas that can speed up the education of everyone coming up in their field. And some even see the process of deciphering bad explanations as a desirable rite of passage all should pass through, just as they did.

    So Chris tried his hand at chipping away at this problem — but concluded the nature of the problem wasn't quite what he originally thought. In this conversation we talk about that, as well as:

    • Why highly thoughtful cold emails can be surprisingly effective, but average cold emails do little
    • Strategies for growing as a researcher
    • Thinking about research as a market
    • How Chris thinks about writing outstanding explanations
    • The concept of 'micromarriages' and ‘microbestfriendships’
    • And much more.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Ben Cordell
    Transcriptions: Sofia Davis-Fogel

  • Originally released in December 2022.

    Large language models like GPT-3, and now ChatGPT, are neural networks trained on a large fraction of all text available on the internet to do one thing: predict the next word in a passage. This simple technique has led to something extraordinary — black boxes able to write TV scripts, explain jokes, produce satirical poetry, answer common factual questions, argue sensibly for political positions, and more. Every month their capabilities grow.
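
    As a concrete, deliberately tiny illustration of that training objective (predict the next word from the words so far), here is a bigram counter in Python. Real models use neural networks trained on internet-scale text rather than word counts; the toy corpus below is made up.

    ```python
    from collections import Counter, defaultdict

    # Count, for each word, which words follow it in the (made-up) corpus.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    next_word_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        next_word_counts[prev][nxt] += 1

    def predict_next(word: str) -> dict:
        """Return a probability distribution over the next word."""
        counts = next_word_counts[word]
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    print(predict_next("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
    print(predict_next("sat"))  # {'on': 1.0}
    ```

    A large language model does the same job with a vastly richer model of context, which is what makes its predictions, and the text generated by repeatedly sampling from them, so much more capable.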

    But do they really 'understand' what they're saying, or do they just give the illusion of understanding?

    Today's guest, Richard Ngo, thinks that in the most important sense they understand many things. Richard is a researcher at OpenAI — the company that created ChatGPT — who works to foresee where AI advances are going and develop strategies that will keep these models from 'acting out' as they become more powerful, are deployed and ultimately given power in society.

    Links to learn more, summary and full transcript.

    One way to think about 'understanding' is as a subjective experience. Whether it feels like something to be a large language model is an important question, but one we currently have no way to answer.

    However, as Richard explains, another way to think about 'understanding' is as a functional matter. If you really understand an idea you're able to use it to reason and draw inferences in new situations. And that kind of understanding is observable and testable.

    Richard argues that language models are developing sophisticated representations of the world which can be manipulated to draw sensible conclusions — maybe not so different from what happens in the human mind. And experiments have found that, as models get more parameters and are trained on more data, these types of capabilities consistently improve.

    We might feel reluctant to say a computer understands something the way that we do. But if it walks like a duck and it quacks like a duck, we should consider that maybe we have a duck, or at least something sufficiently close to a duck that it doesn't matter.

    In today's conversation we discuss the above, as well as:

    • Could speeding up AI development be a bad thing?
    • The balance between excitement and fear when it comes to AI advances
    • Why OpenAI focuses its efforts where it does
    • Common misconceptions about machine learning
    • How many computer chips it might take for AI to be able to do most of the things humans do
    • How Richard understands the 'alignment problem' differently than other people
    • Why 'situational awareness' may be a key concept for understanding the behaviour of AI models
    • What work to positively shape the development of AI Richard is and isn't excited about
    • The AGI Safety Fundamentals course that Richard developed to help people learn more about this field

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Milo McGuire and Ben Cordell
    Transcriptions: Katy Moore

  • Originally released in July 2020.

    80,000 Hours, along with many other members of the effective altruism movement, has argued that helping to positively shape the development of artificial intelligence may be one of the best ways to have a lasting, positive impact on the long-term future. Millions of dollars in philanthropic spending, as well as lots of career changes, have been motivated by these arguments.

    Today’s guest, Ben Garfinkel, Research Fellow at Oxford’s Future of Humanity Institute, supports the continued expansion of AI safety as a field and believes working on AI is among the very best ways to have a positive impact on the long-term future. But he also believes the classic AI risk arguments have been subject to insufficient scrutiny given this level of investment.

    In particular, the case for working on AI if you care about the long-term future has often been made on the basis of concern about AI accidents: the worry that it's actually quite difficult to design systems you can feel confident will behave the way you want them to in all circumstances.

    Nick Bostrom wrote the most fleshed-out version of the argument in his book, Superintelligence. But Ben reminds us that, apart from Bostrom's book and essays by Eliezer Yudkowsky, there's very little existing writing on existential accidents.

    Links to learn more, summary and full transcript.

    Very few sceptical experts have actually sat down and fully engaged with the classic arguments, writing down point by point where they disagree or where they think the mistakes are. Ben has done exactly that, which means he has probably scrutinised the classic AI risk arguments as carefully as almost anyone else in the world.

    He thinks the arguments for existential accidents often rely on fuzzy, abstract concepts like optimisation power, general intelligence, or goals, together with toy thought experiments. And he doesn’t think it’s clear we should take these as a strong source of evidence.

    Ben’s also concerned that these scenarios often involve massive jumps in the capabilities of a single system, but it's really not clear that we should expect such jumps or find them plausible. These toy examples also focus on the idea that because human preferences are so nuanced and so hard to state precisely, it should be quite difficult to get a machine that can understand how to obey them.

    But Ben points out that it's also the case in machine learning that we can train lots of systems to engage in behaviours that are actually quite nuanced and that we can't specify precisely. If AI systems can recognise faces from images, and fly helicopters, why don’t we think they’ll be able to understand human preferences?

    Despite these concerns, Ben is still fairly optimistic about the value of working on AI safety or governance.

    He doesn’t think that there are any slam-dunks for improving the future, and so the fact that there are at least plausible pathways for impact by working on AI safety and AI governance, in addition to it still being a very neglected area, puts it head and shoulders above most areas you might choose to work in.

    This is the second episode hosted by our Strategy Advisor Howie Lempel, and he and Ben cover, among many other things:

    • The threat of AI systems increasing the risk of permanently damaging conflict or collapse
    • The possibility of permanently locking in a positive or negative future
    • Contenders for types of advanced systems
    • What role AI should play in the effective altruism portfolio

    Get this episode by subscribing: type 80,000 Hours into your podcasting app. Or read the linked transcript.

    Producer: Keiran Harris.
    Audio mastering: Ben Cordell.
    Transcriptions: Zakee Ulhaq.

  • Originally released in May 2023.

    It’s easy to dismiss alarming AI-related predictions when you don’t know where the numbers came from.

    For example: what if we told you that within 15 years, it’s likely that we’ll see a 1,000x improvement in AI capabilities in a single year? And what if we then told you that those improvements would lead to explosive economic growth unlike anything humanity has seen before?

    You might think, “Congratulations, you said a big number — but this kind of stuff seems crazy, so I’m going to keep scrolling through Twitter.”

    But this 1,000x yearly improvement is a prediction based on *real economic models* created by today’s guest Tom Davidson, Senior Research Analyst at Open Philanthropy. By the end of the episode, you’ll either be able to point out specific flaws in his step-by-step reasoning, or have to at least *consider* the idea that the world is about to get — at a minimum — incredibly weird.
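
    To get a feel for what a 1,000x improvement in a single year would mean, here's some rough arithmetic of our own (purely illustrative, not Tom's actual model): at a constant exponential rate, growing 1,000x in a year means capabilities double roughly every five weeks.

    ```python
    import math

    # Convert a hypothetical 1,000x annual improvement into a doubling time,
    # assuming constant exponential growth. Illustrative arithmetic only.
    annual_multiplier = 1_000
    doublings_per_year = math.log2(annual_multiplier)   # ~9.97 doublings per year
    days_per_doubling = 365 / doublings_per_year        # ~36.6 days
    print(f"{doublings_per_year:.1f} doublings per year, one every {days_per_doubling:.0f} days")
    ```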

    Links to learn more, summary and full transcript.

    As a teaser, consider the following:

    Developing artificial general intelligence (AGI) — AI that can do 100% of cognitive tasks at least as well as the best humans can — could very easily lead us to an unrecognisable world.

    You might think having to train AI systems individually to do every conceivable cognitive task — one for diagnosing diseases, one for doing your taxes, one for teaching your kids, etc. — sounds implausible, or at least like it’ll take decades.

    But Tom thinks we might not need to train AI to do every single job — we might just need to train it to do one: AI research.

    And building AI capable of doing research and development might be a much easier task — especially given that the researchers training the AI are AI researchers themselves.

    And once an AI system is as good at accelerating future AI progress as the best humans are today — and we can run billions of copies of it round the clock — it’s hard to make the case that we won’t achieve AGI very quickly.

    To give you some perspective: 17 years ago we saw the launch of Twitter, the release of Al Gore's *An Inconvenient Truth*, and your first chance to play the Nintendo Wii.

    Tom thinks that if we have AI that significantly accelerates AI R&D, then it’s hard to imagine not having AGI 17 years from now.

    Wild.

    Host Luisa Rodriguez gets Tom to walk us through his careful reports on the topic, and how he came up with these numbers, across a terrifying but fascinating three hours.

    Luisa and Tom also discuss:

    • How we might go from GPT-4 to AI disaster
    • Tom’s journey from finding AI risk to be kind of scary to really scary
    • Whether international cooperation or an anti-AI social movement can slow AI progress down
    • Why it might take just a few years to go from pretty good AI to superhuman AI
    • How quickly the number and quality of computer chips we’ve been using for AI have been increasing
    • The pace of algorithmic progress
    • What ants can teach us about AI
    • And much more

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Simon Monsour and Ben Cordell
    Transcriptions: Katy Moore

  • Originally released in July 2019.

    From 1870 to 1950, the introduction of electricity transformed life in the US and UK, as people gained access to lighting, radio and a wide range of household appliances for the first time. Electricity turned out to be a general purpose technology that could help with almost everything people did.

    Some think this is the best historical analogy we have for how machine learning could alter life in the 21st century.

    In addition to massively changing everyday life, past general purpose technologies have also changed the nature of war. For example, when electricity was introduced to the battlefield, commanders gained the ability to communicate quickly with units in the field over great distances.

    How might international security be altered if the impact of machine learning reaches a similar scope to that of electricity? Today's guest — Helen Toner — recently helped found the Center for Security and Emerging Technology at Georgetown University to help policymakers prepare for such disruptive technical changes that might threaten international peace.

    • Links to learn more, summary and full transcript
    • Philosophy is one of the hardest grad programs. Is it worth it, if you want to use ideas to change the world? by Arden Koehler and Will MacAskill
    • The case for building expertise to work on US AI policy, and how to do it by Niel Bowerman
    • AI strategy and governance roles on the job board

    Their first focus is machine learning (ML), a technology which allows computers to recognise patterns, learn from them, and develop 'intuitions' that inform their judgement about future cases. This is something humans do constantly, whether we're playing tennis, reading someone's face, diagnosing a patient, or figuring out which business ideas are likely to succeed.

    Sometimes these ML algorithms can seem uncannily insightful, and they're only getting better over time. Ultimately a wide range of different ML algorithms could end up helping us with all kinds of decisions, just as electricity wakes us up, makes us coffee, and brushes our teeth -- all in the first five minutes of our day.

    Rapid advances in ML, and the many prospective military applications, have people worrying about an 'AI arms race' between the US and China. Henry Kissinger and former Google CEO Eric Schmidt recently wrote that AI could "destabilize everything from nuclear détente to human friendships." Some politicians talk of classifying and restricting access to ML algorithms, lest they fall into the wrong hands.

    But if electricity is the best analogy, you could reasonably ask — was there an arms race in electricity in the 19th century? Would that have made any sense? And could someone have changed the course of history by changing who first got electricity and how they used it, or is that a fantasy?

    In today's episode we discuss the research frontier in the emerging field of AI policy and governance, how to have a career shaping US government policy, and Helen's experience living and studying in China.

    We cover:

    • Why immigration is the main policy area that should be affected by AI advances today.
    • Why talking about an 'arms race' in AI is premature.
    • How Bobby Kennedy may have positively affected the Cuban Missile Crisis.
    • Whether it's possible to become a China expert and still get a security clearance.
    • Can access to ML algorithms be restricted, or is that just not practical?
    • Whether AI could help stabilise authoritarian regimes.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    The 80,000 Hours Podcast is produced by Keiran Harris.

  • Originally released in June 2022.

    If a business has spent $100 million developing a product, it's a fair bet that they don't want it stolen in two seconds and uploaded to the web where anyone can use it for free.

    This problem exists in extreme form for AI companies. These days, the electricity and equipment required to train cutting-edge machine learning models that generate uncannily human-like text and images can cost tens or hundreds of millions of dollars. But once trained, such models may be only a few gigabytes in size and run just fine on ordinary laptops.
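
    To see why a trained model can be so small relative to what it cost to train, here's a back-of-the-envelope calculation (our own illustration with made-up numbers, not Anthropic's figures): a trained model is essentially just its parameters, so its size on disk is roughly the parameter count times the bytes stored per parameter.

    ```python
    # Hypothetical numbers, purely for illustration.
    n_parameters = 3e9          # a 3-billion-parameter model
    bytes_per_parameter = 2     # 16-bit floating-point weights
    size_gb = n_parameters * bytes_per_parameter / 1e9
    print(f"~{size_gb:.0f} GB")  # ~6 GB: small enough to copy off a server and run on a laptop
    ```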

    Today's guest, the computer scientist and polymath Nova DasSarma, works on computer and information security for the AI company Anthropic. One of her jobs is to stop hackers exfiltrating Anthropic's incredibly expensive intellectual property, as recently happened to Nvidia. As she explains, given models’ small size, the need to store such models on internet-connected servers, and the poor state of computer security in general, this is a serious challenge.

    Links to learn more, summary and full transcript.

    The worries aren't purely commercial though. This problem looms especially large for the growing number of people who expect that in coming decades we'll develop so-called artificial 'general' intelligence systems that can learn and apply a wide range of skills all at once, and thereby have a transformative effect on society.

    If aligned with the goals of their owners, such general AI models could operate like a team of super-skilled assistants, going out and doing whatever wonderful (or malicious) things are asked of them. This might represent a huge leap forward for humanity, though the transition to a very different new economy and power structure would have to be handled delicately.

    If unaligned with the goals of their owners or humanity as a whole, such broadly capable models would naturally 'go rogue,' breaking their way into additional computer systems to grab more computing power — all the better to pursue their goals and make sure they can't be shut off.

    As Nova explains, in either case, we don't want such models disseminated all over the world before we've confirmed they are deeply safe and law-abiding, and have figured out how to integrate them peacefully into society. In the first scenario, premature mass deployment would be risky and destabilising. In the second scenario, it could be catastrophic -- perhaps even leading to human extinction if such general AI systems turn out to be able to self-improve rapidly rather than slowly.

    If highly capable general AI systems are coming in the next 10 or 20 years, Nova may be flying below the radar with one of the most important jobs in the world.

    We'll soon need the ability to 'sandbox' (i.e. contain) models with a wide range of superhuman capabilities, including the ability to learn new skills, for a period of careful testing and limited deployment — preventing the model from breaking out, and criminals from breaking in. Nova and her colleagues are trying to figure out how to do this, but as this episode reveals, even the state of the art is nowhere near good enough.

    In today's conversation, Rob and Nova cover:

    • How good or bad information security is today
    • The most secure computer systems that exist
    • How to design an AI training compute centre for maximum efficiency
    • Whether 'formal verification' can help us design trustworthy systems
    • How wide the gap is between AI capabilities and AI safety
    • How to disincentivise hackers
    • What listeners should do to strengthen their own security practices
    • And much more.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

    Producer: Keiran Harris
    Audio mastering: Ben Cordell and Beppe Rådvik
    Transcriptions: Katy Moore

  • Originally released in November 2018.

    After dropping out of a machine learning PhD at Stanford, Daniel Ziegler needed to decide what to do next. He’d always enjoyed building stuff and wanted to shape the development of AI, so he thought a research engineering position at an org dedicated to aligning AI with human interests could be his best option.

    He decided to apply to OpenAI, and spent about 6 weeks preparing for the interview before landing the job. His PhD, by contrast, might have taken 6 years. Daniel thinks this highly accelerated career path may be possible for many others.

    On today’s episode Daniel is joined by Catherine Olsson, who has also worked at OpenAI, and left her computational neuroscience PhD to become a research engineer at Google Brain. She and Daniel share this piece of advice for those curious about this career path: just dive in. If you're trying to get good at something, just start doing that thing, and figure out that way what's necessary to be able to do it well.

    Catherine has even created a simple step-by-step guide for 80,000 Hours, to make it as easy as possible for others to copy her and Daniel's success.

    Blog post with links to learn more, a summary & full transcript.

    Daniel thinks the key for him was nailing the job interview.

    OpenAI needed him to demonstrate that he could do the kind of work he'd be doing day-to-day. So his approach was to take a list of 50 key deep reinforcement learning papers, read one or two a day, and pick a handful to actually reproduce. He spent a bunch of time coding in Python and TensorFlow, sometimes 12 hours a day, trying to debug and tune things until they were actually working.
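
    To give a sense of the genre at its very smallest scale, here's a from-scratch policy-gradient (REINFORCE) sketch on a two-armed bandit in plain NumPy. It's our own toy illustration of the kind of exercise Daniel describes, not his actual interview prep code, and the reward numbers are invented.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.8])   # expected reward of each arm (invented)
    logits = np.zeros(2)                # policy parameters
    lr = 0.1

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for step in range(2000):
        probs = softmax(logits)
        action = rng.choice(2, p=probs)
        reward = rng.normal(true_means[action], 0.1)
        grad_log_pi = -probs                 # REINFORCE: grad log pi(a) = one_hot(a) - probs
        grad_log_pi[action] += 1.0
        logits += lr * reward * grad_log_pi  # ascend expected reward

    print(softmax(logits))  # should strongly favour the better arm (index 1)
    ```

    Reproducing a real paper means doing the same kind of thing with much larger networks, trickier environments, and many more details to get right, which is exactly why it's such good practice for the job.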

    Daniel emphasizes that the most important thing was to practice *exactly* those things that he knew he needed to be able to do. His dedicated preparation also led to an offer from the Machine Intelligence Research Institute, and so he had the opportunity to decide between two organisations focused on the global problem that most concerns him.

    Daniel’s path might seem unusual, but both he and Catherine expect it can be replicated by others. If they're right, it could greatly increase our ability to get new people into important ML roles in which they can make a difference, as quickly as possible.

    Catherine says that her move from OpenAI to an ML research team at Google now allows her to bring a different set of skills to the table. Technical AI safety is a multifaceted area of research, and the many sub-questions in areas such as reward learning, robustness, and interpretability all need to be answered to maximize the probability that AI development goes well for humanity.

    Today’s episode combines the expertise of two pioneers and is a key resource for anyone wanting to follow in their footsteps. We cover:

    • What are OpenAI and Google Brain doing?
    • Why work on AI?
    • Do you learn more on the job, or while doing a PhD?
    • Controversial issues within ML
    • Is replicating papers a good way of determining suitability?
    • What % of software developers could make similar transitions?
    • How in-demand are research engineers?
    • The development of Dota 2 bots
    • Do research scientists have more influence on the vision of an org?
    • Has learning more made you more or less worried about the future?

    Get this episode by subscribing: type '80,000 Hours' into your podcasting app.

    The 80,000 Hours Podcast is produced by Keiran Harris.

  • Originally released in August 2022.

    Today’s release is a professional reading of our new problem profile on preventing an AI-related catastrophe, written by Benjamin Hilton.

    We expect that there will be substantial progress in AI in the next few decades, potentially even to the point where machines come to outperform humans in many, if not all, tasks. This could have enormous benefits, helping to solve currently intractable global problems, but could also pose severe risks. These risks could arise accidentally (for example, if we don’t find technical solutions to concerns about the safety of AI systems), or deliberately (for example, if AI systems worsen geopolitical conflict). We think more work needs to be done to reduce these risks.

    Some of these risks from advanced AI could be existential — meaning they could cause human extinction, or an equally permanent and severe disempowerment of humanity. There have not yet been any satisfying answers to concerns about how this rapidly approaching, transformative technology can be safely developed and integrated into our society. Finding answers to these concerns is very neglected, and may well be tractable. We estimate that there are around 300 people worldwide working directly on this. As a result, the possibility of AI-related catastrophe may be the world’s most pressing problem — and the best thing to work on for those who are well-placed to contribute.

    Promising options for working on this problem include technical research on how to create safe AI systems, strategy research into the particular risks AI might pose, and policy research into ways in which companies and governments could mitigate these risks. If worthwhile policies are developed, we’ll need people to put them in place and implement them. There are also many opportunities to have a big impact in a variety of complementary roles, such as operations management, journalism, earning to give, and more.

    If you want to check out the links, footnotes and figures in today’s article, you can find those here.

    Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app.

    Producer: Keiran Harris
    Editing and narration: Perrin Walker and Shaun Acker
    Audio proofing: Katy Moore

  • Article originally published February 2022.

    In this episode of 80k After Hours, Perrin Walker reads our career review of China-related AI safety and governance paths.

    Here’s the original piece if you’d like to learn more.

    You might also want to check out Benjamin Todd and Brian Tse's article on Improving China-Western coordination on global catastrophic risks.

    Get this episode by subscribing to our more experimental podcast on the world’s most pressing problems and how to solve them: type 80k After Hours into your podcasting app.

    Editing and narration: Perrin Walker
    Audio proofing: Katy Moore