Episodes
-
Feedback is essential for learning. Whether you’re studying for a test, trying to improve at work, or working to master a difficult skill, you need feedback.
The challenge is that feedback can often be hard to get. Worse, if you get bad feedback, you may end up worse off than before.
Original text:
https://www.scotthyoung.com/blog/2019/01/24/how-to-get-feedback/
Author:
Scott Young
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website. -
I’ve been obsessed with managing information and communication in a remote team since Get on Board started growing. Reducing the bus factor is a primary motivation — but another just as important is diminishing reliance on synchronicity. When what I know is documented and accessible to others, I’m less likely to be a bottleneck for anyone else in the team. So if I’m busy, minding family matters, on vacation, or sick, I won’t be blocking anyone.
This, in turn, gives everyone in the team the freedom to build their own work schedules according to their needs, work from any time zone, or enjoy more distraction-free moments. As I write these lines, most of the world is under quarantine, relying on non-stop video calls to continue working. Needless to say, that is not a sustainable long-term work schedule.
Original text:
https://www.getonbrd.com/blog/public-by-default-how-we-manage-information-visibility-at-get-on-board
Author:
Sergio Nouvel
-
(In the process of answering an email, I accidentally wrote a tiny essay about writing. I usually spend weeks on an essay. This one took 67 minutes—23 of writing, and 44 of rewriting.)
Original text:
https://paulgraham.com/writing44.html
Author:
Paul Graham
-
This introduces the concept of Pareto frontiers. The top comment by Rob Miles also ties it to comparative advantage.
While reading, consider what Pareto frontiers your project could place you on.
Original text:
https://www.lesswrong.com/posts/XvN2QQpKTuEzgkZHY/being-the-pareto-best-in-the-world
Author:
John Wentworth
-
I am approaching the end of my AI governance PhD, and I’ve spent about 2.5 years as a researcher at FHI. During that time, I’ve learnt a lot about the formula for successful early-career research.
This post summarises my advice for people in the first couple of years. Research is really hard, and I want people to avoid the mistakes I’ve made.
Original text:
https://forum.effectivealtruism.org/posts/jfHPBbYFzCrbdEXXd/how-to-succeed-as-an-early-stage-researcher-the-lean-startup#Conclusion
Author:
Toby Shevlane
-
The next four weeks of the course are an opportunity for you to actually build a thing that moves you closer to contributing to AI Alignment, and we're really excited to see what you do!
A common failure mode is to think "Oh, I can't actually do X" or to say "Someone else is probably doing Y."
You probably can do X, and it's unlikely anyone is doing Y! It could be you!
Original text:
https://www.neelnanda.io/blog/become-a-person-who-actually-does-things
Author:
Neel Nanda
-
We took 10 years of research and what we’ve learned from advising 1,000+ people on how to build high-impact careers, compressed that into an eight-week course to create your career plan, and then compressed that into this three-page summary of the main points.
(It’s especially aimed at people who want a career that’s both satisfying and has a significant positive impact, but much of the advice applies to all career decisions.)
Original article:
https://80000hours.org/career-planning/summary/
Author:
Benjamin Todd
-
This guide is written for people who are considering direct work on technical AI alignment. I expect it to be most useful for people who are not yet working on alignment, and for people who are already familiar with the arguments for working on AI alignment. If you aren’t familiar with the arguments for the importance of AI alignment, you can get an overview of them by doing the AI Alignment Course.
by Charlie Rogers-Smith, with minor updates by Adam Jones
Source:
https://aisafetyfundamentals.com/blog/alignment-careers-guide
Narrated for AI Safety Fundamentals by Perrin Walker
-
This post summarises a new report, “Computing Power and the Governance of Artificial Intelligence.” The full report is a collaboration between nineteen researchers from academia, civil society, and industry. It can be read here.
GovAI research blog posts represent the views of their authors, rather than the views of the organisation.
Source:
https://www.governance.ai/post/computing-power-and-the-governance-of-ai
Narrated for AI Safety Fundamentals by Perrin Walker
-
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion. This paper explores techniques that prevent AI catastrophes even if AI instances are colluding to subvert the safety techniques. In this post:
We summarize the paper;
We compare our methodology to the methodology of other safety papers.
Source:
https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
Narrated for AI Safety Fundamentals by Perrin Walker
-
Most conversations around the societal impacts of artificial intelligence (AI) come down to discussing some quality of an AI system, such as its truthfulness, fairness, potential for misuse, and so on. We are able to talk about these characteristics because we can technically evaluate models for their performance in these areas. But what many people working inside and outside of AI don’t fully appreciate is how difficult it is to build robust and reliable model evaluations. Many of today’s existing evaluation suites are limited in their ability to serve as accurate indicators of model capabilities or safety.
At Anthropic, we spend a lot of time building evaluations to better understand our AI systems. We also use evaluations to improve our safety as an organization, as illustrated by our Responsible Scaling Policy. In doing so, we have grown to appreciate some of the ways in which developing and running evaluations can be challenging.
Here, we outline challenges that we have encountered while evaluating our own models to give readers a sense of what developing, implementing, and interpreting model evaluations looks like in practice.
Source:
https://www.anthropic.com/news/evaluating-ai-systems
Narrated for AI Safety Fundamentals by Perrin Walker
-
The UK recognises the enormous opportunities that AI can unlock across our economy and our society. However, without appropriate guardrails, such technologies can pose significant risks. The AI Safety Summit will focus on how best to manage the risks from frontier AI such as misuse, loss of control and societal harms. Frontier AI organisations play an important role in addressing these risks and promoting the safety of the development and deployment of frontier AI.
The UK has therefore encouraged frontier AI organisations to publish details on their frontier AI safety policies ahead of the AI Safety Summit hosted by the UK on 1 to 2 November 2023. This will provide transparency regarding how they are putting into practice voluntary AI safety commitments and enable the sharing of safety practices within the AI ecosystem. Transparency of AI systems can increase public trust, which can be a significant driver of AI adoption.
This document complements these publications by providing a potential list of frontier AI organisations’ safety policies.
Source:
https://www.gov.uk/government/publications/emerging-processes-for-frontier-ai-safety/emerging-processes-for-frontier-ai-safety
Narrated for AI Safety Fundamentals by Perrin Walker
-
Generative AI allows people to produce piles upon piles of images and words very quickly. It would be nice if there were some way to reliably distinguish AI-generated content from human-generated content. It would help people avoid endlessly arguing with bots online, or believing what a fake image purports to show. One common proposal is that big companies should incorporate watermarks into the outputs of their AIs. For instance, this could involve taking an image and subtly changing many pixels in a way that’s undetectable to the eye but detectable to a computer program. Or it could involve swapping words for synonyms in a predictable way so that the meaning is unchanged, but a program could readily determine the text was generated by an AI.
Unfortunately, watermarking schemes are unlikely to work. So far most have proven easy to remove, and it’s likely that future schemes will have similar problems.
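To make the pixel-watermarking idea concrete, here is a toy sketch (not any company's actual scheme; the function names and the bit pattern are made up for illustration): encode a known bit pattern into each pixel's least significant bit, which is invisible to the eye but detectable by a program that knows the pattern. The same sketch also shows why such schemes are fragile: any re-encoding that touches the low bits erases the mark.

```python
def embed_watermark(pixels, pattern):
    """Set each pixel's least significant bit to the corresponding pattern bit."""
    return [(p & ~1) | pattern[i % len(pattern)] for i, p in enumerate(pixels)]

def detect_watermark(pixels, pattern):
    """Return the fraction of pixels whose LSB matches the expected pattern."""
    matches = sum((p & 1) == pattern[i % len(pattern)] for i, p in enumerate(pixels))
    return matches / len(pixels)

pixels = [200, 137, 54, 91, 180, 33, 77, 240]  # toy 8-bit pixel values
pattern = [1, 0, 1, 1]                          # secret bit pattern, cycled

marked = embed_watermark(pixels, pattern)       # each pixel changes by at most 1
score = detect_watermark(marked, pattern)       # 1.0: perfect match

# Removing the mark is trivial: e.g. rounding every pixel to an even value
# clears all LSBs, and the detector's match rate drops to chance-ish levels.
erased = [p & ~1 for p in marked]
```

Real proposals are more sophisticated (spread-spectrum perturbations, statistical token biases for text), but as the post argues, they have so far shared this basic removability problem.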
Source:
https://transformer-circuits.pub/2023/monosemantic-features/index.html
Narrated for AI Safety Fundamentals by Perrin Walker
-
Research in mechanistic interpretability seeks to explain behaviors of machine learning (ML) models in terms of their internal components. However, most previous work either focuses on simple behaviors in small models or describes complicated behaviors in larger models with broad strokes. In this work, we bridge this gap by presenting an explanation for how GPT-2 small performs a natural language task called indirect object identification (IOI). Our explanation encompasses 26 attention heads grouped into 7 main classes, which we discovered using a combination of interpretability approaches relying on causal interventions. To our knowledge, this investigation is the largest end-to-end attempt at reverse-engineering a natural behavior "in the wild" in a language model. We evaluate the reliability of our explanation using three quantitative criteria: faithfulness, completeness, and minimality. Though these criteria support our explanation, they also point to remaining gaps in our understanding. Our work provides evidence that a mechanistic understanding of large ML models is feasible, pointing toward opportunities to scale our understanding to both larger models and more complex tasks. Code for all experiments is available at https://github.com/redwoodresearch/Easy-Transformer.
Source:
https://arxiv.org/pdf/2211.00593.pdf
Narrated for AI Safety Fundamentals by Perrin Walker
-
Using a sparse autoencoder, we extract a large number of interpretable features from a one-layer transformer.
Mechanistic interpretability seeks to understand neural networks by breaking them into components that are more easily understood than the whole. By understanding the function of each component, and how they interact, we hope to be able to reason about the behavior of the entire network. The first step in that program is to identify the correct components to analyze.
Unfortunately, the most natural computational unit of the neural network – the neuron itself – turns out not to be a natural unit for human understanding. This is because many neurons are polysemantic: they respond to mixtures of seemingly unrelated inputs. In the vision model Inception v1, a single neuron responds to faces of cats and fronts of cars. In a small language model we discuss in this paper, a single neuron responds to a mixture of academic citations, English dialogue, HTTP requests, and Korean text. Polysemanticity makes it difficult to reason about the behavior of the network in terms of the activity of individual neurons.
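The shape of the sparse-autoencoder approach can be sketched with a toy forward pass (made-up weights and dimensions; the paper's actual architecture and training procedure differ, e.g. it includes decoder biases and trains the weights by gradient descent with an L1 sparsity penalty): activations are encoded into an overcomplete set of features through a ReLU, and decoded back as a sum of feature directions.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def sae_encode(x, W_enc, b_enc):
    # features = ReLU(W_enc @ x + b_enc); the ReLU (plus an L1 penalty on
    # feature activations during training) is what pushes features to be sparse.
    return relu([sum(w * xi for w, xi in zip(row, x)) + b
                 for row, b in zip(W_enc, b_enc)])

def sae_decode(f, W_dec):
    # Reconstruct the activation vector as a weighted sum of feature directions.
    dim = len(W_dec[0])
    return [sum(W_dec[j][i] * f[j] for j in range(len(f))) for i in range(dim)]

# Toy sizes: 2-dim activations mapped to 3 candidate features (invented weights).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]

x = [1.0, 2.0]
f = sae_encode(x, W_enc, b_enc)   # third feature is zeroed out by the ReLU
x_hat = sae_decode(f, W_dec)
# Training objective: reconstruction error plus L1 sparsity penalty on f.
loss = sum((a - b) ** 2 for a, b in zip(x, x_hat)) + 0.01 * sum(abs(v) for v in f)
```

The hope is that the sparse features recovered this way are monosemantic, i.e. more interpretable than the polysemantic neurons described above.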
Source:
https://transformer-circuits.pub/2023/monosemantic-features/index.html
Narrated for AI Safety Fundamentals by Perrin Walker
-
By studying the connections between neurons, we can find meaningful algorithms in the weights of neural networks.
Many important transition points in the history of science have been moments when science “zoomed in.” At these points, we develop a visualization or tool that allows us to see the world in a new level of detail, and a new field of science develops to study the world through this lens.
For example, microscopes let us see cells, leading to cellular biology. Science zoomed in. Several techniques including x-ray crystallography let us see DNA, leading to the molecular revolution. Science zoomed in. Atomic theory. Subatomic particles. Neuroscience. Science zoomed in.
These transitions weren’t just a change in precision: they were qualitative changes in what the objects of scientific inquiry are. For example, cellular biology isn’t just more careful zoology. It’s a new kind of inquiry that dramatically shifts what we can understand.
The famous examples of this phenomenon happened at a very large scale, but it can also be the more modest shift of a small research community realizing they can now study their topic at a finer-grained level of detail.
Source:
https://distill.pub/2020/circuits/zoom-in/
Narrated for AI Safety Fundamentals by Perrin Walker
-
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? We test this using a range of pretrained language models in the GPT-4 family on natural language processing (NLP), chess, and reward modeling tasks. We find that when we naively fine-tune strong pretrained models on labels generated by a weak model, they consistently perform better than their weak supervisors, a phenomenon we call weak-to-strong generalization. However, we are still far from recovering the full capabilities of strong models with naive fine-tuning alone, suggesting that techniques like RLHF may scale poorly to superhuman models without further work.
We find that simple methods can often significantly improve weak-to-strong generalization: for example, when fine-tuning GPT-4 with a GPT-2-level supervisor and an auxiliary confidence loss, we can recover close to GPT-3.5-level performance on NLP tasks. Our results suggest that it is feasible to make empirical progress today on a fundamental challenge of aligning superhuman models.
Source:
https://arxiv.org/pdf/2312.09390.pdf
Narrated for AI Safety Fundamentals by Perrin Walker
-
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique for steering large language models (LLMs) toward desired behaviours. However, relying on simple human feedback doesn’t work for tasks that are too complex for humans to accurately judge at the scale needed to train AI models. Scalable oversight techniques attempt to address this by increasing the abilities of humans to give feedback on complex tasks.
This article briefly recaps some of the challenges faced with human feedback, and introduces the approaches to scalable oversight covered in session 4 of our AI Alignment course.
Source:
https://aisafetyfundamentals.com/blog/scalable-oversight-intro/
Narrated for AI Safety Fundamentals by Perrin Walker
-
The two tasks of supervised learning: regression and classification. Linear regression, loss functions, and gradient descent.
How much money will we make by spending more dollars on digital advertising? Will this loan applicant pay back the loan or not? What’s going to happen to the stock market tomorrow?
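The regression-plus-gradient-descent recipe the episode covers can be sketched in a few lines (a minimal illustration with invented toy data, not code from the article): repeatedly nudge the slope and intercept against the gradient of the mean squared error.

```python
def fit_linear(xs, ys, lr=0.02, steps=3000):
    """Fit y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w   # step downhill on the loss surface
        b -= lr * grad_b
    return w, b

# Toy data generated from the line y = 3x + 1; descent should recover w≈3, b≈1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]
w, b = fit_linear(xs, ys)
```

Classification follows the same loop with a different model and loss (e.g. logistic regression with cross-entropy instead of squared error).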
Original article:
https://medium.com/machine-learning-for-humans/supervised-learning-740383a2feab
Author:
Vishal Maini
-
The field of AI has undergone a revolution over the last decade, driven by the success of deep learning techniques. This post aims to convey three ideas using a series of illustrative examples:
1. There have been huge jumps in the capabilities of AIs over the last decade, to the point where it’s becoming hard to specify tasks that AIs can’t do.
2. This progress has been primarily driven by scaling up a handful of relatively simple algorithms (rather than by developing a more principled or scientific understanding of deep learning).
3. Very few people predicted that progress would be anywhere near this fast; but many of those who did also predicted that we might face existential risk from AGI in the coming decades.
I’ll focus on four domains: vision, games, language-based tasks, and science. The first two have more limited real-world applications, but provide particularly graphic and intuitive examples of the pace of progress.
Original article:
https://medium.com/@richardcngo/visualizing-the-deep-learning-revolution-722098eb9c5
Author:
Richard Ngo