AI Explained Official Podcast – Listen here

Episodes

Claude 4: Full 120 Page Breakdown … Is it the Best New Model?
22 May · AI Explained Official Podcast
Not only did I get early access and run my own tests, but, as per the title, I also read the 120-page Claude 4 Opus and Claude 4 Sonnet System Card and the 25-page report on ASL-3 being triggered, and went through the 2-hour launch video and surrounding coverage. Ft. coding tests, Simple, Twitter controversies, deep alignment coverage, spiritual bliss and much more!
https://80000hours.org/aiexplained
Chapters:
00:00 - Introduction
01:12 - 3 Quick Controversies
02:42 - Benchmark Results
04:20 - 120 page Card 20 Highlights
10:07 - Coding Test
11:27 - Model Welfare and Spiritual Bliss
13:29 - ASL-3

Claude Card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?s=09
ASL 3: https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdf
Tweets: https://x.com/fish_kyle3/status/1925597284546629753
https://x.com/EMostaque/status/1925624164527874452?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet

Cursor Says State of the Art for Coding: https://x.com/cursor_ai/status/1925594428095561941
Benchmarks: https://www.anthropic.com/news/claude-4
Google Takes No Prisoners Amid Torrent of AI Announcements
21 May · AI Explained Official Podcast
Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being usable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Veo 3 clips to enjoy…

https://80000hours.org/aiexplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:48 - Veo 3
02:10 - Gemini 2.5 Flash
03:13 - Universal Assistant
03:47 - Usage Skyrockets + OpenAI dig
04:51 - Gemini Pro Deep Think
06:21 - Overviews and AI Mode
07:26 - Deep Research Updates (new) + Jules
08:53 - Make and Deploy Apps with Gemini
09:12 - Imagen 4
10:00 - Gemini Diffusion
11:46 - Try It On
12:17 - SynthID Detector
13:30 - GemmaVerse, SignGemma, Gemma3n, medGemma
14:24 - Outro + Clips

Event: https://www.youtube.com/watch?v=o8NiE3XMPrM
Native Audio: https://aistudio.google.com/generate-speech
Gemini Diffusion: https://deepmind.google/models/gemini-diffusion/#capabilities
New Gemini 2.5 Flash: https://deepmind.google/models/gemini/flash/
SignGemma (See end of this vid): https://www.youtube.com/watch?v=GjvgtwSOCao
Deep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#flash-improvements
Google Parallel Sampling: https://www.patreon.com/posts/next-level-good-127441188

Price Plans: https://blog.google/products/google-one/google-ai-ultra/
Imagen 4 Benchmarks: https://deepmind.google/models/imagen/
Jules: https://jules.google/
SynthID Detector: https://blog.google/technology/ai/google-synthid-ai-content-detector/
Veo 3 Benchmarks: https://deepmind.google/models/veo/evals/
MedGemma: https://deepmind.google/models/gemma/medgemma/
Build Apps: https://aistudio.google.com/apps

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
AI Improves at Self-improving
19 May · AI Explained Official Podcast
AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system.

Gray Swan: http://app.grayswan.ai/ai-explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:27 - AlphaEvolve
05:23 - Limitation
06:10 - Achievements
08:21 - Future Improvements
13:30 - Quirks
16:34 - Final Thoughts

AlphaEvolve release: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

Terence Tao Quote: https://mathstodon.xyz/@tao/114508029896631083

Nature Article: https://www.nature.com/articles/s41586-022-05172-4
MIT Article: https://www.technologyreview.com/2025/05/14/1116438/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems/
AI Co-Scientist: https://arxiv.org/pdf/2502.18864

OpenAI Codex: https://openai.com/index/introducing-codex/

70% of Pull Requests: https://x.com/slow_developer/status/1920920456393028027

Amodei Essay: https://www.darioamodei.com/essay/machines-of-loving-grace

OpenAI Jason Wei Tweet: https://x.com/_jasonwei/status/1923091260354531612

PromptBreeder: https://arxiv.org/pdf/2309.16797
DrEureka: https://arxiv.org/pdf/2406.01967

FT DeepMind: https://www.ft.com/content/4e497a91-670a-4f69-be4a-18e247daba3e

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
o3 breaks (some) records, but AI becomes pay-to-win
25 Apr · AI Explained Official Podcast
A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.

https://app.grayswan.ai/ai-explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:33 - FictionLiveBench
01:37 - PHYBench
02:14 - SimpleBench
02:54 - Virology Capabilities Test
03:13 - Mathematics Performance
04:29 - Vision Benchmarks
05:43 - V* and how o3 works
06:44 - Revenue and costs for you
08:54 - Expensive RL and trade-offs
09:40 - How to spend the OOMs
13:27 - Gray Swan Arena

Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
PHYBench: https://arxiv.org/pdf/2504.16074
Virologytest: https://www.virologytest.ai/
How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
Visual puzzles: https://neulab.github.io/VisualPuzzles/
Fiction Bench: https://x.com/ficlive/status/1912863028141244850
https://geobench.org/
https://simple-bench.com/
AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
USAMO: https://x.com/mbalunovic/status/1914398518896193747
NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
GPU Trade-offs: https://x.com/sama/status/1915098951067554030
RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
Model Size: https://x.com/slow_developer/status/1874554473256997201
Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
Papers on Patreon: https://arxiv.org/pdf/2502.01839
https://arxiv.org/pdf/2504.13837
Chollet Quote: https://x.com/fchollet/status/1912934762580447447
OpenSim: https://opensim.stanford.edu/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
o3 and o4-mini - they’re great, but easy to over-hype
16 Apr · AI Explained Official Podcast
Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - o3 and o4-mini

https://simple-bench.com/

Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679

System Card: https://openai.com/index/o3-o4-mini-system-card/

Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/

https://deepmind.google/technologies/gemini/pro/

https://x.com/DeryaTR_/status/1912558350794961168

https://x.com/polynoamial/status/1912564068168450396

API Pricing: https://openai.com/api/pricing/

https://aider.chat/docs/leaderboards/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed
16 Apr · AI Explained Official Podcast
This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.
https://www.emergentmind.com/
Chapters:
00:00 - Introduction
00:30 - Kling 2.0
01:35 - GPT 4.1
05:25 - o3 Build-up
07:37 - ‘Product Company’
09:31 - Safe Superintelligence
10:54 - DolphinGemma
13:16 - Data Dominance?
Kling 2.0: https://app.klingai.com/global/release-notes
Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09
https://openai.com/index/gpt-4-1/
OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq
Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503
Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626
Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k
https://simple-bench.com/try-yourself
https://aider.chat/docs/leaderboards/
4.5: https://www.youtube.com/watch?v=6nJZopACRuQ
Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/
Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151
Evals: https://www.youtube.com/watch?v=scsW6_2SPC4
Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai
https://x.com/sethsaler/status/1912188383457059301
https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/
https://ai.meta.com/blog/llama-4-multimodal-intelligence/
https://deepmind.google/technologies/gemini/pro/
https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/
OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490
AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...
7 Apr · AI Explained Official Podcast
The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.

Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

DeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:47 - Stock Crash
02:28 - Llama 4
10:55 - o3 News
11:59 - OpenAI non-profit?
13:13 - AI 2027

Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJik

Knowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

Aider Polyglot: https://aider.chat/docs/leaderboards/

Gemini 1.5: https://arxiv.org/pdf/2403.05530

Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87

OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlock

OpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans

Deep research System Card: https://cdn.openai.com/deep-research-system-card.pdf

https://openai.com/index/paperbench/

AI 2027: https://ai-2027.com/

METR Paper: https://arxiv.org/pdf/2503.14499

OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/

NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09

Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like

https://simple-bench.com/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/
Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)
28 Mar · AI Explained Official Podcast
Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

… and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding
06:22 - WeirdML Bench
07:01 - Simple Bench Record High
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314

LiveCode Bench: https://livecodebench.github.io/

SWE-Verified: https://arxiv.org/pdf/2310.06770

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI
25 Mar · AI Explained Official Podcast
Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power Deepseek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more…

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:15 - Gemini 2.5 Benchmarks
05:46 - Long Context, Simple indication
07:08 - New Deepseek V3-0324
09:11 - Microsoft MAI
11:48 - 90% of code but new Claude jobs

‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975

Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/

Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihq

LMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1

Free for now: https://x.com/btibor91/status/1904578053537476628

Vista Bench: https://scale.com/leaderboard/visual_language_understanding

DeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324

Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemon
Amodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017s

Anthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008

Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/

https://simple-bench.com/

Release Date Comments: https://x.com/zacharynado/status/1904647277861318979

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)
13 Mar · AI Explained Official Podcast
Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as millions enquire about Manus AI.

https://app.grayswan.ai/arena

AI Insiders ($9!): https://www.patreon.com/AIExplained
Patreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767

Chapters:
00:00 - Introduction
00:46 - Hype Campaign
02:40 - Single, Public Benchmark
03:12 - What is Manus AI?
04:22 - Test 1
05:12 - Cost and Rate Limits
06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch
08:24 - Test 3 (not AGI)
11:10 - 4 Trends in AI in 2025
11:37 - Hype Works

Manus AI: https://manus.im/app

Xiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensation

Gaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3
MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihq

Hype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940
https://x.com/EHuanglu/status/1899110687902978373
https://x.com/AJs_AI/status/1898756132384178291

Mistakes: https://x.com/TheXeophon/status/1898737178273829220

Tools and Code: https://x.com/peakji/status/1898994802194346408

https://operator.chatgpt.com/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/
GPT 4.5 - not so much wow
28 Feb · AI Explained Official Podcast
GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (reddit was an unreliable source), and why it’s not all bad news for OpenAI.

https://www.emergentmind.com/

AI Insiders (now $9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:04 - Details and Benchmarks
03:04 - Emotional intelligence?
08:37 - Creative writing?
11:40 - Visual reasoning and Pricing
12:41 - Simple Performance
16:01 - End of Pretraining Scaling?
17:03 - CEO Hype
18:11 - System Card Highlights
23:32 - Karpathy Reaction

GPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf
Release Notes: https://openai.com/index/gpt-4-5-system-card/
Altman Hype: https://x.com/sama/status/1891533802779910471
Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792
End of an Era: https://x.com/wgussml/status/1895187231666774377
Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/
Smell: https://x.com/rapha_gl/status/1895213014699385082
Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265
Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdf
Reddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/
API Pricing: https://openai.com/api/pricing/
LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1s
https://simple-bench.com/

Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863
https://x.com/karpathy/status/1895337579589079434

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)
25 Feb · AI Explained Official Podcast
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.
GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming
https://x.com/GraySwanAI/status/1894084923260043282
Chapters:
00:00 - Introduction
01:25 - Claude 3.7 New Stats/Demos
05:22 - 128k Output
06:13 - Pokemon
06:58 - Just a tool?
09:54 - DeepSeek R2
10:20 - Claude 3.7 System Card/Paper Highlights
17:18 - Simple Record Score/Competition
20:37 - Grok 3 + Redteaming prizes
22:26 - Google Co-scientist
24:02 - Humanoid Robot Developments
3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet
vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959
Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09
System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025
System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf
Unfaithful CoT: https://arxiv.org/pdf/2305.04388
Original Constitution: https://www.anthropic.com/news/claudes-constitution
Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf
Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo
https://simple-bench.com/
400 Weekly Users: https://x.com/bradlightcap/status/1892579908179882057
Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280
Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156s
DeepSeek R2 Reuters: https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/
Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/
Helix: https://www.figure.ai/news/helix
TechTrance: https://www.youtube.com/@TheTechTrance/videos
GPT 4.5 Soon:
AGI: (gets close), Humans: ‘Who Gets the Money?’
11 Feb · AI Explained Official Podcast
A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, RAND and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid.

GiveWell: https://www.givewell.org/charities/top-charities

AI Insiders ($9!): https://www.patreon.com/AIExplained

s1 Paper: https://arxiv.org/pdf/2501.19393
Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1
Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
Google vs OpenAI: https://x.com/sama/status/1888703820596977684
RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Dev Meetup: https://x.com/btibor91/status/1888976302621040852
Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.html
Karpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMI
Amodei Warning: https://www.anthropic.com/news/paris-ai-summit
Bengio Source: https://www.youtube.com/watch?v=6HDjVncL5Go

Chapters:
00:00 - Intro
01:37 - AGI Inches Closer
04:26 - ‘Super-Exponential’
05:58 - Musk Bid
07:34 - Luxury Goods and Land
09:05 - ‘Benefits All Humanity’
12:52 - ‘National Security’
14:21 - s1
20:33 - Final thoughts

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research
3 Feb · AI Explained Official Podcast
12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.

Deep Research: https://openai.com/index/introducing-deep-research/
https://www.youtube.com/watch?v=YkCDVn3_wiw
GAIA Bench: https://openreview.net/forum?id=fibxvahvs3
https://openreview.net/pdf?id=fibxvahvs3
CodeELO: https://arxiv.org/pdf/2501.01257
CamelCamel: https://uk.camelcamelcamel.com/
Deepseek R1 with search: https://chat.deepseek.com/
https://arxiv.org/pdf/2501.12948
HaluBench: https://arxiv.org/pdf/2407.08488
Chapters:
00:00 - Introduction
01:06 - Powered by o3, Humanity’s Last Exam, GAIA
03:55 - Simple Tests
06:00 - Good News vs Deepseek R1 and Gemini Deep Research
09:32 - Bad News on Hallucinations
14:14 - What Can’t it Browse?
14:42 - For Shopping?
16:40 - Final thoughts
o3-mini and the “AI War”
31 Jan · AI Explained Official Podcast
o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with Deepseek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetoric coming out of the West - and Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.
https://wandb.me/simple-bench
(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing

Chapters:
00:00 - Introduction
00:45 - o3 mini
05:11 - First impressions vs Deepseek R1
07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…
12:40 - Simple Competition Finale
13:03 - Clips and Final Thoughts on the “AI War”

O3-mini: https://openai.com/index/openai-o3-mini/
Paper: https://cdn.openai.com/o3-mini-system-card.pdf
Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09
FrontierMath wild stat: https://arxiv.org/pdf/2411.04872
Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934
Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416
“AI War” by Wang: https://scale.com/blog/win-the-ai-war
Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety
AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788
Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948
R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/
Semianalysis on $1.3M Deepseek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/
OpenAI Valuation: https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai
Wang Clip: https://x.com/tsarnick/status/1867700453494206883
Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188
https://simple-bench.com/
Nothing Much Happens in AI, Then Everything Does All At Once
24 Jan · AI Explained Official Podcast
When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis Accelerates his AGI timeline.
00:00 - Introduction
00:54 - OpenAI Operator
04:53 - Perplexity Assistant
05:15 - StarGate
07:51 - Better than o3?
08:25 - DeepSeek R1 Analysis
12:12 - Training Secrets
15:19 - No More Process Rewarding?
19:01 - Hassabis Timeline Accelerates
21:22 - Humanity’s Last Exam
https://app.grayswan.ai/arena/chat/harmful-ai-assistant
https://app.grayswan.ai/arena
https://openai.com/index/computer-using-agent/
System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt
OpenAI Operator: https://operator.chatgpt.com/
System Card: https://cdn.openai.com/operator_system_card.pdf
There is No Plan: https://x.com/jeffclune/status/1882120726339318007
Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686
Stargate: https://openai.com/index/announcing-the-stargate-project/
Labour goes to 0: https://moores.samaltman.com/
Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332
Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic
Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq
Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk
Deepseek R1: https://arxiv.org/pdf/2501.12948
https://arxiv.org/pdf/2412.19437
Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large
https://simple-bench.com/
Process: https://x.com/sama/status/1664018190840614912
https://x.com/karpathy/status/1835561952258723930
https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09
Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU
Humanity’s Last Exam:
https://agi.safe.ai/
https://x.com/DanHendrycks/status/1882481730671857815
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09
Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out
20 Jan · AI Explained Official Podcast
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today...

80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1Ezg
Spotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:13 - Pro Cost and OpenAI Operator
04:00 - Agent Benchmarks Being Targeted
07:48 - Fast Take-off, Altman
08:48 - Altman flip-flops
10:02 - Deepseek R1 First Reaction

Altman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470
OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564
WebVoyager: https://arxiv.org/pdf/2401.13919
OSWorld: https://arxiv.org/pdf/2404.07972
Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09
Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-china
Deepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301
Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954
Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/
Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191
OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf
Target is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520
Support Regulations: https://www.techemails.com/p/elon-musk-and-openai
https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html
Donation: https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035
Amodei on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4
‘Feel the AGI’: https://x.com/polynoamial?lang=en
GPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274
o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/

Non-hype Newsletter: https://signaltonoise.beehiiv.com/
OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward
8 Jan · AI Explained Official Podcast
Sam Altman unexpectedly brings his AGI timelines forward, while OpenAI backtracks on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more.

wandb.me/simple-bench
(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing
TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1
Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai
OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq
Altman Singularity: https://x.com/sama/status/1875603249472139576
Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s
https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2
OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog
DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783
Altman Blog Reflections: https://blog.samaltman.com/reflections
OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09
OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai
Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1
OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq
Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq
Epoch Scramble for Task Benchmark: https://x.com/tamaybes/status/1876692639363612919
GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard
Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi
RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990
Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco
Miles Brunda
o3 - wow
21 Dec 2024 · AI Explained Official Podcast
o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.
FrontierMath: https://epoch.ai/frontiermath
https://arxiv.org/pdf/2411.04872
Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough
MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social
AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf
Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1
Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614
Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/
Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893
Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/
Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518
David Dohan, 16 hours: https://x.com/dmdohan/status/1870171404093796638
OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/
https://simple-bench.com/
John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725
00:00 - Introduction
01:19 - What is o3?
03:18 - FrontierMath
05:15 - o4, o5
06:03 - GPQA
06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2
08:13 - 1st Caveat
09:03 - Compositionality?
10:16 - SimpleBench?
13:11 - ARC-AGI, Chollet
Never Browse Alone? - Gemini 2 Live and ChatGPT Vision
12 Dec 2024 · AI Explained Official Podcast
The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision.
Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be for some of you Google’s deflating admission.
00:00 - Introduction
00:38 - Live Interaction
03:43 - Gemini 2.0 Flash Benchmarks
05:10 - Audio and Image Output
06:38 - Project Mariner (+ WebVoyager Bench)
08:49 - But Progress Slowing Down?
10:43 - OpenAI Announcements + Games

https://aistudio.google.com/live
Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/
Project Mariner: https://deepmind.google/technologies/project-mariner/
WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1
Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc
Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ
https://simple-bench.com/
Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s
