Episodes

  • "We are going to switch from the problem in AI being that nothing works to the problem being that everything works."

    Dan Klein has been studying language models for over two decades and is now a professor of computer science at Berkeley. His new company, Scaled Cognition, is built around one question: how do you build a system that will not lie to you?

    In this episode, Dan joins Lukas Biewald to talk about why every LLM output is technically a hallucination, how reinforcement learning can quietly teach AI to deceive you, and what it actually takes to build models that check their own work.

    He also gets into why reliability is the one part of AI that hasn't kept pace and why that matters more than most people realize.

    Connect with us here:

    Dan Klein

    Scaled Cognition

    Lukas Biewald

    Weights and Biases
  • Samuel Rodriques left physics because there were no unsolved problems left.

    Instead, he built an AI scientist named Kosmos to cure every disease, solve aging, and map the human brain.

    In this episode:

    The cure his AI proposed for blindness

    Why he would never touch a peptide

    Whether we'll need human scientists in 20 years

    What's stopping America in drug discovery

    Connect with us here:

    Samuel Rodriques

    Edison Scientific

    Lukas Biewald

    Weights & Biases
  • "Every vehicle is capable of driverless operation. That's clearly the steady state of where we're going."

    Wayve started in a rented house in Cambridge with $1.5M, a car in the garage, and an aim to integrate end-to-end AI into driving. A decade later it's driven across 506 cities without a single HD map and is worth over $8.6 billion.

    In this episode, CEO Alex Kendall joins Lukas Biewald to talk about how he built the AI driver Uber, Nvidia, Mercedes, and Nissan all backed, and why putting self-driving AI into 100 million cars a year is a far bigger bet than 10,000 robotaxis.

    Waymo and Tesla both come up. He doesn't shy away.

    Connect with us here:

    Alex Kendall

    Lukas Biewald

    Wayve

    Weights and Biases
  • "Companies designing for agents, not humans, are going to get a lot of lift."

    ClickHouse started as an internal tool at Yandex. Today it's the database Anthropic, OpenAI, Meta and Tesla all run on.

    In this episode, CEO Aaron Katz joins Lukas Biewald to talk about how he turned an open source project into a $15B company, why he acquired LangFuse knowing it could cost him customers, and what he's actually building for the agent era.

    Snowflake, Datadog and Databricks all come up. He doesn't shy away.

    Connect with us here:

    Aaron Katz: https://www.linkedin.com/in/aaron-katz-5762094

    ClickHouse: https://www.linkedin.com/company/clickhouseinc/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights and Biases: https://www.linkedin.com/company/wandb/

    00:00 Trailer

    00:57 The Origin Story: From Yandex to ClickHouse Inc.

    04:43 Building ClickHouse Cloud & Raising $300M

    10:36 Growing Up Around Xerox PARC

    12:51 Salesforce, Marc Benioff & the Dot-Com Bust

    15:32 Cloud Skeptics vs. AI Skeptics | History Repeating

    18:05 Building a Modern Go-To-Market Playbook

    21:57 The SaaS Crash, Agents & the Future of Infrastructure

    27:09 The Datadog Love-Hate Story

    35:21 Hardest Moments: Russia, SVB & Sleepless Nights

    43:16 Outro

  • Formal verification already consumes years of human effort.

    In this episode, Lukas Biewald talks with Carina Hong, Founder & CEO of Axiom, about why verification is becoming the real bottleneck in high stakes AI systems.

    They discuss how Axiom uses AI to take on the tedious checking that stretches verification cycles across years, starting with formal mathematics and extending to hardware and software.

    Carina also explains why Axiom’s approach to auto-formalization mirrors spec driven models like Kiro from AWS.

    Connect with us here:

    Carina Hong: https://www.linkedin.com/in/carina-hong/

    Axiom: https://www.linkedin.com/company/axiommath/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/

  • “I don't worry about being replaced by AI. I worry about being replaced by someone who's really good at using AI.”

    Atlassian has 10,000+ engineers currently split-testing the world’s top AI coding tools, from GitHub Copilot and Cursor to Claude Code.

    In this episode, Co-Founder & CEO Mike Cannon-Brookes joins Lukas Biewald to share what their data reveals about the world's best AI tools today.

    Hear how 24 years of building a tech giant and a massive internal study on AI productivity have shaped Mike's vision for the future of dev jobs.

    Connect with us here:

    Mike Cannon-Brookes: https://www.linkedin.com/in/mcannonbrookes/?originalSubdomain=au

    Atlassian: https://www.linkedin.com/company/atlassian/?viewAsMember=true

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/

    00:00 Trailer

    01:08 Introduction

    03:11 Connecting Technology and Business Teams

    07:22 The Impact of AI on Business Workflows

    13:26 Developer Productivity and AI

    21:03 Measuring Developer Efficiency

    25:41 Future of AI in Development

    34:59 Legacy Technology and Code Changes

    39:29 AI's Role in Developer Productivity

    47:40 AI and Junior Developers

    52:30 Product-Led Growth and Business Strategy

    01:00:29 Core Metrics for Sustainable Growth

    01:06:56 Staying Creative in the Tech Industry

  • The future of AI training is shaped by one constraint: keeping GPUs fed.

    In this episode, Lukas Biewald talks with CoreWeave SVP Corey Sanders about why general-purpose clouds start to break down under large-scale AI workloads.

    According to Corey, the industry is shifting toward a "Neo Cloud" model to handle the unique demands of modern models.

    They dive into the hardware and software stack required to maximize GPU utilization and achieve high goodput.

    Corey’s conclusion is clear: AI demands specialization.

    Connect with us here:

    Corey Sanders: https://www.linkedin.com/in/corey-sanders-842b72/

    CoreWeave: https://www.linkedin.com/company/coreweave/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/

    (00:00) Trailer

    (00:57) Introduction

    (02:51) The Evolution of AI Workloads

    (06:22) CoreWeave's Technological Innovations

    (13:58) Customer Engagement and Future Prospects

    (28:49) Comparing Cloud Approaches

    (33:50) Balancing Executive Roles and Hands-On Projects

    (46:44) Product Development and Customer Feedback

  • The future of AI is physical.

    In this episode, Lukas Biewald talks to Nikolaus West, CEO of Rerun, about why the breakthrough required to get AI out of the lab and into the messy real world is blocked by poor data tooling.

    Nikolaus explains how Rerun solved this by adopting an Entity Component System (ECS), a data model built for games, to handle complex, multimodal, time-aware sensor data. This is the technology that makes solving previously impossible tasks, like flexible manipulation, suddenly feel "boring."

    Connect with us here:

    Nikolaus West: https://www.linkedin.com/in/nikolauswest/

    Rerun: https://www.linkedin.com/company/rerun-io/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/

  • Is video AI a viable path toward AGI?

    Runway ML founder Cristóbal Valenzuela joins Lukas Biewald just after Gen 4.5 reached the #1 position on the Video Arena Leaderboard, according to community voting on Artificial Analysis.

    Lukas examines how a focused research team at Runway outpaced much larger organizations like Google and Meta in one of the most compute-intensive areas of machine learning.

    Cristóbal breaks down the architecture behind Gen 4.5 and explains the role of “taste” in model development. He details the engineering improvements in motion and camera control that solve long-standing issues like the restrictive “tripod look,” and shares why video models are starting to function as simulation engines with applications beyond media generation.

    Connect with us here:

    Cristóbal Valenzuela: https://www.linkedin.com/in/cvalenzuelab

    Runway: https://www.linkedin.com/company/runwayml/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/
  • In this episode of Gradient Dissent, Lukas Biewald talks with Tuhin Srivastava, CEO and founder of Baseten, one of the fastest-growing companies in the AI inference ecosystem. Tuhin shares the real story behind Baseten’s rise and how the market finally aligned with the infrastructure they’d spent years building.

    They get into the core challenges of modern inference, including why dedicated deployments matter, how runtime and infrastructure bottlenecks stack up, and what makes serving large models fundamentally different from smaller ones.

    Tuhin also explains how vLLM, TensorRT-LLM, and SGLang differ in practice, what it takes to tune workloads for new chips like the B200, and why reliability becomes harder as systems scale.

    The conversation dives into company-building, from killing product lines to avoiding premature scaling while navigating a market that shifts every few weeks.

    Connect with us here:

    Tuhin Srivastava: https://www.linkedin.com/in/tuhin-srivastava/

    Lukas Biewald: https://www.linkedin.com/in/lbiewald/

    Weights & Biases: https://www.linkedin.com/company/wandb/

  • In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, CEO & founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how their research-focused approach is reshaping how models are trained.

    You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.

    It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.

    Timestamps:

    00:00 – Intro: Who is Edwin Chen?

    03:40 – The problem with early data labeling systems

    06:20 – Search ranking, clickbait, and product principles

    10:05 – Why Surge focused on high-skill, high-quality labeling

    13:50 – From Craigslist workers to a billion-dollar business

    16:40 – Scaling without funding and avoiding Silicon Valley status games

    21:15 – Why most human data platforms lack real tech

    25:05 – Detecting cheaters, liars, and low-quality labelers

    28:30 – Why inter-annotator agreement is a flawed metric

    32:15 – What makes a great poem? Not checkboxes

    36:40 – Measuring subjective quality rigorously

    40:00 – What types of data are becoming more important

    44:15 – Scientific collaboration and frontier research data

    47:00 – Multimodal data, Argentinian coding, and hyper-specificity

    50:10 – What's wrong with LMSYS and benchmark hacking

    53:20 – Personalization and taste in model behavior

    56:00 – Synthetic data vs. high-quality human data

    Follow Weights & Biases:

    https://twitter.com/weights_biases

    https://www.linkedin.com/company/wandb

  • In this episode of Gradient Dissent, Lukas Biewald sits down with Arvind Jain, CEO and founder of Glean. They discuss Glean's evolution from solving enterprise search to building agentic AI tools that understand internal knowledge and workflows. Arvind shares how his early use of transformer models in 2019 laid the foundation for Glean’s success, well before the term "generative AI" was mainstream.

    They explore the technical and organizational challenges behind enterprise LLMs, including security and hallucination suppression, and when it makes sense to fine-tune models. Arvind also reflects on his previous startup Rubrik and explains how Glean’s AI platform aims to reshape how teams operate, from personalized agents to ever-fresh internal documentation.

    Follow Arvind Jain: https://x.com/jainarvind

    Follow Weights & Biases: https://x.com/weights_biases

    Timestamps:

    [00:01:00] What Glean is and how it works

    [00:02:39] Starting Glean before the LLM boom

    [00:04:10] Using transformers early in enterprise search

    [00:06:48] Semantic search vs. generative answers

    [00:08:13] When to fine-tune vs. use out-of-box models

    [00:12:38] The value of small, purpose-trained models

    [00:13:04] Enterprise security and embedding risks

    [00:16:31] Lessons from Rubrik and starting Glean

    [00:19:31] The contrarian bet on enterprise search

    [00:22:57] Culture and lessons learned from Google

    [00:25:13] Everyone will have their own AI-powered "team"

    [00:28:43] Using AI to keep documentation evergreen

    [00:31:22] AI-generated churn and risk analysis

    [00:33:55] Measuring model improvement with golden sets

    [00:36:05] Suppressing hallucinations with citations

    [00:39:22] Agents that can ping humans for help

    [00:40:41] AI as a force multiplier, not a replacement

    [00:42:26] The enduring value of hard work

  • In this episode of Gradient Dissent, Lukas Biewald talks with Jarek Kutylowski, CEO and founder of DeepL, an AI-powered translation company. Jarek shares DeepL’s journey from launching neural machine translation in 2017 to building custom data centers, and explains how small teams can not only take on big players like Google Translate but win.

    They dive into what makes translation so difficult for AI, why high-quality translations still require human context, and how DeepL tailors models for enterprise use cases. They also discuss the evolution of speech translation, compute infrastructure, training on curated multilingual datasets, hallucinations in models, and why DeepL avoids fine-tuning for each individual customer. It’s a fascinating behind-the-scenes look at one of the most advanced real-world applications of deep learning.

    Timestamps:

    [00:00:00] Introducing Jarek and DeepL’s mission

    [00:01:46] Competing with Google Translate & LLMs

    [00:04:14] Pretraining vs. proprietary model strategy

    [00:06:47] Building GPU data centers in 2017

    [00:08:09] The value of curated bilingual and monolingual data

    [00:09:30] How DeepL measures translation quality

    [00:12:27] Personalization and enterprise-specific tuning

    [00:14:04] Why translation demand is growing

    [00:16:16] ROI of incremental quality gains

    [00:18:20] The role of human translators in the future

    [00:22:48] Hallucinations in translation models

    [00:24:05] DeepL’s work on speech translation

    [00:28:22] The broader impact of global communication

    [00:30:32] Handling smaller languages and language pairs

    [00:32:25] Multi-language model consolidation

    [00:35:28] Engineering infrastructure for large-scale inference

    [00:39:23] Adapting to evolving LLM landscape & enterprise needs

  • In this episode of Gradient Dissent, Lukas Biewald sits down with Thomas Dohmke, CEO of GitHub, to talk about the future of software engineering in the age of AI. They discuss how GitHub Copilot was built, why agents are reshaping developer workflows, and what it takes to make tools that are not only powerful but also fun.

    Thomas shares his experience leading GitHub through its $7.5B acquisition by Microsoft, the unexpected ways it accelerated innovation, and why developer happiness is crucial to productivity. They explore what still makes human engineers irreplaceable and how the next generation of developers might grow up coding alongside AI.

    Follow Thomas Dohmke: https://www.linkedin.com/in/ashtom/

    Follow Weights & Biases:

    https://twitter.com/weights_biases

    https://www.linkedin.com/company/wandb

  • In this episode of Gradient Dissent, Lukas Biewald talks with Martin Shkreli — the infamous "pharma bro" turned founder — about his path from hedge fund manager and pharma CEO to convicted felon and now software entrepreneur. Shkreli shares his side of the drug pricing controversy, reflects on his prison experience, and explains how he rebuilt his life and business after being "canceled."

    They dive deep into AI and drug discovery, where Shkreli delivers a strong critique of mainstream approaches. He also talks about his latest venture in finance software, Godel Terminal, "a Vim for traders," and why he thinks the AI hype cycle is just beginning. It's a wide-ranging and candid conversation with one of the most controversial figures in tech and biotech.

    Follow Martin Shkreli on Twitter

    Godel Terminal: https://godelterminal.com/

    Follow Weights & Biases on Twitter

    https://www.linkedin.com/company/wandb

    Join the Weights & Biases Discord Server:

    https://discord.gg/CkZKRNnaf3

  • In this episode of Gradient Dissent, host Lukas Biewald talks with Sualeh Asif, the CPO and co-founder of Cursor, one of the fastest-growing and most loved AI-powered coding platforms. Sualeh shares the story behind Cursor’s creation, the technical and design decisions that set it apart, and how AI models are changing the way we build software. They dive deep into infrastructure challenges, the importance of speed and user experience, and how emerging trends in agents and reasoning models are reshaping the developer workflow.

    Sualeh also discusses scaling AI inference to support hundreds of millions of requests per day, building trust through product quality, and his vision for how programming will evolve in the next few years.

    ⏳Timestamps:

    00:00 How Cursor got started and why it took off

    04:50 Switching from Vim to VS Code and the rise of Copilot

    08:10 Why Cursor won among competitors: product philosophy and execution

    10:30 How user data and feedback loops drive Cursor’s improvements

    12:20 Iterating on AI agents: what made Cursor hold back and wait

    13:30 Competitive coding background: advantage or challenge?

    16:30 Making coding fun again: latency, flow, and model choices

    19:10 Building Cursor’s infrastructure: from GPUs to indexing billions of files

    26:00 How Cursor prioritizes compute allocation for indexing

    30:00 Running massive ML infrastructure: surprises and scaling lessons

    34:50 Why Cursor chose DeepSeek models early

    36:00 Where AI agents are heading next

    40:07 Debugging and evaluating complex AI agents

    42:00 How coding workflows will change over the next 2–3 years

    46:20 Dream future projects: AI for reading codebases and papers

    🎙 Get our podcasts on these platforms:

    Apple Podcasts: https://wandb.me/apple-podcasts

    Spotify: https://wandb.me/spotify

    YouTube: https://wandb.me/youtube

    Follow Weights & Biases:

    https://x.com/weights_biases

    https://www.linkedin.com/company/wandb
  • In this episode of Gradient Dissent, host Lukas Biewald talks with Christopher Ahlberg, CEO of Recorded Future, a pioneering cybersecurity company leveraging AI to provide intelligence insights. Christopher shares his fascinating journey from founding data visualization startup Spotfire to building Recorded Future into an industry leader, eventually leading to its acquisition by Mastercard.

    They dive into gripping stories of cyber espionage, including how Recorded Future intercepted a hacker selling access to the U.S. Election Assistance Commission. Christopher also explains why the criminal underworld has shifted to platforms like Telegram, how AI is transforming both cyber threats and defenses, and the real-world implications of becoming an "undesirable enemy" of the Russian state.

    This episode offers unique insights into cybersecurity, AI-driven intelligence, entrepreneurship lessons from a two-time founder, and what happens when geopolitical tensions intersect with cutting-edge technology. A must-listen for anyone interested in cybersecurity, artificial intelligence, or the complex dynamics shaping global security.

    🎙 Get our podcasts on these platforms:

    Apple Podcasts: https://wandb.me/apple-podcasts

    Spotify: https://wandb.me/spotify

    YouTube: https://wandb.me/youtube

    Follow Weights & Biases:

    https://twitter.com/weights_biases

    https://www.linkedin.com/company/wandb

  • In this episode of Gradient Dissent, host Lukas Biewald speaks with Captain Jon Haase, United States Navy, about real-world applications of AI and autonomy in defense. From underwater mine detection with autonomous vehicles to the ethics of lethal AI systems, this conversation dives into how the U.S. military is integrating AI into mission-critical operations, and why humans will always be at the center of warfighting.

    They explore the challenges of underwater autonomy, multi-agent collaboration, cybersecurity, and the growing role of large language models like Gemini and Claude in the defense space.

    Essential listening for anyone curious about military AI, defense tech, and the future of autonomous systems.

    ✅ *Subscribe to Weights & Biases* → https://bit.ly/45BCkYz

    🎙 Get our podcasts on these platforms:

    Apple Podcasts: http://wandb.me/apple-podcasts

    Spotify: http://wandb.me/spotify

    Google: http://wandb.me/gd_google

    YouTube: http://wandb.me/youtube

    Follow Weights & Biases:

    https://twitter.com/weights_biases

    https://www.linkedin.com/company/wandb

    Join the Weights & Biases Discord Server:

    https://discord.gg/CkZKRNnaf3

  • In this episode of Gradient Dissent, host Lukas Biewald sits down with João Moura, CEO & Founder of CrewAI, one of the leading platforms enabling AI agents for enterprise applications. João shares insights into how AI agents are being successfully deployed in over 40% of Fortune 500 companies, what tools these agents rely on, and how software companies are adapting to an agentic world.

    They also discuss:

    What defines a true AI agent versus simple automation

    How AI agents are transforming business processes in industries like finance, insurance, and software

    The evolving business models for APIs as AI agents become the dominant software users

    What the next breakthroughs in agentic AI might look like in 2025 and beyond

    If you're curious about the cutting edge of AI automation, enterprise AI adoption, and the real impact of multi-agent systems, this episode is packed with essential insights.

  • In this episode of Gradient Dissent, host Lukas Biewald sits down with Mike Knoop, Co-founder and CEO of Ndea, a cutting-edge AI research lab. Mike shares his journey from building Zapier into a major automation platform to diving into the frontiers of AI research. They discuss DeepSeek’s R1, OpenAI’s O-series models, and the ARC Prize, a competition aimed at advancing AI’s reasoning capabilities. Mike explains how program synthesis and deep learning must merge to create true AGI, and why he believes AI reliability is the biggest hurdle for automation adoption.

    This conversation covers AGI timelines, research breakthroughs, and the future of intelligent systems, making it essential listening for AI enthusiasts, researchers, and entrepreneurs.

    Mentioned Show Notes:

    https://ndea.com

    https://arcprize.org/blog/r1-zero-r1-results-analysis

    https://arcprize.org/blog/oai-o3-pub-breakthrough

    🎙 Get our podcasts on these platforms:

    Apple Podcasts: http://wandb.me/apple-podcasts

    Spotify: http://wandb.me/spotify

    Google: http://wandb.me/gd_google

    YouTube: http://wandb.me/youtube

    Connect with Mike Knoop:

    @mikeknoop

    Follow Weights & Biases:

    https://twitter.com/weights_biases

    https://www.linkedin.com/company/wandb

    Join the Weights & Biases Discord Server:

    https://discord.gg/CkZKRNnaf3