Episodes

  • Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination": a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + REFS + NOTES:

    https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0

    https://www.thecompendium.ai/

    https://en.wikipedia.org/wiki/Connor_Leahy

    https://www.conjecture.dev/about

    https://substack.com/@gabecc

    TOC:

    1. AI Intelligence and Safety Fundamentals

    [00:00:00] 1.1 Understanding Intelligence and AI Capabilities

    [00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges

    [00:10:18] 1.3 Human vs Animal Intelligence Debate

    [00:18:00] 1.4 AI Regulation and Risk Assessment Approaches

    [00:26:14] 1.5 Competing AI Development Ideologies

    2. Economic and Social Impact

    [00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios

    [00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics

    [00:37:40] 2.3 Ethical Frameworks and AI Governance Debates

    [00:40:52] 2.4 AI Alignment Evolution and Technical Challenges

    3. Technical Governance Framework

    [00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness

    [00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models

    [00:57:35] 3.3 Limitations of Current Boundedness Approaches

    [00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions

    4. Democratic Implementation and Coordination

    [00:59:20] 4.1 Governance Design and Measurement Challenges

    [01:00:10] 4.2 Democratic Institutions and Experimental Governance

    [01:14:10] 4.3 Political Engagement and AI Safety Advocacy

    [01:25:30] 4.4 Practical AI Safety Measures and International Coordination

    CORE REFS:

    [00:01:45] The Compendium (2023), Leahy et al.

    https://pdf.thecompendium.ai/the_compendium.pdf

    [00:06:50] Geoffrey Hinton Leaves Google, BBC News

    https://www.bbc.com/news/world-us-canada-65452940

    [00:10:00] ARC-AGI, Chollet

    https://arcprize.org/arc-agi

    [00:13:25] A Brief History of Intelligence, Bennett

    https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343

    [00:25:35] Statement on AI Risk, Center for AI Safety

    https://www.safe.ai/work/statement-on-ai-risk

    [00:26:15] Machines of Love and Grace, Amodei

    https://darioamodei.com/machines-of-loving-grace

    [00:26:35] The Techno-Optimist Manifesto, Andreessen

    https://a16z.com/the-techno-optimist-manifesto/

    [00:31:55] Techno-Feudalism, Varoufakis

    https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270

    [00:42:40] Introducing Superalignment, OpenAI

    https://openai.com/index/introducing-superalignment/

    [00:47:20] Three Laws of Robotics, Asimov

    https://www.britannica.com/topic/Three-Laws-of-Robotics

    [00:50:00] Symbolic AI (GOFAI), Haugeland

    https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence

    [00:52:30] Intent Alignment, Christiano

    https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety

    [00:55:10] Large Language Model Alignment: A Survey, Jiang et al.

    http://arxiv.org/pdf/2309.15025

    [00:55:40] Constitutional Checks and Balances, Bok

    https://plato.stanford.edu/entries/montesquieu/

    <trunc, see PDF>

  • We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the tasks have been calibrated with human testers so that each one was solved by at least two humans within a reasonable number of attempts, while also being adversarially selected so that frontier reasoning models cannot solve them. The best LLMs today achieve negligible performance on this challenge.

    https://arcprize.org/
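
    For readers new to the benchmark, here is a minimal sketch of how an ARC-style task is represented and scored. The grids and the swap_columns solver are invented for illustration; the exact-match scoring with a two-attempt budget mirrors the rules described above.

    # Minimal sketch of ARC-style task representation and scoring. The task
    # content is invented; real tasks live at https://arcprize.org/. A task
    # counts as solved only if one of (up to) two predicted output grids
    # exactly matches the hidden test output.

    task = {
        "train": [  # demonstration input/output pairs (cell values are colors 0-9)
            {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
            {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
        ],
        "test": [{"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}],
    }

    def solved(task, attempts):
        """Exact-match scoring with a budget of two attempts per test input."""
        target = task["test"][0]["output"]
        return any(a == target for a in attempts[:2])

    def swap_columns(grid):
        # A hypothetical solver guessing the rule is "swap the two columns".
        return [list(reversed(row)) for row in grid]

    print(solved(task, [swap_columns(task["test"][0]["input"])]))  # -> True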

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

    TOC:

    1. ARC v2 Core Design & Objectives

    [00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture

    [00:03:16] 1.2 Test-Time Optimization and AGI Assessment

    [00:06:24] 1.3 Human-AI Capability Analysis

    [00:13:02] 1.4 OpenAI o3 Initial Performance Results

    2. ARC Technical Evolution

    [00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements

    [00:21:12] 2.2 Human Validation Methodology

    [00:26:05] 2.3 Task Design and Gaming Prevention

    [00:29:11] 2.4 Intelligence Measurement Framework

    3. O3 Performance & Future Challenges

    [00:38:50] 3.1 O3 Comprehensive Performance Analysis

    [00:43:40] 3.2 System Limitations and Failure Modes

    [00:49:30] 3.3 Program Synthesis Applications

    [00:53:00] 3.4 Future Development Roadmap

    REFS:

    [00:00:15] On the Measure of Intelligence, François Chollet

    https://arxiv.org/abs/1911.01547

    [00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop

    https://arcprize.org/

    [00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team

    https://arcprize.org/blog/oai-o3-pub-breakthrough

    [00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.

    https://arxiv.org/abs/2201.11903

    [00:21:45] ARC-v2 benchmark tasks, Mike Knoop

    https://arcprize.org/blog/introducing-arc-agi-public-leaderboard

    [00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.

    https://arxiv.org/html/2412.04604v2

    [00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt

    https://arxiv.org/abs/2412.04604

    [00:48:55] The Bitter Lesson, Rich Sutton

    http://www.incompleteideas.net/IncIdeas/BitterLesson.html

    [00:53:30] Decoding strategies in neural text generation, Sina Zarrieß

    https://www.mdpi.com/2078-2489/12/9/355/pdf

  • Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC Challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and network flexibility.
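
    As a rough sketch of that recipe (not the team's actual code): test-time fine-tuning adapts the model on augmented copies of a task's own demonstration pairs, then votes over predictions made under different augmentations. The model interface (fine_tune, predict) is a hypothetical stand-in, and the dummy model below pretends fine-tuning has already taught it the task's rule so the script runs end to end.

    # Sketch of test-time fine-tuning with augmentation-based voting.
    from collections import Counter

    def flip_h(g):                      # horizontal flip; its own inverse
        return [list(reversed(row)) for row in g]

    def flip_v(g):                      # vertical flip; also self-inverse
        return list(reversed(g))

    AUGS = [(lambda g: g, lambda g: g), (flip_h, flip_h), (flip_v, flip_v)]

    def to_key(grid):                   # make nested lists hashable for voting
        return tuple(map(tuple, grid))

    def test_time_solve(model, train_pairs, test_input):
        # 1. Test-time fine-tuning: adapt the model on augmented copies of
        #    the task's own demonstration pairs (hypothetical fine_tune API).
        model.fine_tune([(f(x), f(y)) for f, _ in AUGS for x, y in train_pairs])
        # 2. Predict under each augmentation, map each prediction back to the
        #    original frame with the inverse transform, and majority-vote.
        votes, candidates = Counter(), {}
        for f, f_inv in AUGS:
            pred = f_inv(model.predict(f(test_input)))  # hypothetical predict API
            votes[to_key(pred)] += 1
            candidates[to_key(pred)] = pred
        return candidates[votes.most_common(1)[0][0]]

    class DummyModel:
        """Stand-in for a pre-trained sequence model."""
        def fine_tune(self, pairs):
            pass                        # a real model would take gradient steps here
        def predict(self, g):
            return flip_h(g)            # pretend the "learned" rule is a flip

    print(test_time_solve(DummyModel(), [], [[5, 6], [7, 8]]))  # [[6, 5], [8, 7]]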

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + REFS:

    https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

    Mohamed Osman (Tufa Labs)

    https://x.com/MohamedOsmanML

    Jack Cole (Tufa Labs)

    https://x.com/MindsAI_Jack

    How and why deep learning for ARC paper:

    https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

    TOC:

    1. Abstract Reasoning Foundations

    [00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview

    [00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning

    [00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture

    [00:20:26] 1.4 Technical Implementation with Long T5 Model

    2. ARC Solution Architectures

    [00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions

    [00:27:54] 2.2 Model Generalization and Function Generation Challenges

    [00:32:53] 2.3 Input Representation and VLM Limitations

    [00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration

    [00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches

    3. Advanced Systems Integration

    [00:43:00] 3.1 DreamCoder Evolution and LLM Integration

    [00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs

    [00:54:15] 3.3 ARC v2 Development and Performance Scaling

    [00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations

    [01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

    REFS:

    [00:01:32] Original ARC challenge paper, François Chollet

    https://arxiv.org/abs/1911.01547

    [00:06:55] DreamCoder, Kevin Ellis et al.

    https://arxiv.org/abs/2006.08381

    [00:12:50] Deep Learning with Python, François Chollet

    https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438

    [00:13:35] Influence of pretraining data for reasoning, Laura Ruis

    https://arxiv.org/abs/2411.12580

    [00:17:50] Latent Program Networks, Clement Bonnet

    https://arxiv.org/html/2411.08706v1

    [00:20:50] T5, Colin Raffel et al.

    https://arxiv.org/abs/1910.10683

    [00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

    https://arxiv.org/abs/2411.02272

    [00:34:15] Six finger problem, Chen et al.

    https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf

    [00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI

    https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    [00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.

    https://arxiv.org/html/2412.04604v2

    [00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

    https://arxiv.org/html/2503.15540

    [00:54:25] Abstraction and Reasoning Corpus, François Chollet

    https://github.com/fchollet/ARC-AGI

    [00:57:10] O3 breakthrough on ARC-AGI, OpenAI

    https://arcprize.org/

    [00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

    https://arxiv.org/abs/2305.07141

    [01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

    http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

  • Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.
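
    To make the paper's approach concrete, here is a minimal sketch of the GSM-Symbolic idea: rewrite a GSM8K-style word problem as a template with symbolic slots, resample names and numbers, and ask how stable a model's accuracy is across instantiations. The template and value ranges are invented for illustration.

    # Sketch of GSM-Symbolic-style template perturbation.
    import random

    TEMPLATE = ("{name} has {a} apples and buys {b} more. "
                "How many apples does {name} have now?")

    def instantiate(rng):
        name = rng.choice(["Sophie", "Liam", "Ava", "Noah"])
        a, b = rng.randint(2, 20), rng.randint(2, 20)
        return TEMPLATE.format(name=name, a=a, b=b), a + b  # question, answer

    rng = random.Random(0)
    for question, answer in (instantiate(rng) for _ in range(5)):
        print(question, "->", answer)

    # In the actual study, each variant is posed to an LLM, and the spread of
    # accuracy across variants (not just the mean) is the evidence that models
    # pattern-match surface forms rather than reason robustly.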

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0

    TOC:

    1. Intelligence vs Achievement in AI Systems

    [00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems

    [00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess

    [00:10:10] 1.3 Language Models and Distribution Learning Limitations

    [00:14:47] 1.4 Research Methodology and Theoretical Frameworks

    2. Intelligence Measurement and Learning

    [00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning

    [00:29:00] 2.2 Intelligence Definition and Measurement Approaches

    [00:34:35] 2.3 Learning Capabilities and Agency in AI Systems

    [00:39:26] 2.4 Abstract Reasoning and Symbol Understanding

    3. LLM Performance and Evaluation

    [00:47:15] 3.1 Scaling Laws and Fundamental Limitations

    [00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks

    [00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs

    [01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment

    REFS:

    [00:01:00] AlphaZero chess AI system, Silver et al.

    https://arxiv.org/abs/1712.01815

    [00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan

    https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184

    [00:11:35] Cross-entropy loss in language modeling, Voita

    http://lena-voita.github.io/nlp_course/language_modeling.html

    [00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.

    https://arxiv.org/abs/2410.05229

    [00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn

    https://www.sciencedirect.com/science/article/pii/001002779090014B

    [00:28:55] Brain-to-body mass ratio scaling laws, Sutskever

    https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training

    [00:29:40] On the Measure of Intelligence, Chollet

    https://arxiv.org/abs/1911.01547

    [00:33:30] On definition of intelligence, Gignac et al.

    https://www.sciencedirect.com/science/article/pii/S0160289624000266

    [00:35:30] Defining intelligence, Wang

    https://cis.temple.edu/~wangp/papers.html

    [00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene

    https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884

    [00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander

    https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

    [00:43:15] Chain-of-thought prompting, Wei et al.

    https://arxiv.org/abs/2201.11903

    [00:47:20] Test-time scaling laws in machine learning, Brown

    https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058

    [00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.

    https://arxiv.org/abs/2001.08361

    [00:55:15] Tensor product variable binding, Smolensky

    https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

    [01:08:45] GSM-8K dataset, OpenAI

    https://huggingface.co/datasets/openai/gsm8k

  • Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.
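
    For context on the influence-functions thread, the standard formulation from the Koh & Liang reference below estimates how up-weighting one training point z would shift the loss at a test point. A sketch in LaTeX notation, with H the Hessian of the empirical risk at the fitted parameters:

    \mathcal{I}(z, z_{\mathrm{test}})
      = -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^\top
         H_{\hat{\theta}}^{-1}\,
         \nabla_\theta L(z, \hat{\theta}),
    \qquad
    H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta})

    Grosse et al. (also referenced below) scale approximations of this quantity to LLMs, where forming or inverting H exactly is infeasible.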

    Max Bartolo (Cohere):

    https://www.maxbartolo.com/

    https://cohere.com/command

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

    TOC:

    1. Model Reasoning and Verification

    [00:00:00] 1.1 Model Consistency and Reasoning Verification

    [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis

    [00:10:28] 1.3 AI Application Development and Model Deployment

    [00:14:24] 1.4 AI Alignment and Human Feedback Limitations

    2. Evaluation and Bias Assessment

    [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment

    [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior

    [00:32:43] 2.3 Adversarial Examples and Model Robustness

    3. Benchmarking Systems and Methods

    [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches

    [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics

    [00:50:33] 3.3 Evolution of Model Benchmarking Methods

    [00:51:15] 3.4 Hierarchical Capability Testing Framework

    [00:52:35] 3.5 Benchmark Platforms and Tools

    4. Model Architecture and Performance

    [00:55:15] 4.1 Cohere's Model Development Process

    [01:00:26] 4.2 Model Quantization and Performance Evaluation

    [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards

    [01:08:27] 4.4 Training Progression and Technical Challenges

    5. Future Directions and Challenges

    [01:13:48] 5.1 Context Window Evolution and Trade-offs

    [01:22:47] 5.2 Enterprise Applications and Future Challenges

    REFS:

    [00:03:10] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models, Laura Ruis, Max Bartolo et al.

    https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20

    [00:04:15] Influence functions in machine learning, Koh & Liang

    https://arxiv.org/abs/1703.04730

    [00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.

    https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf

    [00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI

    https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

    [00:13:30] OpenInterpreter

    https://github.com/KillianLucas/open-interpreter

    [00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom

    https://arxiv.org/abs/2309.16349

    [00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.

    https://arxiv.org/abs/2404.16019

    [00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

    https://arxiv.org/abs/1905.02175

    [00:43:00] DynaBench platform paper, Douwe Kiela et al.

    https://aclanthology.org/2021.naacl-main.324.pdf

    [00:50:15] Sara Hooker's work on compute limitations, Sara Hooker

    https://arxiv.org/html/2407.05694v1

    [00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.

    https://arxiv.org/abs/2207.10062

    [01:04:35] DROP, Dheeru Dua et al.

    https://arxiv.org/abs/1903.00161

    [01:07:05] GSM8k, Cobbe et al.

    https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

    [01:09:30] ARC, François Chollet

    https://github.com/fchollet/ARC-AGI

    [01:15:50] Command A, Cohere

    https://cohere.com/blog/command-a

    [01:22:55] Enterprise search using LLMs, Cohere

    https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers

  • This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with the potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0

    Tau:

    https://tau.net/

    Tau Language:

    https://tau.ai/tau-language/

    Research:

    https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

    TOC:

    1. Machine Learning Foundations and Limitations

    [00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory

    [00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning

    [00:08:57] 1.3 Language, Reality, and AI System Design

    [00:12:58] 1.4 Program Synthesis and Formal Verification Approaches

    2. Logical Programming Architecture

    [00:31:55] 2.1 Safe AI Development Requirements

    [00:32:05] 2.2 Self-Referential Language Architecture

    [00:32:50] 2.3 Boolean Algebra and Logical Foundations

    [00:37:52] 2.4 SAT Solvers and Complexity Challenges

    [00:44:30] 2.5 Program Synthesis and Specification

    [00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra

    [00:56:05] 2.7 Tau Language Implementation and User Control

    3. Blockchain-Based Software Governance

    [01:09:10] 3.1 User Control and Software Governance Mechanisms

    [01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities

    [01:21:43] 3.3 Development Status and Token Implementation

    [01:24:52] 3.4 Consensus Building and Opinion Mapping System

    [01:35:29] 3.5 Automation and Financial Applications

    CORE REFS (more in pinned comment):

    [00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant

    https://en.wikipedia.org/wiki/Probably_approximately_correct_learning

    [00:06:10] Boolean Satisfiability Problem (SAT), Various

    https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

    [00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup

    https://plato.stanford.edu/entries/epistemology/

    [00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein

    https://plato.stanford.edu/entries/wittgenstein/

    [00:21:25] Boolean algebras, Ohad Asor

    https://tau.net/tau-language-research/

    [00:26:10] The Halting Problem

    https://plato.stanford.edu/entries/turing-machine/#HaltProb

    [00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente

    https://plato.stanford.edu/entries/tarski/

    [00:41:50] DPLL

    https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf

    [00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski

    https://plato.stanford.edu/entries/tarski-truth/

    [00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk

    https://plato.stanford.edu/entries/boolalg-math/

    [01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson

    https://plato.stanford.edu/entries/logic-belief-revision/

    [01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler

    https://people.math.wisc.edu/~hkeisler/random.pdf

    [01:08:35] Quantifier elimination in Tau language specification, Ohad Asor

    https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

    [01:11:50] Tau Net blockchain platform

    https://tau.net/

    [01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract

    https://tau.net/Whitepaper.pdf
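
    Since SAT solving and DPLL anchor much of this discussion, here is a minimal DPLL-style solver as a sketch; production solvers layer clause learning (CDCL), watched literals, and branching heuristics on top of this skeleton.

    # Minimal DPLL-style SAT solver. Formulas are in CNF: a list of clauses,
    # each clause a list of non-zero ints (positive = variable, negative =
    # its negation).

    def simplify(clauses, assignment):
        out = []
        for clause in clauses:
            new_clause, satisfied = [], False
            for lit in clause:
                if abs(lit) in assignment:
                    if assignment[abs(lit)] == (lit > 0):
                        satisfied = True
                        break
                else:
                    new_clause.append(lit)
            if satisfied:
                continue
            if not new_clause:
                return False                 # empty clause: conflict
            out.append(new_clause)
        return out

    def dpll(clauses, assignment=None):
        assignment = dict(assignment or {})
        while True:                          # unit propagation to fixpoint
            clauses = simplify(clauses, assignment)
            if clauses is False:
                return None                  # conflict under this assignment
            if not clauses:
                return assignment            # every clause satisfied
            units = [c[0] for c in clauses if len(c) == 1]
            if not units:
                break
            assignment[abs(units[0])] = units[0] > 0
        lit = clauses[0][0]                  # branch on first unassigned literal
        for value in (lit > 0, lit <= 0):
            result = dpll(clauses, {**assignment, abs(lit): value})
            if result is not None:
                return result
        return None

    # (x1 or x2) and (not x1 or x3) and (not x3)
    print(dpll([[1, 2], [-1, 3], [-3]]))     # {3: False, 1: False, 2: True}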

  • John Palazza from CentML joins us in this sponsored interview to discuss the critical importance of infrastructure optimization in the age of Large Language Models and Generative AI. We explore how enterprises can transition from the innovation phase to production and scale, highlighting the significance of efficient GPU utilization and cost management. The conversation covers the open-source versus proprietary model debate, the rise of AI agents, and the need for platform independence to avoid vendor lock-in, as well as emerging trends in AI infrastructure and the pivotal role of strategic partnerships.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/dnjsygrgdgq5ng5fdlfjg/JOHNPALAZZA.pdf?rlkey=hl9wyydi9mj077rbg5acdmo3a&dl=0

    John Palazza:

    Vice President of Global Sales @ CentML

    https://www.linkedin.com/in/john-p-b34655/

    TOC:

    1. Enterprise AI Organization and Strategy

    [00:00:00] 1.1 Organizational Structure and ML Ownership

    [00:02:59] 1.2 Infrastructure Efficiency and GPU Utilization

    [00:07:59] 1.3 Platform Centralization vs Team Autonomy

    [00:11:32] 1.4 Enterprise AI Adoption Strategy and Leadership

    2. MLOps Infrastructure and Resource Management

    [00:15:08] 2.1 Technology Evolution and Enterprise Integration

    [00:19:10] 2.2 Enterprise MLOps Platform Development

    [00:22:15] 2.3 AI Interface Evolution and Agent-Based Solutions

    [00:25:47] 2.4 CentML's Infrastructure Solutions

    [00:30:00] 2.5 Workload Abstraction and Resource Allocation

    3. LLM Infrastructure Optimization and Independence

    [00:33:10] 3.1 GPU Optimization and Cost Efficiency

    [00:36:47] 3.2 AI Efficiency and Innovation Challenges

    [00:41:40] 3.3 Cloud Provider Strategy and Infrastructure Control

    [00:46:52] 3.4 Platform Independence and Vendor Lock-in

    [00:50:53] 3.5 Technical Innovation and Growth Strategy

    REFS:

    [00:01:25] Apple Acquires GraphLab, Apple Inc.

    https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

    [00:03:50] Bain Technology Report 2024, Bain & Company

    https://www.bain.com/insights/topics/technology-report/

    [00:04:50] PaaS vs IaaS Efficiency, Gartner

    https://www.gartner.com/en/newsroom/press-releases/2024-11-19-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-total-723-billion-dollars-in-2025

    [00:14:55] Fashion Quote, Oscar Wilde

    https://www.amazon.com/Complete-Works-Oscar-Wilde-Collins/dp/0007144369

    [00:15:30] PointCast Network, PointCast Inc.

    https://en.wikipedia.org/wiki/Push_technology

    [00:18:05] AI Bain Report, Bain & Company

    https://www.bain.com/insights/how-generative-ai-changes-the-game-in-tech-services-tech-report-2024/

    [00:20:40] Uber Michelangelo, Uber Engineering Team

    https://www.uber.com/en-SE/blog/michelangelo-machine-learning-platform/

    [00:20:50] Algorithmia Acquisition, DataRobot

    https://www.datarobot.com/newsroom/press/datarobot-is-acquiring-algorithmia-enhancing-leading-mlops-architecture-for-the-enterprise/

    [00:22:55] Fine Tuning vs RAG, Heydar Soudani, Evangelos Kanoulas & Faegheh Hasibi.

    https://arxiv.org/html/2403.01432v2

    [00:24:40] LLM Agent Survey, Lei Wang et al.

    https://arxiv.org/abs/2308.11432

    [00:26:30] CentML CServe, CentML

    https://docs.centml.ai/apps/llm

    [00:29:15] CentML Snowflake, Snowflake

    https://www.snowflake.com/en/engineering-blog/optimize-llms-with-llama-snowflake-ai-stack/

    [00:30:15] NVIDIA H100 GPU, NVIDIA

    https://www.nvidia.com/en-us/data-center/h100/

    [00:33:25] CentML's 60% savings, CentML

    https://centml.ai/platform/

  • Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".

    Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.

    Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.
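
    A quick numerical illustration of the softmax point (a sketch assuming logits bounded in [-B, B]): the largest weight attention can place on any single token shrinks as the sequence grows, so decisions cannot stay sharp at long context lengths.

    # With logits bounded in [-B, B], the maximum weight softmax can assign to
    # one token out of n decays toward zero as n grows, so attention cannot
    # remain one-hot sharp at long context lengths.
    import math

    def max_softmax_weight(n, B=10.0):
        # Best case for sharpness: one logit at +B, the remaining n-1 at -B.
        return math.exp(B) / (math.exp(B) + (n - 1) * math.exp(-B))

    for n in [16, 10**3, 10**6, 10**9, 10**12]:
        print(f"n = {n:>15,}  max attention weight = {max_softmax_weight(n):.6f}")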

    But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    https://federicobarbero.com/

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

    TOC:

    1. Transformer Limitations: Token Detection & Representation

    [00:00:00] 1.1 Transformers fail at single token detection

    [00:02:45] 1.2 Representation collapse in transformers

    [00:03:21] 1.3 Experiment: LLMs fail at copying last tokens

    [00:18:00] 1.4 Attention sharpness limitations in transformers

    2. Transformer Limitations: Information Flow & Quantization

    [00:18:50] 2.1 Unidirectional information mixing

    [00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers

    [00:21:50] 2.3 Diagonal attention heads as expensive no-ops in Llama/Gemma

    [00:27:14] 2.4 Sequence entropy affects transformer model distinguishability

    [00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse

    [00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms

    3. Transformers and the Nature of Reasoning

    [00:40:30] 3.1 Turing completeness conditions in transformers

    [00:43:23] 3.2 Transformers struggle with sequential tasks

    [00:45:50] 3.3 Windowed attention as solution to information compression

    [00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning

    [01:00:35] 3.5 Epistemic foraging introduced

    REFS:

    [00:01:05] Transformers Need Glasses!, Barbero et al.

    https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf

    [00:05:30] Softmax is Not Enough, Veličković et al.

    https://arxiv.org/abs/2410.01104

    [00:11:30] Adv Alg Lecture 15, Chawla

    https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf

    [00:15:05] Graph Attention Networks, Veličković

    https://arxiv.org/abs/1710.10903

    [00:19:15] Extract Training Data, Carlini et al.

    https://arxiv.org/pdf/2311.17035

    [00:31:30] 1-bit LLMs, Ma et al.

    https://arxiv.org/abs/2402.17764

    [00:38:35] LLMs Solve Math, Nikankin et al.

    https://arxiv.org/html/2410.21272v1

    [00:38:45] Subitizing, Railo

    https://link.springer.com/10.1007/978-1-4419-1428-6_578

    [00:43:25] NN & Chomsky Hierarchy, Delétang et al.

    https://arxiv.org/abs/2207.02098

    [00:51:05] Measure of Intelligence, Chollet

    https://arxiv.org/abs/1911.01547

    [00:52:10] AlphaZero, Silver et al.

    https://pubmed.ncbi.nlm.nih.gov/30523106/

    [00:55:10] Golden Gate Claude, Anthropic

    https://www.anthropic.com/news/golden-gate-claude

    [00:56:40] Chess Positions, Chase & Simon

    https://www.sciencedirect.com/science/article/abs/pii/0010028573900042

    [01:00:35] Epistemic Foraging, Friston

    https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full

  • We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems.

    The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    * DiscoPOP - A framework where language models discover their own optimization algorithms

    * EvoLLM - Using language models as evolution strategies for optimization (see the sketch after this list)

    * The AI Scientist - A fully automated system that conducts scientific research end-to-end

    * Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate
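
    A minimal sketch of the EvoLLM-style loop flagged above: the language model plays the proposal operator inside a black-box optimization loop, conditioned on previously scored candidates. llm_propose is a hypothetical stand-in, faked here with Gaussian perturbations so the script runs end to end.

    # LLM-as-evolution-strategy sketch: show the best (candidate, fitness)
    # pairs so far, ask for improved candidates, keep the elite, repeat.
    import random

    def fitness(x):                     # toy black-box objective: maximize
        return -sum(v * v for v in x)   # peak at the origin

    def llm_propose(scored, k, rng):
        # Real version: format `scored` into a prompt and parse k candidates
        # from the LLM's reply. Fake version: perturb the current best.
        best = max(scored, key=lambda s: s[1])[0]
        return [[v + rng.gauss(0, 0.3) for v in best] for _ in range(k)]

    def evolve(dim=4, pop=8, generations=20, seed=0):
        rng = random.Random(seed)
        population = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop)]
        for _ in range(generations):
            scored = sorted(((x, fitness(x)) for x in population),
                            key=lambda s: s[1], reverse=True)
            elite = [x for x, _ in scored[: pop // 2]]       # keep the best half
            population = elite + llm_propose(scored[: pop // 2],
                                             pop - len(elite), rng)
        return max(population, key=fitness)

    print(evolve())                     # drifts toward the origin over generations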

    TRANSCRIPT + REFS:

    https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0

    Robert Tjarko Lange

    https://roberttlange.com/

    Chris Lu

    https://chrislu.page/

    Cong Lu

    https://www.conglu.co.uk/

    Sakana

    https://sakana.ai/blog/

    TOC:

    1. LLMs for Algorithm Generation and Optimization

    [00:00:00] 1.1 LLMs generating algorithms for training other LLMs

    [00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization

    [00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data

    [00:20:45] 1.4 External entropy injection for preventing model collapse

    [00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences

    2. Model Learning and Generalization

    [00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories

    [00:31:30] 2.2 Transformers learning gradient descent

    [00:33:00] 2.3 LLM tokenization biases towards specific numbers

    [00:34:50] 2.4 LLMs as evolution strategies for black box optimization

    [00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms

    3. AI Agents and System Architectures

    [00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches

    [00:54:35] 3.2 LangChain / modular agent components

    [00:57:50] 3.3 Debate improves LLM truthfulness

    [01:00:55] 3.4 Time limits controlling AI agent systems

    [01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies

    [01:04:05] 3.6 Agents follow own interest gradients

    [01:09:50] 3.7 Go-Explore algorithm: archive-based exploration

    [01:11:05] 3.8 Foundation models for interesting state discovery

    [01:13:00] 3.9 LLMs leverage prior game knowledge

    4. AI for Scientific Discovery and Human Alignment

    [01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions

    [01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery

    [01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms

    [01:28:30] 4.4 Balancing AI Knowledge with Human Understanding

    [01:33:55] 4.5 AI-Driven Conferences and Paper Review

  • Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture uses a VAE loss, including reconstruction and prior losses.
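
    As a rough sketch of that encode-search-decode pattern (toy modules and dimensions, not the paper's architecture): encode the demonstration pairs to a latent vector, refine that latent by gradient descent against a reconstruction-plus-prior objective, then decode the test input under the refined latent.

    # Latent-program-search sketch in PyTorch.
    import torch
    import torch.nn as nn

    GRID, LATENT = 16, 8          # flattened-grid size and latent width (toy)

    encoder = nn.Linear(2 * GRID, 2 * LATENT)   # outputs (mu, log_var), VAE-style
    decoder = nn.Sequential(nn.Linear(LATENT + GRID, 64), nn.ReLU(),
                            nn.Linear(64, GRID))

    def encode(pairs):            # pairs: (n, 2*GRID) concatenated (x, y) rows
        mu, log_var = encoder(pairs).chunk(2, dim=-1)
        return mu.mean(0), log_var.mean(0)      # aggregate across pairs

    def search(pairs, steps=50, lr=0.1):
        mu, _ = encode(pairs)
        z = mu.detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        xs, ys = pairs[:, :GRID], pairs[:, GRID:]
        for _ in range(steps):    # test-time search in latent space
            opt.zero_grad()
            recon = decoder(torch.cat([z.expand(len(xs), -1), xs], dim=-1))
            loss = ((recon - ys) ** 2).mean() + 1e-3 * (z ** 2).mean()  # recon + prior
            loss.backward()
            opt.step()
        return z.detach()

    pairs = torch.randn(3, 2 * GRID)            # stand-in demonstration pairs
    z = search(pairs)
    test_x = torch.randn(1, GRID)
    print(decoder(torch.cat([z.expand(1, -1), test_x], dim=-1)).shape)  # [1, 16]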

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH OVERVIEW:

    https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0

    Clement Bonnet and Matthew Macfarlane:

    https://www.linkedin.com/in/clement-bonnet16/

    https://github.com/clement-bonnet

    https://mvmacfarlane.github.io/

    TOC

    1. LPN Fundamentals

    [00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview

    [00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis

    [00:06:55] 1.3 Induction vs Transduction in Machine Learning

    2. LPN Architecture and Latent Space

    [00:11:50] 2.1 LPN Architecture and Latent Space Implementation

    [00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture

    [00:20:25] 2.3 Gradient-Based Search Training Strategy

    [00:23:39] 2.4 LPN Model Architecture and Implementation Details

    3. Implementation and Scaling

    [00:27:34] 3.1 Training Data Generation and re-ARC Framework

    [00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search

    [00:34:43] 3.3 Program Composition and Computational Graph Architecture

    4. Advanced Concepts and Future Directions

    [00:45:09] 4.1 AI Creativity and Program Synthesis Approaches

    [00:49:47] 4.2 Scaling and Interpretability in Latent Space Models

    REFS

    [00:00:05] ARC benchmark, Chollet

    https://arxiv.org/abs/2412.04604

    [00:02:10] Latent Program Spaces, Bonnet, Macfarlane

    https://arxiv.org/abs/2411.08706

    [00:07:45] Kevin Ellis's work on program generation

    https://www.cs.cornell.edu/~ellisk/

    [00:08:45] Induction vs transduction in abstract reasoning, Li et al.

    https://arxiv.org/abs/2411.02272

    [00:17:40] VAEs, Kingma, Welling

    https://arxiv.org/abs/1312.6114

    [00:27:50] re-ARC, Hodel

    https://github.com/michaelhodel/re-arc

    [00:29:40] Grid size in ARC tasks, Chollet

    https://github.com/fchollet/ARC-AGI

    [00:33:00] Critique of deep learning, Marcus

    https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

  • Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT/REFS:

    https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0

    Prof. Jakob Foerster

    https://x.com/j_foerst

    https://www.jakobfoerster.com/

    University of Oxford Profile:

    https://eng.ox.ac.uk/people/jakob-foerster/

    Chris Lu:

    https://chrislu.page/

    TOC

    1. GPU Acceleration and Training Infrastructure

    [00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview

    [00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL

    [00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions

    [00:08:40] 1.4 JAX Implementation and Technical Acceleration

    2. Learning Frameworks and Policy Optimization

    [00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework

    [00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms

    [00:21:47] 2.3 Language Models and Benchmark Challenges

    [00:28:15] 2.4 Creativity and Meta-Learning in AI Systems

    3. Multi-Agent Systems and Decentralization

    [00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence

    [00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems

    [00:42:44] 3.3 Democratic Control and Decentralization of AI Development

    [00:46:14] 3.4 Open Source AI and Alignment Challenges

    [00:49:31] 3.5 Collaborative Models for AI Development

    REFS

    [00:00:05] ARC Benchmark, Chollet

    https://github.com/fchollet/ARC-AGI

    [00:03:05] DRL Doesn't Work, Irpan

    https://www.alexirpan.com/2018/02/14/rl-hard.html

    [00:05:55] AI Training Data, Data Provenance Initiative

    https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html

    [00:06:10] JaxMARL, Foerster et al.

    https://arxiv.org/html/2311.10090v5

    [00:08:50] M-FOS, Lu et al.

    https://arxiv.org/abs/2205.01447

    [00:09:45] JAX Library, Google Research

    https://github.com/jax-ml/jax

    [00:12:10] Kinetix, Matthews, Beukman et al.

    https://arxiv.org/abs/2410.23208

    [00:12:45] Genie 2, DeepMind

    https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

    [00:14:42] Mirror Learning, Grudzien Kuba et al.

    https://arxiv.org/abs/2208.01682

    [00:16:30] Discovered Policy Optimisation, Lu et al.

    https://arxiv.org/abs/2210.05639

    [00:24:10] Goodhart's Law, Goodhart

    https://en.wikipedia.org/wiki/Goodhart%27s_law

    [00:25:15] LLM ARChitect, Franzen et al.

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:28:55] AlphaGo, Silver et al.

    https://arxiv.org/pdf/1712.01815.pdf

    [00:30:10] Meta-learning, Lu, Towers, Foerster

    https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf

    [00:31:30] Emergence of Pragmatics, Yuan et al.

    https://arxiv.org/abs/2001.07752

    [00:34:30] AI Safety, Amodei et al.

    https://arxiv.org/abs/1606.06565

    [00:35:45] Intentional Stance, Dennett

    https://plato.stanford.edu/entries/ethics-ai/

    [00:39:25] Multi-Agent RL, Zhou et al.

    https://arxiv.org/pdf/2305.10091

    [00:41:00] Open Source Generative AI, Foerster et al.

    https://arxiv.org/abs/2405.08597

    <trunc, see PDF/YT>

  • Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they reveal how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.
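
    A minimal sketch of the depth-first token search with probability thresholds (section 3.1 in the TOC below): explore continuations depth-first and prune any branch whose cumulative probability falls under a threshold, so the model enumerates every sufficiently likely completion instead of one greedy sample. next_token_probs is a hypothetical stand-in for an LLM forward pass, faked here with a fixed toy distribution.

    # Depth-first search over token continuations with probability pruning.
    def next_token_probs(prefix):
        # Stand-in for a model call; a real one returns p(token | prefix).
        return {"A": 0.5, "B": 0.3, "EOS": 0.2}

    def dfs_sample(prefix=(), prob=1.0, min_prob=0.05, max_len=4):
        """Yield (sequence, probability) for every completion above threshold."""
        if prefix and prefix[-1] == "EOS":
            yield prefix, prob
            return
        if len(prefix) >= max_len:
            return
        for token, p in next_token_probs(prefix).items():
            branch_prob = prob * p
            if branch_prob >= min_prob:        # prune improbable branches early
                yield from dfs_sample(prefix + (token,), branch_prob,
                                      min_prob, max_len)

    for seq, p in sorted(dfs_sample(), key=lambda s: -s[1]):
        print(f"p = {p:.3f}  {' '.join(seq)}")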

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    Jan Disselhoff

    https://www.linkedin.com/in/jan-disselhoff-1423a2240/

    Daniel Franzen

    https://github.com/da-fr

    ARC Prize: http://arcprize.org/

    TRANSCRIPT AND BACKGROUND READING:

    https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

    TOC

    1. Solution Architecture and Strategy Overview

    [00:00:00] 1.1 Initial Solution Overview and Model Architecture

    [00:04:25] 1.2 LLM Capabilities and Dataset Approach

    [00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies

    [00:14:08] 1.4 Sampling Methods and Search Implementation

    [00:17:52] 1.5 ARC vs Language Model Context Comparison

    2. LLM Search and Model Implementation

    [00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation

    [00:27:04] 2.2 Symmetry Augmentation and Model Architecture

    [00:30:11] 2.3 Model Intelligence Characteristics and Performance

    [00:37:23] 2.4 Tokenization and Numerical Processing Challenges

    3. Advanced Training and Optimization

    [00:45:15] 3.1 DFS Token Selection and Probability Thresholds

    [00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs

    [00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention

    [00:56:10] 3.4 Training Infrastructure and Optimization Experiments

    [01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

    REFS

    [00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

    https://arxiv.org/html/2411.14215

    [00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

    https://github.com/michaelhodel/re-arc

    [00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

    https://arxiv.org/html/2408.00724v2

    [00:16:55] Language model reachability space exploration, University of Toronto

    https://www.youtube.com/watch?v=Bpgloy1dDn0

    [00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

    https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

    [00:41:20] GPT tokenization approach for numbers, OpenAI

    https://platform.openai.com/docs/guides/text-generation/tokenizer-examples

    [00:46:25] DFS in AI search strategies, Russell & Norvig

    https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997

    [00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

    https://www.pnas.org/doi/10.1073/pnas.1611835114

    [00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

    https://arxiv.org/abs/2106.09685

    [00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

    https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

    [01:04:55] Original MCTS in computer Go, Yifan Jin

    https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

  • Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI, discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.
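
    For reference, a minimal numpy sketch of the standard LSTM cell: sigmoid input, forget, and output gates control what enters, stays in, and leaves a persistent cell state (the forget gate was a slightly later refinement of the original 1997 design). xLSTM, discussed in the episode, revisits this scheme with exponential gating and new memory structures; the sketch shows only the classic cell.

    # Standard LSTM cell in numpy: gates decide what enters (i), persists (f),
    # and is read out (o) of the cell state c.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, b):
        """One step. W: (4*H, X+H) stacked gate weights, b: (4*H,) biases."""
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c + i * g                             # gated memory write
        h = o * np.tanh(c)                            # gated read-out
        return h, c

    rng = np.random.default_rng(0)
    X, H = 3, 4
    W, b = 0.1 * rng.normal(size=(4 * H, X + H)), np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for _ in range(5):                                # run over a toy sequence
        h, c = lstm_step(rng.normal(size=X), h, c, W, b)
    print(h)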

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT AND BACKGROUND READING:

    https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0

    Prof. Sepp Hochreiter

    https://www.nx-ai.com/

    https://x.com/hochreitersepp

    https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

    TOC:

    1. LLM Evolution and Reasoning Capabilities

    [00:00:00] 1.1 LLM Capabilities and Limitations Debate

    [00:03:16] 1.2 Program Generation and Reasoning in AI Systems

    [00:06:30] 1.3 Human vs AI Reasoning Comparison

    [00:09:59] 1.4 New Research Initiatives and Hybrid Approaches

    2. LSTM Technical Architecture

    [00:13:18] 2.1 LSTM Development History and Technical Background

    [00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity

    [00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison

    [00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential

    3. Industrial Applications and Neuro-Symbolic AI

    [00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages

    [00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project

    [00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches

    [00:51:29] 3.4 Evolution of AI Paradigms and System Thinking

    [00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison

    [00:58:12] 3.6 NXAI Company and Industrial AI Applications

    REFS:

    [00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber)

    https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory

    [00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov)

    https://link.springer.com/article/10.1007/BF02478259

    [00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors)

    https://www.arxiv.org/pdf/2502.03671

    [00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind)

    https://deepmind.google/research/breakthroughs/alphago/

    [00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier)

    https://tufalabs.ai

    [00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.)

    https://arxiv.org/abs/2405.04517

    [00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.)

    https://arxiv.org/abs/2205.14135

    [00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey)

    https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx

    [00:36:10] Mamba 2 state space model architecture (Albert Gu et al.)

    https://arxiv.org/abs/2312.00752

    [00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.)

    https://www.jku.at/en/institute-of-machine-learning/research/projects/

    [00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.)

    https://openreview.net/forum?id=7PGluppo4k

    [00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter)

    https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

    YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

    <truncated, see show notes/YT>

  • Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.
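
    To make the spline view concrete, a small sketch: a ReLU MLP computes a continuous piecewise-affine function, and each distinct pattern of active units corresponds to one affine region of input space. Sweeping a 1-D input and counting sign-pattern changes lower-bounds the number of linear regions (toy weights, not the experiments discussed in the episode).

    # Counting the affine regions a small ReLU net carves a line into.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)    # layer 1: 1 -> 16
    W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)   # layer 2: 16 -> 16

    def activation_pattern(x):
        h1 = W1 @ np.array([x]) + b1
        h2 = W2 @ np.maximum(h1, 0) + b2
        return tuple(h1 > 0) + tuple(h2 > 0)   # which ReLUs are "on"

    xs = np.linspace(-3, 3, 10_000)
    patterns = [activation_pattern(x) for x in xs]
    regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
    print("distinct affine regions along the sweep:", regions)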

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Go to https://tufalabs.ai/

    ***

    Randall Balestriero

    https://x.com/randall_balestr

    https://randallbalestriero.github.io/

    Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

    TOC:

    - Introduction

    - 00:00:00: Introduction

    - Neural Network Geometry and Spline Theory

    - 00:01:41: Neural Network Geometry and Spline Theory

    - 00:07:41: Deep Networks Always Grok

    - 00:11:39: Grokking and Adversarial Robustness

    - 00:16:09: Double Descent and Catastrophic Forgetting

    - Reconstruction Learning

    - 00:18:49: Reconstruction Learning

    - 00:24:15: Frequency Bias in Neural Networks

    - Geometric Analysis of Neural Networks

    - 00:29:02: Geometric Analysis of Neural Networks

    - 00:34:41: Adversarial Examples and Region Concentration

    - LLM Safety and Geometric Analysis

    - 00:40:05: LLM Safety and Geometric Analysis

    - 00:46:11: Toxicity Detection in LLMs

    - 00:52:24: Intrinsic Dimensionality and Model Control

    - 00:58:07: RLHF and High-Dimensional Spaces

    - Conclusion

    - 01:02:13: Neural Tangent Kernel

    - 01:08:07: Conclusion

    REFS:

    [00:01:35] Humayun – Deep network geometry & input space partitioning

    https://arxiv.org/html/2408.04809v1

    [00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators

    https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

    [00:13:55] Song et al. – Gradient-based white-box adversarial attacks

    https://arxiv.org/abs/2012.14965

    [00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness

    https://arxiv.org/abs/2402.15555

    [00:18:25] Humayun – Training dynamics & double descent via linear region evolution

    https://arxiv.org/abs/2310.12977

    [00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries

    https://arxiv.org/abs/1905.08443

    [00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning

    https://arxiv.org/abs/1803.03635

    [00:24:00] Belkin et al. – Double descent phenomenon in modern ML

    https://arxiv.org/abs/1812.11118

    [00:25:55] Balestriero et al. – Batch normalization’s regularization effects

    https://arxiv.org/pdf/2209.14778

    [00:29:35] EU – EU AI Act 2024 with compute restrictions

    https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf

    [00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry

    https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf

    [00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy

    https://arxiv.org/pdf/2407.20099

    [00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods

    https://openreview.net/forum?id=ez7w0Ss4g9

    (truncated, see shownotes PDF)

  • Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.
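
    A toy simulation of the observation behind the model-stealing work referenced at [01:10:10]: every logit vector an API returns is a hidden state pushed through one final projection, so stacked logits span a subspace whose dimension equals the model's hidden width, and the singular values of that stack reveal it. Everything below is simulated; it is a sketch of the idea, not an attack on a real API.

    # Hidden-width recovery from logits via numerical rank.
    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, HIDDEN, QUERIES = 512, 64, 256

    W_out = rng.normal(size=(HIDDEN, VOCAB))   # the model's final projection

    def api_logits():
        """Stand-in for one API call: an unknown hidden state times W_out."""
        return rng.normal(size=HIDDEN) @ W_out

    logits = np.stack([api_logits() for _ in range(QUERIES)])  # (QUERIES, VOCAB)
    s = np.linalg.svd(logits, compute_uv=False)
    print("estimated hidden width:", int((s > 1e-8 * s[0]).sum()))  # -> 64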

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Go to https://tufalabs.ai/

    ***

    Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

    TOC:

    1. ML Security Fundamentals

    [00:00:00] 1.1 ML Model Reasoning and Security Fundamentals

    [00:03:04] 1.2 ML Security Vulnerabilities and System Design

    [00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior

    [00:13:20] 1.4 Model Training, RLHF, and Calibration Effects

    2. Model Evaluation and Research Methods

    [00:19:40] 2.1 Model Reasoning and Evaluation Metrics

    [00:24:37] 2.2 Security Research Philosophy and Methodology

    [00:27:50] 2.3 Security Disclosure Norms and Community Differences

    3. LLM Applications and Best Practices

    [00:44:29] 3.1 Practical LLM Applications and Productivity Gains

    [00:49:51] 3.2 Effective LLM Usage and Prompting Strategies

    [00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code

    4. Advanced LLM Research and Architecture

    [00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience

    [01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges

    [01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

    REFS:

    [00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/

    [00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/

    [00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644

    [00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud

    [00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html

    [00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675

    [00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774

    [00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation

    [00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241

    [00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html

    [00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

    [00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878

    [01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html

    [01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634

    [01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943

  • Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

    * How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see (a toy cost sketch follows this list)

    * The evolution from traditional Large Language Models to more sophisticated reasoning systems

    * The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably

    * Why O1's improved performance comes with substantial computational costs

    * The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

    * The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker
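
    A back-of-envelope sketch of the first point above, with purely hypothetical prices and token counts (not OpenAI's actual figures), showing how hidden reasoning tokens can dominate the bill:

        # Hypothetical cost model for an o1-style API in which hidden
        # reasoning tokens are billed at the output-token rate. All numbers
        # below are illustrative assumptions.
        price_in = 15 / 1e6               # assumed $ per input token
        price_out = 60 / 1e6              # assumed $ per output token

        prompt_tokens = 500
        visible_answer_tokens = 400
        hidden_reasoning_tokens = 8000    # paid for, but never shown

        cost = (prompt_tokens * price_in
                + (visible_answer_tokens + hidden_reasoning_tokens) * price_out)
        print(f"${cost:.2f}")             # -> $0.51, mostly unseen reasoning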

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    TOC:

    1. O1 Architecture and Reasoning Foundations

    [00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

    [00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

    [00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

    [00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

    2. Monte Carlo Methods and Model Deep-Dive

    [00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

    [00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

    [00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

    [00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

    [00:51:41] 2.5 O1 Response Patterns and Performance Analysis

    3. System Design and Real-World Applications

    [00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

    [01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

    [01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

    [01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

    [01:26:08] 3.5 Hybrid Architecture Implementation Strategies

    Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

    REFS:

    [00:02:00] Monty Python (1975)

    Witch trial scene: flawed logical reasoning.

    https://www.youtube.com/watch?v=zrzMhU_4m-g

    [00:04:00] Cade Metz (2024)

    Microsoft–OpenAI partnership evolution and control dynamics.

    https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

    [00:07:25] Kojima et al. (2022)

    Zero-shot chain-of-thought prompting ('Let's think step by step').

    https://arxiv.org/pdf/2205.11916

    [00:12:50] DeepMind Research Team (2023)

    Multi-bot game solving with external and internal planning.

    https://deepmind.google/research/publications/139455/

    [00:15:10] Silver et al. (2016)

    AlphaGo's Monte Carlo Tree Search and Q-learning.

    https://www.nature.com/articles/nature16961

    [00:16:30] Kambhampati, S. et al. (2023)

    Evaluates O1's planning in "Strawberry Fields" benchmarks.

    https://arxiv.org/pdf/2410.02162

    [00:29:30] Alibaba AIDC-AI Team (2024)

    MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

    https://arxiv.org/html/2411.14405

    [00:31:30] Kambhampati, S. (2024)

    Explores LLM "reasoning vs retrieval" debate.

    https://arxiv.org/html/2403.04121v2

    [00:37:35] Wei, J. et al. (2022)

    Chain-of-thought prompting (introduces last-letter concatenation).

    https://arxiv.org/pdf/2201.11903

    [00:42:35] Barbero, F. et al. (2024)

    Transformer attention and "information over-squashing."

    https://arxiv.org/html/2406.04267v2

    [00:46:05] Ruis, L. et al. (2023)

    Influence functions to understand procedural knowledge in LLMs.

    https://arxiv.org/html/2411.12580v1

    (truncated - continued in shownotes/transcript doc)

  • Laura Ruis, a PhD student at University College London and researcher at Cohere, explains her groundbreaking research into how large language models (LLMs) perform reasoning tasks, the fundamental mechanisms underlying those capabilities, and whether these models primarily rely on retrieval or develop procedural knowledge.
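
    The influence-function analysis behind this work (see [00:03:40] and [00:07:40] in the TOC below) asks which pretraining documents most change a model's behaviour on a query. A heavily simplified sketch on a toy linear model (the real method, EK-FAC from Grosse et al., approximates an inverse-Hessian weighting; here we replace it with the identity):

        # Simplified influence score: gradient alignment between a training
        # example and a query (true influence functions also apply an
        # inverse-Hessian weighting, approximated by EK-FAC in the paper).
        import numpy as np

        rng = np.random.default_rng(0)
        w = rng.normal(size=3)                     # toy linear-model weights

        def grad_loss(w, x, y):
            # gradient of squared error 0.5 * (w @ x - y)**2 w.r.t. w
            return (w @ x - y) * x

        x_train, y_train = np.array([1.0, 0.0, 2.0]), 1.0
        x_query, y_query = np.array([0.5, 0.0, 1.0]), 0.7

        score = grad_loss(w, x_query, y_query) @ grad_loss(w, x_train, y_train)
        print(score)   # magnitude ranks how strongly train affects the query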

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    TOC:

    1. LLM Foundations and Learning

    1.1 Scale and Learning in Language Models [00:00:00]

    1.2 Procedural Knowledge vs Fact Retrieval [00:03:40]

    1.3 Influence Functions and Model Analysis [00:07:40]

    1.4 Role of Code in LLM Reasoning [00:11:10]

    1.5 Semantic Understanding and Physical Grounding [00:19:30]

    2. Reasoning Architectures and Measurement

    2.1 Measuring Understanding and Reasoning in Language Models [00:23:10]

    2.2 Formal vs Approximate Reasoning and Model Creativity [00:26:40]

    2.3 Symbolic vs Subsymbolic Computation Debate [00:34:10]

    2.4 Neural Network Architectures and Tensor Product Representations [00:40:50]

    3. AI Agency and Risk Assessment

    3.1 Agency and Goal-Directed Behavior in Language Models [00:45:10]

    3.2 Defining and Measuring Agency in AI Systems [00:49:50]

    3.3 Core Knowledge Systems and Agency Detection [00:54:40]

    3.4 Language Models as Agent Models and Simulator Theory [01:03:20]

    3.5 AI Safety and Societal Control Mechanisms [01:07:10]

    3.6 Evolution of AI Capabilities and Emergent Risks [01:14:20]

    REFS:

    [00:01:10] Procedural Knowledge in Pretraining & LLM Reasoning

    Ruis et al., 2024

    https://arxiv.org/abs/2411.12580

    [00:03:50] EK-FAC Influence Functions in Large LMs

    Grosse et al., 2023

    https://arxiv.org/abs/2308.03296

    [00:13:05] Surfaces and Essences: Analogy as the Core of Cognition

    Hofstadter & Sander

    https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

    [00:13:45] Wittgenstein on Language Games

    https://plato.stanford.edu/entries/wittgenstein/

    [00:14:30] Montague Semantics for Natural Language

    https://plato.stanford.edu/entries/montague-semantics/

    [00:19:35] The Chinese Room Argument

    David Cole

    https://plato.stanford.edu/entries/chinese-room/

    [00:19:55] ARC: Abstraction and Reasoning Corpus

    François Chollet

    https://arxiv.org/abs/1911.01547

    [00:24:20] Systematic Generalization in Neural Nets

    Lake & Baroni, 2023

    https://www.nature.com/articles/s41586-023-06668-3

    [00:27:40] Open-Endedness & Creativity in AI

    Tim Rocktäschel

    https://arxiv.org/html/2406.04268v1

    [00:30:50] Fodor & Pylyshyn on Connectionism

    https://www.sciencedirect.com/science/article/abs/pii/0010027788900315

    [00:31:30] Tensor Product Representations

    Smolensky, 1990

    https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

    [00:35:50] DreamCoder: Wake-Sleep Program Synthesis

    Kevin Ellis et al.

    https://courses.cs.washington.edu/courses/cse599j1/22sp/papers/dreamcoder.pdf

    [00:36:30] Compositional Generalization Benchmarks

    Ruis, Lake et al., 2022

    https://arxiv.org/pdf/2202.10745

    [00:40:30] RNNs & Tensor Products

    McCoy et al., 2018

    https://arxiv.org/abs/1812.08718

    [00:46:10] Formal Causal Definition of Agency

    Kenton et al.

    https://arxiv.org/pdf/2208.08345v2

    [00:48:40] Agency in Language Models

    Sumers et al.

    https://arxiv.org/abs/2309.02427

    [00:55:20] Heider & Simmel’s Moving Shapes Experiment

    https://www.nature.com/articles/s41598-024-65532-0

    [01:00:40] Language Models as Agent Models

    Jacob Andreas, 2022

    https://arxiv.org/abs/2212.01681

    [01:13:35] Pragmatic Understanding in LLMs

    Ruis et al.

    https://arxiv.org/abs/2210.14986

  • Jürgen Schmidhuber, the father of generative AI, challenges current AI narratives, arguing that early deep learning work is misattributed and, in his view, actually originated in Ukraine and Japan. He discusses his early work on linear transformers and artificial curiosity, which preceded modern developments, shares his expansive vision of AI colonising space, and explains his groundbreaking 1991 consciousness model. Schmidhuber dismisses fears of human-AI conflict, arguing that superintelligent AI scientists will be fascinated by their own origins and motivated to protect life rather than harm it, and will be more interested in other superintelligent AIs and in cosmic expansion than in earthly matters. He offers unique insights into how humans and AI might coexist. This is the long-awaited second, previously unreleased part of the interview we filmed last time.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    Interviewer: Tim Scarfe

    TOC:

    [00:00:00] The Nature and Motivations of AI

    [00:02:08] Influential Inventions: 20th vs. 21st Century

    [00:05:28] Transformer and GPT: A Reflection – The revolutionary impact of modern language models, the 1991 linear transformer, linear vs. quadratic scaling, the fast weight controller, and fast weight matrix memory (a toy sketch of the fast-weight idea follows these notes).

    [00:11:03] Pioneering Contributions to AI and Deep Learning – The invention of the transformer, pre-trained networks, the first GANs, the role of predictive coding, and the emergence of artificial curiosity.

    [00:13:58] AI's Evolution and Achievements – The role of compute, breakthroughs in handwriting recognition and computer vision, the rise of GPU-based CNNs, achieving superhuman results, and Japanese contributions to CNN development.

    [00:15:40] The Hardware Lottery and GPUs – GPUs as a serendipitous advantage for AI, the gaming-AI parallel, and Nvidia's strategic shift towards AI.

    [00:19:58] AI Applications and Societal Impact – AI-powered translation breaking communication barriers, AI in medicine for imaging and disease prediction, and AI's potential for human enhancement and sustainable development.

    [00:23:26] The Path to AGI and Current Limitations – Distinguishing large language models from AGI, challenges in replacing physical-world workers, and AI's difficulty with the real world versus board games.

    [00:25:56] AI and Consciousness – Simulating consciousness through unsupervised learning, chunking and automatizing neural networks, data compression, and self-symbols in predictive world models.

    [00:30:50] The Future of AI and Humanity – The transition from AGIs as tools to AGIs with their own goals, the role of humans in an AGI-dominated world, and the concept of Homo Ludens.

    [00:38:05] The AI Race: Europe, China, and the US – Europe's historical contributions, the current dominance of the US and East Asia, and the role of venture capital and industrial policy.

    [00:50:32] Addressing AI Existential Risk – The obsession with AI existential risk, commercial pressure for friendly AIs, AI vs. hydrogen bombs, and the long-term future of AI.

    [00:58:00] The Fermi Paradox and Extraterrestrial Intelligence – Expanding AI bubbles as an explanation for the Fermi paradox, dark matter and encrypted civilizations, and Earth as the first to spawn an AI bubble.

    [01:02:08] The Diversity of AI and AI Ecologies – The unrealism of a monolithic superintelligence, diverse AIs with varying goals, and intense competition and collaboration in AI ecologies.

    [01:12:21] Final Thoughts and Closing Remarks

    REFERENCES:

    See pinned comment on YT: https://youtu.be/fZYUqICYCAk
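
    As flagged in the [00:05:28] entry, the fast weight controller can be sketched in a few lines. Under the reading (from Schmidhuber's own later papers) that unnormalized linear attention and the 1991 fast weight controller coincide, a toy numpy version with arbitrary sizes and data:

        # Fast weights as linear attention: write rank-1 outer products into a
        # fast weight matrix W, read with a query. Cost grows linearly with
        # sequence length, vs. quadratically for softmax attention.
        import numpy as np

        d, steps = 4, 5
        rng = np.random.default_rng(0)
        W = np.zeros((d, d))                  # fast weight matrix (the memory)

        keys = rng.normal(size=(steps, d))    # one key per time step
        values = rng.normal(size=(steps, d))  # one value per time step
        queries = rng.normal(size=(steps, d)) # one query per time step

        for k, v, q in zip(keys, values, queries):
            W += np.outer(v, k)               # write: fast-weight update
            print(W @ q)                      # read: linear-attention output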

  • Professor Yoshua Bengio is a pioneer in deep learning and Turing Award winner. Bengio talks about AI safety, why goal-seeking “agentic” AIs might be dangerous, and his vision for building powerful AI tools without giving them agency. Topics include reward tampering risks, instrumental convergence, global AI governance, and how non-agent AIs could revolutionize science and medicine while reducing existential threats. Perfect for anyone curious about advanced AI risks and how to manage them responsibly.
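
    A toy illustration of the reward-tampering worry (our own sketch, not from Bengio's paper): once the reward process sits inside the agent's action space, the policy that maximizes the measured reward corrupts the measurement instead of doing the task.

        # Reward tampering in one screenful: the agent optimizes the reward
        # signal it actually receives, and tampering with the sensor scores
        # higher than the intended behaviour. All numbers are illustrative.
        def measured_return(action):
            if action == "do_task":
                return 1.0        # intended reward for doing the job
            if action == "tamper":
                return 100.0      # reward after overwriting the sensor
            return 0.0            # do nothing

        actions = ["do_nothing", "do_task", "tamper"]
        print(max(actions, key=measured_return))   # -> "tamper"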

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

    Goto https://tufalabs.ai/

    ***

    Interviewer: Tim Scarfe

    Yoshua Bengio:

    https://x.com/Yoshua_Bengio

    https://scholar.google.com/citations?user=kukA0LcAAAAJ&hl=en

    https://yoshuabengio.org/

    https://en.wikipedia.org/wiki/Yoshua_Bengio

    TOC:

    1. AI Safety Fundamentals

    [00:00:00] 1.1 AI Safety Risks and International Cooperation

    [00:03:20] 1.2 Fundamental Principles vs Scaling in AI Development

    [00:11:25] 1.3 System 1/2 Thinking and AI Reasoning Capabilities

    [00:15:15] 1.4 Reward Tampering and AI Agency Risks

    [00:25:17] 1.5 Alignment Challenges and Instrumental Convergence

    2. AI Architecture and Safety Design

    [00:33:10] 2.1 Instrumental Goals and AI Safety Fundamentals

    [00:35:02] 2.2 Separating Intelligence from Goals in AI Systems

    [00:40:40] 2.3 Non-Agent AI as Scientific Tools

    [00:44:25] 2.4 Oracle AI Systems and Mathematical Safety Frameworks

    3. Global Governance and Security

    [00:49:50] 3.1 International AI Competition and Hardware Governance

    [00:51:58] 3.2 Military and Security Implications of AI Development

    [00:56:07] 3.3 Personal Evolution of AI Safety Perspectives

    [01:00:25] 3.4 AI Development Scaling and Global Governance Challenges

    [01:12:10] 3.5 AI Regulation and Corporate Oversight

    4. Technical Innovations

    [01:23:00] 4.1 Evolution of Neural Architectures: From RNNs to Transformers

    [01:26:02] 4.2 GFlowNets and Symbolic Computation

    [01:30:47] 4.3 Neural Dynamics and Consciousness

    [01:34:38] 4.4 AI Creativity and Scientific Discovery

    SHOWNOTES (Transcript, references, best clips etc):

    https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0

    CORE REFS (full list in shownotes and pinned comment):

    [00:00:15] Bengio et al.: "AI Risk" Statement

    https://www.safe.ai/work/statement-on-ai-risk

    [00:23:10] Bengio on reward tampering & AI safety (Harvard Data Science Review)

    https://hdsr.mitpress.mit.edu/pub/w974bwb0

    [00:40:45] Munk Debate on AI existential risk, featuring Bengio

    https://munkdebates.com/debates/artificial-intelligence

    [00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.) on oracle-to-agent safety

    https://arxiv.org/abs/2408.05284

    [00:51:20] Bengio (2024) memo on hardware-based AI governance verification

    https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf

    [01:12:55] Bengio’s involvement in EU AI Act code of practice

    https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice

    [01:27:05] Complexity-based compositionality theory (Elmoznino, Jiralerspong, Bengio, Lajoie)

    https://arxiv.org/abs/2410.14817

    [01:29:00] GFlowNet Foundations (Bengio et al.) for probabilistic inference

    https://arxiv.org/pdf/2111.09266

    [01:32:10] Discrete attractor states in neural systems (Nam, Elmoznino, Bengio, Lajoie)

    https://arxiv.org/pdf/2302.06403

  • François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

    Goto https://tufalabs.ai/

    ***

    Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say so):

    https://arcprize.org/blog/oai-o3-pub-breakthrough

    TOC:

    1. Introduction and Opening

    [00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View

    [00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions

    [00:01:31] 1.3 Defining Reasoning

    3. ARC Competition 2024 Results and Evolution

    [00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2

    [00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions

    [00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training

    4. Transduction vs. Induction in ARC

    [00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization

    [00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search

    5. ARC-2 Development and Future Directions

    [00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2

    [00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets

    [00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology

    6. Program Synthesis Approaches

    [00:30:18] 6.1 Induction vs. Transduction

    [00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks

    [00:34:23] 6.3 Combining Induction and Transduction

    [00:37:05] 6.4 Multi-View Insight and Overfitting Regulation

    7. Latent Space and Graph-Based Synthesis

    [00:38:17] 7.1 Clément Bonnet’s Latent Program Search Approach

    [00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search

    [00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation

    [00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions

    8. Compute Efficiency and Lifelong Learning

    [00:48:05] 8.1 Symbolic Process for Architecture Generation

    [00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy

    [00:52:20] 8.3 Learning New Building Blocks for Future Tasks

    9. AI Reasoning and Future Development

    [00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning

    [00:56:30] 9.2 Reconciling Symbolic and Connectionist Views

    [01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency

    [01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability

    10. Program Synthesis and Research Lab

    [01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis

    [01:09:55] 10.2 Democratizing Programming and Natural Language Instruction

    11. Frontier Models and O1 Architecture

    [01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass

    [01:16:55] 11.2 o1’s Natural Language Program Generation and Test-Time Compute Scaling

    [01:19:35] 11.3 Logarithmic Gains with Deeper Search

    12. ARC Evaluation and Human Intelligence

    [01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues

    [01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor

    [01:26:16] 12.3 Closing Remarks and Future Directions

    SHOWNOTES PDF:

    https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0