Episodes

  • Connor Leahy and Gabriel Alfour, AI researchers from Conjecture and authors of "The Compendium," join us for a critical discussion centered on Artificial Superintelligence (ASI) safety and governance. Drawing from their comprehensive analysis in "The Compendium," they articulate a stark warning about the existential risks inherent in uncontrolled AI development, framing it through the lens of "intelligence domination": a sufficiently advanced AI could subordinate humanity, much like humans dominate less intelligent species.

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + REFS + NOTES:

    https://www.dropbox.com/scl/fi/p86l75y4o2ii40df5t7no/Compendium.pdf?rlkey=tukczgf3flw133sr9rgss0pnj&dl=0

    https://www.thecompendium.ai/

    https://en.wikipedia.org/wiki/Connor_Leahy

    https://www.conjecture.dev/about

    https://substack.com/@gabecc

    TOC:

    1. AI Intelligence and Safety Fundamentals

    [00:00:00] 1.1 Understanding Intelligence and AI Capabilities

    [00:06:20] 1.2 Emergence of Intelligence and Regulatory Challenges

    [00:10:18] 1.3 Human vs Animal Intelligence Debate

    [00:18:00] 1.4 AI Regulation and Risk Assessment Approaches

    [00:26:14] 1.5 Competing AI Development Ideologies

    2. Economic and Social Impact

    [00:29:10] 2.1 Labor Market Disruption and Post-Scarcity Scenarios

    [00:32:40] 2.2 Institutional Frameworks and Tech Power Dynamics

    [00:37:40] 2.3 Ethical Frameworks and AI Governance Debates

    [00:40:52] 2.4 AI Alignment Evolution and Technical Challenges

    3. Technical Governance Framework

    [00:55:07] 3.1 Three Levels of AI Safety: Alignment, Corrigibility, and Boundedness

    [00:55:30] 3.2 Challenges of AI System Corrigibility and Constitutional Models

    [00:57:35] 3.3 Limitations of Current Boundedness Approaches

    [00:59:11] 3.4 Abstract Governance Concepts and Policy Solutions

    4. Democratic Implementation and Coordination

    [00:59:20] 4.1 Governance Design and Measurement Challenges

    [01:00:10] 4.2 Democratic Institutions and Experimental Governance

    [01:14:10] 4.3 Political Engagement and AI Safety Advocacy

    [01:25:30] 4.4 Practical AI Safety Measures and International Coordination

    CORE REFS:

    [00:01:45] The Compendium (2023), Leahy et al.

    https://pdf.thecompendium.ai/the_compendium.pdf

    [00:06:50] Geoffrey Hinton Leaves Google, BBC News

    https://www.bbc.com/news/world-us-canada-65452940

    [00:10:00] ARC-AGI, Chollet

    https://arcprize.org/arc-agi

    [00:13:25] A Brief History of Intelligence, Bennett

    https://www.amazon.com/Brief-History-Intelligence-Humans-Breakthroughs/dp/0063286343

    [00:25:35] Statement on AI Risk, Center for AI Safety

    https://www.safe.ai/work/statement-on-ai-risk

    [00:26:15] Machines of Love and Grace, Amodei

    https://darioamodei.com/machines-of-loving-grace

    [00:26:35] The Techno-Optimist Manifesto, Andreessen

    https://a16z.com/the-techno-optimist-manifesto/

    [00:31:55] Techno-Feudalism, Varoufakis

    https://www.amazon.co.uk/Technofeudalism-Killed-Capitalism-Yanis-Varoufakis/dp/1847927270

    [00:42:40] Introducing Superalignment, OpenAI

    https://openai.com/index/introducing-superalignment/

    [00:47:20] Three Laws of Robotics, Asimov

    https://www.britannica.com/topic/Three-Laws-of-Robotics

    [00:50:00] Symbolic AI (GOFAI), Haugeland

    https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence

    [00:52:30] Intent Alignment, Christiano

    https://www.alignmentforum.org/posts/HEZgGBZTpT4Bov7nH/mapping-the-conceptual-territory-in-ai-existential-safety

    [00:55:10] Large Language Model Alignment: A Survey, Jiang et al.

    http://arxiv.org/pdf/2309.15025

    [00:55:40] Constitutional Checks and Balances, Bok

    https://plato.stanford.edu/entries/montesquieu/

    <trunc, see PDF>

  • We are joined by Francois Chollet and Mike Knoop to launch the new version of the ARC Prize! In version 2, the tasks have been calibrated with human testers so that each one was solved by at least two humans within a reasonable number of attempts, while also being adversarially selected so that frontier reasoning models cannot solve them. The best LLMs today achieve negligible performance on this challenge.

    https://arcprize.org/
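
    For readers new to the benchmark, here is a minimal sketch of how an ARC-style task is represented and scored. The grids and the swap_columns solver are invented for illustration; the exact-match scoring with a two-attempt budget mirrors the rules described above.

    # Minimal sketch of ARC-style task representation and scoring. The task
    # content is invented; real tasks live at https://arcprize.org/. A task
    # counts as solved only if one of (up to) two predicted output grids
    # exactly matches the hidden test output.

    task = {
        "train": [  # demonstration input/output pairs (cell values are colors 0-9)
            {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
            {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
        ],
        "test": [{"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}],
    }

    def solved(task, attempts):
        """Exact-match scoring with a budget of two attempts per test input."""
        target = task["test"][0]["output"]
        return any(a == target for a in attempts[:2])

    def swap_columns(grid):
        # A hypothetical solver guessing the rule is "swap the two columns".
        return [list(reversed(row)) for row in grid]

    print(solved(task, [swap_columns(task["test"][0]["input"])]))  # -> True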

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/0v9o8xcpppdwnkntj59oi/ARCv2.pdf?rlkey=luqb6f141976vra6zdtptv5uj&dl=0

    TOC:

    1. ARC v2 Core Design & Objectives

    [00:00:00] 1.1 ARC v2 Launch and Benchmark Architecture

    [00:03:16] 1.2 Test-Time Optimization and AGI Assessment

    [00:06:24] 1.3 Human-AI Capability Analysis

    [00:13:02] 1.4 OpenAI o3 Initial Performance Results

    2. ARC Technical Evolution

    [00:17:20] 2.1 ARC-v1 to ARC-v2 Design Improvements

    [00:21:12] 2.2 Human Validation Methodology

    [00:26:05] 2.3 Task Design and Gaming Prevention

    [00:29:11] 2.4 Intelligence Measurement Framework

    3. O3 Performance & Future Challenges

    [00:38:50] 3.1 O3 Comprehensive Performance Analysis

    [00:43:40] 3.2 System Limitations and Failure Modes

    [00:49:30] 3.3 Program Synthesis Applications

    [00:53:00] 3.4 Future Development Roadmap

    REFS:

    [00:00:15] On the Measure of Intelligence, François Chollet

    https://arxiv.org/abs/1911.01547

    [00:06:45] ARC Prize Foundation, François Chollet, Mike Knoop

    https://arcprize.org/

    [00:12:50] OpenAI o3 model performance on ARC v1, ARC Prize Team

    https://arcprize.org/blog/oai-o3-pub-breakthrough

    [00:18:30] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei et al.

    https://arxiv.org/abs/2201.11903

    [00:21:45] ARC-v2 benchmark tasks, Mike Knoop

    https://arcprize.org/blog/introducing-arc-agi-public-leaderboard

    [00:26:05] ARC Prize 2024: Technical Report, Francois Chollet et al.

    https://arxiv.org/html/2412.04604v2

    [00:32:45] ARC Prize 2024 Technical Report, Francois Chollet, Mike Knoop, Gregory Kamradt

    https://arxiv.org/abs/2412.04604

    [00:48:55] The Bitter Lesson, Rich Sutton

    http://www.incompleteideas.net/IncIdeas/BitterLesson.html

    [00:53:30] Decoding strategies in neural text generation, Sina Zarrieß

    https://www.mdpi.com/2078-2489/12/9/355/pdf

  • Mohamed Osman joins to discuss MindsAI's highest-scoring entry to the ARC Challenge 2024 and the paradigm of test-time fine-tuning. They explore how the team, now part of Tufa Labs in Zurich, achieved state-of-the-art results using a combination of pre-training techniques, a unique meta-learning strategy, and an ensemble voting mechanism. Mohamed emphasizes the importance of raw data input and network flexibility.
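
    As a rough sketch of that recipe (not the team's actual code): test-time fine-tuning adapts the model on augmented copies of a task's own demonstration pairs, then votes over predictions made under different augmentations. The model interface (fine_tune, predict) is a hypothetical stand-in, and the dummy model below pretends fine-tuning has already taught it the task's rule so the script runs end to end.

    # Sketch of test-time fine-tuning with augmentation-based voting.
    from collections import Counter

    def flip_h(g):                      # horizontal flip; its own inverse
        return [list(reversed(row)) for row in g]

    def flip_v(g):                      # vertical flip; also self-inverse
        return list(reversed(g))

    AUGS = [(lambda g: g, lambda g: g), (flip_h, flip_h), (flip_v, flip_v)]

    def to_key(grid):                   # make nested lists hashable for voting
        return tuple(map(tuple, grid))

    def test_time_solve(model, train_pairs, test_input):
        # 1. Test-time fine-tuning: adapt the model on augmented copies of
        #    the task's own demonstration pairs (hypothetical fine_tune API).
        model.fine_tune([(f(x), f(y)) for f, _ in AUGS for x, y in train_pairs])
        # 2. Predict under each augmentation, map each prediction back to the
        #    original frame with the inverse transform, and majority-vote.
        votes, candidates = Counter(), {}
        for f, f_inv in AUGS:
            pred = f_inv(model.predict(f(test_input)))  # hypothetical predict API
            votes[to_key(pred)] += 1
            candidates[to_key(pred)] = pred
        return candidates[votes.most_common(1)[0][0]]

    class DummyModel:
        """Stand-in for a pre-trained sequence model."""
        def fine_tune(self, pairs):
            pass                        # a real model would take gradient steps here
        def predict(self, g):
            return flip_h(g)            # pretend the "learned" rule is a flip

    print(test_time_solve(DummyModel(), [], [[5, 6], [7, 8]]))  # [[6, 5], [8, 7]]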

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + REFS:

    https://www.dropbox.com/scl/fi/jeavyqidsjzjgjgd7ns7h/MoFInal.pdf?rlkey=cjjmo7rgtenxrr3b46nk6yq2e&dl=0

    Mohamed Osman (Tufa Labs)

    https://x.com/MohamedOsmanML

    Jack Cole (Tufa Labs)

    https://x.com/MindsAI_Jack

    How and why deep learning for ARC paper:

    https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf

    TOC:

    1. Abstract Reasoning Foundations

    [00:00:00] 1.1 Test-Time Fine-Tuning and ARC Challenge Overview

    [00:10:20] 1.2 Neural Networks vs Programmatic Approaches to Reasoning

    [00:13:23] 1.3 Code-Based Learning and Meta-Model Architecture

    [00:20:26] 1.4 Technical Implementation with Long T5 Model

    2. ARC Solution Architectures

    [00:24:10] 2.1 Test-Time Tuning and Voting Methods for ARC Solutions

    [00:27:54] 2.2 Model Generalization and Function Generation Challenges

    [00:32:53] 2.3 Input Representation and VLM Limitations

    [00:36:21] 2.4 Architecture Innovation and Cross-Modal Integration

    [00:40:05] 2.5 Future of ARC Challenge and Program Synthesis Approaches

    3. Advanced Systems Integration

    [00:43:00] 3.1 DreamCoder Evolution and LLM Integration

    [00:50:07] 3.2 MindsAI Team Progress and Acquisition by Tufa Labs

    [00:54:15] 3.3 ARC v2 Development and Performance Scaling

    [00:58:22] 3.4 Intelligence Benchmarks and Transformer Limitations

    [01:01:50] 3.5 Neural Architecture Optimization and Processing Distribution

    REFS:

    [00:01:32] Original ARC challenge paper, François Chollet

    https://arxiv.org/abs/1911.01547

    [00:06:55] DreamCoder, Kevin Ellis et al.

    https://arxiv.org/abs/2006.08381

    [00:12:50] Deep Learning with Python, François Chollet

    https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438

    [00:13:35] Influence of pretraining data for reasoning, Laura Ruis

    https://arxiv.org/abs/2411.12580

    [00:17:50] Latent Program Networks, Clement Bonnet

    https://arxiv.org/html/2411.08706v1

    [00:20:50] T5, Colin Raffel et al.

    https://arxiv.org/abs/1910.10683

    [00:30:30] Combining Induction and Transduction for Abstract Reasoning, Wen-Ding Li, Kevin Ellis et al.

    https://arxiv.org/abs/2411.02272

    [00:34:15] Six finger problem, Chen et al.

    https://openaccess.thecvf.com/content/CVPR2024/papers/Chen_SpatialVLM_Endowing_Vision-Language_Models_with_Spatial_Reasoning_Capabilities_CVPR_2024_paper.pdf

    [00:38:15] DeepSeek-R1-Distill-Llama, DeepSeek AI

    https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B

    [00:40:10] ARC Prize 2024 Technical Report, François Chollet et al.

    https://arxiv.org/html/2412.04604v2

    [00:45:20] LLM-Guided Compositional Program Synthesis, Wen-Ding Li and Kevin Ellis

    https://arxiv.org/html/2503.15540

    [00:54:25] Abstraction and Reasoning Corpus, François Chollet

    https://github.com/fchollet/ARC-AGI

    [00:57:10] O3 breakthrough on ARC-AGI, OpenAI

    https://arcprize.org/

    [00:59:35] ConceptARC Benchmark, Arseny Moskvichev, Melanie Mitchell

    https://arxiv.org/abs/2305.07141

    [01:02:05] Mixtape: Breaking the Softmax Bottleneck Efficiently, Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

    http://papers.neurips.cc/paper/9723-mixtape-breaking-the-softmax-bottleneck-efficiently.pdf

  • Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.
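
    To make the paper's approach concrete, here is a minimal sketch of the GSM-Symbolic idea: rewrite a GSM8K-style word problem as a template with symbolic slots, resample names and numbers, and ask how stable a model's accuracy is across instantiations. The template and value ranges are invented for illustration.

    # Sketch of GSM-Symbolic-style template perturbation.
    import random

    TEMPLATE = ("{name} has {a} apples and buys {b} more. "
                "How many apples does {name} have now?")

    def instantiate(rng):
        name = rng.choice(["Sophie", "Liam", "Ava", "Noah"])
        a, b = rng.randint(2, 20), rng.randint(2, 20)
        return TEMPLATE.format(name=name, a=a, b=b), a + b  # question, answer

    rng = random.Random(0)
    for question, answer in (instantiate(rng) for _ in range(5)):
        print(question, "->", answer)

    # In the actual study, each variant is posed to an LLM, and the spread of
    # accuracy across variants (not just the mean) is the evidence that models
    # pattern-match surface forms rather than reason robustly.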

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0

    TOC:

    1. Intelligence vs Achievement in AI Systems

    [00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems

    [00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess

    [00:10:10] 1.3 Language Models and Distribution Learning Limitations

    [00:14:47] 1.4 Research Methodology and Theoretical Frameworks

    2. Intelligence Measurement and Learning

    [00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning

    [00:29:00] 2.2 Intelligence Definition and Measurement Approaches

    [00:34:35] 2.3 Learning Capabilities and Agency in AI Systems

    [00:39:26] 2.4 Abstract Reasoning and Symbol Understanding

    3. LLM Performance and Evaluation

    [00:47:15] 3.1 Scaling Laws and Fundamental Limitations

    [00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks

    [00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs

    [01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment

    REFS:

    [00:01:00] AlphaZero chess AI system, Silver et al.

    https://arxiv.org/abs/1712.01815

    [00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan

    https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184

    [00:11:35] Cross-entropy loss in language modeling, Voita

    http://lena-voita.github.io/nlp_course/language_modeling.html

    [00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.

    https://arxiv.org/abs/2410.05229

    [00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn

    https://www.sciencedirect.com/science/article/pii/001002779090014B

    [00:28:55] Brain-to-body mass ratio scaling laws, Sutskever

    https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training

    [00:29:40] On the Measure of Intelligence, Chollet

    https://arxiv.org/abs/1911.01547

    [00:33:30] On definition of intelligence, Gignac et al.

    https://www.sciencedirect.com/science/article/pii/S0160289624000266

    [00:35:30] Defining intelligence, Wang

    https://cis.temple.edu/~wangp/papers.html

    [00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene

    https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884

    [00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander

    https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

    [00:43:15] Chain-of-thought prompting, Wei et al.

    https://arxiv.org/abs/2201.11903

    [00:47:20] Test-time scaling laws in machine learning, Brown

    https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058

    [00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.

    https://arxiv.org/abs/2001.08361

    [00:55:15] Tensor product variable binding, Smolensky

    https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

    [01:08:45] GSM-8K dataset, OpenAI

    https://huggingface.co/datasets/openai/gsm8k

  • Dr. Max Bartolo from Cohere discusses machine learning model development, evaluation, and robustness. Key topics include model reasoning, the DynaBench platform for dynamic benchmarking, data-centric AI development, model training challenges, and the limitations of human feedback mechanisms. The conversation also covers technical aspects like influence functions, model quantization, and the PRISM project.
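
    For context on the influence-functions thread, the standard formulation from the Koh & Liang reference below estimates how up-weighting one training point z would shift the loss at a test point. A sketch in LaTeX notation, with H the Hessian of the empirical risk at the fitted parameters:

    \mathcal{I}(z, z_{\mathrm{test}})
      = -\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^\top
         H_{\hat{\theta}}^{-1}\,
         \nabla_\theta L(z, \hat{\theta}),
    \qquad
    H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta})

    Grosse et al. (also referenced below) scale approximations of this quantity to LLMs, where forming or inverting H exactly is infeasible.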

    Max Bartolo (Cohere):

    https://www.maxbartolo.com/

    https://cohere.com/command

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/vujxscaffw37pqgb6hpie/MAXB.pdf?rlkey=0oqjxs5u49eqa2m7uaol64lbw&dl=0

    TOC:

    1. Model Reasoning and Verification

    [00:00:00] 1.1 Model Consistency and Reasoning Verification

    [00:03:25] 1.2 Influence Functions and Distributed Knowledge Analysis

    [00:10:28] 1.3 AI Application Development and Model Deployment

    [00:14:24] 1.4 AI Alignment and Human Feedback Limitations

    2. Evaluation and Bias Assessment

    [00:20:15] 2.1 Human Evaluation Challenges and Factuality Assessment

    [00:27:15] 2.2 Cultural and Demographic Influences on Model Behavior

    [00:32:43] 2.3 Adversarial Examples and Model Robustness

    3. Benchmarking Systems and Methods

    [00:41:54] 3.1 DynaBench and Dynamic Benchmarking Approaches

    [00:50:02] 3.2 Benchmarking Challenges and Alternative Metrics

    [00:50:33] 3.3 Evolution of Model Benchmarking Methods

    [00:51:15] 3.4 Hierarchical Capability Testing Framework

    [00:52:35] 3.5 Benchmark Platforms and Tools

    4. Model Architecture and Performance

    [00:55:15] 4.1 Cohere's Model Development Process

    [01:00:26] 4.2 Model Quantization and Performance Evaluation

    [01:05:18] 4.3 Reasoning Capabilities and Benchmark Standards

    [01:08:27] 4.4 Training Progression and Technical Challenges

    5. Future Directions and Challenges

    [01:13:48] 5.1 Context Window Evolution and Trade-offs

    [01:22:47] 5.2 Enterprise Applications and Future Challenges

    REFS:

    [00:03:10] Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models, Laura Ruis, Max Bartolo et al.

    https://cohere.com/research/papers/procedural-knowledge-in-pretraining-drives-reasoning-in-large-language-models-2024-11-20

    [00:04:15] Influence functions in machine learning, Koh & Liang

    https://arxiv.org/abs/1703.04730

    [00:08:05] Studying Large Language Model Generalization with Influence Functions, Roger Grosse et al.

    https://storage.prod.researchhub.com/uploads/papers/2023/08/08/2308.03296.pdf

    [00:11:10] The LLM ARChitect: Solving ARC-AGI Is A Matter of Perspective, Daniel Franzen, Jan Disselhoff, and David Hartmann

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:12:10] Hugging Face model repo for C4AI Command A, Cohere and Cohere For AI

    https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

    [00:13:30] OpenInterpreter

    https://github.com/KillianLucas/open-interpreter

    [00:16:15] Human Feedback is not Gold Standard, Tom Hosking, Max Bartolo, Phil Blunsom

    https://arxiv.org/abs/2309.16349

    [00:27:15] The PRISM Alignment Dataset, Hannah Kirk et al.

    https://arxiv.org/abs/2404.16019

    [00:32:50] How adversarial examples arise, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, Aleksander Madry

    https://arxiv.org/abs/1905.02175

    [00:43:00] DynaBench platform paper, Douwe Kiela et al.

    https://aclanthology.org/2021.naacl-main.324.pdf

    [00:50:15] Sara Hooker's work on compute limitations, Sara Hooker

    https://arxiv.org/html/2407.05694v1

    [00:53:25] DataPerf: Community-led benchmark suite, Mazumder et al.

    https://arxiv.org/abs/2207.10062

    [01:04:35] DROP, Dheeru Dua et al.

    https://arxiv.org/abs/1903.00161

    [01:07:05] GSM8k, Cobbe et al.

    https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

    [01:09:30] ARC, François Chollet

    https://github.com/fchollet/ARC-AGI

    [01:15:50] Command A, Cohere

    https://cohere.com/blog/command-a

    [01:22:55] Enterprise search using LLMs, Cohere

    https://cohere.com/blog/commonly-asked-questions-about-search-from-coheres-enterprise-customers

  • This sponsored episode features mathematician Ohad Asor discussing logical approaches to AI, focusing on the limitations of machine learning and introducing the Tau language for software development and blockchain tech. Asor argues that machine learning cannot guarantee correctness. Tau allows logical specification of software requirements, automatically creating provably correct implementations with the potential to revolutionize distributed systems. The discussion highlights program synthesis, software updates, and applications in finance and governance.

    SPONSOR MESSAGES:

    ***

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/scl/fi/t849j6v1juk3gc15g4rsy/TAU.pdf?rlkey=hh11h2mhog3ncdbeapbzpzctc&dl=0

    Tau:

    https://tau.net/

    Tau Language:

    https://tau.ai/tau-language/

    Research:

    https://tau.net/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

    TOC:

    1. Machine Learning Foundations and Limitations

    [00:00:00] 1.1 Fundamental Limitations of Machine Learning and PAC Learning Theory

    [00:04:50] 1.2 Transductive Learning and the Three Curses of Machine Learning

    [00:08:57] 1.3 Language, Reality, and AI System Design

    [00:12:58] 1.4 Program Synthesis and Formal Verification Approaches

    2. Logical Programming Architecture

    [00:31:55] 2.1 Safe AI Development Requirements

    [00:32:05] 2.2 Self-Referential Language Architecture

    [00:32:50] 2.3 Boolean Algebra and Logical Foundations

    [00:37:52] 2.4 SAT Solvers and Complexity Challenges

    [00:44:30] 2.5 Program Synthesis and Specification

    [00:47:39] 2.6 Overcoming Tarski's Undefinability with Boolean Algebra

    [00:56:05] 2.7 Tau Language Implementation and User Control

    3. Blockchain-Based Software Governance

    [01:09:10] 3.1 User Control and Software Governance Mechanisms

    [01:18:27] 3.2 Tau's Blockchain Architecture and Meta-Programming Capabilities

    [01:21:43] 3.3 Development Status and Token Implementation

    [01:24:52] 3.4 Consensus Building and Opinion Mapping System

    [01:35:29] 3.5 Automation and Financial Applications

    CORE REFS (more in pinned comment):

    [00:03:45] PAC (Probably Approximately Correct) Learning framework, Leslie Valiant

    https://en.wikipedia.org/wiki/Probably_approximately_correct_learning

    [00:06:10] Boolean Satisfiability Problem (SAT), Various

    https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

    [00:13:55] Knowledge as Justified True Belief (JTB), Matthias Steup

    https://plato.stanford.edu/entries/epistemology/

    [00:17:50] Wittgenstein's concept of the limits of language, Ludwig Wittgenstein

    https://plato.stanford.edu/entries/wittgenstein/

    [00:21:25] Boolean algebras, Ohad Asor

    https://tau.net/tau-language-research/

    [00:26:10] The Halting Problem

    https://plato.stanford.edu/entries/turing-machine/#HaltProb

    [00:30:25] Alfred Tarski (1901-1983), Mario Gómez-Torrente

    https://plato.stanford.edu/entries/tarski/

    [00:41:50] DPLL

    https://www.cs.princeton.edu/~zkincaid/courses/fall18/readings/SATHandbook-CDCL.pdf

    [00:49:50] Tarski's undefinability theorem (1936), Alfred Tarski

    https://plato.stanford.edu/entries/tarski-truth/

    [00:51:45] Boolean Algebra mathematical foundations, J. Donald Monk

    https://plato.stanford.edu/entries/boolalg-math/

    [01:02:35] Belief Revision Theory and AGM Postulates, Sven Ove Hansson

    https://plato.stanford.edu/entries/logic-belief-revision/

    [01:05:35] Quantifier elimination in atomless boolean algebra, H. Jerome Keisler

    https://people.math.wisc.edu/~hkeisler/random.pdf

    [01:08:35] Quantifier elimination in Tau language specification, Ohad Asor

    https://tau.ai/Theories-and-Applications-of-Boolean-Algebras-0.29.pdf

    [01:11:50] Tau Net blockchain platform

    https://tau.net/

    [01:19:20] Tau blockchain's innovative approach treating blockchain code itself as a contract

    https://tau.net/Whitepaper.pdf
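
    Since SAT solving and DPLL anchor much of this discussion, here is a minimal DPLL-style solver as a sketch; production solvers layer clause learning (CDCL), watched literals, and branching heuristics on top of this skeleton.

    # Minimal DPLL-style SAT solver. Formulas are in CNF: a list of clauses,
    # each clause a list of non-zero ints (positive = variable, negative =
    # its negation).

    def simplify(clauses, assignment):
        out = []
        for clause in clauses:
            new_clause, satisfied = [], False
            for lit in clause:
                if abs(lit) in assignment:
                    if assignment[abs(lit)] == (lit > 0):
                        satisfied = True
                        break
                else:
                    new_clause.append(lit)
            if satisfied:
                continue
            if not new_clause:
                return False                 # empty clause: conflict
            out.append(new_clause)
        return out

    def dpll(clauses, assignment=None):
        assignment = dict(assignment or {})
        while True:                          # unit propagation to fixpoint
            clauses = simplify(clauses, assignment)
            if clauses is False:
                return None                  # conflict under this assignment
            if not clauses:
                return assignment            # every clause satisfied
            units = [c[0] for c in clauses if len(c) == 1]
            if not units:
                break
            assignment[abs(units[0])] = units[0] > 0
        lit = clauses[0][0]                  # branch on first unassigned literal
        for value in (lit > 0, lit <= 0):
            result = dpll(clauses, {**assignment, abs(lit): value})
            if result is not None:
                return result
        return None

    # (x1 or x2) and (not x1 or x3) and (not x3)
    print(dpll([[1, 2], [-1, 3], [-3]]))     # {3: False, 1: False, 2: True}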

  • John Palazza from CentML joins us in this sponsored interview to discuss the critical importance of infrastructure optimization in the age of Large Language Models and Generative AI. We explore how enterprises can transition from the innovation phase to production and scale, highlighting the significance of efficient GPU utilization and cost management. The conversation covers the open-source versus proprietary model debate, the rise of AI agents, and the need for platform independence to avoid vendor lock-in, as well as emerging trends in AI infrastructure and the pivotal role of strategic partnerships.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT:

    https://www.dropbox.com/scl/fi/dnjsygrgdgq5ng5fdlfjg/JOHNPALAZZA.pdf?rlkey=hl9wyydi9mj077rbg5acdmo3a&dl=0

    John Palazza:

    Vice President of Global Sales @ CentML

    https://www.linkedin.com/in/john-p-b34655/

    TOC:

    1. Enterprise AI Organization and Strategy

    [00:00:00] 1.1 Organizational Structure and ML Ownership

    [00:02:59] 1.2 Infrastructure Efficiency and GPU Utilization

    [00:07:59] 1.3 Platform Centralization vs Team Autonomy

    [00:11:32] 1.4 Enterprise AI Adoption Strategy and Leadership

    2. MLOps Infrastructure and Resource Management

    [00:15:08] 2.1 Technology Evolution and Enterprise Integration

    [00:19:10] 2.2 Enterprise MLOps Platform Development

    [00:22:15] 2.3 AI Interface Evolution and Agent-Based Solutions

    [00:25:47] 2.4 CentML's Infrastructure Solutions

    [00:30:00] 2.5 Workload Abstraction and Resource Allocation

    3. LLM Infrastructure Optimization and Independence

    [00:33:10] 3.1 GPU Optimization and Cost Efficiency

    [00:36:47] 3.2 AI Efficiency and Innovation Challenges

    [00:41:40] 3.3 Cloud Provider Strategy and Infrastructure Control

    [00:46:52] 3.4 Platform Independence and Vendor Lock-in

    [00:50:53] 3.5 Technical Innovation and Growth Strategy

    REFS:

    [00:01:25] Apple Acquires GraphLab, Apple Inc.

    https://techcrunch.com/2016/08/05/apple-acquires-turi-a-machine-learning-company/

    [00:03:50] Bain Technology Report 2024, Bain & Company

    https://www.bain.com/insights/topics/technology-report/

    [00:04:50] PaaS vs IaaS Efficiency, Gartner

    https://www.gartner.com/en/newsroom/press-releases/2024-11-19-gartner-forecasts-worldwide-public-cloud-end-user-spending-to-total-723-billion-dollars-in-2025

    [00:14:55] Fashion Quote, Oscar Wilde

    https://www.amazon.com/Complete-Works-Oscar-Wilde-Collins/dp/0007144369

    [00:15:30] PointCast Network, PointCast Inc.

    https://en.wikipedia.org/wiki/Push_technology

    [00:18:05] AI Bain Report, Bain & Company

    https://www.bain.com/insights/how-generative-ai-changes-the-game-in-tech-services-tech-report-2024/

    [00:20:40] Uber Michelangelo, Uber Engineering Team

    https://www.uber.com/en-SE/blog/michelangelo-machine-learning-platform/

    [00:20:50] Algorithmia Acquisition, DataRobot

    https://www.datarobot.com/newsroom/press/datarobot-is-acquiring-algorithmia-enhancing-leading-mlops-architecture-for-the-enterprise/

    [00:22:55] Fine Tuning vs RAG, Heydar Soudani, Evangelos Kanoulas & Faegheh Hasibi.

    https://arxiv.org/html/2403.01432v2

    [00:24:40] LLM Agent Survey, Lei Wang et al.

    https://arxiv.org/abs/2308.11432

    [00:26:30] CentML CServe, CentML

    https://docs.centml.ai/apps/llm

    [00:29:15] CentML Snowflake, Snowflake

    https://www.snowflake.com/en/engineering-blog/optimize-llms-with-llama-snowflake-ai-stack/

    [00:30:15] NVIDIA H100 GPU, NVIDIA

    https://www.nvidia.com/en-us/data-center/h100/

    [00:33:25] CentML's 60% savings, CentML

    https://centml.ai/platform/

  • Federico Barbero (DeepMind/Oxford) is the lead author of "Transformers Need Glasses!".

    Have you ever wondered why LLMs struggle with seemingly simple tasks like counting or copying long strings of text? We break down the theoretical reasons behind these failures, revealing architectural bottlenecks and the challenges of maintaining information fidelity across extended contexts.

    Federico explains how these issues are rooted in the transformer's design, drawing parallels to over-squashing in graph neural networks and detailing how the softmax function limits sharp decision-making.
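
    A quick numerical illustration of the softmax point (a sketch assuming logits bounded in [-B, B]): the largest weight attention can place on any single token shrinks as the sequence grows, so decisions cannot stay sharp at long context lengths.

    # With logits bounded in [-B, B], the maximum weight softmax can assign to
    # one token out of n decays toward zero as n grows, so attention cannot
    # remain one-hot sharp at long context lengths.
    import math

    def max_softmax_weight(n, B=10.0):
        # Best case for sharpness: one logit at +B, the remaining n-1 at -B.
        return math.exp(B) / (math.exp(B) + (n - 1) * math.exp(-B))

    for n in [16, 10**3, 10**6, 10**9, 10**12]:
        print(f"n = {n:>15,}  max attention weight = {max_softmax_weight(n):.6f}")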

    But it's not all bad news! Discover practical "glasses" that can help transformers see more clearly, from simple input modifications to architectural tweaks.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    https://federicobarbero.com/

    TRANSCRIPT + RESEARCH:

    https://www.dropbox.com/s/h7ys83ztwktqjje/Federico.pdf?dl=0

    TOC:

    1. Transformer Limitations: Token Detection & Representation

    [00:00:00] 1.1 Transformers fail at single token detection

    [00:02:45] 1.2 Representation collapse in transformers

    [00:03:21] 1.3 Experiment: LLMs fail at copying last tokens

    [00:18:00] 1.4 Attention sharpness limitations in transformers

    2. Transformer Limitations: Information Flow & Quantization

    [00:18:50] 2.1 Unidirectional information mixing

    [00:18:50] 2.2 Unidirectional information flow towards sequence beginning in transformers

    [00:21:50] 2.3 Diagonal attention heads as expensive no-ops in Llama/Gemma

    [00:27:14] 2.4 Sequence entropy affects transformer model distinguishability

    [00:30:36] 2.5 Quantization limitations lead to information loss & representational collapse

    [00:38:34] 2.6 LLMs use subitizing as opposed to counting algorithms

    3. Transformers and the Nature of Reasoning

    [00:40:30] 3.1 Turing completeness conditions in transformers

    [00:43:23] 3.2 Transformers struggle with sequential tasks

    [00:45:50] 3.3 Windowed attention as solution to information compression

    [00:51:04] 3.4 Chess engines: mechanical computation vs creative reasoning

    [01:00:35] 3.5 Epistemic foraging introduced

    REFS:

    [00:01:05] Transformers Need Glasses!, Barbero et al.

    https://proceedings.neurips.cc/paper_files/paper/2024/file/b1d35561c4a4a0e0b6012b2af531e149-Paper-Conference.pdf

    [00:05:30] Softmax is Not Enough, Veličković et al.

    https://arxiv.org/abs/2410.01104

    [00:11:30] Adv Alg Lecture 15, Chawla

    https://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec15.pdf

    [00:15:05] Graph Attention Networks, Veličković

    https://arxiv.org/abs/1710.10903

    [00:19:15] Extract Training Data, Carlini et al.

    https://arxiv.org/pdf/2311.17035

    [00:31:30] 1-bit LLMs, Ma et al.

    https://arxiv.org/abs/2402.17764

    [00:38:35] LLMs Solve Math, Nikankin et al.

    https://arxiv.org/html/2410.21272v1

    [00:38:45] Subitizing, Railo

    https://link.springer.com/10.1007/978-1-4419-1428-6_578

    [00:43:25] NN & Chomsky Hierarchy, Delétang et al.

    https://arxiv.org/abs/2207.02098

    [00:51:05] Measure of Intelligence, Chollet

    https://arxiv.org/abs/1911.01547

    [00:52:10] AlphaZero, Silver et al.

    https://pubmed.ncbi.nlm.nih.gov/30523106/

    [00:55:10] Golden Gate Claude, Anthropic

    https://www.anthropic.com/news/golden-gate-claude

    [00:56:40] Chess Positions, Chase & Simon

    https://www.sciencedirect.com/science/article/abs/pii/0010028573900042

    [01:00:35] Epistemic Foraging, Friston

    https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2016.00056/full

  • We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems.

    The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    * DiscoPOP - A framework where language models discover their own optimization algorithms

    * EvoLLM - Using language models as evolution strategies for optimization (see the sketch after this list)

    * The AI Scientist - A fully automated system that conducts scientific research end-to-end

    * Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate
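
    A minimal sketch of the EvoLLM-style loop flagged above: the language model plays the proposal operator inside a black-box optimization loop, conditioned on previously scored candidates. llm_propose is a hypothetical stand-in, faked here with Gaussian perturbations so the script runs end to end.

    # LLM-as-evolution-strategy sketch: show the best (candidate, fitness)
    # pairs so far, ask for improved candidates, keep the elite, repeat.
    import random

    def fitness(x):                     # toy black-box objective: maximize
        return -sum(v * v for v in x)   # peak at the origin

    def llm_propose(scored, k, rng):
        # Real version: format `scored` into a prompt and parse k candidates
        # from the LLM's reply. Fake version: perturb the current best.
        best = max(scored, key=lambda s: s[1])[0]
        return [[v + rng.gauss(0, 0.3) for v in best] for _ in range(k)]

    def evolve(dim=4, pop=8, generations=20, seed=0):
        rng = random.Random(seed)
        population = [[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop)]
        for _ in range(generations):
            scored = sorted(((x, fitness(x)) for x in population),
                            key=lambda s: s[1], reverse=True)
            elite = [x for x, _ in scored[: pop // 2]]       # keep the best half
            population = elite + llm_propose(scored[: pop // 2],
                                             pop - len(elite), rng)
        return max(population, key=fitness)

    print(evolve())                     # drifts toward the origin over generations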

    TRANSCRIPT + REFS:

    https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0

    Robert Tjarko Lange

    https://roberttlange.com/

    Chris Lu

    https://chrislu.page/

    Cong Lu

    https://www.conglu.co.uk/

    Sakana

    https://sakana.ai/blog/

    TOC:

    1. LLMs for Algorithm Generation and Optimization

    [00:00:00] 1.1 LLMs generating algorithms for training other LLMs

    [00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization

    [00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data

    [00:20:45] 1.4 External entropy injection for preventing model collapse

    [00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences

    2. Model Learning and Generalization

    [00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories

    [00:31:30] 2.2 Transformers learning gradient descent

    [00:33:00] 2.3 LLM tokenization biases towards specific numbers

    [00:34:50] 2.4 LLMs as evolution strategies for black box optimization

    [00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms

    3. AI Agents and System Architectures

    [00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches

    [00:54:35] 3.2 LangChain / modular agent components

    [00:57:50] 3.3 Debate improves LLM truthfulness

    [01:00:55] 3.4 Time limits controlling AI agent systems

    [01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies

    [01:04:05] 3.6 Agents follow own interest gradients

    [01:09:50] 3.7 Go-Explore algorithm: archive-based exploration

    [01:11:05] 3.8 Foundation models for interesting state discovery

    [01:13:00] 3.9 LLMs leverage prior game knowledge

    4. AI for Scientific Discovery and Human Alignment

    [01:17:45] 4.1 Encoding Alignment & Aesthetics via Reward Functions

    [01:20:00] 4.2 AI Scientist: Automated Open-Ended Scientific Discovery

    [01:24:15] 4.3 DiscoPOP: LLM for Preference Optimization Algorithms

    [01:28:30] 4.4 Balancing AI Knowledge with Human Understanding

    [01:33:55] 4.5 AI-Driven Conferences and Paper Review

  • Clement Bonnet discusses his novel approach to the ARC (Abstraction and Reasoning Corpus) challenge. Unlike approaches that rely on fine-tuning LLMs or generating samples at inference time, Clement's method encodes input-output pairs into a latent space, optimizes this representation with a search algorithm, and decodes outputs for new inputs. This end-to-end architecture uses a VAE loss, including reconstruction and prior losses.
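
    As a rough sketch of that encode-search-decode pattern (toy modules and dimensions, not the paper's architecture): encode the demonstration pairs to a latent vector, refine that latent by gradient descent against a reconstruction-plus-prior objective, then decode the test input under the refined latent.

    # Latent-program-search sketch in PyTorch.
    import torch
    import torch.nn as nn

    GRID, LATENT = 16, 8          # flattened-grid size and latent width (toy)

    encoder = nn.Linear(2 * GRID, 2 * LATENT)   # outputs (mu, log_var), VAE-style
    decoder = nn.Sequential(nn.Linear(LATENT + GRID, 64), nn.ReLU(),
                            nn.Linear(64, GRID))

    def encode(pairs):            # pairs: (n, 2*GRID) concatenated (x, y) rows
        mu, log_var = encoder(pairs).chunk(2, dim=-1)
        return mu.mean(0), log_var.mean(0)      # aggregate across pairs

    def search(pairs, steps=50, lr=0.1):
        mu, _ = encode(pairs)
        z = mu.detach().clone().requires_grad_(True)
        opt = torch.optim.Adam([z], lr=lr)
        xs, ys = pairs[:, :GRID], pairs[:, GRID:]
        for _ in range(steps):    # test-time search in latent space
            opt.zero_grad()
            recon = decoder(torch.cat([z.expand(len(xs), -1), xs], dim=-1))
            loss = ((recon - ys) ** 2).mean() + 1e-3 * (z ** 2).mean()  # recon + prior
            loss.backward()
            opt.step()
        return z.detach()

    pairs = torch.randn(3, 2 * GRID)            # stand-in demonstration pairs
    z = search(pairs)
    test_x = torch.randn(1, GRID)
    print(decoder(torch.cat([z.expand(1, -1), test_x], dim=-1)).shape)  # [1, 16]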

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT + RESEARCH OVERVIEW:

    https://www.dropbox.com/scl/fi/j7m0gaz1126y594gswtma/CLEMMLST.pdf?rlkey=y5qvwq2er5nchbcibm07rcfpq&dl=0

    Clement Bonnet and Matthew Macfarlane:

    https://www.linkedin.com/in/clement-bonnet16/

    https://github.com/clement-bonnet

    https://mvmacfarlane.github.io/

    TOC

    1. LPN Fundamentals

    [00:00:00] 1.1 Introduction to ARC Benchmark and LPN Overview

    [00:05:05] 1.2 Neural Networks' Challenges with ARC and Program Synthesis

    [00:06:55] 1.3 Induction vs Transduction in Machine Learning

    2. LPN Architecture and Latent Space

    [00:11:50] 2.1 LPN Architecture and Latent Space Implementation

    [00:16:25] 2.2 LPN Latent Space Encoding and VAE Architecture

    [00:20:25] 2.3 Gradient-Based Search Training Strategy

    [00:23:39] 2.4 LPN Model Architecture and Implementation Details

    3. Implementation and Scaling

    [00:27:34] 3.1 Training Data Generation and re-ARC Framework

    [00:31:28] 3.2 Limitations of Latent Space and Multi-Thread Search

    [00:34:43] 3.3 Program Composition and Computational Graph Architecture

    4. Advanced Concepts and Future Directions

    [00:45:09] 4.1 AI Creativity and Program Synthesis Approaches

    [00:49:47] 4.2 Scaling and Interpretability in Latent Space Models

    REFS

    [00:00:05] ARC benchmark, Chollet

    https://arxiv.org/abs/2412.04604

    [00:02:10] Latent Program Spaces, Bonnet, Macfarlane

    https://arxiv.org/abs/2411.08706

    [00:07:45] Kevin Ellis's work on program generation

    https://www.cs.cornell.edu/~ellisk/

    [00:08:45] Induction vs transduction in abstract reasoning, Li et al.

    https://arxiv.org/abs/2411.02272

    [00:17:40] VAEs, Kingma, Welling

    https://arxiv.org/abs/1312.6114

    [00:27:50] re-ARC, Hodel

    https://github.com/michaelhodel/re-arc

    [00:29:40] Grid size in ARC tasks, Chollet

    https://github.com/fchollet/ARC-AGI

    [00:33:00] Critique of deep learning, Marcus

    https://arxiv.org/vc/arxiv/papers/2002/2002.06177v1.pdf

  • Prof. Jakob Foerster, a leading AI researcher at Oxford University and Meta, and Chris Lu, a researcher at OpenAI, explain how AI is moving beyond just mimicking human behaviour to creating truly intelligent agents that can learn and solve problems on their own. Foerster champions open-source AI for responsible, decentralised development. He addresses AI scaling, goal misalignment (Goodhart's Law), and the need for holistic alignment, offering a quick look at the future of AI and how to guide it.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT/REFS:

    https://www.dropbox.com/scl/fi/yqjszhntfr00bhjh6t565/JAKOB.pdf?rlkey=scvny4bnwj8th42fjv8zsfu2y&dl=0

    Prof. Jakob Foerster

    https://x.com/j_foerst

    https://www.jakobfoerster.com/

    University of Oxford Profile:

    https://eng.ox.ac.uk/people/jakob-foerster/

    Chris Lu:

    https://chrislu.page/

    TOC

    1. GPU Acceleration and Training Infrastructure

    [00:00:00] 1.1 ARC Challenge Criticism and FLAIR Lab Overview

    [00:01:25] 1.2 GPU Acceleration and Hardware Lottery in RL

    [00:05:50] 1.3 Data Wall Challenges and Simulation-Based Solutions

    [00:08:40] 1.4 JAX Implementation and Technical Acceleration

    2. Learning Frameworks and Policy Optimization

    [00:14:18] 2.1 Evolution of RL Algorithms and Mirror Learning Framework

    [00:15:25] 2.2 Meta-Learning and Policy Optimization Algorithms

    [00:21:47] 2.3 Language Models and Benchmark Challenges

    [00:28:15] 2.4 Creativity and Meta-Learning in AI Systems

    3. Multi-Agent Systems and Decentralization

    [00:31:24] 3.1 Multi-Agent Systems and Emergent Intelligence

    [00:38:35] 3.2 Swarm Intelligence vs Monolithic AGI Systems

    [00:42:44] 3.3 Democratic Control and Decentralization of AI Development

    [00:46:14] 3.4 Open Source AI and Alignment Challenges

    [00:49:31] 3.5 Collaborative Models for AI Development

    REFS

    [00:00:05] ARC Benchmark, Chollet

    https://github.com/fchollet/ARC-AGI

    [00:03:05] DRL Doesn't Work, Irpan

    https://www.alexirpan.com/2018/02/14/rl-hard.html

    [00:05:55] AI Training Data, Data Provenance Initiative

    https://www.nytimes.com/2024/07/19/technology/ai-data-restrictions.html

    [00:06:10] JaxMARL, Foerster et al.

    https://arxiv.org/html/2311.10090v5

    [00:08:50] M-FOS, Lu et al.

    https://arxiv.org/abs/2205.01447

    [00:09:45] JAX Library, Google Research

    https://github.com/jax-ml/jax

    [00:12:10] Kinetix, Matthews, Beukman et al.

    https://arxiv.org/abs/2410.23208

    [00:12:45] Genie 2, DeepMind

    https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

    [00:14:42] Mirror Learning, Grudzien Kuba et al.

    https://arxiv.org/abs/2208.01682

    [00:16:30] Discovered Policy Optimisation, Lu et al.

    https://arxiv.org/abs/2210.05639

    [00:24:10] Goodhart's Law, Goodhart

    https://en.wikipedia.org/wiki/Goodhart%27s_law

    [00:25:15] LLM ARChitect, Franzen et al.

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:28:55] AlphaGo, Silver et al.

    https://arxiv.org/pdf/1712.01815.pdf

    [00:30:10] Meta-learning, Lu, Towers, Foerster

    https://direct.mit.edu/isal/proceedings-pdf/isal2023/35/67/2354943/isal_a_00674.pdf

    [00:31:30] Emergence of Pragmatics, Yuan et al.

    https://arxiv.org/abs/2001.07752

    [00:34:30] AI Safety, Amodei et al.

    https://arxiv.org/abs/1606.06565

    [00:35:45] Intentional Stance, Dennett

    https://plato.stanford.edu/entries/ethics-ai/

    [00:39:25] Multi-Agent RL, Zhou et al.

    https://arxiv.org/pdf/2305.10091

    [00:41:00] Open Source Generative AI, Foerster et al.

    https://arxiv.org/abs/2405.08597

    <trunc, see PDF/YT>

  • Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they reveal how they achieved a remarkable 53.5% accuracy by creatively utilising large language models (LLMs) in new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.
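
    A minimal sketch of the depth-first token search with probability thresholds (section 3.1 in the TOC below): explore continuations depth-first and prune any branch whose cumulative probability falls under a threshold, so the model enumerates every sufficiently likely completion instead of one greedy sample. next_token_probs is a hypothetical stand-in for an LLM forward pass, faked here with a fixed toy distribution.

    # Depth-first search over token continuations with probability pruning.
    def next_token_probs(prefix):
        # Stand-in for a model call; a real one returns p(token | prefix).
        return {"A": 0.5, "B": 0.3, "EOS": 0.2}

    def dfs_sample(prefix=(), prob=1.0, min_prob=0.05, max_len=4):
        """Yield (sequence, probability) for every completion above threshold."""
        if prefix and prefix[-1] == "EOS":
            yield prefix, prob
            return
        if len(prefix) >= max_len:
            return
        for token, p in next_token_probs(prefix).items():
            branch_prob = prob * p
            if branch_prob >= min_prob:        # prune improbable branches early
                yield from dfs_sample(prefix + (token,), branch_prob,
                                      min_prob, max_len)

    for seq, p in sorted(dfs_sample(), key=lambda s: -s[1]):
        print(f"p = {p:.3f}  {' '.join(seq)}")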

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    Jan Disselhoff

    https://www.linkedin.com/in/jan-disselhoff-1423a2240/

    Daniel Franzen

    https://github.com/da-fr

    ARC Prize: http://arcprize.org/

    TRANSCRIPT AND BACKGROUND READING:

    https://www.dropbox.com/scl/fi/utkn2i1ma79fn6an4yvjw/ARCHitects.pdf?rlkey=67pe38mtss7oyhjk2ad0d2aza&dl=0

    TOC

    1. Solution Architecture and Strategy Overview

    [00:00:00] 1.1 Initial Solution Overview and Model Architecture

    [00:04:25] 1.2 LLM Capabilities and Dataset Approach

    [00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies

    [00:14:08] 1.4 Sampling Methods and Search Implementation

    [00:17:52] 1.5 ARC vs Language Model Context Comparison

    2. LLM Search and Model Implementation

    [00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation

    [00:27:04] 2.2 Symmetry Augmentation and Model Architecture

    [00:30:11] 2.3 Model Intelligence Characteristics and Performance

    [00:37:23] 2.4 Tokenization and Numerical Processing Challenges

    3. Advanced Training and Optimization

    [00:45:15] 3.1 DFS Token Selection and Probability Thresholds

    [00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs

    [00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention

    [00:56:10] 3.4 Training Infrastructure and Optimization Experiments

    [01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

    REFS

    [00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

    https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf

    [00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

    https://arxiv.org/html/2411.14215

    [00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

    https://github.com/michaelhodel/re-arc

    [00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

    https://arxiv.org/html/2408.00724v2

    [00:16:55] Language model reachability space exploration, University of Toronto

    https://www.youtube.com/watch?v=Bpgloy1dDn0

    [00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

    https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt

    [00:41:20] GPT tokenization approach for numbers, OpenAI

    https://platform.openai.com/docs/guides/text-generation/tokenizer-examples

    [00:46:25] DFS in AI search strategies, Russell & Norvig

    https://www.amazon.com/Artificial-Intelligence-Modern-Approach-4th/dp/0134610997

    [00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

    https://www.pnas.org/doi/10.1073/pnas.1611835114

    [00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

    https://arxiv.org/abs/2106.09685

    [00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

    https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

    [01:04:55] Original MCTS in computer Go, Yifan Jin

    https://stanford.edu/~rezab/classes/cme323/S15/projects/montecarlo_search_tree_report.pdf

  • Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks, a foundational technology in AI, discusses his journey, the origins of LSTM, and why he believes his latest work, xLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.
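
    For reference, a minimal numpy sketch of the standard LSTM cell: sigmoid input, forget, and output gates control what enters, stays in, and leaves a persistent cell state (the forget gate was a slightly later refinement of the original 1997 design). xLSTM, discussed in the episode, revisits this scheme with exponential gating and new memory structures; the sketch shows only the classic cell.

    # Standard LSTM cell in numpy: gates decide what enters (i), persists (f),
    # and is read out (o) of the cell state c.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, b):
        """One step. W: (4*H, X+H) stacked gate weights, b: (4*H,) biases."""
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
        g = np.tanh(g)                                # candidate cell update
        c = f * c + i * g                             # gated memory write
        h = o * np.tanh(c)                            # gated read-out
        return h, c

    rng = np.random.default_rng(0)
    X, H = 3, 4
    W, b = 0.1 * rng.normal(size=(4 * H, X + H)), np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for _ in range(5):                                # run over a toy sequence
        h, c = lstm_step(rng.normal(size=X), h, c, W, b)
    print(h)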

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

    Go to https://tufalabs.ai/

    ***

    TRANSCRIPT AND BACKGROUND READING:

    https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0

    Prof. Sepp Hochreiter

    https://www.nx-ai.com/

    https://x.com/hochreitersepp

    https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en

    TOC:

    1. LLM Evolution and Reasoning Capabilities

    [00:00:00] 1.1 LLM Capabilities and Limitations Debate

    [00:03:16] 1.2 Program Generation and Reasoning in AI Systems

    [00:06:30] 1.3 Human vs AI Reasoning Comparison

    [00:09:59] 1.4 New Research Initiatives and Hybrid Approaches

    2. LSTM Technical Architecture

    [00:13:18] 2.1 LSTM Development History and Technical Background

    [00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity

    [00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison

    [00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential

    3. Industrial Applications and Neuro-Symbolic AI

    [00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages

    [00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project

    [00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches

    [00:51:29] 3.4 Evolution of AI Paradigms and System Thinking

    [00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison

    [00:58:12] 3.6 NXAI Company and Industrial AI Applications

    REFS:

    [00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber)

    https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory

    [00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov)

    https://link.springer.com/article/10.1007/BF02478259

    [00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors)

    https://www.arxiv.org/pdf/2502.03671

    [00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind)

    https://deepmind.google/research/breakthroughs/alphago/

    [00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier)

    https://tufalabs.ai

    [00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.)

    https://arxiv.org/abs/2405.04517

    [00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.)

    https://arxiv.org/abs/2205.14135

    [00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey)

    https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx

    [00:36:10] Mamba 2 state space model architecture (Albert Gu et al.)

    https://arxiv.org/abs/2312.00752

    [00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.)

    https://www.jku.at/en/institute-of-machine-learning/research/projects/

    [00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.)

    https://openreview.net/forum?id=7PGluppo4k

    [00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter)

    https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/

    YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

    <truncated, see show notes/YT>

  • Professor Randall Balestriero joins us to discuss neural network geometry, spline theory, and emerging phenomena in deep learning, based on research presented at ICML. Topics include the delayed emergence of adversarial robustness in neural networks ("grokking"), geometric interpretations of neural networks via spline theory, and challenges in reconstruction learning. We also cover geometric analysis of Large Language Models (LLMs) for toxicity detection and the relationship between intrinsic dimensionality and model control in RLHF.
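
    To make the spline view concrete, a small sketch: a ReLU MLP computes a continuous piecewise-affine function, and each distinct pattern of active units corresponds to one affine region of input space. Sweeping a 1-D input and counting sign-pattern changes lower-bounds the number of linear regions (toy weights, not the experiments discussed in the episode).

    # Counting the affine regions a small ReLU net carves a line into.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)    # layer 1: 1 -> 16
    W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)   # layer 2: 16 -> 16

    def activation_pattern(x):
        h1 = W1 @ np.array([x]) + b1
        h2 = W2 @ np.maximum(h1, 0) + b2
        return tuple(h1 > 0) + tuple(h2 > 0)   # which ReLUs are "on"

    xs = np.linspace(-3, 3, 10_000)
    patterns = [activation_pattern(x) for x in xs]
    regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
    print("distinct affine regions along the sweep:", regions)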

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Go to https://tufalabs.ai/

    ***

    Randall Balestriero

    https://x.com/randall_balestr

    https://randallbalestriero.github.io/

    Show notes and transcript: https://www.dropbox.com/scl/fi/3lufge4upq5gy0ug75j4a/RANDALLSHOW.pdf?rlkey=nbemgpa0jhawt1e86rx7372e4&dl=0

    TOC:

    - Introduction

    - 00:00:00: Introduction

    - Neural Network Geometry and Spline Theory

    - 00:01:41: Neural Network Geometry and Spline Theory

    - 00:07:41: Deep Networks Always Grok

    - 00:11:39: Grokking and Adversarial Robustness

    - 00:16:09: Double Descent and Catastrophic Forgetting

    - Reconstruction Learning

    - 00:18:49: Reconstruction Learning

    - 00:24:15: Frequency Bias in Neural Networks

    - Geometric Analysis of Neural Networks

    - 00:29:02: Geometric Analysis of Neural Networks

    - 00:34:41: Adversarial Examples and Region Concentration

    - LLM Safety and Geometric Analysis

    - 00:40:05: LLM Safety and Geometric Analysis

    - 00:46:11: Toxicity Detection in LLMs

    - 00:52:24: Intrinsic Dimensionality and Model Control

    - 00:58:07: RLHF and High-Dimensional Spaces

    - Conclusion

    - 01:02:13: Neural Tangent Kernel

    - 01:08:07: Conclusion

    REFS:

    [00:01:35] Humayun – Deep network geometry & input space partitioning

    https://arxiv.org/html/2408.04809v1

    [00:03:55] Balestriero & Paris – Linking deep networks to adaptive spline operators

    https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf

    [00:13:55] Song et al. – Gradient-based white-box adversarial attacks

    https://arxiv.org/abs/2012.14965

    [00:16:05] Humayun, Balestriero & Baraniuk – Grokking phenomenon & emergent robustness

    https://arxiv.org/abs/2402.15555

    [00:18:25] Humayun – Training dynamics & double descent via linear region evolution

    https://arxiv.org/abs/2310.12977

    [00:20:15] Balestriero – Power diagram partitions in DNN decision boundaries

    https://arxiv.org/abs/1905.08443

    [00:23:00] Frankle & Carbin – Lottery Ticket Hypothesis for network pruning

    https://arxiv.org/abs/1803.03635

    [00:24:00] Belkin et al. – Double descent phenomenon in modern ML

    https://arxiv.org/abs/1812.11118

    [00:25:55] Balestriero et al. – Batch normalization’s regularization effects

    https://arxiv.org/pdf/2209.14778

    [00:29:35] EU – EU AI Act 2024 with compute restrictions

    https://www.lw.com/admin/upload/SiteAttachments/EU-AI-Act-Navigating-a-Brave-New-World.pdf

    [00:39:30] Humayun, Balestriero & Baraniuk – SplineCam: Visualizing deep network geometry

    https://openaccess.thecvf.com/content/CVPR2023/papers/Humayun_SplineCam_Exact_Visualization_and_Characterization_of_Deep_Network_Geometry_and_CVPR_2023_paper.pdf

    [00:40:40] Carlini – Trade-offs between adversarial robustness and accuracy

    https://arxiv.org/pdf/2407.20099

    [00:44:55] Balestriero & LeCun – Limitations of reconstruction-based learning methods

    https://openreview.net/forum?id=ez7w0Ss4g9

    (truncated, see shownotes PDF)

  • Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.
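
    A toy simulation of the observation behind the model-stealing work referenced at [01:10:10]: every logit vector an API returns is a hidden state pushed through one final projection, so stacked logits span a subspace whose dimension equals the model's hidden width, and the singular values of that stack reveal it. Everything below is simulated; it is a sketch of the idea, not an attack on a real API.

    # Hidden-width recovery from logits via numerical rank.
    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB, HIDDEN, QUERIES = 512, 64, 256

    W_out = rng.normal(size=(HIDDEN, VOCAB))   # the model's final projection

    def api_logits():
        """Stand-in for one API call: an unknown hidden state times W_out."""
        return rng.normal(size=HIDDEN) @ W_out

    logits = np.stack([api_logits() for _ in range(QUERIES)])  # (QUERIES, VOCAB)
    s = np.linalg.svd(logits, compute_uv=False)
    print("estimated hidden width:", int((s > 1e-8 * s[0]).sum()))  # -> 64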

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Go to https://tufalabs.ai/

    ***

    Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

    TOC:

    1. ML Security Fundamentals

    [00:00:00] 1.1 ML Model Reasoning and Security Fundamentals

    [00:03:04] 1.2 ML Security Vulnerabilities and System Design

    [00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior

    [00:13:20] 1.4 Model Training, RLHF, and Calibration Effects

    2. Model Evaluation and Research Methods

    [00:19:40] 2.1 Model Reasoning and Evaluation Metrics

    [00:24:37] 2.2 Security Research Philosophy and Methodology

    [00:27:50] 2.3 Security Disclosure Norms and Community Differences

    3. LLM Applications and Best Practices

    [00:44:29] 3.1 Practical LLM Applications and Productivity Gains

    [00:49:51] 3.2 Effective LLM Usage and Prompting Strategies

    [00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code

    4. Advanced LLM Research and Architecture

    [00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience

    [01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges

    [01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

    REFS:

    [00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/

    [00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/

    [00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644

    [00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud

    [00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html

    [00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675

    [00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774

    [00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation

    [00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241

    [00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html

    [00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

    [00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878

    [01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html

    [01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634

    [01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943

  • Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

    * How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see (a toy cost sketch follows this list)

    * The evolution from traditional Large Language Models to more sophisticated reasoning systems

    * The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably

    * Why O1's improved performance comes with substantial computational costs

    * The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

    * The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker
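
    A back-of-envelope sketch of the first point above, with purely hypothetical prices and token counts (not OpenAI's actual figures), showing how hidden reasoning tokens can dominate the bill:

        # Hypothetical cost model for an o1-style API in which hidden
        # reasoning tokens are billed at the output-token rate. All numbers
        # below are illustrative assumptions.
        price_in = 15 / 1e6               # assumed $ per input token
        price_out = 60 / 1e6              # assumed $ per output token

        prompt_tokens = 500
        visible_answer_tokens = 400
        hidden_reasoning_tokens = 8000    # paid for, but never shown

        cost = (prompt_tokens * price_in
                + (visible_answer_tokens + hidden_reasoning_tokens) * price_out)
        print(f"${cost:.2f}")             # -> $0.51, mostly unseen reasoning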

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    TOC:

    1. O1 Architecture and Reasoning Foundations

    [00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

    [00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

    [00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

    [00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

    2. Monte Carlo Methods and Model Deep-Dive

    [00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

    [00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

    [00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

    [00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

    [00:51:41] 2.5 O1 Response Patterns and Performance Analysis

    3. System Design and Real-World Applications

    [00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

    [01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

    [01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

    [01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

    [01:26:08] 3.5 Hybrid Architecture Implementation Strategies

    Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

    REFS:

    [00:02:00] Monty Python (1975)

    Witch trial scene: flawed logical reasoning.

    https://www.youtube.com/watch?v=zrzMhU_4m-g

    [00:04:00] Cade Metz (2024)

    Microsoft–OpenAI partnership evolution and control dynamics.

    https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

    [00:07:25] Kojima et al. (2022)

    Zero-shot chain-of-thought prompting ('Let's think step by step').

    https://arxiv.org/pdf/2205.11916

    [00:12:50] DeepMind Research Team (2023)

    Multi-bot game solving with external and internal planning.

    https://deepmind.google/research/publications/139455/

    [00:15:10] Silver et al. (2016)

    AlphaGo's Monte Carlo Tree Search and Q-learning.

    https://www.nature.com/articles/nature16961

    [00:16:30] Kambhampati, S. et al. (2023)

    Evaluates O1's planning in "Strawberry Fields" benchmarks.

    https://arxiv.org/pdf/2410.02162

    [00:29:30] Alibaba AIDC-AI Team (2024)

    MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

    https://arxiv.org/html/2411.14405

    [00:31:30] Kambhampati, S. (2024)

    Explores LLM "reasoning vs retrieval" debate.

    https://arxiv.org/html/2403.04121v2

    [00:37:35] Wei, J. et al. (2022)

    Chain-of-thought prompting (introduces last-letter concatenation).

    https://arxiv.org/pdf/2201.11903

    [00:42:35] Barbero, F. et al. (2024)

    Transformer attention and "information over-squashing."

    https://arxiv.org/html/2406.04267v2

    [00:46:05] Ruis, L. et al. (2023)

    Influence functions to understand procedural knowledge in LLMs.

    https://arxiv.org/html/2411.12580v1

    (truncated - continued in shownotes/transcript doc)

  • Laura Ruis, a PhD student at University College London and researcher at Cohere, explains her groundbreaking research into how large language models (LLMs) perform reasoning tasks, the fundamental mechanisms underlying those capabilities, and whether these models primarily rely on retrieval or develop procedural knowledge.
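
    The influence-function analysis behind this work (see [00:03:40] and [00:07:40] in the TOC below) asks which pretraining documents most change a model's behaviour on a query. A heavily simplified sketch on a toy linear model (the real method, EK-FAC from Grosse et al., approximates an inverse-Hessian weighting; here we replace it with the identity):

        # Simplified influence score: gradient alignment between a training
        # example and a query (true influence functions also apply an
        # inverse-Hessian weighting, approximated by EK-FAC in the paper).
        import numpy as np

        rng = np.random.default_rng(0)
        w = rng.normal(size=3)                     # toy linear-model weights

        def grad_loss(w, x, y):
            # gradient of squared error 0.5 * (w @ x - y)**2 w.r.t. w
            return (w @ x - y) * x

        x_train, y_train = np.array([1.0, 0.0, 2.0]), 1.0
        x_query, y_query = np.array([0.5, 0.0, 1.0]), 0.7

        score = grad_loss(w, x_query, y_query) @ grad_loss(w, x_train, y_train)
        print(score)   # magnitude ranks how strongly train affects the query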

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    TOC:

    1. LLM Foundations and Learning

    1.1 Scale and Learning in Language Models [00:00:00]

    1.2 Procedural Knowledge vs Fact Retrieval [00:03:40]

    1.3 Influence Functions and Model Analysis [00:07:40]

    1.4 Role of Code in LLM Reasoning [00:11:10]

    1.5 Semantic Understanding and Physical Grounding [00:19:30]

    2. Reasoning Architectures and Measurement

    2.1 Measuring Understanding and Reasoning in Language Models [00:23:10]

    2.2 Formal vs Approximate Reasoning and Model Creativity [00:26:40]

    2.3 Symbolic vs Subsymbolic Computation Debate [00:34:10]

    2.4 Neural Network Architectures and Tensor Product Representations [00:40:50]

    3. AI Agency and Risk Assessment

    3.1 Agency and Goal-Directed Behavior in Language Models [00:45:10]

    3.2 Defining and Measuring Agency in AI Systems [00:49:50]

    3.3 Core Knowledge Systems and Agency Detection [00:54:40]

    3.4 Language Models as Agent Models and Simulator Theory [01:03:20]

    3.5 AI Safety and Societal Control Mechanisms [01:07:10]

    3.6 Evolution of AI Capabilities and Emergent Risks [01:14:20]

    REFS:

    [00:01:10] Procedural Knowledge in Pretraining & LLM Reasoning

    Ruis et al., 2024

    https://arxiv.org/abs/2411.12580

    [00:03:50] EK-FAC Influence Functions in Large LMs

    Grosse et al., 2023

    https://arxiv.org/abs/2308.03296

    [00:13:05] Surfaces and Essences: Analogy as the Core of Cognition

    Hofstadter & Sander

    https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

    [00:13:45] Wittgenstein on Language Games

    https://plato.stanford.edu/entries/wittgenstein/

    [00:14:30] Montague Semantics for Natural Language

    https://plato.stanford.edu/entries/montague-semantics/

    [00:19:35] The Chinese Room Argument

    David Cole

    https://plato.stanford.edu/entries/chinese-room/

    [00:19:55] ARC: Abstraction and Reasoning Corpus

    François Chollet

    https://arxiv.org/abs/1911.01547

    [00:24:20] Systematic Generalization in Neural Nets

    Lake & Baroni, 2023

    https://www.nature.com/articles/s41586-023-06668-3

    [00:27:40] Open-Endedness & Creativity in AI

    Tim Rocktäschel

    https://arxiv.org/html/2406.04268v1

    [00:30:50] Fodor & Pylyshyn on Connectionism

    https://www.sciencedirect.com/science/article/abs/pii/0010027788900315

    [00:31:30] Tensor Product Representations

    Smolensky, 1990

    https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

    [00:35:50] DreamCoder: Wake-Sleep Program Synthesis

    Kevin Ellis et al.

    https://courses.cs.washington.edu/courses/cse599j1/22sp/papers/dreamcoder.pdf

    [00:36:30] Compositional Generalization Benchmarks

    Ruis, Lake et al., 2022

    https://arxiv.org/pdf/2202.10745

    [00:40:30] RNNs & Tensor Products

    McCoy et al., 2018

    https://arxiv.org/abs/1812.08718

    [00:46:10] Formal Causal Definition of Agency

    Kenton et al.

    https://arxiv.org/pdf/2208.08345v2

    [00:48:40] Agency in Language Models

    Sumers et al.

    https://arxiv.org/abs/2309.02427

    [00:55:20] Heider & Simmel’s Moving Shapes Experiment

    https://www.nature.com/articles/s41598-024-65532-0

    [01:00:40] Language Models as Agent Models

    Jacob Andreas, 2022

    https://arxiv.org/abs/2212.01681

    [01:13:35] Pragmatic Understanding in LLMs

    Ruis et al.

    https://arxiv.org/abs/2210.14986

  • Jürgen Schmidhuber, the father of generative AI, challenges current AI narratives, arguing that early deep learning work is misattributed and, in his view, actually originated in Ukraine and Japan. He discusses his early work on linear transformers and artificial curiosity, which preceded modern developments, shares his expansive vision of AI colonising space, and explains his groundbreaking 1991 consciousness model. Schmidhuber dismisses fears of human-AI conflict, arguing that superintelligent AI scientists will be fascinated by their own origins and motivated to protect life rather than harm it, and will be more interested in other superintelligent AIs and in cosmic expansion than in earthly matters. He offers unique insights into how humans and AI might coexist. This is the long-awaited second, previously unreleased part of the interview we filmed last time.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    Goto https://tufalabs.ai/

    ***

    Interviewer: Tim Scarfe

    TOC:

    [00:00:00] The Nature and Motivations of AI

    [00:02:08] Influential Inventions: 20th vs. 21st Century

    [00:05:28] Transformer and GPT: A Reflection – The revolutionary impact of modern language models, the 1991 linear transformer, linear vs. quadratic scaling, the fast weight controller, and fast weight matrix memory (a toy sketch of the fast-weight idea follows these notes).

    [00:11:03] Pioneering Contributions to AI and Deep Learning – The invention of the transformer, pre-trained networks, the first GANs, the role of predictive coding, and the emergence of artificial curiosity.

    [00:13:58] AI's Evolution and Achievements – The role of compute, breakthroughs in handwriting recognition and computer vision, the rise of GPU-based CNNs, achieving superhuman results, and Japanese contributions to CNN development.

    [00:15:40] The Hardware Lottery and GPUs – GPUs as a serendipitous advantage for AI, the gaming-AI parallel, and Nvidia's strategic shift towards AI.

    [00:19:58] AI Applications and Societal Impact – AI-powered translation breaking communication barriers, AI in medicine for imaging and disease prediction, and AI's potential for human enhancement and sustainable development.

    [00:23:26] The Path to AGI and Current Limitations – Distinguishing large language models from AGI, challenges in replacing physical-world workers, and AI's difficulty with the real world versus board games.

    [00:25:56] AI and Consciousness – Simulating consciousness through unsupervised learning, chunking and automatizing neural networks, data compression, and self-symbols in predictive world models.

    [00:30:50] The Future of AI and Humanity – The transition from AGIs as tools to AGIs with their own goals, the role of humans in an AGI-dominated world, and the concept of Homo Ludens.

    [00:38:05] The AI Race: Europe, China, and the US – Europe's historical contributions, the current dominance of the US and East Asia, and the role of venture capital and industrial policy.

    [00:50:32] Addressing AI Existential Risk – The obsession with AI existential risk, commercial pressure for friendly AIs, AI vs. hydrogen bombs, and the long-term future of AI.

    [00:58:00] The Fermi Paradox and Extraterrestrial Intelligence – Expanding AI bubbles as an explanation for the Fermi paradox, dark matter and encrypted civilizations, and Earth as the first to spawn an AI bubble.

    [01:02:08] The Diversity of AI and AI Ecologies – The unrealism of a monolithic superintelligence, diverse AIs with varying goals, and intense competition and collaboration in AI ecologies.

    [01:12:21] Final Thoughts and Closing Remarks

    REFERENCES:

    See pinned comment on YT: https://youtu.be/fZYUqICYCAk
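
    As flagged in the [00:05:28] entry, the fast weight controller can be sketched in a few lines. Under the reading (from Schmidhuber's own later papers) that unnormalized linear attention and the 1991 fast weight controller coincide, a toy numpy version with arbitrary sizes and data:

        # Fast weights as linear attention: write rank-1 outer products into a
        # fast weight matrix W, read with a query. Cost grows linearly with
        # sequence length, vs. quadratically for softmax attention.
        import numpy as np

        d, steps = 4, 5
        rng = np.random.default_rng(0)
        W = np.zeros((d, d))                  # fast weight matrix (the memory)

        keys = rng.normal(size=(steps, d))    # one key per time step
        values = rng.normal(size=(steps, d))  # one value per time step
        queries = rng.normal(size=(steps, d)) # one query per time step

        for k, v, q in zip(keys, values, queries):
            W += np.outer(v, k)               # write: fast-weight update
            print(W @ q)                      # read: linear-attention output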

  • Professor Yoshua Bengio is a pioneer in deep learning and Turing Award winner. Bengio talks about AI safety, why goal-seeking “agentic” AIs might be dangerous, and his vision for building powerful AI tools without giving them agency. Topics include reward tampering risks, instrumental convergence, global AI governance, and how non-agent AIs could revolutionize science and medicine while reducing existential threats. Perfect for anyone curious about advanced AI risks and how to manage them responsibly.
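
    A toy illustration of the reward-tampering worry (our own sketch, not from Bengio's paper): once the reward process sits inside the agent's action space, the policy that maximizes the measured reward corrupts the measurement instead of doing the task.

        # Reward tampering in one screenful: the agent optimizes the reward
        # signal it actually receives, and tampering with the sensor scores
        # higher than the intended behaviour. All numbers are illustrative.
        def measured_return(action):
            if action == "do_task":
                return 1.0        # intended reward for doing the job
            if action == "tamper":
                return 100.0      # reward after overwriting the sensor
            return 0.0            # do nothing

        actions = ["do_nothing", "do_task", "tamper"]
        print(max(actions, key=measured_return))   # -> "tamper"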

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

    Goto https://tufalabs.ai/

    ***

    Interviewer: Tim Scarfe

    Yoshua Bengio:

    https://x.com/Yoshua_Bengio

    https://scholar.google.com/citations?user=kukA0LcAAAAJ&hl=en

    https://yoshuabengio.org/

    https://en.wikipedia.org/wiki/Yoshua_Bengio

    TOC:

    1. AI Safety Fundamentals

    [00:00:00] 1.1 AI Safety Risks and International Cooperation

    [00:03:20] 1.2 Fundamental Principles vs Scaling in AI Development

    [00:11:25] 1.3 System 1/2 Thinking and AI Reasoning Capabilities

    [00:15:15] 1.4 Reward Tampering and AI Agency Risks

    [00:25:17] 1.5 Alignment Challenges and Instrumental Convergence

    2. AI Architecture and Safety Design

    [00:33:10] 2.1 Instrumental Goals and AI Safety Fundamentals

    [00:35:02] 2.2 Separating Intelligence from Goals in AI Systems

    [00:40:40] 2.3 Non-Agent AI as Scientific Tools

    [00:44:25] 2.4 Oracle AI Systems and Mathematical Safety Frameworks

    3. Global Governance and Security

    [00:49:50] 3.1 International AI Competition and Hardware Governance

    [00:51:58] 3.2 Military and Security Implications of AI Development

    [00:56:07] 3.3 Personal Evolution of AI Safety Perspectives

    [01:00:25] 3.4 AI Development Scaling and Global Governance Challenges

    [01:12:10] 3.5 AI Regulation and Corporate Oversight

    4. Technical Innovations

    [01:23:00] 4.1 Evolution of Neural Architectures: From RNNs to Transformers

    [01:26:02] 4.2 GFlowNets and Symbolic Computation

    [01:30:47] 4.3 Neural Dynamics and Consciousness

    [01:34:38] 4.4 AI Creativity and Scientific Discovery

    SHOWNOTES (Transcript, references, best clips etc):

    https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0

    CORE REFS (full list in shownotes and pinned comment):

    [00:00:15] Bengio et al.: "AI Risk" Statement

    https://www.safe.ai/work/statement-on-ai-risk

    [00:23:10] Bengio on reward tampering & AI safety (Harvard Data Science Review)

    https://hdsr.mitpress.mit.edu/pub/w974bwb0

    [00:40:45] Munk Debate on AI existential risk, featuring Bengio

    https://munkdebates.com/debates/artificial-intelligence

    [00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.) on oracle-to-agent safety

    https://arxiv.org/abs/2408.05284

    [00:51:20] Bengio (2024) memo on hardware-based AI governance verification

    https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf

    [01:12:55] Bengio’s involvement in EU AI Act code of practice

    https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice

    [01:27:05] Complexity-based compositionality theory (Elmoznino, Jiralerspong, Bengio, Lajoie)

    https://arxiv.org/abs/2410.14817

    [01:29:00] GFlowNet Foundations (Bengio et al.) for probabilistic inference

    https://arxiv.org/pdf/2111.09266

    [01:32:10] Discrete attractor states in neural systems (Nam, Elmoznino, Bengio, Lajoie)

    https://arxiv.org/pdf/2302.06403

  • François Chollet discusses the outcomes of the ARC-AGI (Abstraction and Reasoning Corpus) Prize competition in 2024, where accuracy rose from 33% to 55.5% on a private evaluation set.

    SPONSOR MESSAGES:

    ***

    CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

    https://centml.ai/pricing/

    Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

    They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

    Goto https://tufalabs.ai/

    ***

    Read about the recent result on o3 with ARC here (Chollet knew about it at the time of the interview but wasn't allowed to say so):

    https://arcprize.org/blog/oai-o3-pub-breakthrough

    TOC:

    1. Introduction and Opening

    [00:00:00] 1.1 Deep Learning vs. Symbolic Reasoning: François’s Long-Standing Hybrid View

    [00:00:48] 1.2 “Why Do They Call You a Symbolist?” – Addressing Misconceptions

    [00:01:31] 1.3 Defining Reasoning

    3. ARC Competition 2024 Results and Evolution

    [00:07:26] 3.1 ARC Prize 2024: Reflecting on the Narrative Shift Toward System 2

    [00:10:29] 3.2 Comparing Private Leaderboard vs. Public Leaderboard Solutions

    [00:13:17] 3.3 Two Winning Approaches: Deep Learning–Guided Program Synthesis and Test-Time Training

    4. Transduction vs. Induction in ARC

    [00:16:04] 4.1 Test-Time Training, Overfitting Concerns, and Developer-Aware Generalization

    [00:19:35] 4.2 Gradient Descent Adaptation vs. Discrete Program Search

    5. ARC-2 Development and Future Directions

    [00:23:51] 5.1 Ensemble Methods, Benchmark Flaws, and the Need for ARC-2

    [00:25:35] 5.2 Human-Level Performance Metrics and Private Test Sets

    [00:29:44] 5.3 Task Diversity, Redundancy Issues, and Expanded Evaluation Methodology

    6. Program Synthesis Approaches

    [00:30:18] 6.1 Induction vs. Transduction

    [00:32:11] 6.2 Challenges of Writing Algorithms for Perceptual vs. Algorithmic Tasks

    [00:34:23] 6.3 Combining Induction and Transduction

    [00:37:05] 6.4 Multi-View Insight and Overfitting Regulation

    7. Latent Space and Graph-Based Synthesis

    [00:38:17] 7.1 Clément Bonnet’s Latent Program Search Approach

    [00:40:10] 7.2 Decoding to Symbolic Form and Local Discrete Search

    [00:41:15] 7.3 Graph of Operators vs. Token-by-Token Code Generation

    [00:45:50] 7.4 Iterative Program Graph Modifications and Reusable Functions

    8. Compute Efficiency and Lifelong Learning

    [00:48:05] 8.1 Symbolic Process for Architecture Generation

    [00:50:33] 8.2 Logarithmic Relationship of Compute and Accuracy

    [00:52:20] 8.3 Learning New Building Blocks for Future Tasks

    9. AI Reasoning and Future Development

    [00:53:15] 9.1 Consciousness as a Self-Consistency Mechanism in Iterative Reasoning

    [00:56:30] 9.2 Reconciling Symbolic and Connectionist Views

    [01:00:13] 9.3 System 2 Reasoning - Awareness and Consistency

    [01:03:05] 9.4 Novel Problem Solving, Abstraction, and Reusability

    10. Program Synthesis and Research Lab

    [01:05:53] 10.1 François Leaving Google to Focus on Program Synthesis

    [01:09:55] 10.2 Democratizing Programming and Natural Language Instruction

    11. Frontier Models and O1 Architecture

    [01:14:38] 11.1 Search-Based Chain of Thought vs. Standard Forward Pass

    [01:16:55] 11.2 o1’s Natural Language Program Generation and Test-Time Compute Scaling

    [01:19:35] 11.3 Logarithmic Gains with Deeper Search

    12. ARC Evaluation and Human Intelligence

    [01:22:55] 12.1 LLMs as Guessing Machines and Agent Reliability Issues

    [01:25:02] 12.2 ARC-2 Human Testing and Correlation with g-Factor

    [01:26:16] 12.3 Closing Remarks and Future Directions

    SHOWNOTES PDF:

    https://www.dropbox.com/scl/fi/ujaai0ewpdnsosc5mc30k/CholletNeurips.pdf?rlkey=s68dp432vefpj2z0dp5wmzqz6&st=hazphyx5&dl=0