Episodes
-
Google is improving its AI Overviews to provide more accurate and helpful information.
Nvidia's new embedding model, NV-Embed-v1, ranks number one on the Massive Text Embedding Benchmark.
Matryoshka Query Transformer (MQT) offers flexibility to Large Vision-Language Models (LVLMs) by encoding an image into a variable number of visual tokens during inference.
Contextual Position Encoding (CoPE) improves the position encoding method in Large Language Models (LLMs) and solves tasks where popular position embeddings fail.
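For the CoPE item above, here is a minimal sketch of the gating idea as described at a high level: each query decides, via a sigmoid gate over earlier tokens, which of them should be counted, and a token's "position" is the cumulative sum of those gates rather than its raw index. This is only an illustration; the full method (including how fractional positions are turned into embeddings by interpolation) is omitted, and all names below are mine.

```python
import torch

def cope_positions(q, k):
    """Toy Contextual Position Encoding (CoPE) position computation for one head.

    q, k: (seq_len, dim) query and key tensors.
    Returns a (seq_len, seq_len) matrix whose entry (i, j) is the context-
    dependent position of token j as seen from query i: a soft count of the
    tokens between j and i that the gate decides are worth counting.
    """
    seq_len = q.shape[0]
    gates = torch.sigmoid(q @ k.T)                        # g[i, j]: should token j be counted by query i?
    causal = torch.tril(torch.ones(seq_len, seq_len))
    gates = gates * causal                                # only past tokens can be counted
    # p[i, j] = sum_{m=j..i} g[i, m]  ->  reverse cumulative sum over j
    positions = gates.flip(-1).cumsum(-1).flip(-1) * causal
    return positions
```

Because the count depends on content, the gates can learn to fire only on, say, sentence boundaries, so "position" becomes "how many sentences ago", which fixed token-index embeddings cannot express.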
Contact: [email protected]
Timestamps:
00:34 Introduction
01:35 AI Overviews: About last week
03:58 Nvidia Releases Embedding Model NV-Embed-v1
04:53 Multi-camera YOLOv5 on Zynq UltraScale+ with Hailo-8 AI Acceleration
06:31 Fake sponsor
08:28 Matryoshka Query Transformer for Large Vision-Language Models
10:24 Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
11:51 Contextual Position Encoding: Learning to Count What's Important
13:30 Outro
-
OpenAI announces new content and product partnerships with Vox Media and The Atlantic, making their reporting and stories more discoverable to millions of OpenAI users.
Mistral AI releases Codestral, a 22B parameter, open-weight model that specializes in coding tasks, beating out its code-focused rivals across top benchmarks.
MAP-Neo is the first fully open-sourced bilingual LLM that provides all the details needed to reproduce the model, improving transparency in large language models.
Self-Exploring Language Models (SELM) is a promising approach to improving the alignment of LLMs to human intentions through online feedback collection.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:39 A content and product partnership with The Atlantic
02:59 Mistral Releases Codestral, a Code-focused Model
04:34 How Dell Is Beating Supermicro
05:50 Fake sponsor
08:06 MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
09:44 Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
11:16 Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
13:18 Outro
-
OpenAI has formed a new safety team to address concerns about AI safety and ethics, led by CEO Sam Altman and board members Adam D'Angelo and Nicole Seligman.
Jan Leike, a leading AI researcher, has left OpenAI and joined Anthropic's Superalignment team, which is focused on AI safety and security.
Sentence Transformers v3 has been released, allowing models to be finetuned for specific tasks like semantic search and paraphrase mining.
Exciting new research papers include MoEUT, a shared-layer Transformer design that outperforms standard Transformers on language modeling, and EM Distillation, a method that distills diffusion models into one-step generators without sacrificing perceptual quality.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:32 OpenAI has a new safety team – it's run by Sam Altman
03:18 Jan Leike (ex OpenAI) joins Anthropic's Superalignment Team
05:04 Sentence Transformers v3 Released
06:06 Fake sponsor
08:19 MoEUT: Mixture-of-Experts Universal Transformers
10:10 Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
11:48 EM Distillation for One-step Diffusion Models
13:42 Outro
-
xAI, founded by Elon Musk, raises $6 billion in funding to accelerate the research and development of future technologies in the AI race.
Google's new 'AI Overviews' search feature causes uproar with bizarre and inaccurate responses, potentially eroding trust in Google's search results.
"Transformers Can Do Arithmetic with the Right Embeddings" proposes a solution to transformers' struggles with arithmetic tasks, achieving up to 99% accuracy on 100 digit addition problems.
"SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering" introduces SWE-agent, an autonomous system that uses a language model to interact with a computer to solve software engineering tasks, with potential to revolutionize the field.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:27 Elon Musk's xAI raises $6 billion to fund its race against ChatGPT and all the rest
02:51 Google's A.I. Search Errors Cause a Furor Online
04:17 ir-measures Documentation
05:15 Fake sponsor
07:12 Transformers Can Do Arithmetic with the Right Embeddings
08:23 Matryoshka Multimodal Models
09:58 SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
11:47 Outro
-
OpenAI drama: Leaked documents and a resignation from a policy researcher.
DeepSeek-Prover: A new approach to formal theorem proving using synthetic data.
Dense Connector for MLLMs: A plug-and-play vision-language connector that enhances existing models.
Thermodynamic Natural Gradient Descent: A new algorithm for training neural networks using natural gradient descent.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:29 On OpenAI's Sky Voice
03:04 Successful language model evals
03:58 Generative Molecular Design Isn't As Easy As People Make It Look
05:21 Fake sponsor
07:30 DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
09:15 Dense Connector for MLLMs
10:43 Thermodynamic Natural Gradient Descent
12:37 Outro
-
Cohere's Aya model and dataset for multilingual AI in 101 languages through open science.
"Mapping the Mind of a Large Language Model" paper by Anthropic Blog, providing a detailed look inside a modern, production-grade model.
"ReVideo: Remake a Video with Motion and Content Control" paper introducing a new approach to video editing.
"Dense Connector for MLLMs" paper introducing the Dense Connector, a plug-and-play vision-language connector that significantly enhances existing MLLMs.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:32 Cohere Launches Aya
03:32 Mapping the Mind of a Large Language Model
05:05 The Batch Newsletter
06:08 Fake sponsor
07:38 ReVideo: Remake a Video with Motion and Content Control
09:14 Not All Language Model Features Are Linear
10:55 Dense Connector for MLLMs
12:30 Outro
-
Nvidia's Q1 revenue up 262% to $26.0B, beating estimates.
OpenAI's News Corp deal licenses content from WSJ, New York Post and more.
PyramidInfer compresses KV cache to save memory during inference for Large Language Models.
Your Transformer is Secretly Linear challenges our existing understanding of transformer architectures.
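For the "Your Transformer is Secretly Linear" item, a simple way to probe that claim on a model you have access to is to collect hidden states entering and leaving a transformer block and measure how much of the block's effect a single least-squares linear map explains. This is a generic probe, not necessarily the paper's exact metric, and the function below is an illustrative sketch.

```python
import torch

def linearity_score(h_in, h_out):
    """R^2 of the best single linear map from a block's input hidden states
    to its output hidden states; values near 1 mean the block behaves
    almost linearly on this data.

    h_in, h_out: (num_tokens, hidden_dim) hidden states collected before and
    after one transformer block over some text corpus.
    """
    W = torch.linalg.lstsq(h_in, h_out).solution      # (hidden_dim, hidden_dim)
    pred = h_in @ W
    ss_res = ((h_out - pred) ** 2).sum()
    ss_tot = ((h_out - h_out.mean(dim=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```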
Contact: [email protected]
Timestamps:
00:34 Introduction
01:55 Nvidia's Q1 revenue up 262% to $26.0B, beating estimates
03:23 OpenAI's News Corp deal licenses content from WSJ, New York Post, and more
04:57 Systematically Improving Your RAG
06:18 Fake sponsor
08:17 PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
09:49 Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
11:48 Your Transformer is Secretly Linear
13:26 Outro
-
Google's redesign of its search engine using AI to enhance the search experience.
Microsoft's introduction of Copilot+ PCs, its new class of AI PCs, with features like Recall and Cocreator.
Imp, a highly capable large multimodal model for mobile devices that can process and understand multiple types of data simultaneously.
Octo, an open-source generalist robot policy that can be finetuned to new observation and action spaces, potentially transforming robotic learning.
Contact: [email protected]
Timestamps:
00:34 Introduction
02:14 Google is redesigning its search engine – and it's AI all the way down
03:35 Microsoft Unveils First AI PC
05:18 Greg Brockman and Sam Altman Statement on OpenAI's Alignment Schism
06:53 Fake sponsor
08:55 Imp: Highly Capable Large Multimodal Models for Mobile Devices
10:37 Octo: An Open-Source Generalist Robot Policy
12:11 Towards Modular LLMs by Building and Reusing a Library of LoRAs
14:02 Outro
-
OpenAI's ChatGPT introduces new enhancements for data analysis, making it easier for beginners to perform in-depth analyses and saving experts time on routine data-cleaning tasks.
Sony Music warns AI companies against unauthorized use of their copyrighted material for the "training, development or commercialization of AI systems", highlighting concerns around the use of AI-generated voice clones.
Chameleon, a family of models that can understand and generate both images and text in any sequence, uses an early-fusion approach, resulting in better performance across a wide range of tasks.
MoRA proposes a new method for fine-tuning large language models, achieving high-rank updating while maintaining the same number of trainable parameters, which could have practical implications for improving the efficiency of large language models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:58 Improvements to data analysis in ChatGPT
03:31 Sony Music warns AI companies against "unauthorized use" of its content
05:28 Statement from Scarlett Johansson on the OpenAI situation
06:52 Fake sponsor
09:02 Chameleon: Mixed-Modal Early-Fusion Foundation Models
10:27 Layer-Condensed KV Cache for Efficient Inference of Large Language Models
12:00 MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
13:47 Outro
-
OpenAI's safety team imploding due to a loss of faith in leadership and the prioritization of commercialization over safety.
Reddit and OpenAI partnership to bring Reddit content to ChatGPT and introduce new AI-powered features to users.
The extensive process OpenAI went through to select the five distinct voices for ChatGPT's Voice Mode.
Papers discussing Layer-Condensed KV Cache for efficient inference of large language models, observational scaling laws and the predictability of language model performance, and Chameleon's mixed-modal early-fusion foundation models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:27 Why the OpenAI team in charge of safeguarding humanity imploded
03:03 Reddit and OpenAI Build Partnership
04:33 How the voices for ChatGPT were chosen
05:54 Fake sponsor
07:22 Layer-Condensed KV Cache for Efficient Inference of Large Language Models
08:48 Observational Scaling Laws and the Predictability of Language Model Performance
10:35 Chameleon: Mixed-Modal Early-Fusion Foundation Models
12:20 Outro
-
Google I/O 2024 announcements, including new AI tools like Firebase Genkit, LearnLM, and Veo, as well as Gemini, an AI replacement for Google Assistant.
The introduction of the MS MARCO Web Search dataset, which provides a retrieval benchmark with three web retrieval challenge tasks and millions of real-clicked query-document pairs for training and evaluating retrieval models.
The "What matters when building vision-language models?" paper, which identifies critical decisions in the design of vision-language models and presents Idefics2, an efficient foundational VLM of 8 billion parameters that achieves state-of-the-art performance within its size category.
The "RLHF Workflow: From Reward Modeling to Online RLHF" paper, which presents a workflow for Online Iterative Reinforcement Learning from Human Feedback (RLHF) in an online setting and achieves impressive performance on LLM chatbot benchmarks and academic benchmarks.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:34 Google I/O 2024: Here's everything Google just announced
03:26 Ilya Sutskever leaves OpenAI
04:57 GPT-4o's Memory Breakthrough!
06:00 Fake sponsor
07:49 MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
09:33 What matters when building vision-language models?
10:54 RLHF Workflow: From Reward Modeling to Online RLHF
13:00 Outro
-
OpenAI's new model, GPT-4o, can reason across audio, vision, and text in real-time, with safety measures built-in by design.
Apple and Google collaborate to deliver support for unwanted tracking alerts in iOS and Android, an industry first involving community and industry input.
LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory, highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM (a simplified sketch of the shared-base-plus-adapters idea follows this list).
WildChat, a public dataset showcasing how chatbots like GPT-4 and ChatGPT are used by a population of users in practice, offers the most diverse user prompts, contains the largest number of languages, and presents the richest variety of potentially toxic use-cases for researchers to study.
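The LoRA Land item above comes down to the fact that LoRA fine-tunes share the frozen base weights and differ only in small low-rank matrices, so many "models" fit on one GPU. Below is a simplified, hypothetical layer illustrating per-request adapter selection; it is not LoRA Land's actual serving stack.

```python
import torch
import torch.nn as nn

class MultiLoRALinear(nn.Module):
    """One frozen base weight shared by many small LoRA adapters.

    Each "fine-tuned model" is just a pair of low-rank matrices (A, B),
    so dozens of them can sit alongside a single copy of the base weights.
    """

    def __init__(self, d_in, d_out, rank=8, num_adapters=25):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)            # base model stays frozen
        self.A = nn.Parameter(torch.randn(num_adapters, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_adapters, d_out, rank))

    def forward(self, x, adapter_id):
        # x: (batch, d_in); pick this request's adapter and add its low-rank update
        a, b = self.A[adapter_id], self.B[adapter_id]
        return self.base(x) + (x @ a.T) @ b.T

# Usage: route each request to its own fine-tune without loading another model.
layer = MultiLoRALinear(d_in=512, d_out=512)
y = layer(torch.randn(4, 512), adapter_id=7)
```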
Contact: [email protected]
Timestamps:
00:34 Introduction
01:35 OpenAI Announces GPT-4 Omni
03:00 Apple and Google deliver support for unwanted tracking alerts in iOS and Android
05:03 Sam Altman on GPT-4 Omni
06:20 Fake sponsor
08:43 LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
10:17 Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
12:04 WildChat: 1M ChatGPT Interaction Logs in the Wild
14:04 Outro
-
OpenAI plans to challenge Google search with a new search feature for ChatGPT, which could have a significant impact on the AI industry.
SoundHound AI and Perplexity have partnered to improve the accuracy and complexity of voice assistants across cars, apps, and phone assistants.
"Fishing for Magikarp" addresses an issue with "glitch tokens" in large language models, while "CuMo" proposes a new approach to improving multimodal LLMs.
"Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?" highlights the risk in introducing new factual knowledge through fine-tuning and the importance of pre-training for large language models.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:38 OpenAI plans to announce Google search competitor today
03:02 SoundHound AI and Perplexity Partner to Bring Online LLMs to Next Gen Voice Assistants Across Cars and IoT Devices
04:48 Homoiconic Python
06:06 Fake sponsor
07:49 Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
09:00 CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
10:34 Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
12:13 Outro
-
DeepMind's AlphaFold 3, the newest and most powerful version of their AI model that can predict the structure of proteins and other molecules with incredible accuracy, has been released for free for non-commercial use.
OpenAI has introduced the Model Spec, a document that specifies how they want their AI models to behave in their API and ChatGPT, to deepen the public conversation about how AI models should behave.
Microsoft Research's paper explores how players can interact with large language models (LLMs) to create emergent behaviors in game narratives, which could have big implications for game development and player engagement.
The University of California, Berkeley's paper proposes the Learnable Latent Codes as Bridges (LCB) method, which uses a learnable latent code as a bridge between LLMs and low-level policies, allowing for more flexible communication of goals in the task plan without being entirely constrained by language limitations.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:18 DeepMind Announces AlphaFold 3
02:37 OpenAI Introduces the Model Spec
04:22 ChatBotArena: The peoples' LLM evaluation, the future of evaluation, the incentives of evaluation, and gpt2chatbot
05:53 Fake sponsor
08:11 Player-Driven Emergence in LLM-Driven Game Narrative
09:25 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
11:15 From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control
12:55 Outro
-
Apple's new M4 chip for the iPad Pro promises improved performance and AI capabilities.
OpenAI is developing a search feature for ChatGPT that could rival Google and Perplexity.
IBM's Granite Language Models are a promising tool for code generative tasks, with improved performance and trustworthy data usage.
xLSTM, an extended LSTM architecture, and vAttention, a dynamic memory manager for LLM serving, are two new approaches to improving the performance and efficiency of large language models, with potential applications in natural language processing, speech recognition, and video analysis.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:42 Apple introduces M4 chip
03:34 IBM Granite Language Models
04:40 OpenAI Is Readying a Search Product to Rival Google, Perplexity
05:47 Fake sponsor
07:36 xLSTM: Extended Long Short-Term Memory
09:17 vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
10:46 NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
12:45 Outro
-
Stack Overflow and OpenAI partner to provide developers with accurate and vetted data for AI development.
Elon Musk plans to use AI to distill and present news on X, combining breaking news and social media reactions.
HuggingFace's Robotics Library, LeRobot, provides state-of-the-art machine learning models, datasets, and tools for real-world robotics.
Research papers explore distillation for multilingual information retrieval and the effects of down-scaling large language models on fact recall.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:33 Stack Overflow and OpenAI Partner to Strengthen the World's Most Popular Large Language Models
03:21 Elon Musk's AI News Plans for X
05:15 LeRobot: HuggingFace's Robotics Library
06:27 Fake sponsor
08:22 Distillation for Multilingual Information Retrieval
10:03 The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
11:35 In-Context Learning with Long-Context Models: An In-Depth Exploration
13:26 Outro
-
Apple is making strides in AI with their own model called Ajax and improvements to Siri, including making large language models faster and more efficient.
"In-Context Learning with Long-Context Models: An In-Depth Exploration" explores a training method for long-context models called in-context learning and its effectiveness.
"WildChat: 1M ChatGPT Interaction Logs in the Wild" offers a diverse dataset of user-chatbot interactions for researchers to study and fine-tune instruction-following models.
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust" investigates the impact of large language models on user reliance and trust, and the potential harm of overreliance. The study found that using natural language expressions of uncertainty can reduce overreliance on LLMs.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:34 Better Siri is coming: what Apple's research says about its AI plans
03:19 Your guide to AI: May 2024
04:18 How LLMs Work, Explained Without Math
05:37 Fake sponsor
07:06 In-Context Learning with Long-Context Models: An In-Depth Exploration
08:42 WildChat: 1M ChatGPT Interaction Logs in the Wild
10:36 "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
12:55 Outro
-
Ukraine introduces an AI-generated digital spokesperson for its Ministry of Foreign Affairs, named Victoria Shi, who will deliver pre-prepared official statements on behalf of the ministry.
Anthropic releases a mobile app version of their Claude AI models, including a new paid plan called Claude Team for group usage.
BAGEL is a new method for bootstrapping language model agents without human supervision, which quickly converts the initial distribution of trajectories towards those that are well-described by natural language.
CodeIt is a self-improvement method for language models that helps them improve their performance on complex reasoning tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:25 Ukraine Unveils AI-generated Foreign Ministry Spokeswoman
03:01 Anthropic finally releases a Claude mobile app
04:51 Apple's Tiny LLMs, Amazon Rethinks Cashier-Free Stores, Predicting Scientific Discoveries
06:46 Fake sponsor
08:17 BAGEL: Bootstrapping Agents by Guiding Exploration with Language
09:41 A Careful Examination of Large Language Model Performance on Grade School Arithmetic
11:18 CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
13:09 Outro
-
Amazon has launched Q, an AI-powered assistant for businesses and developers that offers advanced capabilities such as code generation, testing, debugging, reasoning, and agents for step-by-step planning.
Microsoft's $1 billion investment in OpenAI was triggered by fears of falling behind Google in the AI race. The investment has helped Microsoft catch up and be seen as more of a leader in AI, with OpenAI's models integrated into their products.
A new dataset for the Global Artificial Intelligence Championship Math 2024 has been created, consisting of 387 math problems curated by professional math problem writers from prestigious institutions.
Three AI research papers were discussed: a new approach to evaluating large language models using a panel of diverse models, a method for real-time, controllable motion generation, and the use of ranked list truncation for large language model-based re-ranking.
Contact: [email protected]
Timestamps:
00:34 Introduction
01:36 Amazon Q, a generative AI-powered assistant for businesses and developers
03:08 Microsoft's OpenAI investment was triggered by Google fears, emails reveal
05:12 A Dataset for The Global Artificial Intelligence Championship Math 2024
06:21 Fake sponsor
08:25 Ranked List Truncation for Large Language Model-based Re-Ranking
10:04 Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
11:35 MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
13:16 Outro
-
Cohere Command R & R+ now available on Amazon for enterprise-grade workloads and multilingual support.
Big tech companies dominating AI lobbying efforts in Washington, potentially leading to weak regulations.
Multi-token prediction proposed as a new way of training large language models, resulting in higher sample efficiency and faster inference.
KANs, a new type of neural network with learnable activation functions on edges or weights, outperform MLPs in accuracy and interpretability, and can help scientists discover mathematical and physical laws.
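Since the KAN item above describes the architecture (learnable univariate functions on the edges instead of fixed activations on the nodes), here is a toy layer illustrating that idea. It is deliberately simplified: real KAN implementations parameterize each edge function with B-splines plus a base activation and add normalization tricks, and every name below is mine.

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """Toy Kolmogorov-Arnold layer: each edge (input i -> output o) gets its
    own learnable 1-D function, here a weighted sum of fixed Gaussian bumps."""

    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
        # one coefficient per (output, input, basis) triple = a function per edge
        self.coeff = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):                                  # x: (batch, in_dim)
        # evaluate the shared bases on every input coordinate
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))  # (batch, in_dim, num_basis)
        # per-edge function values, then sum over inputs (the "+" nodes of a KAN)
        return torch.einsum("bik,oik->bo", phi, self.coeff)
```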
Contact: [email protected]
Timestamps:
00:34 Introduction
01:54 Cohere Command R & R+ now available on Amazon
03:25 There's an AI Lobbying Frenzy in Washington. Big Tech Is Dominating
05:22 The 150x pgvector Speedup: A Year-in-Review
06:31 Fake sponsor
08:04 Better & Faster Large Language Models via Multi-token Prediction
09:55 KAN: Kolmogorov-Arnold Networks
11:51 Iterative Reasoning Preference Optimization
13:43 Outro