Episodes

Anastasiia Kozar | Fault Tolerance Placement in the Internet of Things | #61
16 Dec 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we chat with Anastasiia Kozar about her research on fault tolerance in resource-constrained environments. As IoT applications leverage sensors, edge devices, and cloud infrastructure, ensuring system reliability at the edge poses unique challenges. Unlike the cloud, edge devices operate without persistent backups or high availability standards, leading to increased vulnerability to failures. Anastasiia explains how traditional methods fall short, as they fail to align resource allocation with fault tolerance needs, often resulting in system underperformance.

To address this, Anastasiia introduces a novel resource-aware approach that combines operator placement and fault tolerance into a unified process. By optimizing where and how data is backed up, her solution significantly improves system reliability, especially for low-end edge devices with limited resources. The result? Up to a tenfold increase in throughput compared to existing methods. Tune in to learn more!

Links:
Fault Tolerance Placement in the Internet of Things [SIGMOD'24]
The NebulaStream Platform: Data and Application Management for the Internet of Things [CIDR'20]
nebula.stream
Hosted on Acast. See acast.com/privacy for more information.
Liana Patel | ACORN: Performant and Predicate-Agnostic Hybrid Search | #60
11 Nov 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we chat with Liana Patel about ACORN, a groundbreaking method for hybrid search in applications using mixed-modality data. As more systems require simultaneous access to embedded images, text, video, and structured data, traditional search methods struggle to maintain efficiency and flexibility. Liana explains how ACORN, building on Hierarchical Navigable Small Worlds (HNSW), enables efficient, predicate-agnostic searches by introducing predicate subgraph traversal. This allows ACORN to outperform existing methods significantly, supporting complex query semantics and achieving 2–1,000 times higher throughput on diverse datasets. Tune in to learn more!

Links:
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data [SIGMOD'24]
Liana's LinkedIn
Liana's X
High Impact in Databases with... David Maier
4 Nov 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to David Maier.

David is the Maseeh Professor Emeritus of Emerging Technologies at Portland State University. Tune in to hear David's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

You can find David on:
Homepage
Google Scholar
Raunak Shah | R2D2: Reducing Redundancy and Duplication in Data Lakes | #59
28 Oct 2024 · Disseminate: The Computer Science Research Podcast
In this episode, Raunak Shah joins us to discuss the critical issue of data redundancy in enterprise data lakes, which can lead to soaring storage and maintenance costs. Raunak highlights how large-scale data environments, ranging from terabytes to petabytes, often contain duplicate and redundant datasets that are difficult to manage. He introduces the concept of "dataset containment" and explains its significance in identifying and reducing redundancy at the table level in these massive data lakes—an area where there has been little prior work.

Raunak then dives into the details of R2D2, a novel three-step hierarchical pipeline designed to efficiently tackle dataset containment. By utilizing schema containment graphs, statistical min-max pruning, and content-level pruning, R2D2 progressively reduces the search space to pinpoint redundant data. Raunak also discusses how the system, implemented on platforms like Azure Databricks and AWS, offers significant improvements over existing methods, processing TB-scale data lakes in just a few hours with high accuracy. He concludes with a discussion on how R2D2 optimally balances storage savings and performance by identifying datasets that can be deleted and reconstructed on demand, providing valuable insights for enterprises aiming to streamline their data management strategies.
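The min-max pruning idea can be illustrated with a small sketch. This is not the R2D2 implementation; the function names and the dict-of-rows table representation are assumptions made for illustration. The point is only that per-column minimum and maximum values are cheap to compare, and a candidate table whose range falls outside the larger table's range on any column cannot be contained in it.

```python
def column_ranges(rows, columns):
    """Return {column: (min, max)} for a non-empty list of row dicts."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}


def may_be_contained(child_ranges, parent_ranges):
    """Cheap necessary condition for 'child is contained in parent'.

    False means containment is impossible, so the pair can be pruned.
    True only means the pair survives to a more expensive content-level check.
    """
    for col, (lo, hi) in child_ranges.items():
        if col not in parent_ranges:
            # Schema mismatch: the parent lacks a column the child has.
            return False
        parent_lo, parent_hi = parent_ranges[col]
        if lo < parent_lo or hi > parent_hi:
            # Some child value lies outside every value the parent holds.
            return False
    return True
```

A pair that passes this check is only a candidate: the statistics cannot confirm containment, which is why a content-level step still follows.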

Materials:
SIGMOD'24 Paper - R2D2: Reducing Redundancy and Duplication in Data Lakes
ICDE'24 - Towards Optimizing Storage Costs in the Cloud
High Impact in Databases with... Aditya Parameswaran
21 Oct 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to Aditya Parameswaran about some of his most impactful work.

Aditya is an Associate Professor at the University of California, Berkeley. Tune in to hear Aditya's story!

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Links:
EPIC Data Lab
Answering Queries using Humans, Algorithms and Databases (CIDR'11)
Potter’s Wheel: An Interactive Data Cleaning System (VLDB'01)
Online Aggregation (SIGMOD'97)
Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases (INFOVIS'00)
Coping with Rejection Ponder
You can find Aditya on:
Twitter
LinkedIn
Google Scholar
Marco Costa | Taming Adversarial Queries with Optimal Range Filters | #58
14 Oct 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we sit down with Marco Costa to discuss the fascinating world of range filters, focusing on how they help optimize queries in databases by determining whether a range intersects with a given set of keys. Marco explains how traditional range filters, like Bloom filters, often result in high false positives and slow query times, especially when dealing with adversarial inputs where queries are correlated with the keys. He walks us through the limitations of existing heuristic-based solutions and the common challenges they face in maintaining accuracy and speed under such conditions.

The highlight of our conversation is Grafite, a novel range filter introduced by Marco and his team. Unlike previous approaches, Grafite comes with clear theoretical guarantees and offers robust performance across various datasets, query sizes, and workloads. Marco dives into the technicalities, explaining how Grafite delivers faster query times and maintains predictable false positive rates, making it the most reliable range filter in scenarios where queries are correlated with keys. Additionally, he introduces a simple heuristic filter that excels in uncorrelated queries, pushing the boundaries of current solutions in the field.
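To make the question a range filter answers concrete, here is an exact, non-probabilistic baseline. It is not Grafite and not any filter from the paper; the function name is hypothetical. A range filter approximates this same yes/no answer in far less space, at the price of occasional false positives.

```python
from bisect import bisect_left


def range_intersects(sorted_keys, lo, hi):
    """Return True if some key k in sorted_keys satisfies lo <= k <= hi.

    sorted_keys must be sorted ascending. This is the exact answer; a range
    filter may say True when the exact answer is False, but never the reverse.
    """
    i = bisect_left(sorted_keys, lo)  # index of the first key >= lo
    return i < len(sorted_keys) and sorted_keys[i] <= hi
```

With keys `[3, 10, 42]`, the query `[4, 9]` is empty while `[40, 50]` is not. Queries that fall just next to existing keys, like `[43, 100]`, are the correlated case the episode describes as hard for heuristic filters.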

SIGMOD'24 Paper - Grafite: Taming Adversarial Queries with Optimal Range Filters

High Impact in Databases with... Ali Dasdan
8 Oct 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to Ali Dasdan, CTO at Zoominfo. Tune in to hear Ali's story and learn about some of his most impactful work such as his work on "Map-Reduce-Merge".

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Materials mentioned on this episode:
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters (SIGMOD'07)
The Art of Doing Science and Engineering: Learning to Learn, Richard Hamming
How to Solve It, George Polya
Systems Architecting: Creating & Building Complex Systems, Eberhardt Rechtin
You can find Ali on:
Twitter
LinkedIn
Matt Perron | Analytical Workload Cost and Performance Stability With Elastic Pools | #57
22 Jul 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we dive deep into the complexities of managing analytical query workloads with our guest, Matt Perron. Matt explains how the rapid and unpredictable fluctuations in resource demands present a significant challenge for provisioning. Traditional methods often lead to either over-provisioning, resulting in excessive costs, or under-provisioning, which causes poor query latency during demand spikes. However, there's a promising solution on the horizon. Matt shares insights from recent research that showcases the viability of using cloud functions to dynamically match compute supply with workload demand without the need for prior resource provisioning. While effective for low query volumes, this approach becomes cost-prohibitive as query volumes increase, highlighting the need for a more balanced strategy.

Matt introduces us to a novel strategy that combines the best of both worlds: the rapid scalability of cloud functions and the cost-effectiveness of virtual machines. This innovative approach leverages the fast but expensive cloud functions alongside slow-starting yet inexpensive virtual machines to provide elasticity without sacrificing cost efficiency. He elaborates on how their implementation, called Cackle, achieves consistent performance and cost savings across a wide range of workloads and conditions. Tune in to learn how Cackle avoids the pitfalls of traditional approaches, delivering stable query performance and minimizing costs even as demand fluctuates wildly.

Links:
Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools [SIGMOD'24]
Matt's Homepage
High Impact in Databases with... Andreas Kipf
15 Jul 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to Andreas Kipf about his work on "Learned Cardinalities".

Andreas is the Professor of Data Systems at Technische Universität Nürnberg (UTN). Tune in to hear Andreas's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Papers mentioned on this episode:
Learned Cardinalities: Estimating Correlated Joins with Deep Learning (CIDR'19)
The Case for Learned Index Structures (SIGMOD'18)
Adaptive Optimization of Very Large Join Queries (SIGMOD'18)
You can find Andreas on:
Twitter
LinkedIn
Google Scholar
Data Systems Lab @ UTN
Marvin Wyrich & Justus Bogner | How Software Engineering Research Is Discussed on LinkedIn | #56
8 Jul 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we delve into the intersection of software engineering (SE) research and professional practice with experts Marvin Wyrich and Justus Bogner. As LinkedIn stands as the largest professional network globally, it serves as a critical platform for bridging the gap between SE researchers and practitioners. Marvin and Justus explore the dynamics of how research findings are shared and discussed on LinkedIn, providing both quantitative and qualitative insights into the effectiveness of these interactions. They reveal that a significant portion of SE research posts on LinkedIn are authored by individuals outside the original research team and that a majority of comments on these posts come from industry professionals, highlighting a vibrant but underutilized avenue for science communication.

Our guests shed light on the current state of this metaphorical bridge, emphasizing the potential for LinkedIn to enhance collaboration and knowledge exchange between academia and industry. Despite the promising engagement from practitioners, the discussion reveals that only half of the SE research posts receive any comments, indicating room for improvement in fostering more interactive dialogues. Marvin and Justus offer practical advice for researchers to better engage with practitioners on LinkedIn and suggest strategies for making research dissemination more impactful. This episode provides valuable insights for anyone interested in leveraging social media for advancing software engineering knowledge and practice.
Links:
ICSE'24 Paper
Marvin's Homepage
Justus's Homepage
High Impact in Databases with... Joe Hellerstein
1 Jul 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to Joe Hellerstein.

Joe is the Jim Gray Professor of Computer Science at UC Berkeley. Tune in to hear Joe's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Harry Goldstein | Property-Based Testing | #55
25 Jun 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we chat with Harry Goldstein about Property-Based Testing (PBT). Harry shares insights from interviews with PBT users at Jane Street, highlighting PBT's strengths in testing complex code and boosting developer confidence. Harry also discusses the challenges of writing properties and generating random data, and the difficulties in assessing test effectiveness. He identifies key areas for future improvement, such as performance enhancements and better random input generation. This episode is essential for those interested in the latest developments in software testing and PBT's future.
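For listeners new to the idea, a hand-rolled sketch of property-based testing might look like the following. It uses only the Python standard library. The helper names (`check_property`, `random_int_list`) are made up for illustration and do not come from the paper or from any PBT framework, and real frameworks add features such as shrinking of failing inputs that are omitted here.

```python
import random
from collections import Counter


def random_int_list(rng, max_len=20):
    """Generator: a random-length list of small integers."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, max_len))]


def check_property(prop, generator, trials=200, seed=0):
    """Run prop on randomly generated inputs.

    Returns the first counterexample found, or None if every trial passed.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        case = generator(rng)
        if not prop(case):
            return case
    return None


def sort_is_correct(xs):
    """Property: sorted() yields an ordered permutation of its input."""
    out = sorted(xs)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(out) == Counter(xs)


def dedup_sort_preserves_elements(xs):
    """Property that is false for a buggy 'sort' that drops duplicates."""
    return Counter(sorted(set(xs))) == Counter(xs)
```

Calling `check_property(sort_is_correct, random_int_list)` should find no counterexample, while the second property fails on any generated list containing a repeated value. The two difficulties mentioned in the episode show up even here: someone has to think of the property, and someone has to write a generator that produces interesting inputs.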
Links:
ICSE'24 Paper
Harry's website
X: @hgoldstein95
High Impact in Databases with... Raghu Ramakrishnan
17 Jun 2024 · Disseminate: The Computer Science Research Podcast
In this High Impact episode we talk to Raghu Ramakrishnan.

Raghu is CTO for Data and a Technical Fellow at Microsoft. Tune in to hear Raghu's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Gina Yuan | In-Network Assistance With Sidekick Protocols | #54
10 Jun 2024 · Disseminate: The Computer Science Research Podcast
Join us as we chat with Gina Yuan about her pioneering work on sidekick protocols, designed to enhance the performance of encrypted transport protocols like QUIC and WebRTC. These protocols ensure privacy but limit in-network innovations. Gina explains how sidekick protocols allow intermediaries to assist endpoints without compromising encryption.

Discover how Gina tackles the challenge of referencing opaque packets with her innovative quACK tool and learn about the real-world benefits, including improved Wi-Fi retransmissions, energy-saving proxy acknowledgments, and the PACUBIC congestion-control mechanism. This episode offers a glimpse into the future of network performance and security.
Links:
NSDI'24 Paper
Gina's Homepage
Sidekick's GitHub Repo

High Impact in Databases with... Moshe Vardi
3 Jun 2024 · Disseminate: The Computer Science Research Podcast
Welcome to another episode of the High Impact series - today we talk with Moshe Vardi!

Moshe is the Karen George Distinguished Service Professor in Computational Engineering at Rice University where his research focuses on automated reasoning. Tune in to hear Moshe's story and learn about some of his most impactful work.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

You can find Moshe on X, LinkedIn, and Mastodon @vardi. Links to all his work can be found on his website here.

Tammy Sukprasert | Move Your Workloads To Sweden! | #53
27 May 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we dip our toes into the world of sustainable computing and interview Tammy Sukprasert about her research on reducing carbon emissions in cloud computing through workload scheduling. Tammy explores the concept of shifting cloud workloads across different times and locations to coincide with low-carbon energy availability. Unlike previous studies that focused on specific regions or workloads, her comprehensive analysis uses carbon intensity data from 123 regions to assess both batch and interactive workloads. She considers various factors such as job duration, deadlines, and service level objectives (SLOs). Tammy's findings reveal that while spatiotemporal workload shifting can reduce carbon emissions, the practical upper bounds of these reductions are limited and far from ideal. Simple scheduling policies often achieve most of the potential reductions, with more complex techniques offering minimal additional benefits.
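A simple temporal-shifting policy of the kind discussed can be sketched as follows. This is an illustrative toy, not the analysis from the paper: the function name, the hourly forecast list, and the assumption of a job that runs uninterrupted for a whole number of hours are all simplifications introduced here.

```python
def best_start(intensity, duration, deadline):
    """Pick the start hour that minimizes total carbon intensity.

    intensity: forecast carbon intensity per hour (e.g. gCO2/kWh), index 0 = now
    duration:  job length in whole hours, run without interruption
    deadline:  the job must finish by this hour index (exclusive upper bound)
    Returns (start_hour, summed_intensity_over_the_run).
    """
    latest_start = min(deadline, len(intensity)) - duration
    if duration <= 0 or latest_start < 0:
        raise ValueError("job does not fit before the deadline")
    # Evaluate every feasible window and keep the cleanest one.
    start = min(range(latest_start + 1),
                key=lambda s: sum(intensity[s:s + duration]))
    return start, sum(intensity[start:start + duration])
```

For a forecast of `[500, 400, 100, 120, 450, 300]` and a two-hour job due within six hours, the cleanest window starts at hour 2 with a summed intensity of 220, versus 900 for starting immediately. With a tight deadline of two hours there is no slack, and the carbon-aware choice collapses to the carbon-agnostic one, which is one intuition for why achievable savings are bounded.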

Additionally, Tammy's research highlights that as the energy grid becomes greener, the benefits of carbon-aware scheduling over carbon-agnostic approaches decrease. This discussion offers crucial insights for the future of cloud computing and sustainable technology. Whether you're a tech enthusiast, environmental advocate, or cloud industry professional, Tammy's work provides valuable perspectives on the intersection of technology and sustainability. Join us to learn more about how innovative scheduling strategies can contribute to a greener cloud computing landscape.
Links:
Tammy's LinkedIn
On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud [EuroSys'24 Paper]
Carbon Savings Upper Bound Analysis
High Impact in Databases with... Ryan Marcus
20 May 2024 · Disseminate: The Computer Science Research Podcast
Welcome to the first episode of the High Impact series!

The High Impact series is inspired by the blog post "Most Influential Database Papers" by Ryan Marcus, and today we talk to Ryan! Tune in to hear Ryan's story so far. We chat about his current work before moving on to discuss his most impactful work. We also dig into what motivates him and how he handles setbacks, and get his take on current trends.

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

Links:
Most influential database papers
Ryan's website
Ryan's twitter/X
Bao: Making Learned Query Optimization Practical
Neo: A Learned Query Optimizer
Yazhuo Zhang | SIEVE is Simpler than LRU | #52
13 May 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we explore the world of caching with Yazhuo Zhang, who introduces the game-changing SIEVE algorithm. Traditional eviction algorithms have long struggled with a trade-off between efficiency, throughput, and simplicity. However, SIEVE disrupts this balance by offering a simpler alternative to LRU while outperforming state-of-the-art algorithms in both efficiency and scalability for web cache workloads. Implemented in five production cache libraries with minimal code changes, SIEVE's superiority shines through in a comprehensive evaluation across 1559 cache traces. With up to a remarkable 63.2% lower miss ratio than ARC and surpassing nine other algorithms in over 45% of cases, SIEVE's simplicity doesn't compromise on scalability, doubling throughput compared to optimized LRU implementations. Join us as Yazhuo reveals how SIEVE is set to redefine caching efficiency, promising faster and more streamlined data serving in production systems.
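To give a feel for why SIEVE is considered simple, here is a compact sketch of the eviction policy: a FIFO-ordered list, one "visited" bit per object, and a hand that sweeps from the oldest entry toward the newest. The class and method names are illustrative and are not taken from the paper's code or from any of the production libraries mentioned; treat it as a reading aid rather than a reference implementation.

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.visited = False
        self.prev = None  # neighbor toward the head (newer)
        self.next = None  # neighbor toward the tail (older)


class SieveCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.table = {}
        self.head = None  # newest entry
        self.tail = None  # oldest entry
        self.hand = None  # where the next eviction sweep resumes

    def get(self, key):
        node = self.table.get(key)
        if node is None:
            return None
        node.visited = True  # a hit only flips a bit; nothing is moved
        return node.value

    def put(self, key, value):
        node = self.table.get(key)
        if node is not None:
            node.value = value
            node.visited = True
            return
        if len(self.table) >= self.capacity:
            self._evict()
        node = _Node(key, value)
        node.next = self.head  # new entries always enter at the head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.table[key] = node

    def _evict(self):
        node = self.hand if self.hand is not None else self.tail
        while node.visited:
            node.visited = False  # give the entry a second chance
            node = node.prev if node.prev is not None else self.tail
        self.hand = node.prev  # survivors keep their position in the list
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        del self.table[node.key]
```

With capacity 3, inserting `a`, `b`, `c`, reading `a`, and then inserting `d` evicts `b`: the hand starts at the oldest entry `a`, clears its visited bit, and removes the next unvisited entry. Unlike LRU, a hit never reorders the list, so this sketch's `get` needs no structural update.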
Links:
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches (NSDI'24)
FIFO Queues are All You Need for Cache Eviction (SOSP'23)
Yazhuo's homepage
Yazhuo's LinkedIn
Yazhuo's Twitter/X
Cachemon/SIEVE's website
S3FIFO website
Introducing the High Impact Series...
6 May 2024 · Disseminate: The Computer Science Research Podcast
Introducing the High Impact Series!

Hey folks, we have a new series coming soon, inspired by the blog post "Most Influential Database Papers" by Ryan Marcus. The series will feature interviews with the authors of some of the most impactful work in the field of databases. We will talk about the story behind that work, ask them to reflect on the impact it has had over the years, and get their take on current trends in the field.

Proudly sponsored by Pometry

Eleni Zapridou | Oligolithic Cross-task Optimizations across Isolated Workloads | #51
29 Apr 2024 · Disseminate: The Computer Science Research Podcast
In this episode, we talk to Eleni Zapridou and delve into the challenges of data processing within enterprises, where multiple applications operate concurrently on shared resources. Traditional resource boundaries between applications often lead to increased costs and resource consumption. However, as Eleni explains, the principle of functional isolation offers a solution by combining cross-task optimizations with performance isolation. We explore GroupShare, an innovative strategy that reduces CPU consumption and query latency. Join us as we discuss with Eleni the implications of functional isolation and its potential to revolutionize enterprise data processing.
Links:
CIDR'24 Paper
Eleni's Twitter
Eleni's LinkedIn