The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI – Listen here

Episodes

Reflections on a Decade of Data Engineering at Seattle Data Guy
3 apr· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Lessons from the past decade of data engineering reveal how much the ecosystem has changed and what has stayed surprisingly consistent.
In this episode, Benjamin Rogojan, Owner and Data Consultant at Seattle Data Guy, joins us to reflect on how the data engineering landscape has evolved alongside Apache Airflow. We explore when Airflow makes sense as an orchestrator, why batch processing is still dominant and how AI is reshaping the workflows and responsibilities of modern data engineers.
Key Takeaways:
00:00 Introduction.
03:00 Airflow becomes valuable when workflows involve many pipelines, teams and dependencies.
05:00 Data engineers are still focused on making data accessible and aligning work with business needs.
05:30 Batch pipelines remain the most common approach even as real-time use cases grow.
07:45 Many “real-time” requests are actually event-driven batch workflows.
09:00 Airflow replaced many custom-built pipeline systems with built-in dependency management.
11:00 Modern orchestration tools often build on Airflow concepts or differentiate from them.
14:00 AI can assist with writing SQL and pipelines but still requires experienced engineers.
15:30 Organizations are collecting increasingly granular data, creating more engineering demand.
19:00 The data stack has shifted rapidly from Hadoop-era systems to modern cloud platforms.
Resources Mentioned:
Benjamin Rogojan
https://www.linkedin.com/in/benjaminrogojan/
Seattle Data Guy
https://www.linkedin.com/company/seattle-data-guy/
Apache Airflow
https://airflow.apache.org
Airflow Summit / Airflow Conference
https://airflowsummit.org
Snowflake
https://www.snowflake.com
HubSpot Data Sharing / APIs
https://developers.hubspot.com
MLflow
https://mlflow.org
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow
Managing Data Quality and Governance With Airflow at Credit Karma with Ashir Alam
26 mar· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Data quality is not optional when you manage credit data at scale.
In this episode, Ashir Alam, Senior Data Engineer at Credit Karma, joins us to share how his team acts as the gatekeeper for credit data ingestion, how they standardize data quality with Airflow and DAG Factory and how they scale safely across thousands of DAGs. We explore how governance, PII protection and orchestration come together inside a modern data platform.

Key Takeaways:
00:00 Introduction.
01:00 Overview of Credit Karma’s products and financial data ecosystem.
02:00 The team acts as gatekeepers for ingesting data from TransUnion and Equifax.
03:00 Why PII handling and controlled downstream access led to adopting Airflow.
04:00 BigQuery as the warehouse and Airflow as the primary orchestrator.
05:00 Why data quality and governance are critical in financial systems.
07:00 Why Airflow was selected: ease of use and unified ETL plus data quality.
09:00 Introduction to DAG Factory and YAML-based DAG generation.
10:00 GitHub executor creates PR-driven DAG workflows with CI checks.
12:00 BigQuery operators, structured checks and custom Slack and PagerDuty alerts.
13:00 Failed checks stop ETL pipelines and trigger notifications.
17:00 Scaling DAG Factory across thousands of DAGs and runtime vs compile-time concerns.
19:00 Future improvements: better defaults, retries and GenAI workflows in Airflow.
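As a rough illustration of the 13:00 takeaway, the sketch below shows one way a failed check could halt a pipeline and send a notification. It is plain Python rather than Airflow or BigQuery code, and `Check`, `run_gate`, `pipeline`, the `score_not_null` rule and the `notify` callback are hypothetical names invented for this example, not anything described in the episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class DataQualityError(Exception):
    """Raised when a check fails so that later pipeline steps never run."""


@dataclass
class Check:
    name: str                            # label used in the alert message
    passed: Callable[[List[Row]], bool]  # returns True when the data is acceptable


# Hypothetical rule: every row must carry a non-null "score".
CHECKS = [
    Check("score_not_null", lambda rows: all(r.get("score") is not None for r in rows)),
]


def run_gate(rows: List[Row], checks: List[Check], notify: Callable[[str], None]) -> None:
    """Run each check in order; on the first failure, notify and raise."""
    for check in checks:
        if not check.passed(rows):
            notify(f"Data quality check failed: {check.name}")
            raise DataQualityError(check.name)


def load(rows: List[Row]) -> int:
    """Stand-in for the real load step; returns the number of rows 'loaded'."""
    return len(rows)


def pipeline(rows: List[Row], checks: List[Check], notify: Callable[[str], None]) -> int:
    run_gate(rows, checks, notify)  # raises before load() if any check fails
    return load(rows)
```

In an orchestrator, the raised exception would mark the task as failed, which is what keeps downstream ETL tasks from starting. Here `notify` stands in for a Slack or PagerDuty call.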
Resources Mentioned:
Ashir Alam
https://www.linkedin.com/in/ashir-alam/
Credit Karma
https://www.linkedin.com/company/intuit-credit-karma/
Apache Airflow
https://airflow.apache.org/
DAG Factory
https://github.com/astronomer/dag-factory
BigQuery (Google Cloud)
https://cloud.google.com/bigquery
GitHub
https://github.com/
Slack
https://slack.com/
PagerDuty
https://www.pagerduty.com/
#AI #Automation #Airflow
Open Source Airflow Contributions and Performance Improvements at G-Research with Christos Bisias
19 mar· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Modern Airflow isn’t just orchestration. It's a contribution.

In this episode, we explore how open source investment drives real performance gains and deeper observability.
We’re joined by Christos Bisias, Open Source Software Engineer, Apache Airflow at G-Research, to discuss how his team uses Airflow for large-scale data transformations, contributes upstream and improves scheduler throughput and OpenTelemetry support. From trace-level observability to CI-enforced metrics governance and a major scheduler optimization, this conversation spans strategy, engineering and community impact.
Key Takeaways:
00:00 Introduction.
01:20 How G-Research applies machine learning and big data to predict financial market movements.
02:15 Contributing to open source is a business decision.
03:10 Maintaining a fork is costly.
04:30 OpenTelemetry collects metrics, logs and traces to provide deep system visibility.
06:10 Custom spans help identify bottlenecks inside tasks and enable performance optimization.
08:05 OpenTelemetry integration works properly in Airflow 3.0 and above.
10:00 A YAML-based metrics registry with CI enforcement ensures consistency between docs and exported metrics.
12:10 Scheduler throughput improved significantly by applying concurrency limits earlier in the database query.
15:20 Future Task SDK changes may enable language-agnostic DAG authoring beyond Python.
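To make the 12:10 takeaway more concrete, here is a minimal sketch of the general idea of applying per-DAG concurrency limits inside the query instead of after it. It uses an in-memory SQLite database with hypothetical tables (`queued_tasks`, `dag_limits`) and made-up data. It is not Airflow's actual scheduler query or schema, only an assumption about the kind of change being described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE queued_tasks (task_id TEXT, dag_id TEXT, priority INTEGER);
CREATE TABLE dag_limits (dag_id TEXT PRIMARY KEY, free_slots INTEGER);
""")
# Hypothetical state: big_dag has many high-priority tasks but one free slot.
conn.executemany("INSERT INTO dag_limits VALUES (?, ?)",
                 [("big_dag", 1), ("small_dag", 2)])
tasks = [(f"big_{i}", "big_dag", 10) for i in range(5)]
tasks += [("small_0", "small_dag", 1), ("small_1", "small_dag", 1)]
conn.executemany("INSERT INTO queued_tasks VALUES (?, ?, ?)", tasks)

BATCH = 3  # how many candidate tasks one scheduling pass examines


def late_filter():
    """Fetch the top BATCH tasks first, then drop those over their DAG's limit."""
    rows = conn.execute(
        "SELECT task_id, dag_id FROM queued_tasks "
        "ORDER BY priority DESC, task_id LIMIT ?", (BATCH,)).fetchall()
    free = dict(conn.execute("SELECT dag_id, free_slots FROM dag_limits"))
    picked = []
    for task_id, dag_id in rows:
        if free[dag_id] > 0:
            free[dag_id] -= 1
            picked.append(task_id)
    return picked


def early_filter():
    """Rank tasks per DAG in SQL and keep only those that fit, then apply LIMIT."""
    rows = conn.execute("""
        SELECT task_id FROM (
            SELECT q.task_id, q.priority, l.free_slots,
                   ROW_NUMBER() OVER (
                       PARTITION BY q.dag_id
                       ORDER BY q.priority DESC, q.task_id
                   ) AS rn
            FROM queued_tasks q
            JOIN dag_limits l ON l.dag_id = q.dag_id
        )
        WHERE rn <= free_slots
        ORDER BY priority DESC, task_id
        LIMIT ?
    """, (BATCH,)).fetchall()
    return [r[0] for r in rows]
```

With this data, `late_filter()` spends its whole batch on `big_dag` and can schedule only `big_0`, while `early_filter()` returns `big_0`, `small_0` and `small_1` because tasks that cannot run never occupy a place in the batch.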
Resources Mentioned:
Christos Bisias
https://www.linkedin.com/in/xbis/
G-Research
https://www.linkedin.com/company/g-research/
Apache Airflow
https://airflow.apache.org/
OpenTelemetry
https://opentelemetry.io/
Prometheus
https://prometheus.io/
Grafana
https://grafana.com/
Jaeger
https://www.jaegertracing.io/
#AI #Automation #Airflow
Automating Threat Intelligence Using Airflow with Karan Alang
12 mar· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
In this episode, Karan Alang, Principal Software Engineer at Versa Networks, joins the conversation to discuss how Airflow can be used to automate threat intelligence in modern cybersecurity environments. He explains the growing scale of cloud computing, the profitability of hacking and the shortage of SOC analysts. Karan also outlines a novel architecture that combines Airflow, XDR, graph databases and LLMs to orchestrate automated threat detection and response.
Key Takeaways:
00:00 Introduction.
05:00 Organizations face massive log volumes and a shortage of SOC analysts.
07:00 The solution integrates Airflow, XDR, Neo4j graph databases and LLMs into one architecture.
08:00 MITRE ATT&CK provides a global framework for mapping tactics and techniques.
11:00 Airflow acts as the orchestration backbone for ingestion, graph transformation and LLM workflows.
13:00 Graph databases provide a full relationship view of attackers, systems and entities.
14:00 LLMs automate mapping activity to MITRE ATT&CK and assign explainable risk scores.
17:00 Traditional signature-based detection allows lateral movement and exfiltration before teams can react.
18:00 End-to-end automation is essential to mitigating modern cybersecurity threats.
20:00 Future opportunities include deeper LLM integration as first-class citizens within Airflow.
Resources Mentioned:
Karan Alang
https://www.linkedin.com/in/karan-alang-4173437
Versa Networks | LinkedIn
https://www.linkedin.com/company/versa-networks
Versa Networks | Website
https://versa-networks.com
Google Cloud Composer (Managed Airflow on GCP)
https://cloud.google.com/composer
Microsoft Defender XDR
https://www.microsoft.com/es-es/security/business/siem-and-xdr/microsoft-defender-xdr
Neo4j (Graph Database)
https://neo4j.com
MITRE ATT&CK Framework
https://attack.mitre.org
#AI #Automation #Airflow #MachineLearning
Using Plugins To Customize Airflow at Ponder Labs with Egor Tarasenko
5 mar· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
In this episode, we explore how teams scale Apache Airflow in complex environments and what it takes to make orchestration work across many stakeholders. We look at real-world challenges around visibility, ownership and predictability as data platforms grow.
Egor Tarasenko, Data and AI Engineer at Ponder Labs, joins us to share how Ponder Labs customizes Airflow for education organizations using plugins, event-driven architectures and AI-powered tooling. He explains how his team supports large charter school networks and why structure, consistency and extensibility become critical at scale.
Key Takeaways:
00:00 Introduction.
01:21 Ponder Labs helps education organizations bring data from many systems together so it becomes useful for teachers, school leaders and administrators.
03:10 Airflow serves as the backbone for orchestrating ingestion, transformation and reverse ETL across client data platforms.
05:43 Everything is triggered from Airflow to maintain dependency, visibility and a single operational picture.
09:05 Managing hundreds of DAGs requires a focus on structure, visibility and consistency across teams.
09:51 Treating DAGs like APIs helps teams scale without needing deep knowledge of upstream logic.
12:00 Custom plugins like schedule insights help predict DAG run times across layered dependencies.
15:00 AI-powered Airflow chat enables non-technical stakeholders to understand DAG ownership, dependencies and cluster activity.
22:06 Migrating plugins to Airflow 3 improves developer experience through cleaner APIs and faster extensibility.
Resources Mentioned:
Egor Tarasenko
https://www.linkedin.com/in/egorseno/
Apache Airflow
https://airflow.apache.org
dbt
https://www.getdbt.com
Astronomer Astro Platform
https://www.astronomer.io
Egor Tarasenko on Substack
https://egortarasenko.substack.com
#AI #Automation #Airflow
Scaling Airflow at Wix for Analytics and AI with Ethan Shalev
26 feb· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Modern data orchestration at scale demands reliability, speed and thoughtful adoption of new tooling. As organizations grow, keeping pipelines efficient while supporting more teams becomes a critical challenge.
In this episode, we’re joined by Ethan Shalev, Data Engineer at Wix, to discuss how Wix operates Airflow at massive scale, migrates to Airflow 3 and uses AI to accelerate development.
Key Takeaways:
00:00 Introduction.
02:13 Wix structures data engineering across multiple product-focused organizations.
03:40 Migrating nearly 8,000 DAGs to Airflow 3 requires careful planning.
04:31 Migration creates an opportunity to remove long-standing legacy Airflow code.
05:32 Internal playbooks and Cursor rules standardize and speed up DAG migrations.
07:39 Airflow 3 introduces backfills, DAG versioning and asset-aware scheduling.
09:16 Deferrable operators reduce scheduler congestion in large Airflow environments.
12:54 AI-generated code still requires review and strong testing practices.
14:52 Moving to managed Airflow reduces operational burden on internal platform teams.
15:57 Improving multi-tenancy and UI personalization remains a key Airflow need.
Resources Mentioned:
Ethan Shalev
https://www.linkedin.com/in/eshalev/
Wix | LinkedIn
https://www.linkedin.com/company/wix-com/
Wix | Website
https://www.wix.com/
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
Trino
https://trino.io/
Apache Iceberg
https://iceberg.apache.org/
Cursor
https://cursor.sh/
Airflow Summit
https://airflowsummit.org/
#AI #Automation #Airflow
Using Airflow To Orchestrate Billions of Events at Addi with Carlos Daniel Puerto Niño
19 feb· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Strong data orchestration is as much about culture and visibility as it is about technology. As data platforms scale, teams need systems that reduce cognitive load while increasing reliability and observability.
In this episode, Carlos Daniel Puerto Niño, Senior Analytics Engineer and Data Analyst at Addi, joins us to share how Addi uses Airflow to support batch orchestration, manage organizational complexity and improve monitoring across its data platform.
Key Takeaways:
00:00 Introduction.
01:25 Changes in company strategy increase data platform complexity over time.
04:00 Centralized data teams help manage organizational and technical change.
06:08 Scalable architectures support growing data volumes and use cases.
09:10 Adopting orchestration tools introduces operational and maintenance challenges.
14:43 Abstraction layers lower technical barriers for onboarding new team members.
15:36 Modularity and visibility improve the reliability of data pipelines.
18:14 Integrated monitoring supports faster incident response and resolution.
22:19 Limited access to orchestration metadata constrains proactive analysis.
Resources Mentioned:
Carlos Daniel Puerto Niño
https://www.linkedin.com/in/carlospuertoni%C3%B1o/
Addi | LinkedIn
https://www.linkedin.com/company/addicol/
Addi | Website
https://www.addi.com
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
Databricks
https://www.databricks.com/
dbt
https://www.getdbt.com/
Grafana
https://grafana.com/
Slack
https://slack.com/
#AI #Automation #Airflow
Building Event-Driven Data Pipelines With Airflow 3 at Astrafy with Andrea Bombino
12 feb· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Real-time data expectations are reshaping how modern data teams think about orchestration and dependencies. As event-driven architectures become more common, teams need to rethink how pipelines react to data changes, rather than schedules.
In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, joins us to discuss how event-driven scheduling in Airflow is evolving and how Astrafy applies it to deliver faster, more responsive data pipelines.
Key Takeaways:
00:00 Introduction.
02:02 Astrafy’s role in guiding clients across the modern data stack.
03:15 Strong DAG dependencies create challenges for time-based scheduling.
04:48 Event-driven pipelines respond to increasing real-time data demands.
05:30 Airflow 3 introduces native support for event-driven orchestration.
06:27 Sensor-based workflows reveal scalability and efficiency limitations.
11:32 Event-driven assets improve efficiency and pipeline elegance.
14:45 Governance and cross-instance coordination emerge as ongoing challenges.
Resources Mentioned:
Andrea Bombino
https://www.linkedin.com/in/andrea-bombino/
Astrafy | LinkedIn
https://www.linkedin.com/company/astrafy/
Astrafy | Website
https://www.astrafy.io
Apache Airflow
https://airflow.apache.org/
Google Cloud
https://cloud.google.com/
Google Pub/Sub
https://cloud.google.com/pubsub
Google BigQuery
https://cloud.google.com/bigquery
#AI #Automation #Airflow
Uphold’s Approach to Orchestrating Modern Data Workflows with Jaime Oliveira
5 feb· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
A strong data-driven mindset underpins how fintech teams scale analytics, infrastructure and decision-making across the business.
In this episode, Jaime Oliveira, Lead Data Engineer at Uphold, joins us to discuss how Uphold structures its data organization and orchestration strategy. Jaime shares how the team uses Airflow and dbt to support analytics, reporting and data activation while evolving their approach as the stack grows.
Key Takeaways:
00:00 Introduction.
01:23 A data-driven mindset supports product development and business decisions.
02:55 Diverse ingestion pipelines enable scalable analytics.
04:18 A single orchestration platform simplifies analytics workflows.
05:17 Early experience with orchestration tools shapes engineering practices.
08:16 Analytics orchestration works best when aligned with transformation workflows.
09:25 Infrastructure choices involve tradeoffs in testing, visibility and overhead.
16:39 More collaborative workflow tools could improve accessibility and autonomy.
Resources Mentioned:
Jaime Oliveira
https://www.linkedin.com/in/jaime-oliveira-b075855a/
Uphold | LinkedIn
https://www.linkedin.com/company/upholdinc/
Uphold | Website
https://uphold.com
Apache Airflow
https://airflow.apache.org
dbt
https://www.getdbt.com
Snowflake
https://www.snowflake.com
Kubernetes
https://kubernetes.io
Astronomer Cosmos
https://astronomer.github.io/astronomer-cosmos
Cosmos e-book
https://www.astronomer.io/ebooks/orchestrating-dbt-with-airflow-using-cosmos/
#AI #Automation #Airflow
Modern Airflow Best Practices for Scalable Data Pipelines with Bhavani Ravi
29 jan· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Building reliable data pipelines at scale requires more than writing code. It depends on thoughtful design, infrastructure trade-offs and an understanding of how orchestration platforms evolve over time.
This episode examines Airflow best practices shaped by real-world implementation. Bhavani Ravi, Independent Software Consultant and Apache Airflow Champion, shares lessons on pipeline design, architectural decisions and the evolution of the Airflow ecosystem in modern data environments.
Key Takeaways:
00:00 Introduction.
01:30 Independent consulting supports effective Airflow adoption.
02:38 Early challenges shaped modern Airflow practices.
03:21 Airflow setup has become significantly simpler.
04:30 New features expanded workflow capabilities.
06:03 Frequent releases support long-term sustainability.
07:34 Community and providers strengthen the ecosystem.
10:03 Pipeline design should come before coding.
10:55 Decoupling logic requires careful trade-offs.
13:30 Plugins extend Airflow into new use cases.
Resources Mentioned:
Bhavani Ravi
https://www.linkedin.com/in/bhavanicodes/
Apache Airflow
https://airflow.apache.org/
Kubernetes
https://kubernetes.io/
Microsoft Fabric
https://learn.microsoft.com/en-us/fabric/
#AI #Automation #Airflow
Inside Conviva’s Decision To Power Its Data Platform With Airflow with Han Zhang
22 jan· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Conviva operates at a massive scale, delivering outcome-based intelligence for digital businesses through real-time and batch data processing. As new use cases emerged, the team needed a way to extend a streaming-first architecture without rebuilding core systems.
In this episode, Han Zhang joins us to explain how Conviva uses Apache Airflow as the orchestration backbone for its batch workloads, how the control plane is designed and what trade-offs shaped their platform decisions.
Key Takeaways:
00:00 Introduction.
01:17 Large-scale data platforms require low-latency processing capabilities.
02:08 Batch workloads can complement streaming pipelines for additional use cases.
03:45 An orchestration framework can act as the core coordination layer.
06:12 Batch processing enables workloads that streaming alone cannot support.
08:50 Ecosystem maturity and observability are key orchestration considerations.
10:15 Built-in run history and logs make failures easier to diagnose.
14:20 Platform users can monitor workflows without managing orchestration logic.
17:08 Identity, secrets and scheduling present ongoing optimization challenges.
19:59 Configuration history and change visibility improve operational reliability.
Resources Mentioned:
Han Zhang
https://www.linkedin.com/in/zhanghan177
Conviva | Website
http://www.conviva.com
Apache Airflow
https://airflow.apache.org/
Celery
https://docs.celeryq.dev/
Temporal
https://temporal.io/
Kubernetes
https://kubernetes.io/
LDAP
https://ldap.com/
#AI #Automation #Airflow
Why Airflow Became the Scheduling Backbone at Condé Nast Technology Lab with Arun Karthik
15 jan· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Data platforms are moving from batch-first pipelines to near real-time systems where orchestration, observability, scalability and governance all have to work together.
In this episode, Arun Karthik, Director, Data Solutions Engineering at Condé Nast Technology Lab, joins us to share how data engineering evolves from relational databases and ETL into distributed processing, modern orchestration with Apache Airflow and managed Airflow with Astronomer.
Key Takeaways:
00:00 Introduction.
02:13 Early data systems rely heavily on relational databases and batch-oriented processing models.
07:01 Scheduling requirements evolve beyond fixed time windows as dependencies increase.
10:14 Ease of use and developer experience influence adoption of orchestration frameworks.
13:22 Operating open source orchestration tools requires ongoing engineering effort.
14:45 Managed services help teams reduce infrastructure and maintenance responsibilities.
17:27 Observability improves confidence in pipeline execution and system health.
19:12 Governance considerations grow in importance as data platforms mature.
20:46 Building data systems requires balancing speed, reliability and long-term sustainability.
Resources Mentioned:
Arun Karthik
https://www.linkedin.com/in/earunkarthik/
Condé Nast Technology Lab | LinkedIn
https://www.linkedin.com/company/conde-nast-technology-lab/
Condé Nast Technology Lab | Website
https://www.condenast.com/
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
Apache Spark
https://spark.apache.org/
Apache Hadoop
https://hadoop.apache.org/
Jenkins
https://www.jenkins.io/
dbt Labs
https://www.getdbt.com/product/what-is-dbt
Amazon Web Services
https://aws.amazon.com/free/
#AI #Automation #Airflow
The Role of Airflow in Building Smarter ML Pipelines at Vivian Health with Max Calehuff
11 dec 2025· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
The integration of data orchestration and machine learning is critical to operational efficiency in healthcare tech. Vivian Health leverages Airflow to power both its ETL pipelines and ML workflows while maintaining strict compliance standards.
Max Calehuff, Lead Data Engineer at Vivian Health, joins us to discuss how his team uses Airflow for ML ops, regulatory compliance and large-scale data orchestration. He also shares insights into upgrading to Airflow 3 and the importance of balancing flexibility with security in a healthcare environment.
Key Takeaways:
00:00 Introduction.
04:21 The role of Airflow in managing ETL pipelines and ML retraining.
06:23 Using AWS SageMaker for ML training and deployment.
07:47 Why Airflow’s versatility makes it ideal for MLOps.
10:50 The importance of documentation and best practices for engineering teams.
13:44 Automating anonymization of user data for compliance.
15:30 The benefits of remote execution in Airflow 3 for regulated industries.
18:16 Quality-of-life improvements and desired features in future Airflow versions.
Resources Mentioned:
Max Calehuff
https://www.linkedin.com/in/maxwell-calehuff/
Vivian Health | LinkedIn
https://www.linkedin.com/company/vivianhealth/
Vivian Health | Website
https://www.vivian.com
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
AWS SageMaker
https://www.google.com/aclk?sa=L&ai=DChsSEwj3-fbz1tiQAxWXlKYDHXUBBVoYACICCAEQABoCdGI&ae=2&aspm=1&co=1&ase=2&gclid=Cj0KCQiA5abIBhCaARIsAM3-zFWbfj2olUvX4dqoiYNaE3q2fMf_ZifRjmbKNQCVX7D6ZMClaUXUkFkaAuwmEALw_wcB&cid=CAASQuRoMccxWhBvMq-1Uez3XOZti1ul7mTDotKvSMoDHv0q2xCsyS2FzMptO5dJf3tmfkLRu22TtD8ChTmdjvs6YetTjQ&cce=2&category=acrcp_v1_35&sig=AOD64_2xE2xolEEVbpDb56qXQluxTzs-Aw&q&nis=4&adurl&ved=2ahUKEwj7le3z1tiQAxWXcvUHHfZePbAQ0Qx6BAgUEAE
dbt Labs
https://www.getdbt.com/
Cosmos
https://github.com/astronomer/astronomer-cosmos
Split
https://www.split.io/
Snowflake
https://www.snowflake.com/en/
#AI #Automation #Airflow
Scaling Airflow to 11,000 DAGs Across Three Regions at Intercom with András Gombosi and Paul Vickers
4 dec 2025· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
The evolution of Intercom’s data infrastructure reveals how a well-built orchestration system can scale to serve global needs. With thousands of DAGs powering analytics, AI and customer operations, the team’s approach combines technical depth with organizational insight.
In this episode, András Gombosi, Senior Engineering Manager of Data Infra and Analytics Engineering, and Paul Vickers, Principal Engineer, both at Intercom, share how they built one of the largest Airflow deployments in production and enabled self-serve data platforms across teams.
Key Takeaways:
00:00 Introduction.
04:24 Community input encourages confident adoption of a common platform.
08:50 Self-serve workflows require consistent guardrails and review.
09:25 Internal infrastructure support accelerates scalable deployments.
13:26 Batch LLM processing benefits from a configuration-driven design.
15:20 Standardized development environments enable effective AI-assisted work.
19:58 Applied AI enhances internal analysis and operational enablement.
27:27 Strong test coverage and staged upgrades protect stability.
30:36 Proactive observability and on-call ownership improve outcomes.
Resources Mentioned:
András Gombosi
https://www.linkedin.com/in/andrasgombosi/
Paul Vickers
https://www.linkedin.com/in/paul-vickers-a22b76a3/
Intercom | LinkedIn
https://www.linkedin.com/company/intercom/
Intercom | Website
https://www.intercom.com
Apache Airflow
https://airflow.apache.org/
dbt Labs
https://www.getdbt.com/
Snowflake Cortex AI
https://www.snowflake.com/en/product/features/cortex/
Datadog
https://www.datadoghq.com/
#AI #Automation #Airflow
How Covestro Turns Airflow Into a Simulation Toolbox with Anja MacKenzie
20 nov 2025· The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI
Building scalable, reproducible workflows for scientific computing often requires bridging the gap between research flexibility and enterprise reliability.
In this episode, Anja MacKenzie, Expert for Cheminformatics at Covestro, explains how her team uses Airflow and Kubernetes to create a shared, self-service platform for computational chemistry.
Key Takeaways:
00:00 Introduction.
06:19 Custom scripts made sharing and reuse difficult.
09:29 Workflows are manually triggered with user traceability.
10:38 Customization supports varied compute requirements.
12:48 Persistent volumes allow tasks to share large amounts of data.
14:25 Custom operators separate logic from infrastructure.
16:43 Modified triggers connect dependent workflows.
18:36 UI plugins enable file uploads and secure access.
Resources Mentioned:
Anja MacKenzie
https://www.linkedin.com/in/anja-mackenzie/
Covestro | LinkedIn
https://www.linkedin.com/company/covestro/
Covestro | Website
https://www.covestro.com
Apache Airflow
https://airflow.apache.org/
Kubernetes
https://kubernetes.io/
Airflow KubernetesPodOperator
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
Astronomer
https://www.astronomer.io/
Airflow Academy by Marc Lamberti
https://www.udemy.com/user/lockgfg/
Airflow Documentation
https://airflow.apache.org/docs/
Airflow Plugins
https://airflow.apache.org/docs/apache-airflow/1.10.9/plugins.html
#AI #Automation #Airflow
Building Secure Financial Data Platforms at AgileEngine with Valentyn Druzhynin
13 Nov 2025 · The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
The use of Apache Airflow in financial services demands a balance between innovation and compliance. AgileEngine’s approach to orchestration shows how secure, auditable workflows can scale even within the constraints of regulatory environments.
In this episode, Valentyn Druzhynin, Senior Data Engineer at AgileEngine, discusses how his team leverages Airflow for ETF calculations, data validation and workflow reliability within tightly controlled release cycles.
Key Takeaways:
00:00 Introduction.
03:24 The orchestrator ensures secure and auditable workflows.
05:13 Validations before and after computation prevent errors.
08:24 Release freezes shape prioritization and delivery plans.
11:14 Migration plans must respect managed service constraints.
13:04 Versioning, backfills and event triggers increase reliability.
15:08 UI and integration improvements simplify operations.
18:05 New contributors should start small and seek help.
Resources Mentioned:
Valentyn Druzhynin
https://www.linkedin.com/in/valentyn-druzhynin/
AgileEngine | LinkedIn
https://www.linkedin.com/company/agileengine/
AgileEngine | Website
https://agileengine.com/
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
AWS Managed Airflow
https://aws.amazon.com/managed-workflows-for-apache-airflow/
Google Cloud Composer (Managed Airflow)
https://cloud.google.com/composer
Airflow Summit
https://airflowsummit.org/
#AI #Automation #Airflow #MachineLearning
How Redica Transformed Their Data With Airflow and Snowflake with Shankar Mahindar
6 Nov 2025 · The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
The life sciences industry relies on data accuracy, regulatory insight and quality intelligence. Building a unified system that keeps these elements aligned is no small feat.
In this episode, we welcome Shankar Mahindar, Senior Data Engineer II at Redica Systems. We discuss how the team restructures its data platform with Airflow to strengthen governance, reduce compliance risk and improve customer experience.
Key Takeaways:
00:00 Introduction.
01:53 A focused analytics platform reduces compliance risk in life sciences.
07:31 A centralized warehouse orchestrated by Airflow strengthens governance.
09:12 Managed orchestration keeps attention on analytics and outcomes.
10:32 A modern transformation stack enables scalable modeling and operations.
11:51 Event-driven pipelines improve data freshness and responsiveness.
14:13 Asset-oriented scheduling and versioning enhance reliability and change control.
16:53 Observability and SLAs build confidence in data quality and freshness.
21:04 Priorities include partitioned assets and streamlined developer tooling.
Resources Mentioned:
Shankar Mahindar
https://www.linkedin.com/in/shankar-mahindar-83a61b137/
Redica Systems | LinkedIn
https://www.linkedin.com/company/redicasystems/
Redica Systems | Website
https://redica.com
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
Snowflake
https://www.snowflake.com/
AWS
https://aws.amazon.com/
#AI #Automation #Airflow #MachineLearning
How Airflow and AI Power Investigative Journalism at the Financial Times with Zdravko Hvarlingov
30 Oct 2025 · The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
The Financial Times leverages Airflow and AI to uncover powerful stories hidden within vast, unstructured data.
In this episode, Zdravko Hvarlingov, Senior Software Engineer at the Financial Times, discusses building multi-tenant Airflow systems and AI-driven pipelines that surface stories that might otherwise be missed. Zdravko walks through entity extraction and fuzzy matching, linking the UK Register of Members’ Financial Interests with Companies House, and how this work cuts weeks of manual analysis to minutes.
Key Takeaways:
00:00 Introduction.
02:12 What computational journalism means for day-to-day newsroom work.
05:22 Why a shared orchestration platform supports consistent, scalable workflows.
08:30 Tradeoffs of one centralized platform versus many separate instances.
11:52 Using pipelines to structure messy sources for faster analysis.
14:14 Turning recurring disclosures into usable data for investigations.
16:03 Applying lightweight ML and matching to reveal entities and links.
18:46 How automation reduces manual effort and shortens time to insight.
20:41 Practical improvements that make backfilling and reliability easier.
Resources Mentioned:
Zdravko Hvarlingov
https://www.linkedin.com/in/zdravko-hvarlingov-3aa36016b/
Financial Times | LinkedIn
https://www.linkedin.com/company/financial-times/
Financial Times | Website
https://www.ft.com/
Apache Airflow
https://airflow.apache.org/
UK Register of Members’ Financial Interests
https://www.parliament.uk/mps-lords-and-offices/standards-and-financial-interests/parliamentary-commissioner-for-standards/registers-of-interests/register-of-members-financial-interests/
UK Companies House
https://www.gov.uk/government/organisations/companies-house
Doppler
https://www.doppler.com/
Kubernetes
https://kubernetes.io/
Airflow Kubernetes Executor
https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html
GitHub
https://github.com/
#AI #Automation #Airflow #MachineLearning
Inside Vinted’s Code-Generated Airflow Pipelines with Oscar Ligthart and Rodrigo Loredo
23 Oct 2025 · The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
The shift from monolithic to decentralized data workflows changes how teams build, connect and scale pipelines.
In this episode, we feature Oscar Ligthart, Lead Data Engineer, and Rodrigo Loredo, Lead Analytics Engineer, both at Vinted, as we unpack their YAML-driven abstraction that generates Airflow DAGs and standardizes cross-team orchestration.
Key Takeaways:
00:00 Introduction.
05:28 Challenges of decentralization.
06:45 YAML-based generator standardizes pipelines and dependencies.
12:28 Declarative assets and sensors align cross-DAG dependencies.
17:29 Task-level callbacks enable auto-recovery and clear ownership.
21:39 Standardized building blocks simplify upgrades and maintenance.
24:52 Platform focus frees domain work.
26:49 Container-only standardization prevents sprawl.
Resources Mentioned:
Oscar Ligthart
https://www.linkedin.com/in/oscar-ligthart/
Rodrigo Loredo
https://www.linkedin.com/in/rodrigo-loredo-410a16134/
Vinted | LinkedIn
https://www.linkedin.com/company/vinted/
Vinted | Website
https://www.vinted.com/
Apache Airflow
https://airflow.apache.org/
Kubernetes
https://kubernetes.io/
dbt
https://www.getdbt.com/
Google Cloud Vertex AI
https://cloud.google.com/vertex-ai
Airflow Datasets & Assets (concepts)
https://www.astronomer.io/docs/learn/airflow-datasets
Airflow Summit
https://airflowsummit.org/
#AI #Automation #Airflow #MachineLearning
Transforming Data Pipelines at XENA Intelligence with Naseem Shah
16 Oct 2025 · The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
The shift from simple cron jobs to orchestrated AI-powered workflows is reshaping how startups scale. For a small team, these transitions come with unique challenges and big opportunities.
In this episode, Naseem Shah, Head of Engineering at Xena Intelligence, shares how he built data pipelines from scratch, adopted Apache Airflow and transformed Amazon review analysis with LLMs.
Key Takeaways:
00:00 Introduction.
03:28 The importance of building initial products that support growth and investment.
06:16 The process of adopting new tools to improve reliability and efficiency.
09:29 Approaches to learning complex technologies through practice and fundamentals.
13:57 Trade-offs small teams face when balancing performance and costs.
18:40 Using AI-driven approaches to generate insights from large datasets.
22:38 How unstructured data can be transformed into actionable information.
25:55 Moving from manual tasks to fully automated workflows.
28:05 Orchestration as a foundation for scaling advanced use cases.
Resources Mentioned:
Naseem Shah
https://www.linkedin.com/in/naseemshah/
Xena Intelligence | LinkedIn
https://www.linkedin.com/company/xena-intelligence/
Xena Intelligence | Website
https://xenaintelligence.com/
Apache Airflow
https://airflow.apache.org/
Google Cloud Composer
https://cloud.google.com/composer
Techstars
https://www.techstars.com/
Docker
https://www.docker.com/
AWS SQS
https://aws.amazon.com/sqs/
PostgreSQL
https://www.postgresql.org/
#AI #Automation #Airflow #MachineLearning