PyCon Pune 2018


Airflow: To Manage Data Pipelines

Submitted by Mahendra Yadav (@userimack) on Thursday, 17 August 2017

Technical level: Beginner

Abstract

A significant part of an IT/Data Engineering team's time is spent writing and scheduling jobs, and monitoring and troubleshooting issues. Enterprise data originates from various sources, and various business rules and processes govern how that data can be consumed.

Airflow is a platform to programmatically author, schedule and monitor workflows. (https://airflow.incubator.apache.org/)

The tasks in a workflow are configured as a Directed Acyclic Graph (DAG), as in the sketch below. This talk covers how Airflow is used to establish better workflows for data engineering.
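For illustration, a minimal Airflow DAG might look like the following sketch. The DAG name, task names, commands and schedule are assumptions made for this example, not taken from the talk.

    # A minimal sketch of an Airflow DAG (names and schedule are illustrative).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        "owner": "data-engineering",
        "start_date": datetime(2017, 8, 1),
        "retries": 1,
    }

    # The DAG object ties the tasks together and defines the schedule.
    dag = DAG(
        dag_id="example_pipeline",
        default_args=default_args,
        schedule_interval="@daily",
    )

    # Two tasks: extract data, then transform it.
    extract = BashOperator(
        task_id="extract",
        bash_command="echo 'extracting data'",
        dag=dag,
    )

    transform = BashOperator(
        task_id="transform",
        bash_command="echo 'transforming data'",
        dag=dag,
    )

    # Dependencies are expressed as edges of the Directed Acyclic Graph.
    extract >> transform

Airflow then schedules and monitors each task run, retrying failures and showing the DAG's state in its web UI.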

P.S.: This talk is inspired by Bargava Subramanian's (@barsubra) proposal.

Outline

  1. Introduction to Airflow
  2. Existing challenges in data engineering - creating/monitoring/troubleshooting workflows
  3. Main advantages of Airflow
  4. Basic Concepts
  5. Airflow in practice - case study

Speaker bio

Mahendra Yadav is a Data Engineer at Azri Solutions, Hyderabad. In his day-to-day work, he processes large volumes of data from a variety of sources.

Links

Slides

https://userimack.github.io/airflow_slides/airflow_slides/#/

Comments

  • Kushal Das (@kushaldas) Reviewer 2 years ago

    Thank you for submitting the talk to PyCon Pune. The talk selection team will contact you here in case of any queries. Meanwhile, please make sure that you provide a link to the presentation slides.

    • Mahendra Yadav (@userimack) Proposer 2 years ago

      Thanks, added the slides link.
