Core Concepts
Sensors
As the name suggests, sensors wait for something to occur, such as an external event. A sensor can run in one of two modes.
Modes
poke
- Occupies a whole worker slot for its entire runtime, sleeping between checks until the condition is met or the timeout is reached
- Good when you need to check frequently, such as every 60 seconds
- In that scenario, giving up the worker slot and taking it back every 60 seconds is not worth the overhead
reschedule
- Occupies a worker slot only while it is actually checking
- Good when the wait between checks is longer, such as every 15 minutes or every hour
- It releases the worker slot after each check and takes one again only when the next check is due
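To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how long each mode holds a worker slot. It is not Airflow code: the function `slot_seconds_held` and its `check_seconds` parameter are made up for illustration, and it ignores the scheduling overhead that reschedule mode pays on every check (which is why poke is still the better choice for short intervals).

```python
def slot_seconds_held(mode: str, checks: int, poke_interval: float,
                      check_seconds: float = 1.0) -> float:
    """Approximate worker-slot time for a sensor that succeeds on its last check.

    Hypothetical helper for illustration only, not part of Airflow.
    """
    if checks < 1:
        raise ValueError("checks must be at least 1")
    busy = checks * check_seconds  # time spent actually running the check
    if mode == "poke":
        # The slot stays occupied while the sensor sleeps between checks.
        return busy + (checks - 1) * poke_interval
    if mode == "reschedule":
        # The slot is released between checks.
        return busy
    raise ValueError(f"unknown mode: {mode}")


# A sensor that checks every 15 minutes and succeeds on the 5th check.
poke_time = slot_seconds_held("poke", checks=5, poke_interval=900)
reschedule_time = slot_seconds_held("reschedule", checks=5, poke_interval=900)
print(poke_time, reschedule_time)
```

Under these assumptions the poke sensor ties up a slot for about an hour, while the reschedule sensor only needs it for a few seconds in total.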
Common parameters
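- poke_interval: time between checks, in seconds. In poke mode the sensor sleeps for this long and checks again; in reschedule mode the next check is scheduled after this interval
- timeout: maximum time, in seconds, the sensor is allowed to keep checking before failing
- mode: poke or reschedule
- soft_fail: if set to True, the sensor will not show FAILED after the timeout; it will have the status SKIPPED instead
- exponential_backoff: the wait between checks increases exponentially after each check, up to max_wait
- max_wait: upper limit on the wait between checks when exponential_backoff is used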
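The interaction between exponential_backoff and max_wait can be sketched as follows. This is a simplified model that assumes the wait simply doubles after each check; Airflow's own calculation differs in detail, so treat the numbers as an illustration of the idea only. The helper name `backoff_intervals` is hypothetical.

```python
def backoff_intervals(poke_interval: float, max_wait: float, checks: int) -> list:
    """Simplified backoff: double the wait after each check, capped at max_wait.

    Hypothetical helper; not Airflow's actual formula.
    """
    return [min(poke_interval * 2 ** n, max_wait) for n in range(checks)]


# Start at 60 seconds and never wait longer than 10 minutes between checks.
waits = backoff_intervals(poke_interval=60, max_wait=600, checks=6)
print(waits)
```

Once the doubled wait would exceed max_wait, every later wait stays at max_wait.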
Example:
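# Import path for Airflow 2.x; newer releases may expose BashSensor from a provider package instead.
from airflow.sensors.bash import BashSensor

# Succeeds once the file exists; checks every 60 seconds for up to one hour.
BashSensor(
    task_id="wait_for_file",
    bash_command="test -f /data/input.csv",
    poke_interval=60,
    timeout=60 * 60,
    mode="reschedule",
)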
Operators
Operators are like functions or templates: you plug in your configuration or data and the operator does the work. Airflow comes with a lot of ready-made operators. Some popular operators are:
- BashOperator
- PythonOperator
- HttpOperator
Executor
The executor is what actually runs your tasks. Airflow has multiple predefined executors, such as:
- Local Executor
- Celery Executor
- Kubernetes Executor
- ECS Executor
Airflow also supports running multiple executors at once. You can also create and run a custom executor; to do so, you need to subclass BaseExecutor.
XComs
XComs (short for "cross-communications") are a mechanism that allows tasks to talk to each other.
XComs are like a relative of Variables. The main differences are:
- XComs are per task instance and designed for communication within a DAG run
- Variables are global and designed for overall configuration and sharing values across Airflow
Use XComs when one task needs to pass a small result to another task in the same run. Use Variables when you need a config value that many DAGs or tasks can read.
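The scoping difference can be illustrated with a toy model in plain Python. This is not how Airflow stores anything internally; `ToyStore` and its methods are hypothetical names, and the only detail borrowed from Airflow is the default XCom key `return_value`. The point is just that an XCom is looked up by DAG, run, and task, while a Variable is looked up by name alone.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy stand-in for the metadata store (illustration only)."""
    variables: dict = field(default_factory=dict)  # name -> value, visible everywhere
    xcoms: dict = field(default_factory=dict)      # (dag_id, run_id, task_id, key) -> value

    def xcom_push(self, dag_id, run_id, task_id, value, key="return_value"):
        self.xcoms[(dag_id, run_id, task_id, key)] = value

    def xcom_pull(self, dag_id, run_id, task_id, key="return_value"):
        # Returns None when nothing was pushed for this exact run and task.
        return self.xcoms.get((dag_id, run_id, task_id, key))


store = ToyStore()

# A Variable is global: any DAG or task can read it by name.
store.variables["report_bucket"] = "example-bucket"

# An XCom belongs to one task in one DAG run.
store.xcom_push("daily_report", "run_1", "count_rows", value=42)

same_run = store.xcom_pull("daily_report", "run_1", "count_rows")
other_run = store.xcom_pull("daily_report", "run_2", "count_rows")
print(same_run, other_run, store.variables["report_bucket"])
```

A downstream task in run_1 sees the value pushed by count_rows, a task in run_2 does not, and both can read the Variable.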