
PyPI Installation of Apache Airflow 3.0+ in a Python Virtual Environment

In this tutorial you will find the steps to install Apache Airflow in a Python virtual environment. You can follow the document and watch the video, which provides more details on the installation. For Airflow 2.x, please refer to the corresponding guide.

Tutorial to follow along

Steps To Install Airflow 3.0+

Update Ubuntu/Debian


sudo apt update -y && sudo apt upgrade -y
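
On a fresh system, Python's venv module and pip may not be installed yet. A minimal sketch, assuming a Debian/Ubuntu host and the standard package names:

# Install Python, the venv module, and pip (assumed package names on Debian/Ubuntu)
sudo apt install -y python3 python3-venv python3-pip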

Create an app directory for Airflow and dependencies

sudo mkdir /app
sudo chown $USER /app
cd /app

Create python environment

mkdir airflow
cd airflow
python3 -m venv airflow_env
. airflow_env/bin/activate
pip install --upgrade setuptools pip flask_appbuilder
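
Before installing Airflow itself, you can confirm that the virtual environment is active and that its Python is the one on the PATH:

# Confirm the virtual environment's Python is being used
which python
python --version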

Install Apache Airflow

pip install apache-airflow apache-airflow-providers-fab
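
As a quick sanity check, you can print the installed Airflow version before continuing:

# Verify the installation by printing the installed Airflow version
airflow version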

Setup environment variable

export AIRFLOW_HOME=$(pwd)
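
Note that the export above only lasts for the current shell session. If you want AIRFLOW_HOME set automatically in new sessions, one option (assuming /app/airflow is your Airflow directory and bash is your shell) is to append it to your profile:

# Persist AIRFLOW_HOME for future shell sessions (path assumed to be /app/airflow)
echo 'export AIRFLOW_HOME=/app/airflow' >> ~/.bashrc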

Update Airflow configuration

Create Airflow Configuration file

airflow config list --defaults > "${AIRFLOW_HOME}/airflow.cfg"

Update the following configurations

Update the executor to LocalExecutor, as the default is SequentialExecutor

executor = LocalExecutor

Also set the auth manager if you want to create users using airflow users

auth_manager = airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager

Create a JWT secret key using the following

python -c "import secrets; print(secrets.token_urlsafe(32))"

Update the JWT secret in the config file

jwt_secret = {YOUR JWT SECRET}

If you want CSRF protection enabled, generate another secret and update the config file as follows

secret_key = {YOUR JWT SECRET}
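
To find where these options live in the generated file before editing, a simple grep works (the exact section names and formatting of the defaults dump may vary between Airflow versions):

# Locate the options to edit in the generated configuration file
grep -n -E "(executor|auth_manager|jwt_secret|secret_key) =" "${AIRFLOW_HOME}/airflow.cfg"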

Adjust other settings in the configuration file if needed, such as the Airflow metadata database if you want to use a database other than SQLite.

Example for PostgreSQL: add sql_alchemy_conn = postgresql://postgres:<password>@<host>:<port>/airflow_meta?sslmode=require
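
If you go the PostgreSQL route, you also need a Python driver in the virtual environment and an empty metadata database. A minimal sketch, assuming a local PostgreSQL server; the role name, password, and database name are placeholders you should change:

# Install a PostgreSQL driver into the virtual environment (psycopg2-binary is one common choice)
pip install psycopg2-binary

# Create a dedicated role and metadata database (names and password are placeholders)
sudo -u postgres psql -c "CREATE USER airflow_user WITH PASSWORD 'change_me';"
sudo -u postgres psql -c "CREATE DATABASE airflow_meta OWNER airflow_user;"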

note

If you want to use a different database, you can also change the executor to LocalExecutor in the config file. This setting may be commented out there; uncomment it and set it to LocalExecutor, as the default SequentialExecutor does not support parallelism because of SQLite.

Setting up and testing airflow

Run the following to set up Airflow

airflow db migrate
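
Optionally, verify that Airflow can reach its metadata database before creating a user:

# Optional: confirm the metadata database connection is working
airflow db check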

airflow users create \
--username admin \
--firstname Shantanu \
--lastname Khond \
--role Admin \
--email admin@example.com
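
If the FAB auth manager is configured as described above, you can list the newly created user as a quick check:

# Verify the admin user was created (requires the FAB auth manager configured earlier)
airflow users list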

airflow api-server --port 8080

Now your Airflow API server should be running and accessible on port 8080. Let's create a service to run all components (for executors, I will write another page).

Airflow run script

Create the Airflow run script using nano run_airflow.sh and add the following code to it

#!/bin/bash
# run_airflow.sh
# Exit on any error
set -e
export AIRFLOW_HOME=/app/airflow
source /app/airflow/airflow_env/bin/activate

echo "Starting all Airflow components..."

# Start components in the background and store their PIDs
airflow scheduler &
PIDS[0]=$!

airflow dag-processor &
PIDS[1]=$!

airflow triggerer &
PIDS[2]=$!

airflow api-server --port 8080 &
PIDS[3]=$!

# Function to clean up and kill all background processes
cleanup() {
    echo "Shutting down all Airflow components..."
    for PID in "${PIDS[@]}"; do
        kill "$PID" 2>/dev/null
    done
}

# Set a trap to run the cleanup function on script exit (e.g., Ctrl+C)
trap cleanup EXIT

# Wait for any background process to exit and capture its exit code
# (the || guard keeps set -e from aborting before we can report the failure)
EXIT_CODE=0
wait -n || EXIT_CODE=$?

# Check the exit code of the process that finished
if [ $EXIT_CODE -ne 0 ]; then
    echo "A component failed with exit code $EXIT_CODE. Stopping all other components."
    # The 'trap' will handle the cleanup automatically
    exit $EXIT_CODE
else
    echo "A component finished successfully. Shutting down."
fi

echo "All Airflow components have been stopped."

Add execute permission

chmod +x run_airflow.sh
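
Before wiring the script into systemd, you may want to run it once by hand to confirm all components start; pressing Ctrl+C triggers the cleanup trap and stops them:

# Run the script manually from /app/airflow; press Ctrl+C to stop all components
cd /app/airflow
./run_airflow.sh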

Create services

First create the Airflow service using the command sudo nano /etc/systemd/system/airflow.service and add the following code in it

[Unit]
Description=Apache Airflow Daemon
After=network.target

[Service]
Environment=AIRFLOW_HOME=/app/airflow
Environment=PYTHONPATH=/app/airflow
WorkingDirectory=/app/airflow
ExecStart=/bin/bash /app/airflow/run_airflow.sh
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Finally, let's enable and start the service

sudo systemctl daemon-reload
sudo systemctl enable airflow.service
sudo systemctl start airflow.service
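
To confirm the service came up and to follow the logs of all Airflow components, you can use the standard systemd tooling:

# Check service status and follow the combined logs
sudo systemctl status airflow.service
sudo journalctl -u airflow.service -f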

With these steps, Apache Airflow should now be running as a systemd service. You can access the Airflow UI by navigating to http://localhost:8080 in your web browser.