PyPI Installation of Apache Airflow 2.x in a Python Virtual Environment
In this tutorial you will find the steps to install Apache Airflow in a Python virtual environment. You can follow this document and watch the video, which provides more details on the installation.
Tutorial to follow along
Steps To Install Airflow
Update Ubuntu/Debian
sudo apt update -y && sudo apt upgrade -y
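Depending on your base install, the venv and pip tooling may not be present yet. On Ubuntu/Debian you can add it with apt (package names assumed for a stock Ubuntu install):
sudo apt install -y python3 python3-pip python3-venv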
Create app directory for Airflow and dependencies
sudo mkdir /app
sudo chown $USER /app
cd /app
Create Python virtual environment
mkdir airflow
cd airflow
python3 -m venv airflow_env
. airflow_env/bin/activate
pip install --upgrade setuptools pip
Install Apache Airflow
pip install apache-airflow apache-airflow-providers-google
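Note: the Airflow project recommends installing against a constraints file so dependency versions match a tested set. As a sketch, assuming Airflow 2.9.3 on Python 3.10 (adjust both versions to your setup):
# Version numbers below are assumptions; substitute your Airflow and Python versions
pip install "apache-airflow==2.9.3" apache-airflow-providers-google --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.3/constraints-3.10.txt"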
Set up environment variable
export AIRFLOW_HOME=/app/airflow
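This export only lasts for the current shell session. To have it set automatically in future sessions, you can append it to your shell profile (assuming bash):
echo 'export AIRFLOW_HOME=/app/airflow' >> ~/.bashrc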
Create the Airflow configuration file
airflow config list --defaults > "${AIRFLOW_HOME}/airflow.cfg"
Adjust the Airflow configuration file if needed, for example to point the Airflow metadata DB at a database other than SQLite.
Example for PostgreSQL (in Airflow 2.3+ this option lives in the [database] section of airflow.cfg; replace the password and host placeholders with your own):
sql_alchemy_conn = postgresql://postgres:<password>@<host>:54321/airflow_meta?sslmode=require
If you use a database other than SQLite, you can also change the executor to LocalExecutor
in the config file. The executor line is commented out by default; uncomment it and set LocalExecutor,
since the default SequentialExecutor does not support parallelism because of SQLite. See the sketch below.
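Put together, the edits in airflow.cfg would look roughly like this (a minimal sketch; section names follow the Airflow 2.3+ layout, and the connection string placeholders are the same as above):
[database]
# Metadata DB connection; replace <password>, <host>, and the port with your own
sql_alchemy_conn = postgresql://postgres:<password>@<host>:54321/airflow_meta?sslmode=require

[core]
# Replaces the default SequentialExecutor
executor = LocalExecutor
If the airflow_meta database does not exist yet, you can create it on the PostgreSQL server first (assuming local access as the postgres user):
sudo -u postgres createdb airflow_meta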
Run the following to set up Airflow
airflow db migrate
airflow users create \
--username admin \
--firstname Shantanu \
--lastname Khond \
--role Admin \
--email admin@example.com
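Before starting the webserver, you can verify that the metadata DB is reachable and the admin user was created:
airflow db check
airflow users list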
airflow webserver --port 8080
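The webserver runs in the foreground, so from a second terminal you can hit its health endpoint (default port assumed):
curl http://localhost:8080/health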
Now your Airflow webserver should be running and accessible on port 8080. Let's create services to run both the webserver and the scheduler. (For executors, I will write another page.)
Let's create the Airflow webserver and scheduler run scripts
Create the Airflow webserver run script using nano run_airflow.sh
Add the following code to it
#!/bin/bash
# Activate the virtual environment, set AIRFLOW_HOME, and start the webserver
. /app/airflow/airflow_env/bin/activate
export AIRFLOW_HOME=/app/airflow
airflow webserver
Create the Airflow scheduler run script using nano run_scheduler.sh
Add the following code to it
#!/bin/bash
# Activate the virtual environment, set AIRFLOW_HOME, and start the scheduler
. /app/airflow/airflow_env/bin/activate
export AIRFLOW_HOME=/app/airflow
airflow scheduler
Add execute permission
chmod +x run_airflow.sh
chmod +x run_scheduler.sh
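Before handing these over to systemd, it is worth running each script once by hand to confirm the virtual environment activates and Airflow starts (stop each with Ctrl+C):
./run_airflow.sh
./run_scheduler.sh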
Finally, create the systemd services
First, create the Airflow webserver service using the following command
sudo nano /etc/systemd/system/airflow-webserver.service
And add the following code in it
[Unit]
Description = Apache Airflow Webserver Daemon
After = network.target
[Service]
PIDFile = /app/airflow/airflow-webserver.pid
Environment=AIRFLOW_HOME=/app/airflow
Environment=PYTHONPATH=/app/airflow
WorkingDirectory = /app/airflow
ExecStart = sh /app/airflow/run_airflow.sh
ExecStop = /bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Next, create the Airflow scheduler service using the following command
sudo nano /etc/systemd/system/airflow-scheduler.service
Add the following code in it
[Unit]
Description = Apache Airflow scheduler Daemon
After = network.target
[Service]
PIDFile = /app/airflow/airflow-scheduler.pid
Environment=AIRFLOW_HOME=/app/airflow
Environment=PYTHONPATH=/app/airflow
WorkingDirectory = /app/airflow
ExecStart = sh /app/airflow/run_scheduler.sh
ExecStop = /bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Finally, let's enable and start the services
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl enable airflow-scheduler.service
sudo systemctl start airflow-webserver.service
sudo systemctl start airflow-scheduler.service
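You can confirm both services came up, and tail their logs if anything looks wrong:
sudo systemctl status airflow-webserver.service airflow-scheduler.service
sudo journalctl -u airflow-webserver.service -f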
With these steps, your Apache Airflow webserver and scheduler should now be running as system services. You can access the Airflow UI by navigating to http://localhost:8080 in your web browser.