
How to test AWS Glue jobs locally / run AWS Glue jobs locally – Part 1

AWS Glue – Local Testing using Apache Spark 2.4.3

Recently, AWS released its Glue libraries on GitHub in the aws-glue-libs repository.

You can check out either Glue 0.9 or Glue 1.0 from the corresponding branch.

Glue 1.0:
git clone -b glue-1.0 https://github.com/awslabs/aws-glue-libs.git

Glue 0.9 (the repository's default branch):
git clone https://github.com/awslabs/aws-glue-libs.git


Prerequisites:

Maven 3.6.0 or higher

Apache Spark 2.2.x or higher

Step 1: Install Maven and Apache Spark, then set the environment variables below to match your own installation paths

>> vi ~/.bashrc

export M2_HOME=/home/tusharsarde/soft/maven3
export PATH=${M2_HOME}/bin:${PATH}
export PYSPARK_PYTHON=python3
export SPARK_HOME=/tmp/tush/aws-glue-pfa/spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8/
export PATH=${SPARK_HOME}/bin:${PATH}

>> source ~/.bashrc

Step 2: Clone the git repo and run the setup script; this step creates the jarsv1/ folder and downloads the required jars.

>> cd aws-glue-libs

>> chmod +x bin/*

>> ./bin/glue-setup.sh

Note – Very important step: there are netty-all-* jar incompatibility issues.
Spark ships with netty-all 4.1.x, whereas AWS Glue downloads netty-all-4.0.23.Final.jar, so we need to remove netty-all-4.0.23.Final.jar from jarsv1/:

>> rm jarsv1/netty-all-4.0.23.Final.jar

Step 3: Test that the Glue PySpark shell starts correctly

>> ./bin/gluepyspark
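If the shell comes up, you can paste a quick sanity check like the one below at the prompt (this helper is my illustration, not part of aws-glue-libs; `sc` is the SparkContext the shell creates for you):

```python
def glue_sanity_check(sc):
    """Paste into the gluepyspark shell and call with the shell's `sc`."""
    from awsglue.context import GlueContext  # fails fast if the Glue jars are missing

    glue_context = GlueContext(sc)
    # Run a trivial Spark action through the Glue-wrapped session.
    return glue_context.spark_session.range(10).count()

# At the gluepyspark prompt: glue_sanity_check(sc) should return 10
```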


Step 4: Submit your AWS Glue script using spark-submit (replace your_glue_script.py with the path to your own job script)

 >> ./bin/gluesparksubmit --master local your_glue_script.py
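If you don't have a job script handy, a minimal one might look like the sketch below (the file name `your_glue_script.py` and the `add_greeting` helper are my illustrations, not from the original post). The Glue-specific wiring lives inside `run_job()` so the pure helper can be unit-tested even without Spark installed:

```python
# your_glue_script.py -- hypothetical minimal job for gluesparksubmit

def add_greeting(names):
    """Pure Python transform: testable without Spark or Glue installed."""
    return ["Hello, " + name for name in names]

def run_job():
    """Glue-specific wiring; only runs under gluesparksubmit/gluepyspark."""
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    spark = glue_context.spark_session

    # Build a tiny local DataFrame instead of reading from the Glue catalog.
    rows = [(g,) for g in add_greeting(["Glue", "Spark"])]
    spark.createDataFrame(rows, ["greeting"]).show()

# When actually submitting the job, end the file with a call to run_job():
#     run_job()
```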

Step 5: Comment here if you face any issues :)

In Part 2 we’ll see how to run Glue-1.0 using PyCharm Community Edition.

