
Posts

Showing posts from November, 2019

How to test AWS Glue jobs locally / run AWS Glue jobs locally – Part 1

AWS Glue – Local Testing using Apache Spark 2.4.3

Recently, AWS released its Glue libraries on GitHub.
AWS Glue GitHub – https://github.com/awslabs/aws-glue-libs

You can download either Glue 0.9 or Glue 1.0 by cloning the matching GitHub branch:
glue-1.0: git clone -b glue-1.0 https://github.com/awslabs/aws-glue-libs.git
glue-0.9: git clone https://github.com/awslabs/aws-glue-libs.git

Prerequisites:
Maven 3.6.0 or higher – https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz
Spark 2.2.x or higher:
  for Glue 1.0 – https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz
  for Glue 0.9 – https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-0.9/spark-2.2.1-bin-hadoop2.7.tgz

Step 1: Install and configure Maven and Apache Spark, adjusting the configuration to match your installation.
>> wget https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz
>> vi ~/.bashrc
export M2
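The version choices above can be captured in a small helper that checks the stated minimums and prints the clone and download commands for one Glue release. This is a minimal Python sketch, not part of aws-glue-libs: `GLUE_RELEASES`, `parse_version`, `at_least` and `setup_plan` are hypothetical names, and it assumes you pass in the Maven and Spark versions yourself (for example, as reported by `mvn -version` and `spark-submit --version`). It only enforces the minimums listed in the prerequisites, so it does not stop you from pairing Glue 1.0 with Spark 2.2.1.

```python
"""Sketch: pick the Glue branch and Spark build, and check the stated minimum versions."""
from __future__ import annotations

import re

# URLs and commands copied from the post; the dict layout is an illustrative assumption.
GLUE_RELEASES = {
    "1.0": {
        "clone": "git clone -b glue-1.0 https://github.com/awslabs/aws-glue-libs.git",
        "spark_url": "https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-1.0/spark-2.4.3-bin-hadoop2.8.tgz",
    },
    "0.9": {
        "clone": "git clone https://github.com/awslabs/aws-glue-libs.git",
        "spark_url": "https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-0.9/spark-2.2.1-bin-hadoop2.7.tgz",
    },
}

MIN_MAVEN = "3.6.0"  # "Maven 3.6.0 or higher"
MIN_SPARK = "2.2.0"  # "Spark 2.2.x or higher"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.4.3' (or '2.4.3-bin') into (2, 4, 3); raise on non-numeric input."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(found: str, minimum: str) -> bool:
    """True if `found` >= `minimum`; missing components count as 0 ('2.2' == '2.2.0')."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def setup_plan(glue_version: str, maven_version: str, spark_version: str) -> list[str]:
    """Return the clone and download commands, or raise ValueError if a prerequisite is too old."""
    if glue_version not in GLUE_RELEASES:
        raise ValueError(f"unsupported Glue version: {glue_version}")
    if not at_least(maven_version, MIN_MAVEN):
        raise ValueError(f"Maven {maven_version} is older than {MIN_MAVEN}")
    if not at_least(spark_version, MIN_SPARK):
        raise ValueError(f"Spark {spark_version} is older than {MIN_SPARK}")
    release = GLUE_RELEASES[glue_version]
    return [release["clone"], f"wget {release['spark_url']}"]


if __name__ == "__main__":
    for command in setup_plan("1.0", maven_version="3.6.0", spark_version="2.4.3"):
        print(command)
```

Keeping the release table as data means adding another Glue branch is a one-entry change rather than a new code path.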

Run AWS Glue Job in PyCharm Community Edition – Part 2

Run AWS Glue Job in PyCharm IDE - Community Edition

Step 1: In PyCharm, install PySpark using:
>> pip install pyspark==2.4.3

Step 2: Download the prebuilt AWS Glue 1.0 jar with Python dependencies:
>> Download_Prebuild_Glue_Jar

Step 3: Copy the awsglue folder and the jar file into your PyCharm project:
>> https://github.com/awslabs/aws-glue-libs/tree/glue-1.0/awsglue

Step 4: Copy the Python code from my Git repository:
>> https://github.com/sardetushar/awsglue-pycharm-local-dev

Step 5: Project Structure

Step 6: In the console, run the script (make sure to use your own path):
>> python com/mypackage/pack/glue-spark-pycharm-example.py

Step 7: If you run into any issues, leave a comment here :)

In Part 3, we’ll look at a more advanced example using AWS Glue 1.0 and the Snowflake database.
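Before running the example script, it can help to confirm that the earlier steps left the project in a usable state. This is a minimal, standard-library-only Python sketch under stated assumptions: `installed_pyspark` and `missing_glue_pieces` are hypothetical names, and it assumes the awsglue folder and the prebuilt jar sit directly in the project root, which may not match your own project structure. It checks only that the pieces exist, not that the jar is the right build.

```python
"""Sketch: pre-flight check for running a Glue 1.0 job locally in PyCharm."""
from __future__ import annotations

from importlib import metadata
from pathlib import Path

EXPECTED_PYSPARK = "2.4.3"  # version pinned in Step 1


def installed_pyspark() -> str | None:
    """Return the installed PySpark version, or None if PySpark is not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def missing_glue_pieces(project_root: str | Path) -> list[str]:
    """List layout problems; an empty list means the project looks ready.

    Hypothetical layout (adjust to your own project):
      <root>/awsglue/__init__.py   copied from the glue-1.0 branch (Step 3)
      <root>/*.jar                 the prebuilt Glue 1.0 jar (Step 2)
    """
    root = Path(project_root)
    problems: list[str] = []
    if not (root / "awsglue" / "__init__.py").is_file():
        problems.append("awsglue package not found in project root")
    if not any(root.glob("*.jar")):
        problems.append("no prebuilt Glue .jar file in project root")
    return problems


if __name__ == "__main__":
    version = installed_pyspark()
    if version != EXPECTED_PYSPARK:
        print(f"warning: expected pyspark {EXPECTED_PYSPARK}, found {version}")
    for problem in missing_glue_pieces(Path.cwd()):
        print("missing:", problem)
```

Run it from the project root in the PyCharm console; no output means the PySpark version and layout checks passed.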