
Run AWS Glue Job in PyCharm Community Edition – Part 2


Run AWS Glue Job in PyCharm IDE - Community Edition


Step 1: Install PySpark in PyCharm using pip:

>> pip install pyspark==2.4.3
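AWS Glue 1.0 runs on Spark 2.4.3, which is why the version is pinned. To confirm that the interpreter PyCharm uses really has that version, a small check like the one below can help. `check_pyspark` is a hypothetical helper, not part of PySpark; it only reads `pyspark.__version__`.

```python
# Hypothetical helper: compares the installed PySpark with the version AWS Glue 1.0 uses.
GLUE_1_0_SPARK = "2.4.3"


def check_pyspark(expected=GLUE_1_0_SPARK):
    """Return (ok, message) describing whether the installed PySpark matches `expected`."""
    try:
        import pyspark  # imported here so a missing install is reported, not raised
    except ImportError:
        return False, f"pyspark is not installed; run: pip install pyspark=={expected}"
    installed = pyspark.__version__
    if installed != expected:
        return False, f"pyspark {installed} is installed, but {expected} is expected"
    return True, f"pyspark {installed} matches AWS Glue 1.0"


if __name__ == "__main__":
    ok, message = check_pyspark()
    print(message)
```

Run it from the PyCharm terminal (or as a PyCharm run configuration) so it uses the same interpreter as your Glue script.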


Step 2: Get the prebuilt AWS Glue 1.0 JAR with Python dependencies:
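Before copying the JAR, it may be worth confirming that it actually contains the Glue Scala classes. The Python `awsglue` package creates its context by calling `GlueContext` on the JVM through py4j; if that class is not on the classpath, py4j returns a `JavaPackage` placeholder instead of a class, which is one common cause of the `'JavaPackage' object is not callable` error reported in the comments. The sketch below assumes the class is stored at `com/amazonaws/services/glue/GlueContext.class` inside the JAR; `jar_has_glue_context` is a hypothetical helper.

```python
import zipfile

# Assumed location of the Scala GlueContext class inside the Glue JAR.
GLUE_CONTEXT_CLASS = "com/amazonaws/services/glue/GlueContext.class"


def jar_has_glue_context(jar_path):
    """Return True if the JAR (a zip archive) at jar_path contains the GlueContext class."""
    with zipfile.ZipFile(jar_path) as jar:
        return GLUE_CONTEXT_CLASS in jar.namelist()
```

Call it with the path to your own JAR, for example `jar_has_glue_context("path/to/your-glue.jar")` (a placeholder path). A `False` result suggests the JAR was not built with the Glue classes included.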



Step 3: Copy the awsglue folder and the JAR file into your PyCharm project:

>> https://github.com/awslabs/aws-glue-libs/tree/glue-1.0/awsglue
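Copying the files is only half of it: at run time Python has to find the `awsglue` folder and the JVM has to find the JAR. One way to wire up both from code, assuming the `awsglue` folder and the JAR both sit in the project root, is to extend `sys.path` and set `PYSPARK_SUBMIT_ARGS`, which PySpark reads when it launches the JVM. The helper name and the JAR file name are placeholders, and the `--jars` plus `--driver-class-path` combination is a commonly used choice, not necessarily what the original example does.

```python
import os
import shlex
import sys
from pathlib import Path


def configure_local_glue(project_root, jar_name):
    """Make the copied awsglue package importable and put the Glue JAR on Spark's classpath.

    Must run before the first SparkContext is created, because PySpark only reads
    PYSPARK_SUBMIT_ARGS when it launches the JVM gateway.
    """
    root = Path(project_root).resolve()

    # The awsglue/ folder sits in the project root, so the root must be on sys.path.
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))

    jar_path = root / jar_name
    if not jar_path.is_file():
        raise FileNotFoundError(f"Glue JAR not found: {jar_path}")

    # --jars ships the JAR with the application; --driver-class-path makes its classes
    # visible to the driver JVM that py4j talks to. The value must end with "pyspark-shell".
    quoted = shlex.quote(str(jar_path))
    submit_args = f"--jars {quoted} --driver-class-path {quoted} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
    return submit_args
```

Call it at the very top of the script, before any `SparkContext` or `GlueContext` is created, e.g. `configure_local_glue(".", "your-glue-1.0.jar")` with your own JAR name.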


Step 4: Copy the Python code from my Git repository:

>> https://github.com/sardetushar/awsglue-pycharm-local-dev


Step 5: Project structure



Step 6: Run the script from the console (make sure to use your own path):

>> python com/mypackage/pack/glue-spark-pycharm-example.py
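When the script lives in a sub-package such as `com/mypackage/pack/`, running `python com/mypackage/pack/glue-spark-pycharm-example.py` puts that script's folder, not the project root, at the front of `sys.path`, so `import awsglue` can fail outside PyCharm. PyCharm run configurations usually avoid this through their "Add content roots to PYTHONPATH" option; from a plain console, the root has to be added to `PYTHONPATH`. The sketch below does that and runs the script with the current interpreter; `run_example` is a hypothetical helper.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_example(project_root, script_rel_path):
    """Run a script with the project root on PYTHONPATH and return its stdout.

    `python path/to/script.py` puts the script's own folder first on sys.path,
    so top-level packages in the project root (e.g. awsglue/) need PYTHONPATH.
    """
    root = Path(project_root).resolve()
    env = dict(os.environ)
    # Prepend the project root to any PYTHONPATH that is already set.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = str(root) if not existing else str(root) + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, str(root / script_rel_path)],
        cwd=str(root),
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text instead of bytes
        check=True,  # raise CalledProcessError if the script fails
    )
    return result.stdout
```

For example, `print(run_example(".", "com/mypackage/pack/glue-spark-pycharm-example.py"))`. In a Unix shell, `PYTHONPATH=. python com/mypackage/pack/glue-spark-pycharm-example.py` has the same effect.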



Step 7: If you run into any issues, leave a comment here :)

In Part 3, we’ll look at a more advanced example using AWS Glue 1.0 and a Snowflake database.


Comments

  1. I followed your instructions and copied the JAR into my PyCharm project, but I'm getting the error below:
    File "<input>", line 1, in <module>
    File "/Users/sb/work/gitrepo/aws-glue-libs/awsglue/context.py", line 45, in __init__
    self._glue_scala_context = self._get_glue_scala_context(**options)
    File "/Users/sb/work/gitrepo/aws-glue-libs/awsglue/context.py", line 66, in _get_glue_scala_context
    return self._jvm.GlueContext(self._jsc.sc())
    TypeError: 'JavaPackage' object is not callable

    I googled it but couldn't find a solution for the above problem. Can you please help?

  2. I was able to resolve the above issue, but now I'm facing a different error:
    inputGDF = glueContext.create_dynamic_frame_from_options(connection_type="s3", connection_options = {"paths": ["s3://sb/patt-part/year=2020/month=02/day=13"]}, format = "json")

    Error:
    py4j.protocol.Py4JJavaError: An error occurred while calling o55.getDynamicFrame.
    : java.lang.IllegalAccessError: class org.apache.hadoop.fs.s3a.S3AInstrumentation tried to access method 'void org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(org.apache.hadoop.metrics2.MetricsInfo, long)' (org.apache.hadoop.fs.s3a.S3AInstrumentation is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @2516fc68; org.apache.hadoop.metrics2.lib.MutableCounterLong is in unnamed module of loader 'app')

  3. How did you solve it?

  4. Thanks for the tutorial! Super helpful!
    How did you build the "AWS Glue-1.0 Jar with Python dependencies" from the sources?

  5. Just download the official repo:

    https://github.com/awslabs/aws-glue-libs

    and build it using the usual Maven commands, e.g. mvn clean and mvn install.

