
How to install and run PySpark 3 locally on Windows 10 / Mac / Linux / Ubuntu

 PySpark 3 - Windows 10 / Mac / Ubuntu


1. Install Jupyter and PySpark

pip install jupyter

pip install pyspark
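
PySpark launches a JVM under the hood, so a Java runtime has to be available in addition to the two pip packages. The following is a minimal sketch for checking the prerequisites from Python; the helper name `check_prerequisites` is hypothetical and not part of PySpark or Jupyter, and it only looks at the PATH and the JAVA_HOME variable.

```python
import importlib.util
import os
import shutil


def check_prerequisites():
    """Report whether pyspark, the jupyter command and Java can be found.

    Hypothetical helper for illustration; not part of PySpark or Jupyter.
    """
    return {
        # True if the pyspark package is importable in this environment
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # True if the `jupyter` command is on the PATH
        "jupyter": shutil.which("jupyter") is not None,
        # True if a `java` executable is on the PATH or JAVA_HOME is set
        "java": shutil.which("java") is not None
        or bool(os.environ.get("JAVA_HOME")),
    }


for name, found in check_prerequisites().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as MISSING, fix that before starting the notebook; a missing Java runtime is a common reason for the session failing to start.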

2. Start the Jupyter server (jupyter notebook) and run the sample Pi example code in a notebook cell

# ref - https://github.com/apache/spark/blob/master/examples/src/main/python/pi.py

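from random import random
from operator import add

from pyspark.sql import SparkSession


if __name__ == "__main__":

    # Create a local Spark session, or reuse one that is already running
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = 100
    n = 100000 * partitions  # total number of random points to sample

    def f(_):
        # Pick a random point in the square [-1, 1] x [-1, 1]
        x = random() * 2 - 1
        y = random() * 2 - 1
        # Return 1 if the point falls inside the unit circle, else 0
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    # Count the points inside the circle in parallel across the partitions
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))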

    spark.stop()
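
The estimate works because the points are drawn uniformly from a square of area 4 that encloses a unit circle of area pi, so the fraction of points landing inside the circle approaches pi / 4. As a sanity check without Spark, the same computation can be sketched in plain Python. The function name `estimate_pi` and its `seed` parameter are illustrative assumptions, not part of the Spark example.

```python
from random import Random


def estimate_pi(n, seed=None):
    """Estimate pi by sampling n points in the square [-1, 1] x [-1, 1]."""
    rng = Random(seed)  # private generator so a seed gives repeatable runs
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        # Same test as f() in the Spark version: is the point in the unit circle?
        if x ** 2 + y ** 2 <= 1:
            inside += 1
    return 4.0 * inside / n


print("Pi is roughly %f" % estimate_pi(200000, seed=42))
```

The Spark version does the same counting, but splits the range across partitions and sums the per-point results with reduce(add).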

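3. Check your Spark UI from Jupyter (http://localhost:4040 by default) while the session is still running, i.e. before spark.stop() is called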
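
If port 4040 is already in use, Spark picks another port, so it can be easier to ask the running session for its UI address than to guess it. A minimal sketch, assuming an active SparkSession named `spark` in the notebook; the wrapper name `spark_ui_url` is hypothetical, while `uiWebUrl` is a property of PySpark's SparkContext.

```python
def spark_ui_url(spark):
    """Return the web UI address of the SparkContext behind a SparkSession.

    Hypothetical convenience wrapper; call it while the session is active.
    """
    return spark.sparkContext.uiWebUrl


# In a notebook cell, with an active session named `spark`:
# print(spark_ui_url(spark))
```

Open the printed address in a browser to see the jobs, stages and executors for the Pi example.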




