Even though `sys.argv` is a good solution, I still prefer this more structured way of handling command-line arguments in my PySpark jobs:
```python
import argparse

parser = argparse.ArgumentParser()
# type=int converts the raw string (e.g. "3") into an integer
parser.add_argument("--ngrams", type=int, help="some useful description.")
args = parser.parse_args()

if args.ngrams:
    ngrams = args.ngrams
```
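Compared with reading `sys.argv` by hand, `argparse` gives you type conversion, defaults and a generated `--help` message. It also makes the parsing easy to check without `spark-submit`, because `parse_args()` accepts an explicit list. The sketch below shows this. `build_parser` and the default of `2` are hypothetical choices for illustration only.

```python
import argparse


def build_parser():
    # Hypothetical helper: the same --ngrams option, with a type and an assumed default
    parser = argparse.ArgumentParser(description="PySpark n-gram job")
    parser.add_argument("--ngrams", type=int, default=2,
                        help="n-gram size (assumed default: 2)")
    return parser


# Passing an explicit list instead of relying on sys.argv
args = build_parser().parse_args(["--ngrams", "3"])
print(args.ngrams)  # → 3 (an int, not the string "3")

# When the flag is omitted, the default is used
defaults = build_parser().parse_args([])
print(defaults.ngrams)  # → 2
```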
This way, you can launch your job as follows:
```
spark-submit job.py --ngrams 3
```
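This works because `spark-submit` forwards everything after the application file to the script as its own arguments. Options for `spark-submit` itself, such as `--master`, must come before `job.py`. Inside the driver those arguments appear in `sys.argv[1:]`, which is what `parse_args()` reads when it is called without a list. The sketch below simulates this by assigning `sys.argv` directly. It is an illustration rather than a real `spark-submit` run, and the exact value of `sys.argv[0]` may differ in practice.

```python
import argparse
import sys

# Simulated driver-side argv for:  spark-submit job.py --ngrams 3
sys.argv = ["job.py", "--ngrams", "3"]

parser = argparse.ArgumentParser()
parser.add_argument("--ngrams", type=int)
args = parser.parse_args()  # no list given, so argparse reads sys.argv[1:]

print(sys.argv[1:])  # → ['--ngrams', '3']
print(args.ngrams)   # → 3
```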
More information about the `argparse` module can be found in the Argparse Tutorial.