Tuesday, June 28, 2016

Spark Job Server Job Remote Debugging

Below are the steps to configure remote debugging for Spark running on a remote server, on a local machine, or in a Docker container.

STEP 1 :
Configure a system or user environment variable on the system where the Spark master is running, with the name:
SPARK_SUBMIT_OPTS
and the value:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=9999
Port 9999 is the port on which the Spark JVM listens for the debugger (you can choose any free port). Because of suspend=y, the JVM waits at startup until a debugger attaches.
(If you are running the Spark Job Server in a Docker container, don't forget to publish the port with -p 9999:9999 when running the container.)
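For a shell-based setup, the variable from STEP 1 can be exported like this. This is a minimal sketch assuming a bash shell and a Spark process launched through spark-submit, which reads SPARK_SUBMIT_OPTS. The Docker line is left as a comment because the image name is a placeholder and whether the container's startup script honors a SPARK_SUBMIT_OPTS passed with -e depends on the image. Note that on Java 9 and later a bare port in address= binds to localhost only, so remote or Docker debugging would need address=*:9999 instead.

```shell
#!/usr/bin/env bash
# Port the JDWP agent listens on; 9999 matches this post, any free port works.
DEBUG_PORT=9999

# server=y: the JVM listens for a debugger; suspend=y: it waits for one
# before continuing startup. On Java 9+, use address=*:${DEBUG_PORT}
# to listen on all interfaces instead of localhost only.
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"
echo "SPARK_SUBMIT_OPTS=${SPARK_SUBMIT_OPTS}"

# Docker (placeholder image name): publish the debug port next to the
# job server port and pass the variable into the container.
# docker run -d -p 8090:8090 -p ${DEBUG_PORT}:${DEBUG_PORT} \
#   -e SPARK_SUBMIT_OPTS="${SPARK_SUBMIT_OPTS}" <your-jobserver-image>
```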




STEP 2 :
Restart the Docker container or the system where the Spark master is running, or reload the variable from the command line, so that the new system or user variable takes effect.
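One way to make the variable stick across new shells without a full system restart is to append the export to a shell profile and apply it to the current shell at the same time. This is a sketch: persist_spark_submit_opts is a hypothetical helper, and which profile file your login shell reads (for example ~/.profile or ~/.bashrc) is an assumption about your setup. A Spark process that is already running still has to be restarted to pick up the new value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist SPARK_SUBMIT_OPTS in a profile file (once)
# and export it in the current shell so new processes see it immediately.
persist_spark_submit_opts() {
  local profile="$1"
  local port="${2:-9999}"
  local value="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=${port}"
  local line="export SPARK_SUBMIT_OPTS=\"${value}\""

  touch "$profile"
  # Append only if this exact line is not already there, so reruns are harmless.
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi

  # Apply the same value to the current shell.
  export SPARK_SUBMIT_OPTS="$value"
}
```

Usage: persist_spark_submit_opts ~/.profile, then restart Spark from that shell.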

STEP 3 :
Start your Spark instance on the server or local system (if it is a Docker image, the Spark instance starts automatically at boot time with the variable set). If the variable was loaded successfully by the system or Docker container, you will see the following message in the Spark logs, and startup will pause there until a debugger attaches:

Listening for transport dt_socket at address: 9999



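Because the JVM pauses at the JDWP message, a small polling loop can tell you when it is time to attach the debugger. This is a sketch: wait_for_jdwp is a hypothetical helper, and the log file you pass it depends on where your installation sends the JVM's standard output (for Docker, you could run the same grep against the output of docker logs instead).

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the JDWP agent's startup line appears
# in a log file, giving up after a timeout in seconds (default 60).
wait_for_jdwp() {
  local log_file="$1"
  local timeout="${2:-60}"
  local waited=0

  while [ "$waited" -lt "$timeout" ]; do
    if grep -q 'Listening for transport dt_socket at address:' "$log_file" 2>/dev/null; then
      echo "JDWP agent is listening; attach the debugger now."
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "No JDWP message in ${log_file} after ${timeout}s" >&2
  return 1
}
```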
STEP 4 :
In Eclipse, right-click the Java class you want to debug.
Create a Remote Java Application debug configuration for it (Debug As > Debug Configurations...), using the IP address of the machine where Spark is running as the host and 9999 as the port.
Click Debug.



STEP 5 :
If the debugger attaches successfully, the Spark logs continue past the message shown in STEP 3 and Spark finishes starting up.




STEP 6 :
If you have not yet uploaded the jar file containing your job to the Spark Job Server, upload it with the following command:

curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests-$VER.jar localhost:8090/jars/test
OK⏎
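The upload can be wrapped so that a missing jar or an HTTP error is reported instead of passing silently. This is a sketch: upload_jar and JOBSERVER_URL are hypothetical names, and http://localhost:8090 is assumed from the commands in this post. --fail makes curl exit non-zero on HTTP error responses.

```shell
#!/usr/bin/env bash
# Job server base URL; defaults to the host/port used in this post.
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Hypothetical helper: upload a job jar under an application name.
upload_jar() {
  local jar_path="$1"
  local app_name="$2"

  if [ ! -f "$jar_path" ]; then
    echo "Jar not found: ${jar_path}" >&2
    return 1
  fi

  # --data-binary sends the file unchanged; --fail turns HTTP errors
  # into a non-zero exit status.
  curl --fail --silent --show-error \
    --data-binary "@${jar_path}" \
    "${JOBSERVER_URL}/jars/${app_name}"
}
```

Usage: upload_jar job-server-tests/target/scala-2.10/job-server-tests-$VER.jar test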

STEP 7 :
Create a Spark context before running the job, unless your job code already creates one or the job server is configured to create it at startup:

curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&memory-per-node=512m'
OK⏎
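When you restart the job server often while debugging, it can help to create the context only if it does not exist yet. This is a sketch: ensure_context is a hypothetical helper, and it assumes GET /contexts returns a JSON list of context names, checked here with a plain grep rather than a JSON parser.

```shell
#!/usr/bin/env bash
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Hypothetical helper: create a context unless one with that name exists.
ensure_context() {
  local name="$1"
  local cores="${2:-4}"
  local memory="${3:-512m}"

  # Assumes the listing contains the name as a quoted JSON string.
  if curl --fail --silent "${JOBSERVER_URL}/contexts" | grep -q "\"${name}\""; then
    echo "Context ${name} already exists"
    return 0
  fi

  curl --fail --silent --show-error -d "" \
    "${JOBSERVER_URL}/contexts/${name}?num-cpu-cores=${cores}&memory-per-node=${memory}"
}
```

Usage: ensure_context test-context 4 512m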

STEP 8 :
Set breakpoints in the Spark application or class you want to debug.

STEP 9 :
Start the job with the following curl command against your Spark Job Server:

curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context'
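Since a job stopped at a breakpoint can take a while to finish, starting it asynchronously and keeping its job id lets you check the result afterwards with GET /jobs/<jobId>. This is a sketch: start_job is a hypothetical helper, and it assumes the response JSON contains a "jobId" field, extracted with grep and sed rather than a JSON parser, so it may need adjusting for your job server version's response format.

```shell
#!/usr/bin/env bash
JOBSERVER_URL="${JOBSERVER_URL:-http://localhost:8090}"

# Hypothetical helper: start a job in an existing context and print its id.
start_job() {
  local app_name="$1"
  local class_path="$2"
  local context="$3"
  local input="$4"
  local response

  response=$(curl --fail --silent --show-error -d "$input" \
    "${JOBSERVER_URL}/jobs?appName=${app_name}&classPath=${class_path}&context=${context}") || return 1

  # Crude extraction: value of the first "jobId" key in the response.
  printf '%s\n' "$response" \
    | grep -o '"jobId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Usage: JOB_ID=$(start_job test spark.jobserver.WordCountExample test-context "input.string = a b c a b see"), then after stepping through your breakpoints, curl "localhost:8090/jobs/$JOB_ID".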