Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
494 views
in Technique[技术] by (71.8m points)

python - Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1, worked perfectly on local

I have googled this error on each and every forum but no luck. I have got the error written below:

18/08/29 00:24:53 INFO mapreduce.Job:  map 0% reduce 0%
18/08/29 00:24:59 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)


18/08/29 00:25:45 INFO mapreduce.Job: Task Id : attempt_1535105716146_0226_r_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
        at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

18/08/29 00:25:52 INFO mapreduce.Job:  map 100% reduce 100%
18/08/29 00:25:53 INFO mapreduce.Job: Job job_1535105716146_0226 failed with state FAILED due to: Task failed task_1535105716146_0226_r_000000
Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0


18/08/29 00:25:53 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

I have also tried my map-reduce code with the help of python standalone command

cat student1.txt | python mapper.py | python reducer.py

The code works perfectly fine. But when I tried it through Hadoop Streaming then it repeatedly throws the above error. My input file size is 3KB. I have tried the running Hadoop-streaming command also after changing the python version but no luck! I have also added #!/usr/bin/python command on the top of script. The directory has nothing inside. I also tried different versions of command:

version 1:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper mapper.py -file /home/reducer.py -reducer reducer.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

version 2: python commands with single quotes as well as double quotes

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=1 -file /home/mapper.py -mapper "python mapper.py" -file /home/reducer.py -reducer "python reducer.py" -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

Simple word-count program is running successfully on the environment, generates correct output also but when I added mysql.connector service in the python script then Hadoop-streaming reports this error. I have also studied the job logs but no such information found.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I checked the job error logs and placed the required python files which are not predefined libraries into the python directory. Then, enter the Hadoop streaming command with those python files:

hadoop jar /usr/hdp/3.0.0.0-1634/hadoop-mapreduce/hadoop-streaming-3.1.0.3.0.0.0-1634.jar -Dmapred.reduce.tasks=0 -file /home/mapper3.py -mapper mapper3.py -file /home/reducer3.py -reducer reducer3.py -file /home/ErrorHandle.py -file /home/ExceptionUtil.py -input /data/studentMapReduce/student1.txt -output outputMapReduceFile.txt

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...