
apache spark - Unable to understand error "SparkListenerBus has already stopped! Dropping event ..."

The issue

I'd like to know if anyone has a magic method to avoid such messages in Spark logs:

2015-08-30 19:30:44 ERROR LiveListenerBus:75 - SparkListenerBus has already
stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,WrappedArray())

After further investigation, I understand that LiveListenerBus extends AsynchronousListenerBus, and that at some point its .stop() method is called. Once it has stopped, any messages still being sent or received are dropped and remain unprocessed. Basically, some SparkListenerExecutorMetricsUpdate messages have not been received yet, and when they finally arrive they are simply dropped.

This doesn't look critical, since SparkListenerExecutorMetricsUpdate messages just correspond to periodic updates from the executors.

What is puzzling is that I absolutely don't understand why this happens, and nothing I can find refers to this issue. Note that it is totally non-deterministic and I can't reproduce it, probably due to the asynchronous nature of the bus and my lack of understanding of how and when stop() is supposed to be called.
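
For context, these events are delivered to registered SparkListeners through that same bus. The sketch below is not from my application; it is only a hypothetical listener that logs each executor metrics update while the bus is running. Any update that arrives after the bus has stopped never reaches the listener and instead produces the "Dropping event" message above.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorMetricsUpdate}

// Hypothetical listener, for illustration only: it logs the periodic
// executor metrics updates delivered by the LiveListenerBus.
class MetricsUpdateLogger extends SparkListener {
  override def onExecutorMetricsUpdate(
      update: SparkListenerExecutorMetricsUpdate): Unit = {
    println(s"Metrics update from executor ${update.execId}")
  }
}

val sc = new SparkContext(new SparkConf().setAppName("listener-demo"))
sc.addSparkListener(new MetricsUpdateLogger)  // events are dropped once the bus stops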

About the running code

A minimal sample:

val sc = new SparkContext(sparkConf)
// One accumulator per metric, keyed by the metric value
val metricsMap = Metrics.values.toSeq.map(
    v => v -> sc.accumulator(0, v.toString)
).toMap
val outFiles = sc.textFile(outPaths)

And there is no other reference to sc or to a SparkContext instance.


1 Reply


This ticket might be related. https://issues.apache.org/jira/browse/SPARK-12009

The message seems to indicate a YARN allocation failure after the SparkContext has been stopped.


Sorry for the unclear comment.

The main reason seems to be that there is some interval between the AM's shutdown event and the point at which all of the executors have stopped, so the AM tries to reallocate containers after the executors stop.

As Saisai said in that ticket:

An interesting thing is that the AM is shutting down at 2015-11-26 03:05:16, but the YarnAllocator still requests 13 executors 11 seconds later. It looks like the AM does not exit that quickly, which is why the YarnAllocator is still requesting new containers. Normally, if the AM exited as soon as it received the disconnected message, there would be no time for the YarnAllocator to request containers.

I have sometimes come across similar logs when the SparkContext is being shut down.
In my case, this ticket seems to be the answer.
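
A common mitigation, not from that ticket and only a sketch, is to stop the SparkContext explicitly and let the application exit only after stop() has returned, so the executors and the LiveListenerBus are torn down in order rather than racing with late metrics updates:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("explicit-stop-demo"))
try {
  // Placeholder job; the real application logic goes here
  val outFiles = sc.textFile("/path/to/input")  // hypothetical path
  println(outFiles.count())
} finally {
  sc.stop()  // shuts down executors and the LiveListenerBus before the JVM exits
}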

