Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

tcp - Why isn't Hadoop implemented using MPI?

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.

What are the technical reasons for this?

I could hazard a few guesses, but I don't know enough about how MPI is implemented "under the hood" to know whether I'm right.

Come to think of it, I'm not entirely familiar with Hadoop's internals either. I understand the framework at a conceptual level (map/combine/shuffle/reduce and how that works at a high level), but I don't know the nitty-gritty implementation details. I've always assumed Hadoop transmits serialized data structures (perhaps GPBs, i.e. Google Protocol Buffers) over a TCP connection, e.g. during the shuffle phase. Let me know if that's not true.
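
The conceptual pipeline described above can be sketched as a toy, single-process word count. All function names, JSON as the "wire" format, and CRC32-based partitioning are hypothetical choices for illustration only; this is not Hadoop's API. (Real Hadoop, by default, serializes with its own `Writable` types and has reducers fetch map output over HTTP; the sketch does not model that.)

```python
import json
import zlib
from collections import defaultdict

# Toy sketch of map -> combine -> shuffle -> reduce (hypothetical names,
# JSON as a stand-in wire format; not Hadoop's real API or serialization).

def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Pre-aggregate counts locally on the mapper before anything is sent."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    """Pick a reducer deterministically (stable across processes, unlike hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(mapper_outputs, num_reducers):
    """Serialize each mapper's output per reducer, as if sending it over the network."""
    inboxes = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        buckets = defaultdict(list)
        for key, value in pairs:
            buckets[partition(key, num_reducers)].append([key, value])
        for reducer_id, bucket in buckets.items():
            # These bytes stand in for whatever travels over the TCP connection.
            inboxes[reducer_id].append(json.dumps(bucket).encode("utf-8"))
    return inboxes

def reduce_phase(messages):
    """Deserialize incoming messages, group by key, and sum the values."""
    grouped = defaultdict(list)
    for message in messages:
        for key, value in json.loads(message.decode("utf-8")):
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

def word_count(splits, num_reducers=2):
    mapper_outputs = []
    for split in splits:  # one "mapper" per input split
        pairs = [pair for line in split for pair in map_phase(line)]
        mapper_outputs.append(combine(pairs))
    result = {}
    for messages in shuffle(mapper_outputs, num_reducers):
        result.update(reduce_phase(messages))  # reducers own disjoint key sets
    return result

if __name__ == "__main__":
    splits = [["a b a", "c"], ["b a", "c c"]]
    print(sorted(word_count(splits).items()))
```

Running it prints `[('a', 3), ('b', 2), ('c', 3)]`. The combiner shrinks what has to be serialized, and the deterministic partitioner guarantees every occurrence of a key lands on the same reducer.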

Question from: https://stackoverflow.com/questions/4590674/why-isnt-hadoop-implemented-using-mpi

1 Reply

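One of the big features of Hadoop/MapReduce is fault tolerance: when a node fails, the framework re-runs the affected tasks on other nodes and the job still completes. Fault tolerance is not supported in most (if any) current MPI implementations, where the failure of a single process typically aborts the whole job. It is being considered for future versions of Open MPI.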
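
To make the contrast concrete, here is a toy, single-process simulation; all names are hypothetical and neither Hadoop nor MPI is actually used. `mapreduce_style` treats each task as independent and retries it on another live node, while `mpi_style` models a static rank-to-work assignment in which one dead node aborts everything, roughly what a default MPI job does.

```python
# Hypothetical sketch: MapReduce-style task re-execution vs. an MPI-style
# job that aborts when any rank fails. Node failures are simulated.

class NodeFailure(Exception):
    pass

def run_task(task, node, failed_nodes):
    """Run one independent task on a node; raise if that node has died."""
    if node in failed_nodes:
        raise NodeFailure(f"node {node} died while running task {task}")
    return task * task  # stand-in for real work

def mapreduce_style(tasks, nodes, failed_nodes):
    """Re-run each failed task on another live node; the job still completes."""
    results = {}
    for i, task in enumerate(tasks):
        start = i % len(nodes)
        candidates = nodes[start:] + nodes[:start]  # preferred node first
        for node in candidates:
            try:
                results[task] = run_task(task, node, failed_nodes)
                break
            except NodeFailure:
                continue  # tasks are independent, so just retry elsewhere
        else:
            raise RuntimeError(f"no live node left for task {task}")
    return results

def mpi_style(tasks, nodes, failed_nodes):
    """Each rank owns a fixed slice of work; one failure aborts the whole job."""
    results = {}
    for i, task in enumerate(tasks):
        node = nodes[i % len(nodes)]  # static rank-to-work assignment
        results[task] = run_task(task, node, failed_nodes)  # failure propagates
    return results
```

With `nodes = ["n0", "n1", "n2"]` and `"n1"` marked as failed, `mapreduce_style([1, 2, 3, 4], ...)` still returns all four squares, while `mpi_style` raises `NodeFailure` on the task assigned to `"n1"`. The retry works only because the tasks do not depend on each other mid-computation; tightly coupled MPI ranks exchange messages while running, so a lost rank cannot simply be rerun in isolation.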

Sandia National Laboratories has a MapReduce implementation built on MPI (the MapReduce-MPI library), but it lacks fault tolerance.