Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
949 views
in Technique[技术] by (71.8m points)

mapreduce - hadoop map reduce secondary sorting

Can any one explain me how secondary sorting works in hadoop ?
Why must one use GroupingComparator and how does it work in hadoop ?

I was going through the link given below and got doubt on how groupcomapator works.
Can any one explain me how grouping comparator works?

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I find it easy to understand certain concepts with help of diagrams and this is certainly one of them.

Lets assume that our secondary sorting is on a composite key made out of Last Name and First Name.

Composite Key

With the composite key out of the way, now lets look at the secondary sorting mechanism

Secondary Sorting Steps

The partitioner and the group comparator use only natural key, the partitioner uses it to channel all records with the same natural key to a single reducer. This partitioning happens in the Map Phase, data from various Map tasks are received by reducers where they are grouped and then sent to the reduce method. This grouping is where the group comparator comes into picture, if we would not have specified a custom group comparator then Hadoop would have used the default implementation which would have considered the entire composite key, which would have lead to incorrect results.

Overview of MR steps

enter image description here


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...