
0 votes
556 views
in Technique by (71.8m points)

hadoop - Behavior of the parameter "mapred.min.split.size" in HDFS

Does the parameter "mapred.min.split.size" change the size of the blocks in which the file was already written? Suppose that, when starting my job, I pass the parameter "mapred.min.split.size" with a value of 134217728 (128 MB), for example as in the driver sketch after the two statements below. Which of the following is correct about what happens?

1 - Each map processes the equivalent of 2 HDFS blocks (assuming each block is 64 MB);

2 - My input file (previously loaded into HDFS) will be re-split so that it occupies 128 MB blocks in HDFS;
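
Here is a minimal sketch of what I mean by passing the parameter when starting the job (the class name MinSplitSizeDriver and the input/output paths are just placeholders, using the new mapreduce API):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MinSplitSizeDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Property name as used in the question; newer releases map it to
            // mapreduce.input.fileinputformat.split.minsize.
            conf.setLong("mapred.min.split.size", 134217728L); // 128 MB

            Job job = Job.getInstance(conf, "min-split-size-demo");
            job.setJarByClass(MinSplitSizeDriver.class);
            // Mapper/reducer setup omitted; input and output paths are placeholders.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

I could equally pass it on the command line as -D mapred.min.split.size=134217728 if the driver is run through ToolRunner.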


1 Reply

0 votes
by (71.8m points)

The split size is calculated by the formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))

In your case it will be:

split size = max(128 MB, min(Long.MAX_VALUE (default), 64 MB)) = 128 MB
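
As a rough, self-contained sketch, the standalone class below just mirrors that formula with the numbers from your question; it is not Hadoop's actual FileInputFormat code, although the library computes the split size the same way internally:

    public class SplitSizeDemo {
        // Same shape as the formula above: max(minSize, min(maxSize, blockSize)).
        static long computeSplitSize(long blockSize, long minSize, long maxSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;  // dfs.block.size = 64 MB
            long minSize   = 134217728L;         // mapred.min.split.size = 128 MB
            long maxSize   = Long.MAX_VALUE;     // mapred.max.split.size (default)

            long splitSize = computeSplitSize(blockSize, minSize, maxSize);
            System.out.println("split size = " + splitSize + " bytes"); // 134217728 = 128 MB
        }
    }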

So, for the two statements above:

  1. Each map will process 2 HDFS blocks (assuming each block is 64 MB): True

  2. The input file (previously loaded into HDFS) will be re-split to occupy 128 MB blocks in HDFS: False. A split is only a logical, per-job view of the input; the physical HDFS blocks are not rewritten.

Note, however, that making the minimum split size greater than the block size increases the split size, but at the cost of data locality: a 128 MB split spans two 64 MB blocks, which are not guaranteed to sit on the same datanode, so a map task may read part of its input over the network.
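
If you want to see this on your own cluster, a rough sketch along these lines (class name SplitInspector is made up; it assumes the new mapreduce API and an existing HDFS input path passed as the first argument) prints each computed split and the datanodes reported for it:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitInspector {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS path to inspect
            FileInputFormat.setMinInputSplitSize(job, 134217728L);  // force 128 MB splits

            // Compute the splits exactly as the job would, then print where each one lives.
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
                System.out.println(split.getLength() + " bytes on "
                        + Arrays.toString(split.getLocations()));
            }
        }
    }

Splits whose length is larger than the block size are the ones that will pay the locality cost described above.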



...