Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
374 views
in Technique[技术] by (71.8m points)

repository - Is git worth for managing many files bigger than 500MB

I would put under version control a big amount of data, i.e. a directory structure (with depth<=5) with hundreds files with size about 500Mb).

The things I need is a system that help me: - to detect if an files has been changed - to detect if files were added/removed - to clone the entire repository in another location - to store a "checkpoint" and restore it later

I don't need sha1 for change detect, something faster is acceptable.

Is git worth for this? There is a better alternative?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As I mentioned in "What are the Git limits", Git is not made to manage big files (or big binary files for that matter).

Git would be needed if you needed to:

  • know what has actually changed within a file. But for the directory-level, the other answers are better (Unison or rsynch)
  • keep a close proximity (i.e. "same referential") between your development data, and those large resources. Having only one referential would help, but then you would need a fork of Git, like git-bigfiles to efficiently manage them.

Note: still using Git, you can try this approach

Unfortunately, rsync isn't really perfect for our purposes either.

  • First of all, it isn't really a version control system. If you want to store multiple revisions of the file, you have to make multiple copies, which is wasteful, or xdelta them, which is tedious (and potentially slow to reassemble, and makes it hard to prune intermediate versions), or check them into git, which will still melt down because your files are too big.
  • Plus rsync really can't handle file renames properly - at all.

Okay, what about another idea: let's split the file into chunks, and check each of those blocks into git separately.
Then git's delta compression won't have too much to chew on at a time, and we only have to send modified blocks...

Based on gzip --rsyncable, with a POC available in this Git repo.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...