It seems this would not be deterministic. Is there a way to do this reliably?
If you're using gzip, you can do something like this:
# diff <(zcat file1.gz) <(zcat file2.gz)
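If you need a yes/no answer in a script rather than a diff to read, here is a minimal sketch of the same idea. It assumes bash (process substitution is not available in plain POSIX sh), and `same_gz_content` is just a made-up helper name, not part of gzip. Comparing the decompressed streams sidesteps the fact that gzip by default stores the original file name and modification time in its header, so two archives of identical data can still differ byte for byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when two gzip files hold
# identical decompressed data, even if the .gz files themselves differ.
same_gz_content() {
    # gzip -dc writes the decompressed data to stdout (same as zcat).
    # cmp -s prints nothing and returns 0 only if both streams match.
    cmp -s <(gzip -dc -- "$1") <(gzip -dc -- "$2")
}
```

Usage would be along the lines of `same_gz_content file1.gz file2.gz && echo same`. Note that this sketch does not check whether gzip itself failed, so corrupt input should be ruled out separately (for example with `gzip -t`).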