Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - What's the real reason to not use the EOF bit as our stream extraction condition?

Inspired by my previous question

A common mistake for new C++ programmers is to read from a file with something along the lines of:

std::ifstream file("foo.txt");
std::string line;
while (!file.eof()) {
  file >> line;
  // Do something with line
}

They will often report that the last line of the file was read twice. The common explanation for this problem (one that I have given before) goes something like:

The extraction will only set the EOF bit on the stream if you attempt to extract the end-of-file, not if your extraction just stops at the end-of-file. file.eof() will only tell you if the previous read hit the end-of-file and not if the next one will. After the last line has been extracted, the EOF bit is still not set and the iteration occurs one more time. However, on this last iteration, the extraction fails and line still has the same content as before, i.e. the last line is duplicated.

However, the first sentence of this explanation is wrong and so the explanation of what the code is doing is also wrong.

The definition of formatted input functions (which operator>>(std::string&) is) defines extraction as using rdbuf()->sbumpc() or rdbuf()->sgetc() to obtain input characters. It states that if either of these functions returns traits::eof(), then the EOF bit is set:

If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.

We can see this with the simple example that uses a std::stringstream rather than a file (they are both input streams and behave the same way when extracting):

#include <iostream>
#include <sstream>
#include <string>

int main()
{
  std::stringstream ss("hello");
  std::string result;
  ss >> result;  // extraction stops because it reaches the end of the stream
  std::cout << ss.eof() << std::endl; // Outputs 1
  return 0;
}

It's clear here that the single extraction obtains hello from the string and sets the EOF bit to 1.

So what's wrong with the explanation? What's different about files that causes !file.eof() to cause the last line to be duplicated? What's the real reason we shouldn't use !file.eof() as our extraction condition?


1 Reply

Yes, extracting from an input stream will set the EOF bit if the extraction stops at the end-of-file, as demonstrated by the std::stringstream example. If it were this simple, the loop with !file.eof() as its condition would work just fine on a file like:

hello
world

The second extraction would eat world, stopping at the end-of-file, and consequently setting the EOF bit. The next iteration wouldn't occur.

However, many text editors have a dirty secret. They're lying to you when you save a text file even as simple as that. What they don't tell you is that there's a hidden \n at the end of the file. Every line in the file ends with a \n, including the last one. So the file actually contains:

hello\nworld\n

This is what causes the last line to be duplicated when using !file.eof() as the condition. Now that we know this, we can see that the second extraction will eat world, stopping at the \n and not setting the EOF bit (because we haven't gotten there yet). The loop will iterate for a third time, but the next extraction will fail because it doesn't find a string to extract, only whitespace. The string is left with its previous value still hanging around, and so we get the duplicated line.

You don't experience this with std::stringstream because what you stick in the stream is exactly what you get. There's no \n at the end of std::stringstream ss("hello"), unlike in the file. If you were to do std::stringstream ss("hello\n"), you'd experience the same duplicate line issue.
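To make the difference concrete, here is a small sketch that runs the flawed loop over an in-memory string instead of a file. The helper name read_with_eof_loop is made up for this illustration, and it assumes a C++11 or later standard library, where a failed extraction leaves the string untouched.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: runs the flawed `while (!in.eof())` loop and records
// whatever `word` holds after every extraction attempt, successful or not.
std::vector<std::string> read_with_eof_loop(const std::string& contents) {
  std::istringstream in(contents);
  std::vector<std::string> seen;
  std::string word;
  while (!in.eof()) {
    in >> word;            // may fail, in which case `word` keeps its old value
    seen.push_back(word);  // "processed" regardless of whether the read worked
  }
  return seen;
}
```

Calling read_with_eof_loop("hello\nworld") should give two entries, while read_with_eof_loop("hello\nworld\n") should give three, with "world" appearing twice.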

So of course, we can see that we should never use !file.eof() as the condition when extracting from a text file - but what's the real issue here? Why should we really never use that as our condition, regardless of whether we're extracting from a file or not?

The real problem is that eof() gives us no idea whether the next read will fail or not. In the above case, we saw that even though eof() was 0, the next extraction failed because there was no string to extract. The same situation would happen if we didn't associate a file stream with any file or if the stream was empty. The EOF bit wouldn't be set but there's nothing to read. We can't just blindly go ahead and extract from the file just because eof() isn't set.
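The empty-stream case is easy to check with a sketch like the following. ReadAttempt and try_one_read are names invented for this example; the idea is simply to record eof() before a single extraction and then whether that extraction succeeded.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical record of one extraction attempt.
struct ReadAttempt {
  bool eof_before;  // value of eof() before anything was read
  bool read_ok;     // whether the extraction actually succeeded
};

// Hypothetical helper: attempts a single `>>` on a stream built from `contents`.
ReadAttempt try_one_read(const std::string& contents) {
  std::istringstream in(contents);
  ReadAttempt attempt;
  attempt.eof_before = in.eof();
  std::string word;
  attempt.read_ok = static_cast<bool>(in >> word);
  return attempt;
}
```

For an empty string, or one that contains only whitespace, eof_before should be false and read_ok should also be false: the EOF bit gave no warning that the read was going to fail.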

Using while (std::getline(...)) and related conditions works because the condition tests the stream after the read has been attempted, so the loop body runs only when the extraction actually succeeded. Just before the extraction starts, the input function checks whether any of the bad, fail, or EOF bits are set. If any of them are, it ends immediately, setting the fail bit in the process. It also fails if it reaches the end-of-file before it finds what it wants to extract, setting both the EOF and fail bits.
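As a rough sketch of that pattern, the two helpers below (read_words and read_lines are illustrative names, not anything from the question) use the extraction itself as the loop condition. They read from a std::istringstream for convenience; a std::ifstream would be used the same way.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated words; the body runs only after a successful `>>`.
std::vector<std::string> read_words(const std::string& contents) {
  std::istringstream in(contents);
  std::vector<std::string> words;
  std::string word;
  while (in >> word) {
    words.push_back(word);
  }
  return words;
}

// Reads whole lines; the body runs only after a successful std::getline.
std::vector<std::string> read_lines(const std::string& contents) {
  std::istringstream in(contents);
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(in, line)) {
    lines.push_back(line);
  }
  return lines;
}
```

With either helper, "hello\nworld" and "hello\nworld\n" should both produce exactly two entries, and an empty input should produce none.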


Note: You can save a file without the extra \n in vim if you do :set noeol and :set binary before saving.

