Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Not checking close()'s return value: how serious, really?

Linux's "man close" warns (SVr4, 4.3BSD, POSIX.1-2001):

Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.

I can believe that this error is common (at least in applications; I'm no kernel hacker). But how serious is it, today or at any point in the past three decades? In particular:

Is there a simple, reproducible example of such silent loss of data? Even a contrived one like sending SIGKILL during close()?

If such an example exists, can the data loss be handled more gracefully than just

printf("Sorry, dude, you lost some data.\n"); ?

1 Reply

[H]ow serious is it, today or at any point in the past three decades?

Typical applications process data. They consume some input, and produce a result. So, there are two general cases where close() may return an error: when closing an input (read-only?) file, and when closing a file that was just generated or modified.

The known situations where close() returns an error are specific to writing or flushing data to permanent storage. In particular, it is common for an operating system to cache data locally before actually writing it to permanent storage (at close(), fsync(), or fdatasync()); this is very common with remote filesystems, and is the reason why NFS is mentioned on the man page.
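To make the write/flush/close sequence concrete, here is a minimal sketch in C that checks every step. The helper name save_buffer() and its interface are hypothetical, made up for illustration. It assumes a POSIX system and a regular file; fsync() can fail with EINVAL on descriptors that do not support syncing, such as pipes.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from buf to path, reporting any
 * error seen at write(), fsync(), or close() time.
 * Returns 0 on success, or -1 with errno set. */
int save_buffer(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd, saved;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            saved = errno;
            close(fd);             /* already failing; keep the first error */
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the kernel to push cached data to storage now, so that a
     * write-back error is reported here instead of going unnoticed. */
    if (fsync(fd) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Do not call close() again after a failure: on Linux the descriptor
     * is released even when close() reports an error. */
    if (close(fd) == -1)
        return -1;

    return 0;
}
```

A caller would treat a nonzero return as "the file may not be on disk" and say so, for example by printing strerror(errno) next to the file name.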

I have never encountered an error while closing a read-only input file. All the cases I can think of where it might happen in real life using any of the common filesystems are ones where there is a catastrophic failure, something like kernel data structure corruption. If that happens, I think the close() error cannot be the only sign that something is terribly wrong.

When writing to a file on a remote filesystem, close()-time errors are woefully common if the local network is prone to glitches or just drops a lot of packets. As an end user, I want my applications to tell me if there was an error when writing to a file. Usually the connection to the remote filesystem is broken altogether, and the fact that writing to a new file failed is the first indicator to the user.

If you don't check the close() return value, the application will lie to the user. It will indicate (by the lack of an error message, if nothing else) that the file was correctly written, when in fact it wasn't, and the application was told so; the application just ignored the indication. If the user is like me, they'll be very unhappy with the application.
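As a rough illustration of not lying to the user, the sketch below wraps close() so that a failure produces a message naming the affected file. The name checked_close() and the message wording are my own assumptions, not anything from the man page.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: close fd and, on failure, tell the user which
 * file may be incomplete. Returns 0 on success, -1 on error (errno is
 * preserved for the caller). */
int checked_close(int fd, const char *name, FILE *log)
{
    if (close(fd) == -1) {
        const int err = errno;
        fprintf(log, "%s: error while closing (%s); the file may be incomplete.\n",
                name, strerror(err));
        errno = err;
        return -1;
    }
    return 0;
}
```

At the call site, something like `if (checked_close(fd, path, stderr)) return EXIT_FAILURE;` also makes the exit status reflect the failure, so scripts calling the program can notice it too.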

The question is, how important is user data to you? Most current application programmers don't care at all. Basile Starynkevitch (in a comment to the original question) is absolutely right; checking for close() errors is not something most programmers bother to do.

I believe that attitude is reprehensible: a cavalier disregard for user data.

It is natural, though, because the users have no clear indication as to which application corrupted their data. In my experience the end users end up blaming the OS, hardware, open source or free software in general, or the local IT support; so, there is no pressure, social or otherwise, for a programmer to care. Because only programmers are aware of details such as this, and most programmers don't care, there is no pressure to change the status quo.

(I know saying the above will make a lot of programmers hate my guts, but at least I'm being honest. The typical response I get for pointing out things such as this is that it is such a rare occurrence that checking for it would be a waste of resources. That is likely true, but I for one am willing to spend more CPU cycles, and to pay the programmers a few percent more, if it means my machine works more predictably and tells me when it has lost the plot, rather than silently corrupting my data.)

Is there a simple, reproducible example of such silent loss of data?

I know of three approaches:

  1. Use a USB stick, and yank it out after the final write() but before the close(). Unfortunately, most USB sticks have hardware that is not designed to survive that, so you may end up bricking the USB stick. Depending on the filesystem, your kernel may also panic, because most filesystems are written with the assumption that this will never ever happen.

  2. Set up an NFS server, and simulate intermittent packet drops by using iptables to drop all packets between the NFS server and the client. The exact scenario depends on the server and client, mount options, and versions used. A test bed should be relatively easy to set up using two or three virtual machines, however.

  3. Use a custom filesystem to simulate a write error at close() time. Current kernels do not let you force-unmount tmpfs or loopback mounts, only NFS mounts; otherwise this would be easy to simulate by force-unmounting the filesystem after the final write but prior to the close(). (Current kernels simply deny the umount if there are open files on that filesystem.) For application testing, creating a variant of tmpfs that returns an error at close() if the file mode indicates it is desirable (for example, other-writable but not other-readable or other-executable, i.e. -??????-w-) would be quite easy, and safe. It would not actually corrupt the data, but it would make it easy to check how the application behaves if the kernel reports (the risk of) data corruption at close time.
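None of the three approaches above fits in a few lines. As a much weaker stand-in, and not one of them, the sketch below uses stdio buffering on Linux's /dev/full (a device on which every write fails with ENOSPC). Here fwrite() only copies into a user-space buffer and so reports success, and the error first shows up at fclose(). This is fclose(3) flushing a user-space buffer, not the kernel-level write-back discussed above, but the failure has the same shape. The function name is hypothetical, and the behavior assumes the stream is fully buffered, as glibc does for non-terminal files.

```c
#include <stdio.h>

/* Returns 1 if fwrite() reported success but fclose() reported failure,
 * 0 if no late error was seen, and -1 if path could not be opened. */
int error_first_seen_at_fclose(const char *path)
{
    FILE *f = fopen(path, "w");
    size_t written;

    if (!f)
        return -1;

    /* Small write: expected to land in the stdio buffer only. */
    written = fwrite("data", 1, 4, f);
    if (written != 4) {
        fclose(f);
        return 0;   /* the error was already visible at fwrite() time */
    }

    /* fclose() flushes the buffer; a failed flush is reported here. */
    if (fclose(f) == EOF)
        return 1;

    return 0;
}
```

Calling it with "/dev/full" should return 1 on a typical Linux system, while "/dev/null" should return 0. A program that ignores the fclose() result would report success in both cases.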

