Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

unix - Reading a large file using C (greater than 4GB) using read function, causing problems

I have to write C code for reading large files. The code is below:

int read_from_file_open(char *filename,long size)
{
    long read1=0;
    int result=1;
    int fd;
    int check=0;
    long *buffer=(long*) malloc(size * sizeof(int));
    fd = open(filename, O_RDONLY|O_LARGEFILE);
    if (fd == -1)
    {
       printf("\nFile Open Unsuccessful\n");
       exit (0);;
    }
    long chunk=0;
    lseek(fd,0,SEEK_SET);
    printf("\nCurrent Position%d\n",lseek(fd,size,SEEK_SET));
    while ( chunk < size )
    {
        printf ("the size of chunk read is  %d\n",chunk);
        if ( read(fd,buffer,1048576) == -1 )
        {
            result=0;
        }
        if (result == 0)
        {
            printf("\nRead Unsuccessful\n");
            close(fd);
            return(result);
        }

        chunk=chunk+1048576;
        lseek(fd,chunk,SEEK_SET);
        free(buffer);
    }

    printf("\nRead Successful\n");

    close(fd);
    return(result);
}

The issue I am facing is that as long as the argument passed (the size parameter) is less than 264000000 bytes, it seems to read fine: the printed value of the chunk variable increases with each iteration.

When I pass 264000000 bytes or more, the read fails, i.e. according to the check I use, read() returns -1.

Can anyone explain why this is happening? I am compiling with cc in normal mode, not using DD64.
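When read() returns -1, errno records why the call failed (for example EBADF for an invalid descriptor, or EFAULT for a buffer pointer outside the accessible address space), and printing it usually narrows down the cause. A minimal diagnostic sketch, assuming a POSIX system; read_and_report is a hypothetical helper name:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: calls read() once and, if it fails, prints the
   reason recorded in errno before returning read()'s result unchanged. */
ssize_t read_and_report(int fd, void *buf, size_t count)
{
    ssize_t n = read(fd, buf, count);
    if (n == -1)
    {
        int saved = errno;  /* save it: fprintf() may overwrite errno */
        fprintf(stderr, "read failed: %s (errno %d)\n", strerror(saved), saved);
        errno = saved;      /* restore so the caller can inspect it too */
    }
    return n;
}
```

For example, replacing the bare read(fd,buffer,1048576) call in the question with read_and_report(fd, buffer, 1048576) prints the reason next to the failure.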



1 Reply


First of all, why do you need lseek() in your loop? read() already advances the file offset by the number of bytes it reads.
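This can be observed directly: lseek(fd, 0, SEEK_CUR) reports the current offset without moving it. A minimal sketch, assuming a POSIX system; read_then_tell is a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: performs one read() and returns the file offset
   afterwards. lseek(fd, 0, SEEK_CUR) only reports the current position;
   it does not move it. Returns -1 on error. */
off_t read_then_tell(int fd, char *buf, size_t count)
{
    if (read(fd, buf, count) == -1)
        return -1;
    return lseek(fd, 0, SEEK_CUR);
}
```

On an 8-byte regular file opened at offset 0, two consecutive calls with count 3 report offsets 3 and 6, with no explicit lseek() between them.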

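Now, to the actual problem: where long is 32 bits wide, long (and therefore chunk) has a maximum value of 2147483647. Any value greater than that overflows; in practice it usually wraps around to a negative number, although signed overflow is formally undefined behavior in C.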
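The limit is easy to check: a 4 GiB file has offsets up to 4294967296, roughly twice 2147483647, so they cannot be represented in a 32-bit long. A minimal sketch using the fixed-width types from stdint.h; fits_in_int32 is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `bytes` fits in a signed 32-bit integer, i.e. in `long`
   on platforms where long is 32 bits wide (maximum 2147483647). */
int fits_in_int32(int64_t bytes)
{
    return bytes >= INT32_MIN && bytes <= INT32_MAX;
}
```

With this, fits_in_int32(2147483647) is 1, while fits_in_int32(4294967296LL) is 0.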

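You want to declare chunk as off_t (off_t chunk) and size as size_t. An offset that does not fit in its type is the main reason why lseek() fails. Note that on 32-bit Linux with glibc, off_t is typically only 32 bits wide unless you compile with -D_FILE_OFFSET_BITS=64.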
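A compile-time check can make sure off_t is actually wide enough before any offset arithmetic is done. A minimal sketch, assuming a C11 compiler (for _Static_assert) and glibc's _FILE_OFFSET_BITS convention; chunk_offset is a hypothetical helper:

```c
/* Request a 64-bit off_t on 32-bit glibc systems. This must appear
   before any system header is included; where off_t is already
   64 bits it has no effect. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <sys/types.h>

/* Build fails here if off_t still cannot represent offsets past 2 GiB. */
_Static_assert(sizeof(off_t) >= 8, "off_t is narrower than 64 bits");

/* Hypothetical helper: byte offset of the n-th 1 MiB chunk, computed in
   off_t so that offsets beyond 4 GiB do not overflow. */
off_t chunk_offset(off_t n)
{
    return n * (off_t) 1048576;
}
```

For example, chunk_offset(5120) is 5368709120, i.e. 5 GiB, which a 32-bit type could not hold.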

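And, as other people have noticed, you do not want to free() your buffer inside the loop: after the first iteration, every read() writes into memory that has already been freed, and the next free() releases the same pointer a second time.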
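If the data only needs to be processed chunk by chunk rather than kept in memory, one buffer allocated before the loop and freed after it is enough. A minimal sketch, assuming POSIX read(); count_bytes and CHUNK are hypothetical names, and counting bytes stands in for whatever processing is really needed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK 1048576  /* 1 MiB per read(), as in the question */

/* Hypothetical helper: counts the bytes in an open file by reading it in
   1 MiB chunks. The buffer is allocated once before the loop, reused for
   every read(), and freed once after the loop. Returns -1 on error. */
int64_t count_bytes(int fd)
{
    char *buffer = malloc(CHUNK);
    if (buffer == NULL)
        return -1;
    int64_t total = 0;
    for (;;)
    {
        ssize_t n = read(fd, buffer, CHUNK);
        if (n == -1)
        {
            free(buffer);    /* release it on the error path as well */
            return -1;
        }
        if (n == 0)          /* end of file */
            break;
        total += n;
    }
    free(buffer);            /* freed exactly once, after the last read() */
    return total;
}
```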

Note also that every read() stores its data at the start of buffer, overwriting what you have already read. Additionally, read() does not necessarily return as many bytes as you asked for, so it is better to advance chunk by the number of bytes actually read rather than by the number you requested.
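The "advance by what was actually read" loop is often factored into a helper. A minimal sketch, assuming POSIX read(); the name read_full is hypothetical, and retrying on EINTR is an addition not discussed above:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: tries to read exactly `count` bytes into `buf`.
   read() may return fewer bytes than requested, so each call writes after
   the bytes already stored and the position advances by the number of
   bytes actually read. Returns the total read (less than `count` only at
   end of file), or -1 on error. */
ssize_t read_full(int fd, char *buf, size_t count)
{
    size_t done = 0;
    while (done < count)
    {
        ssize_t n = read(fd, buf + done, count - done);
        if (n == -1)
        {
            if (errno == EINTR)
                continue;     /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;            /* end of file */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

When fewer than count bytes remain before end of file, the function returns the smaller total instead of looping forever.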

Taking all of this into account, the corrected code should look something like this:

// Edited: note comments after the code
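#include <fcntl.h>      /* open(), O_RDONLY */
#include <stdio.h>      /* printf() */
#include <stdlib.h>     /* malloc(), free() */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* read(), lseek(), close() */

#ifndef O_LARGEFILE
#define O_LARGEFILE 0
#endif

#define CHUNK_SIZE 1048576  /* read at most 1 MiB per call */

int read_from_file_open(char *filename, size_t size)
{
    int fd;
    /* size bytes are read, so size bytes are needed (not size * sizeof(long)) */
    char *buffer = malloc(size);
    if (buffer == NULL)
    {
        printf("\nAllocation Unsuccessful\n");
        return 0;
    }
    fd = open(filename, O_RDONLY|O_LARGEFILE);
    if (fd == -1)
    {
        printf("\nFile Open Unsuccessful\n");
        free(buffer);
        return 0;
    }
    off_t chunk = 0;
    printf("\nCurrent Position %lld\n", (long long) lseek(fd, (off_t) size, SEEK_SET));
    /* Rewind: the seek above moved the offset to size; reading starts at 0. */
    lseek(fd, 0, SEEK_SET);
    while ((size_t) chunk < size)
    {
        printf("the size of chunk read is  %lld\n", (long long) chunk);
        /* Never ask for more than the space left in buffer. */
        size_t want = size - (size_t) chunk;
        if (want > CHUNK_SIZE)
            want = CHUNK_SIZE;
        /* ssize_t, not size_t: read() returns -1 on error. */
        ssize_t readnow = read(fd, buffer + chunk, want);
        if (readnow < 0)
        {
            printf("\nRead Unsuccessful\n");
            free(buffer);
            close(fd);
            return 0;
        }
        if (readnow == 0)   /* end of file before size bytes: stop, don't loop forever */
            break;
        chunk = chunk + readnow;   /* advance by the bytes actually read */
    }

    printf("\nRead Successful\n");

    free(buffer);
    close(fd);
    return 1;
}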

I also took the liberty of removing the result variable and all related logic, since I believe it can be simplified.

Edit: I have noticed that some systems (most notably BSD) do not have O_LARGEFILE, since it is not needed there. So I have added an #ifndef near the beginning, which makes the code more portable.

