Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Properly using sscanf

I am supposed to get an input line that can be in any of the following formats:

word1
word1 word2 
word1 word2 , word3
word1 word2,word3

  • There must be a space between word 1 and word 2.
  • There must be a comma between word 2 and word 3.
  • Spaces are not required between word 2 and word 3, but any number of spaces is allowed.

How can I separate the 1-, 2- and 3-word cases and put the data into the correct variables?

I thought about something like:

sscanf("string", "%s %s,%s", word1, word2, word3);

but it doesn't seem to work.

I use strict C89.

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

int n = sscanf("string", "%s %[^, ]%*[, ]%s", word1, word2, word3);

The return value in n tells you how many conversions were successfully assigned (or EOF if the input runs out before the first conversion). The %[^, ] is a negated character class that matches a word containing neither commas nor blanks (add tabs if you like). The %*[, ] matches a run of commas and/or spaces but suppresses the assignment.

I'm not sure I'd use this in practice, but it should work. It is, however, untested.


Maybe a tighter specification is:

int n = sscanf("string", "%s %[^, ]%*[,]%s", word1, word2, word3);

The difference is that the non-assigning character class only accepts a comma. sscanf() stops at any space (or EOS, end of string) after word2, and skips spaces before assigning to word3. The previous edition allowed a space between the second and third words in lieu of a comma, which the question does not strictly allow.

As pmg suggests in a comment, the assigning conversion specifications should be given a length to prevent buffer overflow. Note that the length does not include the null terminator, so the value in the format string must be one less than the size of the array in bytes. Also note that whereas printf() allows you to specify widths dynamically with *, sscanf() and friends use * to suppress assignment. That means you have to write the widths into the format string for the task at hand:

char word1[20], word2[32], word3[64];
int n = sscanf("string", "%19s %31[^, ]%*[,]%63s", word1, word2, word3);

(Kernighan & Pike suggest building the format string dynamically in their (excellent) book The Practice of Programming (1999).)


Just found a problem: given "word1 word2 ,word3", it doesn't read word3. Is there a cure?

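Yes, there's a cure, and it is actually trivial, too. Unlike most conversions, %[...] does not skip leading white space, so add a space in the format string before the non-assigning, comma-matching conversion specification; that space directive skips any blanks before the comma. Thus: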

#include <stdio.h>

static void tester(const char *data)
{
    /* Initialized so printing is safe even if fewer than 3 words are read. */
    char word1[20] = "", word2[32] = "", word3[64] = "";
    int n = sscanf(data, "%19s %31[^, ] %*[,]%63s", word1, word2, word3);
    printf("Test data: <<%s>>\n", data);
    printf("n = %d; w1 = <<%s>>, w2 = <<%s>>, w3 = <<%s>>\n", n, word1, word2, word3);
}

int main(void)
{
    const char *data[] =
    {
        "word1 word2 , word3",
        "word1 word2 ,word3",
        "word1 word2, word3",
        "word1 word2,word3",
        "word1 word2       ,       word3",
    };
    enum { DATA_SIZE = sizeof(data)/sizeof(data[0]) };
    size_t i;
    for (i = 0; i < DATA_SIZE; i++)
        tester(data[i]);
    return(0);
}

Example output:

Test data: <<word1 word2 , word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2 ,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2, word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2,word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>
Test data: <<word1 word2       ,       word3>>
n = 3; w1 = <<word1>>, w2 = <<word2>>, w3 = <<word3>>

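Since the non-assigning character class only accepts commas, you can abbreviate it to a literal comma in the format string (strictly, %*[,] accepts a run of one or more commas, whereas a literal comma matches exactly one, which is closer to the specification):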

int n = sscanf(data, "%19s %31[^, ] , %63s", word1, word2, word3);

Plugging that into the test harness produces the same result as before. Note that all code benefits from review; it can often (essentially always) be improved even after it is working.


...