Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Reading formatted input from an istream

The problem below has been simplified from real requirements.

Consider the following program:

#include <iostream>
#include <iterator>
#include <string>
#include <set>
#include <algorithm>

using namespace std;

typedef string T; // to simplify, always consider T as string

template<typename input_iterator>
void do_something(const input_iterator& first, const input_iterator& last) {
    const ostream_iterator<T> os(cout, "\n");
    const set<T> words(first, last);
    copy(words.begin(), words.end(), os);
}

int main(int argc, char** argv) {
    const istream_iterator<T> is(cin), eof;
    do_something(is, eof);
    return 0;
}

The program extracts all the words from an istream (cin) and does something with them. By default, words are separated by whitespace. The logic behind the formatted extraction is inside the istream_iterator.

What I need to do now is to pass two iterators to do_something() so that the extracted words are separated by a punctuation character instead of whitespace (whitespace will be treated as "normal" characters). How would you do that in a "clean C++ way" (that is, with the minimum effort)?


1 Reply


Although it isn't obvious a priori, there is a relatively simple way to change what a stream considers to be whitespace: imbue() the stream with a std::locale object whose std::ctype<char> facet is replaced by one that classifies the desired characters as whitespace. imbue(), locale, ctype - huh?!? OK, well, these aren't necessarily things you use day to day, so here is a quick example which sets up std::cin to treat comma and newline characters as spaces:

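#include <locale>

// Holds the classification table. It is a separate base class so that the
// table is constructed before the std::ctype<char> base that points at it.
template <char S0, char S1>
struct commactype_base {
    commactype_base(): table_() {  // table_() zero-initializes every entry
        this->table_[static_cast<unsigned char>(S0)] = std::ctype_base::space;
        this->table_[static_cast<unsigned char>(S1)] = std::ctype_base::space;
    }
    std::ctype<char>::mask table_[std::ctype<char>::table_size];
};

// A std::ctype<char> facet which classifies only S0 and S1 as whitespace.
template <char S0, char S1 = S0>
struct ctype:
    commactype_base<S0, S1>,
    std::ctype<char>
{
    // false: the facet must not delete the table, which is a member array
    ctype(): std::ctype<char>(this->table_, false) {}
};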

This particular implementation of std::ctype<char> can be used to treat one or two arbitrary chars as spaces (a proper C++11 version would probably allow an arbitrary number of arguments; also, they don't really have to be template arguments). Anyway, with this in place, just drop the following line at the beginning of your main() function and you are all set:

std::cin.imbue(std::locale(std::locale(), new ::ctype<',', '\n'>));
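
To see what this does without typing into std::cin, here is a minimal sketch that applies the same kind of facet to a std::istringstream. The names demo, table_holder, sep_ctype and unique_words are made up for this illustration (the facet is just the one above, renamed and repeated so the snippet compiles on its own):

```cpp
#include <iterator>
#include <locale>
#include <set>
#include <sstream>
#include <string>

namespace demo {

// Same idea as the facet above, repeated so this sketch is self-contained.
template <char S0, char S1>
struct table_holder {
    table_holder() : table_() {
        table_[static_cast<unsigned char>(S0)] = std::ctype_base::space;
        table_[static_cast<unsigned char>(S1)] = std::ctype_base::space;
    }
    std::ctype<char>::mask table_[std::ctype<char>::table_size];
};

template <char S0, char S1 = S0>
struct sep_ctype : table_holder<S0, S1>, std::ctype<char> {
    sep_ctype() : std::ctype<char>(this->table_, false) {}
};

// Hypothetical helper: returns the distinct words of `text`, where words
// are separated by ',' or '\n' and blanks are ordinary characters.
inline std::set<std::string> unique_words(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale(std::locale(), new sep_ctype<',', '\n'>));
    return std::set<std::string>(std::istream_iterator<std::string>(in),
                                 std::istream_iterator<std::string>());
}

}  // namespace demo
```

Calling demo::unique_words("foo bar,baz,,foo bar\nqux") should give the three words "baz", "foo bar" and "qux": the blank stays inside "foo bar", and the doubled comma does not produce an empty word.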

Note that this really only considers ',' and '\n' as space characters, which means that no other characters are skipped as whitespace. Also, a sequence of multiple comma characters is considered to be just one separator rather than producing a bunch of empty strings. Finally, note that the above std::ctype<char> facet removes all other character classification. If you want to parse objects other than just strings, you might want to retain the other character classifications and only change the one for spaces. Here is a way this could be done:

#include <algorithm>  // std::transform

template <char S0, char S1>
struct commactype_base {
    commactype_base(): table_() {
        // Copy the classic "C" classification table, clearing the space
        // bit of every entry...
        std::transform(std::ctype<char>::classic_table(),
                       std::ctype<char>::classic_table() + std::ctype<char>::table_size,
                       this->table_,
                       [](std::ctype_base::mask m) -> std::ctype_base::mask {
                           return m & ~(std::ctype_base::space);
                       });
        // ...then mark only the two separators as whitespace.
        this->table_[static_cast<unsigned char>(S0)] |= std::ctype_base::space;
        this->table_[static_cast<unsigned char>(S1)] |= std::ctype_base::space;
    }
    std::ctype<char>::mask table_[std::ctype<char>::table_size];
};

Sadly, this crashes with the version of gcc I have on my system (apparently std::ctype<char>::classic_table() yields a null pointer there). Compiling it with a current version of clang (at the time of writing) doesn't work because that version doesn't support lambdas. With those two caveats, the above code should be correct, though...
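
As mentioned above, the separators don't really have to be template arguments. Here is a rough sketch of a variant which takes them as a run-time string and uses a plain loop instead of a lambda. The names sketch, separator_table, separator_ctype, separator_locale and split are invented for this example, and it assumes std::ctype<char>::classic_table() returns a valid table on your implementation:

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// Holds the classification table. It is a base class so that it is
// constructed before the std::ctype<char> base that points at it.
struct separator_table {
    explicit separator_table(const std::string& separators) {
        // Assumption: classic_table() is non-null on this implementation.
        const std::ctype_base::mask* classic = std::ctype<char>::classic_table();
        for (std::size_t i = 0; i != std::ctype<char>::table_size; ++i) {
            // Keep the classic classification, minus the space bit.
            table_[i] = static_cast<std::ctype_base::mask>(
                classic[i] & ~std::ctype_base::space);
        }
        for (std::size_t i = 0; i != separators.size(); ++i) {
            table_[static_cast<unsigned char>(separators[i])] |= std::ctype_base::space;
        }
    }
    std::ctype_base::mask table_[std::ctype<char>::table_size];
};

// Facet which treats exactly the characters in `separators` as whitespace.
struct separator_ctype : separator_table, std::ctype<char> {
    explicit separator_ctype(const std::string& separators)
        : separator_table(separators), std::ctype<char>(table_, false) {}
};

// Global locale with its ctype<char> facet replaced.
inline std::locale separator_locale(const std::string& separators) {
    return std::locale(std::locale(), new separator_ctype(separators));
}

// Splits `text` on any run of characters from `separators`.
inline std::vector<std::string> split(const std::string& text,
                                      const std::string& separators) {
    std::istringstream in(text);
    in.imbue(separator_locale(separators));
    std::vector<std::string> parts;
    for (std::string part; in >> part; ) parts.push_back(part);
    return parts;
}

}  // namespace sketch
```

With this, sketch::split("a b;c||d", ";|") should yield "a b", "c" and "d", and for std::cin the equivalent would be std::cin.imbue(sketch::separator_locale(",\n")). Since the other classification bits are copied from the classic table, std::isdigit('7', loc) still holds for such a locale while std::isspace(' ', loc) does not.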

