Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Cannot get Boost Spirit grammar to use known keys for std::map<>

I seem to be experiencing a mental block with Boost Spirit that I just cannot get past. I have a fairly simple grammar to handle, and I would like to put the values into a struct that contains a std::map<> as one of its members. The key names for the pairs are known up front, so only those are allowed. There could be one to many keys in the map, in any order, with each key name validated via qi.

The grammar looks something like this, as an example.

test .|*|<hostname> add|modify|clear ( key [value] key [value] ... ) ;

//
test . add ( a1 ex00
             a2 ex01
             a3 "ex02,ex03,ex04" );

//
test * modify ( m1 ex10
                m2 ex11
                m3 "ex12,ex13,ex14"
                m4 "abc def ghi" );


//
test 10.0.0.1 clear ( c1
                      c2
                      c3 );

In this example the keys for “add” are a1, a2 and a3, and likewise m1, m2, m3 and m4 for “modify”; each of these must contain a value. For “clear”, the keys of the map (c1, c2 and c3) may not contain a value. Also, let's say for this example you can have up to 10 keys per action (a1 ... a10, m1 ... m10 and c1 ... c10); any combination of them could be used, in any order, for their corresponding action. This means that you cannot use the known key cX for "add" or mX for "clear".

The structure follows this simple pattern

//
struct test
{
    std::string host;
    std::string action;
    std::map<std::string,std::string> option;
};

So from the above examples, I would expect to have the struct contain ...

// add ...
test.host = .
test.action = add
test.option[0].first = a1
test.option[0].second = ex00
test.option[1].first = a2
test.option[1].second = ex01
test.option[2].first = a3
test.option[2].second = ex02,ex03,ex04

// modify ...
test.host = *
test.action = modify
test.option[0].first = m1
test.option[0].second = ex10
test.option[1].first = m2
test.option[1].second = ex11
test.option[2].first = m3
test.option[2].second = ex12,ex13,ex14
test.option[3].first = m4
test.option[3].second = abc def ghi

// clear ...
test.host = 10.0.0.1
test.action = clear
test.option[0].first = c1
test.option[0].second = 
test.option[1].first = c2
test.option[1].second = 
test.option[2].first = c3
test.option[2].second = 
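
One caveat about the notation above: a std::map is indexed by key rather than by position, so `option[0].first` is only shorthand for "the first entry". The following is a minimal sketch of the value expected for the "add" example. The struct is called `command` here and `expected_add()` is a hypothetical helper, neither of which comes from the original post.

```cpp
#include <map>
#include <string>

// Same layout as the struct in the question (renamed for this sketch).
struct command {
    std::string host;
    std::string action;
    std::map<std::string, std::string> option;
};

// Hypothetical helper: the value one would expect after parsing
//   test . add ( a1 ex00 a2 ex01 a3 "ex02,ex03,ex04" );
inline command expected_add() {
    command c;
    c.host = ".";
    c.action = "add";
    c.option["a1"] = "ex00";            // entries are addressed by key
    c.option["a2"] = "ex01";
    c.option["a3"] = "ex02,ex03,ex04";  // quotes are not part of the value
    return c;
}
```

Iterating over `option` visits the keys in sorted order, which is not necessarily the order in which they appeared in the input. If input order matters, a std::vector of pairs would be the safer attribute type.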

I can get each individual part working standalone, but I cannot seem to get them working together. For example, I have the host and action working without the map<>.

I’ve adapted a previously posted example from Sehe (here) trying to get this to work (BTW: Sehe has some awesome examples, which I’ve been using as much as the documentation).

Here is an excerpt (obviously not working), but at least shows where I’m trying to go.

namespace ast {

    namespace qi = boost::spirit::qi;

    //
    using unused = qi::unused_type;

    //
    using string  = std::string;
    using strings = std::vector<string>;
    using list    = strings;
    using pair    = std::pair<string, string>;
    using map     = std::map<string, string>;

    //
    struct test
    {
        using preference = std::map<string,string>;

        string host;
        string action;
        preference option;
    };
}

//
BOOST_FUSION_ADAPT_STRUCT( ast::test,
                        ( std::string, host )
                        ( std::string, action )
                        ( ast::test::preference, option ) )

//
namespace grammar
{
    //
    template <typename It>
    struct parser
    {
        //
        struct skip : qi::grammar<It>
        {
            //
            skip() : skip::base_type( text )
            {
                using namespace qi;

                // handle all whitespace (" ", \t, ...)
                // along with comment lines/blocks
                //
                // comment blocks: /* ... */
                //                 // ...
                //                 -- ...
                //                 #  ...
                text = ascii::space
                    | ( "#"  >> *( char_ - eol )  >> ( eoi | eol ) ) // line comment
                    | ( "--" >> *( char_ - eol )  >> ( eoi | eol ) ) // ...
                    | ( "//" >> *( char_ - eol )  >> ( eoi | eol ) ) // ...
                    | ( "/*" >> *( char_ - "*/" ) >> "*/" );         // block comment

                //
                BOOST_SPIRIT_DEBUG_NODES( ( text ) )
            }

            //
            qi::rule<It> text;
        };
        //
        struct token
        {
            //
            token()
            {
                using namespace qi;

                // common
                string   = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
                identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
                real     = double_;
                integer  = int_;

                //
                value    = ( string | identity );

                // ip target
                any      = '*';
                local    = ( char_('.') | fqdn );
                fqdn     =  +char_("a-zA-Z0-9.-" );   // concession

                ipv4     =  +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                        >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                        >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                        >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];

                //
                target   = ( any | local | fqdn | ipv4 );

                //
                pair     =  identity >> -( attr( ' ' ) >> value );
                map      =  pair >> *( attr( ' ' ) >> pair );
                list     =  *( value );

                //
                BOOST_SPIRIT_DEBUG_NODES( ( string )
                                        ( identity )
                                        ( value )
                                        ( real )
                                        ( integer )
                                        ( any )
                                        ( local )
                                        ( fqdn )
                                        ( ipv4 )
                                        ( target )
                                        ( pair )
                                        ( keyval )
                                        ( map )
                                        ( list ) )
            }

            //
            qi::rule<It, std::string()> string;
            qi::rule<It, std::string()> identity;
            qi::rule<It, std::string()> value;
            qi::rule<It, double()>      real;
            qi::rule<It, int()>         integer;
            qi::uint_parser<unsigned, 10, 1, 3> octet;

            qi::rule<It, std::string()> any;
            qi::rule<It, std::string()> local;
            qi::rule<It, std::string()> fqdn;
            qi::rule<It, std::string()> ipv4;
            qi::rule<It, std::string()> target;

            //
            qi::rule<It, ast::map()>  map;
            qi::rule<It, ast::pair()> pair;
            qi::rule<It, ast::pair()> keyval;
            qi::rule<It, ast::list()> list;
        };

    //
        struct test : token, qi::grammar<It, ast::test(), skip>
        {
            //
            test() : test::base_type( command_ )
            {
                using namespace qi;
                using namespace qr;

                auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );

                // not sure how to enforce the "key" names!
                key_     = *( '(' >> *value >> ')' );
                // tried using token::map ... didn't work ...

                //
                add_     = ( ( "add"    >> attr( ' ' ) ) [ _val = "add" ] );
                modify_  = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
                clear_   = ( ( "clear"  >> attr( ' ' ) ) [ _val = "clear" ] );

                //
                action_  = ( add_ | modify_ | clear_ );


                /* *** can't get from A to B here ... not sure what to do *** */

                //
                command_ =  kw[ "test" ]
                        >> target
                        >> action_
                        >> ';';

                BOOST_SPIRIT_DEBUG_NODES( ( command_ )
                                        ( action_ )
                                        ( add_ )
                                        ( modify_ )
                                        ( clear_ ) )
            }

            //
            private:
                //
                using token::value;
                using token::target;
                using token::map;

                qi::rule<It, ast::test(), skip> command_;
                qi::rule<It, std::string(), skip> action_;

                //
                qi::rule<It, std::string(), skip> add_;
                qi::rule<It, std::string(), skip> modify_;
                qi::rule<It, std::string(), skip> clear_;
        };

    ...

    };
}

I hope this question isn't too ambiguous and if you need a working example of the problem, I can certainly provide that. Any and all help is greatly appreciated, so thank you in advance!


1 Reply


Notes:

  1. with this

            add_     = ( ( "add"    >> attr( ' ' ) ) [ _val = "add" ] );
            modify_  = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
            clear_   = ( ( "clear"  >> attr( ' ' ) ) [ _val = "clear" ] );
    

    Did you mean to require a space? Or are you really just trying to force the action field of the struct to contain a trailing space (that's what will happen)?

    If you meant the latter, I'd do that outside of the parser¹.

    If you wanted the first, use the kw facility:

            add_    = kw["add"]    [ _val = "add"    ];
            modify_ = kw["modify"] [ _val = "modify" ];
            clear_  = kw["clear"]  [ _val = "clear"  ];
    

    In fact, you can simplify that (again, ¹):

            add_    = raw[ kw["add"] ];
            modify_ = raw[ kw["modify"] ];
            clear_  = raw[ kw["clear"] ];
    

    Which also means that you can simplify to

            action_  = raw[ kw[lit("add")|"modify"|"clear"] ];
    

    However, getting a bit close to your question, you could also use a symbol parser:

            symbols<char> action_sym;
            action_sym += "add", "modify", "clear";
            //
            action_  = raw[ kw[action_sym] ];
    

    Caveat: the symbols needs to be a member so its lifetime extends beyond the constructor.

  2. If you meant to capture the input representation of ipv4 addresses with

            ipv4     =  +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
                >>             octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];
    

    Side note: I'm assuming +as_string is a simple mistake and you meant as_string instead.

    Simplify:

        qi::uint_parser<uint8_t, 10, 1, 3> octet;
    

    This obviates the range checks (see ¹ again):

        ipv4 = as_string[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
    

    However, this would build a 4-char binary string representation of the address. If you wanted that, fine. I doubt it (because you'd have written std::array<uint8_t, 4> or uint64_t, right?). So if you wanted the string, again use raw[]:

        ipv4     = raw[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
    
  3. Same issue as with number 1.:

        pair     =  identity >> -( attr(' ') >> value );
    

    This time, the problem betrays that the productions should not be in token. Conceptually, tokenizing precedes parsing, and hence I'd keep the tokens skipper-less. kw doesn't really do a lot of good in that context. Instead, I'd move pair, map and list (unused?) into the parser:

        pair     =  kw[identity] >> -value;
        map      =  +pair;
        list     =  *value;
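
To make the role of kw in note 1 more concrete, here is a library-independent sketch of what "distinct" keyword matching means: a keyword only counts when it is not immediately followed by another identifier character, so "add" does not match the start of "address". The helper names `is_ident_char` and `match_keyword` are hypothetical, and the loop simply tries the keywords in table order, which is an assumption of this sketch and not how qi::symbols is implemented.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if c may appear inside an identifier (the set "a-zA-Z0-9_").
inline bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper: returns the keyword from `table` that starts at `pos`
// in `input` and is NOT directly followed by an identifier character.
// Returns an empty string when nothing matches. Requires pos <= input.size().
inline std::string match_keyword(const std::string& input, std::size_t pos,
                                 const std::vector<std::string>& table) {
    for (const auto& kw : table) {
        if (input.compare(pos, kw.size(), kw) != 0) continue;  // text differs
        const std::size_t end = pos + kw.size();
        if (end < input.size() && is_ident_char(input[end])) continue;  // e.g. "address"
        return kw;
    }
    return {};
}
```

Without the follow-up check, an input such as `test . address ( ... )` would be accepted as the action "add" followed by stray text, which is the kind of partial match the distinct directive is there to prevent.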
    

Some examples

There's a very recent example I made about using symbols to parse (here), but this answer comes a lot closer to your question:

It goes far beyond the scope of your parser because it does all kinds of actions in the grammar, but what it does show is to have generic "lookup-ish" rules that can be parameterized with a particular "symbol set": see the Identifier Lookup section of the answer:

Identifier Lookup

We store "symbol tables" in Domain members _variables and _functions:

      using Domain = qi::symbols<char>;
      Domain _variables, _functions;

Then we declare some rules that can do lookups on either of them:

      // domain identifier lookups
      qi::_r1_type _domain;
      qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known, unknown;

The corresponding declarations will be shown shortly.

Variables are pretty simple:

      variable   = maybe_known(phx::ref(_variables));

Calls are trickier. If a name is unknown we don't want to assume it implies a function unless it's followed by a '(' character. However, if an identifier is a known function name, we want even to imply the ( (this gives the UX the appearance of autocompletion where when the user types sqrt, it suggests the next character to be ( magically).

      // The heuristics:
      // - an unknown identifier followed by (
      // - an unclosed argument list implies )
      call %= ( known(phx::ref(_functions)) // known -> imply the parens
              | &(identifier >> '(') >> unknown(phx::ref(_functions))
              ) >> implied('(') >> -(expression % ',') >> implied(')');

It all builds on known, unknown and maybe_known:

          ///////////////////////////////
          // identifier lookup, suggesting
          {
              maybe_known = known(_domain) | unknown(_domain);

              // distinct to avoid partially-matching identifiers
              using boost::spirit::repository::qi::distinct;
              auto kw     = distinct(copy(alnum | '_'));

              known       = raw[kw[lazy(_domain)]];
              unknown     = raw[identifier[_val=_1]] [suggest_for(_1, _domain)];
          }

I think you can use the same approach constructively here. One additional gimmick could be to validate that properties supplied are, in fact, unique.
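
As a complement to doing the lookup inside the grammar, the same rules can be checked after parsing. The sketch below is plain standard C++ and not a Spirit technique: it assumes the options were first collected into a vector of pairs (so duplicates are still visible), and it uses a hypothetical table `allowed_keys()` filled with the key names from the question's examples. It also assumes that "clear" keys must be value-less while "add" and "modify" keys require a value.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using pairs = std::vector<std::pair<std::string, std::string>>;

// Hypothetical table: which keys each action accepts.
inline const std::map<std::string, std::set<std::string>>& allowed_keys() {
    static const std::map<std::string, std::set<std::string>> table{
        {"add",    {"a1", "a2", "a3"}},
        {"modify", {"m1", "m2", "m3", "m4"}},
        {"clear",  {"c1", "c2", "c3"}},
    };
    return table;
}

// Returns an empty string when the options are valid for `action`,
// otherwise a short diagnostic for the first problem found.
inline std::string check_options(const std::string& action, const pairs& options) {
    const auto it = allowed_keys().find(action);
    if (it == allowed_keys().end()) return "unknown action: " + action;

    std::set<std::string> seen;
    for (const auto& kv : options) {
        if (it->second.count(kv.first) == 0)
            return "key not allowed for " + action + ": " + kv.first;
        if (!seen.insert(kv.first).second)  // insert fails on a repeated key
            return "duplicate key: " + kv.first;
        const bool has_value = !kv.second.empty();
        if (action == "clear" && has_value) return "unexpected value for " + kv.first;
        if (action != "clear" && !has_value) return "missing value for " + kv.first;
    }
    return {};
}
```

Note that a std::map attribute would silently keep only one entry per key, so the uniqueness check only works on the pair list. This sketch also cannot tell an empty quoted value from a missing one; if that distinction matters, an optional value type would be needed.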

Demo Work

Combining all the hints above makes it compile and "parse" the test commands:

Live On Coliru

#include <string>
#include <map>
#include <vector>

namespace ast {

    //
    using string  = std::string;
    using strings = std::vector<string>;
    using list    = strings;
    using pair    = std::pair<string, string>;
    using map     = std::map<string, string>;

    //
    struct command {
        string host;
        string action;
        map option;
    };
}

#include <boost/fusion/adapted.hpp>

BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>

namespace grammar
{
    namespace qi = boost::spirit::qi;
    namespace qr = boost::spirit::repository::qi;

    template <typename It>
    struct parser
    {
        struct skip : qi::grammar<It> {

            skip() : skip::base_type(text) {
                using namespace qi;

                // handle all whitespace along with line/block comments
                text = ascii::space
                    | (lit("#")|"--"|"//") >> *(char_ - eol)  >> (eoi | eol) // line comment
                    | "/*" >> *(char_ - "*/") >> "*/";         // block comment

                //
                BOOST_SPIRIT_DEBUG_NODES((text))
            }

          private:
            qi::rule<It> text;
        };
        //
        struct token {
            //
            token() {
                using namespace qi;

                // common
                string   = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
                identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
                value    = string | identity;

                // ip target
                any      = '*';
                local    = '.' | fqdn;
                fqdn     = +char_("a-zA-Z0-9.-"); // concession

                ipv4     = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
                //
                target   = any | local | fqdn | ipv4;

                //
                BOOST_SPIRIT_DEBUG_NODES(
                        (string) (identity) (value)
                        (any) (local) (fqdn) (ipv4) (target)
                   )
            }

          protected:
            //
            qi::rule<It, std::string()> string;
            qi::rule<It, std::string()> identity;
            qi::rule<It, std::string()> value;
            qi::uint_parser<uint8_t, 10, 1, 3> octet;

            qi::rule<It, std::string()> any;
            qi::rule<It, std::string()> local;
            qi::rule<It, std::string()> fqdn;
            qi::rule<It, std::string()> ipv4;
            qi::rule<It, std::string()> target;
        };

        //
        struct test : token, qi::grammar<It, ast::command(), skip> {
            //
            test() : test::base_type(command_)
            {
                using namespace qi;

                auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );

                //
                action_sym += "add", "modify", "clear";
                action_  = raw[ kw[action_sym] ];

                //
                command_ =  kw["test"]
                        >> target
                        >> action_
                        >> '(' >> map >> ')'
                        >> ';';

                //
                pair     = kw[identity] >> -value;
                map      = +pair;
                list     = *value;

                BOOST_SPIRIT_DEBUG_NODES(
                        (command_) (action_)
                        (pair) (map) (list)
                    )
            }

          private:
            using token::target;
            using token::identity;
            using token::value;
            qi::symbols<char> action_sym;

            //
            qi::rule<It, ast::command(), skip> command_;
            qi::rule<It, std::string(), skip> action_;

            //
            qi::rule<It, ast::map(), skip>  map;
            qi::rule<It, ast::pair(), skip> pair;
            qi::rule<It, ast::list(), skip> list;
        };

    };
}

#include <fstream>

int main() {
    using It = boost::s
