Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Regex to match key in YAML

I have a YAML file that looks like the one below. Users can define any number of keys such as xyz_flovor_id, where the _flovor_id suffix is common. The aim is to find each *_flavor_id key and extract its value.

  server:
    tenant: "admin"
    availability_zone: "nova"
    cpu_overcommit_ratio: 1:1
    memory_overcommit_ratio: 1:1
    xyz_flovor_id: 1
    abc_flavor_id: 2

I was able to work out a regex that matches _flovor_id. However, when I try to use it in code, it throws an error. Here is my code:

def get_flavor_keys(params):
    pattern = re.compile(r'[^*]flavor_id')
    for key, value in params.iteritems():
        print value
        if key == 'server':
            if pattern.match(value):
                print 'test'

The print value statement dumps the entire YAML content (as expected). The traceback follows immediately after that:

Traceback (most recent call last):
  File "resource_meter.py", line 150, in <module>
    get_flavor_keys(items)
  File "resource_meter.py", line 15, in get_flavor_keys
    if pattern.match(value):
TypeError: expected string or buffer

1 Reply


You get that error because the value for the key server is not a string but a dict (or a subclass of dict). That is how the YAML mapping under server, which includes the key abc_flavor_id, is loaded.
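
The failure can be reproduced without any YAML library. The sketch below is only an illustration: the params dict is a hand-written stand-in for what a YAML loader would return for the input in the question (an assumption, not the loader's actual output), and the code is written for Python 3, where the wording of the TypeError message differs from the Python 2 traceback above.

```python
import re

# Hypothetical stand-in for the data a YAML loader returns for the question's input.
params = {
    "server": {
        "tenant": "admin",
        "availability_zone": "nova",
        "xyz_flovor_id": 1,
        "abc_flavor_id": 2,
    }
}

pattern = re.compile(r"[^*]flavor_id")  # the pattern from the question
value = params["server"]
print(type(value).__name__)  # the nested mapping is a dict, not a str

try:
    pattern.match(value)  # same call as in the question
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Because pattern.match() only accepts a string (or bytes), the pattern has to be applied to the individual keys of the dict rather than to the dict itself.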

Apart from that, it is always a bad idea to use regular expressions to parse YAML (or any other structured text format like HTML, XML, or CSV), as it is difficult, if not impossible, to capture all the nuances of the grammar. If it weren't, you would not need a parser.

For example, a minor change to the file, such as adding a comment that tells a user editing the file which value needs updating, breaks the simplistic regular expression approaches:

server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2

The YAML document above is semantically identical to yours, but will no longer work with the other answers currently posted.
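
To make this concrete, here is a minimal sketch of how a line-based regular expression run on the raw text behaves. The NAIVE pattern is a hypothetical example of such a simplistic approach (it is not taken from any of the other answers), and the documents are shortened versions of the two layouts shown above.

```python
import re

# Hypothetical line-based pattern: expects the key and an integer value on one line.
NAIVE = re.compile(r"^[ \t]*(\w+)_flavor_id:[ \t]*(\d+)[ \t]*$", re.MULTILINE)

original = """\
server:
  tenant: "admin"
  xyz_flovor_id: 1
  abc_flavor_id: 2
"""

with_comment = """\
server:
  tenant: "admin"
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2
"""

hits_original = NAIVE.findall(original)
hits_with_comment = NAIVE.findall(with_comment)
print(hits_original)      # the key and value are found in the original layout
print(hits_with_comment)  # nothing is found once the value moves to the next line
```

The pattern finds abc and 2 in the first document and nothing in the second, even though both describe the same data.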

If some YAML load/save operation transforms your input into (again semantically equivalent):

server: {abc_flavor_id: 2, availability_zone: nova,
  cpu_overcommit_ratio: 61, memory_overcommit_ratio: 61,
  tenant: admin, xyz_flovor_id: 1}

then tweaking a dumb regular expression will not begin to suffice (this is not a contrived example; it is the default way to dump your data structure in PyYAML and in ruamel.yaml using 'safe' mode).

What you need to do is match the regular expression against the keys of the mapping associated with server, not against the whole document:

import re

from ruamel.yaml import YAML

yaml_str = """
server:
  tenant: "admin"
  availability_zone: "nova"
  cpu_overcommit_ratio: 1:1
  memory_overcommit_ratio: 1:1
  xyz_flovor_id: 1
  abc_flavor_id:  # extract the value for this key
    2
"""

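def get_flavor_keys(params):
    # Capture everything before the '_flavor_id' suffix as the named group 'key'.
    pattern = re.compile(r'(?P<key>.*)_flavor_id')
    ret_val = {}
    # Match against the keys of the 'server' mapping, not the whole document.
    for key in params['server']:
        m = pattern.match(key)
        if m is not None:
            ret_val[m.group('key')] = params['server'][key]
            print('test', m.group('key'))
    return ret_val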

yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
keys = get_flavor_keys(data)
print(keys)

This gives you:

test abc
{'abc': 2}

(The xyz_flovor_id key of course doesn't match, but maybe that is a typo in your post.)
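
One detail worth noting about the pattern: pattern.match() anchors only at the start of the key, so a key such as abc_flavor_id_old would also be accepted. If only keys that end in _flavor_id should count, a stricter variant could use fullmatch(). The sketch below is one possible way to do that, not part of the original solution; the function name get_flavor_keys_strict and the sample server dict (including the abc_flavor_id_old key) are made up for illustration.

```python
import re

def get_flavor_keys_strict(server):
    """Return {prefix: value} for keys that are exactly '<prefix>_flavor_id'."""
    pattern = re.compile(r'(?P<key>.+)_flavor_id')
    result = {}
    for key, value in server.items():
        # fullmatch() requires the whole key to match, not just its beginning.
        m = pattern.fullmatch(key)
        if m is not None:
            result[m.group('key')] = value
    return result

# Hypothetical stand-in for the loaded 'server' mapping, with one extra key.
server = {
    'tenant': 'admin',
    'xyz_flovor_id': 1,
    'abc_flavor_id': 2,
    'abc_flavor_id_old': 3,
}
strict_keys = get_flavor_keys_strict(server)
print(strict_keys)
```

With this sample data only abc_flavor_id is picked up; whether the looser or the stricter behaviour is wanted depends on what keys users are allowed to define.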

