Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Regex to match special list items

I have a weird list of items and nested lists, with | as the delimiter and [[ ]] as the brackets. It looks like this:

| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

I want to match items in lists called Ulist* (items 4-8) using RegEx and replace them with Uitem*. The result should look like this:

| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

I tried almost everything I know about RegEx, but I haven't found a RegEx that matches each item inside the Ulists. My current RegEx:

/Ulist(\d+)\[\[(\s*(\|\s*[^\s\|]*)*\s*)*\]\]/i

What is wrong? I am a beginner with RegEx.

This is Python 2.7; specifically, my code is:

    def fixDirtyLists(self, text):
        text = textlib.replaceExcept(text, r'Ulist(\d+)\[\[(\s*(\|\s*[^\s\|]*)*\s*)*\]\]', r'Ulist\1[[ U\3 ]]', '', site=self.site)
        return text

text receives that weird list, and textlib.replaceExcept replaces whatever the RegEx matches with the replacement pattern. Not complicated at all.



1 Reply


If you install the PyPI regex module (with Python 2.7.9+ this takes just pip install regex, run from the Python27\Scripts folder), you will be able to match nested square brackets. You can then match the substrings you need and replace item with Uitem inside those substrings only.
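For context, the pattern in the question falls short for two reasons that a short experiment with the standard re module makes visible: without recursion the match stops at the first ]], and a repeated capturing group only remembers its last repetition, so \3 can never hold every item. The sketch assumes the backslashes lost from the question's pattern were \d, \s, \[, \] and \|; the names QUESTION_PAT and sample are made up for illustration.

```python
import re

# The question's pattern with its backslashes restored (an assumption about the original).
QUESTION_PAT = re.compile(r'Ulist(\d+)\[\[(\s*(\|\s*[^\s\|]*)*\s*)*\]\]', re.I)

# Shortened version of the list from the question.
sample = "| item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9"

m = QUESTION_PAT.search(sample)

# The match ends at the first "]]", so "| item8 ]]" is left outside of it.
print(m.group(0))

# Group 3 was overwritten on every repetition; only the last item survives.
print(m.group(3))
```

This is why the approach here matches the whole bracketed span first and rewrites the items in a callback rather than through a backreference.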

The recursive pattern (note that recursion in the PyPI regex module resembles that of PCRE):

(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])
^-Group1-^^---------------Group2----------------^

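A short explanation: (Ulist\d+) is Group 1, which matches the literal word Ulist followed by one or more digits. It is followed by Group 2, (\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]]), which matches a substring starting with [[ up to the corresponding ]]. Inside Group 2, (?2) recurses into Group 2 itself, which is how nested [[ ... ]] pairs are consumed.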

And the Python code:

>>> import regex
>>> s = "| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] | item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14"
>>> pat = r'(Ulist\d+)(\[\[(?>[^][]|](?!])|\[(?!\[)|(?2))*]])'
>>> res = regex.sub(pat, lambda m: m.group(1) + m.group(2).replace("item", "Uitem"), s)
>>> print(res)
| item1 | item2 | item3 | Ulist1[[ | Uitem4 | Uitem5 | Ulist2[[ | Uitem6 | Uitem7 ]] | Uitem8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14

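To avoid modifying plain (non-U) lists nested inside a Ulist, use

def repl(m):
    # The capturing group makes regex.split keep the nested plain lists, which are passed through unchanged.
    parts = regex.split(r'(\blist\d*\[{2}[^]]*(?:](?!])[^]]*)*]])', m.group(0))
    return "".join(x if x.startswith("list") else x.replace("item", "Uitem") for x in parts)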

and replace the regex.sub with

res = regex.sub(pat, repl, s)
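
If installing a third-party module is not an option, roughly the same result can be had with the standard re module by tracking the nesting by hand instead of using recursion. This is a different technique from the recursive pattern above. It is sketched under the assumption that a list name is a run of word characters directly followed by [[ and that items look like item plus digits. TOKEN and prefix_items are hypothetical names.

```python
import re

# One token is a list opener ("Ulist1[[", "list3[["), a closer ("]]"), or a bare item.
TOKEN = re.compile(r'(?P<open>\w*\[\[)|(?P<close>\]\])|(?P<item>\bitem\d+)')

def prefix_items(text):
    """Prefix items with 'U' when the innermost open list is a Ulist*."""
    stack = []  # one bool per open list: True if that list is a Ulist*

    def repl(m):
        if m.group('open'):
            stack.append(m.group('open').startswith('Ulist'))
        elif m.group('close'):
            if stack:  # tolerate a stray "]]"
                stack.pop()
        elif stack and stack[-1]:
            return 'U' + m.group('item')
        return m.group(0)  # openers, closers and other items stay as they are

    return TOKEN.sub(repl, text)

s = ("| item1 | item2 | item3 | Ulist1[[ | item4 | item5 | Ulist2[[ | item6 | item7 ]] "
     "| item8 ]] | item9 | list3[[ | item10 | item11 | item12 ]] | item13 | item14")
print(prefix_items(s))
```

Like the repl variant, this leaves items in plain lists nested inside a Ulist untouched, because only the innermost open list decides whether an item gets the prefix.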
