Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Python regular expressions with foreign characters in PyQt5

This problem might be very simple, but I find it a bit confusing, which is why I need help.

Following up on a question I posted earlier that got solved, I have just noticed a new issue.


Source code:

from PyQt5 import QtCore, QtWidgets

app = QtWidgets.QApplication([])

def scroll():
    # Print every item whose text contains "cat" as a whole word.
    # QtCore.QRegularExpression(r'\b' + 'cat' + r'\b')
    items = listWidget.findItems(r'\bcat\b', QtCore.Qt.MatchRegularExpression)
    for item in items:
        print(item.text())

window = QtWidgets.QDialog()
window.setLayout(QtWidgets.QVBoxLayout())
listWidget = QtWidgets.QListWidget()
window.layout().addWidget(listWidget)

# The last entry ends in non-ASCII letters.
cats = ["love my cat", "catirization", "cat in the clouds", "catść"]

# Each item is shown as "<index>  <text>".
for i, cat in enumerate(cats):
    QtWidgets.QListWidgetItem(f"{i}  {cat}", listWidget)

btn = QtWidgets.QPushButton('Scroll')
btn.clicked.connect(scroll)
window.layout().addWidget(btn)
window.show()
app.exec_()

Output GUI:

[Screenshot of the PyQt5 output GUI]


Now, as you can see, I am just trying to print the text of the items that match the regex r"\bcat\b" when I press the "Scroll" button, and it works fine!

Output:

0  love my cat
2  cat in the clouds
3  catść

However, as you can see, item #3 should not be printed because it obviously does not match the regular expression r"\bcat\b". It gets printed anyway, and I think the special foreign characters ść are what make it count as a match (which it shouldn't, right?).

I'm expecting an output like:

0  love my cat
2  cat in the clouds

What I have tried

I found this question, and it mentions \p{L}. According to the answer, it means:

If all you want to match is letters (including "international" letters) you can use \p{L}.

To be honest, I'm not sure how to apply that with PyQt5. I still made some attempts, and I tried changing the regex to r'\b'+r'\p{cat}'+r'\b'. However, I got this error:

QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object
QString::contains: invalid QRegularExpression object

Obviously the error says it's not a valid regex. Can someone educate me on how to solve this issue? Thank you!



1 Reply


In general, when you need to make your shorthand character classes and word boundaries Unicode-aware, you need to pass the QRegularExpression.UseUnicodePropertiesOption option to the regex compiler. See the QRegularExpression.UseUnicodePropertiesOption reference:

The meaning of the \w, \d, etc., character classes, as well as the meaning of their counterparts (\W, \D, etc.), is changed from matching ASCII characters only to matching any character with the corresponding Unicode property. For instance, \d is changed to match any character with the Unicode Nd (decimal digit) property; \w to match any character with either the Unicode L (letter) or N (digit) property, plus underscore, and so on. This option corresponds to the /u modifier in Perl regular expressions.
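To see the same ASCII-versus-Unicode distinction outside Qt, here is a small sketch using Python's built-in re module as a stand-in. It is only an analogy, not Qt's PCRE engine: re.ASCII restricts \w and \b to ASCII, roughly like Qt's default, while Python's default mode for str patterns is Unicode-aware. The items list and the find helper are illustrative assumptions, and the last item is assumed to end in non-ASCII letters.

```python
import re

# Sample strings mirroring the question; the last one ends in non-ASCII letters (assumed).
items = ["love my cat", "catirization", "cat in the clouds", "catść"]

def find(pattern, flags=0):
    """Return the items in which the pattern matches anywhere."""
    rx = re.compile(pattern, flags)
    return [s for s in items if rx.search(s)]

# ASCII-only word characters: "ś" is not a word character, so \b matches after "cat".
ascii_hits = find(r"\bcat\b", re.ASCII)

# Unicode-aware word characters: "ś" is a letter, so there is no boundary after "cat".
unicode_hits = find(r"\bcat\b")

print(ascii_hits)
print(unicode_hits)
```

In ASCII mode the last item is reported as a match. In Unicode-aware mode it is not, which is the behaviour the question asks for.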

In Python, you could declare it as

rx = QtCore.QRegularExpression(r'\bcat\b', QtCore.QRegularExpression.UseUnicodePropertiesOption)

However, since QListWidget.findItems does not accept a QRegularExpression as an argument and only takes the pattern as a string, you can use the (*UCP) PCRE verb instead:

r'(*UCP)\bcat\b'

Make sure you place it at the very beginning of the pattern.
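A minimal sketch for checking this without opening the GUI, assuming PyQt5 is installed. It uses only QtCore.QRegularExpression, so no QApplication is needed. The has_match helper is hypothetical, and the sample list mirrors the question, with the last entry assumed to end in non-ASCII letters. The unanchored match approximates what findItems does, judging by the QString::contains message in the question.

```python
from PyQt5.QtCore import QRegularExpression

def has_match(pattern, text, options=QRegularExpression.NoPatternOption):
    """Return True if the Qt regex `pattern` is valid and matches anywhere in `text`."""
    rx = QRegularExpression(pattern, options)
    return rx.isValid() and rx.match(text).hasMatch()

# Sample strings mirroring the question; the last one ends in non-ASCII letters (assumed).
cats = ["love my cat", "catirization", "cat in the clouds", "catść"]

# Default behaviour: \b only knows ASCII word characters.
ascii_hits = [c for c in cats if has_match(r"\bcat\b", c)]

# (*UCP) at the start of the pattern switches on Unicode properties.
ucp_hits = [c for c in cats if has_match(r"(*UCP)\bcat\b", c)]

# Same effect through the pattern option, where a QRegularExpression object can be used.
option_hits = [
    c for c in cats
    if has_match(r"\bcat\b", c, QRegularExpression.UseUnicodePropertiesOption)
]

print(ascii_hits)
print(ucp_hits)
print(option_hits)
```

In the question's scroll() function, this means passing r'(*UCP)\bcat\b' as the pattern string to listWidget.findItems together with QtCore.Qt.MatchRegularExpression.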

