Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Removing emojis from a string in Python

I found this Python code for removing emojis, but it is not working. Can you suggest other code or a fix for this one?

I have observed that all my emojis start with \xf, but when I try to search with str.startswith("\xf") I get an invalid character error.

emoji_pattern = r'/[x{1F601}-x{1F64F}]/u'
re.sub(emoji_pattern, '', word)

Here's the error:

Traceback (most recent call last):
  File "test.py", line 52, in <module>
    re.sub(emoji_pattern,'',word)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

Each item in the list can be a word, for example ['This', 'dog', '\xf0\x9f\x98\x82', 'https://t.co/5N86jYipOI']

UPDATE: I used this other code:

emoji_pattern = re.compile(ur"""[\U0001F600-\U0001F64F] # emoticons
                                 |
                                 [\U0001F300-\U0001F5FF] # symbols & pictographs
                                 |
                                 [\U0001F680-\U0001F6FF] # transport & map symbols
                                 |
                                 [\U0001F1E0-\U0001F1FF] # flags (iOS)
                          """, re.VERBOSE)

emoji_pattern.sub('', word)

But this still doesn't remove the emojis; they are still shown. Any clue why that is?


1 Reply


On Python 2, you have to use a u'' literal to create a Unicode string. You should also pass the re.UNICODE flag and convert your input data to Unicode (e.g., text = data.decode('utf-8')):

#!/usr/bin/env python
import re

text = u'This dog \U0001f602'
print(text) # with emoji

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           "]+", flags=re.UNICODE)
print(emoji_pattern.sub(r'', text)) # no emoji

Output

This dog 😂
This dog 
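
The word list in the question holds UTF-8 byte strings, so each item needs decoding before the pattern can match. Here is a minimal sketch of that step, written for Python 3. The helper name strip_emoji and the choice to drop words that become empty are assumptions for illustration, not part of the original code:

```python
import re

# Same four ranges as the pattern above; they cover only some emoji.
emoji_pattern = re.compile(
    u"["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"]+", flags=re.UNICODE)


def strip_emoji(words):
    """Hypothetical helper: decode byte strings, remove emoji, drop empty words."""
    cleaned = []
    for word in words:
        # Raw input such as b'\xf0\x9f\x98\x82' is UTF-8 bytes, not text.
        if isinstance(word, bytes):
            word = word.decode('utf-8')
        word = emoji_pattern.sub(u'', word)
        if word:  # assumption: a word that was only emoji is discarded
            cleaned.append(word)
    return cleaned


words = [b'This', b'dog', b'\xf0\x9f\x98\x82', b'https://t.co/5N86jYipOI']
print(strip_emoji(words))
```

On Python 3 this prints ['This', 'dog', 'https://t.co/5N86jYipOI']. If you would rather keep an empty placeholder for removed words, drop the `if word:` check.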

Note: emoji_pattern matches only some emoji (not all). See Which Characters are Emoji.
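
To see what "only some emoji" means in practice, here is a small check written for Python 3. The helper name is_matched and the sample characters are chosen for illustration; U+1F914 and U+2764 lie outside the four ranges in the pattern:

```python
import re

# Same four ranges as in the answer's pattern.
emoji_pattern = re.compile(
    u"["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"]+", flags=re.UNICODE)


def is_matched(char):
    """Hypothetical helper: True if the pattern finds the character."""
    return emoji_pattern.search(char) is not None


samples = [
    (u"\U0001F600", "U+1F600 grinning face"),
    (u"\U0001F914", "U+1F914 thinking face"),
    (u"\u2764", "U+2764 heavy black heart"),
]
for char, label in samples:
    print(label, is_matched(char))
```

Only the first sample falls inside the listed ranges, so the other two would pass through emoji_pattern.sub untouched. Covering them means adding more ranges to the character class.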

