I am trying the following:
>>> import string
>>> s = 'https://google.com <0x03><0x03><0x03>'
>>> s.decode('utf8').encode('ascii', errors='ignore')
The expected output is:
'https://google.com'
But the hex characters and the newline are not removed.
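To make the goal concrete, here is a minimal sketch of the behaviour I am after. It assumes the `<0x03>` placeholders stand for real control characters (`\x03`) and that the string ends with a newline; the helper name `strip_nonprintable` is hypothetical and only for illustration.

```python
import re

# Assumption: "<0x03>" in the post stands for the control character "\x03",
# and the input ends with a newline.
s = 'https://google.com \x03\x03\x03\n'

def strip_nonprintable(text):
    """Drop every character outside printable ASCII (space through tilde),
    then trim leading and trailing whitespace."""
    return re.sub(r'[^ -~]', '', text).strip()

print(repr(strip_nonprintable(s)))  # 'https://google.com'
```

The character class `[^ -~]` matches a single character outside the printable ASCII range, so each unwanted character is removed individually and the rest of the string is left alone.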
This code:
import string
import re
s = 'https://google.com <0x03><0x03><0x03>'
s = re.sub(r'[^ -~].*'.format(string.punctuation), '', s)
print(s)
gives this: