Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - read a bytes string from a file in Python 3

Each line of the file looks like the following, and the file is UTF-8 encoded:

cd232704-a46f-3d9d-97f6-67edb897d65f    b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'
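The `b'...'` text with `\x..` escapes is exactly what `str()` produces for a bytes object, so one plausible origin of this file (an assumption; the question doesn't say how the file was written) is code that formatted already-encoded bytes into a text line. A minimal sketch reproducing the format, where `record_id` and `text` are hypothetical sample values:

```python
# Hypothetical reconstruction: how a line in this format could have been written.
record_id = "cd232704-a46f-3d9d-97f6-67edb897d65f"
text = "excited \u2014 but she\u2019s"  # em dash (U+2014) and curly apostrophe (U+2019)

# Putting a bytes object in an f-string inserts str(bytes), which is its repr,
# so the file receives the characters b'...\xe2\x80\x94...' instead of the text.
line = f"{record_id}\t{text.encode('utf-8')}"
print(line)

# Writing the str itself (to a file opened with encoding='utf-8') keeps the
# real characters, so no extra parsing step is needed when reading it back.
good_line = f"{record_id}\t{text}"
```

If you control the writer, fixing it this way is usually simpler than parsing the literals afterwards.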

Here is my code:

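with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.rstrip('\n').split('\t')  # ID and bytes literal are tab-separated
        print(tokens[1])  # prints the literal text b'...\xe2\x80\x94...', not the decoded string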

I want to get the decoded text instead: "this Friday, Gerda Scheuers will be excited - but she's most excited about the merchandise the movie will bring."

print(b'\xe2\x80\x94'.decode('utf-8'))  # decodes the UTF-8 bytes into a str (an em dash), not into ASCII

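That works for a bytes literal typed into the source, but I can't get the same result from the file: in text mode, each line holds the characters b'...\xe2\x80\x94...' as plain text rather than actual bytes. If I open the file in binary mode instead, I have to decode each line before I can split it.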
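To see why neither mode yields the real UTF-8 bytes, here is a minimal sketch that writes one sample line in the question's format to a temporary file (the sample data and the use of `tempfile` are assumptions for illustration) and reads it back both ways:

```python
import os
import tempfile

# One sample line in the question's format: ID, a tab, then a bytes literal as text.
sample = "cd232704-a46f-3d9d-97f6-67edb897d65f\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Text mode: the second column is a str that only looks like a bytes literal.
with open(path, "r", encoding="utf-8") as f_in:
    text_token = f_in.readline().rstrip("\n").split("\t")[1]
print(repr(text_token))

# Binary mode: the same ASCII characters (backslash, 'x', 'e', '2'), now as bytes.
with open(path, "rb") as f_in:
    raw_token = f_in.readline().rstrip(b"\n").split(b"\t")[1]
print(b"\xe2" in raw_token, b"\\xe2" in raw_token)  # False True

os.remove(path)
```

The file never contains the byte 0xE2 itself, only a textual escape for it, so the escapes have to be interpreted after reading.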



1 Reply


You can use ast.literal_eval to convert the bytes literal (the text b'...' read from the file) into a real bytes object, and then decode that to get a str:

>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
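A small helper can wrap both steps and reject anything that isn't a bytes literal. This is a sketch; the name `decode_bytes_literal` is hypothetical, and the type check is an extra safeguard not present in the answer. Unlike `eval`, `ast.literal_eval` only accepts Python literals, so text such as `__import__('os')` raises `ValueError` instead of being executed:

```python
import ast


def decode_bytes_literal(literal: str, encoding: str = "utf-8") -> str:
    r"""Evaluate text such as b'she\xe2\x80\x99s' and decode the resulting bytes.

    ast.literal_eval only accepts Python literals, so input such as
    "__import__('os')" raises ValueError instead of being executed.
    """
    value = ast.literal_eval(literal.strip())  # strip() drops a trailing newline
    if not isinstance(value, bytes):
        raise TypeError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


# ascii() keeps the printed result ASCII-only so the decoded characters are visible.
print(ascii(decode_bytes_literal(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")))
# 'excited \u2014 but she\u2019s'
```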

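import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.rstrip('\n').split('\t')  # ID and bytes literal are tab-separated
        if len(tokens) < 2:
            continue  # skip blank or malformed lines
        bytes_part = ast.literal_eval(tokens[1])  # text "b'...'" -> real bytes object
        s = bytes_part.decode('utf-8')  # decode the UTF-8 bytes into a str
        print(s)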
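For a runnable end-to-end check, the loop can be wrapped in a generator and tried on a throwaway file. This is a sketch under two assumptions: the columns are separated by a single tab (as the question's sample suggests) and the bytes are UTF-8. The name `read_records` is hypothetical:

```python
import ast
import os
import tempfile


def read_records(path):
    """Yield (id, decoded_text) pairs from lines like "<id>\\tb'...'"."""
    with open(path, "r", encoding="utf-8") as f_in:
        for line in f_in:
            tokens = line.rstrip("\n").split("\t", 1)  # split only at the first tab
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            record_id, literal = tokens
            yield record_id, ast.literal_eval(literal).decode("utf-8")


# Usage: one sample line plus a blank line in a temporary file.
sample = "cd232704-a46f-3d9d-97f6-67edb897d65f\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

records = list(read_records(path))
os.remove(path)
for record_id, text in records:
    print(record_id, ascii(text))
```

Splitting with `maxsplit=1` keeps the whole literal together even if it happens to contain a tab escape later in the line.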
