python - How to decode a binary file with "for index, line in enumerate(file)"?

Question

posted Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I am opening an extremely large binary file in Python 3.5, in file1.py:

with open(pathname, 'rb') as file:
    for i, line in enumerate(file):
        pass  # parsing here

However, I naturally get an error: because the file is opened in binary mode, every line the loop yields is a bytes object. My parsing code then mixes str and bytes, and that is where it fails.

If I were reading all the lines into a list, I would do this:

with open(fname, 'rb') as f:
    lines = [x.decode('utf8').strip() for x in f.readlines()]

However, I am using for index, line in enumerate(file): instead. What is the correct approach in this case? Should I decode each line as the loop yields it?

Here is the actual code I am running:

with open(bam_path, 'rb') as file:
    for i, line in enumerate(file):
        line_data=pd.DataFrame({k.strip():v.strip()
            for k,_,v in (e.partition(':')
                for e in line.split('	'))}, index=[i])

And here is the error:

Traceback (most recent call last):
  File "file1.py", line 18, in <module>
    for e in line.split('	'))}, index=[i])
TypeError: a bytes-like object is required, not 'str'

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:13:44+0000

You could pass enumerate a generator expression that yields the decoded lines:

for i, line in enumerate(l.decode(errors='ignore') for l in f):

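This yields every line in f after decoding it, one at a time, so the file is never loaded into memory as a whole. I added errors='ignore' because opening the file in text mode ('r') failed with a decoding error (an invalid start byte). Note that decode() defaults to UTF-8 and that errors='ignore' silently drops any bytes that cannot be decoded.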
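To make the one-liner above concrete, here is a minimal sketch of the whole loop. It assumes UTF-8 and the tab-separated key:value layout from the question; the helper name parse_decoded_lines and the sample bytes are made up for illustration, io.BytesIO stands in for open(bam_path, 'rb'), and plain dicts are used instead of a DataFrame to keep the example self-contained.

```python
import io


def parse_decoded_lines(binary_file):
    """Yield (index, fields) for each line of a file opened in binary mode.

    Hypothetical helper: the name and the dict output are illustrative only.
    """
    # Decode lazily, one line at a time; undecodable bytes are dropped.
    decoded = (raw.decode('utf-8', errors='ignore') for raw in binary_file)
    for i, line in enumerate(decoded):
        fields = {
            key.strip(): value.strip()
            for key, _, value in (entry.partition(':') for entry in line.split('\t'))
        }
        yield i, fields


# io.BytesIO stands in for open(bam_path, 'rb'); the content is invented.
# The second line contains \xff, which is not valid UTF-8 and gets ignored.
sample = io.BytesIO(b"name:read1\tflag:0\nname:read2\tflag:\xff16\n")
rows = list(parse_decoded_lines(sample))
print(rows)
```

Because the decoding happens inside a generator expression, only one line is held in memory at a time, which matters for a very large file.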

As an aside, you could instead replace every string literal with a bytes literal when operating on bytes, i.e. partition(b':') and split(b'\t'), and do all your work on bytes (pretty sure pandas works fine with them).
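A rough sketch of that bytes-only variant follows, assuming the same tab-separated key:value layout as the question's code. The helper name parse_bytes_lines and the sample data are hypothetical, and io.BytesIO again stands in for the real file.

```python
import io


def parse_bytes_lines(binary_file):
    """Yield (index, fields) without decoding; keys and values stay bytes.

    Hypothetical helper: the name and the dict output are illustrative only.
    """
    for i, line in enumerate(binary_file):
        fields = {
            key.strip(): value.strip()
            # bytes literals here, so bytes methods get bytes arguments
            for key, _, value in (entry.partition(b':') for entry in line.split(b'\t'))
        }
        yield i, fields


# io.BytesIO stands in for open(bam_path, 'rb'); the content is invented.
sample = io.BytesIO(b"name:read1\tflag:0\n")
byte_rows = list(parse_bytes_lines(sample))
print(byte_rows)
```

The keys and values come out as bytes objects (b'name', b'read1'), so any later comparison or lookup has to use bytes literals as well, or decode the individual fields at the point where text is actually needed.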
