Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.5k views
in Technique[技术] by (71.8m points)

python - openpyxl load_workbook() on a legit .xlsx file leads to a zipfile.BadZipFile error

What I was trying to do is appening dataframe data to an existing legit excel file. I used load_workbook() function from openpyxl, but it systematically returns an error. Here is some code that crashes on my machine:

from openpyxl import load_workbook

report_path = root_folder + '\log_report.xlsx'
writer = pd.ExcelWriter(report_path, engine='openpyxl')
writer.book = load_workbook(report_path)
writer.close()

Here, log_report.xlsx already exists and have been created through pandas .to_excel(). Before opening with openpyxl load_workbook(), it is possible to open it, to edit it and do anything MS Excel allows. I got the following error returned:

Traceback (most recent call last):
  File "D:/failsafe_counter/main.py", line 419, in <module>
    writer.book = load_workbook(report_path)
  File "D:failsafe_countervenvlibsite-packagesopenpyxl
eaderexcel.py", line 315, in load_workbook
    reader = ExcelReader(filename, read_only, keep_vba,
  File "D:failsafe_countervenvlibsite-packagesopenpyxl
eaderexcel.py", line 124, in __init__
    self.archive = _validate_archive(fn)
  File "D:failsafe_countervenvlibsite-packagesopenpyxl
eaderexcel.py", line 96, in _validate_archive
    archive = ZipFile(filename, 'r')
  File "C:UsersXXXXAppDataLocalProgramsPythonPython38-32libzipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "C:UsersXXXXAppDataLocalProgramsPythonPython38-32libzipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Important aspect of that is this operation left the initial excel file corrupted, and impossible to open again.

Initial file is a legit zipfile (verified through renaming the xlsx into a .zip). Once the little code above returns the error, it turns the Excel file into an empty archive (verified through the same process).

I employed such functions with success on my previous laptop (under windows 7) but since I migrated under windows 10 I'm not able to use them anymore. Both of them were running python 3.8.

Is there any known issue about openpyxl load_workbook() on some configs? Do you have any idea how to fix this, or any workaround?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I was able to replicate the problem. It is pandas related. Everything works just fine up to pandas 1.1.5 In pandas 1.2.0 they did some changes

At the time when you instantiate pd.ExcelWriter with

writer = pd.ExcelWriter(report_path, engine='openpyxl')`

it creates empty file with size 0 bytes and overwrites the existing file and then you get error when try to load it. It is not openpyxl related, because with latest version of openpyxl it works fine with pandas 1.1.5.

The solution - specify mode='a', change the above line to

writer = pd.ExcelWriter(report_path, engine='openpyxl', mode='a')

Alternatively - look at @CharlieClark solution or this solution where they loads the file before instantiating the pd.ExcelWriter..


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...