Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
899 views
in Technique[技术] by (71.8m points)

utf 8 - Ansi to UTF-8 using python causing error

While I was trying to write a python program that converts Ansi to UTF-8, I found this

https://stackoverflow.com/questions/14732996/how-can-i-convert-utf-8-to-ansi-in-python

which converts UTF-8 to Ansi.

I thought it will just work by reversing the order. So I coded

file_path_ansi = "input.txt"
file_path_utf8 = "output.txt"

#open and encode the original content
file_source = open(file_path_ansi, mode='r', encoding='latin-1', errors='ignore')
file_content = file_source.read()
file_source.close

#write 
file_target = open(file_path_utf8, mode='w', encoding='utf-8')
file_target.write(file_content)
file_target.close

But it causes error.

TypeError: file<> takes at most 3 arguments <4 given>

So I changed

file_source = open(file_path_ansi, mode='r', encoding='latin-1', errors='ignore')

to

file_source = open(file_path_ansi, mode='r', encoding='latin-1')

Then it causes another error:

TypeError: 'encoding' is an invalid keyword arguemtn for this function

How should I fix my code to solve this problem?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You are trying to use the Python 3 version of the open() function, on Python 2. Between the major versions, I/O support was overhauled, supporting better encoding and decoding.

You can get the same new version in Python 2 as io.open() instead.

I'd use the shutil.copyfileobj() function to do the copying, so you don't have to read the whole file into memory:

import io
import shutil

with io.open(file_path_ansi, encoding='latin-1', errors='ignore') as source:
    with io.open(file_path_utf8, mode='w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)

Be careful though; most people talking about ANSI refer to one of the Windows codepages; you may really have a file in CP (codepage) 1252, which is almost, but not quite the same thing as ISO-8859-1 (Latin 1). If so, use cp1252 instead of latin-1 as the encoding parameter.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...