Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
510 views
in Technique[技术] by (71.8m points)

python 3.x - How to cast a string to bytes without encoding

I have a bunch of binary data that comes to python via a char* from some C interface (not under my control) so I have a string of arbitrary binary data (what is normally a byte array). I would like to convert it to a byte array to simplify using it with other python functions but I can't seem to figure out how.

Examples that don't work:

data = rawdatastr.encode() this assumes "utf-8" and mangles the data == BAD

data = rawdatastr.encode('ascii','ignore') strips chars over 127 == BAD

data = rawdatastr.encode('latin1') not sure -- this is the closest so far but I have no proof that it is working for all bytes.

data = array.array('B', [x for x in map(ord,data)]).tobytes() This works but seems like a lot of work to do something simple. Is there something simpler?

I am thinking I need to write my own identity encoding that just passes the bytes along (I think latin1 does this based upon some reading but no proof thus far).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Though I suspect something else is decoding your data for you (a char* in C is usually best represented as bytes, especially if it is binary data):

The latin1 codec can round trip every byte. You can verify this with the following short program:

>>> s = ''.join(chr(i) for i in range(0x100))
>>> s
'x00x01x02x03x04x05x06x07x08
x0bx0c
x0ex0fx10x11x12x13x14x15x16x17x18x19x1ax1bx1cx1dx1ex1f !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~x7fx80x81x82x83x84x85x86x87x88x89x8ax8bx8cx8dx8ex8fx90x91x92x93x94x95x96x97x98x99x9ax9bx9cx9dx9ex9fxa0?¢£¤¥|§¨?a??xad?ˉ°±23′μ?·?1o?????àá??????èéê?ìí??D?òó???×?ùú?üYT?àáa?????èéê?ìí??e?òó???÷?ùú?üyt?'
>>> s2 = s.encode('latin1').decode('latin1')
>>> s2 == s
True
>>> sb = bytes(range(0x100))
>>> sb
b'x00x01x02x03x04x05x06x07x08
x0bx0c
x0ex0fx10x11x12x13x14x15x16x17x18x19x1ax1bx1cx1dx1ex1f !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~x7fx80x81x82x83x84x85x86x87x88x89x8ax8bx8cx8dx8ex8fx90x91x92x93x94x95x96x97x98x99x9ax9bx9cx9dx9ex9fxa0xa1xa2xa3xa4xa5xa6xa7xa8xa9xaaxabxacxadxaexafxb0xb1xb2xb3xb4xb5xb6xb7xb8xb9xbaxbbxbcxbdxbexbfxc0xc1xc2xc3xc4xc5xc6xc7xc8xc9xcaxcbxccxcdxcexcfxd0xd1xd2xd3xd4xd5xd6xd7xd8xd9xdaxdbxdcxddxdexdfxe0xe1xe2xe3xe4xe5xe6xe7xe8xe9xeaxebxecxedxeexefxf0xf1xf2xf3xf4xf5xf6xf7xf8xf9xfaxfbxfcxfdxfexff'
>>> sb == s.encode('latin1')
True

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...