Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python unpack little endian

I'm trying to use Python to read a binary file. The file is in LSB (little-endian) mode. I import the struct module and use unpack like this:

import sys
from struct import unpack

with open(sys.argv[1], 'rb') as f:
    # '<I' = little-endian unsigned 32-bit integer
    contents = unpack('<I', f.read(4))[0]
print(contents)

The data in the file is 0XC0000500 in LSB mode, and the actual value is 0X000500C0. So you can see that the smallest unit of the LSB mode is one byte.

However, I use a Mac machine, and perhaps because of the version of my gcc or my machine (I am not sure; I just read http://docs.python.org/library/struct.html about sizes and sys.byteorder), the result from the above code is 0X0500C000, so the unit of the LSB mode seems to be 2 bytes.

How should I solve this problem?

I will keep digging whether or not this question is answered, and I will update it if I find something.

ps: The data file is an ELF file for a 32-bit machine.

pps: Since I am going to read a huge amount of data, and this is a general problem when reading it, the manual way is not the best for me. The question is still open for answers.

ppps: < means "little-endian,standard size (16 bit)" Now I read this...


1 Reply


If the actual value is 0XABCD, then the file stores DCBA.

Usually, byte order defines the order of bytes, not of individual bits inside a byte. "\xDC\xBA" is two bytes (16 bits). If you swap the bytes, the only possible results are:

>>> import binascii, struct
>>> "0X%04X" % struct.unpack("<H", binascii.unhexlify("DCBA"))
'0XBADC'
>>> "0X%04X" % struct.unpack(">H", binascii.unhexlify("DCBA"))
'0XDCBA'

Here's what 0xabcd looks like in little-endian and big-endian format:

>>> struct.pack('<H', 0xabcd)
b'\xcd\xab'
>>> struct.pack('>H', 0xabcd)
b'\xab\xcd'

To get 0XABCD from "\xDC\xBA" you would need to swap half-bytes (4 bits each), which would be unusual.
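To make the half-byte point concrete, here is a minimal sketch of what that conversion would involve. The helper name swap_nibbles16 is made up for illustration; it is not part of the struct module.

```python
import struct

def swap_nibbles16(value):
    """Swap the two 4-bit halves of each byte in a 16-bit value (hypothetical helper)."""
    return ((value & 0x0F0F) << 4) | ((value & 0xF0F0) >> 4)

raw = b"\xDC\xBA"
# An ordinary little-endian read swaps whole bytes: DC BA -> 0xBADC
(word,) = struct.unpack("<H", raw)
# Getting 0xABCD additionally requires swapping the nibbles inside each byte
result = swap_nibbles16(word)
print("0X%04X" % word, "0X%04X" % result)
```

No struct format character does the second step, which is why such a layout would be surprising in a real file.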

Since I am going to read a huge amount of data

You could use the array module to read multiple values at once. Its type codes are similar to the format characters of the struct module, but it always uses the native size and byte order.
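A rough sketch of the array approach, assuming the data is a run of little-endian unsigned 32-bit values. The function name read_u32_le is hypothetical, and because array type codes have platform-dependent sizes the code checks itemsize instead of assuming 'I' is 4 bytes.

```python
import array
import sys

def read_u32_le(data):
    """Decode bytes as little-endian unsigned 32-bit ints (illustrative helper)."""
    # array sizes are native, so look for a type code that is 4 bytes here
    for code in ("I", "L"):
        if array.array(code).itemsize == 4:
            break
    else:
        raise RuntimeError("no 4-byte unsigned array type code on this platform")
    values = array.array(code)
    values.frombytes(data)          # data length must be a multiple of 4
    if sys.byteorder == "big":
        values.byteswap()           # array reads in native order; fix it up
    return values

sample = b"\xC0\x00\x05\x00\x01\x00\x00\x00"
print(["0X%08X" % v for v in read_u32_le(sample)])
```

For a real file you would pass in the bytes returned by f.read(), or fill the array with its fromfile method.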

< means "little-endian,standard size (16 bit)"

If you use the < or > prefix with the struct module, then the standard sizes are fixed and independent of the platform. The standard size depends only on the format character. In particular, '<H' is always 2 bytes (16 bits) and '<I' is always 4 bytes (32 bits). Only the @ prefix uses native sizes.
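You can check the sizes yourself with struct.calcsize. This is only a quick sketch; the variable names are arbitrary.

```python
import struct

# Standard sizes: the same on every platform when a <, > or = prefix is used
fixed = {fmt: struct.calcsize(fmt) for fmt in ("<H", "<I", ">I", "=I", "<Q")}
print(fixed)

# Native size: depends on the platform and compiler (commonly 4 or 8 for 'L')
native_long = struct.calcsize("@L")
print("@L", native_long)
```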

Old answer

Left here so that the comments make sense.

You could read it as two 2-byte values and convert them to an int manually:

>>> hi, lo = struct.unpack("<HH", b"\x05\x00\xC0\x00")
>>> n = (hi << 16) | lo
>>> n
327872
>>> "0X%08X" % n
'0X000500C0'
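If the data really were laid out that way, the same idea could be applied to many values in one call. This is a sketch only: unpack_word_swapped_u32 is a made-up name, and the layout (high 16-bit word first, each word little-endian) is an assumption taken from the example above.

```python
import struct

def unpack_word_swapped_u32(data):
    """Decode 32-bit values stored as two little-endian 16-bit words, high word first."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4 bytes")
    # Read every 16-bit word at once, then pair them up as (hi, lo)
    halves = struct.unpack("<%dH" % (len(data) // 2), data)
    return [(hi << 16) | lo for hi, lo in zip(halves[0::2], halves[1::2])]

values = unpack_word_swapped_u32(b"\x05\x00\xC0\x00\x00\x00\x01\x00")
print(["0X%08X" % v for v in values])
```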
