Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How can I understand the contents of a .pyc file?

I have a .pyc file. I need to understand the content of that file to learn how Python's disassembler works, i.e. how can I generate output like that of dis.dis(function) from the contents of a .pyc file?

For example:

>>> def sqr(x):  
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        

I need to get output like this from the .pyc file.


1 Reply


.pyc files contain some metadata followed by a marshaled code object. To load the code object and disassemble it, use:

import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8,  (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 3)),     # added 4 bytes file size (source size, since Python 3.3)
    # bytes 4-8 are flags, meaning of 9-16 depends on what flags are set
    # bit 0 not set: 9-12 timestamp, 13-16 file size
    # bit 0 set: 9-16 file hash (SipHash-2-4, k0 = 4 bytes of the file, k1 = 0)
    (16, (3, 7)),     # inserted 4 bytes bit flag field at 4-8 
    # future version may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]
header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

# pycfile must already hold the path of the .pyc file to inspect
with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)          # rest is a marshalled code object

dis.dis(code)
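
To see what the metadata bytes actually hold, the header can be unpacked with the struct module. The following is only a minimal sketch, assuming the 16-byte layout used by Python 3.7 and newer as described in the comments above; parse_pyc_header is a hypothetical helper name, not something from the standard library, and older header layouts would need different offsets.

```python
import struct


def parse_pyc_header(header):
    """Decode a 16-byte .pyc header (assumed Python 3.7+ layout)."""
    if len(header) != 16:
        raise ValueError("expected exactly 16 header bytes")
    info = {"magic": header[:4]}
    # Header integers are stored as little-endian unsigned 32-bit values.
    (flags,) = struct.unpack("<I", header[4:8])
    info["flags"] = flags
    if flags & 0x1:
        # Bit 0 set: hash-based pyc, the last 8 bytes are the source hash.
        info["source_hash"] = header[8:16]
    else:
        # Bit 0 not set: 4-byte source mtime, then 4-byte source size.
        mtime, size = struct.unpack("<II", header[8:16])
        info["mtime"] = mtime
        info["source_size"] = size
    return info
```

With the script above, parse_pyc_header(metadata) would return a dictionary describing the metadata that was read before the marshaled code object.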

Demo with the bisect module:

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 3)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', bisect.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)          # rest is a marshalled code object
... 
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              2 STORE_NAME               0 (__doc__)

  3           4 LOAD_CONST              12 ((0, None))
              6 LOAD_CONST               3 (<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>)
              8 LOAD_CONST               4 ('insort_right')
             10 MAKE_FUNCTION            1 (defaults)
             12 STORE_NAME               1 (insort_right)

 15          14 LOAD_CONST              13 ((0, None))
             16 LOAD_CONST               5 (<code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>)
             18 LOAD_CONST               6 ('bisect_right')
             20 MAKE_FUNCTION            1 (defaults)
             22 STORE_NAME               2 (bisect_right)

 36          24 LOAD_CONST              14 ((0, None))
             26 LOAD_CONST               7 (<code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>)
             28 LOAD_CONST               8 ('insort_left')
             30 MAKE_FUNCTION            1 (defaults)
             32 STORE_NAME               3 (insort_left)

 49          34 LOAD_CONST              15 ((0, None))
             36 LOAD_CONST               9 (<code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>)
             38 LOAD_CONST              10 ('bisect_left')
             40 MAKE_FUNCTION            1 (defaults)
             42 STORE_NAME               4 (bisect_left)

 71          44 SETUP_FINALLY           12 (to 58)

 72          46 LOAD_CONST               1 (0)
             48 LOAD_CONST              11 (('*',))
             50 IMPORT_NAME              5 (_bisect)
             52 IMPORT_STAR
             54 POP_BLOCK
             56 JUMP_FORWARD            20 (to 78)

 73     >>   58 DUP_TOP
             60 LOAD_NAME                6 (ImportError)
             62 COMPARE_OP              10 (exception match)
             64 POP_JUMP_IF_FALSE       76
             66 POP_TOP
             68 POP_TOP
             70 POP_TOP

 74          72 POP_EXCEPT
             74 JUMP_FORWARD             2 (to 78)
        >>   76 END_FINALLY

 77     >>   78 LOAD_NAME                2 (bisect_right)
             80 STORE_NAME               7 (bisect)

 78          82 LOAD_NAME                1 (insort_right)
             84 STORE_NAME               8 (insort)
             86 LOAD_CONST               2 (None)
             88 RETURN_VALUE

Disassembly of <code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>:
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>:
 26           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 27           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 28     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 29          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 30     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 31          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 32          52 LOAD_FAST                1 (x)
             54 LOAD_FAST                0 (a)
             56 LOAD_FAST                4 (mid)
             58 BINARY_SUBSCR
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       70
             64 LOAD_FAST                4 (mid)
             66 STORE_FAST               3 (hi)
             68 JUMP_ABSOLUTE           32

 33     >>   70 LOAD_FAST                4 (mid)
             72 LOAD_CONST               5 (1)
             74 BINARY_ADD
             76 STORE_FAST               2 (lo)
             78 JUMP_ABSOLUTE           32

 34     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Disassembly of <code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>:
 45           0 LOAD_GLOBAL              0 (bisect_left)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 46          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>:
 60           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 61           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 62     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 63          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 64     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 65          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 66          52 LOAD_FAST                0 (a)
             54 LOAD_FAST                4 (mid)
             56 BINARY_SUBSCR
             58 LOAD_FAST                1 (x)
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       74
             64 LOAD_FAST                4 (mid)
             66 LOAD_CONST               5 (1)
             68 BINARY_ADD
             70 STORE_FAST               2 (lo)
             72 JUMP_ABSOLUTE           32

 67     >>   74 LOAD_FAST                4 (mid)
             76 STORE_FAST               3 (hi)
             78 JUMP_ABSOLUTE           32

 68     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Note that this output separates the top-level code object, which defines the module, from the code objects of functions and classes. In Python 3.6 and older, the dis.dis() function won't recurse into nested code objects. In those versions, if you want to analyse the functions contained in the module, you'll need to load the nested code objects from the top-level code.co_consts array. For example, the insort_right function's code object is loaded with LOAD_CONST 3, so you look for the code object at that index:

>>> code.co_consts[3]
<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>
>>> dis.dis(code.co_consts[3])
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE
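
Looking up each index by hand gets tedious for larger modules, so the co_consts lookup can be automated. This is a rough sketch, assuming Python 3.3 or newer for the yield from syntax; iter_code_objects is a hypothetical helper name introduced here for illustration.

```python
import types


def iter_code_objects(code):
    """Yield code, then every code object nested in its co_consts, depth first."""
    yield code
    for const in code.co_consts:
        # Functions, classes, lambdas and comprehensions are stored as
        # code objects among the constants of the enclosing code object.
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)
```

On Python 3.6 and older, something like `for obj in iter_code_objects(code): dis.dis(obj)` would then disassemble every level. On 3.7 and newer dis.dis() already recurses, so the same loop would print the nested objects more than once.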

I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and marshal module.
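
One way to act on that advice is to compare the file's magic number with the running interpreter's before unmarshalling anything. The sketch below is only an illustration: load_pyc_code is a hypothetical helper name, the default header_size of 16 assumes Python 3.7 or newer, and importlib.util.MAGIC_NUMBER is available from Python 3.4 onwards.

```python
import importlib.util
import marshal


def load_pyc_code(path, header_size=16):
    """Return the code object from a .pyc written by the running Python."""
    with open(path, "rb") as f:
        header = f.read(header_size)
        # The first 4 bytes identify the bytecode version of the writer.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("%s was not compiled by this Python version" % path)
        return marshal.load(f)
```

The returned object can then be passed to dis.dis() as before; a ValueError signals that a different interpreter version should be used to read the file.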

