Decrypt PYC

PYC

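A PYC file is the format produced when a Python source file is compiled. It speeds up loading and can also be used to expose a module's functions to other Python programs.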

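Its layout is shown below: a 4-byte Magic field at the start, a 4-byte timestamp, and then an r_object structure (this is the Python 2 layout). The r_object is the main content of the PYC file and holds everything produced by compiling the Python source: opcodes, constants, values, and related information. The details are in Python/marshal.c in the CPython source, which implements the built-in marshal module.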

typedef struct r_object {
    ObjType type;
    switch (type) {
    case TYPE_NULL:
    case TYPE_NONE:
    case TYPE_STOPITER:
    case TYPE_ELLIPSIS:
    case TYPE_FALSE:
    case TYPE_TRUE:
        break;
    case TYPE_INT:
        r_long value;
        break;
    case TYPE_INT64:
        r_long64 value;
        break;
    case TYPE_LONG:
        r_long n;
        local int size = n<0?-n:n;
        r_short digit[size];
        break;
    case TYPE_FLOAT:
        r_byte n;
        char value[n];
        break;
    case TYPE_BINARY_FLOAT:
        double value;
        break;
    case TYPE_COMPLEX:
        r_byte nr;
        char real[nr];
        r_byte ni;
        char imag[ni];
        break;
    case TYPE_BINARY_COMPLEX:
        double real;
        double imag;
        break;
    case TYPE_INTERNED:
    case TYPE_STRING:
        r_long n;
        if (n)
            char str[n];
        break;
    case TYPE_STRINGREF:
        r_long n;
        break;
    case TYPE_TUPLE:
        r_long n;
        if (n)
            struct r_object elements[n] <optimize=false>;
        break;
    case TYPE_LIST:
        r_long n;
        if (n)
            struct r_object elements[n] <optimize=false>;
        break;
    case TYPE_DICT:
        while (1) {
            struct r_object key;
            if (key.type == TYPE_NULL)
                break;
            struct r_object val;
        }
        break;
    case TYPE_SET:
    case TYPE_FROZENSET:
        r_long n;
        if (n)
            struct r_object elements[n] <optimize=false>;
        break;
    case TYPE_CODE:
        r_long argcount;
        r_long nlocals;
        r_long stacksize;
        r_long flags;
        //struct r_object code;
        Code code;
        struct r_object consts;
        struct r_object names;
        struct r_object varnames;
        struct r_object freevars;
        struct r_object cellvars;
        struct r_object filename;
        struct r_object name;
        r_long firstlineno;
        //struct r_object lnotab;
        LnoTab lnotab;
        break;
    default:
        Warning("unknown type code");
        Exit(1);
    }
} r_object;

struct {
    Magic magic;
    char mtime[4];
    r_object data;
} file;
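
The header part of this layout can be illustrated with a short Python sketch. It assumes the Python 2 layout described above (a 2-byte little-endian magic number followed by CR LF, then a 4-byte timestamp); the function name `parse_pyc27_header` and the sample timestamp are hypothetical and used only for illustration.

```python
import struct


def parse_pyc27_header(data):
    """Split the 8-byte Python 2 pyc header from the marshalled body."""
    if len(data) < 8:
        raise ValueError("need at least 8 header bytes")
    # 2-byte magic number, the bytes CR LF, then a 4-byte modification time,
    # all little-endian and without padding.
    magic_number, crlf, mtime = struct.unpack("<H2sI", data[:8])
    if crlf != b"\r\n":
        raise ValueError("magic number is not followed by CR LF")
    return magic_number, mtime, data[8:]


# Build a sample header by hand: Python 2.7 magic plus an arbitrary timestamp.
sample = b"\x03\xf3\r\n" + struct.pack("<I", 1484352000) + b"N"
magic_number, mtime, body = parse_pyc27_header(sample)
print(magic_number, mtime, body)
```

Everything after the first eight bytes is the marshalled r_object, which is what the marshal module consumes.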

With Python's built-in marshal library, the contents of a PYC file can be parsed easily. The code is as follows:

# Python 2.7 script: dump the header and the code object of a .pyc file.
import dis, marshal, struct, sys, time, types

def show_file(fname):
    with open(fname, "rb") as f:
        magic = f.read(4)
        moddate = f.read(4)
        # '<L' forces a 4-byte little-endian value; a bare 'L' is 8 bytes
        # on many 64-bit platforms and would fail on 4 bytes of input.
        modtime = time.asctime(time.localtime(struct.unpack('<L', moddate)[0]))
        print "magic %s" % (magic.encode('hex'))
        print "moddate %s (%s)" % (moddate.encode('hex'), modtime)
        code = marshal.load(f)
    show_code(code)

def show_code(code, indent=''):
    print "%scode" % indent
    indent += ' '
    print "%sargcount %d" % (indent, code.co_argcount)
    print "%snlocals %d" % (indent, code.co_nlocals)
    print "%sstacksize %d" % (indent, code.co_stacksize)
    print "%sflags %04x" % (indent, code.co_flags)
    show_hex("code", code.co_code, indent=indent)
    dis.disassemble(code)
    print "%sconsts" % indent
    for const in code.co_consts:
        # Nested code objects (functions, classes) are dumped recursively.
        if type(const) == types.CodeType:
            show_code(const, indent+' ')
        else:
            print " %s%r" % (indent, const)
    print "%snames %r" % (indent, code.co_names)
    print "%svarnames %r" % (indent, code.co_varnames)
    print "%sfreevars %r" % (indent, code.co_freevars)
    print "%scellvars %r" % (indent, code.co_cellvars)
    print "%sfilename %r" % (indent, code.co_filename)
    print "%sname %r" % (indent, code.co_name)
    print "%sfirstlineno %d" % (indent, code.co_firstlineno)
    show_hex("lnotab", code.co_lnotab, indent=indent)

def show_hex(label, h, indent):
    h = h.encode('hex')
    if len(h) < 60:
        print "%s%s %s" % (indent, label, h)
    else:
        # Long strings are wrapped at 60 hex characters per line.
        print "%s%s" % (indent, label)
        for i in range(0, len(h), 60):
            print "%s %s" % (indent, h[i:i+60])

show_file(sys.argv[1])

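You can also inspect the file directly with a 010 Editor template. However, the template was written quite a while ago, so it needs a small modification: add the Magic value of the current version to the Magic field.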

enum <uint16> MagicValue {
    PY_24a0 = 62041,
    PY_24a3 = 62051,
    PY_24b1 = 62061,
    PY_25a0_1 = 62071,
    PY_25a0_2 = 62081,
    PY_25a0_3 = 62091,
    PY_25a0_4 = 62092,
    PY_25b3_1 = 62101,
    PY_25b3_2 = 62111,
    PY_25c1 = 62121,
    PY_25c2 = 62131,
    PY_26a0 = 62151,
    PY_26a1 = 62161,
    PY_27a0_1 = 62171,
    PY_27a0_2 = 62181,
    PY_27a0_a = 62211,
};

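Decrypting PYC

uncompyle2 makes it easy to decompile a PYC back into Python source. In some cases, however, the target ships a customized Python, so the PYC format changes as well and uncompyle2 can no longer produce the source directly. Such files have to be repaired by hand first.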

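Checking the extent of customization

In most cases a customized Python only changes the mapping between opcode values and the actual instructions. To check how far the customization goes, first modify the 010 Editor template so that the Code section is parsed as a plain char array. Then run the template on the PYC again. If the file still parses correctly, the customized Python changed only the opcodes; otherwise other fields need to be repaired as well.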

struct Code {
    ObjType type;
    if (type != TYPE_STRING) {
        Warning("code not in string type");
        Exit(1);
    }
    r_long n;
    local int remain = n;
    local int end = FTell() + n;
    char code[n];
    /* trick to optimize parse speed */
    /* in general customPy will change the opcode ,so comment this
    while (remain >= 6) {
        Instruction inst[remain/6] <read=ReadInstruction,optimize=false>;
        remain = end - FTell();
    }
    remain = end - FTell();
    while (remain > 0) {
        Instruction inst <read=ReadInstruction>;
        remain -= sizeof(inst);
    }*/
};

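Obtaining the opcode mapping table

If the customized Python only changes the opcode mapping, all we need is the modified mapping table, and a program can build it automatically. The code below is taken from the article "阴阳师:一个非酋的逆向旅程" (see [3]).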

import sys
import marshal

# Maps standard opcode -> custom opcode.
opmap = {}

def compare(cobj1, cobj2):
    codestr1 = bytearray(cobj1.co_code)
    codestr2 = bytearray(cobj2.co_code)
    if len(codestr1) != len(codestr2):
        print("two cobj has different length, skipping")
        return
    i = 0
    while i < len(codestr1):
        if codestr1[i] not in opmap:
            opmap[codestr1[i]] = codestr2[i]
        else:
            if opmap[codestr1[i]] != codestr2[i]:
                print("error: has wrong opcode")
                break
        # Opcodes below the boundary take 1 byte, the others take 3 bytes.
        if codestr1[i] < 90 and codestr2[i] < 102: # <------------------(1)
            i += 1
        elif codestr1[i] >= 90 and codestr2[i] >= 102: # <-----------------(2)
            i += 3
        else:
            print("wrong opcode")
            break  # without this, i never advances and the loop never ends
    # Recurse into nested code objects (functions, classes).
    for const1, const2 in zip(cobj1.co_consts, cobj2.co_consts):
        if hasattr(const1, 'co_code') and hasattr(const2, 'co_code'):
            compare(const1, const2)

def usage():
    print("Usage: %s normal_filename1.pyc custom_filename2.pyc" % sys.argv[0])

def main():
    if len(sys.argv) != 3:
        usage()
        return
    # marshal.loads expects raw marshalled data; for a standard 2.7 .pyc,
    # skip the 8-byte header before calling it.
    cobj1 = marshal.loads(open(sys.argv[1], 'rb').read())
    cobj2 = marshal.loads(open(sys.argv[2], 'rb').read())
    compare(cobj1, cobj2)
    print(opmap)

if __name__ == '__main__':
    main()

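One point to note is the two commented positions in the code. The length of an instruction depends on the value of its opcode: there is a boundary value, and opcodes at or above it take three bytes (the opcode plus a two-byte argument), while opcodes below it take one byte. In standard Python 2.7 this boundary is 90 (HAVE_ARGUMENT), but for a customized Python it has to be looked up in the target's files. With the code above, the opcodes of standard Python can be matched against the custom ones to produce a mapping table.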
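
The boundary rule can be made concrete with a small sketch that walks a Python 2 style bytecode string using a configurable boundary. The name `iter_instructions` is hypothetical, and the sketch ignores EXTENDED_ARG, so it is an illustration of the rule rather than a full disassembler.

```python
def iter_instructions(code, have_argument=90):
    """Yield (offset, opcode, arg) tuples from a Python 2 style bytecode string.

    Opcodes >= have_argument carry a 2-byte little-endian argument,
    so they occupy 3 bytes; all other opcodes occupy 1 byte.
    """
    code = bytearray(code)
    i = 0
    while i < len(code):
        op = code[i]
        if op >= have_argument:
            if i + 2 >= len(code):
                raise ValueError("truncated argument at offset %d" % i)
            arg = code[i + 1] | (code[i + 2] << 8)
            yield i, op, arg
            i += 3
        else:
            yield i, op, None
            i += 1


# LOAD_CONST 0 (opcode 100) followed by RETURN_VALUE (opcode 83) in Python 2.7.
sample = b"\x64\x00\x00\x53"
print(list(iter_instructions(sample)))
```

Passing a different `have_argument` (for example the 102 used for the custom side in the script above) changes where the instruction boundaries fall, which is why the correct value must be known before two bytecode strings can be compared.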

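A pyc compiled from the following program uses almost every opcode (the few marked "ignore" are skipped), which yields a nearly complete mapping table.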

# from __future__ import division
# def_op('STOP_CODE', 0)
# ignore
# def_op('POP_TOP', 1)
a()
# def_op('ROT_TWO', 2)
(a, b) = (b, a)
# def_op('ROT_THREE', 3)
(a, a, a) = (a, a, a)
# def_op('DUP_TOP', 4)
exec 1
# def_op('ROT_FOUR', 5)
a[2:4] += 'abc'
# def_op('NOP', 9)
# ignore
# def_op('UNARY_POSITIVE', 10)
+ a
# def_op('UNARY_NEGATIVE', 11)
- a
# def_op('UNARY_NOT', 12)
not a
# def_op('UNARY_CONVERT', 13)
a = `a`
# def_op('UNARY_INVERT', 15)
a = ~a
# def_op('BINARY_POWER', 19)
a ** 1
# def_op('BINARY_MULTIPLY', 20)
a * 1
# def_op('BINARY_DIVIDE', 21)
a / 1
# def_op('BINARY_MODULO', 22)
a % 1
# def_op('BINARY_ADD', 23)
a + 1
# def_op('BINARY_SUBTRACT', 24)
a - 1
# def_op('BINARY_SUBSCR', 25)
a[1]
# def_op('BINARY_FLOOR_DIVIDE', 26)
a // 1
# def_op('BINARY_TRUE_DIVIDE', 27)
# add 'from __future__ import division' to header
# def_op('INPLACE_FLOOR_DIVIDE', 28)
a //= 1
# def_op('INPLACE_TRUE_DIVIDE', 29)
# add 'from __future__ import division' to header
# def_op('SLICE+0', 30)
a[:]
# def_op('SLICE+1', 31)
a[1:]
# def_op('SLICE+2', 32)
a[:2]
# def_op('SLICE+3', 33)
a[1:2]
# def_op('STORE_SLICE+0', 40)
a[:] = 1
# def_op('STORE_SLICE+1', 41)
a[1:] = 1
# def_op('STORE_SLICE+2', 42)
a[:2] = 1
# def_op('STORE_SLICE+3', 43)
a[1:2] =1
# def_op('DELETE_SLICE+0', 50)
del a[:]
# def_op('DELETE_SLICE+1', 51)
del a[1:]
# def_op('DELETE_SLICE+2', 52)
del a[:2]
# def_op('DELETE_SLICE+3', 53)
del a[1:2]
# def_op('STORE_MAP', 54)
{"1": 1}
# def_op('INPLACE_ADD', 55)
a += 1
# def_op('INPLACE_SUBTRACT', 56)
a -= 1
# def_op('INPLACE_MULTIPLY', 57)
a *= 1
# def_op('INPLACE_DIVIDE', 58)
a /= 1
# def_op('INPLACE_MODULO', 59)
a %= 1
# def_op('STORE_SUBSCR', 60)
a[1] = 1
# def_op('DELETE_SUBSCR', 61)
del a[1]
# def_op('BINARY_LSHIFT', 62)
a << 1
# def_op('BINARY_RSHIFT', 63)
a >> 1
# def_op('BINARY_AND', 64)
a & 1
# def_op('BINARY_XOR', 65)
a ^ 1
# def_op('BINARY_OR', 66)
a | 1
# def_op('INPLACE_POWER', 67)
a **= 1
# def_op('GET_ITER', 68)
for i in a:
    pass
# def_op('PRINT_EXPR', 70)
# ignore
# def_op('PRINT_ITEM', 71)
print(1)
# def_op('PRINT_NEWLINE', 72)
print(1)
# def_op('PRINT_ITEM_TO', 73)
print >> fd, 1
# def_op('PRINT_NEWLINE_TO', 74)
print >> fd, 1
# def_op('INPLACE_LSHIFT', 75)
a <<= 1
# def_op('INPLACE_RSHIFT', 76)
a >>= 1
# def_op('INPLACE_AND', 77)
a &= 1
# def_op('INPLACE_XOR', 78)
a ^= 1
# def_op('INPLACE_OR', 79)
a |= 1
# def_op('BREAK_LOOP', 80)
while True:
    break
# def_op('WITH_CLEANUP', 81)
with a:
    pass
# def_op('LOAD_LOCALS', 82)
class a:
    pass
# def_op('RETURN_VALUE', 83)
def a():
    return
# def_op('IMPORT_STAR', 84)
from module import *
# def_op('EXEC_STMT', 85)
exec 1
# def_op('YIELD_VALUE', 86)
def a():
    yield 1
# def_op('POP_BLOCK', 87)
while True:
    pass
# def_op('END_FINALLY', 88)
with a:
    pass
# def_op('BUILD_CLASS', 89)
class a:
    pass
# name_op('STORE_NAME', 90) # Index in name list
a = 1
# name_op('DELETE_NAME', 91) # ""
del a
# def_op('UNPACK_SEQUENCE', 92) # Number of tuple items
a, b = 1, 2
# jrel_op('FOR_ITER', 93)
for i in a:
    pass
# def_op('LIST_APPEND', 94)
[i for i in a]
# name_op('STORE_ATTR', 95) # Index in name list
a.a = 1
# name_op('DELETE_ATTR', 96) # ""
del a.a
# name_op('STORE_GLOBAL', 97) # ""
def a():
    global aa
    aa = 1
# name_op('DELETE_GLOBAL', 98) # ""
def a():
    global aa
    del aa
# def_op('DUP_TOPX', 99) # number of items to duplicate
b[a] += 1
# def_op('LOAD_CONST', 100) # Index in const list
123
# name_op('LOAD_NAME', 101) # Index in name list
a
# def_op('BUILD_TUPLE', 102) # Number of tuple items
(a, )
# def_op('BUILD_LIST', 103) # Number of list items
[]
# def_op('BUILD_SET', 104) # Number of set items
{1}
# def_op('BUILD_MAP', 105) # Number of dict entries (upto 255)
{}
# name_op('LOAD_ATTR', 106) # Index in name list
a.a
# def_op('COMPARE_OP', 107) # Comparison operator
a == a
# name_op('IMPORT_NAME', 108) # Index in name list
import a
# name_op('IMPORT_FROM', 109) # Index in name list
from a import b
# jrel_op('JUMP_FORWARD', 110) # Number of bytes to skip
if True:
    pass
# jabs_op('JUMP_IF_FALSE_OR_POP', 111) # Target byte offset from beginning of code
0 and False
# jabs_op('JUMP_IF_TRUE_OR_POP', 112) # ""
0 or False
# jabs_op('JUMP_ABSOLUTE', 113) # ""
def a():
    if b:
        if c:
            print('')
# jabs_op('POP_JUMP_IF_FALSE', 114) # ""
if True:
    pass
# jabs_op('POP_JUMP_IF_TRUE', 115) # ""
if not True:
    pass
# name_op('LOAD_GLOBAL', 116) # Index in name list
def a():
    global b
    return b
# jabs_op('CONTINUE_LOOP', 119) # Target address
while True:
    try:
        continue
    except:
        pass
# jrel_op('SETUP_LOOP', 120) # Distance to target address
while True:
    pass
# jrel_op('SETUP_EXCEPT', 121) # ""
# jrel_op('SETUP_FINALLY', 122) # ""
try:
    pass
except:
    pass
finally:
    pass
# def_op('LOAD_FAST', 124) # Local variable number
def a():
    aa = 1
    return aa
# def_op('STORE_FAST', 125) # Local variable number
def a():
    aa = 1
# def_op('DELETE_FAST', 126) # Local variable number
def a():
    aa = 1
    del aa
# def_op('RAISE_VARARGS', 130) # Number of raise arguments (1, 2, or 3)
raise
# def_op('CALL_FUNCTION', 131) # #args + (#kwargs << 8)
a()
# def_op('MAKE_FUNCTION', 132) # Number of args with default values
def a():
    pass
# def_op('BUILD_SLICE', 133) # Number of items
a[::]
# def_op('MAKE_CLOSURE', 134)
# def_op('LOAD_CLOSURE', 135)
# def_op('LOAD_DEREF', 136)
# def_op('STORE_DEREF', 137)
def f():
    a = 1
    def g():
        return a + 1
    return g()
# def_op('CALL_FUNCTION_VAR', 140) # #args + (#kwargs << 8)
a(*args)
# def_op('CALL_FUNCTION_KW', 141) # #args + (#kwargs << 8)
a(**kwargs)
# def_op('CALL_FUNCTION_VAR_KW', 142) # #args + (#kwargs << 8)
a(*args, **kwargs)
# jrel_op('SETUP_WITH', 143)
with a:
    pass
# def_op('EXTENDED_ARG', 145)
# ignore
# def_op('SET_ADD', 146)
{i for i in a}
# def_op('MAP_ADD', 147)
{i:i for i in a}
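
Before the table is inverted for decryption, it may be worth checking that it is one-to-one, because inverting a dict with duplicate values silently drops entries. The following is a minimal sketch under that assumption; `invert_opmap` is a hypothetical helper, and the sample opcode values are made up for illustration.

```python
def invert_opmap(opmap):
    """Turn a standard -> custom opcode table into custom -> standard.

    Raises ValueError if two standard opcodes map to the same custom
    opcode, since such a table cannot be inverted without losing data.
    """
    inverse = {}
    for standard_op, custom_op in sorted(opmap.items()):
        if custom_op in inverse:
            raise ValueError(
                "custom opcode %d is used for both %d and %d"
                % (custom_op, inverse[custom_op], standard_op)
            )
        inverse[custom_op] = standard_op
    return inverse


# Made-up sample: standard LOAD_CONST (100) and RETURN_VALUE (83).
print(invert_opmap({100: 150, 83: 20}))
```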

Repairing the PYC

With the mapping table obtained above, the PYC can be repaired. The pymarshal library used here is a separate download.

import os
import marshal
import argparse
import pymarshal  # separate download, not part of the standard library

class PYCEncryptor(object):
    def __init__(self):
        # Fill in the table produced by the comparison script:
        # standard opcode -> custom opcode.
        self.opcode_encrypt_map = {
        }
        self.opcode_decrypt_map = {self.opcode_encrypt_map[key]: key for key in self.opcode_encrypt_map}
        # Python 2.7 magic number followed by a zeroed timestamp.
        self.pyc27_header = "\x03\xf3\x0d\x0a\x00\x00\x00\x00"

    def _decrypt_file(self, filename):
        with open(filename, 'rb') as fd:
            content = fd.read()
        try:
            m = pymarshal.loads(content)
        except Exception:
            # Fall back to the built-in marshal if pymarshal cannot load it.
            try:
                m = marshal.loads(content)
            except Exception as e:
                print("[!] error: %s" % str(e))
                return None
        return m.co_filename.replace('\\', '/'), pymarshal.dumps(m, self.opcode_decrypt_map)

    def decrypt_file(self, input_file, output_file=None):
        result = self._decrypt_file(input_file)
        if not result:
            return
        pyc_filename, pyc_content = result
        if not output_file:
            output_file = os.path.basename(pyc_filename) + '.pyc'
        with open(output_file, 'wb') as fd:
            fd.write(self.pyc27_header + pyc_content)

def main():
    parser = argparse.ArgumentParser(description='onmyoji py decrypt tool')
    parser.add_argument("INPUT_NAME", help='input file')
    parser.add_argument("OUTPUT_NAME", help='output file')
    args = parser.parse_args()
    encryptor = PYCEncryptor()
    encryptor.decrypt_file(args.INPUT_NAME, args.OUTPUT_NAME)

if __name__ == '__main__':
    main()

Running the script produces a normal pyc file that can be decompiled back into Python.
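
The core of the repair step can be sketched without pymarshal as a rewrite of a single bytecode string. This is only an illustration under stated assumptions: `remap_bytecode` is a hypothetical helper, the sample opcode values are made up, and how pymarshal applies the table internally is not shown in this article. The sketch handles one `co_code` string and does not recurse into nested code objects.

```python
def remap_bytecode(code, decrypt_map, custom_have_argument):
    """Rewrite custom opcodes to standard ones, leaving arguments untouched.

    decrypt_map maps custom opcode -> standard opcode.
    custom_have_argument is the boundary of the customized Python:
    custom opcodes at or above it are followed by a 2-byte argument.
    """
    out = bytearray(code)
    i = 0
    while i < len(out):
        op = out[i]
        if op not in decrypt_map:
            raise KeyError("no mapping for custom opcode %d at offset %d" % (op, i))
        out[i] = decrypt_map[op]
        # Skip the argument bytes so they are never mistaken for opcodes.
        i += 3 if op >= custom_have_argument else 1
    return bytes(out)


# Made-up custom encoding: 150 stands for LOAD_CONST, 20 for RETURN_VALUE.
custom = bytes(bytearray([150, 0, 0, 20]))
print(remap_bytecode(custom, {150: 100, 20: 83}, 102))
```

The instruction lengths are decided by the custom boundary because the input is still in the custom encoding when it is walked.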

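Summary

The main purpose of this article was to learn to think in terms of automation. When I ran into this kind of problem before, my first instinct was always to fix things by hand, which usually cost a lot of time and effort. This exercise made the convenience of automation very clear to me. In future work I need to gradually change the way I approach problems in order to work more efficiently.

Finally, thanks to fc for the timely corrections while I was writing the batch script. The sooner you use Python, the sooner you get off work.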

Reference

[1] https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html
[2] https://mail.python.org/pipermail/python-ideas/2008-April/001550.html
[3] http://blog.fatezero.org/2017/01/14/decrypt-onmyoji/