Bypass CFG Through MRDATA

http://alex-ionescu.com/publications/euskalhack/euskalhack2017-cfg.pdf

This is a talk by Alex Ionescu that presents a new approach to bypassing CFG (Control Flow Guard).

MRDATA

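Starting with Windows 8.1, Microsoft added a protection mechanism for the CFG bitmap: the bitmap pointer and a number of other global variables are placed in the .mrdata section of the file. This is a new PE section that holds mutable read-only data.

The section is marked PAGE_READONLY when the module is loaded, so in theory it cannot be modified.

Sometimes, however, ntdll needs to modify data in the .mrdata section. For this Windows provides a new API, LdrProtectMrdata(bProtect), which turns protection of the .mrdata section on or off: passing 0 means unprotect, passing 1 means protect.

Obviously this function is called when a module is loaded or unloaded in order to update some data, but several functions also call it at run time.

For example, SetProtectedPolicy and GetProtectedPolicy use it. These two functions set and query the process's protected policies, which are stored in memory allocated from LdrMrdataHeap, i.e. the policies live in the .mrdata section.

Many other functions call it as well. Here is a partial list: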

// Runtime Calls
RtlAddFunctionTable
RtlAddGrowableFunctionTable
RtlDeleteFunctionTable
RtlDeleteGrowableFunctionTable
RtlInsertInvertedFunctionTable
RtlInstallFunctionTableCallback
RtlSetProtectedPolicy
RtlpAddVectoredHandler
RtlpCallVectoredHandlers
RtlpRemoveVectoredHandler
RtlxRemoveInvertedFunctionTable

Bypassing CFG with MRDATA

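While JIT-compiling, Edge performs a large number of operations on the Growable Function Table, and these operations repeatedly call LdrProtectMrdata to change the protection of the .mrdata section. If an attacker triggers JIT compilation many times, the protection of the .mrdata section changes frequently.

A Growable Function Table is clearly a shared resource. Microsoft manages the table with an SRWLock: the lock must be acquired before the table can be modified.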

SRWLock

An SRWLock is a lightweight reader/writer lock. In essence it is just a pointer-sized value, which encodes its state as follows:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
______________________________|________________________________
31 16 4 3 2 1 0

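The low four bits of the pointer are used as four separate flags:

  • OWNED (bit 0): set when some thread is currently reading or writing the resource.
  • CONTENDED (bit 1): set when one or more threads are waiting for exclusive access, which also means that some thread currently holds the resource.
  • SHARED (bit 2): set when the resource is held for shared (read) access by one or more threads.
  • CONTENTION (bit 3): set while a thread is in the middle of fetching the WAITBLOCK structure pointer.

The upper 28 bits of the pointer (on a 32-bit system) are the address bits. When no thread is requesting exclusive access, they hold the number of threads currently sharing the resource for reading. When threads are waiting for the resource, they point to a linked list of structures whose addresses are aligned to 0x10. When other threads are reading or writing the resource, a thread that calls AcquireSRWLockExclusive or AcquireSRWLockShared builds a structure on its own stack and links it into the list the SRWLock points to, so every thread waiting to read or write contributes one stack-allocated structure to the list.

The structure is defined as follows: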
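The encoding described above can be sketched as a small decoder. This is only an illustration under stated assumptions: the macro names, the `srw_view` struct and `srw_decode` are hypothetical helpers, not ntdll or ReactOS definitions, and the bit positions are the ones given in the flag list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits, following the bit positions listed in the text. */
#define SRW_OWNED      ((uintptr_t)1 << 0)
#define SRW_CONTENDED  ((uintptr_t)1 << 1)
#define SRW_SHARED     ((uintptr_t)1 << 2)
#define SRW_CONTENTION ((uintptr_t)1 << 3)
#define SRW_FLAG_BITS  4
#define SRW_FLAG_MASK  (((uintptr_t)1 << SRW_FLAG_BITS) - 1)

/* Hypothetical decoded view of a lock word (not a real ntdll type). */
typedef struct {
    bool owned, contended, shared, contention;
    uintptr_t shared_count; /* meaningful only when no waiter list exists */
    uintptr_t wait_block;   /* meaningful only when CONTENDED is set      */
} srw_view;

static srw_view srw_decode(uintptr_t word)
{
    srw_view v;
    v.owned      = (word & SRW_OWNED) != 0;
    v.contended  = (word & SRW_CONTENDED) != 0;
    v.shared     = (word & SRW_SHARED) != 0;
    v.contention = (word & SRW_CONTENTION) != 0;
    if (v.contended) {
        /* Upper bits are a 16-byte aligned pointer to the first wait block. */
        v.wait_block   = word & ~SRW_FLAG_MASK;
        v.shared_count = 0;
    } else {
        /* Upper bits are the number of threads sharing the resource. */
        v.shared_count = word >> SRW_FLAG_BITS;
        v.wait_block   = 0;
    }
    return v;
}
```

For instance, decoding `(3 << 4) | SRW_SHARED | SRW_OWNED` yields a shared count of 3 and no wait block, while a word with CONTENDED set is read as an aligned pointer instead.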

typedef struct _RTLP_SRWLOCK_WAITBLOCK
{
    LONG SharedCount; // number of threads waiting to read
    volatile struct _RTLP_SRWLOCK_WAITBLOCK *Last;
    volatile struct _RTLP_SRWLOCK_WAITBLOCK *Next; // link to the next list node
    union
    {
        LONG Wake; // non-zero: may be woken up, 0: keep sleeping
        struct
        {
            PRTLP_SRWLOCK_SHARED_WAKE SharedWakeChain; // list of reader threads that need to be woken
            PRTLP_SRWLOCK_SHARED_WAKE LastSharedWake;  // the last reader thread that was woken
        };
    };
    BOOLEAN Exclusive; // 1: the block was built on the stack by a writer thread, 0: by a reader thread
} volatile RTLP_SRWLOCK_WAITBLOCK, *PRTLP_SRWLOCK_WAITBLOCK;

The singly linked list of shared waiters uses the following structure:

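typedef struct _RTLP_SRWLOCK_SHARED_WAKE
{
    LONG Wake; // wake flag: non-zero means wake up, 0 means sleep
    volatile struct _RTLP_SRWLOCK_SHARED_WAKE *Next;
} volatile RTLP_SRWLOCK_SHARED_WAKE, *PRTLP_SRWLOCK_SHARED_WAKE;

Since a write is always exclusive, the focus here is on exclusive requests.
AcquireSRWLockExclusive maps to the function RtlAcquireSRWLockExclusive. Its source code (the ReactOS implementation) is shown below, with comments describing the flow: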

VOID
NTAPI
RtlAcquireSRWLockExclusive(IN OUT PRTL_SRWLOCK SRWLock)
{
    __ALIGNED(16) RTLP_SRWLOCK_WAITBLOCK StackWaitBlock; // first allocate an RTLP_SRWLOCK_WAITBLOCK on the stack
    PRTLP_SRWLOCK_WAITBLOCK First, Last;

    // If another thread is already accessing the resource, enter the loop below;
    // otherwise the OWNED bit has just been set and the lock is ours
    if (InterlockedBitTestAndSetPointer(&SRWLock->Ptr, RTL_SRWLOCK_OWNED_BIT))
    {
        LONG_PTR CurrentValue, NewValue;

        while (1)
        {
            CurrentValue = *(volatile LONG_PTR *)&SRWLock->Ptr;
            if (CurrentValue & RTL_SRWLOCK_SHARED) // some thread is reading the resource
            {
                /* A shared lock is being held right now. We need to add a wait block! */
                if (CurrentValue & RTL_SRWLOCK_CONTENDED)
                {
                    goto AddWaitBlock;
                }
                else // readers hold the resource and no thread is waiting to write, so there are no waiters yet:
                {    // link the RTLP_SRWLOCK_WAITBLOCK into the list as its first node
                    StackWaitBlock.Exclusive = TRUE; // exclusive flag
                    StackWaitBlock.SharedCount = CurrentValue >> RTL_SRWLOCK_BITS; // the address bits currently hold the number of shared readers
                    StackWaitBlock.Next = NULL;
                    StackWaitBlock.Last = &StackWaitBlock;
                    StackWaitBlock.Wake = 0;
                    ASSERT_SRW_WAITBLOCK(&StackWaitBlock);
                    NewValue = (ULONG_PTR)&StackWaitBlock | RTL_SRWLOCK_SHARED | RTL_SRWLOCK_CONTENDED | RTL_SRWLOCK_OWNED;
                    if (InterlockedCompareExchangePointer(&SRWLock->Ptr,
                                                          (PVOID)NewValue,
                                                          (PVOID)CurrentValue) == (PVOID)CurrentValue) // replace SRWLock->Ptr with the new value
                    {
                        RtlpAcquireSRWLockExclusiveWait(SRWLock,
                                                        &StackWaitBlock); // the thread starts waiting
                        /* Successfully acquired the exclusive lock */
                        break;
                    }
                }
            }
            else
            {
                if (CurrentValue & RTL_SRWLOCK_OWNED)
                {
                    if (CurrentValue & RTL_SRWLOCK_CONTENDED) // threads are already waiting to write: append the wait block to the list
                    {
AddWaitBlock:
                        StackWaitBlock.Exclusive = TRUE;
                        StackWaitBlock.SharedCount = 0;
                        StackWaitBlock.Next = NULL;
                        StackWaitBlock.Last = &StackWaitBlock;
                        StackWaitBlock.Wake = 0;
                        ASSERT_SRW_WAITBLOCK(&StackWaitBlock);
                        First = RtlpAcquireWaitBlockLock(SRWLock); // strip the flag bits from the pointer to get the head of the list
                        if (First != NULL)
                        {
                            Last = First->Last; // insert the wait block at the tail of the list
                            Last->Next = &StackWaitBlock;
                            First->Last = &StackWaitBlock;
                            RtlpReleaseWaitBlockLock(SRWLock);
                            RtlpAcquireSRWLockExclusiveWait(SRWLock, // the thread starts waiting
                                                            &StackWaitBlock);
                            /* Successfully acquired the exclusive lock */
                            break;
                        }
                    }
                    else // no thread is reading (or waiting to read) and no thread is waiting to write,
                         // i.e. exactly one writer holds the resource:
                         // link the RTLP_SRWLOCK_WAITBLOCK into the list as its first node
                    {
                        StackWaitBlock.Exclusive = TRUE;
                        StackWaitBlock.SharedCount = 0;
                        StackWaitBlock.Next = NULL;
                        StackWaitBlock.Last = &StackWaitBlock;
                        StackWaitBlock.Wake = 0;
                        ASSERT_SRW_WAITBLOCK(&StackWaitBlock);
                        NewValue = (ULONG_PTR)&StackWaitBlock | RTL_SRWLOCK_OWNED | RTL_SRWLOCK_CONTENDED;
                        if (InterlockedCompareExchangePointer(&SRWLock->Ptr,
                                                              (PVOID)NewValue,
                                                              (PVOID)CurrentValue) == (PVOID)CurrentValue)
                        {
                            RtlpAcquireSRWLockExclusiveWait(SRWLock, // the thread starts waiting
                                                            &StackWaitBlock);
                            /* Successfully acquired the exclusive lock */
                            break;
                        }
                    }
                }
                else // a writer thread requests an SRWLock that is free
                {
                    if (!InterlockedBitTestAndSetPointer(&SRWLock->Ptr,
                                                         RTL_SRWLOCK_OWNED_BIT))
                    {
                        /* We managed to get hold of a simple exclusive lock! */
                        break;
                    }
                }
            }
            YieldProcessor();
        }
    }
}

The wait helper that it calls, RtlpAcquireSRWLockExclusiveWait, is shown next:

static VOID
NTAPI
RtlpAcquireSRWLockExclusiveWait(IN OUT PRTL_SRWLOCK SRWLock, IN PRTLP_SRWLOCK_WAITBLOCK WaitBlock)
{
    LONG_PTR CurrentValue;

    while (1)
    {
        CurrentValue = *(volatile LONG_PTR *)&SRWLock->Ptr;
        if (!(CurrentValue & RTL_SRWLOCK_SHARED))
        {
            if (CurrentValue & RTL_SRWLOCK_CONTENDED)
            {
                if (WaitBlock->Wake != 0)
                {
                    break;
                }
            }
            else
            {
                break;
            }
        }
        // The loop exits only when no thread is reading and either no other writer
        // holds the lock exclusively or the owner has set this thread's Wake flag to non-zero
        YieldProcessor();
    }
}

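In summary, RtlAcquireSRWLockExclusive behaves as follows:

  1. If threads are still reading the resource and no writer is waiting yet, it links in a WAITBLOCK, sets the flags (SHARED, CONTENDED, OWNED) and waits. Once all readers have released the lock, the wait ends and the thread accesses the resource exclusively.
  2. If other writer threads are already waiting, it appends its WAITBLOCK to the tail of the list the SRWLock pointer refers to and waits. The wait ends, and the thread gets exclusive access, only after every waiter queued before it has released the lock.
  3. If a thread is writing the resource but no other thread is waiting, it links in a WAITBLOCK, sets the flags (CONTENDED, OWNED) and waits. Once the current owner releases the lock, the wait ends and the thread accesses the resource exclusively.

The corresponding ReleaseSRWLockExclusive code is: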
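The three cases above, plus the uncontended fast path, depend only on the flag bits of the current lock word. A minimal sketch of that decision follows. The enum and `classify_exclusive_acquire` are hypothetical names introduced for illustration; only the branch structure is taken from the listing above.

```c
#include <stdint.h>

#define SRW_OWNED     ((uintptr_t)1 << 0)
#define SRW_CONTENDED ((uintptr_t)1 << 1)
#define SRW_SHARED    ((uintptr_t)1 << 2)

/* Hypothetical labels for the paths an exclusive acquire can take. */
typedef enum {
    PATH_FAST_ACQUIRE,          /* lock is free: just set OWNED               */
    PATH_FIRST_WAITER_READERS,  /* case 1: readers hold it, no waiter list    */
    PATH_APPEND_WAITER,         /* case 2: a waiter list exists, append to it */
    PATH_FIRST_WAITER_WRITER    /* case 3: a writer holds it, no waiter list  */
} acquire_path;

static acquire_path classify_exclusive_acquire(uintptr_t word)
{
    if (word & SRW_SHARED)
        return (word & SRW_CONTENDED) ? PATH_APPEND_WAITER
                                      : PATH_FIRST_WAITER_READERS;
    if (word & SRW_OWNED)
        return (word & SRW_CONTENDED) ? PATH_APPEND_WAITER
                                      : PATH_FIRST_WAITER_WRITER;
    return PATH_FAST_ACQUIRE;
}
```

Every path except the fast one ends in a wait, which matters later: a lock word that merely looks owned is enough to park the next writer.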

VOID
NTAPI
RtlReleaseSRWLockExclusive(IN OUT PRTL_SRWLOCK SRWLock)
{
    LONG_PTR CurrentValue, NewValue;
    PRTLP_SRWLOCK_WAITBLOCK WaitBlock;

    while (1)
    {
        CurrentValue = *(volatile LONG_PTR *)&SRWLock->Ptr;
        if (!(CurrentValue & RTL_SRWLOCK_OWNED))
        {
            RtlRaiseStatus(STATUS_RESOURCE_NOT_OWNED);
        }
        if (!(CurrentValue & RTL_SRWLOCK_SHARED))
        {
            if (CurrentValue & RTL_SRWLOCK_CONTENDED)
            {
                /* There's a wait block, we need to wake the next pending
                   acquirer (exclusive or shared) */
                WaitBlock = RtlpAcquireWaitBlockLock(SRWLock);
                if (WaitBlock != NULL)
                {
                    RtlpReleaseWaitBlockLockExclusive(SRWLock,
                                                      WaitBlock);
                    /* We released the lock */
                    break;
                }
            }
            else
            {
                /* This is a fast path, we can simply clear the RTL_SRWLOCK_OWNED
                   bit. All other bits should be 0 now because this is a simple
                   exclusive lock and no one is waiting. */
                ASSERT(!(CurrentValue & ~RTL_SRWLOCK_OWNED));
                NewValue = 0;
                if (InterlockedCompareExchangePointer(&SRWLock->Ptr,
                                                      (PVOID)NewValue,
                                                      (PVOID)CurrentValue) == (PVOID)CurrentValue)
                {
                    /* We released the lock */
                    break;
                }
            }
        }
        else
        {
            /* The RTL_SRWLOCK_SHARED bit must not be present now,
               not even in the contended case! */
            RtlRaiseStatus(STATUS_RESOURCE_NOT_OWNED);
        }
        YieldProcessor();
    }
}

RtlReleaseSRWLockShared is similar, except that it checks SharedCount: the threads queued in the wait list are woken only when SharedCount drops to 0.

VOID
NTAPI
RtlReleaseSRWLockShared(IN OUT PRTL_SRWLOCK SRWLock)
{
    LONG_PTR CurrentValue, NewValue;
    PRTLP_SRWLOCK_WAITBLOCK WaitBlock;
    BOOLEAN LastShared;

    while (1)
    {
        CurrentValue = *(volatile LONG_PTR *)&SRWLock->Ptr;
        if (CurrentValue & RTL_SRWLOCK_SHARED)
        {
            if (CurrentValue & RTL_SRWLOCK_CONTENDED)
            {
                /* There's a wait block, we need to wake a pending
                   exclusive acquirer if this is the last shared release */
                WaitBlock = RtlpAcquireWaitBlockLock(SRWLock);
                if (WaitBlock != NULL)
                {
                    LastShared = (--WaitBlock->SharedCount == 0); // the special part: waiters are woken only by the last reader
                    if (LastShared)
                        RtlpReleaseWaitBlockLockLastShared(SRWLock,
                                                           WaitBlock);
                    else
                        RtlpReleaseWaitBlockLock(SRWLock);
                    /* We released the lock */
                    break;
                }
            }
            else
            {
                /* This is a fast path, we can simply decrement the shared
                   count and store the pointer */
                NewValue = CurrentValue >> RTL_SRWLOCK_BITS;
                if (--NewValue != 0)
                {
                    NewValue = (NewValue << RTL_SRWLOCK_BITS) | RTL_SRWLOCK_SHARED | RTL_SRWLOCK_OWNED;
                }
                if (InterlockedCompareExchangePointer(&SRWLock->Ptr,
                                                      (PVOID)NewValue,
                                                      (PVOID)CurrentValue) == (PVOID)CurrentValue)
                {
                    /* Successfully released the lock */
                    break;
                }
            }
        }
        else
        {
            /* The RTL_SRWLOCK_SHARED bit has to be present now,
               even in the contended case! */
            RtlRaiseStatus(STATUS_RESOURCE_NOT_OWNED);
        }
        YieldProcessor();
    }
}

The Windows implementation differs from the one in ReactOS, but the general idea is the same. The function RtlAcquireSRWLockExclusive looks like this:
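The arithmetic on the uncontended fast path of the shared release can be isolated into a pure function. This is a sketch under the assumption that the word has SHARED set, CONTENDED clear and a reader count of at least 1; `release_shared_fast` is a hypothetical name, and the real code applies the result with a compare-and-exchange rather than a plain store.

```c
#include <stdint.h>

#define SRW_OWNED     ((uintptr_t)1 << 0)
#define SRW_SHARED    ((uintptr_t)1 << 2)
#define SRW_FLAG_BITS 4

/* Compute the lock word after one reader leaves (fast path only).
 * Precondition: SHARED set, CONTENDED clear, reader count >= 1. */
static uintptr_t release_shared_fast(uintptr_t word)
{
    uintptr_t readers = word >> SRW_FLAG_BITS;
    if (--readers != 0) {
        /* Other readers remain: keep the lock shared and owned. */
        return (readers << SRW_FLAG_BITS) | SRW_SHARED | SRW_OWNED;
    }
    /* Last reader gone: the lock word returns to 0, i.e. free. */
    return 0;
}
```

With two readers the word goes from `(2 << 4) | SHARED | OWNED` to `(1 << 4) | SHARED | OWNED`, and the final release clears it entirely.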

#define SRWLockSpinCount 1024
#define Busy_Lock    1 // someone already holds the lock
#define Wait_Lock    2 // someone is waiting for the lock
#define Release_Lock 4 // someone has already released the lock once
#define Mixed_Lock   8 // shared and exclusive locks coexist

struct _SyncItem
{
    _SyncItem* back;
    _SyncItem* notify;
    _SyncItem* next;
    QWORD shareCount;
    DWORD flag;
};

void __fastcall RtlAcquireSRWLockExclusive(volatile PRTL_SRWLOCK *srwlock)
{
    __declspec( align( 16 ) ) _SyncItem syn = {0};
    _RDI = (volatile signed __int64 *)srwlock;
    v15 = 0;
    if ( _interlockedbittestandset64(srwlock, 0i64) ) // the lock was already held
    {
        lockStatu = srwlock->ptr;
        while ( 1 )
        {
            if ( lockStatu & Busy_Lock ) // the lock is held
            {
                if ( (unsigned __int8)RtlpWaitCouldDeadlock(a1, a2, a3, a4, v9) )
                    ZwTerminateProcess(-1i64, 3221225547i64); // status 0xC000004B
                syn.shareCount = NtCurrentTeb()->ClientId.UniqueThread;
                v3 = 0;
                syn.flag = 3;
                syn.next = null;
                if ( lockStatu & Wait_Lock ) // there are waiters
                {
                    syn.notify = null;
                    syn.bala = -1;
                    a1 = (volatile signed __int32 *)(unsigned __int8)lockStatu;
                    syn.back = lockStatu & 0xFFFFFFFFFFFFFFF0; // head of the wait list
                    newStatu = &syn | (lockStatu & 8) | 7;
                    v3 = ~((unsigned __int8)lockStatu >> 2) & 1;
                }
                else // no waiters
                {
                    syn.notify = &syn;
                    if ( (lockStatu >> 4) > 1 ) // the upper bits of Ptr hold ShareCount here: more than one shared thread
                        newStatu = &syn | Wait_Lock | Busy_Lock | Mixed_Lock;
                    else
                        newStatu = &syn | Wait_Lock | Busy_Lock;
                    if ( !(lockStatu >> 4) ) // no sharers, so a single exclusive thread holds the lock
                        syn.bala = -2;
                }
                v8 = _InterlockedCompareExchange(srwlock->ptr, newStatu, lockStatu); // set ptr to the new value
                if ( v8 == lockStatu ) // the exchange succeeded, ptr now holds the new value
                {
                    if ( v3 )
                        OptimizeSRWLockList(srwlock, newStatu);
                    for ( int i = SRWLockSpinCount; i > 0; --i )
                    {
                        // flag(bit1) can be reset by release-lock operation in other thread
                        if ( !(syn.flag & 2) )
                            break;
                        _mm_pause();
                    }
                    if ( interlockedbittestandreset(syn.flag, 1) ) // bit 1 was still set: sleep until woken
                    {
                        do
                            NtWaitForAlertByThreadId(srwlock, 0);
                        while ( syn.flag & 4 );
                    }
                }
                else
                {
                    RtlBackoff(&v15);
                    lockStatu = (size_t)srwlock->ptr;
                    continue;
                }
            }
            else // the lock happened to be released while this request was in progress
            {
                // switch the lock to the held state
                if ( lockStatu == _InterlockedCompareExchange(srwlock, lockStatu + 1, lockStatu) )
                    return; // the update succeeded, return directly
                RtlBackoff(&v15);
                lockStatu = (size_t)srwlock->ptr;
                continue;
            }
        }
    }
}

Generally, the SRWLocks that guard MRDATA are not themselves located in MRDATA, but some SRWLocks are only acquired after MRDATA has been unprotected. This is an odd design, and perhaps we can achieve something by tampering with the SRWLock data.

First, pick any function that calls LdrProtectMrdata and look at it. Here we choose RtlDeleteGrowableFunctionTable, which is called when a JIT code segment is reclaimed. Its pseudocode is shown below, with unrelated parts removed for clarity:

__int64 __fastcall RtlDeleteGrowableFunctionTable(__int64 a1)
{
    // ......
    LdrProtectMrdata(0i64);
    if ( qword_18017A370 )
    {
        RtlAcquireSRWLockExclusive(&LdrpMrdataLock); // acquire LdrpMrdataLock
        v3 = *(_DWORD *)LdrpMrdataHeapUnprotected;
        if ( !*(_DWORD *)LdrpMrdataHeapUnprotected )
            LdrpChangeMrdataHeapProtection(4i64); // 4 = PAGE_READWRITE
        *(_DWORD *)LdrpMrdataHeapUnprotected = v3 + 1;
        RtlReleaseSRWLockExclusive(&LdrpMrdataLock);
    }
    RtlAcquireSRWLockExclusive(&RtlpDynamicFunctionTableLock); // acquire RtlpDynamicFunctionTableLock
    RtlAvlRemoveNode(&RtlpDynamicFunctionTableTree, v1 + 11);
    *v5 = v4;
    *(_QWORD *)(v4 + 8) = v5;
    RtlReleaseSRWLockExclusive(&RtlpDynamicFunctionTableLock);
    RtlFreeHeap(v6, 0i64, v1);
    if ( qword_18017A370 )
    {
        RtlAcquireSRWLockExclusive(&LdrpMrdataLock); // acquire LdrpMrdataLock
        v7 = *(_DWORD *)LdrpMrdataHeapUnprotected;
        *(_DWORD *)LdrpMrdataHeapUnprotected = v7 - 1;
        if ( v7 == 1 )
            LdrpChangeMrdataHeapProtection(2i64); // 2 = PAGE_READONLY
        RtlReleaseSRWLockExclusive(&LdrpMrdataLock);
    }
    LdrProtectMrdata(1i64);
    return ;
}

As the code shows, the function acquires an SRWLock three times, and all three operations are related to .mrdata. The function therefore calls LdrProtectMrdata(0) at the start and LdrProtectMrdata(1) at the end to switch protection of the .mrdata section off and on again.

LdrProtectMrdata( a1 )
{
    if ( a1 )
    {
        LdrpChangeMrdataProtection(2u); // ZwProtectVirtualMemory(-1, LdrpMrdataBase, LdrpMrdataSize , 2, &v2)
    }
    else
    {
        LdrpChangeMrdataProtection(4u); // ZwProtectVirtualMemory(-1, LdrpMrdataBase, LdrpMrdataSize , 4, &v4)
    }
}
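The LdrpMrdataHeapUnprotected counter in the RtlDeleteGrowableFunctionTable pseudocode behaves like a nesting count: the heap is made writable only on the 0 to 1 transition and read-only again only on the 1 to 0 transition. A minimal single-threaded model of that bookkeeping is sketched below. `heap_guard`, `guard_unprotect` and `guard_protect` are hypothetical names, the protection change is modeled as a plain field instead of a ZwProtectVirtualMemory call, and the locking around the counter is omitted.

```c
/* Same numeric values as PAGE_READONLY and PAGE_READWRITE. */
#define MODEL_PAGE_READONLY  2u
#define MODEL_PAGE_READWRITE 4u

/* Hypothetical stand-in for LdrpMrdataHeapUnprotected plus the heap's protection. */
typedef struct {
    unsigned unprotect_count;
    unsigned protection;
} heap_guard;

static void guard_unprotect(heap_guard *g)
{
    unsigned old = g->unprotect_count;
    if (old == 0)
        g->protection = MODEL_PAGE_READWRITE; /* first user opens the heap */
    g->unprotect_count = old + 1;
}

static void guard_protect(heap_guard *g)
{
    unsigned old = g->unprotect_count;
    g->unprotect_count = old - 1;
    if (old == 1)
        g->protection = MODEL_PAGE_READONLY;  /* last user closes the heap */
}
```

In this model a caller that unprotects and then never reaches its matching protect call leaves the count above zero, so the heap stays writable for everyone.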

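If we change the flag bits of LdrpMrdataLock or RtlpDynamicFunctionTableLock to RTL_SRWLOCK_SHARED | RTL_SRWLOCK_CONTENDED | RTL_SRWLOCK_OWNED, then, following the SRWLock behavior described above, the lock is treated as already held and every new request is parked indefinitely. The function above therefore hangs when it tries to acquire the SRWLock, and nothing after that point runs, including the final LdrProtectMrdata(1i64). Because of the LdrProtectMrdata(0i64) call at the start of the function, the .mrdata section is left unprotected, i.e. writable.

By corrupting the SRWLock we thus break the integrity of the thread's call sequence and gain write access to .mrdata.

JIT work normally runs on a separate thread and does not affect the JS parsing thread, so gaining write access to .mrdata does not interrupt the execution of the JS code. The attacker can then use an arbitrary-address write to modify the bitmap pointer and bypass CFG.

Summary

The idea is novel and simple to carry out: modifying a single bit of data is enough to bypass CFG. Moreover, since the .mrdata section also contains many other sensitive global objects, the same technique may produce further unexpected effects.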
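The effect can be modeled in a few lines. This is a deliberately simplified, single-threaded sketch and not the real ntdll logic: `sim_process`, `sim_try_acquire` and `sim_delete_growable_function_table` are hypothetical names, the page protection is a boolean, and the acquire gives up after a bounded number of spins so the model terminates, whereas the real thread would wait forever.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRW_OWNED     ((uintptr_t)1 << 0)
#define SRW_CONTENDED ((uintptr_t)1 << 1)
#define SRW_SHARED    ((uintptr_t)1 << 2)

/* Hypothetical model of the relevant process state. */
typedef struct {
    bool      mrdata_writable; /* stands in for the .mrdata page protection      */
    uintptr_t table_lock;      /* stands in for RtlpDynamicFunctionTableLock     */
} sim_process;

/* Bounded stand-in for an exclusive acquire: succeeds only if the lock is free. */
static bool sim_try_acquire(uintptr_t *lock, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; ++i) {
        if ((*lock & SRW_OWNED) == 0) {
            *lock |= SRW_OWNED;
            return true;
        }
    }
    return false; /* the real caller would still be waiting here */
}

static void sim_release(uintptr_t *lock)
{
    *lock &= ~SRW_OWNED;
}

/* Returns true if the function ran to completion. */
static bool sim_delete_growable_function_table(sim_process *p, unsigned max_spins)
{
    p->mrdata_writable = true;              /* LdrProtectMrdata(0) */
    if (!sim_try_acquire(&p->table_lock, max_spins))
        return false;                       /* stuck: re-protection never happens */
    /* ... table update would go here ... */
    sim_release(&p->table_lock);
    p->mrdata_writable = false;             /* LdrProtectMrdata(1) */
    return true;
}
```

With an untouched lock the call completes and protection is restored. After the lock word is overwritten with SHARED | CONTENDED | OWNED, the call never gets past the acquire and `mrdata_writable` stays true.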

References