AST Fuzz —— a New Kind of Fuzzer

Introduce

Fuzz 测试作为安全领域内的热点话题经久不衰,这主要归功于它在漏洞挖掘领域的卓越表现,fuzz 测试是目前已知的用于挖掘漏洞的最有效手段之一。在与浏览器相关安全的研究中,fuzz 更是占有举足轻重的地位。

As a fruitful vulnerable auto detect method, Fuzz always get a lot of attention. Fuzz is one of the most effient way in vulberable digging. In browser security, it also play an important role.

Fuzz 的本质无非是向某个测试对象发送一系列畸形数据来试图引发程序内部发生的一些错误。一般来说 Fuzz 可以分成两种基本形式:生成式和变异式。
变异式 Fuzz 是指针对已知的合法输入进行随机或者基于经验的变异,从而产生不可预知的输入。比较典型的工具有 Peach Fuzzer、ProxyFuzz 等。
生成式 Fuzz 是指根据已有的经验,规范一个输入的格式,然后基于这个格式生成一系列的输入。比如 SPIKE、Domato 以及一系列浏览器 DOM fuzz。
浏览器安全研究中,在以 DOM 对象作为重点研究目标阶段,Fuzz 技术展现了十分优越的性能,短时间内发现了大量 DOM 对象间深层调用所引发的 UAF 漏洞。这其中固然有浏览器 DOM 设计时的缺陷,但是面对如此众多的对象和如此复杂的调用关系,唯有 Fuzz 才可以高效的对其进行测试。

Essentially, Fuzz is a frame that sending a series of data to a target program, and try to trigger internal errors. In general, Fuzz can divided into two types: Generator-FUZZ and Mutator-Fuzz.
Mutator-Fuzz try to produce a heristic or random mutation based on a legally seed. Peach-Fuzzer and Proxy-Fuzzer is two famous mutator fuzz.
While Generator-Fuzz try to produce a generate based on given rules. SPIKE, Domato and most of open-source DOM Fuzzer all belong to this cateory.
In browser security research, Fuzzer has done fantastic jobs target in DOM. With fuzz, researchers find a huge number of UAF vulnerables in short time. It is true that’s due to the weakness in DOM objects’ top designation, but for a complex system like browser, maybe fuzz is the most effient way to find out flaws.

同时我们也发现了一个问题,即撞洞率太高,一个问题可能会陆续被多个研究者发现,这说明大家所使用的 fuzz 思路是十分接近的。笔者阅读了一些开源的 DOM Fuzzer 源码,发现这些 Fuzzer 虽然具体的形式不同,但是其思想都是通过构造尽可能复杂的语句和 DOM 间对象关系来使目标产生一些异常。这种思路在对 DOM 的研究上已经取得了很大的成功,但是在近两年针对 js 引擎,尤其是 JIT 引擎上,它的表现却不尽如人意。不仅效果很差,而且 Fuzz 产生的样本质量也都普遍不高,可用性很低。分析原因可以看出,旧有的一些 Fuzz 方法大都使用语料库自由组合的方式构造 JS 语句,因而会产生大量无意义的错误代码。而当引擎执行到这些代码时会抛出 RuntimeError 之类的异常,从而提前终止样本。为了使样本可以继续完整执行,Fuzzer 会加入大量的 TryCatch 语句以屏蔽这些异常,进而会影响原本程序的语意。研究 DOM 对象时这种方法自然无可厚非,但是在 JS 引擎,尤其是 JIT 引擎中,TryCatch 会极大的影响引擎的表现,从而使得大部分功能永远无法执行到。其次这种从已有的语料库中进行拼凑的方法需要 Fuzzer 本身对语法的全方位覆盖,而那些语料库覆盖不到的地方则永远不会被访问,这就大大减少了 Fuzz 可以检测到的攻击面,降低了测试效果。

MeanWhile, we also encounter a problem, Clash! The vulnerables we find with our fuzzer can always be noticed by other researchers. It is means we use very similar ideals. After reviewed some open-source DOM-Fuzzer, I realized even though these various specific forms, their core ideas are to make some abnormalities in the target by constructing as many complex statements as possible. This way had achieved great success in DOM , but failed in recent years. Since 2015 most browser researchers are foucing on javascript energe, especially JIT engine. Most of the old fuzzer use the free combination of corpus to construct JS statement, which will generate a lot of meaningless error codes. When the engine execute these codes, it will throw exceptions such as RuntimeError,which will terminates the program early. In order for the sample to continue, Fuzzer will add a large amount of TryCatch statements to block these exceptions. This method is naturally understandable for DOM-Fuzzer, but in the JS engine, especially the JIT engine, TryCatch would greatly affects their performance, so that most functions can never be reseached. What’s more, this method require the full coverage of the grammar by the Fuzzer itself, and those places the corpus cannot cover will never be accessed, which would greatly decrease the attack surface.

针对目前 JS 引擎 Fuzz 遇到的这种困境,笔者在 18 年 6 月提出了一套全新的 fuzz 方案 ———— AST Fuzz。以 JS 语言解析而成的 AST 树作为媒介进行模糊测试。这种方法不仅可以有效的避免无意义代码的生成,还可以大大提升语法的覆盖率,并且在实际应用过程中还有着意想不到的收获。

In response to this dilemma encountered by the current Fuzz, I proposed a new fuzz idea in June 18th – AST Fuzz! This idea is focus on IR, The AST tree parsed in JS language is used as a medium for fuzzing. This method can not only effectively avoid the generation of meaningless code, but also greatly improve the grammar coverage, and there are unexpected gains in the actual applying.

Relative work

笔者在提出这个方案后,花费了一个月左右的时间进行了大量的背景分析工作。在调研过程中笔者发现,一些零星的研究者也想到了利用中间语言这种思路进行 fuzz 测试。有的研究者使用这种方法对 PHP-Parser 进行 fuzz,有的研究者使用这种方法对 Python 进行 fuzz。但是将这种思路应用在浏览器中的只有 ESFuzz。esfuzz 通过不断生成随机的 AST 的方式对 JSC 的 Parser 模块进行测试。可以看到之前的研究大都集中在 Parser 阶段,并没有更进一步,针对 Runtime 或者 JIT 进行更深入的讨论。

After proposing this scheme, I spent one month on a large amount of background analysis. During the researching, I found that some sporadic researchers also use this idea to do fuzzing test. Some use intermediate language to fuzz PHP-Parser, Some one use this method to test Python. Only ESFuzz is targeting on browser. Esfuzz tests JSC’s Parser module by continuously generating random AST tree and convert it into js code. But all the investigation is concentrated in the Parser moudle, and there is no further step for a more in-depth discussion of Runtime or JIT.

AST 全称抽象语法树,是源代码的抽象语法结构的树状表现形式。抽象语法树(Abstract Syntax Tree ,AST)作为程序的一种中间表示形式,在程序分析等诸多领域有广泛的应用。利用抽象语法树可以方便地实现多种源程序处理工具,比如源程序浏览器、智能编辑器、语言翻译器等,比较为人熟知的 js-beautiful 也是使用 AST 实现的。目前已经有很多比较知名的 AST 处理工具,如 ESTOOLS、BabyLon、Babel 等,FireFox 也对外导出了其解析 AST 的接口:Reflect.Parse。在 AST Explorer 上可以方便的查看源代码与其对应的抽象语法树结构。

Abstract syntax tree (AST), or just syntax tree is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is “abstract” in the sense that it does not represent every detail appearing in the real syntax, but rather just the structural, content-related details. AST is widely used in many areas such as program analysis, source code browser, smart editor and language translator. The well-known js-beautiful is also implemented using AST. There are already many well-known AST processing tools, such as ESTOOLS, BabyLon, Babel, etc. FireFox also exports its interface to parse AST: Reflect.Parse. One can easily view ast of source code on AST Explorer

我们重点关注 JS 语言及其抽象语法树。一段简单的 js 可以表示成如图所示的 AST。

We focus on the JS language and its abstract syntax tree. A simple js can be represented as an AST as shown below.

整个代码段是一个 Program 节点,函数的声名,变量的声名分别对应了三个 Declaration 节点。变量 a、b 分别对应了两个 Identifier 节点。而常量 1 对应的是一个 Literal 节点。这样,一段合法的 js 代码就被表示成了一个 AST。同样一个结构完整的 AST 也可以方便的转化成一段在语法上合法的 JS 代码。由于笔者最终的方案选用的是 ESTOOLS 中的 esprima 作为 AST 处理的工具,因此这里将介绍一下 esprima 中 JS 与 AST 节点的对应关系。(不同的 Parse 引擎最终生成的 AST 差异很大,不能通用)。

The code block is a Program node, the declaration of function ,and declarations of varibles corresponds to three Declaration nodes. The variables a and b correspond to two Identifier nodes. The constant 1 corresponds to a Literal node. Thus, a legal js code is represented as an AST. A legal AST can also be easily converted into a syntactically valid JS code. I choose esprima in ESTOOLS as AST processed tools, here I will introduce the correspondence between JS and AST nodes in esprima.The resulting ASTs from different Parse engines are very different and is not universal.

在 esprima 中,js 的 AST 由两种节点构成:StatementExpression。 Statement 包括

In esprima, AST is combined with two kind of node: Statement and Expression. Statement include

1
"MetaProperty"、"BlockStatement"、"BreakStatement"、"ContinueStatement"、"DoWhileStatement"、"EmptyStatement"、"ExpressionStatement"、"ForInStatement"、"ForOfStatement"、"FunctionDeclaration"、"IfStatement"、"LabeledStatement"、"ReturnStatement"、"SwitchStatement"、"ThrowStatement"、"TryStatement"、"VariableDeclaration"、"WhileStatement"、"ClassDeclaration"、"WithStatement"

Expression 包括

Expression include

1
"ArrayExpression"、"AwaitExpression"、"AssignmentExpression"、"BinaryExpression"、"CallExpression"、"ConditionalExpression"、"FunctionExpression"、"LogicalExpression"、"MemberExpression"、"NewExpression"、"ObjectExpression"、"SequenceExpression"、"ThisExpression"、"UnaryExpression"、"UpdateExpression"、"YieldExpression"、"ArrowFunctionExpression"、"ClassExpression"、"ArgumentsToken"、"SpreadElement"、"Literal"

每种节点都拥有很多属性对应不同的 JS 表现形式。以 FunctionDeclaration 举例,节点拥有属性 generator,当该属性为 false 时,FunctionDeclaration 用于表示一个普通函数的声名

Each node has a lot of attributes to represent different JS code. For example, FunctionDeclaration has an attribute called generator. When this attribute is false, FunctionDeclaration represents a normal function.

1
function foo(){}

generator 属性为真时,FunctionDeclaration 用于表示一个生成器的声名

When generator equals true, FunctionDeclaration represents a generator function.

1
function *foo(){}

Statement 和 Expression 之间互相引用。但是通常来说,Statement 用于表示一条相对独立的 JS 语句,而 Expression 用于表示 js 语句中的表达式。Expression 通常都是可以求值的,而 Statement 则不需要。此外在从 AST 生成 JS 的过程中,esprima 会自动忽略那些其不识别的属性。

Statement adn Expression can be referenced by each other. In general, Statement is used to represent a relatively independent JS statement, and Expression is used to represent an expression in a js statement. Expressions are usually evaluated, while Statement is not. And esprima will automatically ignores attributes that it does not recognized during the process of generating JS from AST.

AST FUZZ

笔者设计的这套 Fuzz 方案以 AST 为核心,采用变异的方法生成随机样本。整套方案由四部分构成:种子文件预解析,AST 预处理,AST 的变异,样本的生成。下面笔者将详细阐述这四个部分的设计以及实现的相关细节。

This Fuzzer is based on AST and use mutators to generate random samples. The scheme consists of four parts: seeds pre-parsing, AST pre-processing, AST mutation and sample generation. Next page will elaborate on the design of these four parts and the relevant details of the implementation.

First Part, seed pre-parsing

首先使用开源工具 esprima 对即将变异的样本文件进行预处理,将 JS 转化为 AST,并最终使用 AST 文件作为 fuzzer 的种子文件。由于这套方案本质上还是一个变异式 Fuzz,因此变异样本的选取对 Fuzz 的最终效果至关重要。

At First, Fuzzer use the open-source tool esprima to preprocess the seed file, parsing the js code into AST tree, and using the AST file as the seed file for the fuzzer. Since this scheme is a mutation Fuzz essentially, the selection of samples is critical to the final result.

在变异样本的选择上,笔者选用之前产生漏洞的原始样本、以及各 JS 引擎提供的 testcase 文件。选取漏洞原始样本作为种子是因为在软件工程领域奉行的 “二八法则”,即 80% 的问题都是由 20% 的模块造成的,特别对于一个大型工程来说,问题常常会集中于某几个模块中。此外,对于漏洞而言,其成因往往复杂难懂,如果漏洞的修补者没有充分了解漏洞的本质成因,补丁常常是可以绕过的,这样的例子屡见不鲜。如笔者曾经发现的存在于 Chakra 中的几个问题。

For this Fuzzer , samples is consisted of vulnerabilities poc and testcases of each JS engine. In software engineer there is a principle named Pareto's principle also known as the 80/20 rule. which means for many events, roughly 80% of the effects come from 20% of the causes. Microsoft noted that by fixing the top 20% of the most-reported bugs, 80% of the related errors and crashes in a given system would be eliminated. Lowell Arthur expressed that “20 percent of the code has 80 percent of the errors. Find them, fix them!”. Especially for vulnerability, the cause is often complicated and difficult to understand. If the patcher of the vulnerability does not fully understand the nature of the vulnerability, the patch can be bypassed. This problem is not uncommon. I have found several examples in CharkraCore.

CVE-2017-11840

这个漏洞最早是由 Lokihart 在17年9月时发现的,漏洞编号为 CVE-2017-11840。样本可以在 Project0 的网站上找到。

This vulnerability was first discovered by Lokihart in September 17th, CVE code is CVE-2017-11840. The poc can be get from the site of Project0

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
function opt() {
let obj = [2.3023e-320];
for (let i = 0; i < 1; i++) {
obj.x = 1; // In the first analysis, BailOnNotObject emitted
obj = +obj; // Change the type
obj.x = 1; // Type confusion
}
}

function main() {
for (let i = 0; i < 1000; i++) {
opt();
}
}
main();

从漏洞效果上来看,这是一个类型混淆问题。这个问题的本质成因是由于 Chakra 引擎设计时,对循环的优化只进行了固定的两次遍历,这种设计要求在第一次遍历循环时不能对原有指令进行任何修改;但是在编写处理上述语句的代码时,代码编写者并没有考虑这个问题,在第一次遍历时就修改了原有的指令,从而导致最终的分析结果无法传递到每一条指令,最后导致了类型混淆。

This is a type confusion caused by the designation of Chakra. During loop optimization in JIT, Chakra engine will traverse the loop exactly twice, it is not a quite correct design. It requires no modifications can be made to the original instructions during the first traversal of the loop , or there will be something wrong. However, the coder did not realized this issue, and modify the instructions at the first trvaersal. As a result, the final type analysis result cannot be passed to every instructions, which ultimately leads to type confusion.

或许是受到了 issue 中分析的影响,或者是漏洞的修补者没有意识到这个根本问题,最终的补丁代码发生在了类型转化的位置。换而言之,这个补丁可以很轻易的被绕过

Perhaps influenced by the analysis in the issue, or the security researcher of the vulnerability did not realize the fundamental problem. The final patch happened in another place, which means can be bypassed easily.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
function opt() {
let obj = "aaaaaa";
for (let i = 0; i < 1; i++) {
obj.x = 1; // In the first analysis, BailOnNotObject emitted
obj = +obj; // Change the type
obj.x = 1; // Type confusion
}
}

function main() {
for (let i = 0; i < 1000; i++) {
opt();
}
}
main();

从上面的例子中可以看出,由于这样那样的原因,曾经的漏洞样本经过轻微的变异就有可能导致新的问题,因此选取他们作为种子文件是一个好的决定。

We can se from the above example, for some reasons the slight variation of the previous vulnerability samples may lead to new problems, so choose them as seed files is a good decision.

而各大引擎的 testcase 则是代码编写者和测试人员针对代码本身的特征编写的测试用例,其对于代码本身而言天然具有良好的覆盖率。以 testcase 作为种子而生成的样本相对于其他种子文件而言可以更好覆盖代码功能。

On the other hand, The testcases of the major engines are test cases written by coders and testers for the characteristics of the code, which naturally have good coverage for the code itself. Samples generated based on testcase would have a better functional coverage relative to other seed files.

选取了原始的素材之后,当然可以直接将其作为种子文件。这里对原始样本进行预处理将其转化为 AST 的目的是为了更好的对变异过程进行控制,以减少那些无意义的变异操作。仍以上例中的 JS 样本为例,我们的 Fuzz 所关心的其实只是 opt 函数以及将其 JIT 时传递的参数,其他的部分并不在我们的考虑范围之内。通过预处理 AST ,在我们不关心的语句节点上打上标记,Fuzzer 识别到这个标记之后就不会对该节点及其子节点进行变异,从而减少了一部分无意义的工作量,将变异范围限定在 opt 函数及其调用的语句上。

After selecting the original material, we can mutate these files directly. But we can have a better control and reduce meaningless mutations, if we use AST tree rather than js code as seed. Take the above JS poc as an example. for the fuzzer ,we only focus on the function opt and its parameters. By pre-processing ,we can mark the statement we do not concern, and Fuzzer will not operate these node. This reduces some of the meaningless workload and limits the scope of the variation to the opt function and related statements.

Second part, AST pre-processing

从上文中我们可以知道 AST 由 Statement 和 Expression 两种元素构成,Statement 和 Expression 之间可以相互引用。但是某些 Statement 和 Expression 出现的位置却有着特殊的要求。比如 BreakStatement 只应该出现在循环或者 LabelStatement 中;ArrayPattern只允许出现在参数和赋值语句中。此外,虽然一般而言从 AST 生成的 JS 语句都是符合语法要求的,但是 Fuzz 的样本还应该是符合语意要求的,即不能使 RuntimeError 情况出现的太普遍。这就对 Fuzz 有了更高的要求,如变异所产生的对象节点 Identifier 在多数情况下应该是有意义的,变异产生的函数调用语句不应是明显错误的等等。

From the above we can know that AST consists of two elements, Statement and Expression. Statement and Expression can refer to each other. However, some statements and expressions have special requirements for their location. For example, BreakStatement should only appear in loops or LabelStatement; ArrayPattern is only allowed in parameters and assignment statements. In addition, although the JS statements generated from AST are generally grammatically legal, the Fuzz sample should also be semantically compliant, that is, the RuntimeError situation cannot be made too common. This has higher requirements for Fuzzer. The node Identifier generated by the mutation should be meaningful in most cases. The Callstatement should not be obviously wrong. and so on.

1
2
3
var a;
var b;
var c = 1;

如在 1 节点处的 Identifier 变异,只有产生 a 或者 b 才是有意义的。这就要求 Fuzzer 在变异时需要了解当前节点所处的位置和状态信息,获取当前节点可以访问到的对象,以及各个对象在这个节点时的类型。

For example, the mutation in statement var c = 1 is only make sence when produce a or b. This requires Fuzzer to know the location and state information of the current node when mutating, to get the objects that the current node can access, and the type of each object at this node.

Node Location

第一个问题比较好解决,在遍历过程中 fuzzer 可以记录下已经走过的节点,或者直接在 AST 节点上记录下每个节点的父节点。当变异过程生成 break 等节点时,判断一下其祖先节点是否符合语法要求。为了保持 AST 结构的完整性,笔者采用 Babel 的修饰思想实现此功能。为每个 AST 节点创建一个对应的 Path 对象,用于表示这个节点在 AST 中所处的位置等信息。当变异中需要生成 BreakStatementAwaitStatement 等节点时,Fuzzer 访问当前节点对应的 Path,通过 Path 回溯其祖先节点,检查是否符合语法要求。

This question is easy to solve. During the traversal process, the fuzzer can log the nodes that have passed, and record the parent of each node directly on the AST tree. When the Mutator generates a node such as BreakStatement, Fuzzer will determine whether the ancestor node meets the grammar requirements.In order to maintain the integrity of the AST structure, the Fuzzer uses the idea of ​​Babel to achieve this function. Fuzzer create a corresponding Path object for each Node, to represent the node location in AST. When the mutator needs to generate nodes such as BreakStatement and AwaitStatement, the Fuzzer ​​accesses the corresponding Path, and backtracks its ancestors through Path to check whether it meets the grammar requirements.

Scope

第二个问题也比较好解决。寻找当前节点可访问对象问题可以转化为一个寻找当前节点所处 Scope 问题。节点可以访问当前所在 Scope 以及所有上层 Scope 中所定义的所有对象。具体的做法是,当遇到 ProgramFunctionClassBlockfor 等本身具有 Scope 的节点时就新建一个 scope 对象更新为当前的 scope ,当遍历到 initdeclaration 等节点时就记录下这些节点所生成的 Identifier 信息,并注册到当前的 scope 对象中。为了方便访问,fuzzer 会在节点对应的 Path 中存访一份 scope 对象的引用。变异过程中需要访问可用的对象信息时,直接通过 Path 对象中大的 scope 访问即可。很多开源的 AST 分析工具已经做了这方面的处理,笔者的这套方案为直接修改 ESCOPE 而来。

The second question is also an easy one. The problem finding the current accessible node can be transformed into looking for the current node Scope. The node can access all objects defined in the current Scope and the upper one. Specifically, when we step into self-scoped node like ProgramFunctionClassBlock and for, we can create a new scope and update it as current scope; when we step into init and declaration, we record the Identifier information, and register into current scope. For easy access ,fuzzer will will store a reference of Scope into corresponding Path. When we need to get the available objects during the mutation process, we can access it directly through the Scope of the Path object. There are many open-source tools can achieve this function, I merged ESCOPE into my fuzzer.

ValueInfo

在 Scope 的处理中我们可以注意到,同一个 Scope 中定义的所有对象在这个 Scope 的任何位置都可以被访问到。这就意味着,在程序的最开始阶段就可以访问到整个程序中定义的所有全局对象信息。但是在 JS 中,除了 function 之外的其他对象都不允许变量提升,这就需要 Fuzzer 确定节点的类型,防止变异过程中访问那些未定义的变量。

During the processing of Scope we can notice that all objects defined in the same Scope can be accessed anywhere in it. This means all global objects defined in the entire javascript can be accessed at the very beginning of the code. But in js, objects other than function do not allow variable promotion. So Fuzzer need to determine node’s type and revent access to those undefined variables during the mutation.

这就引出了第三个问题,也是是整套方案中最为复杂的一点。Fuzzer 需要知道在某一节点上其可以访问到的所有对象的类型及其可用属性。一方面我们需要避免 f=2; f(); 这种明显语意错误的代码出现,即分析出对象在节点位置的类型(Number,String,Function,Object,undefine,null,Symbol);另一方面我们还需要知道当对象类型不为 Primitive 时,其可能拥有的属性,以便访问之。笔者在这里采用了类似 Chakra 引擎中 Value/ValueInfo 的实现方法。在每个节点处维护一个 ValueMap,ValueMap 中存访该节点可能访问到的所有对象及其类型信息。类型信息是一个索引,除了可以表示对象的类型之外还可以用于在一个全局表中索引该类型的属性。其整体设计如图

This is the third and the most diffcult question. Fuzzer need to know the types and available properties of all objects it can access at a Path. On the one hand, we need to avoid those apparently semantic wrong code like f=2; f(); appears, that is to analysis the type of all the avaiable objects (Number,String,Function,Object,undefine,null,Symbol). On the other hand , we need to find out avaiable properties when a object’s type is not Primitive. Here I use an implementation method similar to Value/ValueInfo in the Chakra engine. Fuzzer manage a ValueMap at every Path, store all the avaiable objects with its type. The type infor stored there is an index that can be used to search detail of the type in a global table. Its overall design is shown below

ValueTable 用于存储所有的类型信息。在遍历 AST 之前,ValueTable 中预先将 JS 的内置函数放入其中,当遍历到类似 ObjectExpressionFunctionClass 等会创建新类型的节点时,Fuzzer 会新建一个 Type 对象,存储节点中产生的属性信息,并注册 Type 到 ValueTable 中。当遍历到 a.a = b 之类会导致某对象类型发生变化的节点时,Fuzzer 会从原始 Type 对象中产生一份拷贝作为新的 Type,并将节点可能导致的变化更新到新 Type 中。这样一来就可以在一定程度上获取所有对象的类型和属性信息。

ValueTable is a global table used to store tyep information. Before AST tree traversal, Fuzzer will place builtin types into this ValueTable. When step in node like ObjectExpressionFunctionClass, this statement will create a new custom type, Fuzzer will create a type Object as well and register it into the ValueTable. When step in node like a.a = b, this statement can lead to an type change. Fuzzer will duplicate a original type and update the change into the new one. In this way, we can get all the types and attributes to a certain extent.

在变异过程中,如果遇到需要访问对象属性的情况。首先通过 Scope 获取当前节点可以访问的对象集合,接着通过 ValueMap 和 ValueTable 得到对象的类型和属性。

In the mutate part, if we need to access the object properties, first, we can get all avaiable objects through Scope, and then access the type and properties of the object through ValueMap and ValueTable.

这部分设计几经修改仍然没有一个比较满意的方案。现在的版本是目前阶段笔者所能想到的最优设计,但是仍然存在着很大的改进空间。

This part has been changed mant times and still does not have a satisfactory solution. Current version is the optimal design that I can think of at the current stage, but there is still much room for improvement.

Third part, AST Mutation

经过预处理阶段的分析,Fuzzer 可以得到原始样本中所有节点的状态信息。根据状态信息就可以开始进行变异操作了。

After pre-processing ,Fuzzer can get all the need information, and now we can start mutation.

节点可能产生的变异大体上可以分成四类。一是简单变异,即仅对诸如 operatorIdentifierLiteral 等非控制节点进行同类型变异。将 operator 替换为其他的 operator,将 IdentifierLiteral 在合法范围内替换为 LiteralIdentifier。这种变异不会对 AST 的整体结构产生影响,相对来说更简单。二是节点的扩展,即在保留原始节点的基础上增加新的节点,将原始节点作为新节点的某一子节点。三是节点的删除,在合理范围内直接删除某些节点。四是节点的替换,将节点进行同类型替换,Statement 替换为 Statement,Expression 替换为 Expression。出于变异的方便,笔者设计采用父节点控制的方法进行操作。在变异操作进入某一结点时,将该节点作为父节点,随机选取一种变异方式对节点的子节点进行变异。这种方法不仅可以方便的对变异过程进行控制,还可以获取变异之后整个节点的新状态,这一点尤为重要。

Mutation can be divied into four types. First one, simple mutation, Only replace those native data node like operator,Identifier and Literal. operator can only changed into operator, and Identifier,Literal can only be changed into Identifier,Literal. This type of mutation will not change the struct of AST tree. Second one, extend mutation. Extending a new node based on the original node, and using the original node as a child node of the new node. Third one, deletion. directly deleting certain nodes within a reasonable range. Fouth one, Replacement. Replace the node with the same type, replace the Statement with a Statement, and replace Expression with Expression. For the convenience of variation, here Fuzzer use the parent-controlled method to do the mutation. When the mutator enter a node, the node is used as a parent node, and a mutation method is randomly selected to mutate the child nodes. This method is not only convenient for controlling the mutation process, but also for obtaining the new state of the entire node after the mutation, which is especially important.

在一个节点变异完成之后,一个重要的步骤是通知 AST 中其他尚未被变异的节点此次变异所带来的影响。这里笔者通过 Path 对象中的 successor 属性实现。successor 属性指向 AST 中节点顺序访问序列的下一个。通过 successor 可以顺序通知到所有的后续节点,更新其 scope 及 valueMap 状态,以便之后的变异操作使用。

After a mutation is completed, next important step is to notify the rest part of AST. Here I used successor attribute of Path object to do this. successor is pointing to next node in the sequence. Fuzzer can get all the subsequent part of AST throuth successor, and their scope and valueMap status can be updated for subsequent mutation operations.

Fouth part, Sample generator

AST 变异完成之后,将 AST 转化回 JS。现有的很多工具可以实现这一工作。笔者继续选用 esprima 来完成

After the AST mutation is complete, AST need be converted back to JS. I use esprima.

Conclusion

下面的示例图是笔者最初的设计。整套方案的核心思想就是利用 AST 可以灵活转换的性质生成更可靠的 js 代码。

The diagram below is my original design. The core idea of this whole solution is to use the AST’s flexible transformation to generate more reliable js code.

1
2
3
4
5
6
7
8
9
10
11
12
Mutation fuzz 

AST -> Shadow -> mutated AST -> js code

mutated AST
/ | \
1.statement mutate 2. expression mutate 3. value mutate

1. Traverse the AST to mark node that should be mutated, and record the object of each scope
the information stored in another tree named shadow

2. Traverse the Shadow ,mutate the marked node to

这其实是一个比较直观的思路,但是在笔者提出之前,尚未发现任何公开的使用类似思路对浏览器引擎进行模糊测试的方法出现。应用这套方案之后也的确发现了很多以前不曾发现的问题,在很大程度上提升了 fuzz 的效率。

I think it is a relatively straightforward idea, but I have not found any publicly used methods to fuzz the browser engine using similar ideas before. After applying this solution, it has indeed found many problems that have not been discovered before, which greatly improved the efficiency of fuzz.

但是出于一些原因,笔者无法继续这套引擎的开发。于是打算分享给大众,为浏览器研究者们提供一些新的思路。整套引擎的源码在笔者整理过后会放到 GitHub 上。由于开发的时间比较短,目前整套引擎还处于一个很不成熟的阶段,比如对象类型的分析还存在很大的问题,变异部分的代码更是只有一个简单的实现。

For some reason, I can’t continue the development of this engine. So I prepare to share it with the public and provide some new ideas for other browser researchers. The source code of the entire engine will be published on GitHub after compiled. Due to the short development time, the current set of engines is still in a very immature stage. For example, the analysis of object types still has a big problem, and the code of the mutation part has only a simple implementation.

不管怎样,还是希望看到这篇文章的研究者们可以肯定我所做的微小的工作。

Anyway, hope the researchers who see this article can be sure of the tiny work I have done.

Reference

[1] https://github.com/MarkRedeman/ast-based-mutations
[2] https://ir.library.oregonstate.edu/downloads/6h440v193
[3] https://github.com/estools/esfuzz
[4] https://developer.mozilla.org/zh-CN/docs/Mozilla/Projects/SpiderMonkey/Parser_API
[5] https://astexplorer.net/
[6] https://bugs.chromium.org/p/project-zero/issues/detail?id=1365&can=1&q=CVE-2017-11840&colspec=ID%20Status%20Restrict%20Reported%20Vendor%20Product%20Finder%20Summary
[7] https://babeljs.io/
[8] https://github.com/estools/escope

Appendix

Path
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
class Path{
constructor(pathManager, node, parent, pred, scope){

/**
* reference to the current node
*/
this.node = node;

/**
* reference to the parent node
*/
//this.parent = parent;
this.parent = parent;

/**
* reference to the predcessor of current node
*/
this.predcessor = pred;

/**
* reference to the successor of current node
* node to notify
*/
this.successor = null;

/**
* current scope of the current node
*/
this.scope = scope;

/**
* a map record the identifer and it's valueInfo at current node
* <id.name, valueInfo>
*/
this.valueMap = new Map()
}

/**
*
* @param {Identifer} id
* @param {ValueInfo} value
*/
updateValue(id,value){
var valueInfo;
if(valueInfo = this.valueMap.get(id)){
valueInfo.update(value);
}
this.valueMap.set(id,value);
}

/**
* merge the valueMap into path
*/
update(_path){
if(_path)
for(var [key,value] of _path.valueMap){
var valueInfo;
if (valueInfo = this.valueMap.get(key)){
valueInfo.update(value);
}else{
valueInfo = new ValueInfo();
if(!value)
console.log(_path);
valueInfo.update(value);
this.valueMap.set(key,valueInfo);
}
}
}

/**
* replace the value of src with dst in _path
* @param {Path} _path
* @param {key} src
* @param {key} dst
*/
replaceValue(_path, src, dst){
if(!_path.valueMap.get(dst)){
let valueinfo = new ValueInfo();
this.valueMap.set(src,valueinfo);
}else{
this.valueMap.set(src,_path.valueMap.get(dst));
}
}

/**
*
* @param {key} src
* @param {ValueInfo} value
*/
setValue(src, value){
let v = value;
if(!v){
v = new ValueInfo();
}
this.valueMap.set(src, v);
}

__exit(pathManager){
return this.parent;
}
}
ValueInfo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
class ValueInfo{
constructor(type = 0x0000){
/**
* 0x0000 undefine
* 0x0001 null
* 0x0002 number
* 0x0004 string
* 0x0008 boolean
* 0x0010 object
* 0x0020 function
* 0x0040 class
* 0x0080 symbol
*/
this.__type = type;

/**
* type index in valueTable
*/
this.__symIndex = null;

/**
* A set record the property of current Valueinfo
* <id>
*/
this.__props = new Set();

/**
* A set record the method of current ValueInfo
* <id>
*/
this.__methods = new Set();

/**
* if the type is a func, descript the number of param and so on
*/
this.__desc = null;
}

updateType(type){
this.__type = this.__type| type;
}

/**
* merge the props into current props
*/
updateProp(props){
if(props)
for(let prop of props.values()){
this.__props.add(prop);
}
}

/**
* merge the method into current method
*/
updateMethod(methods){
if(methods)
for(let method of methods.values()){
this.__methods.add(method);
}
}

/**
* Merge the valueInfo into current Info
* @param {ValueInfo} valueInfo
*/
update(valueInfo){
this.updateType(valueInfo.__type);
this.updateProp(valueInfo.__props);
this.updateMethod(valueInfo.__methods);
}
}
PathManager
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
class PathManager{
constructor(options){
this.__nodeToPath = new WeakMap();
this.__currentAncestor = null; // pred node in mutate exe
this.__currentParent = null; // parent node
this.__currentPath = null;
this.__options = options;
this.__valueMap = new Map(); // do as the valueMap in chakra
this.__valueTable = new Map();
this.__initialize();
}

__initialize(){
/**
* init Object
*/
var arrayTypeInfo = new ValueInfo(0x0010);
this.__valueTable.set("arrayType",arrayTypeInfo);

/**
* init Array
*/
var arrayTypeInfo = new ValueInfo(0x0010);
this.__valueTable.set("arrayType",arrayTypeInfo);

/**
* Init Function
*/
var funcTypeInfo = new ValueInfo(0x0010);
this.__valueTable.set("funcType",funcTypeInfo);

/**
* Init Class
*/
var classTypeInfo = new ValueInfo(0x0010);
this.__valueTable.set("classType",classTypeInfo);
}

__get(node) {
return this.__nodeToPath.get(node);
}

acquire(node, inner) {
var path, i, iz;

path = this.__get(node);

return path;
}

/*
一步
TODO : currentAncestor is not the real ancestor
*/
__stepPath(path){
// setup the path info
let node = path.node;
let predPath = this.acquire(path.predcessor);
if(predPath){
predPath.successor = node;
path.valueMap = new Map(predPath.valueMap);
}
this.__nodeToPath.set(node, path);
this.__currentAncestor = this.__currentPath;
this.__currentPath = path;
return path;
}

__stepOut(path){
}

__stepExpression(node, scope, parent, pred){

this.__stepPath(new Path(this, node, parent, pred, scope));
}

__stepStatement(node, scope, parent, pred){
this.__stepPath(new Path(this, node, parent, pred, scope));
}

//...
}