Rosemary: development status


The final piece of the Android decompilation/static analysis puzzle is native reverse engineering. After completing the decompilation of the Java/dex parts, I decided to challenge myself with the decompilation and static analysis of the native parts, and I gave this native part a different name: rosemary.


For static analysis tools, the decoder is the most difficult part. It’s not that writing the decoder is difficult, but rather the massive number of decoding rules.

The phase of rosemary’s decoder has been completed. I completed the following in the decoder.

1. A64/A32/T32 decoder

I wrote a ruby dsl to describe arm assembly, rather than dsl, it’s more like data. thanks the format and grammar tag, its save lots of debug time.

{
    param: '24:0,23:0,22:0,21:0,20:0', 
    name: 'and', 
    format: 'r%d, r%d, #0x%x', 
    grammar: 'AND{<c>}{<q>} {<Rd>,} <Rn>, #<const>', 
    ops: '8:4,16:4,f:fm', condition: true,
    ops_desc: [
        { t: :r_r, v: '8:4' }, { t: :r_r, v: '16:4' }, { t: :imm, v: 'f:fm' }
    ],
    df: { def: [0], use: [1] } 
},
{
    param: '24:0,23:0,22:0,21:0,20:1|11:1,10:1,9:1,8:1', 
    name: 'ands', format: 'r%d, r%d, #0x%x', 
    grammar: 'ANDS{<c>}{<q>} {<Rd>,} <Rn>, #<const>', 
    ops: '8:4,16:4,f:fm', condition: true,
    ops_desc: [
        { t: :r_r, v: '8:4' }, { t: :r_r, v: '16:4' }, { t: :imm, v: 'f:fm' }
    ],
    df: { def: [0, :n, :z], use: [1] }
},
{
    param: '24:0,21:0,22:0,20:0', 
  	name: 'str', 
  	format: 'r%d, [r%d], #%s0x%x', 
  	grammar: 'STR{<c>}{<q>} <Rt>, [<Rn>], #{+/-}<imm>', 
  	ops: '12:4,16:4,f:f0,0:12', 
  	condition: true,
    ops_desc: [
        { t: :r_r, v: '12:4' }, 
        { 
            t: :mem, 
            op: MemOp::W, 
            mode: MemMode::POST_IDX, 
            base_reg: :r_r, v: '16:4', 
            offset: { t: :imm, v: '0:12', pos_neg: 'f:f0', hex: true } 
        }
    ],
    df: { use: [0] }
},

For the ARM A architecture, DDI0487_M_a_a_a-profile_architecture_reference_manual.pdf contains approximately 5600 instruction codes.

These instructions have been described using a DSL in this data format.

A very interesting thing at this stage was that I asked many LLMs about the definition of this DSL, and they all gave lambda patterns like above:

def_instruction do |bits|
	src: ''
	dst: ''
	format: ''
	grammar: ''
  condition: ''
end

I really don’t like maintaining so much code. Ultimately, I chose this very visually striking data format because it’s the fastest and easiest to understand for me, and it’s very editor-friendly, whether it’s Vim or VS Code; batch text operations are very simple.

The storage state of instruction set is a decoder tree structure, Both in manual and data.

-- root
---- chapater1
------ chapater1.1 

After filling in the data, Ruby’s DSL engine translates this data into static C language data and functions, and then it works.

2. Unit test for decoder

The data generated by hands and lots of simple scripts will hava many inputs errors. So I need a testing method to verify the the correctness of the decoding.

In the beginning, LLMs told me that I can generates instruction by myself, compares to capstone and llvm-mc’s decode result, and it failed. For example:

CASP <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{, #0}]

Ws and W(s+1) requires two registers is continuous. I can’t generate like this, because my DSL doesn’t describe the relationship between these two registers.

Then LLMs gave me the second method: use the uint test of capstone and llvm’s test data, and it’s works.

Unit testing in C is a much more painful process than in Ruby, even Ruby’s built-in unit test gems are much more convenient than C’s. Therefore, decoding unit tests should be performed in Ruby, as they are simpler to write and debug.

 require File.expand_path("#{File.dirname(__FILE__)}/../arm")
 require File.expand_path("#{File.dirname(__FILE__)}/load_test_case")
 
 def elf_path
   File.expand_path("#{File.dirname(__FILE__)}/../../../../build/instruction")
 end
 
 def run_cmd(cmd)
   output = `#{cmd}`
   return output
 end
 
 def parse_ins(bin)
   run_cmd "#{elf_path} #{bin}"
 end
 
 a64_files = [
   'basic-a64-instructions.s.yaml',
   # files of capstone's test data
 ]
 
 arr = Arm.init_from_hash(Arm::ARM_A64)
 
 a64_files.each do |file|
   RSpec.describe "a64 decoder test" do
     test_case = LoadTestCase.new("#{File.dirname(__FILE__)}/AArch64/#{file}")
 
     test_case.res.each do |h|
       it h["asm_text"] do
         result = parse_ins(h['input'])
         puts "#{h['asm_text']} -- #{result.strip} -- #{h}"
         expect(result.strip).to eq(h['asm_text'])
       end
     end
   end
 end
 

This type of testing is more fun compared to C language testing.

3. Controlflow and Dataflow

Building the cfg and dfg is easy, but storage is much more difficult. I often use a 40MB Android .so file for testing.

This ELF file, after decoding, contains approximately 7.7 million instructions. On a 64-bit machine, a pointer is 8 bytes, so 7.7 million pointers already amount to nearly 60MB, not even counting the data.

Furthermore, frequent malloc calls quickly cause memory to explode.

If you want fast data reading while maintaining data contiguousness in memory, arrays or chunked-page-style storage are almost the only options.

After a long period of development and testing, the prototype of this static analysis tool is finally complete. And the test results make me happy, I think this is the charm of the C language.

My machine is Mac mini m4

~/workspace/clang/garlic (fea/rosemary)$ time -v ./build/elf
[elf] file path: /Users/neo/workspace/elfs/weixin/libapp.so
[elf region] exec region: a30000 - 284c710
[elf error]: elf entry addr: 0, no region
[elf memory] elf's binary buffer: 42271680, 40 MB
[elf memory] elf's entry points: 105393, size: 32 memory: 3 MB
[elf memory] elf's entry map: 105393 size: 16, memory: 1 MB
[elf memory] elf's region bitmap: 2 MB
[elf memory] elf's basic block count: 1057458, size: 64, memory: 64 MB
		 basic block's defs count: 1057458, size: 16, memory: 16 MB
		 basic block's uses count: 1057458, size: 16, memory: 16 MB
[elf memory] elf's edge count: 3358970, size: 8, memory: 25 MB
[elf memory] elf's instruction count: 7683142, size: 32, memory: 234 MB
[elf memory] elf's template count: 6000, size: 60, memory: 0 MB
	Command being timed: "./build/elf"
	User time (seconds): 1.26
	System time (seconds): 0.06
	Percent of CPU this job got: 98%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.35
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 384720
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 35
	Minor (reclaiming a frame) page faults: 25097
	Voluntary context switches: 6
	Involuntary context switches: 2115
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 16384
	Exit status: 0