The soul of every Android app is its code, compiled into a compact, efficient Dalvik Executable (DEX) format. dex-parser is the surgical tool designed to lay this soul bare.
This is a standalone, dependency-free, native Python library built to parse the complete structure of DEX files. It is a core pillar of the new Androguard Ecosystem, providing a high-fidelity map of an application's code layout—its classes, methods, fields, and strings—before deeper analysis begins.
Following the "Deconstruct to Reconstruct" philosophy, dex-parser operates as a specialized, independent library. It does not concern itself with the meaning of the bytecode; its sole focus is reading the layout of the executable accurately and quickly. This separation of concerns makes it a reliable foundation for any tool that needs to understand the structure of Dalvik code.
- Full Structure Parsing: Reads and indexes the entire DEX file, including the header, string table, type identifiers, method prototypes, and class definitions.
- Class & Method Enumeration: Provides a clean, Pythonic API to iterate through all defined classes, their methods (both direct and virtual), and their fields.
- On-Demand Field Access: Reads each field only when it is requested, using the Hachoir library.
- Cross-Reference Ready: Lays the groundwork for building cross-references by cleanly separating method and field definitions from their invocations.
- Pure & Pythonic: Written in native Python with zero external dependencies for maximum portability.
- [TODO] Multi-DEX Aware: Natively understands and can parse classes.dex, classes2.dex, and so on, providing a unified view of the application's code.
To install it from source, create a new venv and then run:
$ git clone https://github.com/androguard/dex-parser.git
$ cd dex-parser
$ pip install -e .
Or install it directly from PyPI:
$ pip install dexparser-ag
You can run it from the command line to quickly parse a DEX file and display information about it, although the tool is mainly intended as a library for other tools such as Androguard.
$ dexparser -i Test.dex
To open a DEX file, pass a Hachoir input stream to the DEX class:
from hachoir.stream.input_helper import FileInputStream
from dexparser import DEX
# Path to the DEX file to parse
d = DEX(FileInputStream("Test.dex"))
You then have raw access to each field of the DEX structure, such as the header, and to each of its subfields:
print(d["header"])
print(d["headermagic/magic"].value)
The main accessible fields are:
- header
- map_list
- string_id_item
- string_data_item
- proto_id_item
- type_id_item
- method_id_item
- field_id_item
- class_id_item
- class_data_item
Every subfield is accessible as well; see the corresponding class in the source code for details.
Alternatively, the DEXHelper class gives quick access to class names, method names, and field names, as well as to the code item of each method for disassembly:
from dexparser import DEXHelper
dh = DEXHelper.from_rawdex(d)
for method in dh.get_methods():
    print("METHOD", method, method.get_internal_struct())
    code = method.get_code()
    # Only methods that have a code item can be disassembled
    if code:
        print(
            "\t CODE",
            code["debug_info_off"],
            code["insns_size"],
            len(code["insns"].value),
        )
        # my_func_to_disassemble is a placeholder for your own disassembler
        my_func_to_disassemble(code["insns"].value)
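The `my_func_to_disassemble` callback above is left to the caller. As a starting point, here is a minimal sketch of one possible body. It assumes that `code["insns"].value` is a `bytes` object, which the text above does not confirm. `split_code_units` is a hypothetical helper name. The sketch relies only on the Dalvik convention that bytecode is a sequence of little-endian 16-bit code units, with the opcode in the low byte of an instruction's first unit.

```python
import struct


def split_code_units(insns: bytes) -> list[int]:
    """Split raw bytecode into little-endian 16-bit code units."""
    if len(insns) % 2:
        raise ValueError("bytecode length must be a multiple of 2 bytes")
    count = len(insns) // 2
    return list(struct.unpack(f"<{count}H", insns))


def my_func_to_disassemble(insns: bytes) -> None:
    """Print the unit count and the opcode of the first instruction."""
    units = split_code_units(insns)
    if units:
        # The opcode is the low byte of the instruction's first code unit.
        print(f"{len(units)} code units, first opcode 0x{units[0] & 0xFF:02x}")


# 0x0e is return-void, which occupies a single code unit.
my_func_to_disassemble(bytes([0x0E, 0x00]))
```

A real disassembler would also need the per-opcode instruction length to find where each following instruction starts; this sketch stops at the first one.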
A pure Rust implementation lives in dexparser-rs/. It parses the same DEX structure and can be used without Python.
&[u8] (file bytes)
│
▼ DexFile::parse()
+─────────────+
| DexFile | header + string_ids, type_ids, proto_ids, field_ids, method_ids (index tables)
+─────────────+
│
│ DexHelper::from_dex(&dex)
▼
+─────────────+
| DexHelper | high-level iterators over the same DexFile
+─────────────+
│
├──► classes() ──► ClassInfo (name, superclass_name) per class_def
├──► methods() ──► MethodInfoItem (class, name, proto, code_item) per direct/virtual method
└──► fields() ──► FieldInfoItem (class, name, type) per static/instance field
+------------------+
| header_item | magic "dex\n", version, file_size, offsets for every section
+------------------+
| string_ids[] | offset → string_data (MUTF-8) in data section
| type_ids[] | descriptor_idx → string_ids
| proto_ids[] | shorty_idx, return_type_idx, parameters_off
| field_ids[] | class_idx, type_idx, name_idx
| method_ids[] | class_idx, proto_idx, name_idx
| class_defs[] | class_idx, superclass_idx, class_data_off, ...
+------------------+
| map_list | (type, count, offset) for each section
+------------------+
| data section | string_data, type_list, class_data_item, code_item (insns), ...
+------------------+
dexparser single file: parse and print header, classes, methods, fields
│ usage: dexparser -i classes.dex [-s] [-v]
│
dexparse-dir directory: find DEX files (by magic or .dex), parse each, report time per file
│ usage: dexparse-dir -d /path [-r] [--by-extension]
│
└──► with --features disasm (dex-bytecode): parse + disassemble all method bytecode
│
▼
"X.XX ms parse Y.YY ms disasm file.dex (classes=... insns=...)"
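The directory scan described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the Rust implementation: `find_dex_files` and `is_dex` are hypothetical names, and the `recursive` and `by_extension` parameters mirror the `-r` and `--by-extension` flags under the assumption that detection by magic is the default.

```python
from pathlib import Path

DEX_MAGIC = b"dex\n"


def is_dex(path: Path) -> bool:
    """Return True if the file starts with the DEX magic bytes."""
    try:
        with path.open("rb") as fh:
            return fh.read(len(DEX_MAGIC)) == DEX_MAGIC
    except OSError:
        # Unreadable files are skipped rather than aborting the scan.
        return False


def find_dex_files(root, recursive=False, by_extension=False):
    """List DEX files under `root`, detected by magic or by the .dex suffix."""
    root = Path(root)
    candidates = root.rglob("*") if recursive else root.iterdir()
    result = []
    for path in candidates:
        if not path.is_file():
            continue
        if by_extension:
            if path.suffix == ".dex":
                result.append(path)
        elif is_dex(path):
            result.append(path)
    return sorted(result)
```

Detection by magic finds DEX files that carry another extension, while matching on the extension avoids opening every file in a large tree.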
See dexparser-rs/README.md for API details, dependencies, and optional disassembly with dex-bytecode.
Distributed under the Apache License, Version 2.0.
