Commit 5b3ee8d

fix: add clang doc parser
1 parent 95afaff commit 5b3ee8d

1 file changed

+331
-0
lines changed

generator/parser/README.md

Lines changed: 331 additions & 0 deletions
@@ -0,0 +1,331 @@
# Clang-Doc YAML to JSON Parser

A Python script that parses clang-doc's YAML output and converts it into a structured JSON format suitable for the Plugify project.

## Overview

This script replaces the previous parser that used cxxheaderparser to directly parse C++ files with Doxygen comments. The new approach leverages clang-doc's YAML output, which provides more accurate and complete information about functions, enums, and typedefs.

## Features

- **Parses clang-doc YAML output**: Extracts function signatures, parameters, return types, and documentation
- **Type mapping**: Converts C++ types to simplified user-friendly types (e.g., `plg::string` → `string`)
- **Enum handling**: Automatically resolves enum types to their base types
- **Enum structure generation**: Includes full enum definitions with values and descriptions
- **Function typedef parsing**: Parses function pointer typedefs and generates complete prototypes
- **Nested enum support**: Handles enums within function prototypes
- **Typedef support**: Identifies and marks typedef parameters
- **Description extraction**: Extracts Doxygen/brief descriptions, parameter descriptions, and return descriptions
- **Filtering**: Filter functions by name or source file prefix
- **Batch processing**: Process single YAML files or entire directories

## Requirements

- Python 3.6+
- PyYAML

Install dependencies:
```bash
pip install pyyaml
```

## Usage

### Basic Usage

```bash
python3 clang_doc_parser.py <input_path> <output_file>
```

**Arguments:**
- `input_path`: Path to a YAML file or directory containing YAML files
- `output_file`: Path to the output JSON file

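Because `input_path` may be either a single file or a directory, the script has to expand it into a list of YAML files before parsing. The following is a minimal sketch of that step, not the script's actual code: the helper name `collect_yaml_files`, the recursive search, and the `*.yaml` pattern are all assumptions made for illustration.

```python
from pathlib import Path


def collect_yaml_files(input_path):
    """Return the YAML files to parse for a file or directory path."""
    path = Path(input_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        # Assumption: directories are searched recursively for *.yaml files.
        return sorted(path.rglob("*.yaml"))
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```

Sorting the result keeps the processing order stable between runs, which makes the generated JSON easier to diff.
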
### Examples

1. **Process a single YAML file:**
   ```bash
   python3 clang_doc_parser.py index.yaml output.json
   ```

2. **Process all YAML files in a directory:**
   ```bash
   python3 clang_doc_parser.py ./docs output.json
   ```

3. **Filter by function name:**
   ```bash
   python3 clang_doc_parser.py index.yaml output.json --name-filter "Command"
   ```
   This will only include functions with "Command" in their name.

4. **Filter by source file prefix:**
   ```bash
   python3 clang_doc_parser.py index.yaml output.json --file-filter "commands"
   ```
   This will only include functions from files starting with "commands" (e.g., `commands.cpp`).

5. **Combine filters:**
   ```bash
   python3 clang_doc_parser.py index.yaml output.json --name-filter "Add" --file-filter "commands"
   ```

### Command-Line Options

- `--name-filter` / `-n`: Filter functions by name (case-insensitive substring match)
- `--file-filter` / `-f`: Filter functions by source filename prefix
- `--help` / `-h`: Show help message

79+
## Integration with CMake
80+
81+
To integrate clang-doc into your CMake project:
82+
83+
```cmake
84+
find_program(CLANG_DOC clang-doc)
85+
if(CLANG_DOC)
86+
add_custom_target(docs
87+
COMMAND ${CLANG_DOC}
88+
--executor=all-TUs
89+
-p ${CMAKE_CURRENT_BINARY_DIR}
90+
--output=${CMAKE_CURRENT_SOURCE_DIR}/docs
91+
--extra-arg=-Wno-error
92+
--format=yaml
93+
${CMAKE_SOURCE_DIR}/src/*.cpp
94+
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
95+
COMMENT "Generating documentation with clang-doc"
96+
)
97+
endif()
98+
```
99+
100+
Then run:
101+
```bash
102+
cmake --build . --target docs
103+
python3 clang_doc_parser.py docs/index.yaml exported_functions.json
104+
```
105+
## Output Format

The script generates JSON with comprehensive type information including enum structures and function prototypes.

### Basic Function Example

```json
{
  "name": "AddAdminCommand",
  "group": "Commands",
  "description": "Creates a console command as an administrative command.",
  "funcName": "AddAdminCommand",
  "paramTypes": [
    {
      "name": "name",
      "type": "string",
      "ref": false,
      "description": "The name of the console command."
    }
  ],
  "retType": {
    "type": "bool",
    "description": "A boolean indicating whether the command was successfully added."
  }
}
```

### Parameter with Enum Structure

When a parameter uses an enum type, the full enum definition is included:

```json
{
  "name": "mode",
  "type": "uint8",
  "ref": false,
  "description": "Whether the hook was in post mode (after processing) or pre mode (before processing).",
  "enum": {
    "name": "HookMode",
    "description": "Enum representing the type of callback.",
    "values": [
      {
        "name": "Pre",
        "value": 0,
        "description": "Callback will be executed before the original function"
      },
      {
        "name": "Post",
        "value": 1,
        "description": "Callback will be executed after the original function"
      }
    ]
  }
}
```

### Parameter with Function Prototype

When a parameter is a function pointer typedef, the complete function signature is included:

```json
{
  "name": "callback",
  "type": "function",
  "ref": false,
  "description": "A callback function that is invoked when the command is executed.",
  "prototype": {
    "name": "CommandCallback",
    "funcName": "CommandCallback",
    "description": "Handles the execution of a command triggered by a caller.",
    "paramTypes": [
      {
        "name": "param1",
        "type": "int32",
        "ref": false
      },
      {
        "name": "param2",
        "type": "int32",
        "ref": false,
        "enum": {
          "name": "CommandCallingContext",
          "description": "The command execution context.",
          "values": [
            {
              "name": "Console",
              "value": 0,
              "description": "The command execute from the client's console."
            }
          ]
        }
      }
    ],
    "retType": {
      "type": "int32",
      "enum": {
        "name": "ResultType",
        "description": "Enum representing the possible results of an operation.",
        "values": [...]
      }
    }
  }
}
```

### Field Descriptions

- `name`: Function name
- `group`: Derived from the source filename (e.g., "Commands" from `commands.cpp`)
- `description`: Brief description from Doxygen comments
- `funcName`: Function name (same as `name`)
- `paramTypes`: Array of parameter objects:
  - `name`: Parameter name
  - `type`: Mapped type (e.g., "string", "int32", "vec3[]", "function")
  - `ref`: Boolean indicating if it's a reference parameter
  - `description`: Parameter description from Doxygen comments (if available)
  - `enum`: (Optional) Full enum structure if parameter is an enum type
  - `prototype`: (Optional) Full function signature if parameter is a function pointer typedef
- `retType`: Return type object:
  - `type`: Mapped return type
  - `description`: Return description from Doxygen comments (if available)
  - `enum`: (Optional) Full enum structure if return type is an enum

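To show how the fields fit together, here is a small sketch of a consumer that turns one exported function entry into a readable signature string. The helper `format_signature` is hypothetical and not part of the script; it relies only on the fields listed above, and falling back to `void` when `retType` is absent is an assumption.

```python
def format_signature(func):
    """Render one exported function entry as a readable signature string."""
    params = []
    for param in func.get("paramTypes", []):
        # Reference parameters are marked with '&' after the mapped type.
        marker = "&" if param.get("ref") else ""
        text = f'{param["type"]}{marker} {param["name"]}'
        if "enum" in param:
            # The mapped type is the enum's base type; show the enum name too.
            text += f' /* {param["enum"]["name"]} */'
        params.append(text)
    # Assumption: a missing retType is rendered as 'void'.
    ret_type = func.get("retType", {}).get("type", "void")
    return f'{ret_type} {func["funcName"]}({", ".join(params)})'


entry = {
    "funcName": "AddAdminCommand",
    "paramTypes": [{"name": "name", "type": "string", "ref": False}],
    "retType": {"type": "bool"},
}
print(format_signature(entry))  # bool AddAdminCommand(string name)
```
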
## Advanced Features

### Enum Structure Generation

The parser automatically detects enum types and includes their complete definition:
- Enum name and description
- All enum values with their numeric values
- Per-value descriptions (if documented)
- Automatically filters out sentinel values like "Count", "MAX", "INVALID"

### Function Pointer Typedef Parsing

The parser can parse function pointer typedefs from their underlying signature:
- Extracts return type and parameter types
- Recursively processes parameter types (including nested enums)
- Generates parameter names (param1, param2, etc.)
- Includes full function description from typedef documentation

**Note**: Parameter names in function prototypes are auto-generated as `param1`, `param2`, etc. To include meaningful parameter names and descriptions, you'll need to add them manually or extend the parser to read from additional documentation sources.

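The signature parsing described above can be sketched as follows. This is only an illustration: it assumes the underlying type arrives as a string of the form `Ret (*)(A, B)`, and the names `parse_function_pointer` and `split_params` are hypothetical. Types are returned as raw C++ strings; mapping them to simplified types and attaching enum structures would be a separate step.

```python
import re

# Matches 'Ret (*)(params)'; the exact input format is an assumption.
_FUNC_PTR_RE = re.compile(
    r"^\s*(?P<ret>.+?)\s*\(\s*\*\s*\)\s*\((?P<params>.*)\)\s*$")


def split_params(params):
    """Split a parameter list on commas that are not nested in <> or ()."""
    parts, depth, current = [], 0, ""
    for ch in params:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_function_pointer(signature):
    """Parse 'Ret (*)(A, B)' into a return type and auto-named parameters."""
    match = _FUNC_PTR_RE.match(signature)
    if match is None:
        return None  # not a function pointer signature
    param_types = split_params(match.group("params"))
    if param_types == ["void"]:
        param_types = []  # C-style '(void)' means no parameters
    return {
        "retType": match.group("ret"),
        "paramTypes": [
            {"name": f"param{i}", "type": t}
            for i, t in enumerate(param_types, start=1)
        ],
    }
```

Tracking nesting depth while splitting keeps a template argument list such as `plg::vector<int>` from being cut at an inner comma.
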
### Nested Enum Support

Enums can appear at multiple levels:
- Function parameters
- Function return types
- Function prototype parameters (within typedef)
- Function prototype return types (within typedef)

All enum references include the complete enum structure with values and descriptions.

## Type Mapping

The script maps C++ types to simplified types:

| C++ Type | Mapped Type |
|----------|-------------|
| `plg::string`, `const plg::string &` | `string` |
| `plg::vector<plg::string>` | `string[]` |
| `int`, `int32_t`, `long` | `int32` |
| `int64_t`, `long long` | `int64` |
| `bool` | `bool` |
| `float` | `float` |
| `double` | `double` |
| `void*` | `ptr64` |
| `plg::vector<int>` | `int32[]` |
| `Vector`, `QAngle`, `plg::vec3` | `vec3` |
| Enums | Base type (e.g., `uint8`, `int32`) + enum structure |
| Function pointer typedefs | `function` + prototype structure |
| Other typedefs | `?` |

Other pointer types are also mapped to `ptr64`, and unknown types default to `?`.

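A minimal sketch of the scalar, vector, and pointer rows of this mapping is shown below. It is not the script's `map_type()` function: the name `map_type_sketch`, the `BASE_TYPES` dictionary, and the normalization of `const` and `&` are assumptions, and enum and function pointer typedef handling is left out because it needs the parsed YAML.

```python
import re

# Base mappings taken from the table above; the real dictionary may differ.
BASE_TYPES = {
    "plg::string": "string",
    "int": "int32", "int32_t": "int32", "long": "int32",
    "int64_t": "int64", "long long": "int64",
    "bool": "bool", "float": "float", "double": "double",
    "Vector": "vec3", "QAngle": "vec3", "plg::vec3": "vec3",
}

_VECTOR_RE = re.compile(r"^plg::vector<\s*(.+?)\s*>$")


def normalize(cpp_type):
    """Strip const qualifiers and a trailing reference marker."""
    text = re.sub(r"\bconst\b", "", cpp_type).strip()
    return text.rstrip("&").strip()


def map_type_sketch(cpp_type):
    """Map a C++ type string to a simplified type name, or '?' if unknown."""
    text = normalize(cpp_type)
    if text.endswith("*"):
        return "ptr64"  # every pointer, including void*, maps to ptr64
    vector = _VECTOR_RE.match(text)
    if vector:
        # Map the element type recursively and append the array suffix.
        element = map_type_sketch(vector.group(1))
        return "?" if element == "?" else element + "[]"
    return BASE_TYPES.get(text, "?")
```

Supporting a new type in a scheme like this only requires a new dictionary entry.
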
## Advantages over Previous Parser

1. **More accurate parsing**: Leverages Clang's AST instead of regex-based parsing
2. **Better enum handling**: Automatically resolves enum base types and includes full enum definitions
3. **Function typedef support**: Parses function pointer signatures from typedefs
4. **Structured documentation**: Extracts Doxygen comments in a more reliable way
5. **Type safety**: Uses Clang's type system for accurate type information
6. **Better scalability**: Can handle complex C++ constructs that regex parsing struggled with
7. **Nested type support**: Handles enums within function prototypes and other complex scenarios

## Troubleshooting

### No functions exported
- Ensure your YAML file contains a `ChildFunctions` section
- Check that your filters aren't too restrictive
- Verify the YAML file is properly formatted

### Type showing as '?'
- The type is not in the type mapping dictionary
- You can add custom type mappings to the `map_type()` function

### Missing descriptions
- Ensure your source files have proper Doxygen comments with `@brief`, `@param`, and `@return` tags
- Verify clang-doc is extracting the comments (check the YAML file)

### Missing enum structures
- Verify the enum is defined in the YAML file's `ChildEnums` section
- Ensure the enum has proper Doxygen documentation
- Check that enum values are documented

### Function prototype parameters have generic names
- This is expected behavior - clang-doc doesn't preserve parameter names in typedef signatures
- Parameter names are auto-generated as `param1`, `param2`, etc.
- To add meaningful names, you can either:
  - Manually edit the JSON output
  - Extend the parser to read parameter metadata from additional sources
  - Use wrapper functions with documented parameters instead of raw typedefs

## Configuration

### Filtering Sentinel Enum Values

By default, the parser filters out common sentinel enum values like "Count", "MAX", "INVALID", etc. You can customize this behavior by modifying the `sentinel_names` set in the `build_enum_structure()` function:

```python
sentinel_names = {'Count', 'MAX', 'Max', 'INVALID', 'Invalid', 'NUM', 'Num'}
```

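As a rough sketch of how such a set can be applied, the helper below drops sentinel members from a list of enum values shaped like the `values` arrays in the output format. The function `filter_sentinels` is hypothetical, and exact, case-sensitive name matching is an assumption; the real `build_enum_structure()` may match differently.

```python
SENTINEL_NAMES = {"Count", "MAX", "Max", "INVALID", "Invalid", "NUM", "Num"}


def filter_sentinels(values, sentinel_names=SENTINEL_NAMES):
    """Drop enum members whose name exactly matches a known sentinel."""
    return [v for v in values if v["name"] not in sentinel_names]


values = [
    {"name": "Pre", "value": 0},
    {"name": "Post", "value": 1},
    {"name": "Count", "value": 2},
]
print([v["name"] for v in filter_sentinels(values)])  # ['Pre', 'Post']
```
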
## License

This script is part of the Plugify project.
