| 1 | +# Clang-Doc YAML to JSON Parser |
| 2 | + |
| 3 | +A Python script that parses clang-doc's YAML output and converts it into a structured JSON format suitable for the Plugify project. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +This script replaces the previous parser that used cxxheaderparser to directly parse C++ files with Doxygen comments. The new approach leverages clang-doc's YAML output, which provides more accurate and complete information about functions, enums, and typedefs. |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- **Parses clang-doc YAML output**: Extracts function signatures, parameters, return types, and documentation |
| 12 | +- **Type mapping**: Converts C++ types to simplified user-friendly types (e.g., `plg::string` → `string`) |
| 13 | +- **Enum handling**: Automatically resolves enum types to their base types |
| 14 | +- **Enum structure generation**: Includes full enum definitions with values and descriptions |
| 15 | +- **Function typedef parsing**: Parses function pointer typedefs and generates complete prototypes |
| 16 | +- **Nested enum support**: Handles enums within function prototypes |
| 17 | +- **Typedef support**: Identifies and marks typedef parameters |
| 18 | +- **Description extraction**: Extracts Doxygen brief descriptions, parameter descriptions, and return descriptions |
| 19 | +- **Filtering**: Filter functions by name or source file prefix |
| 20 | +- **Batch processing**: Process single YAML files or entire directories |
| 21 | + |
| 22 | +## Requirements |
| 23 | + |
| 24 | +- Python 3.6+ |
| 25 | +- PyYAML |
| 26 | + |
| 27 | +Install dependencies: |
| 28 | +```bash |
| 29 | +pip install pyyaml |
| 30 | +``` |
| 31 | + |
| 32 | +## Usage |
| 33 | + |
| 34 | +### Basic Usage |
| 35 | + |
| 36 | +```bash |
| 37 | +python3 clang_doc_parser.py <input_path> <output_file> |
| 38 | +``` |
| 39 | + |
| 40 | +**Arguments:** |
| 41 | +- `input_path`: Path to a YAML file or directory containing YAML files |
| 42 | +- `output_file`: Path to the output JSON file |
| 43 | + |
| 44 | +### Examples |
| 45 | + |
| 46 | +1. **Process a single YAML file:** |
| 47 | + ```bash |
| 48 | + python3 clang_doc_parser.py index.yaml output.json |
| 49 | + ``` |
| 50 | + |
| 51 | +2. **Process all YAML files in a directory:** |
| 52 | + ```bash |
| 53 | + python3 clang_doc_parser.py ./docs output.json |
| 54 | + ``` |
| 55 | + |
| 56 | +3. **Filter by function name:** |
| 57 | + ```bash |
| 58 | + python3 clang_doc_parser.py index.yaml output.json --name-filter "Command" |
| 59 | + ``` |
| 60 | + This will only include functions with "Command" in their name. |
| 61 | + |
| 62 | +4. **Filter by source file prefix:** |
| 63 | + ```bash |
| 64 | + python3 clang_doc_parser.py index.yaml output.json --file-filter "commands" |
| 65 | + ``` |
| 66 | + This will only include functions from files starting with "commands" (e.g., `commands.cpp`). |
| 67 | + |
| 68 | +5. **Combine filters:** |
| 69 | + ```bash |
| 70 | + python3 clang_doc_parser.py index.yaml output.json --name-filter "Add" --file-filter "commands" |
| 71 | + ``` |
| 72 | + |
| 73 | +### Command-Line Options |
| 74 | + |
| 75 | +- `--name-filter` / `-n`: Filter functions by name (case-insensitive substring match) |
| 76 | +- `--file-filter` / `-f`: Filter functions by source filename prefix |
| 77 | +- `--help` / `-h`: Show help message |
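
The options above could be wired up roughly as follows. This is a minimal sketch, not the script's actual code: `build_arg_parser` and `matches_filters` are hypothetical names, and it assumes the file filter is a case-sensitive prefix test on the base filename.

```python
import argparse
import os


def build_arg_parser():
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Convert clang-doc YAML to JSON")
    parser.add_argument("input_path", help="YAML file or directory of YAML files")
    parser.add_argument("output_file", help="Path to the output JSON file")
    parser.add_argument("-n", "--name-filter", default=None,
                        help="Case-insensitive substring to match in function names")
    parser.add_argument("-f", "--file-filter", default=None,
                        help="Prefix that the source filename must start with")
    return parser


def matches_filters(func_name, source_file, name_filter=None, file_filter=None):
    """Return True if a function passes both (optional) filters."""
    # Name filter: case-insensitive substring match, as documented.
    if name_filter and name_filter.lower() not in func_name.lower():
        return False
    # File filter: assumed to compare against the base filename only.
    if file_filter and not os.path.basename(source_file).startswith(file_filter):
        return False
    return True


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_arg_parser().parse_args(
    ["index.yaml", "output.json", "-n", "command", "-f", "commands"]
)
print(matches_filters("AddAdminCommand", "src/commands.cpp",
                      args.name_filter, args.file_filter))
```

When both filters are given, a function has to satisfy both of them, which matches the combined-filter example in the Usage section.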
| 78 | + |
| 79 | +## Integration with CMake |
| 80 | + |
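| 81 | +To integrate clang-doc into your CMake project (the `-p` option expects a `compile_commands.json` in the build directory, which CMake can generate with `CMAKE_EXPORT_COMPILE_COMMANDS=ON`): |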
| 82 | + |
| 83 | +```cmake |
| 84 | +find_program(CLANG_DOC clang-doc) |
| 85 | +if(CLANG_DOC) |
| 86 | + add_custom_target(docs |
| 87 | + COMMAND ${CLANG_DOC} |
| 88 | + --executor=all-TUs |
| 89 | + -p ${CMAKE_CURRENT_BINARY_DIR} |
| 90 | + --output=${CMAKE_CURRENT_SOURCE_DIR}/docs |
| 91 | + --extra-arg=-Wno-error |
| 92 | + --format=yaml |
| 93 | + ${CMAKE_SOURCE_DIR}/src/*.cpp |
| 94 | + WORKING_DIRECTORY ${CMAKE_BINARY_DIR} |
| 95 | + COMMENT "Generating documentation with clang-doc" |
| 96 | + ) |
| 97 | +endif() |
| 98 | +``` |
| 99 | + |
| 100 | +Then run: |
| 101 | +```bash |
| 102 | +cmake --build . --target docs |
| 103 | +python3 clang_doc_parser.py docs/index.yaml exported_functions.json |
| 104 | +``` |
| 105 | + |
| 106 | +## Output Format |
| 107 | + |
| 108 | +The script generates JSON with comprehensive type information including enum structures and function prototypes. |
| 109 | + |
| 110 | +### Basic Function Example |
| 111 | + |
| 112 | +```json |
| 113 | +{ |
| 114 | + "name": "AddAdminCommand", |
| 115 | + "group": "Commands", |
| 116 | + "description": "Creates a console command as an administrative command.", |
| 117 | + "funcName": "AddAdminCommand", |
| 118 | + "paramTypes": [ |
| 119 | + { |
| 120 | + "name": "name", |
| 121 | + "type": "string", |
| 122 | + "ref": false, |
| 123 | + "description": "The name of the console command." |
| 124 | + } |
| 125 | + ], |
| 126 | + "retType": { |
| 127 | + "type": "bool", |
| 128 | + "description": "A boolean indicating whether the command was successfully added." |
| 129 | + } |
| 130 | +} |
| 131 | +``` |
| 132 | + |
| 133 | +### Parameter with Enum Structure |
| 134 | + |
| 135 | +When a parameter uses an enum type, the full enum definition is included: |
| 136 | + |
| 137 | +```json |
| 138 | +{ |
| 139 | + "name": "mode", |
| 140 | + "type": "uint8", |
| 141 | + "ref": false, |
| 142 | + "description": "Whether the hook was in post mode (after processing) or pre mode (before processing).", |
| 143 | + "enum": { |
| 144 | + "name": "HookMode", |
| 145 | + "description": "Enum representing the type of callback.", |
| 146 | + "values": [ |
| 147 | + { |
| 148 | + "name": "Pre", |
| 149 | + "value": 0, |
| 150 | + "description": "Callback will be executed before the original function" |
| 151 | + }, |
| 152 | + { |
| 153 | + "name": "Post", |
| 154 | + "value": 1, |
| 155 | + "description": "Callback will be executed after the original function" |
| 156 | + } |
| 157 | + ] |
| 158 | + } |
| 159 | +} |
| 160 | +``` |
| 161 | + |
| 162 | +### Parameter with Function Prototype |
| 163 | + |
| 164 | +When a parameter is a function pointer typedef, the complete function signature is included: |
| 165 | + |
| 166 | +```json |
| 167 | +{ |
| 168 | + "name": "callback", |
| 169 | + "type": "function", |
| 170 | + "ref": false, |
| 171 | + "description": "A callback function that is invoked when the command is executed.", |
| 172 | + "prototype": { |
| 173 | + "name": "CommandCallback", |
| 174 | + "funcName": "CommandCallback", |
| 175 | + "description": "Handles the execution of a command triggered by a caller.", |
| 176 | + "paramTypes": [ |
| 177 | + { |
| 178 | + "name": "param1", |
| 179 | + "type": "int32", |
| 180 | + "ref": false |
| 181 | + }, |
| 182 | + { |
| 183 | + "name": "param2", |
| 184 | + "type": "int32", |
| 185 | + "ref": false, |
| 186 | + "enum": { |
| 187 | + "name": "CommandCallingContext", |
| 188 | + "description": "The command execution context.", |
| 189 | + "values": [ |
| 190 | + { |
| 191 | + "name": "Console", |
| 192 | + "value": 0, |
| 193 | + "description": "The command was executed from the client's console." |
| 194 | + } |
| 195 | + ] |
| 196 | + } |
| 197 | + } |
| 198 | + ], |
| 199 | + "retType": { |
| 200 | + "type": "int32", |
| 201 | + "enum": { |
| 202 | + "name": "ResultType", |
| 203 | + "description": "Enum representing the possible results of an operation.", |
| 204 | + "values": [...] |
| 205 | + } |
| 206 | + } |
| 207 | + } |
| 208 | +} |
| 209 | +``` |
| 210 | + |
| 211 | +### Field Descriptions |
| 212 | + |
| 213 | +- `name`: Function name |
| 214 | +- `group`: Derived from the source filename (e.g., "Commands" from `commands.cpp`) |
| 215 | +- `description`: Brief description from Doxygen comments |
| 216 | +- `funcName`: Function name (same as `name`) |
| 217 | +- `paramTypes`: Array of parameter objects: |
| 218 | + - `name`: Parameter name |
| 219 | + - `type`: Mapped type (e.g., "string", "int32", "vec3[]", "function") |
| 220 | + - `ref`: Boolean indicating if it's a reference parameter |
| 221 | + - `description`: Parameter description from Doxygen comments (if available) |
| 222 | + - `enum`: (Optional) Full enum structure if parameter is an enum type |
| 223 | + - `prototype`: (Optional) Full function signature if parameter is a function pointer typedef |
| 224 | +- `retType`: Return type object: |
| 225 | + - `type`: Mapped return type |
| 226 | + - `description`: Return description from Doxygen comments (if available) |
| 227 | + - `enum`: (Optional) Full enum structure if return type is an enum |
| 228 | + |
| 229 | +## Advanced Features |
| 230 | + |
| 231 | +### Enum Structure Generation |
| 232 | + |
| 233 | +The parser automatically detects enum types and includes their complete definition: |
| 234 | +- Enum name and description |
| 235 | +- All enum values with their numeric values |
| 236 | +- Per-value descriptions (if documented) |
| 237 | +- Automatic filtering of sentinel values such as "Count", "MAX", and "INVALID" |
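
As an illustration of the steps listed above, here is a minimal sketch of how an enum structure might be assembled. The input dictionary shape (`Name`, `Description`, `Members`) is a simplified assumption and not the exact clang-doc YAML schema, and the function body is illustrative rather than the script's real `build_enum_structure()`.

```python
# Sentinel names taken from the Configuration section of this README.
SENTINEL_NAMES = {"Count", "MAX", "Max", "INVALID", "Invalid", "NUM", "Num"}


def build_enum_structure(enum_info, sentinel_names=SENTINEL_NAMES):
    """Turn a simplified enum record into the JSON enum structure."""
    values = []
    for member in enum_info.get("Members", []):
        if member["Name"] in sentinel_names:
            continue  # skip bookkeeping entries such as Count or MAX
        entry = {"name": member["Name"], "value": int(member["Value"])}
        # Per-value descriptions are optional.
        if member.get("Description"):
            entry["description"] = member["Description"]
        values.append(entry)
    return {
        "name": enum_info["Name"],
        "description": enum_info.get("Description", ""),
        "values": values,
    }


hook_mode = build_enum_structure({
    "Name": "HookMode",
    "Description": "Enum representing the type of callback.",
    "Members": [
        {"Name": "Pre", "Value": "0"},
        {"Name": "Post", "Value": "1"},
        {"Name": "Count", "Value": "2"},
    ],
})
print([v["name"] for v in hook_mode["values"]])
```

Passing a different `sentinel_names` set is one way to keep a value such as `Count` in the output when it is a real enumerator.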
| 238 | + |
| 239 | +### Function Pointer Typedef Parsing |
| 240 | + |
| 241 | +The parser reconstructs function pointer typedefs from their underlying signature: |
| 242 | +- Extracts return type and parameter types |
| 243 | +- Recursively processes parameter types (including nested enums) |
| 244 | +- Generates parameter names (param1, param2, etc.) |
| 245 | +- Includes full function description from typedef documentation |
| 246 | + |
| 247 | +**Note**: Parameter names in function prototypes are auto-generated as `param1`, `param2`, etc. To include meaningful parameter names and descriptions, you'll need to add them manually or extend the parser to read from additional documentation sources. |
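
The signature parsing described above can be sketched as follows. This assumes the underlying type arrives as a string of the form `Ret (*)(A, B)`. `parse_function_pointer` and `split_params` are hypothetical helpers, and the types are left as raw C++ strings instead of being mapped or resolved to enums.

```python
import re

# Matches "<return type> (*)(<parameter list>)".
_FUNC_PTR = re.compile(r"^\s*(?P<ret>.+?)\s*\(\s*\*\s*\)\s*\((?P<params>.*)\)\s*$")


def split_params(params):
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, current = [], 0, ""
    for ch in params:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def parse_function_pointer(signature):
    """Return a prototype-like dict, or None if the string is not a function pointer."""
    match = _FUNC_PTR.match(signature)
    if match is None:
        return None
    param_types = split_params(match.group("params"))
    if param_types == ["void"]:
        param_types = []  # "(void)" means no parameters
    return {
        "retType": match.group("ret"),
        # Names are generated because the signature carries none.
        "paramTypes": [
            {"name": f"param{i}", "type": t}
            for i, t in enumerate(param_types, start=1)
        ],
    }


proto = parse_function_pointer("ResultType (*)(int, CommandCallingContext)")
print(proto)
```

The nesting-aware split matters for template arguments that contain commas, which a plain `str.split(",")` would cut apart.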
| 248 | + |
| 249 | +### Nested Enum Support |
| 250 | + |
| 251 | +Enums can appear at multiple levels: |
| 252 | +- Function parameters |
| 253 | +- Function return types |
| 254 | +- Function prototype parameters (within typedef) |
| 255 | +- Function prototype return types (within typedef) |
| 256 | + |
| 257 | +All enum references include the complete enum structure with values and descriptions. |
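
For consumers of the JSON, the four locations listed above can be walked with a small recursive helper. This is a sketch against the output format documented in this README, and `collect_enums` is a hypothetical name that is not part of the script.

```python
def collect_enums(function_entry):
    """Return the names of every enum attached to a function entry."""
    found = []

    def visit_type(type_obj):
        # A parameter or return type may carry an enum and/or a prototype.
        if "enum" in type_obj:
            found.append(type_obj["enum"]["name"])
        if "prototype" in type_obj:
            visit_function(type_obj["prototype"])

    def visit_function(func):
        for param in func.get("paramTypes", []):
            visit_type(param)
        visit_type(func.get("retType", {}))

    visit_function(function_entry)
    return found


entry = {
    "name": "AddAdminCommand",
    "paramTypes": [
        {"name": "mode", "type": "uint8", "enum": {"name": "HookMode", "values": []}},
        {
            "name": "callback",
            "type": "function",
            "prototype": {
                "name": "CommandCallback",
                "paramTypes": [
                    {"name": "param1", "type": "int32"},
                    {"name": "param2", "type": "int32",
                     "enum": {"name": "CommandCallingContext", "values": []}},
                ],
                "retType": {"type": "int32",
                            "enum": {"name": "ResultType", "values": []}},
            },
        },
    ],
    "retType": {"type": "bool"},
}
print(collect_enums(entry))
```

Enum names are returned in traversal order: parameters first, then any prototype they carry, then the return type.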
| 258 | + |
| 259 | +## Type Mapping |
| 260 | + |
| 261 | +The script maps C++ types to simplified types: |
| 262 | + |
| 263 | +| C++ Type | Mapped Type | |
| 264 | +|----------|-------------| |
| 265 | +| `plg::string`, `const plg::string &` | `string` | |
| 266 | +| `plg::vector<plg::string>` | `string[]` | |
| 267 | +| `int`, `int32_t`, `long` | `int32` | |
| 268 | +| `int64_t`, `long long` | `int64` | |
| 269 | +| `bool` | `bool` | |
| 270 | +| `float` | `float` | |
| 271 | +| `double` | `double` | |
| 272 | +| `void*` | `ptr64` | |
| 273 | +| `plg::vector<int>` | `int32[]` | |
| 274 | +| `Vector`, `QAngle`, `plg::vec3` | `vec3` | |
| 275 | +| Enums | Base type (e.g., `uint8`, `int32`) + enum structure | |
| 276 | +| Function pointer typedefs | `function` + prototype structure | |
| 277 | +| Other typedefs | `?` | |
| 278 | + |
| 279 | +All other pointer types are mapped to `ptr64` as well, and unknown types default to `?`. |
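
The table above could be implemented along these lines. This is a minimal sketch, not the script's real `map_type()`: the `BASIC_TYPES` dictionary is a hypothetical name, and enum and typedef resolution as well as the `ref` flag are left out.

```python
import re

# Scalar and value types copied from the mapping table.
BASIC_TYPES = {
    "plg::string": "string",
    "int": "int32", "int32_t": "int32", "long": "int32",
    "int64_t": "int64", "long long": "int64",
    "bool": "bool", "float": "float", "double": "double",
    "Vector": "vec3", "QAngle": "vec3", "plg::vec3": "vec3",
}

_VECTOR = re.compile(r"^plg::vector<(.+)>$")


def map_type(cpp_type):
    """Map a C++ type string to its simplified name, or '?' if unknown."""
    # Drop const qualifiers and reference markers before the lookup.
    cleaned = re.sub(r"\bconst\b", "", cpp_type).replace("&", "").strip()
    cleaned = re.sub(r"\s+", " ", cleaned)
    if cleaned.endswith("*"):
        return "ptr64"  # every pointer, including void*, maps to ptr64
    vector = _VECTOR.match(cleaned)
    if vector:
        # Arrays reuse the element mapping and append "[]".
        element = map_type(vector.group(1))
        return "?" if element == "?" else element + "[]"
    return BASIC_TYPES.get(cleaned, "?")


for cpp in ["const plg::string &", "plg::vector<int>", "void*", "MyStruct"]:
    print(cpp, "->", map_type(cpp))
```

Adding a custom mapping in this sketch is a one-line change to the dictionary, which is one reason to keep the table data separate from the lookup logic.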
| 280 | + |
| 281 | +## Advantages over Previous Parser |
| 282 | + |
| 283 | +1. **More accurate parsing**: Leverages Clang's AST instead of parsing headers with a standalone Python parser (cxxheaderparser) |
| 284 | +2. **Better enum handling**: Automatically resolves enum base types and includes full enum definitions |
| 285 | +3. **Function typedef support**: Parses function pointer signatures from typedefs |
| 286 | +4. **Structured documentation**: Extracts Doxygen comments in a more reliable way |
| 287 | +5. **Type safety**: Uses Clang's type system for accurate type information |
| 288 | +6. **Better scalability**: Can handle complex C++ constructs that the previous parser struggled with |
| 289 | +7. **Nested type support**: Handles enums within function prototypes and other complex scenarios |
| 290 | + |
| 291 | +## Troubleshooting |
| 292 | + |
| 293 | +### No functions exported |
| 294 | +- Ensure your YAML file contains a `ChildFunctions` section |
| 295 | +- Check that your filters aren't too restrictive |
| 296 | +- Verify the YAML file is properly formatted |
| 297 | + |
| 298 | +### Type showing as '?' |
| 299 | +- The type is not in the type mapping dictionary |
| 300 | +- You can add custom type mappings to the `map_type()` function |
| 301 | + |
| 302 | +### Missing descriptions |
| 303 | +- Ensure your source files have proper Doxygen comments with `@brief`, `@param`, and `@return` tags |
| 304 | +- Verify clang-doc is extracting the comments (check the YAML file) |
| 305 | + |
| 306 | +### Missing enum structures |
| 307 | +- Verify the enum is defined in the YAML file's `ChildEnums` section |
| 308 | +- Ensure the enum has proper Doxygen documentation |
| 309 | +- Check that enum values are documented |
| 310 | + |
| 311 | +### Function prototype parameters have generic names |
| 312 | +- This is expected behavior: clang-doc doesn't preserve parameter names in typedef signatures |
| 313 | +- Parameter names are auto-generated as `param1`, `param2`, etc. |
| 314 | +- To add meaningful names, you can either: |
| 315 | + - Manually edit the JSON output |
| 316 | + - Extend the parser to read parameter metadata from additional sources |
| 317 | + - Use wrapper functions with documented parameters instead of raw typedefs |
| 318 | + |
| 319 | +## Configuration |
| 320 | + |
| 321 | +### Filtering Sentinel Enum Values |
| 322 | + |
| 323 | +By default, the parser filters out common sentinel enum values like "Count", "MAX", "INVALID", etc. You can customize this behavior by modifying the `sentinel_names` set in the `build_enum_structure()` function: |
| 324 | + |
| 325 | +```python |
| 326 | +sentinel_names = {'Count', 'MAX', 'Max', 'INVALID', 'Invalid', 'NUM', 'Num'} |
| 327 | +``` |
| 328 | + |
| 329 | +## License |
| 330 | + |
| 331 | +This script is part of the Plugify project. |