This directory contains Python scripts for extracting and processing Digital Adaptation Kit (DAK) content from WHO SMART Guidelines. These scripts form a comprehensive extraction pipeline that transforms various input formats (Excel files, BPMN diagrams, CQL logic, SVG images, etc.) into FHIR-compatible resources and Implementation Guide content.
The extraction pipeline is designed to process L2 (DAK) content and generate the structured artifacts needed for L3 (FHIR Implementation Guide) content, facilitating the creation of computable clinical guidelines.
- Python 3.x
- Required Python dependencies (see installation below)
- A WHO SMART Guideline FHIR IG repository structure
Before running the extraction scripts, install the required Python dependencies:
```shell
# From the root of any SMART Guideline repository
pip install -r ../smart-base/input/scripts/requirements.txt
```

Or, if running from within the same repository:

```shell
pip install -r input/scripts/requirements.txt
```

To extract DAK content from any WHO SMART Guideline repository, run the main extraction script from the root of the target guideline repository:
```shell
# Example: Extract DAK content from smart-immunizations
gh repo clone WorldHealthOrganization/smart-base
gh repo clone WorldHealthOrganization/smart-immunizations
cd smart-immunizations

# Install dependencies
pip install -r ../smart-base/input/scripts/requirements.txt

# Run the extraction
python ../smart-base/input/scripts/extract_dak.py
```

The script orchestrates the entire extraction pipeline, processing all available content types and generating FHIR resources in the appropriate directories of the current working directory (the target guideline repository).
The `extract_dak.py` script can be run against any WHO SMART Guideline repository. Simply:
- Clone the smart-base repository (contains the extraction scripts)
- Clone or navigate to your target guideline repository
- Install the dependencies from smart-base
- Run the extraction script from the target repository, pointing to the smart-base scripts
```shell
# Example with different guideline repositories:

# For smart-malaria:
cd smart-malaria
python ../smart-base/input/scripts/extract_dak.py

# For smart-hiv:
cd smart-hiv
python ../smart-base/input/scripts/extract_dak.py
```

The extraction processes DAK content from the current directory and generates FHIR resources appropriate for that specific guideline.
| File Name | Goal | Inputs | Outputs |
|---|---|---|---|
| `codesystem_manager.py` | Manages FHIR CodeSystem and ValueSet resources by registering, merging, and rendering codes and properties for DAKs. | Code system IDs, titles, codes, display names, definitions, designations, properties; uses `stringer` for escaping/hashing. | FHIR CodeSystem and ValueSet FSH representations stored in dictionaries or rendered for implementation guides. |
| `bpmn_extractor.py` | Extracts business process data from BPMN files and transforms it into FHIR FSH format using `bpmn2fhirfsh.xsl`. | BPMN files (`*.bpmn`) from `input/business-processes/`, `bpmn2fhirfsh.xsl`, installer object. | FHIR FSH resources (e.g., SGRequirements, SGActor) stored via `installer.add_resource`; logs transformation success/failure. |
| `dd_extractor.py` | Extracts data dictionary entries from Excel files, generating FHIR ValueSets linked to business processes, tasks, decision tables, and indicators. | Excel files (`*.xlsx`) from `input/dictionary/`, cover sheet with tab names/descriptions, installer object. | FHIR ValueSet FSH representations stored via `installer.add_resource`; logs extraction details. |
| `DHIExtractor.py` | Extracts digital health intervention (DHI) classifications and categories from text files, creating FHIR CodeSystems, ValueSets, and ConceptMaps. | Text files (`system_categories.txt`, `dhi_v1.txt`) from `input/data/`, installer object. | FHIR CodeSystem, ValueSet, and ConceptMap FSH representations stored via `installer.add_resource`; logs extraction details. |
| `extractor.py` | Base class for extracting data from various sources (e.g., Excel, BPMN), providing utility functions for data frame processing and logging. | Input file paths, column mappings, sheet names, installer object; subclasses define specific inputs. | Processed data frames with normalized columns, logs, resources stored via installer (specific to subclasses). |
| `extract_dhi.py` | Orchestrates extraction of DHI data using `DHIExtractor`, coordinating with the installer to process and store results. | Command-line arguments (optional, e.g., `--help`), text files via `DHIExtractor`. | Installed FHIR resources via `installer.install()`; logs success/failure; exits with status code. |
| `dt_extractor.py` | Extracts decision table logic from Excel and CQL files, generating FHIR ValueSets, PlanDefinitions, ActivityDefinitions, and DMN representations. | Excel files (`*.xlsx`) from `input/decision-logic/`, CQL files (`*.cql`) from `input/cql/`, `dmn2html.xslt`, installer object. | FHIR ValueSet, PlanDefinition, and ActivityDefinition FSH, DMN XML, and markdown pages stored via `installer.add_resource`/`add_page`; logs details. |
| `extract_dak.py` | Orchestrates extraction of DAK content by coordinating multiple extractors (data dictionary, BPMN, SVG, requirements, decision tables, personas). | Command-line arguments (optional, e.g., `--help`), files processed by extractors (`dd_extractor`, etc.). | Installed FHIR resources via `installer.install()`; logs success/failure; exits with status code. |
| `installer.py` | Manages installation of FHIR resources, pages, CQL files, and DMN tables, handling transformations (e.g., via `bpmn2fhirfsh.xsl`, `dmn2html.xslt`, `svg2svg.xsl`) and storage. | FHIR resources, CQL content, markdown pages, DMN XML, XSLT files, `sushi-config.yaml`, `multifile.xsd`, aliases. | Installed files in `input/fsh/`, `input/cql/`, `input/dmn/`, `input/pagecontent/`; logs installation success/failure. |
| `req_extractor.py` | Extracts functional and non-functional requirements from Excel files, generating FHIR Requirement and ActorDefinition resources. | Excel files (`*.xlsx`) from `input/system-requirements/`, functional/non-functional sheet column mappings, installer object. | FHIR Requirement and ActorDefinition FSH stored via `installer.add_resource`, CodeSystem/ValueSet for categories; logs extraction details. |
| `svg_extractor.py` | Extracts and transforms SVG files from business processes into FHIR-compatible formats using `svg2svg.xsl`. | SVG files (`*.svg`) from `input/business-processes/`, `svg2svg.xsl`, installer object. | Transformed SVG files stored in `input/images/`; logs transformation success/failure. |
| `stringer.py` | Provides utility functions for string manipulation, including escaping, hashing, and ID normalization for FHIR resource generation. | Strings for escaping (XML, markdown, code, rulesets), names for ID conversion, inputs for blank/dash checks. | Escaped strings, hashed IDs, normalized IDs; logs for long-ID hashing or errors. |
| `multifile_processor.py` | Processes multifile XML to apply file changes to a Git repository, handling branching, committing, and pushing. | Multifile XML (`<path_to_multifile.xml>`) with file paths, content, diff formats; Git repository context. | Updated files in the repository, Git commits/pushes; logs for parsing and Git operation success/failure. |
| `generate_valueset_schemas.py` | Generates JSON schemas from FHIR IG publisher `expansions.json` output, creating enum-based schemas for ValueSet codes. | FHIR `expansions.json` Bundle with ValueSet resources containing expanded codes. | JSON Schema files with enum constraints for each ValueSet; logs processing details. |
| `extractpr.py` | Extracts personas/actors content from PDF files containing SMART Guidelines documentation, focusing on Generic Personas and Related Personas tables. | PDF files (`*.pdf`) from `input/personas/`, installer object. | FHIR ActorDefinition FSH resources stored via `installer.add_resource`, CodeSystem for persona types; logs extraction details. |
| `includes/bpmn2fhirfsh.xsl` | Transforms BPMN XML into FHIR FSH, generating resources such as Requirements, Actors, Questionnaires, and Decision Tables for business processes. | BPMN XML from `input/business-processes/*.bpmn`, processed via `installer.transform_xml`. | FHIR FSH resources (e.g., SGRequirements, SGActor) stored via `installer.add_resource`, with links to CodeSystems and StructureDefinitions. |
| `includes/dmn2html.xslt` | Transforms DMN XML into HTML for displaying decision tables in implementation guides, including decision IDs, rules, triggers, inputs, and outputs. | DMN XML from `installer.add_dmn_table`, processed via `installer.transform_xml`. | HTML files in `input/pagecontent/` (e.g., `<id>.xml`), with links to FHIR CodeSystems; logs transformation details. |
| `includes/svg2svg.xsl` | Transforms SVG files to ensure compatibility with FHIR implementation guides, likely preserving or modifying business process visualizations. | SVG XML content from `input/business-processes/*.svg`, processed via `installer.transform_xml`. | Transformed SVG files stored in `input/images/`, compatible with FHIR rendering. |
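The table describes `stringer.py` as producing normalized and hashed IDs for FHIR resource generation. As a rough illustration of what such normalization involves, the sketch below maps an arbitrary name to a FHIR-safe resource id. The function name, regex, and truncate-plus-hash strategy are assumptions for illustration, not the actual implementation; the 64-character limit and allowed characters do come from the FHIR `id` datatype.

```python
import hashlib
import re

# Illustrative sketch of the kind of ID normalization stringer.py performs.
# The function name and the truncate-plus-hash strategy are assumptions;
# FHIR resource ids are limited to 64 chars of A-Z, a-z, 0-9, '-', '.'.
FHIR_ID_MAX = 64

def name_to_id(name: str) -> str:
    """Turn an arbitrary display name into a FHIR-safe resource id."""
    # Collapse each run of invalid characters to a single '.'
    candidate = re.sub(r"[^A-Za-z0-9.\-]+", ".", name.strip()).strip(".")
    if len(candidate) <= FHIR_ID_MAX:
        return candidate
    # Too long: truncate and append a short hash so the id stays unique
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()[:8]
    return candidate[: FHIR_ID_MAX - 9] + "." + digest

print(name_to_id("Immunization status (at birth)"))  # -> Immunization.status.at.birth
```

The hash suffix matters because two long names that differ only past the truncation point would otherwise collide on the same id.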
- `extract_dak.py` - Main orchestrator coordinating all extraction processes
- `installer.py` - Resource manager handling FHIR installation and transformations
- `extractor.py` - Base class providing common functionality for specialized extractors

- `dd_extractor.py` - Data Dictionary extraction from Excel files
- `req_extractor.py` - Requirements processing for functional/non-functional specs
- `bpmn_extractor.py` - Business Process transformation from BPMN to FHIR
- `dt_extractor.py` - Decision Tables conversion to computable formats
- `svg_extractor.py` - Graphics processing for IG compatibility
- `DHIExtractor.py` - Digital Health Interventions classification extraction
- `extractpr.py` - Personas extraction from PDF documents

- `codesystem_manager.py` - Terminology management for CodeSystems and ValueSets
- `stringer.py` - String manipulation utilities for FHIR resource generation
- `multifile_processor.py` - Git integration for automated repository workflows

- `generate_valueset_schemas.py` - JSON Schema generation from IG publisher expansions.json output
- `generate_logical_model_schemas.py` - JSON Schema generation from StructureDefinition JSON files for logical models
| Directory/File | Purpose |
|---|---|
| `xsd/` | Contains XSD schema files for DMN and other XML validation |
| `includes/multifile.xsd` | Schema for multifile XML processing |
- Data Dictionary Processing (`dd_extractor.py`): Extracts terminology and value sets from Excel files
- Requirements Processing (`req_extractor.py`): Converts functional requirements into FHIR resources
- Business Process Processing (`bpmn_extractor.py`): Transforms BPMN workflows into FHIR actors and requirements
- Decision Logic Processing (`dt_extractor.py`): Converts decision tables and CQL into executable FHIR resources
- Visual Content Processing (`svg_extractor.py`): Optimizes diagrams for IG presentation
- Personas Processing (`extractpr.py`): Extracts actor definitions from PDF documentation
- Resource Installation (`installer.py`): Coordinates final resource generation and file organization
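These stages are coordinated by `extract_dak.py` through a shared installer object. A stripped-down sketch of that orchestration pattern follows; the class shapes and the example extractor are illustrative assumptions — only `add_resource` and `install()` are named in the script descriptions above.

```python
# Simplified sketch of the orchestration pattern used by extract_dak.py.
# The real script wires concrete extractors (dd_extractor, bpmn_extractor, ...)
# to an installer; these class shapes are illustrative assumptions.
import sys

class Installer:
    """Collects generated resources, then writes them out in one pass."""
    def __init__(self):
        self.resources = {}

    def add_resource(self, kind, name, fsh):
        self.resources[(kind, name)] = fsh

    def install(self):
        # The real installer writes FSH to input/fsh/, pages to
        # input/pagecontent/, DMN to input/dmn/, and so on.
        for (kind, name), fsh in self.resources.items():
            print(f"installing {kind}/{name} ({len(fsh)} bytes)")
        return True

class DataDictionaryExtractor:
    """Stand-in for an extractor; the real one reads input/dictionary/*.xlsx."""
    def __init__(self, installer):
        self.installer = installer

    def extract(self):
        self.installer.add_resource("ValueSet", "ExampleVS", "ValueSet: ExampleVS")
        return True

def main():
    installer = Installer()
    extractors = [DataDictionaryExtractor(installer)]
    ok = all(e.extract() for e in extractors)
    ok = installer.install() and ok
    sys.exit(0 if ok else 1)  # matches the documented exit-with-status behavior
```

The key design point is that extractors never touch the filesystem directly: they hand resources to the installer, which performs all writes in one place.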
The extraction process generates content in the following directories:
- `input/fsh/` - FHIR Shorthand (FSH) resource definitions
- `input/cql/` - Clinical Quality Language files
- `input/pagecontent/` - Markdown pages and HTML content
- `input/images/` - Processed SVG diagrams
- `input/dmn/` - Decision Model and Notation files
Note: These DAK extraction scripts are currently hosted in this repository as a convenience. They will be migrated to their own dedicated repository in the future to better separate the core FHIR profiles from the extraction tooling.
- `extract_dhi.py` - Standalone script for Digital Health Intervention extraction
- `check_pages.sh` - Shell script for page validation
- `generate_valueset_schemas.py` - Generate JSON schemas from IG publisher output
The `generate_valueset_schemas.py` script processes the `expansions.json` file generated by the FHIR IG publisher and creates a JSON schema for each ValueSet using enum constraints.
Usage:
```shell
# Using default paths (output/expansions.json -> output/)
python input/scripts/generate_valueset_schemas.py

# Specifying input file only (output dir defaults to output/)
python input/scripts/generate_valueset_schemas.py path/to/expansions.json

# Specifying both input and output paths
python input/scripts/generate_valueset_schemas.py path/to/expansions.json path/to/output/dir
```

Output:
- Creates three files per ValueSet:
  - `ValueSet-{id}.schema.json` - JSON schema with enum validation
  - `ValueSet-{id}.displays.json` - Display values with multilingual support
  - `ValueSet-{id}.system.json` - System URI mappings
- Creates an index.html file with links to all generated schemas
- Schema files use enum to constrain values to the expanded codes and reference display/system files
- Display files use multilingual structure to support translations
- Includes FHIR metadata (ValueSet URL, expansion timestamp, etc.)
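To show how a downstream consumer might use these files together, here is a minimal stdlib-only sketch that checks a code against a schema's enum and resolves its display text. The inline schema and display data are toy values in the shape described above; a real consumer would load the generated files from disk and use a full JSON Schema validator such as the `jsonschema` package.

```python
import json

# Minimal stdlib-only sketch: validate a code against a generated ValueSet
# schema and resolve its display text. The data below mimics the shape of
# ValueSet-{id}.schema.json and ValueSet-{id}.displays.json.
schema = json.loads("""
{
  "type": "string",
  "enum": ["code1", "code2", "code3"],
  "fhir:valueSet": "http://smart.who.int/base/ValueSet/example"
}
""")
displays = {"code1": {"en": "Display One"}, "code2": {"en": "Display Two"}}

def check_code(code: str) -> str:
    """Reject codes outside the enum; return the English display if known."""
    if code not in schema["enum"]:
        raise ValueError(f"{code!r} is not in {schema['fhir:valueSet']}")
    return displays.get(code, {}).get("en", code)

print(check_code("code1"))  # -> Display One
```

Keeping displays in a separate file means the schema itself stays a plain enum that any JSON Schema tool can enforce, while translations can be added without touching validation.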
Example generated files:
Schema file (ValueSet-example.schema.json):
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "http://smart.who.int/base/ValueSet-example.schema.json",
  "title": "Example ValueSet Schema",
  "description": "JSON Schema for Example ValueSet codes. Generated from FHIR expansions.",
  "type": "string",
  "enum": ["code1", "code2", "code3"],
  "fhir:displays": "http://smart.who.int/base/ValueSet-example.displays.json",
  "fhir:system": "http://smart.who.int/base/ValueSet-example.system.json",
  "fhir:valueSet": "http://smart.who.int/base/ValueSet/example",
  "fhir:expansionTimestamp": "2023-01-01T00:00:00Z"
}
```

Display file (`ValueSet-example.displays.json`):
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "http://smart.who.int/base/ValueSet-example.displays.json",
  "title": "Example ValueSet Display Values",
  "description": "Display values for Example ValueSet codes. Generated from FHIR expansions.",
  "fhir:displays": {
    "code1": {"en": "Display One"},
    "code2": {"en": "Display Two"},
    "code3": {"en": "Display Three"}
  },
  "fhir:valueSet": "http://smart.who.int/base/ValueSet/example"
}
```

The `generate_logical_model_schemas.py` script processes the JSON StructureDefinition files generated by the FHIR IG Publisher for FHIR Logical Models and generates a JSON schema for each Logical Model, with support for ValueSet bindings.
Usage:
```shell
# Using default paths (output -> output/)
python input/scripts/generate_logical_model_schemas.py

# Specifying input directory only (output dir defaults to current directory)
python input/scripts/generate_logical_model_schemas.py output

# Specifying both input and output paths
python input/scripts/generate_logical_model_schemas.py output/StructureDefinition output/schemas
```

Features:
- Processes JSON StructureDefinition files with `"kind": "logical"`
- Maps FHIR datatypes to JSON Schema types (string, boolean, integer, etc.)
- Handles cardinality mapping (1..1 → required, 0..1 → optional, 0..* → array)
- Supports choice types with `oneOf` constraints
- Detects ValueSet bindings and creates `$ref` references to ValueSet schemas
- Uses canonical URLs from the StructureDefinition for the schema `$id`
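The cardinality mapping listed above can be sketched as follows. This is a simplified illustration of the rule, not the script's actual code; the function name and tuple layout are assumptions.

```python
# Illustrative sketch of FHIR cardinality -> JSON Schema mapping:
# 1..1 -> required, 0..1 -> optional, 0..* -> array.
def map_cardinality(min_card, max_card, base_type):
    """Return (property_schema, is_required) for one element definition."""
    prop = {"type": base_type}
    if max_card == "*":
        # 0..* / 1..* become arrays of the base type
        prop = {"type": "array", "items": {"type": base_type}}
    # min >= 1 means the element is mandatory
    return prop, min_card >= 1

properties, required = {}, []
for name, lo, hi, typ in [("name", 1, "1", "string"),
                          ("nickname", 0, "1", "string"),
                          ("tag", 0, "*", "string")]:
    schema, is_required = map_cardinality(lo, hi, typ)
    properties[name] = schema
    if is_required:
        required.append(name)

print(required)           # -> ['name']
print(properties["tag"])  # -> {'type': 'array', 'items': {'type': 'string'}}
```

Note that FHIR expresses `max` as a string (`"1"` or `"*"`), which is why the comparison is against the literal `"*"` rather than a number.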
Output:
- Creates one JSON schema file per Logical Model: `StructureDefinition-{model-name}.schema.json`
- The schema `$id` uses the base URL with the `StructureDefinition-{model-name}.schema.json` pattern to match FHIR canonicals
- Includes FHIR metadata and references to ValueSet schemas where applicable
Example generated schema:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "http://smart.who.int/base/StructureDefinition-Animal.schema.json",
  "title": "Animal",
  "description": "Logical Model for representing animals",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "species": { "$ref": "ValueSet-AnimalSpeciesVS.schema.json" },
    "age": { "type": "integer" }
  },
  "required": ["name", "species"],
  "fhir:logicalModel": "http://smart.who.int/base/StructureDefinition/Animal"
}
```

For questions or issues with the DAK extraction scripts, please refer to the main repository documentation or submit an issue.
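As a closing illustration, an instance can be checked against the example Animal schema with a minimal hand-rolled validator. This is stdlib only and deliberately ignores the `ValueSet-AnimalSpeciesVS.schema.json` reference; a real consumer would use a full JSON Schema validator and resolve the `$ref` against the generated ValueSet schema.

```python
# Minimal stdlib-only check of an instance against the example Animal schema.
# The $ref to the ValueSet schema is ignored here; a real consumer would
# resolve it and enforce the species enum as well.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "species": {"$ref": "ValueSet-AnimalSpeciesVS.schema.json"},
        "age": {"type": "integer"},
    },
    "required": ["name", "species"],
}

TYPES = {"string": str, "integer": int, "object": dict}

def check(instance: dict) -> list:
    """Return a list of validation errors (empty means valid)."""
    errors = [f"missing required field: {f}"
              for f in schema["required"] if f not in instance]
    for field, rule in schema["properties"].items():
        # Only enforce fields whose rule names a plain JSON type
        if field in instance and "type" in rule:
            if not isinstance(instance[field], TYPES[rule["type"]]):
                errors.append(f"{field}: expected {rule['type']}")
    return errors

print(check({"name": "Rex", "species": "dog", "age": 3}))  # -> []
print(check({"age": "old"}))  # -> missing name/species, plus a type error on age
```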