This repository provides a Python-based tool and sample code for extracting use-case-based macro services from monolithic Java applications produced by AWS Transform Refactor. Given a list of programs for a use case, the tool automatically resolves dependencies and generates a standalone, deployable Java project containing only the required programs.
Input: Large Java codebase (AWS Transform refactored application) and CSV list of programs to extract
Output: Smaller, focused Maven project containing only the programs needed for a specific use case, with minimal dependencies
- Dependency Analysis: Analyzes Java import statements and program dependencies
- Program Identification: Extracts program identifiers from the Java codebase
- Multiple Split Strategies: Supports symbolic link copies (split_copy) and full copies (full_split_copy)
- DAO Management: Handles Data Access Object dependencies automatically
- Web Component Support: Optional web module splitting
- Groovy Support: Analyzes Groovy script dependencies
- Configuration-Driven: YAML-based configuration system
├── macroservice-extractor/ # Python extraction tool (source code, configs, workspace)
├── modernized-card-demo/ # Sample modernized Java codebase (CardDemo application)
├── card-demo-macroservice/ # Example extraction output
- Python 3.10 or higher
- Java 21 (Amazon Corretto recommended)
- Apache Maven 3.6+
- AWS Transform refactored application (Gapwalk framework)
- Dependency data from the AWS Transform analyze job
- Install Python dependencies:
  pip install -r macroservice-extractor/Java_Code_Spliter/requirements.txt
- Configure the project settings:
  - Edit macroservice-extractor/ressources/project_config.yml with your project paths
  - Edit macroservice-extractor/command.yml with extraction parameters
- Create a programs CSV listing the programs for your use case (one per line) in macroservice-extractor/Workspace/CSV/
- Run the extraction:
  python macroservice-extractor/Java_Code_Spliter/src/MainCommandLine.py -s macroservice-extractor/command.yml
- Verify the output in the extraction path specified in command.yml
The tool requires two separate YAML configuration files to operate:
This file contains project-specific settings that describe your Java codebase structure and dependencies.
workspace_path: /path/to/macroservice-extractor/Workspace
dependencies_bluage_path: /path/to/dependencies.csv
project_name: app
project_path: /path/to/modernized-java-project
project_library_name: com/company/app
groovy_path: /path/to/groovy/scripts
sub_project_list:
  - entities
  - service
runtime_included:
  - service:
      - program/utils/**.*
      - servlet/**.*

- workspace_path: Directory where the tool stores cached analysis files (pickle files)
- dependencies_bluage_path: Path to the CSV file containing dependency information. The dependency data comes from the output artifacts of the AWS Transform analyze job (from the AWS Transform web application). Use jsonConverter.py to convert the JSON output to the CSV format expected by this tool.
- project_name: Name of your Java project (typically the root module name, e.g., app for app-pom, app-service, app-entities)
- project_path: Full path to your Java project codebase directory (the folder that contains [project_name]-pom/)
- project_library_name: Java package structure using forward slashes
  - Example: com/company/app for package com.company.app
  - To verify: check any Java file's package declaration
- groovy_path: Path to Groovy script files (if your project uses Groovy). Leave empty or omit if not using Groovy.
- sub_project_list: List of Maven submodules the tool should scan for dependencies and extract code from
  - Common examples: entities, service
- runtime_included: Runtime dependencies for each subproject that must always be included regardless of import analysis
  - Specifies which files/packages should be included at runtime for each module
  - Uses glob patterns (e.g., **.* for all files in subdirectories)
- dao_ids_included (optional): List of Data Access Object identifiers to always include
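The project_library_name check described above (read a package declaration, swap dots for slashes) can be sketched in a few lines of Python. This is only an illustration: the helper name library_name_from_source is hypothetical and is not part of the tool.

```python
import re

# Matches a Java package declaration such as "package com.company.app;"
PACKAGE_RE = re.compile(r"^\s*package\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)


def library_name_from_source(java_source: str) -> str:
    """Return the declared package with dots replaced by forward slashes.

    Hypothetical helper for illustration; the extraction tool does not ship it.
    """
    match = PACKAGE_RE.search(java_source)
    if match is None:
        raise ValueError("no package declaration found")
    return match.group(1).replace(".", "/")


sample = "package com.company.app;\n\npublic class Demo {}\n"
print(library_name_from_source(sample))  # com/company/app
```

A file in a subpackage yields a longer path (for example com/company/app/service), so the value to configure is the prefix shared by the modules listed in sub_project_list.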
This file contains the execution parameters that control how the splitter runs.
action_type: split_copy
project_config_path: /path/to/project_config.yml
extraction_path: /path/to/output/
split_csv_path: /path/to/programs.csv
to_split_web: false
to_delete: true
to_load_dependencies: true

- action_type: The operation to perform
  - analyse: Scan all Java files and build the dependency graph. Run this first or whenever the source code changes. Results are cached for subsequent operations.
  - split_copy: Extract a macro service using symbolic links. Each Java file in the output links back to the source file, so code changes automatically reflect in the main codebase. Recommended for development.
  - full_split_copy: Extract a macro service using full file copies. No symbolic links are created, so changes in the macro service do NOT reflect in the source. Use for production deployment.
- project_config_path: Absolute path to the project configuration YAML file
- extraction_path: Directory where the extracted macro service will be created
- split_csv_path: Path to CSV file containing the list of programs to extract
- to_split_web: Boolean flag to include web components in the extraction
  - true: Include the Angular web module (needed if the use case has frontend screens)
  - false: Exclude the web module (use for backend-only / batch use cases)
- to_delete: Boolean flag to delete existing output folder before extraction
  - true: Delete and recreate output directory (recommended for clean runs)
  - false: Keep existing files (may cause conflicts)
- to_load_dependencies: Boolean flag to load additional dependencies from the dependency CSV
  - true: Use the dependency CSV to resolve the full program call chain from entry points (recommended)
  - false: Only extract the exact programs listed in the CSV without following dependency chains
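Before a long run it can help to sanity-check the command settings. The sketch below works on a dictionary that has already been parsed from command.yml. The function name check_command_settings and the choice of which keys are treated as mandatory are assumptions made for illustration, not the tool's own validation rules.

```python
# Action names taken from the parameter list above.
ALLOWED_ACTIONS = {"analyse", "split_copy", "full_split_copy"}
# Assumption: these two keys are needed for every action.
REQUIRED_KEYS = {"action_type", "project_config_path"}
BOOLEAN_KEYS = ("to_split_web", "to_delete", "to_load_dependencies")


def check_command_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    for key in sorted(REQUIRED_KEYS - set(settings)):
        problems.append(f"missing key: {key}")
    action = settings.get("action_type")
    if action is not None and action not in ALLOWED_ACTIONS:
        problems.append(f"unknown action_type: {action}")
    for key in BOOLEAN_KEYS:
        # YAML "true"/"false" parse to Python booleans; quoted strings do not.
        if key in settings and not isinstance(settings[key], bool):
            problems.append(f"{key} must be true or false")
    return problems


print(check_command_settings({"action_type": "copy", "to_delete": "yes"}))
```

The check reports every problem at once rather than stopping at the first, which suits a configuration file that is edited by hand.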
A simple CSV file listing the programs you want to extract, one program name per line:
COSGN00C
CORPT00C
COTRN00C
You can create this file in the macroservice-extractor/Workspace/CSV/ folder (a sample is provided at macroservice-extractor/Workspace/CSV/cc00.csv).
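For reference, a file in this one-name-per-line format can be read with Python's csv module. The sketch below assumes there is no header row and that blank lines and repeated names should be ignored; the function name read_program_names is hypothetical and the tool's own parser may behave differently.

```python
import csv
import io


def read_program_names(lines) -> list[str]:
    """Return unique program names from the first column, in file order."""
    names = []
    seen = set()
    for row in csv.reader(lines):
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        name = row[0].strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


sample = io.StringIO("COSGN00C\nCORPT00C\n\nCOTRN00C\nCOSGN00C\n")
print(read_program_names(sample))  # ['COSGN00C', 'CORPT00C', 'COTRN00C']
```

To read a real file, pass an open file object instead of the in-memory sample, for example open(path, newline="").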
- You create both YAML files with your project settings
- The command.yml references the project_config.yml via the project_config_path parameter
- When you run the splitter, you pass the command.yml file
- The tool loads the project configuration from project_config.yml
- The splitter executes the specified action using both configurations
python macroservice-extractor/Java_Code_Spliter/src/MainCommandLine.py -s macroservice-extractor/command.yml
The dependency data comes from the output artifacts of the AWS Transform analyze job (available in the AWS Transform web application). Convert the JSON output to the CSV format expected by this tool:
python macroservice-extractor/jsonConverter.py <input>.json <output>.csv
- <input>.json: Path to the AWS Transform JSON output file containing dependency information
- <output>.csv: Path where the converted CSV file will be saved
The converted CSV file should then be referenced by the dependencies_bluage_path parameter in project_config.yml.
- Program.py: Represents individual programs with identifiers
- Split.py: Contains split configuration and dependencies
- ProjectConfig.py: Project-wide configuration settings
- SplitConfig.py: Split-specific configuration
- DependenciesAnalysis.py: Dependency analysis results
- Spliter.py: Abstract base class for all splitters
- SymbolicCopySpliter.py: Creates symbolic links for split projects
- FullCopySpliter.py: Creates complete copies of split projects
- DependenciesAnalyser.py: Main dependency analysis engine
- ProgramId.py: Program identification and mapping
- GrooviesAnalyser.py: Groovy file analysis
- MainCommandLine.py: CLI entry point
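The relationship between an abstract splitter and its symbolic-link and full-copy variants can be pictured with the sketch below. It is not the code in Spliter.py, SymbolicCopySpliter.py, or FullCopySpliter.py: the class names BaseSplitter, SymlinkSplitter, and CopySplitter and the method names are hypothetical, chosen only to show the shape of the design.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BaseSplitter(ABC):
    """Hypothetical base class: places one source file into the output tree."""

    def place(self, source: Path, target: Path) -> None:
        target.parent.mkdir(parents=True, exist_ok=True)
        self._transfer(source, target)

    @abstractmethod
    def _transfer(self, source: Path, target: Path) -> None:
        ...


class SymlinkSplitter(BaseSplitter):
    def _transfer(self, source: Path, target: Path) -> None:
        # The output entry points back at the original file.
        os.symlink(source.resolve(), target)


class CopySplitter(BaseSplitter):
    def _transfer(self, source: Path, target: Path) -> None:
        # The output entry is an independent copy.
        shutil.copy2(source, target)


def demo() -> tuple[bool, bool]:
    """Place one file with each strategy and report which results are symlinks."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "src" / "Demo.java"
        source.parent.mkdir()
        source.write_text("class Demo {}\n")
        linked = root / "out_link" / "Demo.java"
        copied = root / "out_copy" / "Demo.java"
        SymlinkSplitter().place(source, linked)
        CopySplitter().place(source, copied)
        return linked.is_symlink(), copied.is_symlink()


print(demo())  # (True, False)
```

Only the transfer step differs between the two strategies, which is why a shared base class fits. Note that creating symbolic links on Windows may require elevated privileges or developer mode.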
macroservice-extractor/Java_Code_Spliter/src/
├── dataClass/ # Data models and configurations
├── Spliter/ # Core splitting logic
├── utils/ # Utility functions and helpers
├── DependenciesAnalyser.py # Main analysis engine
├── GrooviesAnalyser.py # Groovy analysis
├── MainCommandLine.py # CLI entry point
└── ProgramId.py # Program identification
- Import Analysis: Scans all Java files for import statements
- DAO Extraction: Identifies Data Access Object dependencies via FileIds references
- Cross-Reference Resolution: Resolves dependencies between modules
- Circular Dependency Detection: Identifies and handles circular references
- Program Grouping: Groups related programs together based on the CSV input
- Dependency Resolution: For each program, resolves all transitive dependencies (imports, DAOs, runtime includes)
- Module Creation: Copies (or symlinks) only the required files into a new Maven project structure
- Configuration Generation: Generates Maven pom.xml and build configurations for the extracted module
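The dependency resolution step amounts to computing a transitive closure over a dependency map. A minimal sketch follows, assuming the map has already been built by the analysis phase. The function name resolve_transitive and the sample identifiers (PROG_A, DAO_1, and so on) are made up for illustration.

```python
from collections import deque


def resolve_transitive(entry_points, depends_on):
    """Return every item reachable from the entry points, in breadth-first order."""
    resolved = []
    seen = set()
    queue = deque(entry_points)
    while queue:
        item = queue.popleft()
        if item in seen:  # also guards against circular dependencies
            continue
        seen.add(item)
        resolved.append(item)
        queue.extend(depends_on.get(item, ()))
    return resolved


# Made-up dependency map: PROG_A and PROG_B reference each other.
depends_on = {
    "PROG_A": ["PROG_B", "DAO_1"],
    "PROG_B": ["PROG_A", "UTIL_1"],
}
print(resolve_transitive(["PROG_A"], depends_on))
# ['PROG_A', 'PROG_B', 'DAO_1', 'UTIL_1']
```

Tracking visited items means a circular reference is traversed once and then skipped, so the loop always terminates.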
The tool includes comprehensive error handling for:
- Missing configuration files
- Invalid CSV formats
- Circular dependencies
- File access issues
- Module path problems
Configurable logging system with:
- Debug mode (--debug) for detailed analysis output
- Warning messages for potential issues (missing programs, unresolved dependencies)
- Error reporting with context
- Progress tracking for long operations
- Log output written to macroservice-extractor/app.log
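A debug switch of this kind is commonly wired up with argparse and the logging module. The sketch below shows one way to do it and is not the tool's actual setup: the function name build_logger, the logger name, the message format, and the INFO default level are all assumptions.

```python
import argparse
import logging


def build_logger(argv, log_path="app.log"):
    """Return a file logger whose level depends on a --debug flag (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug", action="store_true")
    # parse_known_args ignores options this sketch does not define, such as -s.
    args, _unknown = parser.parse_known_args(argv)

    logger = logging.getLogger("splitter_sketch")
    logger.setLevel(logging.DEBUG if args.debug else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling build_logger(["--debug"]) returns a logger that writes DEBUG and higher messages to app.log in the current directory; without the flag, DEBUG messages are dropped.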
This project does not deploy any cloud resources. To clean up locally, delete the extraction output folder and any generated .pkl cache files in macroservice-extractor/Workspace/.
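The cache part of the cleanup can be scripted. The sketch below lists .pkl files under a workspace directory and deletes them only when asked to. It assumes the caches may sit in subfolders of the workspace, and the function name remove_cache_files is hypothetical.

```python
from pathlib import Path


def remove_cache_files(workspace, dry_run: bool = True) -> list[Path]:
    """Return the .pkl files under workspace; delete them unless dry_run is set."""
    targets = sorted(Path(workspace).rglob("*.pkl"))
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Running it first with the default dry_run=True shows what would be removed, for example remove_cache_files("macroservice-extractor/Workspace"), before repeating the call with dry_run=False.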
See SECURITY.md for security policy and production hardening recommendations. Additionally, see CONTRIBUTING for more information.
This project is licensed under the MIT-0 License. See the LICENSE file.