This repository contains a converter that reads a structured Excel workbook and produces a merged JSON file of resource cards used by the Dev Hub.
This README explains how to set up, run, and customize the converter step-by-step for a new contributor.
- Python 3.8+ installed
- Create a virtual environment (recommended)
- Install dependencies:
```
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

Repository contents:

- `excel_to_json_converter.py`: Main converter script.
- `config.py`: Configuration for sheet mappings, validation, and tag inference.
- `app-dev-hub-v5.xlsx`: Input Excel workbook (default path in `config.py`).
- `output/`: Directory where generated files are written:
  - `templates.json` - generated JSON output
  - `validation_report.txt` - validation summary
  - `statistics.json` - conversion stats
  - `converter.log` - detailed run log
- Keep the source Excel file in the project root.
- Activate virtual environment (see above).
- Place your Excel workbook at the path specified by `config.INPUT_FILE` or pass it with `--input`.
- Run the converter:

```
python excel_to_json_converter.py
```

- Optional: disable tag inference for a single run:

```
python excel_to_json_converter.py --no-inference
```

Command-line options:

- `--input`: Path to the input Excel file (default from `config.py`).
- `--output`: Path to output JSON file.
- `--validate-only`: Validate an existing JSON output instead of converting.
- `--verbose`: Enable debug logging.
- `--no-inference`: Disable automatic tag inference for this run.
- First pass: `Learning Path` sheets are processed to collect learning-path metadata keyed by `Link`.
- Second pass: Content sheets (App Dev, AI, Analytics, Blogs, Samples) are processed. Each row is keyed by `Link` and merged into a single resource card.
- Tags are built from explicit tag columns, the `featured` flag, and the learning-path title (normalized). Tag inference can be enabled or disabled in `config.py`, or disabled for a single run with `--no-inference`.
- When a Link appears in both LP and a content sheet, the content sheet's `Title` and `Description` take precedence for the main `title`/`description` fields; LP metadata is attached as `learningPathTitle`, `learningPathDescription`, and optional `tileNumber`.
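The two-pass merge described above can be sketched with plain dictionaries. This is a minimal illustration, not the converter's actual code: the function name `merge_cards` and the dict-per-row input are assumptions. The source does not say how links that appear only in Learning Path sheets are handled, so the sketch leaves them out.

```python
from typing import Dict, List


def merge_cards(lp_rows: List[dict], content_rows: List[dict]) -> Dict[str, dict]:
    """Merge learning-path metadata into content rows, keyed by Link (hypothetical helper)."""
    # First pass: collect learning-path metadata keyed by Link.
    lp_by_link: Dict[str, dict] = {}
    for row in lp_rows:
        meta = {
            "learningPathTitle": row["Card title"],
            "learningPathDescription": row.get("Description", ""),
        }
        if row.get("No.") is not None:
            meta["tileNumber"] = row["No."]  # tile number is optional
        lp_by_link[row["Link"]] = meta

    # Second pass: content rows supply the main title and description;
    # any learning-path metadata for the same Link is attached alongside.
    cards: Dict[str, dict] = {}
    for row in content_rows:
        link = row["Link"]
        card = cards.setdefault(link, {"website": link})
        card["title"] = row["Title"]
        card["description"] = row.get("Description", "")
        card.update(lp_by_link.get(link, {}))
    return cards
```

Keying both passes on the same `Link` value is what lets the content sheet win for `title` and `description` while the learning-path fields ride along under their own names.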
- `SHEETS_CONFIG`: Maps Excel sheet names to behavior. Edit this if your sheet names differ.
- `TAG_INFERENCE['enabled']`: Turn on/off inference. The project default can be changed here.
- `VALIDATION`: Adjust required fields, URL schemes, and minimum tags.
- Learning Path subsheet:
  - `Learning Path Title`: section title used for LP tagging
  - `Card title`: becomes `learningPathTitle`
  - `Description`: becomes `learningPathDescription`
  - `Link`: canonical URL (used as unique key)
  - `No.`: optional tile number for LP entries
  - `Featured`: optional `Yes` to mark as featured
- Content subsheets (App Dev, AI, Blogs, ...):
  - `Title`, `Link`, `Description`, optional `Tags`, `Featured`, `Priority/Importance`, `Last Updated Date`
- Explicit tags: from the `Tags` column and any `Unnamed` columns; multiple tags can be separated by `,` or `;`.
- Featured: any row with a `Featured` value of `yes` (case-insensitive) will include the `featured` tag.
- Learning Path tag: the normalized value of `Learning Path Title` is added for LP entries.
- Inference: when enabled, tags may also be inferred from URL/title/description patterns configured in `config.TAG_INFERENCE`.
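The explicit-tag and featured rules above could look roughly like this. It is a sketch under assumptions: `split_tags` and `build_tags` are hypothetical names, and the input is taken to be the raw cell values of the `Tags` and `Unnamed` columns for one row.

```python
import re
from typing import Iterable, List, Optional


def split_tags(cell: Optional[str]) -> List[str]:
    """Split one tag cell on ',' or ';' and drop empty pieces."""
    if not cell:
        return []
    return [part.strip() for part in re.split(r"[,;]", cell) if part.strip()]


def build_tags(tag_cells: Iterable[Optional[str]], featured: Optional[str] = None) -> List[str]:
    """Collect explicit tags in column order without duplicates, then add 'featured'."""
    tags: List[str] = []
    for cell in tag_cells:
        for tag in split_tags(cell):
            if tag not in tags:
                tags.append(tag)
    # The Featured check is case-insensitive, as described above.
    if featured and featured.strip().lower() == "yes" and "featured" not in tags:
        tags.append("featured")
    return tags
```

For example, `build_tags(["python, ai", None, "ai;sample"], featured="YES")` yields the three explicit tags followed by `featured`.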
- If the converter complains about missing required fields, open `config.py` and inspect `VALIDATION['required_fields']`.
- If `templates.json` looks wrong, check `output/converter.log` for warnings and `output/validation_report.txt` for errors.
Generate JSON and save outputs:

```
python excel_to_json_converter.py --input app-dev-hub-v5.xlsx --output output/templates.json
```

Validate an existing output:

```
python excel_to_json_converter.py --validate-only --output output/templates.json
```

- Make small, focused changes and run the converter locally to verify output.
- Update `README.md` when you add new sheets or change expected columns.
- Add unit tests for tag extraction and LP merging.
- Add an example Excel file with sample rows.
- Automate a pre-commit check that runs the converter in validation-only mode.
A robust Python application that converts Excel files containing developer hub content into structured JSON format with comprehensive tag processing, validation, and error handling.
- Overview
- Features
- Installation
- Quick Start
- Excel Structure
- Configuration
- Tag Processing
- Usage
- Output Structure
- Validation
- Testing
- Troubleshooting
- Contributing
This converter transforms Excel content (learning paths, blogs, app development resources, etc.) into a unified JSON array with enriched metadata and intelligent tag processing. It's designed for developer hub websites where content is managed in Excel for easy editing by non-technical users, and JSON is consumed by frontend applications.
- Multi-sheet Processing: Handles different content types (Learning Paths, Blogs, App Dev, AI, Analytics, Samples)
- Intelligent Tag Processing:
- Automatic tag normalization
- Tag inference from URLs and content
- Multi-source tag merging
- Comprehensive Validation: Entry-level and output-level validation
- Detailed Reporting: Statistics, validation reports, and conversion logs
- Error Recovery: Continues processing even if some rows fail
- Production-Ready: Robust error handling and logging
- Python 3.8 or higher
- pip (Python package installer)
```
pip install -r requirements.txt
```

This will install:

- `pandas` - Excel file reading and data manipulation
- `openpyxl` - Excel file support (.xlsx)
- `jsonschema` - JSON validation
- Place your Excel file in the project directory (or specify the path with `--input`)
- Run the converter:

```
python excel_to_json_converter.py
```

- Check the output in the `output/` directory:
  - `templates.json` - Converted JSON data
  - `statistics.json` - Conversion statistics
  - `validation_report.txt` - Validation results
  - `converter.log` - Detailed logs
Learning Path sheet:

- Purpose: Sequential learning content with tile numbers
- Columns:
  - `Learning Path Title` - Groups cards
  - `No.` - Sequential number (not used)
  - `Card title` - Entry title (REQUIRED)
  - `Link` - URL (REQUIRED)
  - `Description` - Detailed description
  - `Tags` + `Unnamed: 9-19` - Tag columns

Special Behavior:

- Auto-adds `learning-path` tag
- Auto-adds normalized learning path title as tag
- Auto-adds `flexibleserver` tag
- Assigns sequential `tileNumber` within each group
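One way to assign sequential tile numbers within each learning path group is a per-group counter. A minimal sketch, assuming rows arrive as dicts in sheet order; `assign_tile_numbers` is a hypothetical name, not a function taken from the converter.

```python
from collections import defaultdict
from typing import Dict, List


def assign_tile_numbers(rows: List[dict]) -> List[dict]:
    """Return copies of the rows with a 1-based tileNumber per Learning Path Title."""
    counters: Dict[str, int] = defaultdict(int)
    numbered = []
    for row in rows:
        group = row["Learning Path Title"]
        counters[group] += 1  # numbering restarts for every new group
        numbered.append({**row, "tileNumber": counters[group]})
    return numbered
```

Because the counter is keyed by the group title, numbering stays correct even if rows from different learning paths are interleaved in the sheet.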
Content sheets (App Dev, AI, Analytics):

- Purpose: Content resources
- Columns:
  - `Topic Area` - Category (for context)
  - `Title` - Entry title (REQUIRED)
  - `Link` - URL (REQUIRED)
  - `Description` - Detailed description
  - `Priority/Importance` - P0, P1, P2, etc.
  - `Last Updated Date` - Date field
  - `Tags` + `Unnamed: 9-19` - Tag columns

Auto-added Tags:

- App Dev → `app-dev`
- AI → `genai`
- Analytics → `analytics`
Blogs sheet:

- Purpose: Blog content
- Columns: Same as content sheets, plus:
  - `Featured` - If "Yes", adds the `featured` tag
- Auto-added Tags: `blog`
Samples sheet:

- Purpose: Code samples
- Columns: Same as content sheets
- Auto-added Tags: `sample`
Edit `config.py` to customize:

```python
INPUT_FILE = "app-dev-hub-v5.xlsx"
OUTPUT_DIR = "output"
OUTPUT_FILE = f"{OUTPUT_DIR}/templates.json"
```

```python
SHEETS_CONFIG = {
    'Learning Path': {
        'type': 'learning_path',
        'base_tags': ['learning-path'],
        'default_tags': ['flexibleserver'],
        'has_tile_numbers': True,
    },
    # ... more sheets
}
```

```python
TAG_INFERENCE = {
    'enabled': True,
    'url_patterns': {
        'postgresql/flexible-server': 'flexibleserver',
        '/python': 'python',
        # ... more patterns
    },
    'title_patterns': {
        'python': 'python',
        'tutorial': 'tutorial',
        # ... more patterns
    }
}
```

```python
VALIDATION = {
    'required_fields': ['title', 'website', 'source', 'tags', 'image', 'description'],
    'url_schemes': ['http://', 'https://'],
    'min_tags': 1,
    'max_title_length': 200,
    'max_description_length': 1000,
}
```

Tags come from three sources:

- Base Tags: From sheet type (e.g., `app-dev`, `genai`, `blog`)
- Explicit Tags: From Excel tag columns
- Inferred Tags: From URL and content analysis
Tags are automatically normalized:
- Convert to lowercase
- Replace spaces with hyphens
- Remove special characters
- Trim leading/trailing hyphens
Examples:
- "App Dev" → "app-dev"
- "How To" → "how-to"
- "Building GenAI Apps" → "building-genai-apps"
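The normalization rules above can be sketched with a few regular expressions. `normalize_tag` is a hypothetical name, and collapsing repeated hyphens is an extra assumption that the rules do not state explicitly.

```python
import re


def normalize_tag(raw: str) -> str:
    """Lowercase, hyphenate spaces, drop special characters, trim hyphens."""
    tag = raw.strip().lower()
    tag = re.sub(r"\s+", "-", tag)        # replace spaces with hyphens
    tag = re.sub(r"[^a-z0-9-]", "", tag)  # remove special characters
    tag = re.sub(r"-{2,}", "-", tag)      # assumption: collapse repeated hyphens
    return tag.strip("-")                 # trim leading/trailing hyphens
```

Applied to the examples above, `normalize_tag("App Dev")` gives `app-dev` and `normalize_tag("Building GenAI Apps")` gives `building-genai-apps`.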
The converter automatically infers tags from:
URL Patterns:

- `postgresql/flexible-server` → `flexibleserver`
- `github.com` → `github`
- `/python` → `python`

Content Keywords:

- "tutorial" → `tutorial`
- "how to" → `how-to`
- "quickstart" → `quickstart`
- "python", "java", etc. → language tags
```
python excel_to_json_converter.py
python excel_to_json_converter.py --input myfile.xlsx --output result.json
python excel_to_json_converter.py --verbose
python excel_to_json_converter.py --no-inference
python excel_to_json_converter.py --validate-only --output templates.json
```

| Argument | Description | Default |
|---|---|---|
| `--input` | Input Excel file path | `app-dev-hub-v5.xlsx` |
| `--output` | Output JSON file path | `output/templates.json` |
| `--validate-only` | Only validate without converting | `False` |
| `--verbose` | Enable verbose logging | `False` |
| `--no-inference` | Disable tag inference | `False` |
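For readers extending the command line, the arguments in the table map naturally onto Python's standard `argparse` module. The source does not say which parser the script uses, so treat this as an illustrative sketch; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the five documented arguments and their defaults."""
    parser = argparse.ArgumentParser(description="Convert an Excel workbook to JSON.")
    parser.add_argument("--input", default="app-dev-hub-v5.xlsx",
                        help="Input Excel file path")
    parser.add_argument("--output", default="output/templates.json",
                        help="Output JSON file path")
    # store_true flags default to False, matching the table above.
    parser.add_argument("--validate-only", action="store_true",
                        help="Only validate without converting")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging")
    parser.add_argument("--no-inference", action="store_true",
                        help="Disable tag inference")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes hyphenated flags with underscores, so `--validate-only` is read as `args.validate_only`.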
Each entry in the output JSON array has this structure:
```json
{
  "title": "Build a CRUD app with Azure Database for PostgreSQL",
  "website": "https://learn.microsoft.com/azure/postgresql/flexible-server/connect-python",
  "source": "https://learn.microsoft.com/azure/postgresql/flexible-server/connect-python",
  "tags": [
    "learning-path",
    "developing-core-applications",
    "flexibleserver",
    "python",
    "how-to"
  ],
  "image": "./img/build-a-crud-app-with-azure-database-for-postgresq.png",
  "description": "Learn how to build a simple Python app...",
  "tileNumber": 2,
  "priority": "P0",
  "date": "2025-11-25"
}
```

Required fields:

- `title` - Entry title
- `website` - URL to resource
- `source` - Same as website
- `tags` - Array of tags (minimum 1)
- `image` - Generated image path
- `description` - Description (can be empty string)

Optional fields:

- `tileNumber` - Sequential number (Learning Paths only)
- `priority` - P0, P1, P2, etc.
- `date` - Last updated date (YYYY-MM-DD format)
Image paths are auto-generated from titles:

- "Build a CRUD App" → "./img/build-a-crud-app.png"
- "AI & ML Guide!" → "./img/ai-ml-guide.png"
Each entry is validated for:
- All required fields present
- Valid URL format (http:// or https://)
- At least 1 tag
- Title ≤ 200 characters
- Description ≤ 1000 characters
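The entry-level checks above could be driven directly by the `VALIDATION` settings. A minimal sketch, assuming entries are plain dicts; `validate_entry` and its error strings are hypothetical and are not the messages the converter writes to its report.

```python
from typing import List

VALIDATION = {
    "required_fields": ["title", "website", "source", "tags", "image", "description"],
    "url_schemes": ["http://", "https://"],
    "min_tags": 1,
    "max_title_length": 200,
    "max_description_length": 1000,
}


def validate_entry(entry: dict, rules: dict = VALIDATION) -> List[str]:
    """Return a list of problems; an empty list means the entry passed."""
    errors: List[str] = []
    for field in rules["required_fields"]:
        # Presence check only: an empty description string is allowed.
        if field not in entry:
            errors.append(f"missing field: {field}")
    website = str(entry.get("website", ""))
    if not website.startswith(tuple(rules["url_schemes"])):
        errors.append(f"invalid URL: {website!r}")
    if len(entry.get("tags", [])) < rules["min_tags"]:
        errors.append("too few tags")
    if len(entry.get("title", "")) > rules["max_title_length"]:
        errors.append("title too long")
    if len(entry.get("description", "")) > rules["max_description_length"]:
        errors.append("description too long")
    return errors
```

Returning every problem at once, rather than stopping at the first, fits a report that lists all errors for an entry.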
The entire output is validated for:
- Duplicate titles (logged as warnings)
- All entries valid
- Proper JSON structure
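The duplicate-title check can be sketched with `collections.Counter`. `find_duplicate_titles` is a hypothetical helper; how the converter actually detects and logs duplicates is not shown in this README.

```python
from collections import Counter
from typing import List


def find_duplicate_titles(entries: List[dict]) -> List[str]:
    """Return the titles that occur more than once, sorted alphabetically."""
    counts = Counter(entry.get("title", "") for entry in entries)
    return sorted(title for title, n in counts.items() if title and n > 1)
```

Since duplicates are logged as warnings rather than errors, a caller would typically report the returned titles and carry on.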
The validation_report.txt file contains:
- Total entries count
- Valid vs. invalid entries
- List of errors
- List of warnings
```
python tests/test_converter.py
```

The test suite includes:
- Tag normalization tests
- Tag merging tests
- Image path generation tests
- Date parsing tests
- Text cleaning tests
- Entry validation tests
- Tag inference tests
- Tag extraction tests
Excel file not found:

- Ensure the Excel file exists in the specified location
- Check the filename in `config.py` matches your file

Sheet not found:

- Verify your Excel file has the expected sheet names
- Update `SHEETS_CONFIG` in `config.py` to match your sheets

Rows skipped for missing required fields:

- Check that Title and Link columns have values
- Empty rows are automatically skipped

Invalid URLs:

- URLs must start with `http://` or `https://`
- Check for typos in the Link column

Entries without tags:

- Ensure tag columns contain data
- Check if tag inference is enabled
- Verify base tags are configured for the sheet
Enable verbose logging to see detailed processing information:
```
python excel_to_json_converter.py --verbose
```

Check the log file for details: `output/converter.log`
The converter generates comprehensive statistics including:
- Total entries processed and output
- Entries by sheet
- Skip reasons breakdown
- Tag statistics (total, unique, top tags)
- Field coverage (priority, date, tileNumber)
- Validation results
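The tag statistics in this list are simple to derive from the output entries. A sketch, assuming each entry is a dict with a `tags` list; `tag_statistics` and the `total_tags` key are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import List


def tag_statistics(entries: List[dict], top_n: int = 10) -> dict:
    """Count tag usage across entries and report the most common tags."""
    counts = Counter(tag for entry in entries for tag in entry.get("tags", []))
    return {
        "total_tags": sum(counts.values()),  # hypothetical key name
        "unique_tags": len(counts),
        "top_tags": [{"tag": tag, "count": n} for tag, n in counts.most_common(top_n)],
    }
```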
Example:
```json
{
  "conversion_date": "2025-11-25T10:30:00",
  "total_entries_output": 106,
  "entries_by_sheet": {
    "Learning Path": 23,
    "App Dev": 34,
    "AI": 41
  },
  "tag_statistics": {
    "unique_tags": 48,
    "top_tags": [
      { "tag": "flexibleserver", "count": 104 },
      { "tag": "how-to", "count": 60 }
    ]
  }
}
```

Contributing:

- Follow PEP 8 style guide
- Add docstrings to all functions
- Write unit tests for new features

To add a feature:

- Update `config.py` with new settings
- Implement the feature in `excel_to_json_converter.py`
- Add tests in `tests/test_converter.py`
- Update this README
When reporting issues, include:
- Python version
- Error messages
- Sample Excel data (if possible)
- Steps to reproduce
This project is designed for internal use in developer hub content management.
Built for converting Excel-based content management to JSON format for modern web applications.
Need Help? Check the troubleshooting section or review the conversion logs in output/converter.log.