Excel to JSON Converter (Dev Hub)

This repository contains a converter that reads a structured Excel workbook and produces a merged JSON file of resource cards used by the Dev Hub.

This README explains how to set up, run, and customize the converter step-by-step for a new contributor.

Requirements

Python 3.8+ installed
Create a virtual environment (recommended)
Install dependencies:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Files and Structure

excel_to_json_converter.py: Main converter script.
config.py: Configuration for sheet mappings, validation, and tag inference.
app-dev-hub-v5.xlsx: Input Excel workbook (default path in config.py).
output/: Directory where generated files are written:
- templates.json - generated JSON output
- validation_report.txt - validation summary
- statistics.json - conversion stats
- converter.log - detailed run log
- keep the source excel file in root

Quick Start

Activate virtual environment (see above).
Place your Excel workbook at the path specified by config.INPUT_FILE or pass it with --input.
Run the converter:

python excel_to_json_converter.py

Optional: disable tag inference for a single run:

python excel_to_json_converter.py --no-inference

CLI Options

--input: Path to the input Excel file (default from config.py).
--output: Path to output JSON file.
--validate-only: Validate an existing JSON output instead of converting.
--verbose: Enable debug logging.
--no-inference: Disable automatic tag inference for this run.

How the Converter Works (High Level)

First pass: Learning Path sheets are processed to collect learning-path metadata keyed by Link.
Second pass: Content sheets (App Dev, AI, Analytics, Blogs, Samples) are processed. Each row is keyed by Link and merged into a single resource card.
Tags are built from explicit tag columns, featured flag, and learning-path title (normalized). Tag inference can be enabled/disabled via config.py or --no-inference.
When a Link appears in both LP and a content sheet, the content sheet's Title and Description take precedence for the main title / description fields; LP metadata is attached as learningPathTitle, learningPathDescription, and optional tileNumber.

Configuration Notes (`config.py`)

SHEETS_CONFIG: Maps Excel sheet names to behavior. Edit this if your sheet names differ.
TAG_INFERENCE['enabled']: Turn on/off inference. The project default can be changed here.
VALIDATION: Adjust required fields, URL schemes, and minimum tags.

Expected Excel Columns

Learning Path subsheet:
- Learning Path Title: section title used for LP tagging
- Card title: becomes learningPathTitle
- Description: becomes learningPathDescription
- Link: canonical URL (used as unique key)
- No.: optional tile number for LP entries
- Featured: optional Yes to mark as featured
Content subsheets (App Dev, AI, Blogs, ...):
- Title, Link, Description, optional Tags, Featured, Priority/Importance, Last Updated Date

Tag Rules

Explicit tags: from Tags column and any Unnamed columns — multiple tags can be separated by , or ;.
Featured: any row with Featured value yes (case-insensitive) will include the featured tag.
Learning Path tag: normalized value of Learning Path Title is added for LP entries.
Inference: when enabled, tags may also be inferred from URL/title/description patterns configured in config.TAG_INFERENCE.

Troubleshooting

If the converter complains about missing required fields, open config.py and inspect VALIDATION['required_fields'].
If templates.json looks wrong, check output/converter.log for warnings and output/validation_report.txt for errors.

Example Runs

Generate JSON and save outputs:

python excel_to_json_converter.py --input app-dev-hub-v5.xlsx --output output/templates.json

Validate an existing output:

python excel_to_json_converter.py --validate-only --output output/templates.json

Contributing

Make small, focused changes and run the converter locally to verify output.
Update README.md when you add new sheets or change expected columns.

Next steps I can help with

Add unit tests for tag extraction and LP merging.
Add an example Excel file with sample rows.
Automate a pre-commit check that runs the converter in validation-only mode.

Created/updated by automation. Ask me to adjust wording or add examples you want included.

Excel to JSON Converter for Dev Hub Content

A robust Python application that converts Excel files containing developer hub content into structured JSON format with comprehensive tag processing, validation, and error handling.

🎯 Overview

This converter transforms Excel content (learning paths, blogs, app development resources, etc.) into a unified JSON array with enriched metadata and intelligent tag processing. It's designed for developer hub websites where content is managed in Excel for easy editing by non-technical users, and JSON is consumed by frontend applications.

✨ Features

Multi-sheet Processing: Handles different content types (Learning Paths, Blogs, App Dev, AI, Analytics, Samples)
Intelligent Tag Processing:
- Automatic tag normalization
- Tag inference from URLs and content
- Multi-source tag merging
Comprehensive Validation: Entry-level and output-level validation
Detailed Reporting: Statistics, validation reports, and conversion logs
Error Recovery: Continues processing even if some rows fail
Production-Ready: Robust error handling and logging

🚀 Installation

Prerequisites

Python 3.8 or higher
pip (Python package installer)

Install Dependencies

pip install -r requirements.txt

This will install:

pandas - Excel file reading and data manipulation
openpyxl - Excel file support (.xlsx)
jsonschema - JSON validation

🏃 Quick Start

Place your Excel file in the project directory (or specify path with --input)
Run the converter:
```
python excel_to_json_converter.py
```
Check the output in the output/ directory:
- templates.json - Converted JSON data
- statistics.json - Conversion statistics
- validation_report.txt - Validation results
- converter.log - Detailed logs

📊 Excel Structure

Supported Sheets

1. Learning Path

Purpose: Sequential learning content with tile numbers
Columns:
- Learning Path Title - Groups cards
- No. - Sequential number (not used)
- Card title - Entry title (REQUIRED)
- Link - URL (REQUIRED)
- Description - Detailed description
- Tags + Unnamed: 9-19 - Tag columns

Special Behavior:

Auto-adds learning-path tag
Auto-adds normalized learning path title as tag
Auto-adds flexibleserver tag
Assigns sequential tileNumber within each group

2-4. App Dev, AI, Analytics

Purpose: Content resources
Columns:
- Topic Area - Category (for context)
- Title - Entry title (REQUIRED)
- Link - URL (REQUIRED)
- Description - Detailed description
- Priority/Importance - P0, P1, P2, etc.
- Last Updated Date - Date field
- Tags + Unnamed: 9-19 - Tag columns

Auto-added Tags:

App Dev → app-dev
AI → genai
Analytics → analytics

5. Blogs

Purpose: Blog content
Columns: Same as content sheets, plus:
- Featured - If "Yes", adds "featured" tag
Auto-added Tags: blog

6. Samples

Purpose: Code samples
Columns: Same as content sheets
Auto-added Tags: sample

⚙️ Configuration

Edit config.py to customize:

File Paths

INPUT_FILE = "app-dev-hub-v5.xlsx"
OUTPUT_DIR = "output"
OUTPUT_FILE = f"{OUTPUT_DIR}/templates.json"

Sheet Configuration

SHEETS_CONFIG = {
    'Learning Path': {
        'type': 'learning_path',
        'base_tags': ['learning-path'],
        'default_tags': ['flexibleserver'],
        'has_tile_numbers': True,
    },
    # ... more sheets
}

Tag Inference

TAG_INFERENCE = {
    'enabled': True,
    'url_patterns': {
        'postgresql/flexible-server': 'flexibleserver',
        '/python': 'python',
        # ... more patterns
    },
    'title_patterns': {
        'python': 'python',
        'tutorial': 'tutorial',
        # ... more patterns
    }
}

Validation Rules

VALIDATION = {
    'required_fields': ['title', 'website', 'source', 'tags', 'image', 'description'],
    'url_schemes': ['http://', 'https://'],
    'min_tags': 1,
    'max_title_length': 200,
    'max_description_length': 1000,
}

🏷️ Tag Processing

Tag Sources (Merged in Order)

Base Tags: From sheet type (e.g., app-dev, genai, blog)
Explicit Tags: From Excel tag columns
Inferred Tags: From URL and content analysis

Tag Normalization

Tags are automatically normalized:

Convert to lowercase
Replace spaces with hyphens
Remove special characters
Trim leading/trailing hyphens

Examples:

"App Dev" → "app-dev"
"How To" → "how-to"
"Building GenAI Apps" → "building-genai-apps"

Tag Inference

The converter automatically infers tags from:

URL Patterns:

postgresql/flexible-server → flexibleserver
github.com → github
/python → python

Content Keywords:

"tutorial" → tutorial
"how to" → how-to
"quickstart" → quickstart
"python", "java", etc. → language tags

💻 Usage

Basic Usage

python excel_to_json_converter.py

Custom Input/Output

python excel_to_json_converter.py --input myfile.xlsx --output result.json

Verbose Mode (for Debugging)

python excel_to_json_converter.py --verbose

Disable Tag Inference

python excel_to_json_converter.py --no-inference

Validate Existing Output

python excel_to_json_converter.py --validate-only --output templates.json

Command Line Arguments

Argument	Description	Default
`--input`	Input Excel file path	`app-dev-hub-v5.xlsx`
`--output`	Output JSON file path	`output/templates.json`
`--validate-only`	Only validate without converting	`False`
`--verbose`	Enable verbose logging	`False`
`--no-inference`	Disable tag inference	`False`

📤 Output Structure

JSON Format

Each entry in the output JSON array has this structure:

{
  "title": "Build a CRUD app with Azure Database for PostgreSQL",
  "website": "https://learn.microsoft.com/azure/postgresql/flexible-server/connect-python",
  "source": "https://learn.microsoft.com/azure/postgresql/flexible-server/connect-python",
  "tags": [
    "learning-path",
    "developing-core-applications",
    "flexibleserver",
    "python",
    "how-to"
  ],
  "image": "./img/build-a-crud-app-with-azure-database-for-postgresq.png",
  "description": "Learn how to build a simple Python app...",
  "tileNumber": 2,
  "priority": "P0",
  "date": "2025-11-25"
}

Required Fields

title - Entry title
website - URL to resource
source - Same as website
tags - Array of tags (minimum 1)
image - Generated image path
description - Description (can be empty string)

Optional Fields

tileNumber - Sequential number (Learning Paths only)
priority - P0, P1, P2, etc.
date - Last updated date (YYYY-MM-DD format)

Image Path Generation

Image paths are auto-generated from titles:

"Build a CRUD App" → "./img/build-a-crud-app.png"
"AI & ML Guide!" → "./img/ai-ml-guide.png"

✅ Validation

Entry-Level Validation

Each entry is validated for:

All required fields present
Valid URL format (http:// or https://)
At least 1 tag
Title ≤ 200 characters
Description ≤ 1000 characters

Output-Level Validation

The entire output is validated for:

Duplicate titles (logged as warnings)
All entries valid
Proper JSON structure

Validation Report

The validation_report.txt file contains:

Total entries count
Valid vs. invalid entries
List of errors
List of warnings

🧪 Testing

Run All Tests

python tests/test_converter.py

Test Coverage

The test suite includes:

Tag normalization tests
Tag merging tests
Image path generation tests
Date parsing tests
Text cleaning tests
Entry validation tests
Tag inference tests
Tag extraction tests

🔧 Troubleshooting

Common Issues

"Input file not found"

Ensure the Excel file exists in the specified location
Check the filename in config.py matches your file

"Sheet not found"

Verify your Excel file has the expected sheet names
Update SHEETS_CONFIG in config.py to match your sheets

"Missing required field"

Check that Title and Link columns have values
Empty rows are automatically skipped

"Invalid URL scheme"

URLs must start with http:// or https://
Check for typos in the Link column

"No tags found"

Ensure tag columns contain data
Check if tag inference is enabled
Verify base tags are configured for the sheet

Debug Mode

Enable verbose logging to see detailed processing information:

python excel_to_json_converter.py --verbose

Check the log file for details:

output/converter.log

📈 Statistics

The converter generates comprehensive statistics including:

Total entries processed and output
Entries by sheet
Skip reasons breakdown
Tag statistics (total, unique, top tags)
Field coverage (priority, date, tileNumber)
Validation results

Example:

{
  "conversion_date": "2025-11-25T10:30:00",
  "total_entries_output": 106,
  "entries_by_sheet": {
    "Learning Path": 23,
    "App Dev": 34,
    "AI": 41
  },
  "tag_statistics": {
    "unique_tags": 48,
    "top_tags": [
      { "tag": "flexibleserver", "count": 104 },
      { "tag": "how-to", "count": 60 }
    ]
  }
}

🤝 Contributing

Code Style

Follow PEP 8 style guide
Add docstrings to all functions
Write unit tests for new features

Adding New Features

Update config.py with new settings
Implement feature in excel_to_json_converter.py
Add tests in tests/test_converter.py
Update this README

Reporting Issues

When reporting issues, include:

Python version
Error messages
Sample Excel data (if possible)
Steps to reproduce

📝 License

This project is designed for internal use in developer hub content management.

🙏 Acknowledgments

Built for converting Excel-based content management to JSON format for modern web applications.

Need Help? Check the troubleshooting section or review the conversion logs in output/converter.log.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
tests		tests
.gitignore		.gitignore
README.md		README.md
app-dev-hub-v5.xlsx		app-dev-hub-v5.xlsx
config.py		config.py
excel_to_json_converter.py		excel_to_json_converter.py
requirements.txt		requirements.txt

Emumba-Abdullah/app-devhub-data-validation-script

Folders and files

Latest commit

History

Repository files navigation

Excel to JSON Converter (Dev Hub)

Requirements

Files and Structure

Quick Start

CLI Options

How the Converter Works (High Level)

Configuration Notes (config.py)

Expected Excel Columns

Tag Rules

Troubleshooting

Example Runs

Contributing

Next steps I can help with

Excel to JSON Converter for Dev Hub Content

📋 Table of Contents

🎯 Overview

✨ Features

🚀 Installation

Prerequisites

Install Dependencies

🏃 Quick Start

📊 Excel Structure

Supported Sheets

1. Learning Path

2-4. App Dev, AI, Analytics

5. Blogs

6. Samples

⚙️ Configuration

File Paths

Sheet Configuration

Tag Inference

Validation Rules

🏷️ Tag Processing

Tag Sources (Merged in Order)

Tag Normalization

Tag Inference

💻 Usage

Basic Usage

Custom Input/Output

Verbose Mode (for Debugging)

Disable Tag Inference

Validate Existing Output

Command Line Arguments

📤 Output Structure

JSON Format

Required Fields

Optional Fields

Image Path Generation

✅ Validation

Entry-Level Validation

Output-Level Validation

Validation Report

🧪 Testing

Run All Tests

Test Coverage

🔧 Troubleshooting

Common Issues

"Input file not found"

"Sheet not found"

"Missing required field"

"Invalid URL scheme"

"No tags found"

Debug Mode

📈 Statistics

🤝 Contributing

Code Style

Adding New Features

Reporting Issues

📝 License

🙏 Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Configuration Notes (`config.py`)

Packages