🇱🇰 Sri Lanka Government Statistics Datasets (2019–2024)

Clean, structured datasets from Sri Lankan government sources

📊 What's Inside

6 Years of Data | 4 Key Ministries | Multiple Departments

  • Foreign Affairs & Relations
  • Immigration & Emigration
  • Foreign Employment
  • Tourism Development

🗂️ Data Categories

  • 🏛️ Foreign Affairs: Diplomatic missions, communications, organizational data
  • 🛂 Immigration: Asylum seekers, visas, passports, refugee statistics
  • 💼 Employment: Worker complaints, remittances, registration data, legal performance
  • 🏖️ Tourism: Arrivals, accommodations, occupancy rates, revenue statistics

📋 Data Matrix

Note

🚨 Action Required: View the Missing Datasets Report to see which datasets need to be populated.

| Data Source | Dataset Category | Years Available | Collection Status | Verification Status |
|---|---|---|---|---|
| Ministry of Foreign Affairs | Diplomatic Missions | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Ministry of Foreign Affairs | Official Communications | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Department of Immigration and Emigration | Asylum Seekers & Refugees | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Department of Immigration and Emigration | Visas & Passports | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Sri Lanka Bureau of Foreign Employment | Worker Complaints | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Sri Lanka Bureau of Foreign Employment | Remittances & Earnings | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Sri Lanka Bureau of Foreign Employment | Registrations (SLBFE) | 2019-2023 | ✅ Collected | ⚠️ Pending (2024) |
| Sri Lanka Tourism Development Authority | Tourist Arrivals | 2019-2024 | ✅ Collected | ✅ Verified (2024 Partial) |
| Sri Lanka Tourism Development Authority | Accommodations & Occupancy | 2019-2024 | ✅ Collected | ✅ Verified (2024 Partial) |
| Sri Lanka Tourism Development Authority | Revenue Statistics | 2019-2024 | ✅ Collected | ✅ Verified (2024 Partial) |

📅 Years Available

  • 2019
  • 2020-2021
  • 2022-2023
  • 2024

🚀 Quick Start

📖 Browse all data interactively →

🌐 View online at GitHub Pages →

All datasets are in clean JSON format with metadata.

This repository contains cleaned and organized datasets from various Sri Lankan government public sources, compiled by the Lanka Data Foundation. The data spans from 2019 to 2024 and covers multiple ministries and departments.

🛠️ Installation & Setup

To run the data ingestion and utility scripts, you'll need to set up the Python environment. We recommend using Mamba (or Conda).

  1. Create the environment:

    mamba env create -f environment.yml

    (If using Conda: conda env create -f environment.yml)

  2. Activate the environment:

    mamba activate datasets_env

  3. Run the scripts:

    # Run the optimized ingestion script
    python insert.py
    
    # Run the attribute writer (optional year filter)
    python write_attributes.py --year 2023

📊 Dataset Overview

  • Total Years: 6 (2019-2024)
  • Total Datasets: 175+ JSON files
  • Ministries Covered: 4 main categories
  • Data Sources: Public government sources

🏗️ Repository Structure

datasets/
├── data/                           # Main data directory
│   ├── 2019/                      # Year-based organization
│   ├── 2020/
│   ├── 2021/
│   ├── 2022/
│   └── 2023/
├── generate_static_html.py         # HTML generator script
├── index.html                      # Generated static HTML
├── styles.css                      # CSS stylesheet
└── README.md                       # This file

📁 Data Organization

Data is organized hierarchically:

  • Year → Government → President → Ministry → Department → Data Files
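
To make the hierarchy concrete, here is a minimal sketch that walks a data/ directory and reports the folder levels above each data.json. The function name iter_datasets is hypothetical and is not taken from the repository's scripts; the sketch also assumes only that the first level under the root is the year.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_datasets(root: Path) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (year, folder levels below the year) for every data.json under root."""
    for data_file in sorted(root.rglob("data.json")):
        # Parts relative to root: year, government, president, ministry, ...
        parts = data_file.relative_to(root).parts
        if len(parts) < 2:
            continue  # a data.json sitting directly in root has no year folder
        year, levels = parts[0], parts[1:-1]  # drop the trailing "data.json"
        yield year, levels
```

Calling list(iter_datasets(Path("data"))) from the repository root would give one entry per dataset, which can then be grouped by year or ministry.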

Data File Structure

Each dataset contains:

  • data.json - The main dataset
  • metadata.json - Metadata about the dataset (optional)
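
Because metadata.json is optional, code that reads a dataset has to handle its absence. The following is a minimal sketch of such a reader; load_dataset is a hypothetical helper, and no particular schema for either file is assumed.

```python
import json
from pathlib import Path
from typing import Any, Optional, Tuple


def load_dataset(folder: Path) -> Tuple[Any, Optional[Any]]:
    """Return (data, metadata) for a dataset folder; metadata is None if absent."""
    with (folder / "data.json").open(encoding="utf-8") as fh:
        data = json.load(fh)

    metadata = None
    metadata_path = folder / "metadata.json"
    if metadata_path.is_file():  # metadata.json is optional
        with metadata_path.open(encoding="utf-8") as fh:
            metadata = json.load(fh)
    return data, metadata
```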

🔄 How to Update Data and Regenerate HTML

1. Adding New Data

Adding Data for a New Year

  1. Create a new folder under data/ (e.g., data/2024/)
  2. Follow the existing folder structure:
    data/2024/
    └── Government of Sri Lanka(government)/
        └── [President Name](citizen)/
            └── [Ministry Name](minister)/
                └── [Department Name](department)/
                    ├── [category]/
                    │   ├── data.json
                    │   └── metadata.json (optional)
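
The folder layout shown here can also be built programmatically. This is only a sketch: dataset_dir is a hypothetical helper, and the president, ministry, department, and category values passed to it are placeholders that you would replace with real names.

```python
from pathlib import Path


def dataset_dir(root: Path, year: int, president: str, ministry: str,
                department: str, category: str) -> Path:
    """Build the folder path for one dataset, adding the type suffixes."""
    return (
        root
        / str(year)
        / "Government of Sri Lanka(government)"
        / f"{president}(citizen)"
        / f"{ministry}(minister)"
        / f"{department}(department)"
        / category
    )
```

To create the folders, call .mkdir(parents=True, exist_ok=True) on the returned path and then write data.json into it.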
    

Adding Data to Existing Year

  1. Navigate to the appropriate year folder in data/
  2. Follow the existing hierarchy to find the correct ministry/department
  3. Add your data.json and optional metadata.json files

Data File Requirements

  • data.json: Must contain valid JSON data
  • metadata.json: Optional, should contain dataset metadata (description, source, etc.)
  • Files must be placed in appropriately named folders with category indicators
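
One way to check the "valid JSON" requirement before committing is to parse every data file and report the failures. The sketch below does that; find_invalid_json is a hypothetical name, and it checks syntax only, not the content or shape of the data.

```python
import json
from pathlib import Path
from typing import List


def find_invalid_json(root: Path) -> List[str]:
    """Return "path: error" strings for data/metadata files that fail to parse."""
    problems = []
    for name in ("data.json", "metadata.json"):
        for path in sorted(root.rglob(name)):
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, UnicodeDecodeError) as exc:
                problems.append(f"{path}: {exc}")
    return problems
```

An empty list means every data.json and metadata.json under the given root parsed successfully.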

2. Update the Website (Optional)

The API documentation website is built with Jekyll on GitHub Pages. The data listing is auto-generated and injected into docs/index.md.

To update the data listing:

  1. Run the update script:
    python3 update_dataset_index.py
  2. This will:
    • Scan the data/ directory.
    • Generate ZIP files for each year.
    • Inject the file listing into docs/index.md.
  3. Commit and push the changes to the main branch.
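
The internals of update_dataset_index.py are not described here. As a rough illustration of what "injecting" a listing into a Markdown page can mean, the sketch below replaces the text between two marker comments. Both markers and the function name inject_listing are assumptions made for this example and may not match what the real script does.

```python
START = "<!-- DATASET-LIST:START -->"  # hypothetical marker, not from the repo
END = "<!-- DATASET-LIST:END -->"      # hypothetical marker, not from the repo


def inject_listing(page: str, listing: str) -> str:
    """Replace whatever sits between START and END with the new listing."""
    head, sep, rest = page.partition(START)
    if not sep:
        raise ValueError("start marker not found")
    _, sep, tail = rest.partition(END)
    if not sep:
        raise ValueError("end marker not found")
    return f"{head}{START}\n{listing}\n{END}{tail}"
```

Keeping the markers in the output makes the operation repeatable: running it again with a new listing replaces the old one instead of appending to it.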

3. What Gets Generated

ZIP Files

  • Automatically created for each year folder
  • Contains all JSON files from that year
  • Named as [YEAR]_Data.zip (e.g., 2019_Data.zip)
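
For reference, a per-year archive with this naming scheme could be produced along the following lines. This is a sketch rather than the repository's own code: zip_year is a hypothetical helper, and keeping the year folder as the top-level entry inside the archive is an assumption.

```python
import zipfile
from pathlib import Path


def zip_year(year_dir: Path, out_dir: Path) -> Path:
    """Write <YEAR>_Data.zip containing every .json file under year_dir."""
    archive = out_dir / f"{year_dir.name}_Data.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(year_dir.rglob("*.json")):
            # Store paths relative to the parent so the year folder is kept.
            zf.write(path, arcname=path.relative_to(year_dir.parent).as_posix())
    return archive
```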

HTML Features

  • Interactive collapsible sections
  • Download buttons for yearly ZIP files
  • In-browser JSON viewer with copy/download functionality
  • Responsive design with CSS styling

4. Folder Structure Guidelines

Special Naming Conventions

  • Use (government), (citizen), (minister), (department) suffixes for proper categorization
  • Use (AS_CATEGORY) for sub-categories
  • Underscores in folder names will be converted to spaces in display
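
These conventions can be read back from a folder name with a small parser. The sketch below is illustrative only: parse_folder_name is a hypothetical function, and it assumes that (AS_CATEGORY) is a literal suffix used the same way as the other four.

```python
import re
from typing import Optional, Tuple

# Suffixes listed in the naming conventions above.
_SUFFIX = re.compile(
    r"^(?P<name>.*?)\((?P<kind>government|citizen|minister|department|AS_CATEGORY)\)$"
)


def parse_folder_name(folder: str) -> Tuple[str, Optional[str]]:
    """Split a folder name into (display name, type suffix or None)."""
    match = _SUFFIX.match(folder)
    name, kind = (match["name"], match["kind"]) if match else (folder, None)
    # Underscores become spaces in the displayed name.
    return name.replace("_", " ").strip(), kind
```

For example, parse_folder_name("Government of Sri Lanka(government)") returns ("Government of Sri Lanka", "government"), while a folder without a suffix returns None as its type.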

5. Customization

Adding New Emojis

Edit the get_emoji_for_type() function in generate_static_html.py:

emoji_map = {
    'your_category': '🎯',
    # ... existing mappings
}

Modifying CSS

Edit styles.css to customize the appearance:

  • Colors, fonts, spacing
  • Responsive breakpoints
  • Modal styling for JSON viewer

Updating Statistics

The script automatically counts datasets, but you can manually update the description in the main() function.

🚀 Deployment

The generated index.html is ready for deployment on:

  • GitHub Pages
  • Any static hosting service
  • Local web servers

📞 Contact

For any enquiries please contact: [email protected]

Codebase at: https://github.com/LDFLK/datasets

📄 License

See LICENSE file for details.
