@arash77
Contributor

@arash77 arash77 commented Jan 16, 2026

Introduce a normalization function to convert version fields to strings across various import scripts, ensuring consistent data formatting. This change enhances data integrity when processing tool and package metadata.
Closes research-software-ecosystem/content#1190

@mihai-sysbio

Thanks @arash77, this is a neat contribution. It's mixed together with formatting changes though, which, albeit a great idea, muddle what is the fix versus what is purely formatting. Is there a way you could split the two aspects? And if adopting PEP8 is desired in this repo, how about a GH Action that applies it automatically?

@arash77
Contributor Author

arash77 commented Jan 19, 2026

I will exclude the formatting from this PR. I can create a separate PR to discuss how automated formatting could be applied.

@arash77 arash77 force-pushed the normalize-version-fields branch from 17db00a to 74ba363 Compare January 19, 2026 16:23
Add normalize_version_fields function to convert version fields
(which can be int, float, or str) to string type for consistency.

Integrate version normalization into all import scripts:
- bioconda: normalize package.version
- bioconductor: normalize Version
- biotools: normalize version and nested version fields
- galaxytool: normalize Suite_version, conda package version, and workflow versions
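
As a rough illustration of the commit message above, a helper of this kind might look like the sketch below. The signature, the in-place mutation, and the decision to leave `bool` and missing values untouched are assumptions made for illustration; the function actually added in this PR may differ.

```python
from typing import Iterable


def normalize_version_fields(record: dict, fields: Iterable[str]) -> dict:
    """Convert the named top-level fields of `record` to str, in place.

    Hypothetical signature; the real function in the PR may differ.
    """
    for name in fields:
        value = record.get(name)
        # bool is a subclass of int, so it is excluded explicitly (assumption).
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            record[name] = str(value)
    return record


# Example: a version parsed as an integer becomes a string.
package = {"name": "example", "version": 2}
normalize_version_fields(package, ["version"])
print(package)  # {'name': 'example', 'version': '2'}
```

Values that are already strings pass through unchanged, so the helper can be applied to every record regardless of how the upstream source typed the field.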
Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces a new common utility module for normalizing version fields from numeric types to strings across various metadata import scripts, addressing data integrity issues when processing tool and package metadata.

Changes:

  • Added common/metadata.py module with normalize_version_to_string and normalize_version_fields functions
  • Updated four import scripts (galaxytool-import, biotools-import, bioconductor-import, bioconda-import) to use the new normalization functions

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| common/metadata.py | New utility module providing functions to normalize version fields (integers/floats) to strings with support for nested paths and list structures |
| galaxytool-import/galaxytool-import.py | Integrated version normalization for Suite_version, Latest_suite_conda_package_version, and Related_Workflows latest_version fields |
| biotools-import/import.py | Added version field normalization for both top-level version field and nested version fields within version arrays |
| bioconductor-import/import.py | Applied normalization to the Version field in package metadata |
| bioconda-import/bioconda_importer.py | Normalized package.version field in conda package data |
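
The "nested paths and list structures" support mentioned for common/metadata.py could work roughly as sketched here. This is a hypothetical illustration only: the function name `normalize_path`, the path-as-sequence convention, and the sample record are assumptions, and only the field names `Suite_version`, `Related_Workflows`, and `latest_version` come from the summary above.

```python
from typing import Any, Sequence


def _to_str(value: Any) -> Any:
    """Stringify ints and floats; leave everything else (including bool) as is."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return value


def normalize_path(node: Any, path: Sequence[str]) -> None:
    """Stringify numeric values found at `path`, descending through lists.

    Hypothetical helper, not the PR's actual API.
    """
    if isinstance(node, list):
        # Apply the same path to every element of a list.
        for item in node:
            normalize_path(item, path)
        return
    if not isinstance(node, dict) or not path:
        return
    key, rest = path[0], path[1:]
    if key not in node:
        return
    if rest:
        normalize_path(node[key], rest)
    elif isinstance(node[key], list):
        node[key] = [_to_str(v) for v in node[key]]
    else:
        node[key] = _to_str(node[key])


# Made-up record using field names from the summary table.
tool = {
    "Suite_version": 1.2,
    "Related_Workflows": [{"latest_version": 3}, {"latest_version": "0.5"}],
}
normalize_path(tool, ["Suite_version"])
normalize_path(tool, ["Related_Workflows", "latest_version"])
print(tool)
```

Missing keys and non-dict nodes are skipped silently in this sketch, which keeps it safe to run over heterogeneous records; a stricter variant could raise instead.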

@hmenager hmenager requested a review from mihai-sysbio January 22, 2026 15:07

Development

Successfully merging this pull request may close these issues.

Consistent field types in metadata formats

2 participants