Adapting to data model changes by manulera · Pull Request #383 · OpenCloning/OpenCloning_backend

manulera · 2025-12-06T02:37:11Z

Issue #382

Will need some work in the frontend as well.

Related to this PR OpenCloning/OpenCloning_LinkML#71

Note

Adapts backend to the new LinkML schema by introducing NCBISequenceSource and location-based coordinates, updating external import endpoints and manually-typed input, refactoring pombe batch cloning, and bumping dependencies/tests accordingly.

API/Schema Updates (breaking):
- Replace RepositoryIdSource with NCBISequenceSource for GenBank; GenomeCoordinatesSource now uses repository_id and coordinates (location) instead of sequence_accession/start/end/strand.
- Update endpoint routing to map by source type; revise error messages and responses to new models.
Endpoints:
- POST /repository_id/genbank: accepts NCBISequenceSource, enforces max length from settings.
- POST /genome_coordinates: validates coordinates via SequenceLocationStr, enforces length, and uses new fields in NCBI calls.
- POST /manually_typed: now takes ManuallyTypedSequence payload separate from ManuallyTypedSource.
Batch cloning (pombe):
- Switch to pydna.assembly2 (pcr_assembly, homologous_recombination_integration), fetch plasmids via request_from_addgene, and genome regions via get_genome_region_from_annotation; export with CloningStrategy.from_dseqrecords.
Settings:
- Add NCBI_MAX_SEQUENCE_LENGTH env/config; wire into NCBI endpoints.
Tests/Fixtures:
- Update tests and JSON fixtures to new schema (repository_id, coordinates, NCBISequenceSource, schema_version 0.4.9); adjust assertions and mocked errors.
Docs:
- Refresh notebook outputs to reflect new source types and coordinate display.
Dependencies:
- Bump multiple packages (e.g., fastapi 0.123.9, starlette 0.50.0, networkx 3.6, coverage 7.12.0), opencloning-linkml to 0.4.9, and update pydna git rev; update Poetry to 2.2.1.

Written by Cursor Bugbot for commit e1de76e. This will update automatically on new commits.
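A minimal sketch of how an environment-driven length limit such as `NCBI_MAX_SEQUENCE_LENGTH` could be read and enforced. Only the variable name comes from this PR; the default value, the function names, and the error wording are assumptions made for illustration and are not taken from the backend code.

```python
import os

# Hypothetical default: the real default is defined in the backend settings
# and is not stated anywhere in this PR conversation.
DEFAULT_MAX_LENGTH = 500_000


def get_max_sequence_length(environ=os.environ) -> int:
    """Read NCBI_MAX_SEQUENCE_LENGTH from a mapping, falling back to a default."""
    raw = environ.get("NCBI_MAX_SEQUENCE_LENGTH")
    if raw is None:
        return DEFAULT_MAX_LENGTH
    value = int(raw)
    if value <= 0:
        raise ValueError("NCBI_MAX_SEQUENCE_LENGTH must be a positive integer")
    return value


def check_length(requested_length: int, max_length: int) -> None:
    """Raise ValueError if the requested region exceeds the configured limit."""
    if requested_length > max_length:
        raise ValueError(
            f"Requested {requested_length} bp, maximum allowed is {max_length} bp"
        )
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.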

codecov · 2025-12-06T02:38:43Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.25%. Comparing base (b52103a) to head (e1de76e).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #383      +/-   ##
==========================================
- Coverage   97.29%   97.25%   -0.05%     
==========================================
  Files          29       29              
  Lines        1703     1713      +10     
==========================================
+ Hits         1657     1666       +9     
- Misses         46       47       +1

Flag	Coverage Δ
unittests	`97.25% <100.00%> (-0.05%)`	⬇️

Copilot

Pull request overview

This PR refactors the codebase to align with data model changes from opencloning-linkml schema 0.4.9. The primary changes involve replacing sequence_accession + start/end/strand fields with repository_id + coordinates, introducing NCBISequenceSource, and splitting the manually typed endpoint to accept separate source and sequence objects.

- Removes repository_name field and implements source-type-based routing via a mapping dictionary
- Updates genome coordinate handling to use Bio.SeqFeature.Location objects instead of separate numeric fields
- Refactors /manually_typed endpoint to accept both ManuallyTypedSource and ManuallyTypedSequence as separate parameters
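The source-type-based routing mentioned in the first bullet can be sketched roughly as below. This is an illustration only: the dataclasses are stand-ins for the real opencloning-linkml models, and the handler functions and `HANDLERS` dictionary are hypothetical names, not the ones used in `external_import.py`.

```python
from dataclasses import dataclass


# Stand-ins for the real models; only the class names appear in this PR.
@dataclass
class NCBISequenceSource:
    repository_id: str


@dataclass
class AddgeneIdSource:
    repository_id: str


def fetch_genbank(source) -> str:
    # Hypothetical handler: a real one would call the NCBI API.
    return f"genbank:{source.repository_id}"


def fetch_addgene(source) -> str:
    # Hypothetical handler: a real one would query Addgene.
    return f"addgene:{source.repository_id}"


# The mapping replaces a string field such as repository_name:
# the type of the source object alone selects the handler.
HANDLERS = {
    NCBISequenceSource: fetch_genbank,
    AddgeneIdSource: fetch_addgene,
}


def dispatch(source) -> str:
    """Route a source object to its handler based on its exact type."""
    try:
        handler = HANDLERS[type(source)]
    except KeyError:
        raise ValueError(f"Unsupported source type: {type(source).__name__}") from None
    return handler(source)
```

Keying on the type means a new repository only needs a new model class and one dictionary entry.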

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/test_stub_route.py	Updated test to use separate source and sequence objects for manually typed endpoint
tests/test_ncbi_requests.py	Changed assertions to use `coordinates` SimpleLocation instead of start/end/strand fields
tests/test_files/*.json	Updated JSON fixtures to remove `repository_name`, replace start/end/strand with `coordinates`, and bump schema to 0.4.9
tests/test_endpoints_no_input.py	Refactored manually_typed tests to construct payload with separate source/sequence dictionaries
tests/test_endpoints_external_import.py	Removed `repository_name` from all source instantiations, updated coordinate field tests, and fixed provider name capitalization in error messages
src/opencloning/request_examples.py	Replaced start/end/strand with `coordinates` string in all examples
src/opencloning/ncbi_requests.py	Implemented all-or-none validation for start/end/strand, created Location objects from coordinates, switched to `NCBISequenceSource`, removed pre-request validator
src/opencloning/endpoints/no_input.py	Changed manually_typed signature to accept separate source and sequence parameters, removed circular/overhang validation (now in model)
src/opencloning/endpoints/external_import.py	Implemented source-type-to-repository mapping, updated coordinates parsing and validation, removed `repository_name` references, capitalized provider names in error messages
src/opencloning/batch_cloning/pombe/pombe_summary.py	Updated field reference from `sequence_accession` to `repository_id` (but missed updating .start/.end fields - see critical bug)
src/opencloning/batch_cloning/pombe/pombe_clone.py	Removed `repository_name` from AddgeneIdSource instantiation
pyproject.toml	Updated pydna git revision (but missing opencloning-linkml dependency - see critical bug)
poetry.lock	Updated multiple dependencies including opencloning-linkml to 0.4.9, FastAPI, Starlette, coverage, networkx, rpds-py, and others
docs/notebooks/external_sequences.ipynb	Updated notebook output to show new field names in source representation

Comments suppressed due to low confidence (1)

src/opencloning/batch_cloning/pombe/pombe_summary.py:53

The locus_source fields start, end, and strand no longer exist in the new data model. These have been replaced with the coordinates field. The code attempts to access locus_source.start (lines 49, 52) which will cause an AttributeError at runtime. You need to extract the start/end positions from locus_source.coordinates using the appropriate method (e.g., location_boundaries(locus_source.coordinates)).

    insertion_start = (
        locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[0].right_location))[1]
    )
    insertion_end = (
        locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]
    )
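As a rough illustration of the suggested fix, the start position can be derived from the `coordinates` string rather than read from a removed `start` field. To keep the sketch dependency-free, a small regex parser stands in for `Bio.SeqFeature.Location.fromstring`; it handles only simple `a..b` and `complement(a..b)` strings, and the function name is hypothetical.

```python
import re

# Matches "a..b" or "complement(a..b)" only; compound locations are out of scope.
_LOCATION_RE = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_simple_location(coordinates: str):
    """Return (start, end, strand) using a 0-based start and an exclusive end.

    This mirrors the convention Biopython uses for parsed locations, where
    the string "1..100" has start 0 and end 100.
    """
    match = _LOCATION_RE.match(coordinates)
    # Reject unbalanced forms such as "1..100)" or "complement(1..100".
    if match is None or bool(match.group(1)) != coordinates.endswith(")"):
        raise ValueError(f"Unsupported location string: {coordinates!r}")
    start = int(match.group(2)) - 1
    end = int(match.group(3))
    strand = -1 if match.group(1) else 1
    return start, end, strand
```

In the backend itself the parsed Biopython location object would presumably be used directly; the tuple here is just the simplest shape for a standalone example.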

Copilot · 2025-12-06T02:41:50Z

 packaging = "^25.0"
 pairwise-alignments-to-msa = "^0.1.1"
-pydna = {git = "https://github.com/pydna-group/pydna", rev = "e95e8603b082e50b7873eab0ae1f8abf7b4d25bc"}
+pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}


The opencloning-linkml dependency is missing from the [tool.poetry.dependencies] section but is present in poetry.lock at version 0.4.9. This dependency should be explicitly declared in pyproject.toml (e.g., opencloning-linkml = "0.4.9") to ensure reproducible builds and clear dependency management.

Suggested change

-pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}
+pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}
+opencloning-linkml = "0.4.9"

cursor

This PR is being reviewed by Cursor Bugbot

Bug: pombe_clone.py uses old GenomeCoordinatesSource field names

The GenomeCoordinatesSource constructor uses old field names start, end, strand, and sequence_accession that no longer exist in the data model after the changes. The model now expects coordinates (a location string like "1..100" or "complement(1..100)") and repository_id instead of sequence_accession. This will cause validation errors when creating the source.

src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64

https://github.com/manulera/OpenCloning_backend/blob/4b50bbd1cd6b219b6bb4264e075c96d621cc8c8d/src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64
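A sketch of the field migration this comment asks for, assuming the old `start`/`end` values were 1-based and inclusive (the NCBI convention) and that `strand` was 1 or -1. The helper names and the dictionary-based shape are hypothetical; the real code constructs a `GenomeCoordinatesSource` model rather than a dict.

```python
def to_location_string(start: int, end: int, strand: int) -> str:
    """Build a location string from 1-based inclusive start/end and a strand."""
    if start < 1 or end < start:
        raise ValueError(f"Invalid range: {start}..{end}")
    if strand not in (1, -1):
        raise ValueError(f"Strand must be 1 or -1, got {strand}")
    span = f"{start}..{end}"
    return f"complement({span})" if strand == -1 else span


def migrate_legacy_fields(legacy: dict) -> dict:
    """Map the old field names onto the new ones described in this PR."""
    return {
        "repository_id": legacy["sequence_accession"],
        "coordinates": to_location_string(
            legacy["start"], legacy["end"], legacy["strand"]
        ),
    }
```

For example, `to_location_string(20, 2050, -1)` returns `"complement(20..2050)"`, matching the format quoted in the review comments.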

cursor

Bug: `GenomeCoordinatesSource` uses deprecated properties in `pombe_clone.py`

The GenomeCoordinatesSource constructor still uses the old interface with start, end, strand, and sequence_accession parameters, but the data model changed to use repository_id and coordinates (a location string like '1877009..1881726' or 'complement(20..2050)'). This will cause attribute errors or validation failures when creating the source object with the new opencloning-linkml model.

src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64

https://github.com/manulera/OpenCloning_backend/blob/3c98642cca3c9ffe199af7a1edf7cb3a6c622499/src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64

cursor

Bug: GenomeCoordinatesSource uses removed attributes in pombe_clone.py

The GenomeCoordinatesSource constructor uses attributes (start, end, strand, sequence_accession) that were removed in the data model changes. According to the new model (visible in the updated JSON test files), these have been replaced by repository_id and coordinates. The code needs to construct the location string (like '1877009..1881726' or 'complement(20..2050)') and use repository_id instead of sequence_accession.

src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64

https://github.com/manulera/OpenCloning_backend/blob/6711478369e1bea3a6531e38dd75d8ad4ca21dd8/src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64

cursor · 2025-12-09T17:53:42Z

    )
    insertion_end = (
-        locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]
+        locus_location.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]


Bug: Off-by-one error in insertion coordinate calculations

The insertion_start and insertion_end calculations use locus_location.start which is 0-indexed (BioPython convention), but the previous implementation used locus_source.start which was 1-indexed (NCBI coordinate convention). Since these coordinates represent genomic positions that will be written to a summary JSON for human consumption, using the 0-indexed locus_location.start will produce coordinates that are off by one from the expected NCBI-style 1-indexed genomic coordinates.
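The size of the discrepancy described here can be shown with a small standalone calculation. The locus string is the example quoted in the earlier review comments; the offset is a made-up value, and the helper is a simplified stand-in for reading `.start` from a parsed Biopython location. Whether the summary should in fact be 1-indexed is Bugbot's claim, not something this sketch establishes.

```python
def zero_based_start(coordinates: str) -> int:
    """Start of a simple 'a..b' or 'complement(a..b)' location, 0-based."""
    first = coordinates.replace("complement(", "").split("..")[0]
    return int(first) - 1


locus = "1877009..1881726"
offset = 1000  # hypothetical offset of the insertion site within the locus

# Using the parsed (0-based) start directly, as in the new code.
with_zero_based = zero_based_start(locus) + offset

# Using a 1-based start, as the removed locus_source.start field is said to have been.
with_one_based = (zero_based_start(locus) + 1) + offset

print(with_zero_based, with_one_based)
```

The two results differ by exactly one, which is the shift the comment warns about.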

manulera added 2 commits December 5, 2025 17:34

update pydna and fix genbank endpoint

4a59c1f

closes #382

4b50bbd

manulera requested a review from Copilot December 6, 2025 02:38

Copilot started reviewing on behalf of manulera December 6, 2025 02:38

Copilot AI reviewed Dec 6, 2025

cursor Bot reviewed Dec 6, 2025

Comment thread src/opencloning/batch_cloning/pombe/pombe_summary.py

improve test coverage

3c98642

cursor Bot reviewed Dec 9, 2025

Comment thread src/opencloning/endpoints/external_import.py

manulera mentioned this pull request Dec 9, 2025

Adapt to model and backend changes OpenCloning/OpenCloning_frontend#563

Merged

enable setting of max length via env vars

6711478

manulera added a commit to OpenCloning/OpenCloning that referenced this pull request Dec 9, 2025

update docker-compose with env var NCBI_MAX_SEQUENCE_LENGTH OpenCloni…

b7ac872

…ng/OpenCloning_backend#383

manulera mentioned this pull request Dec 9, 2025

Update docker-compose with env var NCBI_MAX_SEQUENCE_LENGTH OpenCloning/OpenCloning#57

Merged

cursor Bot reviewed Dec 9, 2025

fix pombe example

b961dd3

cursor Bot reviewed Dec 9, 2025

Comment thread src/opencloning/batch_cloning/pombe/pombe_clone.py Outdated

fix annotation

e1de76e

manulera merged commit 442c102 into master Dec 9, 2025
6 of 8 checks passed

manulera deleted the issue_382 branch December 9, 2025 17:45

cursor Bot reviewed Dec 9, 2025

manulera added a commit to OpenCloning/OpenCloning that referenced this pull request Dec 11, 2025

update docker-compose with env var NCBI_MAX_SEQUENCE_LENGTH OpenCloni…

93b917a

…ng/OpenCloning_backend#383 (#57)
