Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff
| | master | #383 | +/- |
|---|---|---|---|
| Coverage | 97.29% | 97.25% | -0.05% |
| Files | 29 | 29 | |
| Lines | 1703 | 1713 | +10 |
| Hits | 1657 | 1666 | +9 |
| Misses | 46 | 47 | +1 |
Pull request overview
This PR refactors the codebase to align with data model changes from opencloning-linkml schema 0.4.9. The primary changes involve replacing sequence_accession + start/end/strand fields with repository_id + coordinates, introducing NCBISequenceSource, and splitting the manually typed endpoint to accept separate source and sequence objects.
- Removes the `repository_name` field and implements source-type-based routing via a mapping dictionary
- Updates genome coordinate handling to use `Bio.SeqFeature.Location` objects instead of separate numeric fields
- Refactors the `/manually_typed` endpoint to accept both `ManuallyTypedSource` and `ManuallyTypedSequence` as separate parameters
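The first bullet replaces the client-supplied `repository_name` with routing on the source type. A minimal sketch of that idea, assuming each source payload carries a `type` discriminator; the dictionary name, its entries, and the provider labels are hypothetical illustrations, not the actual mapping in `external_import.py`.

```python
# Hypothetical mapping: source `type` -> provider label used in responses and
# error messages. Entries are illustrative, not the project's real table.
SOURCE_TYPE_TO_REPOSITORY: dict[str, str] = {
    "NCBISequenceSource": "GenBank",
    "AddgeneIdSource": "Addgene",
}


def repository_for(source: dict) -> str:
    """Pick the provider from the payload's `type` field (assumed to exist)."""
    source_type = source.get("type")
    if source_type not in SOURCE_TYPE_TO_REPOSITORY:
        raise ValueError(f"Unsupported source type: {source_type!r}")
    return SOURCE_TYPE_TO_REPOSITORY[source_type]


# Usage: the payload no longer needs a repository_name field.
print(repository_for({"type": "NCBISequenceSource", "repository_id": "EXAMPLE_ID"}))  # GenBank
```

Keying on the type lets the server decide which provider to contact, so the capitalized provider name in error messages can come from the same table.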
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/test_stub_route.py | Updated test to use separate source and sequence objects for manually typed endpoint |
| tests/test_ncbi_requests.py | Changed assertions to use coordinates SimpleLocation instead of start/end/strand fields |
| tests/test_files/*.json | Updated JSON fixtures to remove repository_name, replace start/end/strand with coordinates, and bump schema to 0.4.9 |
| tests/test_endpoints_no_input.py | Refactored manually_typed tests to construct payload with separate source/sequence dictionaries |
| tests/test_endpoints_external_import.py | Removed repository_name from all source instantiations, updated coordinate field tests, and fixed provider name capitalization in error messages |
| src/opencloning/request_examples.py | Replaced start/end/strand with coordinates string in all examples |
| src/opencloning/ncbi_requests.py | Implemented all-or-none validation for start/end/strand, created Location objects from coordinates, switched to NCBISequenceSource, removed pre-request validator |
| src/opencloning/endpoints/no_input.py | Changed manually_typed signature to accept separate source and sequence parameters, removed circular/overhang validation (now in model) |
| src/opencloning/endpoints/external_import.py | Implemented source-type-to-repository mapping, updated coordinates parsing and validation, removed repository_name references, capitalized provider names in error messages |
| src/opencloning/batch_cloning/pombe/pombe_summary.py | Updated field reference from sequence_accession to repository_id (but missed updating .start/.end fields - see critical bug) |
| src/opencloning/batch_cloning/pombe/pombe_clone.py | Removed repository_name from AddgeneIdSource instantiation |
| pyproject.toml | Updated pydna git revision (but missing opencloning-linkml dependency - see critical bug) |
| poetry.lock | Updated multiple dependencies including opencloning-linkml to 0.4.9, FastAPI, Starlette, coverage, networkx, rpds-py, and others |
| docs/notebooks/external_sequences.ipynb | Updated notebook output to show new field names in source representation |
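The table notes that `ncbi_requests.py` now validates `start`, `end`, and `strand` as all-or-none. The rule can be sketched as below; the function name and error text are hypothetical, and the real validator may report errors differently.

```python
from typing import Optional


def validate_all_or_none(
    start: Optional[int], end: Optional[int], strand: Optional[int]
) -> bool:
    """Return True if a sub-range was requested, False if none of the fields was set.

    Raises ValueError when only some of start/end/strand are provided.
    """
    provided = [value is not None for value in (start, end, strand)]
    if all(provided):
        return True
    if not any(provided):
        return False
    raise ValueError("start, end and strand must be given together or not at all")
```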
Comments suppressed due to low confidence (1)
src/opencloning/batch_cloning/pombe/pombe_summary.py:53
- The `locus_source` fields `start`, `end`, and `strand` no longer exist in the new data model. These have been replaced with the `coordinates` field. The code attempts to access `locus_source.start` (lines 49, 52), which will cause an AttributeError at runtime. You need to extract the start/end positions from `locus_source.coordinates` using the appropriate method (e.g., `location_boundaries(locus_source.coordinates)`).
insertion_start = (
locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[0].right_location))[1]
)
insertion_end = (
locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]
)
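The suppressed comment suggests deriving the locus start from `locus_source.coordinates`. In the codebase this would presumably go through `Location.fromstring` and `location_boundaries`; the self-contained stand-in below (a hypothetical helper that only handles `a..b` and `complement(a..b)`) shows what that extraction yields in BioPython's 0-based, end-exclusive convention.

```python
import re

# Matches "start..end" or "complement(start..end)" (1-based, inclusive).
# The conditional group (?(1)\)) requires a closing parenthesis only when
# "complement(" was matched.
_SIMPLE_LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)(?(1)\))$")


def parse_simple_location(coordinates: str) -> tuple[int, int, int]:
    """Return (start, end, strand) as 0-based start, exclusive end, strand +1/-1."""
    match = _SIMPLE_LOCATION.match(coordinates.strip())
    if match is None:
        raise ValueError(f"Unsupported location string: {coordinates!r}")
    complement, start, end = match.groups()
    strand = -1 if complement else 1
    return int(start) - 1, int(end), strand


locus_start, locus_end, locus_strand = parse_simple_location("1877009..1881726")
print(locus_start)  # 1877008
```

The returned start is 0-based, one less than the old 1-based `locus_source.start`, so adding offsets to it without correction leads to the off-by-one concern Bugbot raises about the same lines.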
packaging = "^25.0"
pairwise-alignments-to-msa = "^0.1.1"
- pydna = {git = "https://github.com/pydna-group/pydna", rev = "e95e8603b082e50b7873eab0ae1f8abf7b4d25bc"}
+ pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}
The opencloning-linkml dependency is missing from the [tool.poetry.dependencies] section but is present in poetry.lock at version 0.4.9. This dependency should be explicitly declared in pyproject.toml (e.g., opencloning-linkml = "0.4.9") to ensure reproducible builds and clear dependency management.
Suggested change:
- pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}
+ pydna = {git = "https://github.com/pydna-group/pydna", rev = "6f25090ec653e38219021aa46fa4440f888e958c"}
+ opencloning-linkml = "0.4.9"
This PR is being reviewed by Cursor Bugbot
Bug: pombe_clone.py uses old GenomeCoordinatesSource field names
The GenomeCoordinatesSource constructor uses old field names start, end, strand, and sequence_accession that no longer exist in the data model after the changes. The model now expects coordinates (a location string like "1..100" or "complement(1..100)") and repository_id instead of sequence_accession. This will cause validation errors when creating the source.
src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64
Bug: `GenomeCoordinatesSource` uses deprecated properties in `pombe_clone.py`
The GenomeCoordinatesSource constructor still uses the old interface with start, end, strand, and sequence_accession parameters, but the data model changed to use repository_id and coordinates (a location string like '1877009..1881726' or 'complement(20..2050)'). This will cause attribute errors or validation failures when creating the source object with the new opencloning-linkml model.
src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64
Bug: GenomeCoordinatesSource uses removed attributes in pombe_clone.py
The GenomeCoordinatesSource constructor uses attributes (start, end, strand, sequence_accession) that were removed in the data model changes. According to the new model (visible in the updated JSON test files), these have been replaced by repository_id and coordinates. The code needs to construct the location string (like '1877009..1881726' or 'complement(20..2050)') and use repository_id instead of sequence_accession.
src/opencloning/batch_cloning/pombe/pombe_clone.py#L53-L64
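All three Bugbot comments point to the same fix: build a location string from the old fields and pass `repository_id` instead of `sequence_accession`. A minimal conversion sketch, assuming the legacy values are 1-based inclusive with strand encoded as `1`/`-1`; the function name and the returned plain `dict` are hypothetical (the real code would pass these values to the `GenomeCoordinatesSource` constructor).

```python
def legacy_to_new_fields(sequence_accession: str, start: int, end: int, strand: int) -> dict:
    """Map legacy 1-based start/end/strand onto repository_id + coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1-based coordinates with start <= end")
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    span = f"{start}..{end}"
    # Minus-strand regions use GenBank-style complement(...) notation.
    coordinates = span if strand == 1 else f"complement({span})"
    return {"repository_id": sequence_accession, "coordinates": coordinates}


print(legacy_to_new_fields("EXAMPLE.1", 20, 2050, -1))
# {'repository_id': 'EXAMPLE.1', 'coordinates': 'complement(20..2050)'}
```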
)
insertion_end = (
- locus_source.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]
+ locus_location.start + location_boundaries(Location.fromstring(hrec_source.input[-1].left_location))[0]
Bug: Off-by-one error in insertion coordinate calculations
The insertion_start and insertion_end calculations use locus_location.start which is 0-indexed (BioPython convention), but the previous implementation used locus_source.start which was 1-indexed (NCBI coordinate convention). Since these coordinates represent genomic positions that will be written to a summary JSON for human consumption, using the 0-indexed locus_location.start will produce coordinates that are off by one from the expected NCBI-style 1-indexed genomic coordinates.
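The off-by-one can be made concrete with the locus `1877009..1881726` used in the comments above. BioPython stores its start as `1877008`, so reproducing the old 1-based output requires adding one back. The helper name and the offset value below are hypothetical.

```python
def one_based_genomic_position(locus_start_0based: int, offset_0based: int) -> int:
    """1-based genomic coordinate of a 0-based offset inside a locus.

    `locus_start_0based` is a BioPython-style start (e.g. locus_location.start).
    """
    return locus_start_0based + offset_0based + 1


ncbi_start = 1877009              # old locus_source.start (1-based)
biopython_start = ncbi_start - 1  # locus_location.start (0-based)
offset = 500                      # hypothetical value from location_boundaries(...)

old_value = ncbi_start + offset        # what the summary reported before
new_value = biopython_start + offset   # what the changed line computes
fixed_value = one_based_genomic_position(biopython_start, offset)

print(old_value, new_value, fixed_value)  # 1877509 1877508 1877509
```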
Issue #382
Will need some work in the frontend as well.
Related to this PR OpenCloning/OpenCloning_LinkML#71
Note
Adapts backend to the new LinkML schema by introducing `NCBISequenceSource` and location-based coordinates, updating external import endpoints and manually-typed input, refactoring pombe batch cloning, and bumping dependencies/tests accordingly.
- … `RepositoryIdSource` with `NCBISequenceSource` for GenBank; `GenomeCoordinatesSource` now uses `repository_id` and `coordinates` (location) instead of `sequence_accession`/`start`/`end`/`strand`.
- … `type`; revise error messages and responses to new models.
- `POST /repository_id/genbank`: accepts `NCBISequenceSource`, enforces max length from settings.
- `POST /genome_coordinates`: validates `coordinates` via `SequenceLocationStr`, enforces length, and uses new fields in NCBI calls.
- `POST /manually_typed`: now takes `ManuallyTypedSequence` payload separate from `ManuallyTypedSource`.
- … `pydna.assembly2` (`pcr_assembly`, `homologous_recombination_integration`), fetch plasmids via `request_from_addgene`, and genome regions via `get_genome_region_from_annotation`; export with `CloningStrategy.from_dseqrecords`.
- … `NCBI_MAX_SEQUENCE_LENGTH` env/config; wire into NCBI endpoints.
- … (`repository_id`, `coordinates`, `NCBISequenceSource`, `schema_version` `0.4.9`); adjust assertions and mocked errors.
- … (`fastapi` `0.123.9`, `starlette` `0.50.0`, `networkx` `3.6`, `coverage` `7.12.0`), `opencloning-linkml` to `0.4.9`, and update `pydna` git rev; update Poetry to 2.2.1.
Written by Cursor Bugbot for commit e1de76e.
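The note says `NCBI_MAX_SEQUENCE_LENGTH` is read from env/config and enforced on the NCBI endpoints. A rough sketch of that check for a `coordinates` string follows; the fallback value, function names, and error text are hypothetical, and only simple `a..b` / `complement(a..b)` spans are handled (the real endpoint validates through `SequenceLocationStr`).

```python
import os
import re

# Hypothetical fallback used only when the variable is unset.
DEFAULT_NCBI_MAX_SEQUENCE_LENGTH = 100_000

_SPAN = re.compile(r"(\d+)\.\.(\d+)")


def ncbi_max_sequence_length() -> int:
    """Read the limit from the environment, falling back to the default."""
    return int(os.environ.get("NCBI_MAX_SEQUENCE_LENGTH", DEFAULT_NCBI_MAX_SEQUENCE_LENGTH))


def check_requested_length(coordinates: str, max_length: int) -> int:
    """Return the span length in bp, or raise if it exceeds `max_length`."""
    match = _SPAN.search(coordinates)
    if match is None:
        raise ValueError(f"Unsupported coordinates: {coordinates!r}")
    start, end = (int(group) for group in match.groups())
    length = end - start + 1  # 1-based inclusive span
    if length > max_length:
        raise ValueError(f"Requested {length} bp, but the maximum allowed is {max_length} bp")
    return length


print(check_requested_length("complement(20..2050)", ncbi_max_sequence_length()))  # 2031 with the default limit
```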