Fix multi-source convert path collision (#442) #444
Sibling sources whose parent directories share a name (e.g.
analyses/{1,2,3}/analysis) wrote to identical intermediate parquet paths,
causing ArrowInvalid (race) or FileNotFoundError (overwrite). Add a stable
SHA-1 hash of the source's parent path to the intermediate directory, and
extend cleanup in _concat_source_group to remove the new dir and its parent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
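The collision and the fix described in the commit message above can be sketched as follows. This is a minimal illustration, not CytoTable's actual code: the helper names `naive_intermediate_dir`, `hashed_intermediate_dir`, and `cleanup_intermediate`, the directory layout, the file name `Cells.csv`, and the 8-character hash length are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def naive_intermediate_dir(dest: Path, source: Path) -> Path:
    # Hypothetical pre-fix layout: keyed only on the parent directory *name*,
    # so analyses/1/analysis and analyses/2/analysis map to the same place.
    return dest / source.parent.name / source.stem


def hashed_intermediate_dir(dest: Path, source: Path) -> Path:
    # Hypothetical post-fix layout: a short, stable SHA-1 digest of the full
    # parent path is inserted as an extra directory level.
    # usedforsecurity=False (Python 3.9+) marks the hash as non-cryptographic.
    digest = hashlib.sha1(
        source.parent.as_posix().encode("utf-8"), usedforsecurity=False
    ).hexdigest()[:8]
    return dest / source.parent.name / digest / source.stem


def cleanup_intermediate(intermediate: Path) -> None:
    # Remove the per-source directory, then walk up one extra level and
    # remove the hash-named parent if nothing else is left in it.
    shutil.rmtree(intermediate, ignore_errors=True)
    try:
        intermediate.parent.rmdir()  # only succeeds on an empty directory
    except OSError:
        pass


sources = [Path(f"analyses/{i}/analysis/Cells.csv") for i in (1, 2, 3)]
dest = Path("out")

naive_paths = {naive_intermediate_dir(dest, s) for s in sources}
hashed_paths = {hashed_intermediate_dir(dest, s) for s in sources}

print(len(naive_paths), "distinct naive path(s)")
print(len(hashed_paths), "distinct hashed path(s)")
```

With the name-only layout, all three sources resolve to one directory, which is the condition under which concurrent writers can race or overwrite each other. Hashing the full parent path keeps the result deterministic across runs while separating the siblings.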
📝 Walkthrough
This PR adds SHA-1 hash-based path discrimination to parquet exports for multi-source conversions with colliding parent directory names, expands directory cleanup after concatenation, bumps mypy to v1.20.2, and introduces a regression test validating the multi-site scenario.
Changes: Multi-source collision handling
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 5 passed
- Mark sha1 path-discriminator hash as usedforsecurity=False (bandit B324).
- Narrow convert() return type with isinstance(dict) asserts in the new multi-source regression test so it passes mypy (test_convert.py is mypy-excluded; test_convert_threaded.py is not).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
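The return-type narrowing mentioned in this commit can be illustrated with a small sketch. `fake_convert` below is a hypothetical stand-in, and its `Union` return type is an assumption made for the example rather than the real signature of `cytotable.convert()`.

```python
from typing import Dict, List, Union


def fake_convert(join: bool) -> Union[Dict[str, List[str]], str]:
    # Hypothetical stand-in: a single path when joining, otherwise a dict
    # of per-table results.
    if join:
        return "joined.parquet"
    return {"Nuclei": ["nuclei.parquet"], "Cells": ["cells.parquet"]}


result = fake_convert(join=False)

# Without this assert, mypy reports that the "str" member of the Union has
# no attribute "keys". The isinstance check narrows the type to the dict.
assert isinstance(result, dict)
table_names = sorted(result.keys())
print(table_names)
```

The assert doubles as a runtime check in the test, so a regression that changes the return shape fails early with a clear message.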
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.pre-commit-config.yaml:
- Line 66: The .pre-commit-config.yaml specifies an invalid mypy rev v1.20.2;
update the rev for the mypy hook to a valid release (e.g., change the rev value
from "v1.20.2" to "v1.20.1" or "v1.20.0") so the pre-commit hook points to an
existing mypy release.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c49fdcfb-7c00-4a65-814f-29ddea900389
📒 Files selected for processing (3)
- .pre-commit-config.yaml
- cytotable/convert.py
- tests/test_convert_threaded.py
      - id: vulture
    - repo: https://github.com/pre-commit/mirrors-mypy
-     rev: v1.20.1
+     rev: v1.20.2
🧩 Analysis chain
🌐 Web query:
mypy version 1.20.2 release notes or changelog
💡 Result:
No evidence found for a mypy version 1.20.2 release as of 2026-05-07. The latest versions are mypy 1.20.0 (released March 31, 2026 [1][2]) and 1.20.1 (April 2026 [3][4][5]). GitHub tags confirm up to v1.20.0 [1], no v1.20.2 tag or PyPI page [2]. Changelog at v1.20.1 includes 1.20.1 fixes like SQLite cache sync disable and narrowing fixes [3][4]. Official release notes for 1.20.0 cover better type narrowing, Python 3.9 support drop, SQLite cache default, local partial types improvements, and Mypy 2.0 plans [6][7]. For full 1.20 details, see https://github.com/python/mypy/blob/v1.20.1/CHANGELOG.md [3] or https://mypy.readthedocs.io/en/stable/changelog.html [7].
Citations:
- 1: https://github.com/python/mypy/tags
- 2: https://pypi.org/project/mypy/1.20.0/
- 3: https://github.com/python/mypy/blob/v1.20.1/CHANGELOG.md
- 4: python/mypy@f315c8a
- 5: python/mypy@c60e8bf
- 6: https://mypy-lang.blogspot.com/2026/03/mypy-120-released.html
- 7: http://mypy.readthedocs.io/en/stable/changelog.html
Change mypy version to v1.20.1 or v1.20.0.
mypy v1.20.2 does not exist. As of May 2026, the latest available versions are v1.20.1 (April 2026) and v1.20.0 (March 2026). Update the revision to one of these releases.
Description
Fixes #442. When cytotable.convert() was called with multiple per-source subdirectories that share a parent directory name (e.g. analyses/{1,2,3}/analysis/), _source_pageset_to_parquet wrote each source's intermediate parquet to the same path, causing ArrowInvalid (concurrent write/read race) or FileNotFoundError (later writer overwrites). The fix inserts a short, stable SHA-1 hash of the source's parent path into the intermediate directory, guaranteeing per-source uniqueness; cleanup in _concat_source_group is extended to walk up the additional level. A regression test (test_convert_multi_source_colliding_parent_dir_names) reproduces the original failure on main and passes with the fix.
What is the nature of your change?
Checklist