Add cellprofiler_csv_full preset (includes all image table data)#426
Add cellprofiler_csv_full preset (includes all image table data)#426d33bs wants to merge 3 commits intocytomining:mainfrom
cellprofiler_csv_full preset (includes all image table data)#426Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughA new configuration preset Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip You can customize the tone of the review comments and chat replies.Configure the |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
cytotable/presets.py (1)
93-107: Consider adding a comment clarifying the difference fromcellprofiler_csv.The SQL using
image.*(line 95) is the key differentiator fromcellprofiler_csvwhich selects onlyimage.Metadata_ImageNumberandCOLUMNS('Image_FileName_.*'). A brief inline comment would help users understand when to choose this preset over the original.📝 Suggested comment
# compartment and metadata joins performed using DuckDB SQL # and modified at runtime as needed + # note: unlike cellprofiler_csv, this preset selects all image + # table columns (image.*) rather than a subset "CONFIG_JOINS": """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cytotable/presets.py` around lines 93 - 107, Add a brief inline comment above the "CONFIG_JOINS" SQL preset explaining that this preset selects full image.* columns (vs cellprofiler_csv which only selects image.Metadata_ImageNumber and COLUMNS('Image_FileName_.*')), so users should pick this preset when they need all image-level fields; reference the CONFIG_JOINS preset and the cellprofiler_csv variant in the comment and mention the key difference is the use of image.* in the SELECT list.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cytotable/presets.py`:
- Around line 93-107: Add a brief inline comment above the "CONFIG_JOINS" SQL
preset explaining that this preset selects full image.* columns (vs
cellprofiler_csv which only selects image.Metadata_ImageNumber and
COLUMNS('Image_FileName_.*')), so users should pick this preset when they need
all image-level fields; reference the CONFIG_JOINS preset and the
cellprofiler_csv variant in the comment and mention the key difference is the
use of image.* in the SELECT list.
|
Hi Dave! I've been trying this out on a new dataset. Thanks for putting this together, it's a great start. However, I see that the preset hard-codes compartments (cells, nuclei, cytoplasm) and failed when I ran it on my current dataset which has only Cells and Nuclei objects. For biologists to adopt using cytotable, we do need to be able to use it without writing any SQL. Can this be modified such that I can pass in the compartments I have? I see from documentation that custom compartments are supported elsewhere in Cytotable. |
|
Hi @kate-bowers-broad, thanks for trying this out and for the helpful feedback. Good catch on the compartments. That is a fair limitation right now. The preset assumes Cells, Nuclei, and Cytoplasm, which will not work for datasets that do not include all three. I think it makes sense to allow passing in the compartments you have rather than relying on a fixed set. That should align with how Cytotable supports custom compartments elsewhere and keep things more flexible. On the SQL point, I agree it is useful for more advanced use cases, but it would be better if common workflows like this did not require it. Would you be able to share the dataset you are using, or a small example of it? That would help make sure any update works cleanly for your case. Thanks again for testing this out, this is really helpful. |
|
Thanks so much, Dave! That sounds great. Here's a link to the analysis files from one plate in the batch I was working on, on the Cell Painting Gallery: https://open.quiltdata.com/b/cellpainting-gallery/tree/cpg0037-oasis/xellar/workspace/analysis/2026_03_03_Batch2/OASIS--20Xvs40X--20x-Dev69_40723/analysis/ Thanks again for your help! |
Description
This PR adds a preset which includes all data from CSV-based CellProfiler output. Specifically, we include all image table data, where the
cellprofiler_csvonly provides a small portion of the image table data.Related to discussion in #425
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that a pull request is ready for review.
Summary by CodeRabbit
cellprofiler_csv_fullpreset for CellProfiler CSV-derived data: automatic configuration for compartment identification and metadata, joins across cytoplasm/cell/nuclei/image tables while exposing full image fields, pagination keys for parent/child linking, and a default processing chunk size (1000). Compatible with CellProfiler v4.0.0 layout.