Add cellprofiler_csv_full preset (includes all image table data) #426

Open
d33bs wants to merge 3 commits into cytomining:main from d33bs:cellprofiler-csv-full-preset

Conversation

@d33bs
Member

@d33bs d33bs commented Mar 20, 2026

Description

This PR adds a preset, cellprofiler_csv_full, that includes all data from CSV-based CellProfiler output. Specifically, it includes every column of the image table, whereas the existing cellprofiler_csv preset provides only a small portion of the image table data.
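
To illustrate the difference between selecting a subset of image columns and selecting all of them, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for DuckDB, so DuckDB-only syntax such as COLUMNS(...) and EXCLUDE does not appear. The table contents and the column names Image_FileName_DNA, Image_Width_DNA, and Cells_AreaShape_Area are made up for illustration and are not taken from the preset.

```python
import sqlite3

# In-memory stand-in tables; the real presets run DuckDB SQL over parquet files.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE image (
        Metadata_ImageNumber INTEGER,
        Image_FileName_DNA TEXT,
        Image_Width_DNA INTEGER
    );
    CREATE TABLE cells (
        Metadata_ImageNumber INTEGER,
        Metadata_ObjectNumber INTEGER,
        Cells_AreaShape_Area REAL
    );
    INSERT INTO image VALUES (1, 'img_dna.tif', 1080);
    INSERT INTO cells VALUES (1, 1, 250.0), (1, 2, 310.5);
    """
)

# Subset of image columns, similar in spirit to cellprofiler_csv.
subset = con.execute(
    """
    SELECT image.Metadata_ImageNumber,
           image.Image_FileName_DNA,
           cells.Cells_AreaShape_Area
    FROM cells
    LEFT JOIN image USING (Metadata_ImageNumber)
    """
)
subset_columns = [col[0] for col in subset.description]

# Every image column (image.*), similar in spirit to cellprofiler_csv_full.
full = con.execute(
    """
    SELECT image.*,
           cells.Metadata_ObjectNumber,
           cells.Cells_AreaShape_Area
    FROM cells
    LEFT JOIN image USING (Metadata_ImageNumber)
    """
)
full_columns = [col[0] for col in full.description]
full_rows = full.fetchall()

print("subset:", subset_columns)
print("full:  ", full_columns)
```

In this toy data, Image_Width_DNA is carried into the joined output only by the image.* query, which is the kind of image-level field the new preset is meant to keep.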

Related to discussion in #425

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

  • New Features
    • Added cellprofiler_csv_full preset for CellProfiler CSV-derived data: automatic configuration for compartment identification and metadata, joins across cytoplasm/cell/nuclei/image tables while exposing full image fields, pagination keys for parent/child linking, and a default processing chunk size (1000). Compatible with CellProfiler v4.0.0 layout.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 590e3e44-324c-47b9-a0c4-9a7f0d6c5d5a

📥 Commits

Reviewing files that changed from the base of the PR and between b89c1a6 and 477012f.

📒 Files selected for processing (1)
  • cytotable/presets.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • cytotable/presets.py

📝 Walkthrough

A new configuration preset cellprofiler_csv_full was added to cytotable/presets.py defining parameters and a DuckDB SQL join template for CellProfiler CSV-derived parquet tables, including source version, compartment/metadata names, identifying columns, page keys, chunk size, and joins across image, cytoplasm, cells, and nuclei.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CellProfiler CSV Configuration Preset (cytotable/presets.py) | Added new cellprofiler_csv_full preset with CONFIG_SOURCE_VERSION, CONFIG_NAMES_COMPARTMENTS, CONFIG_NAMES_METADATA, CONFIG_IDENTIFYING_COLUMNS, CONFIG_PAGE_KEYS, CONFIG_CHUNK_SIZE, and CONFIG_JOINS (DuckDB SQL joining cytoplasm, cells, nuclei, and image, selecting image.* and excluding overlapping metadata columns). |
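
For orientation, a preset with the keys listed above might look roughly like the following sketch. This is not the code from the PR: the dictionary name is hypothetical, and the identifying columns, page keys, parent/child column names, and parquet file names are assumptions made for illustration. Only the key names, the v4.0.0 source version, the compartment and metadata names, the chunk size of 1000, and the use of image.* come from this page.

```python
# Hypothetical sketch of a preset entry; values marked "assumed" are illustrative
# guesses and may differ from the actual contents of cytotable/presets.py.
cellprofiler_csv_full_sketch = {
    # CellProfiler layout the preset targets (format of this value is assumed)
    "CONFIG_SOURCE_VERSION": {"cellprofiler": "v4.0.0"},
    # compartment tables expected in the source data
    "CONFIG_NAMES_COMPARTMENTS": ("cells", "nuclei", "cytoplasm"),
    # metadata table(s) expected in the source data
    "CONFIG_NAMES_METADATA": ("image",),
    # assumed: columns treated as identifiers rather than measurements
    "CONFIG_IDENTIFYING_COLUMNS": ("ImageNumber", "ObjectNumber"),
    # assumed: per-table keys used to page through rows in chunks
    "CONFIG_PAGE_KEYS": {
        "image": "ImageNumber",
        "cytoplasm": "ObjectNumber",
        "cells": "ObjectNumber",
        "nuclei": "ObjectNumber",
    },
    # default processing chunk size
    "CONFIG_CHUNK_SIZE": 1000,
    # DuckDB SQL; image.* keeps every image column, and EXCLUDE drops
    # metadata columns that would otherwise be duplicated by the join.
    # The parent/child column names in WHERE are assumed.
    "CONFIG_JOINS": """
        SELECT
            image.*,
            cytoplasm.* EXCLUDE (Metadata_ImageNumber),
            cells.* EXCLUDE (Metadata_ImageNumber, Metadata_ObjectNumber),
            nuclei.* EXCLUDE (Metadata_ImageNumber, Metadata_ObjectNumber)
        FROM read_parquet('cytoplasm.parquet') AS cytoplasm
        LEFT JOIN read_parquet('cells.parquet') AS cells
            USING (Metadata_ImageNumber)
        LEFT JOIN read_parquet('nuclei.parquet') AS nuclei
            USING (Metadata_ImageNumber)
        LEFT JOIN read_parquet('image.parquet') AS image
            USING (Metadata_ImageNumber)
        WHERE
            cells.Number_Object_Number = cytoplasm.Metadata_Cytoplasm_Parent_Cells
            AND nuclei.Number_Object_Number = cytoplasm.Metadata_Cytoplasm_Parent_Nuclei
    """,
}

print(sorted(cellprofiler_csv_full_sketch))
```

The sketch is only meant to show how the seven keys fit together; the diff in cytotable/presets.py is the authoritative definition.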

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A preset hopped in, neat and full,
Joins linked cells, nuclei, and skull—
Image fields gathered, rows align,
Parquet paths now brightly shine,
Hooray for pipelines, short and woolly! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a new preset configuration that includes all image table data, matching the actual implementation in the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@d33bs d33bs marked this pull request as ready for review March 20, 2026 23:16
@d33bs d33bs requested a review from gwaybio as a code owner March 20, 2026 23:16
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
cytotable/presets.py (1)

93-107: Consider adding a comment clarifying the difference from cellprofiler_csv.

The SQL using image.* (line 95) is the key differentiator from cellprofiler_csv which selects only image.Metadata_ImageNumber and COLUMNS('Image_FileName_.*'). A brief inline comment would help users understand when to choose this preset over the original.

📝 Suggested comment
         # compartment and metadata joins performed using DuckDB SQL
         # and modified at runtime as needed
+        # note: unlike cellprofiler_csv, this preset selects all image
+        # table columns (image.*) rather than a subset
         "CONFIG_JOINS": """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cytotable/presets.py` around lines 93 - 107, Add a brief inline comment above
the "CONFIG_JOINS" SQL preset explaining that this preset selects full image.*
columns (vs cellprofiler_csv which only selects image.Metadata_ImageNumber and
COLUMNS('Image_FileName_.*')), so users should pick this preset when they need
all image-level fields; reference the CONFIG_JOINS preset and the
cellprofiler_csv variant in the comment and mention the key difference is the
use of image.* in the SELECT list.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d2f0a89b-d9fc-405a-bd38-2380ea96d154

📥 Commits

Reviewing files that changed from the base of the PR and between 6a1aeb1 and b89c1a6.

📒 Files selected for processing (1)
  • cytotable/presets.py

@kate-bowers-broad

Hi Dave! I've been trying this out on a new dataset. Thanks for putting this together, it's a great start. However, I see that the preset hard-codes the compartments (cells, nuclei, cytoplasm), and it failed when I ran it on my current dataset, which has only Cells and Nuclei objects. For biologists to adopt CytoTable, we need to be able to use it without writing any SQL. Could this be modified so that I can pass in the compartments I have? I see from the documentation that custom compartments are supported elsewhere in CytoTable.

@d33bs
Member Author

d33bs commented Apr 27, 2026

Hi @kate-bowers-broad, thanks for trying this out and for the helpful feedback.

Good catch on the compartments. That is a fair limitation right now. The preset assumes Cells, Nuclei, and Cytoplasm, which will not work for datasets that do not include all three.

I think it makes sense to allow passing in the compartments you have rather than relying on a fixed set. That should align with how CytoTable supports custom compartments elsewhere and keep things more flexible.
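
As a rough illustration of that idea, the join SQL could be generated from whichever compartments are present. The sketch below is hypothetical: build_joins_sql is not a CytoTable function, and the parquet file names and the Metadata_ImageNumber / Metadata_ObjectNumber column names are assumptions. It joins on image number only, so object-level parent/child conditions would still have to be supplied through the where argument; without them, every object in an image is paired with every other object in that image.

```python
from typing import Sequence


def build_joins_sql(
    compartments: Sequence[str],
    metadata: str = "image",
    where: Sequence[str] = (),
) -> str:
    """Hypothetical helper: build DuckDB-style join SQL for given compartments."""
    if not compartments:
        raise ValueError("at least one compartment is required")

    # Normalize names such as "Cells" to the lowercase table aliases.
    names = [name.lower() for name in compartments]
    base, others = names[0], names[1:]

    # Keep every metadata (image) column, then each compartment's columns,
    # excluding identifiers that would otherwise be duplicated.
    select = [f"{metadata}.*", f"{base}.* EXCLUDE (Metadata_ImageNumber)"]
    select += [
        f"{name}.* EXCLUDE (Metadata_ImageNumber, Metadata_ObjectNumber)"
        for name in others
    ]

    lines = ["SELECT", "    " + ",\n    ".join(select)]
    lines.append(f"FROM read_parquet('{base}.parquet') AS {base}")
    for name in [*others, metadata]:
        lines.append(
            f"LEFT JOIN read_parquet('{name}.parquet') AS {name} "
            "USING (Metadata_ImageNumber)"
        )
    if where:
        # Caller-supplied conditions, e.g. parent/child object links.
        lines.append("WHERE " + "\n    AND ".join(where))
    return "\n".join(lines)


# A dataset with only Cells and Nuclei objects, as in the report above.
sql = build_joins_sql(["Cells", "Nuclei"])
print(sql)
```

With a helper along these lines, a dataset lacking cytoplasm would produce SQL that never references a cytoplasm table, which is the failure case described above.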

On the SQL point, I agree it is useful for more advanced use cases, but it would be better if common workflows like this did not require it.

Would you be able to share the dataset you are using, or a small example of it? That would help make sure any update works cleanly for your case.

Thanks again for testing this out, this is really helpful.

@kate-bowers-broad

Thanks so much, Dave! That sounds great. Here's a link to the analysis files from one plate in the batch I was working on, on the Cell Painting Gallery: https://open.quiltdata.com/b/cellpainting-gallery/tree/cpg0037-oasis/xellar/workspace/analysis/2026_03_03_Batch2/OASIS--20Xvs40X--20x-Dev69_40723/analysis/

Thanks again for your help!
