Public grouping keys #737

mccalluc · 2025-11-21T14:39:39Z

Fix Prompt for members of groups #736

This gets a little complicated because the user-provided values need to match the type of the column, so I've added the inferred schema to AppState.
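To make the type-matching concern concrete, here is a minimal sketch of how comma-separated user input could be cast to a column's inferred type, with unparseable values dropped. This is not the code in this PR: `clean_keys`, `PARSERS`, and the dict-shaped `schema` are hypothetical names used only for illustration.

```python
from typing import Callable

# Hypothetical mapping from inferred column type names to parsers.
PARSERS: dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "str": str,
}


def clean_keys(raw: str, column_type: str) -> list:
    """Split comma-separated keys and cast each one to the column type.

    Values that do not parse as the column type are dropped.
    """
    parse = PARSERS[column_type]
    keys = []
    for piece in raw.split(","):
        piece = piece.strip()
        if not piece:
            continue  # Skip empty entries such as "a,,b".
        try:
            keys.append(parse(piece))
        except ValueError:
            continue  # Drop values that do not match the column type.
    return keys


# Hypothetical inferred schema: column name -> type name.
schema = {"married": "int", "class_year_str": "str"}
print(clean_keys("0, 1, maybe", schema["married"]))
print(clean_keys("first, second", schema["class_year_str"]))
```

The point of the sketch is that keys typed as `0, 1` only match an integer column if they are converted to integers first; left as the strings "0" and "1", they would not match.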

For reviewer:

Does the UI seem ok? Does the explanation make sense?
Does the generated notebook seem good? Is the code idiomatic?
QA: Does it actually seem to work? Have I added enough test coverage?

TODO:

Public grouping keys for synthetic data #742

ekraffmiller · 2025-12-01T22:10:25Z

Hi @mccalluc, I tested this with a dataset that has numeric columns, for example a column called 'married' with values 0 or 1. When I try to group by that column, I get an error (see below).
When I use the sample.csv file and group by class_year_str, it works. Is this a known issue for numeric keys?

mccalluc · 2025-12-01T22:49:39Z

@ekraffmiller, it should work with integer columns (I added an integer class_year to the sample.csv, and that worked locally), but it obviously doesn't work for you. Looking at the stack trace, it looks like the type sniffing failed: "0" and "1" are being treated as strings rather than integers.

Can you attach the CSV you're using, either here, or in Slack? And are you just grouping, or are you grouping and specifying public keys? (Which reminds me that the public keys should be included in the summary line at the top.) Thanks!

ekraffmiller · 2025-12-02T14:41:52Z

Hi @mccalluc, here is the file I'm using. I only get the error when I specify the public keys:
pums_1000.csv

mccalluc · 2025-12-02T15:54:05Z

@ekraffmiller: Thank you!

Right now, I am getting an error, but not your error:

OpenDPException: 
  FailedFunction("ComputeError(ErrString("could not parse `1e+05` as dtype `i64` at column 'income' (column number 5)\n\nThe current offset in the file is 2888 bytes.\n\nYou might want to try:\n- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),\n- specifying correct dtype with the `schema_overrides` argument\n- setting `ignore_errors` to `True`,\n- adding `1e+05` to the `null_values` list.\n\nOriginal error: ```remaining bytes non-empty```"))")

I'll work on this bug, but it's curious that I don't see exactly what you see. Would you mind either installing in a fresh venv, or copying the environment info from the "About" tab? Thanks!
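For anyone following along, the error above is consistent with a column type being inferred from only the first rows of the file: if those rows look like integers, a later value such as `1e+05` cannot be parsed as `i64`. The sketch below imitates that behavior with the standard library only. It is an illustration, not the Polars or OpenDP implementation, and `infer_type` and `read_column` are hypothetical helpers.

```python
import csv
import io


def infer_type(values: list[str]) -> type:
    """Guess int, then float, then str from a sample of raw strings."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def read_column(
    csv_text: str, column: str, infer_rows: int, ignore_errors: bool = False
) -> list:
    """Parse one column using a type inferred from the first `infer_rows` rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    raw = [row[column] for row in rows]
    column_type = infer_type(raw[:infer_rows])
    parsed = []
    for value in raw:
        try:
            parsed.append(column_type(value))
        except ValueError:
            if not ignore_errors:
                raise
            parsed.append(None)  # Unparseable values become missing.
    return parsed


csv_text = "income\n52000\n48000\n1e+05\n"

# Inferring from all three rows picks float, so every value parses.
print(read_column(csv_text, "income", infer_rows=3))

# Inferring from two rows picks int; ignoring errors turns 1e+05 into None.
print(read_column(csv_text, "income", infer_rows=2, ignore_errors=True))
```

With `infer_rows=2` and `ignore_errors=False`, the same call raises `ValueError` on the third row, which mirrors the failure in the stack trace. The trade-off of ignoring errors is that values like `1e+05` are silently treated as missing rather than as 100000.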

ekraffmiller · 2025-12-02T16:33:11Z

Interesting: when I choose income as my first column, I get the same error as you. To get the error with the 'married' keys, I have been testing with 'age' as my first column, with 1 and 100 as the lower and upper limits.
I did update my environment with a fresh venv, here is the about tab:

mccalluc added 11 commits November 20, 2025 11:33

list -> dict of lists

99a141b

rename in AppState: groups -> group_column_names

2038da8

stub group_module

32dbb77

name the new card

dcac107

UI, but not hooked up

886b3b9

user input in app state

a0f9217

clean user input

b4f1959

handle commas

962cbc7

uncruft

47ddc5c

checkpoint: almost all tests pass

b920e20

fix doctest

86fd917

github-project-automation bot moved this to Pending in DP Wizard Nov 21, 2025

github-project-automation bot added this to DP Wizard Nov 21, 2025

mccalluc marked this pull request as draft November 21, 2025 14:39

mccalluc mentioned this pull request Nov 21, 2025

fill_attributes opendp/dp-wizard-templates#22

Closed

mccalluc added 14 commits November 21, 2025 14:33

version bump templates

eb7e0ce

fill_attributes checkpoint: 3 tests still fail

082119c

one failing test may be race condition?

56d32aa

Revert! need to keep these values, and handle it downstream

d2a70ff

helper method for groups_with_keys

84884b9

note to self

a8c5fe3

fix csv test

d9bbdd9

start to move grouping upstream

f5c9af1

update comment

bdafe5f

Remove TODOs; expand plan matrix

69b1385

make schema part of state, and not just column names; Next, use types to check user-supplied grouping keys

8f11360

checkpoint: convert to expected type; drop values that do not parse

d31fe39

numeric column grouping!

3d4e0e7

hint text

6238254

mccalluc marked this pull request as ready for review November 27, 2025 02:22

mccalluc mentioned this pull request Nov 27, 2025

Allow column types to be specified in the cloud #741

Open

explain; add TODO

45f11c8

mccalluc mentioned this pull request Dec 1, 2025

Public grouping keys for synthetic data #742

Open

mccalluc added 5 commits December 2, 2025 11:59

add failing test

0be38e5

turn on ignore_errors

fd40985

Add explanation

f49ba5f

rename variable

9d17b3a

confirm that the test fixture demonstrates the problem

50fca63

mccalluc mentioned this pull request Dec 5, 2025

Decide on direction for data-dependent behavior of scan_csv, and implement opendp/opendp#2587

Open

mccalluc moved this from Pending to Ready for Review in DP Wizard Dec 8, 2025

merge

1d8b280

This was referenced Dec 9, 2025

Leave out some values in demo.csv to exercise impute #346

Closed

check that scan_csv always has encoding="utf8-lossy" #220

Open
