@mccalluc
Contributor

@mccalluc mccalluc commented Nov 21, 2025

This gets a little complicated because the user-provided values need to match the type of the column, so I've added the inferred schema to the app state.
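To illustrate the type-matching requirement, here is a minimal sketch of converting keys typed into a text field so they match a column's inferred type. The `PARSERS` mapping, the type names, and the `coerce_keys` helper are all hypothetical and are not taken from this PR's code; the real schema representation may differ.

```python
from typing import Callable

# Hypothetical mapping from an inferred column type name to a parser.
# These names are illustrative only, not the PR's actual schema format.
PARSERS: dict[str, Callable[[str], object]] = {
    "int": int,
    "float": float,
    "str": str,
}


def coerce_keys(raw_keys: list[str], column_type: str) -> list[object]:
    """Convert keys entered as text so they match the column's type."""
    parse = PARSERS[column_type]
    # Strip whitespace first so " 1 " is treated the same as "1".
    return [parse(key.strip()) for key in raw_keys]


# Keys arrive from the UI as text, so they are converted before use.
assert coerce_keys(["0", "1"], "int") == [0, 1]
# For a string column the same input is left as strings.
assert coerce_keys(["0", "1"], "str") == ["0", "1"]
```

Without a step like this, the string `"0"` never equals the integer `0`, so keys for a numeric column would not match any value in the data.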

For reviewer:

  • Does the UI seem ok? Does the explanation make sense?
  • Does the generated notebook seem good? Is the code idiomatic?
  • QA: Does it actually seem to work? Have I added enough test coverage?

TODO:

@mccalluc mccalluc marked this pull request as ready for review November 27, 2025 02:22
@ekraffmiller
Member

Hi @mccalluc, I tested this with a dataset that has numeric columns, for example a column called 'married' with values 0 or 1. When I try to group by that column, I get an error; see below.
When I use the sample.csv file and group by class_year_str, it works. Is this a known issue for numeric keys?

Screenshot 2025-12-01 at 5 06 22 PM

@mccalluc
Contributor Author

mccalluc commented Dec 1, 2025

@ekraffmiller, it should work with integer columns (I added an integer class_year to the sample.csv, and that worked locally), but it obviously doesn't work for you. Judging by the stack trace, the type sniffing failed: "0", "1" are strings rather than integers.

Can you attach the CSV you're using, either here, or in Slack? And are you just grouping, or are you grouping and specifying public keys? (Which reminds me that the public keys should be included in the summary line at the top.) Thanks!

@ekraffmiller
Member

> @ekraffmiller, it should work with integer columns (I added an integer class_year to the sample.csv, and that worked locally), but it obviously doesn't work for you. Judging by the stack trace, the type sniffing failed: "0", "1" are strings rather than integers.
>
> Can you attach the CSV you're using, either here, or in Slack? And are you just grouping, or are you grouping and specifying public keys? (Which reminds me that the public keys should be included in the summary line at the top.) Thanks!

Hi @mccalluc, here is the file I'm using. I only get the error when I specify the public keys:
pums_1000.csv

@mccalluc
Contributor Author

mccalluc commented Dec 2, 2025

@ekraffmiller: Thank you!

Right now, I am getting an error, but not your error:

OpenDPException: 
  FailedFunction("ComputeError(ErrString("could not parse `1e+05` as dtype `i64` at column 'income' (column number 5)\n\nThe current offset in the file is 2888 bytes.\n\nYou might want to try:\n- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),\n- specifying correct dtype with the `schema_overrides` argument\n- setting `ignore_errors` to `True`,\n- adding `1e+05` to the `null_values` list.\n\nOriginal error: ```remaining bytes non-empty```"))")

I'll work on this bug, but it's curious that I don't see exactly what you see. Would you mind either installing in a fresh venv, or copying the environment info from the "About" tab? Thanks!
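One plausible reading of the error above is that the schema was inferred from an early sample of rows that all looked like integers, and a later value written as `1e+05` then failed to parse as `i64`. The following stdlib-only sketch illustrates that failure mode; it is not how Polars or OpenDP implement inference, the `sniff_type` and `infer_column_type` helpers are hypothetical, and the CSV text is made-up sample data rather than rows from pums_1000.csv.

```python
import csv
import io
from typing import Optional


def sniff_type(values: list[str]) -> str:
    """Return the narrowest of 'int', 'float', 'str' that parses every value."""
    for name, parse in (("int", int), ("float", float)):
        try:
            for value in values:
                parse(value)
            return name
        except ValueError:
            continue
    return "str"


def infer_column_type(
    csv_text: str, column: str, sample_rows: Optional[int] = None
) -> str:
    """Infer a column's type from the first sample_rows rows (all rows if None)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sample = rows if sample_rows is None else rows[:sample_rows]
    return sniff_type([row[column] for row in sample])


# Made-up data: the last income uses scientific notation.
CSV_TEXT = "age,income\n30,52000\n41,48000\n55,1e+05\n"

# Sampling only the first two rows suggests an integer column...
assert infer_column_type(CSV_TEXT, "income", sample_rows=2) == "int"
# ...but the full column needs a float, because int("1e+05") raises ValueError.
assert infer_column_type(CSV_TEXT, "income") == "float"
```

This is consistent with the hints in the message itself, which suggest either increasing `infer_schema_length` or specifying the dtype explicitly through `schema_overrides`.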

@ekraffmiller
Member

Interesting: when I choose income as my first column, I get the same error as you. To get the error with the 'married' keys, I have been testing with 'age' as my first column, with 1 and 100 as the lower and upper limits.
I did update my environment with a fresh venv; here is the About tab:
Screenshot 2025-12-02 at 11 29 17 AM

Projects

Status: Ready for Review

Development

Successfully merging this pull request may close these issues.

Prompt for members of groups