select() ignores distinct #154

@kornerc

Description

What happens?

select("DISTINCT <columns>") silently ignores the DISTINCT keyword.

Expected behavior: either (a) throw an exception if DISTINCT is not supported in select(), or (b) apply the DISTINCT.

By comparison, select("count(*)") does throw an exception: _duckdb.BinderException: Binder Error: Aggregates cannot be present in a Project relation!

To Reproduce

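import duckdb
import pandas as pd

# Small table with duplicate (a, b) pairs so that DISTINCT changes the result.
df = pd.DataFrame({"a": [1, 2, 1, 1], "b": [10, 10, 10, 20], "c": [0, 1, 2, 3]})
duckdb.execute("CREATE TABLE foo AS SELECT * FROM df")

# Pass DISTINCT as part of the projection string.
sql_query_selectdistinct = duckdb.table("foo").select("DISTINCT a, b").sql_query()

print(sql_query_selectdistinct)

# Expected: DISTINCT survives in the generated SQL ...
assert "SELECT DISTINCT a, b FROM main.foo" == sql_query_selectdistinct
# ... and matches the query produced by an explicit .distinct() call.
assert duckdb.table("foo").select("a, b").distinct().sql_query() == sql_query_selectdistinct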

Output:

SELECT a, b FROM main.foo

Both assertions fail: the generated query contains no DISTINCT.
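
Until select() either rejects or honors a leading DISTINCT, one possible workaround is to strip the keyword and call distinct() on the projected relation. The sketch below is only an illustration: split_distinct and select_with_distinct are hypothetical helper names, not part of the DuckDB API, and the only relation methods assumed are select() and distinct(), both of which appear in the reproduction above. It does not handle DISTINCT ON (...).

```python
import re

# Matches a leading DISTINCT keyword followed by whitespace (case-insensitive).
# A column merely named e.g. "distinct_col" is not matched.
_DISTINCT_PREFIX = re.compile(r"^\s*DISTINCT\s+", re.IGNORECASE)


def split_distinct(projection: str) -> tuple[bool, str]:
    """Hypothetical helper: return (has_distinct, projection without the keyword)."""
    match = _DISTINCT_PREFIX.match(projection)
    if match is None:
        return False, projection.strip()
    return True, projection[match.end():].strip()


def select_with_distinct(rel, projection: str):
    """Hypothetical wrapper: project first, then apply distinct() if requested.

    `rel` is assumed to be any object exposing select(str) and distinct(),
    such as the relation returned by duckdb.table(...).
    """
    has_distinct, columns = split_distinct(projection)
    projected = rel.select(columns)
    return projected.distinct() if has_distinct else projected
```

With the table from the reproduction, select_with_distinct(duckdb.table("foo"), "DISTINCT a, b") would then take the same path as duckdb.table("foo").select("a, b").distinct().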

OS:

Windows 11 x86_64

DuckDB Package Version:

1.4.1

Python Version:

3.13.6

Full Name:

Clemens Korner

Affiliation:

AIT Austrian Institute of Technology

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
