vijmeister commented Oct 19, 2025

I realized that I need to further think about the logic of my proposed fix. The .iloc usage is not capturing the intended columns to merge ...

  • Re-do logic of fix

@vijmeister vijmeister marked this pull request as draft October 19, 2025 21:34
@vijmeister vijmeister marked this pull request as ready for review October 28, 2025 02:33
new_columns_out = self.columns.union(other_columns, sort=False)
# Deduplicate column names if necessary
self_columns = Index(
dedup_names(list(self_columns), False), dtype=self_columns.dtype
Member

why would dedup_names be necessary here?

Contributor Author

Having unique column names would preserve the original logic of using column names. Switching to indices would require multiple indices.

Member

huh?

other_columns = Index(
dedup_names(list(other_columns), False), dtype=other_columns.dtype
)
this.columns = Index(
Member

why alter this.columns?

result = {}
for col in new_columns:
for col in new_columns_unique:
series = this[col]
Member

get rid of all the dedup_names stuff above and just iterate over range(this.shape[1]) and use series = this.iloc[:, i], other_series = other.iloc[:, i]
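For readers following along, here is a minimal sketch of the positional approach suggested above. It is not the code from this PR: the two frames and the addition used in place of the user-supplied function are hypothetical, and the frames are assumed to already have matching columns.

```python
import pandas as pd

# Hypothetical frames that share a duplicated column label "a".
this = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "a"])
other = pd.DataFrame([[10, 20], [30, 40]], columns=["a", "a"])

# Label-based selection is ambiguous with duplicates: it returns a DataFrame
# holding both "a" columns rather than a single Series.
by_label = this["a"]

# Positional selection yields exactly one Series per column.
combined = []
for i in range(this.shape[1]):
    series = this.iloc[:, i]
    other_series = other.iloc[:, i]
    # Addition stands in for whatever function the caller would pass.
    combined.append(series + other_series)

# Reassemble the columns in order and restore the original labels.
result = pd.concat(combined, axis=1)
result.columns = this.columns
```

Because each column is addressed by position, no renaming or deduplication of labels is needed in this sketch.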

Contributor Author

There have to be other fixes, because the logic relied heavily on column names instead of indices. I think this, other, and new_columns (which result uses) would each need to have their own indices.

Member

> the logic heavily relied on column names instead of indices

how so?

Contributor Author

The function's docstring has an example in which one DataFrame has columns A, B while the other has B, C. In that case it would be tricky to use positional indices instead of column names.

Member

On L9100 we call self.align(other). After that, the columns are aligned.
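A small sketch of what alignment does for the A, B versus B, C case mentioned above. The frames are hypothetical and only the public `DataFrame.align` method with its default outer join is used. It does not reproduce the internals of `combine`.

```python
import pandas as pd

# Hypothetical frames with partially overlapping columns.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# With the default outer join, both results get the union of the labels.
this, other = df1.align(df2)

# Both frames now carry the same columns in the same order, so column i of
# `this` corresponds to column i of `other`.
same_columns = this.columns.equals(other.columns)

# Columns that were missing on one side are filled with NaN.
this_missing_c = this["C"].isna().all()
other_missing_a = other["A"].isna().all()
```

Once the columns match like this, iterating by position over both frames pairs up the same labels.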

Contributor Author

I understand what align is doing now, after seeing the docstring example with different column names.

@vijmeister
Contributor Author

@jbrockmendel, I was able to use .iloc instead of column names.

Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.combine with non-unique columns
