Problem
The current transform method on SubspaceDiscrete contains a defensive hack to anchor the encoded column set:
    try:
        return comp_rep[self.comp_rep.columns]
    except AttributeError:
        return comp_rep
This exists because comp_rep is computed once at construction time and then used as the reference for which encoded columns should appear in all subsequent transforms. The pattern comp_rep[self.comp_rep.columns] ensures that new transforms produce exactly the same columns as the original comp_rep, even if the encoding logic would produce different columns (e.g., due to decorrelation selecting different columns).
Investigation confirms this is purely defensive — no subspace-level column modification ever occurs beyond what individual parameters produce via their comp_df cached properties (which apply decorrelation and constant-column-dropping at the parameter level).
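To make the anchoring behavior concrete, here is a minimal pandas sketch. The frames and column names (`reference`, `fresh`, `temp`, `solvent_A`, `solvent_B`) are hypothetical stand-ins, not taken from the codebase. It only illustrates what indexing a DataFrame with another frame's columns does.

```python
import pandas as pd

# Hypothetical stand-in for the construction-time comp_rep.
reference = pd.DataFrame({"temp": [0.1, 0.5], "solvent_A": [1, 0]})

# A later encoding that yields an extra column and a different column order.
fresh = pd.DataFrame(
    {"solvent_B": [0, 1], "solvent_A": [1, 0], "temp": [0.1, 0.5]}
)

# The anchoring pattern: keep exactly the reference columns, in reference order.
# The extra column "solvent_B" is dropped without any warning.
anchored = fresh[reference.columns]
anchored_columns = list(anchored.columns)

# If a reference column is missing from the new encoding, pandas raises KeyError.
missing = fresh.drop(columns=["temp"])
try:
    missing[reference.columns]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

In this sketch, extra columns disappear silently, while a missing column raises KeyError, which an `except AttributeError` clause would not catch. Presumably the AttributeError branch only covers the case where `self.comp_rep` does not exist yet. That is an assumption, not something confirmed here.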
Why it matters
- Fragile coupling: The hack creates an implicit dependency between the transform method and the construction-time comp_rep. If any future code modifies the column set, this line silently papers over the inconsistency instead of surfacing it.
- Unnecessary indirection: The column anchoring serves no functional purpose given the current codebase, but it persists as dead-code-like complexity that every reader must reason about.
- Blocks encoding refactoring: Moving encoding to be purely parameter-level is cleaner without this subspace-level column-set override.
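As a rough illustration of "surfacing" an inconsistency rather than hiding it, an explicit check could look like the sketch below. The helper name `assert_same_columns` and its signature are hypothetical and not part of the codebase. It compares column sets only, not column order.

```python
import pandas as pd


def assert_same_columns(encoded: pd.DataFrame, expected: pd.Index) -> pd.DataFrame:
    """Return `encoded` unchanged, raising if its column set differs from `expected`.

    Hypothetical helper for illustration only.
    """
    missing = expected.difference(encoded.columns)
    unexpected = encoded.columns.difference(expected)
    if len(missing) > 0 or len(unexpected) > 0:
        raise ValueError(
            "Encoded columns differ from the reference: "
            f"missing={list(missing)}, unexpected={list(unexpected)}"
        )
    return encoded
```

Unlike the anchoring pattern, a check of this kind fails loudly when the column set changes, so a regression in the encoding logic would show up at the point where it happens.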
Related
- ... comp_rep as a mutable attribute. The column-set anchor exists because the cached representation could theoretically drift.
- comp_rep is eagerly computed and cached at construction time.