Sample weights #120

Closed
nikisix wants to merge 7 commits into maciejkula:master from nikisix:sample-weights

Conversation

@nikisix nikisix commented Jul 17, 2018

Contributor

Commit corresponds to discussion at #118
Loss functions and implicit factorization done.
Implicit sequence coming.

@maciejkula maciejkula left a comment

Owner

This looks great, I left a couple of comments. I think once we've settled on a design for the implicit factorization models we can extend to all the other models as well.

Thanks for your help, I really appreciate it!

Comment thread spotlight/losses.py Outdated
mask = mask.float()
loss = loss * mask
return loss.sum() / mask.sum()
if sample_weights is not None or mask is not None:

Owner

We don't need these checks here: we can always call the base loss function. I think that will make for less code repetition in all the loss functions.

Contributor Author

totally

Comment thread spotlight/losses.py Outdated


def pointwise_loss(positive_predictions, negative_predictions, mask=None):
def base_loss(loss, sample_weights=None, mask=None):

Owner

Could we call this _weighted_loss? This would reflect the fact that:

  1. This is an internal function.
  2. Its essence lies in applying weights.

Contributor Author

I'm not opposed to this name, but it'd be a misnomer in case they call us with a mask and no weights. Although, conceptually at least, a mask could be thought of as a weight of zero.

_base_loss
vs
_weighted_loss
vs
_modified_loss

¯\_(ツ)_/¯ ?

Owner

If, as you suggest, we subsume masks under weights this will all be fine!

Comment thread spotlight/factorization/implicit.py Outdated
raise ValueError('Degenerate epoch loss: {}'
.format(epoch_loss))

def fit_weighted(self, interactions, verbose=False):

Owner

I think we should keep a single fit method. If weights are present, we use them; if not, we don't.

This could probably be accomplished by changing the minibatch function to simply yield None on every batch for those arguments that are None rather than tensors. This way, it yields None for batch_sample_weights whenever no weights are supplied, which ties in nicely with how the loss functions are changed.

@nikisix nikisix Jul 18, 2018

Contributor Author

I definitely tried to keep it to a single fit function at first, but I want to avoid consuming RAM for an unnecessary tensor of Nones.
Would this lead to a tensor of Nones (rather than a single None) being passed to the loss functions, unless the tensor is never actually created by yielding Nones?
We would also have to augment torch_utils.shuffle and add a few if all(weights==None) checks, which may slow it down a bit.

What if we just checked whether weights are specified at the beginning of fit and, if they are, had fit call _fit_weighted? That way we can maintain the API of having the user always call fit, but still keep the code relatively simple.

Sequence weight question for you: I was thinking about the sequence interactions yesterday, and my guess is to make a parallel sequence tensor full of sample weights, of the same dimensions as the sequence tensor of item-ids. Does this sound right?
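As a rough illustration of the parallel-weights idea in the question above, here is a minimal sketch on plain Python lists standing in for tensors. The helper names (`pad_left`, `build_weighted_sequences`), the left-padding convention, and `PADDING_IDX = 0` are assumptions made for the example, not the library's API; padded positions get a weight of zero so they would contribute nothing to a weighted loss.

```python
PADDING_IDX = 0  # assumed padding value for item ids


def pad_left(values, max_length, pad_value):
    """Keep the last `max_length` values and left-pad to a fixed length."""
    values = list(values)[-max_length:]
    return [pad_value] * (max_length - len(values)) + values


def build_weighted_sequences(item_sequences, weight_sequences, max_length):
    """Return (items, weights): two parallel, equally shaped padded matrices.

    Hypothetical helper: each item id keeps its own sample weight, and
    padding positions receive weight 0.0.
    """
    items, weights = [], []
    for item_seq, weight_seq in zip(item_sequences, weight_sequences):
        if len(item_seq) != len(weight_seq):
            raise ValueError('Each item needs exactly one weight')
        items.append(pad_left(item_seq, max_length, PADDING_IDX))
        weights.append(pad_left(weight_seq, max_length, 0.0))
    return items, weights


items, weights = build_weighted_sequences(
    [[3, 5], [7, 8, 9, 4]],
    [[0.5, 2.0], [1.0, 1.0, 3.0, 1.0]],
    max_length=3,
)
```

With this layout the weight matrix has the same dimensions as the item-id matrix, so the two can be shuffled and batched together.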

Owner

No, we won't need a tensor of nones. I'll add a comment with a prototype of what I meant.

nikisix added 3 commits July 18, 2018 12:16
…ed_loss. rename fit_weighted to _fit_weighted and call it from fit to streamline the user-api. add cscope database files to gitignore.
@nikisix

nikisix commented Jul 18, 2018

Contributor Author

I notice that the sequence losses already make pretty heavy use of the mask argument via PADDING_IDX:

    loss = self._loss_func(positive_prediction,
                           negative_prediction,
                           mask=(sequence_var != PADDING_IDX))

Is it correct for sample weights to override them in the _weighted_loss, or is there something more sophisticated we could be doing?

@maciejkula

Owner

This is a good point: we can definitely subsume masks under weights by just setting weights to zero where mask should be false.
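To make the "masks are just zero weights" idea concrete, here is a minimal framework-free sketch. It works on plain Python lists rather than tensors, the name `weighted_loss` is illustrative, and normalising by the sum of the weights is an assumption chosen so that the mask-only case matches the existing `loss.sum() / mask.sum()` behaviour; the PR's actual `_weighted_loss` may differ.

```python
def weighted_loss(loss, sample_weights=None, mask=None):
    """Weighted mean of per-example losses, treating a mask as 0/1 weights.

    Sketch only: `loss`, `sample_weights` and `mask` are plain sequences
    here, standing in for tensors of the same shape.
    """
    if sample_weights is None:
        weights = [1.0] * len(loss)
    else:
        weights = [float(w) for w in sample_weights]

    if mask is not None:
        # A masked-out position is simply a position with weight zero.
        weights = [w if keep else 0.0 for w, keep in zip(weights, mask)]

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError('All examples are masked out or have zero weight')

    return sum(l * w for l, w in zip(loss, weights)) / total_weight


losses = [1.0, 2.0, 6.0]
plain = weighted_loss(losses)
masked = weighted_loss(losses, mask=[True, False, True])
weighted = weighted_loss(losses, sample_weights=[1.0, 1.0, 2.0])
both = weighted_loss(losses, sample_weights=[1.0, 1.0, 2.0],
                     mask=[True, False, True])
```

Because the mask is folded into the weights before anything else happens, neither argument overrides the other: a padded position stays excluded even when sample weights are supplied.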

Comment thread spotlight/factorization/implicit.py Outdated
raise ValueError('Degenerate epoch loss: {}'
.format(epoch_loss))

def _fit_weighted(self, interactions, verbose=False):

Owner

I'm still not convinced about having a separate _fit_weighted function, even if it is internal. I think it introduces too much code duplication that will have to be kept in sync.

What about modifying minibatch to look roughly like this:

def minibatch(*tensors, **kwargs):

    batch_size = kwargs.get('batch_size', 128)

    if len(tensors) == 1:
        tensor = tensors[0]
        for i in range(0, len(tensor), batch_size):
            yield tensor[i:i + batch_size]
    else:
        for i in range(0, len(tensors[0]), batch_size):
            yield tuple(x[i:i + batch_size] if x is not None else None for x in tensors)

This way, it emits tensor slices if an argument is a tensor (as before), but also emits None in the tuple if an argument is None.
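For illustration, the prototype above can be exercised on plain Python lists, which support the same `len` and slicing operations as tensors. The variable names and sample data below are made up for the example; the point is only that a `None` argument comes back as a single `None` in each batch tuple, so no tensor of Nones is ever allocated.

```python
def minibatch(*tensors, **kwargs):
    """Yield aligned slices of each argument; None arguments pass through."""
    batch_size = kwargs.get('batch_size', 128)

    if len(tensors) == 1:
        tensor = tensors[0]
        for i in range(0, len(tensor), batch_size):
            yield tensor[i:i + batch_size]
    else:
        for i in range(0, len(tensors[0]), batch_size):
            yield tuple(x[i:i + batch_size] if x is not None else None
                        for x in tensors)


# Lists stand in for tensors here (an assumption made for the example).
user_ids = [1, 2, 3, 4, 5]
item_ids = [10, 20, 30, 40, 50]
sample_weights = [0.5, 1.0, 1.0, 2.0, 1.0]

with_weights = list(minibatch(user_ids, item_ids, sample_weights,
                              batch_size=2))
without_weights = list(minibatch(user_ids, item_ids, None, batch_size=2))

for batch_users, batch_items, batch_weights in without_weights:
    # batch_weights is a plain None, ready to hand to the loss function.
    assert batch_weights is None
```

A single fit loop can therefore unpack three values per batch in both the weighted and unweighted cases and forward the third one to the loss unchanged.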

@maciejkula maciejkula left a comment

Owner

I like the idea of making masks simply zero weights.

But I still think we should try to keep only one fit method :)

@maciejkula

Owner

Closing in favour of #122

@maciejkula maciejkula closed this Aug 6, 2018