Sample weights#120
Conversation
maciejkula
left a comment
There was a problem hiding this comment.
This looks great, I left a couple of comments. I think once we've settled on a design for the implicit factorization models we can extend to all the other models as well.
Thanks for your help, I really appreciate it!
| mask = mask.float() | ||
| loss = loss * mask | ||
| return loss.sum() / mask.sum() | ||
| if sample_weights is not None or mask is not None: |
There was a problem hiding this comment.
We don't need these checks here: we can always call the base loss function. I think that will make for less code repetition in all the loss functions.
|
|
||
|
|
||
| def pointwise_loss(positive_predictions, negative_predictions, mask=None): | ||
| def base_loss(loss, sample_weights=None, mask=None): |
There was a problem hiding this comment.
Could we call this _weighted_loss? This would reflect the fact that:
- This is an internal function.
- Its essence lies in applying weights.
There was a problem hiding this comment.
I'm not opposed to this name, but it'd be a misnomer in case they call us with a mask and no weights. Although, conceptually at least, a mask could be thought of as a weight of zero.
_base_loss
vs
_weighted_loss
vs
_modified_loss
¯_(ツ)_/¯ ?
There was a problem hiding this comment.
If, as you suggest, we subsume masks under weights this will all be fine!
| raise ValueError('Degenerate epoch loss: {}' | ||
| .format(epoch_loss)) | ||
|
|
||
| def fit_weighted(self, interactions, verbose=False): |
There was a problem hiding this comment.
I think we should keep a single fit method. If weights are present, we use them; if not, we don't.
This could probably be accomplished by changing the minibatch function to just infinitely yield None for those arguments that are None rather than tensors. This way, it always yields None for batch_sample_weights which ties in nicely with how the loss functions are changed.
There was a problem hiding this comment.
Definitely tried to keep it to a single fit function at first. But I want to avoid consuming ram for an unnecessary tensor of Nones.
Would this lead to a tensor of Nones (not None) being passed to the loss functions, rather than a single None? (unless the tensor is not created by yielding Nones?)
Would also have to augment torch_utils.shuffle and add a few if all(weights==None) checks in (may slow it down a bit).
What if we just checked if weights are specified at the beginning of fit, and if they are just have fit call _fit_weighted? That way we can maintain the api of having the user always call fit, but still keep the code relatively simple.
Sequence weight question for you: I was thinking about the sequence interactions yesterday, and my guess is to make a parallel sequence tensor full of sample weights, of the same dimensions as the sequence tensor of item-ids. Does this sound right?
There was a problem hiding this comment.
No, we won't need a tensor of nones. I'll add a comment with a prototype of what I meant.
…ed_loss. rename fit_weighted to _fit_weighted and call it from fit to streamline the user-api. add cscope database files to gitignore.
|
I notice that the sequence losses already make pretty heavy use of the mask argument via the PADDING_INDEX. Is it correct for sample weights to override them in the _weighted_loss, or is there something more sophisticated we could be doing? |
|
This is a good point: we can definitely subsume masks under weights by just setting weights to zero where mask should be false. |
| raise ValueError('Degenerate epoch loss: {}' | ||
| .format(epoch_loss)) | ||
|
|
||
| def _fit_weighted(self, interactions, verbose=False): |
There was a problem hiding this comment.
I'm still not convinced about having a separate _fit_weighted function, even if it is internal. I think it introduces too much code duplication that will have to be kept in sync.
What about modifying minibatch to look roughly like this:
def minibatch(*tensors, **kwargs):
batch_size = kwargs.get('batch_size', 128)
if len(tensors) == 1:
tensor = tensors[0]
for i in range(0, len(tensor), batch_size):
yield tensor[i:i + batch_size]
else:
for i in range(0, len(tensors[0]), batch_size):
yield tuple(x[i:i + batch_size] if x is not None else None for x in tensors)This way, it emits tensor slices if an argument is a tensor (as before), but also emits None in the tuple if an argument is None.
| raise ValueError('Degenerate epoch loss: {}' | ||
| .format(epoch_loss)) | ||
|
|
||
| def fit_weighted(self, interactions, verbose=False): |
There was a problem hiding this comment.
No, we won't need a tensor of nones. I'll add a comment with a prototype of what I meant.
maciejkula
left a comment
There was a problem hiding this comment.
I like the idea of making masks simply zero weights.
But I still think we should try to keep only one fit method :)
…ights and removing _fit_weighted.
|
Closing in favour of #122 |
Commit corresponds to discussion at #118
Loss functions and implicit factorization done.
Implicit sequence coming.