- Rename `PreprocessingMetadata` -> `PreppedTokenMetadata`
- Represent the `word_boundaries` field as a list of the number of subtokens in each token, e.g. `[1, 3, 1, 2]` instead of `[0, 1, 4, 5, 7]`
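The two representations carry the same information: the old form stores cumulative subtoken offsets, the new one stores per-token subtoken counts. A minimal sketch of the conversion in both directions (helper names are hypothetical, not part of the library's API):

```python
def boundaries_to_counts(boundaries):
    """Cumulative offsets -> per-token counts: [0, 1, 4, 5, 7] -> [1, 3, 1, 2]."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def counts_to_boundaries(counts):
    """Per-token counts -> cumulative offsets: [1, 3, 1, 2] -> [0, 1, 4, 5, 7]."""
    boundaries = [0]
    for count in counts:
        boundaries.append(boundaries[-1] + count)
    return boundaries


print(boundaries_to_counts([0, 1, 4, 5, 7]))  # [1, 3, 1, 2]
print(counts_to_boundaries([1, 3, 1, 2]))     # [0, 1, 4, 5, 7]
```

A practical upside of the counts form: taking the metadata for a suffix of the token stream becomes a plain list slice rather than an offset-rebasing step.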
- Remove the non-processable tokens field; return non-processable tokens as a separate object
- Provide a method for returning the metadata for the last `n` tokens:
  >>> metadata.for_last_tokens(n: int)
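With the counts-based `word_boundaries` representation described above, such a method reduces to slicing the counts list. A hedged sketch of one possible implementation (the field name `word_boundaries` follows the changelog; the constructor shape is an assumption):

```python
class PreppedTokenMetadata:
    """Sketch of the renamed metadata class with counts-based boundaries."""

    def __init__(self, word_boundaries):
        # Number of subtokens in each token, e.g. [1, 3, 1, 2]
        self.word_boundaries = word_boundaries

    def for_last_tokens(self, n: int) -> "PreppedTokenMetadata":
        """Return metadata covering only the last n full tokens."""
        # Guard n == 0: list[-0:] would return the whole list, not an empty one.
        return PreppedTokenMetadata(self.word_boundaries[-n:] if n else [])


metadata = PreppedTokenMetadata([1, 3, 1, 2])
print(metadata.for_last_tokens(2).word_boundaries)  # [1, 2]
```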