I've noticed that Tiktoken is really slow for strings of repeated characters, like "a" * 100_000. Interestingly, when you add spaces, like "a " * 50_000, the performance is orders of magnitude better.
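For anyone who wants to reproduce this, here is a minimal sketch of how the two cases could be timed. The helper names (`time_encode`, `compare`, `run_with_tiktoken`) are made up for illustration, and the `cl100k_base` encoding is an assumption, since the slowdown may differ between encodings.

```python
import time
from typing import Callable, Dict, Sequence


def time_encode(encode: Callable[[str], Sequence[int]], text: str) -> float:
    """Return the wall-clock seconds that one call to encode(text) takes."""
    start = time.perf_counter()
    encode(text)
    return time.perf_counter() - start


def compare(encode: Callable[[str], Sequence[int]], n: int = 100_000) -> Dict[str, float]:
    """Time both inputs from the question; each string has n characters."""
    inputs = {
        "repeated": "a" * n,        # one long run with no whitespace
        "spaced": "a " * (n // 2),  # same length, broken up by spaces
    }
    return {name: time_encode(encode, text) for name, text in inputs.items()}


def run_with_tiktoken() -> None:
    # Imported lazily so the helpers above work without tiktoken installed.
    import tiktoken

    # Assumption: cl100k_base; swap in whichever encoding you actually use.
    enc = tiktoken.get_encoding("cl100k_base")
    for name, seconds in compare(enc.encode).items():
        print(f"{name}: {seconds:.3f} s")
```

Calling `run_with_tiktoken()` prints one timing per input. The repeated-character case may take a long time on affected versions, so passing a smaller `n` to `compare` first is probably a good idea.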
Is this a bug or a fundamental property of BPE?
My Tiktoken version is 0.5.0.