We are currently generating pretranslations that look like this:
...
"translation": "声音已经说出来了,耶稣就独自一人了.他们在那些日子里什么都没告诉任何人.",
"translationTokens": [
"声音已经说出来了,耶稣就独自一人了.他们在那些日子里什么都没告诉任何人",
"."
],
...
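I assume this is because we are using the LatinWordTokenizer for translation alignment. It presumably depends on whitespace to find word boundaries, so text in a script that does not put spaces between words, such as Chinese, comes out as one long token for the whole segment plus the trailing punctuation. This is likely happening for some other scripts as well. We should evaluate how many projects this affects and consider using a different tokenizer, or choosing a tokenizer dynamically based on the target script. Unless I'm missing something, this makes the marker placement feature basically unavailable for anyone translating into a script that the LatinWordTokenizer does not tokenize properly, since there are no meaningful token boundaries inside the segment to align against.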
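
To make the "dynamically choosing a tokenizer" idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: `whitespace_tokenize` is a simplified stand-in for a whitespace-based tokenizer (it is not the real LatinWordTokenizer), `char_tokenize` is a crude per-character fallback, and the script list and the `threshold` value in `choose_tokenizer` are assumptions picked for illustration.

```python
import unicodedata

# Assumption: Unicode name prefixes for scripts that are normally written
# without spaces between words. This list is illustrative, not exhaustive.
UNSPACED_SCRIPT_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA",
    "THAI", "LAO", "KHMER", "MYANMAR",
)


def whitespace_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for a whitespace-based tokenizer.

    Splits on whitespace, then separates trailing punctuation from each chunk.
    """
    tokens: list[str] = []
    for chunk in text.split():
        body = chunk.rstrip(".,;:!?")
        if body:
            tokens.append(body)
        tokens.extend(chunk[len(body):])  # each trailing mark is its own token
    return tokens


def char_tokenize(text: str) -> list[str]:
    """Crude fallback: every non-whitespace character becomes a token."""
    return [ch for ch in text if not ch.isspace()]


def is_unspaced_script_char(ch: str) -> bool:
    """True if the character belongs to one of the assumed unspaced scripts."""
    return unicodedata.name(ch, "").startswith(UNSPACED_SCRIPT_PREFIXES)


def choose_tokenizer(sample: str, threshold: float = 0.5):
    """Pick a tokenizer from the share of letters in unspaced scripts.

    The 0.5 threshold is an arbitrary assumption for this sketch.
    """
    letters = [ch for ch in sample if ch.isalpha()]
    if not letters:
        return whitespace_tokenize
    unspaced = sum(is_unspaced_script_char(ch) for ch in letters)
    if unspaced / len(letters) >= threshold:
        return char_tokenize
    return whitespace_tokenize


if __name__ == "__main__":
    text = "声音已经说出来了,耶稣就独自一人了.他们在那些日子里什么都没告诉任何人."
    print(len(whitespace_tokenize(text)))      # token count with the stand-in
    print(len(choose_tokenizer(text)(text)))   # token count after dynamic choice
```

On the sample above, the whitespace-based stand-in yields two tokens (the whole segment and the final "."), which has the same shape as the `translationTokens` in the pretranslation, while `choose_tokenizer` switches to the per-character fallback. Per-character splitting is only a rough placeholder. Real word segmentation for these scripts would likely need a dictionary-based or model-based segmenter, but even character-level tokens would give alignment something to work with.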