
Chinese text is not being tokenized properly #280

@Enkidu93

Description


We currently have pretranslations generated that look like:

...
"translation": "声音已经说出来了,耶稣就独自一人了.他们在那些日子里什么都没告诉任何人.",
"translationTokens": [
  "声音已经说出来了,耶稣就独自一人了.他们在那些日子里什么都没告诉任何人",
  "."
],
...

I assume that this is because we are using the LatinWordTokenizer for translation alignment. This is likely happening for some other scripts as well. We should evaluate how many projects this affects and consider using another tokenizer (or dynamically choosing a tokenizer). Unless I'm missing something, this would make the marker placement feature basically unavailable for those translating into scripts that the LatinWordTokenizer does not tokenize properly.
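To illustrate the failure mode, here is a minimal sketch (illustrative stand-ins only, not the actual LatinWordTokenizer from the Machine library): a Latin-style tokenizer splits on whitespace and peels punctuation off token edges, so an unspaced Chinese sentence comes back as a single token with only the trailing period detached, exactly as in the pretranslation above. A per-character fallback for CJK code points is one possible alternative:

```python
PUNCT = ",.;:!?\"'()，。；：！？"

def latin_style_tokenize(text):
    # Illustrative stand-in for a Latin-style word tokenizer: split on
    # whitespace, then peel leading/trailing punctuation into separate
    # tokens. Internal punctuation is never split because there is no
    # whitespace boundary next to it.
    tokens = []
    for chunk in text.split():
        start = 0
        while start < len(chunk) and chunk[start] in PUNCT:
            tokens.append(chunk[start])
            start += 1
        end = len(chunk)
        trailing = []
        while end > start and chunk[end - 1] in PUNCT:
            trailing.append(chunk[end - 1])
            end -= 1
        if end > start:
            tokens.append(chunk[start:end])
        tokens.extend(reversed(trailing))
    return tokens

def is_cjk(ch):
    # Common CJK Unified Ideograph ranges (not exhaustive).
    return any(lo <= ord(ch) <= hi
               for lo, hi in [(0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0xF900, 0xFAFF)])

def char_tokenize_cjk(text):
    # Fallback: emit each CJK ideograph as its own token; keep
    # Latin-script runs together; punctuation is its own token.
    tokens = []
    buf = ""
    for ch in text:
        if ch.isspace() or is_cjk(ch) or ch in PUNCT:
            if buf:
                tokens.append(buf)
                buf = ""
            if not ch.isspace():
                tokens.append(ch)
        else:
            buf += ch
    if buf:
        tokens.append(buf)
    return tokens
```

Running the Latin-style stand-in on the sentence from the pretranslation reproduces the two-token output shown above, while the character-level fallback yields one token per ideograph, which is what the alignment step would need.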

Metadata

Assignees: No one assigned
Labels: No labels
Status: 🛑 Blocked
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
