Commit 2b90022

Move 喫茶小舖/純喫茶 to TWPhrases/HKPhrases
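- These two entries do not involve regional vocabulary conversion. They serve as correction entries for regional character-variant conversion, so they are moved to a new standalone dictionary file, TWVariantsPhrases.txt, which relates to TWVariants the way TWVariantsRevPhrases relates to TWVariantsRev.
- Likewise, add HKVariantsPhrases.
- Change the segmentation dictionary in t2tw.json/t2hk.json to the newly added dictionary files.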
1 parent cfdd373 commit 2b90022
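
The motivation in the commit message can be illustrated with a minimal Python sketch. It is not OpenCC's implementation: it assumes that a `group` dictionary consults its members in the listed order and uses the first member that matches, and that the variants dictionary contains a character-level entry such as 喫 → 吃 (an assumed entry, used here only for illustration). The names `convert`, `longest_prefix`, `PHRASES`, and `VARIANTS` are hypothetical.

```python
# Hypothetical miniature dictionaries; the real files hold many more entries.
PHRASES = {"純喫茶": "純喫茶", "喫茶小舖": "喫茶小舖"}  # stand-in for TWVariantsPhrases
VARIANTS = {"喫": "吃"}  # assumed character-level entry, stand-in for TWVariants


def longest_prefix(text, start, table):
    """Return the longest key of `table` that text[start:] begins with, or None."""
    best = None
    for key in table:
        if text.startswith(key, start) and (best is None or len(key) > len(best)):
            best = key
    return best


def convert(text, group):
    """Convert `text`, trying each dictionary of `group` in order at every position."""
    out = []
    i = 0
    while i < len(text):
        for table in group:
            key = longest_prefix(text, i, table)
            if key is not None:
                out.append(table[key])
                i += len(key)
                break
        else:
            # No dictionary matched here: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert("純喫茶", [VARIANTS]))           # character variants only
print(convert("純喫茶", [PHRASES, VARIANTS]))  # phrase entries take priority
```

Under these assumptions, the identity entries in the phrase dictionary shield 純喫茶 from the character-level rewrite, while a bare 喫 elsewhere is still converted.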

14 files changed

Lines changed: 102 additions & 23 deletions

CONTRIBUTING.md

Lines changed: 5 additions & 5 deletions
@@ -239,15 +239,15 @@ bazel test //data/config:config_dict_validation_test

 2. **`s2tw.json`** - Simplified Chinese to Traditional Chinese (Taiwan standard)
    - Uses `STPhrases.txt`, `STCharacters.txt`
-   - Additionally uses `TWVariants.txt`
+   - Additionally uses `TWVariantsPhrases.txt`, `TWVariants.txt`

 3. **`s2twp.json`** - Simplified Chinese to Traditional Chinese (Taiwan standard, with common phrases)
    - Uses `STPhrases.txt`, `STCharacters.txt`
-   - Additionally uses `TWPhrases.txt`, `TWVariants.txt`
+   - Additionally uses `TWPhrases.txt`, `TWVariantsPhrases.txt`, `TWVariants.txt`

 4. **`s2hk.json`** - Simplified Chinese to Traditional Chinese (Hong Kong)
    - Uses `STPhrases.txt`, `STCharacters.txt`
-   - Additionally uses `HKVariants.txt`
+   - Additionally uses `HKVariantsPhrases.txt`, `HKVariants.txt`

 ### Testing recommendations

@@ -272,8 +272,8 @@ bazel test //data/config:config_dict_validation_test

 - **Changing only basic Simplified-Traditional mappings**: edit `STCharacters.txt`; tests should cover at least `s2t`
 - **Changing phrase conversion**: edit `STPhrases.txt`; tests should cover `s2t`, `s2tw`, `s2twp`, `s2hk`
-- **Taiwan-specific terms**: edit `TWPhrases*.txt`, `TWVariants.txt`; tests should cover `s2tw`, `s2twp`
-- **Hong Kong-specific terms**: edit `HKVariants*.txt`; tests should cover `s2hk`
+- **Taiwan-specific terms**: edit `TWPhrases*.txt`, `TWVariantsPhrases.txt`, `TWVariants.txt`; tests should cover `s2tw`, `s2twp`
+- **Hong Kong-specific terms**: edit `HKVariantsPhrases.txt`, `HKVariants*.txt`; tests should cover `s2hk`

 ## Submitting changes


data/CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -14,8 +14,10 @@ set(
   TWPhrases
   TWPhrasesRev
   TWVariants
+  TWVariantsPhrases
   TWVariantsRevPhrases
   HKVariants
+  HKVariantsPhrases
   HKVariantsRevPhrases
   JPVariants
   JPShinjitaiCharacters

data/config/s2hk.json

Lines changed: 8 additions & 2 deletions
@@ -20,8 +20,14 @@
     }
   }, {
     "dict": {
-      "type": "ocd2",
-      "file": "HKVariants.ocd2"
+      "type": "group",
+      "dicts": [{
+        "type": "ocd2",
+        "file": "HKVariantsPhrases.ocd2"
+      }, {
+        "type": "ocd2",
+        "file": "HKVariants.ocd2"
+      }]
     }
   }]
 }

data/config/s2tw.json

Lines changed: 8 additions & 2 deletions
@@ -20,8 +20,14 @@
     }
   }, {
     "dict": {
-      "type": "ocd2",
-      "file": "TWVariants.ocd2"
+      "type": "group",
+      "dicts": [{
+        "type": "ocd2",
+        "file": "TWVariantsPhrases.ocd2"
+      }, {
+        "type": "ocd2",
+        "file": "TWVariants.ocd2"
+      }]
     }
   }]
 }

data/config/s2twp.json

Lines changed: 8 additions & 2 deletions
@@ -25,8 +25,14 @@
     }
   }, {
     "dict": {
-      "type": "ocd2",
-      "file": "TWVariants.ocd2"
+      "type": "group",
+      "dicts": [{
+        "type": "ocd2",
+        "file": "TWVariantsPhrases.ocd2"
+      }, {
+        "type": "ocd2",
+        "file": "TWVariants.ocd2"
+      }]
     }
   }]
 }

data/config/t2hk.json

Lines changed: 9 additions & 3 deletions
@@ -4,13 +4,19 @@
     "type": "mmseg",
     "dict": {
       "type": "ocd2",
-      "file": "HKVariants.ocd2"
+      "file": "HKVariantsPhrases.ocd2"
     }
   },
   "conversion_chain": [{
     "dict": {
-      "type": "ocd2",
-      "file": "HKVariants.ocd2"
+      "type": "group",
+      "dicts": [{
+        "type": "ocd2",
+        "file": "HKVariantsPhrases.ocd2"
+      }, {
+        "type": "ocd2",
+        "file": "HKVariants.ocd2"
+      }]
     }
   }]
 }

data/config/t2tw.json

Lines changed: 9 additions & 3 deletions
@@ -4,13 +4,19 @@
     "type": "mmseg",
     "dict": {
       "type": "ocd2",
-      "file": "TWVariants.ocd2"
+      "file": "TWVariantsPhrases.ocd2"
     }
   },
   "conversion_chain": [{
     "dict": {
-      "type": "ocd2",
-      "file": "TWVariants.ocd2"
+      "type": "group",
+      "dicts": [{
+        "type": "ocd2",
+        "file": "TWVariantsPhrases.ocd2"
+      }, {
+        "type": "ocd2",
+        "file": "TWVariants.ocd2"
+      }]
     }
   }]
 }
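
The t2tw.json and t2hk.json hunks change the segmentation dictionary as well as the conversion chain. The Python sketch below suggests why both may be needed. It is a simplified model, not OpenCC's code: it assumes that `mmseg` performs forward maximum matching, that runs of unmatched characters are buffered into a single segment, and that each segment is then converted independently. The single-entry dictionaries and the names `segment`, `convert_segment`, and `convert_text` are hypothetical.

```python
PHRASES = {"純喫茶": "純喫茶"}  # stand-in for TWVariantsPhrases
VARIANTS = {"喫": "吃"}         # assumed entry, stand-in for TWVariants


def segment(text, seg_dict):
    """Forward maximum matching; unmatched characters are grouped into runs."""
    segments, buffer, i = [], "", 0
    while i < len(text):
        hits = [k for k in seg_dict if text.startswith(k, i)]
        if hits:
            if buffer:
                segments.append(buffer)
                buffer = ""
            key = max(hits, key=len)
            segments.append(key)
            i += len(key)
        else:
            buffer += text[i]
            i += 1
    if buffer:
        segments.append(buffer)
    return segments


def convert_segment(seg, group):
    """Convert one segment, trying the dictionaries of `group` in order."""
    out, i = [], 0
    while i < len(seg):
        for table in group:
            hits = [k for k in table if seg.startswith(k, i)]
            if hits:
                key = max(hits, key=len)
                out.append(table[key])
                i += len(key)
                break
        else:
            # Nothing matched at this position: keep the character as-is.
            out.append(seg[i])
            i += 1
    return "".join(out)


def convert_text(text, seg_dict, group):
    """Segment first, then convert each segment on its own."""
    return "".join(convert_segment(s, group) for s in segment(text, seg_dict))


text = "來杯純喫茶吧"
print(segment(text, VARIANTS))  # old segmentation dictionary splits the phrase
print(segment(text, PHRASES))   # new segmentation dictionary keeps it whole
print(convert_text(text, PHRASES, [PHRASES, VARIANTS]))
```

In this model, segmenting with the character-level dictionary isolates 喫 as its own segment, so the phrase entry in the group can never match it. Segmenting with the phrase dictionary keeps 純喫茶 intact for the conversion step.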

data/dictionary/DictionaryTest.cpp

Lines changed: 2 additions & 2 deletions
@@ -65,10 +65,10 @@ std::string DictionaryRunfilesTest::runfile_dir_;
 INSTANTIATE_TEST_SUITE_P(
     , DictionaryTest,
     ::testing::Values(
-        "HKVariants", "HKVariantsRev", "HKVariantsRevPhrases",
+        "HKVariants", "HKVariantsPhrases", "HKVariantsRev", "HKVariantsRevPhrases",
         "JPShinjitaiCharacters", "JPShinjitaiPhrases", "JPVariants",
         "JPVariantsRev", "STCharacters", "STPhrases", "TSCharacters",
-        "TSPhrases", "TWPhrases", "TWPhrasesRev", "TWVariants",
+        "TSPhrases", "TWPhrases", "TWPhrasesRev", "TWVariants", "TWVariantsPhrases",
         "TWVariantsRev", "TWVariantsRevPhrases"),
     [](const testing::TestParamInfo<DictionaryTest::ParamType>& info) {
       return info.param;
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Open Chinese Convert (OpenCC) Dictionary
+# File: HKVariantsPhrases.txt
+# Format: key value(s) (values separated by spaces)
+# License: Apache-2.0 (see LICENSE)
+# Source: https://github.com/ByVoid/OpenCC
+# Used in configs: s2hk.json, t2hk.json
+
+喫茶小舖 喫茶小舖
+純喫茶 純喫茶
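
The header of the new dictionary file describes its layout as a key followed by one or more space-separated values. A small Python sketch of reading that layout is given below. It assumes that keys and values contain no whitespace and that lines starting with `#` are comments; `parse_dictionary` and `identity_entries` are hypothetical helpers, not part of OpenCC.

```python
SAMPLE = """\
# Open Chinese Convert (OpenCC) Dictionary
# File: HKVariantsPhrases.txt
# Format: key value(s) (values separated by spaces)

喫茶小舖 喫茶小舖
純喫茶 純喫茶
"""


def parse_dictionary(text):
    """Parse 'key value [value ...]' lines into {key: [values]}."""
    entries = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, *values = line.split()
        if not values:
            raise ValueError(f"line {number}: key {key!r} has no value")
        entries[key] = values
    return entries


def identity_entries(entries):
    """Keys whose only value is the key itself (entries that keep a phrase unchanged)."""
    return [key for key, values in entries.items() if values == [key]]


entries = parse_dictionary(SAMPLE)
print(entries)
print(identity_entries(entries))
```

Both entries in this file map a phrase to itself, which matches the commit's description of them as correction entries rather than vocabulary conversions.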

data/dictionary/TWPhrases.txt

Lines changed: 0 additions & 2 deletions
@@ -111,7 +111,6 @@ U盤 隨身碟
 哈希 雜湊
 哈薩克斯坦 哈薩克
 哥斯達黎加 哥斯大黎加
-喫茶小舖 喫茶小舖
 單片機 微控制器
 回調 回撥
 固件 韌體
@@ -362,7 +361,6 @@ U盤 隨身碟
 粘貼 貼上 粘貼
 紅心大戰 傷心小棧
 納米 奈米
-純喫茶 純喫茶
 索馬里 索馬利亞
 組件 元件
 綁定 繫結
