Korean morphological analyzer - a Korean fork of MeCab
This project modernizes MeCab-Ko, which originated in the Eunjeon Hannip (은전한닢) project, and reimplements it in Rust.
This repository contains two implementations:
| Implementation | Path | Status | Description |
|---|---|---|---|
| Legacy (C/C++) | /legacy/ | ✅ Stable | Existing mecab-ko implementation |
| Rust (v2) | /rust/ | 🚧 In development | Modern Rust reimplementation |

| Existing problem | Rust v2 solution |
|---|---|
| Outdated dictionary (no updates since 2018) | Dictionary v3.0 built on an up-to-date 2024 corpus |
| C/C++ memory-safety issues | Rust's memory-safety guarantees |
| Complex build (autotools) | Simple Cargo-based build |
| Platform limitations (no WASM support) | WASM and support for a range of platforms |
| Separate binding projects | Integrated Python/Node.js bindings |

    cd legacy
    ./configure
    make
    make install

    # Install the dictionary
    cd mecab-ko-dic
    ./configure
    make
    make install

    # Run
    echo "안녕하세요" | mecab

    cd rust
    # Build
    cargo build --release

    # Test
    cargo test

    # Run
    cargo run --bin mecab-ko -- "안녕하세요"

    # Cargo.toml
    [dependencies]
    mecab-ko = "0.1"

    use mecab_ko::Tokenizer;
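
    fn main() {
        // Create the tokenizer; unwrap() panics if it cannot be created.
        let tokenizer = Tokenizer::new().unwrap();
        let tokens = tokenizer.tokenize("안녕하세요, 형태소 분석기입니다.");
        // Print each token as surface form and part-of-speech tag, tab-separated.
        for token in tokens {
            println!("{}\t{}", token.surface, token.pos);
        }
    }

    pip install mecab-ko-rs

    from mecab_ko import Mecab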
    mecab = Mecab()
    print(mecab.morphs("안녕하세요"))  # ['안녕', '하', '세요']
    print(mecab.nouns("형태소 분석기"))  # ['형태소', '분석기']

    mecab-ko/
    ├── legacy/                     # Existing C/C++ implementation
    │   ├── src/                    # MeCab source code
    │   ├── mecab-ko-dic/           # Korean dictionary
    │   │   └── seed/               # Dictionary source data
    │   ├── configure               # autotools build
    │   └── Makefile
    │
    ├── rust/                       # Rust v2 implementation
    │   ├── crates/
    │   │   ├── mecab-ko-core/      # Core analysis engine
    │   │   ├── mecab-ko-dict/      # Dictionary management
    │   │   ├── mecab-ko-hangul/    # Hangul utilities
    │   │   └── mecab-ko-cli/       # CLI tool
    │   ├── Cargo.toml              # Workspace configuration
    │   └── README.md               # Rust implementation details
    │
    ├── docs/                       # Project documentation
    │   ├── PROJECT_PLAN.md         # 24-week roadmap
    │   ├── ISSUE_BACKLOG.md        # Issue backlog
    │   ├── AGENTS.md               # Multi-agent system
    │   ├── DEVELOPMENT_WORKFLOW.md # Development workflow
    │   └── AUTOMATION_GUIDE.md     # Automation guide
    │
    ├── .github/                    # GitHub configuration
    │   ├── workflows/              # CI/CD
    │   └── ISSUE_TEMPLATE/         # Issue templates
    │
    ├── CONTRIBUTING.md             # Contribution guide
    ├── SECURITY.md                 # Security policy
    ├── CODE_QUALITY.md             # Code quality standards
    └── README.md                   # This file
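
The tree lists `mecab-ko-hangul` as the Hangul utilities crate. To illustrate the kind of helper such a crate might contain, here is a minimal sketch that splits a precomposed Hangul syllable into its jamo using the standard Unicode syllable arithmetic. The function name `decompose` and the constant names are hypothetical; they are not taken from the crate's actual API.

```rust
// Hypothetical helper, NOT the actual mecab-ko-hangul API.
// Precomposed syllables occupy U+AC00..=U+D7A3 and are laid out as
// 0xAC00 + (initial * 21 + medial) * 28 + final.

const CHOSEONG: [char; 19] = [
    'ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ',
    'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ',
];
const JUNGSEONG: [char; 21] = [
    'ㅏ', 'ㅐ', 'ㅑ', 'ㅒ', 'ㅓ', 'ㅔ', 'ㅕ', 'ㅖ', 'ㅗ', 'ㅘ', 'ㅙ',
    'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅣ',
];
// Index 0 means the syllable has no final consonant.
const JONGSEONG: [Option<char>; 28] = [
    None, Some('ㄱ'), Some('ㄲ'), Some('ㄳ'), Some('ㄴ'), Some('ㄵ'), Some('ㄶ'),
    Some('ㄷ'), Some('ㄹ'), Some('ㄺ'), Some('ㄻ'), Some('ㄼ'), Some('ㄽ'), Some('ㄾ'),
    Some('ㄿ'), Some('ㅀ'), Some('ㅁ'), Some('ㅂ'), Some('ㅄ'), Some('ㅅ'), Some('ㅆ'),
    Some('ㅇ'), Some('ㅈ'), Some('ㅊ'), Some('ㅋ'), Some('ㅌ'), Some('ㅍ'), Some('ㅎ'),
];

/// Returns (initial, medial, optional final) for a precomposed syllable,
/// or None if `c` is outside the Hangul Syllables block.
fn decompose(c: char) -> Option<(char, char, Option<char>)> {
    let code = c as u32;
    if !(0xAC00..=0xD7A3).contains(&code) {
        return None;
    }
    let offset = (code - 0xAC00) as usize;
    let initial = offset / (21 * 28);
    let medial = (offset % (21 * 28)) / 28;
    let final_ = offset % 28;
    Some((CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final_]))
}

fn main() {
    for c in "안녕".chars() {
        println!("{c} -> {:?}", decompose(c));
    }
}
```

Returning `Option` keeps non-Hangul input (Latin letters, punctuation, standalone jamo) from being silently mapped to a wrong result.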

| Metric | Legacy | Kiwi | Rust v2 (target) |
|---|---|---|---|
| Speed (eojeol/sec) | ~100K | ~120K | ~150K |
| Accuracy | ~93% | ~87% | ~95% |
| Memory | ~200MB | ~100MB | ~150MB |
| WASM support | ❌ | ❌ | ✅ |
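
The speed row is expressed in eojeol per second, where an eojeol is a whitespace-delimited unit of Korean text. The sketch below shows one way such a figure could be computed. It is not the benchmark behind the numbers in the table: `count_eojeol`, `eojeol_per_sec`, and `measure` are hypothetical helper names, and the closure in `main` is a placeholder workload standing in for a real analyzer call.

```rust
use std::time::{Duration, Instant};

/// Counts eojeol, taken here to be whitespace-delimited units.
fn count_eojeol(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Converts an eojeol count and an elapsed time into eojeol per second.
fn eojeol_per_sec(eojeol: usize, elapsed: Duration) -> f64 {
    eojeol as f64 / elapsed.as_secs_f64()
}

/// Runs `analyze` once over every line and returns the measured throughput.
fn measure<F: FnMut(&str)>(lines: &[&str], mut analyze: F) -> f64 {
    let total: usize = lines.iter().map(|l| count_eojeol(l)).sum();
    let start = Instant::now();
    for &line in lines {
        analyze(line);
    }
    eojeol_per_sec(total, start.elapsed())
}

fn main() {
    let corpus = ["안녕하세요, 형태소 분석기입니다.", "한국어 형태소 분석기"];
    // Placeholder workload (assumption): a real benchmark would call the
    // tokenizer here instead of counting characters.
    let mut chars = 0usize;
    let rate = measure(&corpus, |line| chars += line.chars().count());
    println!("{rate:.0} eojeol/sec over {chars} chars");
}
```

A meaningful measurement would use a much larger corpus and repeated runs; with a tiny input like this one the elapsed time is dominated by timer resolution.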
- Project design and planning
- Hangul utility implementation
- Dictionary format design
- Basic tokenizer
- Viterbi algorithm
- Dictionary v3.0 build
- CLI tool
- Python bindings (PyO3)
- WASM support
- Elasticsearch plugin
- Performance optimization
- Documentation
- v1.0 release
See PROJECT_PLAN.md for the detailed plan.
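
The Viterbi item in the roadmap refers to the search that MeCab-style analyzers use to pick the lowest-cost segmentation from a lattice of candidate morphemes. The sketch below is a toy illustration, not code from this repository: the `Node` struct, the `connection_cost` values, and the four-node lattice for "분석기가" are all assumptions. Real MeCab dictionaries use left/right context IDs with a connection matrix rather than the POS-pair rules used here.

```rust
/// A candidate morpheme covering character positions start..end of the input.
#[derive(Debug, Clone)]
struct Node {
    start: usize,
    end: usize,
    surface: &'static str,
    pos: &'static str,
    word_cost: i32,
}

/// Toy connection cost between adjacent POS tags (made-up values).
fn connection_cost(prev_pos: &str, next_pos: &str) -> i32 {
    match (prev_pos, next_pos) {
        ("BOS", _) => 0,
        ("NNG", "JKS") => -10, // noun followed by a subject particle is cheap
        ("NNG", "NNG") => 20,
        _ => 50,
    }
}

/// Returns the node indices of the minimum-cost path covering 0..len.
fn viterbi(nodes: &[Node], len: usize) -> Option<Vec<usize>> {
    // best[i] = (lowest cost of any path ending in node i, previous node on it)
    let mut best: Vec<Option<(i32, Option<usize>)>> = vec![None; nodes.len()];
    // Visit nodes by start position so every predecessor is already final.
    let mut order: Vec<usize> = (0..nodes.len()).collect();
    order.sort_by_key(|&i| nodes[i].start);

    for &i in &order {
        let n = &nodes[i];
        if n.start == 0 {
            best[i] = Some((connection_cost("BOS", n.pos) + n.word_cost, None));
            continue;
        }
        for &j in &order {
            if nodes[j].end != n.start {
                continue;
            }
            if let Some((cost_j, _)) = best[j] {
                let cost = cost_j + connection_cost(nodes[j].pos, n.pos) + n.word_cost;
                if best[i].map_or(true, |(c, _)| cost < c) {
                    best[i] = Some((cost, Some(j)));
                }
            }
        }
    }

    // Pick the cheapest reachable node that ends the input, then backtrack.
    let mut cur = (0..nodes.len())
        .filter(|&i| nodes[i].end == len && best[i].is_some())
        .min_by_key(|&i| best[i].unwrap().0)?;
    let mut path = vec![cur];
    while let Some((_, Some(prev))) = best[cur] {
        path.push(prev);
        cur = prev;
    }
    path.reverse();
    Some(path)
}

/// Hand-built lattice for the 4-character input "분석기가".
fn toy_lattice() -> Vec<Node> {
    vec![
        Node { start: 0, end: 2, surface: "분석", pos: "NNG", word_cost: 20 },
        Node { start: 2, end: 3, surface: "기", pos: "NNG", word_cost: 40 },
        Node { start: 0, end: 3, surface: "분석기", pos: "NNG", word_cost: 30 },
        Node { start: 3, end: 4, surface: "가", pos: "JKS", word_cost: 10 },
    ]
}

fn main() {
    let nodes = toy_lattice();
    if let Some(path) = viterbi(&nodes, 4) {
        for i in path {
            println!("{}\t{}", nodes[i].surface, nodes[i].pos);
        }
    }
}
```

With these made-up costs the path 분석기 + 가 (total 30) beats 분석 + 기 + 가 (total 80), so the single compound noun is preferred over the two-noun split.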
Contributions are welcome! Please see CONTRIBUTING.md.

    # Clone the repository
    git clone https://github.com/hephaex/mecab-ko.git
    cd mecab-ko

    # Rust development
    cd rust
    cargo build
    cargo test

    # Legacy build (optional)
    cd ../legacy
    ./configure && make

- Legacy (C/C++): GPL / LGPL / BSD (original MeCab license)
- Rust v2: MIT OR Apache-2.0
The dictionary data is licensed under the Apache License 2.0.
- MeCab - Taku Kudo
- Eunjeon Hannip (은전한닢) project - the original mecab-ko
- Lindera - Rust morphological analyzer (reference)
- Kiwi - Korean morphological analyzer (reference)
- Author: hephaex ([email protected])
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for Korean NLP