diff --git a/TRAINING_GUIDE_KO.md b/TRAINING_GUIDE_KO.md
new file mode 100644
index 0000000..39235e4
--- /dev/null
+++ b/TRAINING_GUIDE_KO.md
@@ -0,0 +1,1120 @@
+# COMET Reference-Free Model Training Guide (A-Z)
+
+> A complete guide to training a reference-free COMET model for EN-KO patent translation quality estimation
+
+## 목차
+
+1. [개요](#1-개요)
+2. [배경 지식](#2-배경-지식)
+3. [환경 설정](#3-환경-설정)
+4. [데이터 준비](#4-데이터-준비)
+5. [학습 접근법 비교](#5-학습-접근법-비교)
+6. [Step-by-Step 학습 실행](#6-step-by-step-학습-실행)
+7. [모델 평가](#7-모델-평가)
+8. [하이퍼파라미터 튜닝](#8-하이퍼파라미터-튜닝)
+9. [문제 해결 (Troubleshooting)](#9-문제-해결-troubleshooting)
+10. [참고 자료](#10-참고-자료)
+
+---
+
+## 1. 개요
+
+### 1.1 Goal
+
+Using our existing EN-KO patent translation quality data, train a COMET model that predicts a translation quality score from the source and the MT alone, **without a reference**.
+
+### 1.2 What Is Reference-Free?
+
+Common translation metrics (BLEU, the default COMET models) require a reference translation. In production, however, a reference is often unavailable. A **Reference-Free (Quality Estimation, QE)** model predicts quality from the source text and the machine translation (MT) only.
+
+```
+# Reference-based (conventional)
+Score = Model(source, MT, reference)
+
+# Reference-free (our goal)
+Score = Model(source, MT)     ← no reference needed
+```
+
+### 1.3 Available Data
+
+| File | Rows | Purpose |
+|------|------|---------|
+| `en-ko-qe-patent-balanced_train.csv` | ~9,663,341 | Pointwise training |
+| `en-ko-qe-patent-balanced_val.csv` | ~508,621 | Pointwise validation |
+| `en-ko-qe-patent-balanced_pairwise_train.csv` | ~1,437,196 | Pairwise training |
+| `en-ko-qe-patent-balanced_pairwise_val.csv` | ~1,314 | Pairwise validation |
+
+**Pointwise format**: `src, mt, ref, score` (individual scores)
+
+**Pairwise format**: `src, mt_good, mt_bad, score_good, score_bad` (pairwise comparison)
+
+---
+
+## 2. 배경 지식
+
+### 2.1 COMET Architecture
+
+COMET uses a pretrained multilingual language model (XLM-RoBERTa, InfoXLM, etc.) as its encoder and adds a regression head on top to predict a translation quality score.
+
+```
+┌─────────────────────────────────────────────────┐
+│                  Quality Score                    │
+│                    (0~1)                          │
+├─────────────────────────────────────────────────┤
+│            Feed-Forward Head                      │
+│         (4096 → 2048 → 1024 → 1)                 │
+├─────────────────────────────────────────────────┤
+│           Feature Construction                    │
+│   [mt_emb, src_emb, mt*src, |mt-src|]            │
+├──────────────────┬──────────────────────────────┤
+│  MT Embedding    │    Source Embedding            │
+├──────────────────┴──────────────────────────────┤
+│       Pretrained Encoder (XLM-R / InfoXLM)       │
+│               (frozen → unfreeze)                 │
+└─────────────────────────────────────────────────┘
+```
+
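+### 2.2 Types of Reference-Free Models
+
+COMET provides two reference-free architectures:
+
+#### (A) ReferencelessRegression (simple architecture)
+
+```python
+# Simplified sketch of comet/models/regression/referenceless.py
+# (`encoder` and `feedforward` stand in for the model's submodules)
+import torch
+
+src_emb = encoder(source)      # encode the source
+mt_emb = encoder(MT)           # encode the MT (encoded separately)
+
+features = torch.cat(
+    [mt_emb, src_emb, mt_emb * src_emb, torch.abs(mt_emb - src_emb)], dim=1
+)
+score = feedforward(features)  # input is 4 * 1024 = 4096 dims
+```
+
+- Source and MT are encoded **separately**
+- Four feature vectors are concatenated: the two embeddings, their element-wise product, and their absolute difference
+- The structure is simple and easy to understand
+- Earlier models: `wmt20-comet-qe-da`, `wmt21-comet-qe-da`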
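+
+The feature construction described above can be illustrated without any deep-learning library. This is a minimal sketch on plain Python lists; `build_features` is a hypothetical helper, not a COMET function, and real embeddings would be 1024-dimensional tensors.
+
+```python
+def build_features(mt_emb, src_emb):
+    """Concatenate [mt, src, mt*src, |mt-src|] for two same-sized vectors."""
+    if len(mt_emb) != len(src_emb):
+        raise ValueError("embeddings must have the same dimension")
+    product = [m * s for m, s in zip(mt_emb, src_emb)]
+    abs_diff = [abs(m - s) for m, s in zip(mt_emb, src_emb)]
+    return list(mt_emb) + list(src_emb) + product + abs_diff
+
+
+features = build_features([0.5, -1.0], [0.25, 2.0])
+print(features)       # [0.5, -1.0, 0.25, 2.0, 0.125, -2.0, 0.25, 3.0]
+print(len(features))  # 8, i.e. 4 * embedding dimension
+```
+
+With 1024-dimensional embeddings the same construction yields the 4096-dimensional head input.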
+
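+#### (B) UnifiedMetric QE Mode (COMETKiwi architecture, recommended)
+
+```python
+# Simplified sketch of comet/models/multitask/unified_metric.py
+combined_input = "[CLS] MT [SEP] Source [SEP]"
+encoder_out = encoder(combined_input)  # encoded as a single sequence
+
+cls_embedding = encoder_out[:, 0, :]   # CLS token
+score = feedforward(cls_embedding)     # input is 1024 dims
+```
+
+- Source and MT are **concatenated into one sequence** before encoding
+- Self-attention over the joint sequence lets MT and source tokens attend to each other (a cross-attention-like effect)
+- The CLS token serves as the sentence representation
+- Strong recent QE models use this design: `wmt22-cometkiwi-da`, `wmt23-cometkiwi-da-xl`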
+
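+### 2.3 Which Architecture Should We Choose?
+
+| Criterion | ReferencelessRegression | UnifiedMetric QE |
+|------|------------------------|-------------------|
+| Performance | Moderate | **Higher** (SOTA) |
+| Training difficulty | Easy | Slightly more complex |
+| Memory usage | Higher (two encoder passes) | More efficient (one pass, over a longer joint sequence) |
+| Fine-tuning compatibility | wmt20/21 QE checkpoints | **COMETKiwi checkpoints** |
+| Recommendation | Experiments/comparison | **Recommended for production** |
+
+**Conclusion: UnifiedMetric QE mode with COMETKiwi fine-tuning is our top recommendation.**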
+
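+### 2.4 Training Strategy: From Scratch vs Fine-tuning
+
+#### From Scratch
+- Train a new head on top of a pretrained encoder (XLM-R)
+- Pros: the model can be fitted to the domain from the start
+- Cons: needs a lot of data and long training time
+
+#### Full Fine-tuning
+- Train all COMETKiwi parameters
+- Pros: deep domain adaptation, well suited to large datasets
+- Cons: **risk of overfitting and distribution collapse on small datasets** (observed in the MTQE paper)
+
+#### Parameter-Efficient Fine-tuning (PEFT)
+- Train only a subset of parameters with LoRA, BitFit, FTHead, etc.
+- Pros: less overfitting, lower VRAM use, stable training
+- Cons: expressiveness may be limited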
+
+### 2.5 Case Studies: COMETKiwi Fine-tuning Successes and Failures
+
+#### (A) ComeTH (wasanx/ComeTH) - A Successful EN-TH Fine-tuning
+
+[ComeTH](https://huggingface.co/wasanx/ComeTH) is COMETKiwi (`wmt22-cometkiwi-da`)
+fine-tuned for English-Thai translation quality estimation.
+
+| Item | Details |
+|------|------|
+| **Base model** | `Unbabel/wmt22-cometkiwi-da` (UnifiedMetric) |
+| **Encoder** | `microsoft/infoxlm-large` |
+| **Training data** | Human MQM annotations + data augmented with Claude 3.5 Sonnet |
+| **Epochs** | 5 |
+| **Gradient Accumulation** | 8 |
+| **Result** | Baseline Spearman 0.4570 → **+4.9% improvement** |
+
+**Key insights:**
+- Shows that fine-tuning COMETKiwi on a specific language pair can yield a meaningful gain
+- **LLM-augmented data** (quality judgments generated by Claude 3.5 Sonnet) improved performance further
+- The LLM-augmented model reached a higher correlation than direct LLM evaluation (cost-efficient)
+- It uses the **same strategy** as our approach 3 (COMETKiwi + UnifiedMetric + fine-tuning)
+
+#### (B) MTQE.en-he (arXiv:2602.06546) - Risks of Full Fine-tuning
+
+Published in February 2026, this paper shows empirically **the risk of full fine-tuning COMETKiwi
+on small data**. The experiments use 300 EN-HE samples:
+
+| Fine-tuning method | Share of trainable parameters | Result |
+|------------------|-------------------|------|
+| **Full Fine-tuning** | 100% | **Worse** (2-3pp drop, distribution collapse) |
+| **LoRA** | ~1% | +2-3pp gain (Pearson/Spearman) |
+| **BitFit** (bias only) | ~0.2% | +2-3pp gain |
+| **FTHead** (head only) | ~2% | +2-3pp gain (slightly below LoRA) |
+
+> **Warning**: On small data, full fine-tuning causes **overfitting and score
+> distribution collapse** in COMETKiwi: the model predicts only within a narrow score range.
+
+**However, the conditions in that paper differ greatly from ours:**
+
+| Condition | MTQE.en-he | **Our project** |
+|------|-----------|-----------------|
+| Training set size | **300 samples** | **~9.6M samples** (about 32,000x) |
+| Validation set | 100 samples | ~500K samples |
+| Domain | General WMT | Patents (specific domain) |
+| Language pair | EN-HE | EN-KO |
+
+**With our large dataset (9.6M rows), full fine-tuning is likely to succeed.**
+The overfitting and distribution collapse seen on tiny datasets are much less likely at this scale, although this should still be checked on the validation set.
+As a **safe alternative**, we also provide LoRA, BitFit, and FTHead.
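+
+One simple way to watch for score distribution collapse is to compare the spread of predictions with the spread of gold scores on the validation set. The sketch below is an illustration only: `spread_ratio` is a hypothetical helper, and any alert threshold (for example 0.5) would be our own choice, not a value from the paper.
+
+```python
+import statistics
+
+
+def spread_ratio(predictions, labels):
+    """Return std(predictions) / std(labels); values near 0 suggest collapse."""
+    label_std = statistics.pstdev(labels)
+    if label_std == 0:
+        raise ValueError("labels have zero variance")
+    return statistics.pstdev(predictions) / label_std
+
+
+labels = [0.1, 0.4, 0.7, 1.0]
+collapsed = [0.55, 0.56, 0.57, 0.58]  # predictions squeezed into a narrow band
+print(f"{spread_ratio(collapsed, labels):.3f}")  # far below 1.0
+```
+
+A ratio close to 1.0 means predictions cover roughly the same range as the labels; a ratio that shrinks from epoch to epoch is a warning sign.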
+
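+#### (C) Other Reference Cases
+
+| Project | Strategy | Key lesson |
+|----------|------|-----------|
+| **AfriCOMET** (13 African languages) | Uses a language-specific encoder (AfroXLM-R) | Encoders dedicated to a language family are effective |
+| **IndicCOMET** (5 Indian languages, 7000 MQM) | Starts from a COMET-MQM checkpoint | A stronger checkpoint gives better results |
+| **Cometoid** (WMT 2023, knowledge distillation) | Generates synthetic scores with a teacher model | Reference-free can surpass reference-based |
+| **Domain adaptation study** (ACL 2024) | Analyzes cross-domain transfer | Fine-tuned models may degrade outside the training domain |
+
+### 2.6 Strategic Assessment for Our Data
+
+**Conclusion: approach 3 (COMETKiwi Full Fine-tuning) has a high chance of success.**
+
+Rationale:
+1. **ComeTH succeeded with the same strategy** → COMETKiwi fine-tuning itself is a validated method
+2. **Large dataset of 9.6M rows** → a very different situation from the overfitting case in the MTQE paper (300 samples)
+3. **Patent-domain specialization** → domain adaptation should contribute to the gain
+
+**Recommended execution order (revised):**
+1. First **approach 5 (FTHead)** → the safest and fastest way to get a baseline
+2. Then **approach 3 (Full Fine-tuning)** → exploits the large dataset
+3. If approach 3 shows signs of overfitting, **approach 6 (LoRA)** → a middle ground
+4. Compare performance and pick the best method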
+
+---
+
+## 3. 환경 설정
+
+### 3.1 Hardware Requirements
+
+| Component | Minimum | Recommended |
+|------|------|------|
+| GPU | 1x V100 (32GB) | 1-2x A100 (80GB) |
+| RAM | 32GB | 64GB+ |
+| Disk | 50GB | 100GB+ |
+| CUDA | 11.7+ | 12.0+ |
+
+> **Note**: With ~9.6M rows the dataset is very large, so plenty of RAM is needed.
+> If memory is short, reduce the data with `--max_train_rows` for experiments.
+
+### 3.2 Software Installation
+
+```bash
+# 1. Go to the project directory
+cd /path/to/COMET
+
+# 2. Create a Python virtual environment (recommended)
+python -m venv .venv
+source .venv/bin/activate
+
+# 3. Install dependencies with Poetry (based on pyproject.toml)
+pip install poetry
+poetry install
+
+# Or install directly with pip
+pip install unbabel-comet
+
+# 4. Extra packages (for the evaluation scripts)
+pip install scipy scikit-learn
+
+# 5. For LoRA fine-tuning (used in approaches 5 and 6)
+# Quote the requirement so the shell does not treat ">" as a redirect
+pip install "peft>=0.6.0"
+
+# 6. Verify the installation
+comet-score --help
+comet-train --help
+```
+
+### 3.3 Verifying the Installation
+
+```bash
+# Check that COMET imports correctly in Python
+python -c "
+import comet
+print(f'COMET version: {comet.__version__}')
+from comet.models import ReferencelessRegression, UnifiedMetric
+print('Models imported successfully')
+
+import torch
+print(f'PyTorch: {torch.__version__}')
+print(f'CUDA available: {torch.cuda.is_available()}')
+if torch.cuda.is_available():
+    print(f'GPU: {torch.cuda.get_device_name(0)}')
+    print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
+"
+```
+
+---
+
+## 4. 데이터 준비
+
+### 4.1 Data Format Required by COMET
+
+**ReferencelessRegression** (`referenceless.py:202-214`):
+```csv
+src,mt,score
+"source sentence in English","번역된 한국어 문장",0.75
+```
+
+**UnifiedMetric QE** (`unified_metric.py:288-306`):
+```csv
+src,mt,score
+"source sentence in English","번역된 한국어 문장",0.75
+```
+
+> Both models need only the three columns `src, mt, score`.
+> The `ref` column is ignored (reference-free).
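+
+Before starting a long run, it can help to confirm that a CSV file really has the three required columns. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper and is not part of COMET or of this repository's scripts.
+
+```python
+import csv
+import io
+
+REQUIRED_COLUMNS = ("src", "mt", "score")
+
+
+def missing_columns(csv_file):
+    """Return the required column names that are absent from the CSV header."""
+    header = csv.DictReader(csv_file).fieldnames or []
+    return [name for name in REQUIRED_COLUMNS if name not in header]
+
+
+sample = io.StringIO("src,mt,ref,score\nA device,장치,장치,0.9\n")
+print(missing_columns(sample))  # [] because src, mt and score are all present
+```
+
+For a real file, pass an open handle, for example `open(path, newline="", encoding="utf-8")`.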
+
+### 4.2 Running the Data Conversion
+
+```bash
+# Basic conversion (pointwise data only)
+python scripts/prepare_data.py \
+    --input_dir /path/to/train_data \
+    --output_dir data/en-ko-qe
+
+# Include the pairwise data as well
+python scripts/prepare_data.py \
+    --input_dir /path/to/train_data \
+    --output_dir data/en-ko-qe \
+    --include_pairwise
+
+# For quick experiments (limit to 1M rows)
+python scripts/prepare_data.py \
+    --input_dir /path/to/train_data \
+    --output_dir data/en-ko-qe \
+    --max_train_rows 1000000 \
+    --include_pairwise
+```
+
+### 4.3 Checking the Conversion Output
+
+```
+data/en-ko-qe/
+├── train.csv           # training data (src, mt, score), shared by all approaches
+├── val.csv             # validation data (src, mt, score)
+├── mini_train.csv      # for pipeline tests (1000 rows)
+└── mini_val.csv        # for pipeline tests (200 rows)
+```
+
+> **Note**: With `--include_pairwise`, the pairwise data is converted to pointwise rows
+> and merged into train.csv/val.csv automatically. No separate files are created.
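+
+The pairwise-to-pointwise conversion mentioned in the note can be pictured as follows. This is a sketch under the assumption that each pairwise row simply becomes two pointwise rows; the actual logic in `scripts/prepare_data.py` may differ, and `pairwise_to_pointwise` is a hypothetical name.
+
+```python
+import csv
+import io
+
+
+def pairwise_to_pointwise(rows):
+    """Yield two (src, mt, score) rows for every pairwise row."""
+    for row in rows:
+        yield {"src": row["src"], "mt": row["mt_good"], "score": float(row["score_good"])}
+        yield {"src": row["src"], "mt": row["mt_bad"], "score": float(row["score_bad"])}
+
+
+sample = io.StringIO(
+    "src,mt_good,mt_bad,score_good,score_bad\n"
+    "A device,장치,기기를,0.9,0.4\n"
+)
+for row in pairwise_to_pointwise(csv.DictReader(sample)):
+    print(row)
+```
+
+Each input row keeps its source text and contributes one higher-scored and one lower-scored example.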
+
+### 4.4 Data Format Details
+
+Main columns of our current data:
+
+```
+# Pointwise (en-ko-qe-patent-balanced_train.csv)
+lp          : language pair (en-ko)
+src         : English source                 ← used by COMET
+mt          : machine translation (Korean)   ← used by COMET
+ref         : reference translation (Korean) ← unused in reference-free mode
+score       : quality score (0~1)            ← used by COMET
+domain      : domain (us_cl, etc.)
+model_type  : MT model type (gemma, etc.)
+src_scores  : source-based scores
+mqm_scores  : MQM scores
+```
+
+Because the `score` column is already in the 0~1 range, it suits COMET training as is (no normalization needed).
+
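+### 4.5 Score Distribution Analysis and Rebalancing (Very Important!)
+
+> **Key point**: COMET is **very sensitive** to the score distribution of its training data.
+> "COMET is highly susceptible to the distribution of scores in the training data"
+> (Falcao et al., LREC 2024; Zouhar et al., WMT 2024)
+
+#### Score Distribution Problem in the Current Data
+
+```
+0.0-0.3:   344,930 (  3.5%)  ██▌             ← severely underrepresented
+0.3-0.5: 1,363,461 ( 14.2%) ██████████
+0.5-0.7: 2,197,215 ( 22.8%) ████████████████
+0.7-1.0: 5,717,209 ( 59.4%) ████████████████████████████████████  ← overrepresented
+```
+
+**Problems:**
+1. **High-quality bias**: the 0.7~1.0 bin holds about 60% of the data → the model tends to always predict high scores
+2. **Poor detection of low quality**: the 0.0~0.3 bin holds only 3.5% → bad translations are not scored low enough
+3. **Regression to the mean**: MSE loss inherently pulls predictions toward the training-set mean
+4. **Compressed prediction range**: outputs cluster in a narrow 0.5~0.8 band ("score distribution collapse")
+
+#### Why This Happens
+
+This is a known issue that also occurs with WMT DA training data.
+Zouhar et al. (WMT 2024), "Pitfalls and Outlooks in Using COMET", demonstrate it experimentally:
+
+| Experiment | Result |
+|------|------|
+| Train on all data | Baseline |
+| Use only the top 75% of scores | Scores become systematically **higher** |
+| Use only the bottom 75% of scores | Scores become systematically **lower** |
+
+→ The score distribution of the training data directly biases the model output.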
+
+#### Solution: Rebalancing
+
+```bash
+# 1. First analyze the current distribution
+python scripts/analyze_and_rebalance.py \
+    --input data/en-ko-qe/train.csv \
+    --analyze_only
+
+# 2-A. Soft rebalancing (recommended; a compromise between balance and data retention)
+python scripts/analyze_and_rebalance.py \
+    --input data/en-ko-qe/train.csv \
+    --output data/en-ko-qe/train_balanced.csv \
+    --strategy soft \
+    --target_total 3000000 \
+    --smoothing 0.5
+
+# 2-B. Fully equal rebalancing (most aggressive)
+python scripts/analyze_and_rebalance.py \
+    --input data/en-ko-qe/train.csv \
+    --output data/en-ko-qe/train_equal.csv \
+    --strategy equal
+```
+
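+#### Rebalancing Strategies Compared
+
+| Strategy | Description | Pros | Cons |
+|------|------|------|------|
+| **soft (recommended)** | Oversample sparse bins, undersample dense bins | Compromise between balance and data retention | Not perfectly uniform |
+| equal | Same number of samples in every bin | Perfectly uniform | Large data loss (limited by the smallest bin) |
+| weighted | Adds a sample_weight column | Keeps the original data | Needs custom loss code |
+
+#### Expected Changes After Soft Rebalancing
+
+With `--smoothing 0.5` (square-root inverse frequency) and `--target_total 3000000`:
+
+```
+            Original            →        After rebalancing
+0.0-0.3:   3.5% (0.34M)        →        ~12% (0.36M, slight oversample)
+0.3-0.5:  14.2% (1.36M)        →        ~18% (0.54M, undersample)
+0.5-0.7:  22.8% (2.20M)        →        ~25% (0.75M, undersample)
+0.7-1.0:  59.4% (5.72M)        →        ~45% (1.35M, undersample)
+```
+
+→ The low-quality bin gains weight, so the model can better tell good translations from bad ones.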
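+
+To make the idea of square-root smoothing concrete, the sketch below allocates a row budget across bins in proportion to `count ** (1 - smoothing)`. This formula is an assumption made for illustration; `scripts/analyze_and_rebalance.py` may use a different rule, so the resulting shares will not match the figures above exactly. `soft_targets` is a hypothetical helper.
+
+```python
+def soft_targets(counts, target_total, smoothing):
+    """Split target_total across bins with weights count ** (1 - smoothing)."""
+    weights = {name: n ** (1.0 - smoothing) for name, n in counts.items()}
+    total_weight = sum(weights.values())
+    return {name: round(target_total * w / total_weight) for name, w in weights.items()}
+
+
+counts = {
+    "0.0-0.3": 344_930,
+    "0.3-0.5": 1_363_461,
+    "0.5-0.7": 2_197_215,
+    "0.7-1.0": 5_717_209,
+}
+targets = soft_targets(counts, target_total=3_000_000, smoothing=0.5)
+for name, n in targets.items():
+    print(f"{name}: {n:>9,} rows ({100 * n / 3_000_000:.1f}%)")
+```
+
+Under this assumed rule, `smoothing=0` keeps the original proportions and `smoothing=1` gives every bin the same share, with 0.5 sitting in between.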
+
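+#### Applying It During Training
+
+To train on the rebalanced data, change the train_data path in the config:
+
+```yaml
+train_data:
+  - data/en-ko-qe/train_balanced.csv   # rebalanced data
+```
+
+#### Cautions
+
+1. **Do not rebalance the validation data**: it must reflect the real distribution for an accurate evaluation
+2. **Watch for overfitting on oversampled data**: the same samples repeat, so reduce the number of epochs
+3. **Experiment step by step**: train on the original data first, then on the rebalanced data, and compare the results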
+
+---
+
+## 5. 학습 접근법 비교
+
+We prepared **six approaches** in total. Choose the one that fits your situation.
+
+### Approach Comparison Table
+
+| # | Approach | Architecture | Starting point | Trainable parameters | Difficulty | Risk |
+|---|--------|---------|--------|-------------|--------|--------|
+| 1 | ReferencelessRegression Scratch | ReferencelessRegression | XLM-R | 100% | ★☆☆ | Low |
+| 2 | UnifiedMetric QE Scratch | UnifiedMetric | InfoXLM | 100% | ★★☆ | Low |
+| **3** | **COMETKiwi Full Fine-tuning** | **UnifiedMetric** | **COMETKiwi** | **100%** | **★★☆** | **Medium** |
+| 4 | ReferencelessRegression Fine-tuning | ReferencelessRegression | wmt21-qe | 100% | ★★☆ | Medium |
+| **5** | **COMETKiwi FTHead (head only)** | **UnifiedMetric** | **COMETKiwi** | **~2%** | **★☆☆** | **Low** |
+| **6** | **COMETKiwi LoRA** | **UnifiedMetric** | **COMETKiwi** | **~1%** | **★★☆** | **Low** |
+
+### ComeTH vs Our Approach
+
+| Item | ComeTH (EN-TH) | **Our project (EN-KO)** |
+|------|----------------|--------------------------|
+| Base model | wmt22-cometkiwi-da | wmt22-cometkiwi-da |
+| Method | Full Fine-tuning | Full FT + LoRA + FTHead (compared) |
+| Data scale | Small to medium (MQM) | **~9.6M rows** (very large) |
+| Data augmentation | Claude 3.5 Sonnet | Based on MQM scores |
+| Epochs | 5 | 3~5 |
+| Grad Accum | 8 | 4~8 |
+| Outcome | Spearman +4.9% | To be measured |
+
+### Recommended Execution Order
+
+```
+STEP 0: Mini test (verify the pipeline)
+  │
+  ▼
+STEP 5: Approach 5 - FTHead (safest, quick baseline)
+  │
+  ├── Good performance → go to STEP 3
+  └── Insufficient performance → go to STEP 6
+  │
+STEP 3: Approach 3 - Full Fine-tuning (exploit the large dataset)
+  │
+  ├── No overfitting → candidate for best model
+  └── Overfitting → go to STEP 6
+  │
+STEP 6: Approach 6 - LoRA (balance overfitting protection and performance)
+  │
+  ▼
+Select the best model (by val_kendall)
+```
+
+---
+
+## 6. Step-by-Step 학습 실행
+
+### STEP 0: Pipeline Test (Required!)
+
+Before the real training run, check on a small dataset that everything works.
+
+```bash
+# Check that the mini datasets exist
+ls data/en-ko-qe/mini_train.csv data/en-ko-qe/mini_val.csv
+
+# Run the mini test
+bash scripts/run_training.sh mini
+
+# Or run it directly
+comet-train --cfg configs/models/en-ko-qe/approach_mini_test.yaml --seed_everything 12
+```
+
+If everything works, the output looks similar to this:
+```
+TRAINER ARGUMENTS:
+{...}
+MODEL ARGUMENTS:
+{...}
+Epoch 0: 100%|██████████| 20/20 [00:XX<00:00, X.XX it/s, train_loss=X.XXX]
+```
+
+### STEP 1: Approach 1 - ReferencelessRegression From Scratch
+
+```bash
+# Run training
+bash scripts/run_training.sh scratch1 --seed 12
+
+# Or run it directly
+comet-train \
+    --cfg configs/models/en-ko-qe/approach1_referenceless_scratch.yaml \
+    --seed_everything 12
+```
+
+### STEP 2: Approach 2 - UnifiedMetric QE From Scratch
+
+```bash
+bash scripts/run_training.sh scratch2 --seed 12
+
+# Or run it directly
+comet-train \
+    --cfg configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml \
+    --seed_everything 12
+```
+
+### STEP 3: Approach 3 - COMETKiwi Fine-tuning (Recommended)
+
+This approach is the most likely to give the highest performance.
+
+```bash
+# 3-1. Download the COMETKiwi checkpoint
+python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da
+
+# The path of the downloaded checkpoint is printed. Example:
+#   Checkpoint file: /root/.cache/huggingface/hub/.../checkpoints/model.ckpt
+
+# 3-2. Store the checkpoint path
+# (save the printed path in CHECKPOINT_PATH)
+CHECKPOINT_PATH="path_printed_above"
+
+# 3-3. Run fine-tuning
+bash scripts/run_training.sh finetune --checkpoint "$CHECKPOINT_PATH"
+
+# Or run it directly
+comet-train \
+    --cfg configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml \
+    --load_from_checkpoint "$CHECKPOINT_PATH" \
+    --seed_everything 12
+```
+
+### STEP 4: Approach 4 - ReferencelessRegression Fine-tuning
+
+```bash
+# 4-1. Download the existing QE checkpoint
+python scripts/download_checkpoint.py --model wmt21-comet-qe-da --legacy
+
+# 4-2. Run fine-tuning
+CHECKPOINT_PATH="path_to_downloaded_checkpoint"
+bash scripts/run_training.sh ft-qe --checkpoint "$CHECKPOINT_PATH"
+```
+
+### STEP 5: Approach 5 - COMETKiwi FTHead (Head Only, Safest)
+
+The encoder is fully frozen; only the estimator head and layerwise_attention are trained.
+This is a safe method validated in the MTQE paper, and we recommend trying it first.
+
+```bash
+# 5-1. Download the COMETKiwi checkpoint (skip if already done in STEP 3)
+python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da
+CHECKPOINT_PATH="path_to_downloaded_checkpoint"
+
+# 5-2. Run FTHead fine-tuning
+python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-fthead-en-ko \
+    --mode fthead \
+    --learning_rate 1e-4 \
+    --batch_size 16 \
+    --epochs 5
+
+# With large data, sample a subset for a quick experiment
+python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-fthead-en-ko-1M \
+    --mode fthead \
+    --max_train_rows 1000000 \
+    --epochs 3
+```
+
+#### Running in the Background (training survives an SSH disconnect)
+
+```bash
+# Run in the background
+nohup env python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-fthead-en-ko \
+    --mode fthead \
+    --learning_rate 1e-4 \
+    --batch_size 16 \
+    --epochs 5 > train.nohup.out 2>&1 &
+
+# Follow the log in real time
+tail -f train.nohup.out
+
+# If you need to stop the run
+ps -ef | grep finetune_lora.py
+kill -TERM <PID>
+```
+
+### STEP 6: Approach 6 - COMETKiwi LoRA (Parameter-Efficient)
+
+LoRA adapters are applied to the encoder so that only ~1% of the parameters are trained.
+This lets the encoder adapt partially while reducing the overfitting risk of full fine-tuning.
+
+```bash
+# Extra install required (quoted so the shell does not treat ">" as a redirect)
+pip install "peft>=0.6.0"
+
+# 6-1. LoRA fine-tuning
+python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-lora-en-ko \
+    --mode lora \
+    --lora_rank 16 \
+    --lora_alpha 32 \
+    --learning_rate 1e-4 \
+    --batch_size 16 \
+    --epochs 3
+
+# 6-2. BitFit (train biases only, minimal change)
+python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-bitfit-en-ko \
+    --mode bitfit \
+    --learning_rate 1e-4 \
+    --epochs 5
+```
+
+#### Running in the Background (training survives an SSH disconnect)
+
+```bash
+# Run in the background (LoRA example)
+nohup env python scripts/finetune_lora.py \
+    --base_model "$CHECKPOINT_PATH" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-lora-en-ko \
+    --mode lora \
+    --lora_rank 16 \
+    --lora_alpha 32 \
+    --learning_rate 1e-4 \
+    --batch_size 16 \
+    --epochs 3 > train.nohup.out 2>&1 &
+
+# Follow the log in real time
+tail -f train.nohup.out
+
+# If you need to stop the run
+ps -ef | grep finetune_lora.py
+kill -TERM <PID>
+```
+
+**LoRA hyperparameter guide:**
+
+| Parameter | Description | Suggested values |
+|---------|------|--------|
+| `lora_rank` | Low-rank dimension (larger means more expressive) | 8, 16, 32 |
+| `lora_alpha` | Scaling factor (usually 2x the rank) | 16, 32, 64 |
+| `learning_rate` | PEFT tolerates a higher LR | 1e-4 ~ 5e-4 |
+
+### Monitoring During Training
+
+Once training starts, you can monitor it in real time with TensorBoard:
+
+```bash
+# Start TensorBoard (in a separate terminal)
+tensorboard --logdir lightning_logs/
+
+# Open http://localhost:6006 in a web browser
+```
+
+#### Running TensorBoard in the Background
+
+To monitor logs from several projects at once, name each log directory with `--logdir_spec` and run TensorBoard in the background:
+
+```bash
+# Run in the background (monitor xCOMET-lite and COMET together)
+nohup env tensorboard \
+    --logdir_spec=xcomet-lite:/home/wengine/Python_workspace/comet_quantization/xCOMET-lite/runs,comet:/home/wengine/Python_workspace/COMET/outputs \
+    --bind_all > /dev/null 2>&1 &
+
+# Open http://<server IP>:6006 in a web browser
+```
+
+#### Stopping TensorBoard
+
+```bash
+# Option 1: kill
+kill -TERM $(pgrep -f tensorboard)
+
+# Option 2: pkill
+pkill tensorboard
+```
+
+Key metrics to monitor:
+- `train_loss`: training loss (should decrease)
+- `val_kendall`: Kendall τ correlation (should increase; the key metric)
+- `val_pearson`: Pearson correlation (should increase)
+- `val_spearman`: Spearman correlation (should increase)
+
+### Checkpoint Location
+
+When training finishes, checkpoints are saved under the following path:
+```
+lightning_logs/
+└── version_X/
+    ├── checkpoints/
+    │   ├── epoch=0-step=XXXX-val_kendall=0.XXXX.ckpt
+    │   ├── epoch=1-step=XXXX-val_kendall=0.XXXX.ckpt
+    │   └── epoch=2-step=XXXX-val_kendall=0.XXXX.ckpt
+    ├── hparams.yaml
+    └── events.out.tfevents.*
+```
+
+The checkpoint with the highest `val_kendall` value is the best model.
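+
+Picking the checkpoint with the highest `val_kendall` can be automated by parsing the file names. The sketch below assumes the naming pattern shown above; `best_checkpoint` is a hypothetical helper, and the example file names are made up for illustration.
+
+```python
+import re
+
+# Matches the score in names like "epoch=1-step=2000-val_kendall=0.4385.ckpt"
+SCORE_PATTERN = re.compile(r"val_kendall=(-?\d+\.\d+)\.ckpt$")
+
+
+def best_checkpoint(filenames):
+    """Return the file name whose embedded val_kendall score is highest."""
+    scored = []
+    for name in filenames:
+        match = SCORE_PATTERN.search(name)
+        if match:
+            scored.append((float(match.group(1)), name))
+    if not scored:
+        raise ValueError("no checkpoint with a val_kendall score found")
+    return max(scored)[1]
+
+
+names = [
+    "epoch=0-step=1000-val_kendall=0.4120.ckpt",
+    "epoch=1-step=2000-val_kendall=0.4385.ckpt",
+    "epoch=2-step=3000-val_kendall=0.4301.ckpt",
+]
+print(best_checkpoint(names))  # epoch=1-step=2000-val_kendall=0.4385.ckpt
+```
+
+For a real run, the names could come from `pathlib.Path("lightning_logs/version_X/checkpoints").glob("*.ckpt")`, passing each path's `.name`.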
+
+---
+
+## 7. 모델 평가
+
+### 7.1 Evaluating on the Validation Data
+
+```bash
+# best.ckpt is a placeholder: use the checkpoint file with the highest val_kendall
+
+# Evaluate a ReferencelessRegression model
+python scripts/evaluate_model.py \
+    --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+    --test_data data/en-ko-qe/val.csv \
+    --model_type referenceless
+
+# Evaluate a UnifiedMetric model
+python scripts/evaluate_model.py \
+    --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+    --test_data data/en-ko-qe/val.csv \
+    --model_type unified
+```
+
+Example output (illustrative values):
+```
+============================================================
+[RESULTS] Evaluation Metrics
+============================================================
+  Pearson r:     0.8234 (p=1.23e-45)
+  Spearman rho:  0.7891 (p=2.34e-40)
+  Kendall tau:   0.6123 (p=3.45e-35)
+  MSE:           0.012345
+  MAE:           0.089012
+============================================================
+```
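+
+To show what the Kendall tau figure measures, here is a minimal pure-Python sketch of the tau-a variant, which counts concordant and discordant pairs and ignores tie corrections. It is an illustration only: the evaluation script presumably relies on SciPy, whose `scipy.stats.kendalltau` computes tau-b by default, so results can differ when scores are tied. `kendall_tau_a` is a hypothetical helper.
+
+```python
+from itertools import combinations
+
+
+def kendall_tau_a(predictions, labels):
+    """Kendall tau-a: (concordant - discordant) / number of pairs."""
+    if len(predictions) != len(labels) or len(labels) < 2:
+        raise ValueError("need two equally long sequences with at least 2 items")
+    concordant = discordant = 0
+    for (p1, y1), (p2, y2) in combinations(zip(predictions, labels), 2):
+        direction = (p1 - p2) * (y1 - y2)
+        if direction > 0:
+            concordant += 1
+        elif direction < 0:
+            discordant += 1
+    n_pairs = len(labels) * (len(labels) - 1) // 2
+    return (concordant - discordant) / n_pairs
+
+
+labels = [0.2, 0.5, 0.7, 0.9]
+predictions = [0.3, 0.4, 0.8, 0.6]  # one pair (the last two) is ranked wrongly
+print(round(kendall_tau_a(predictions, labels), 4))  # 0.6667
+```
+
+The loop is quadratic in the number of samples, so it suits small spot checks rather than the full validation set.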
+
+### 7.2 Scoring a Single Sentence
+
+```bash
+python scripts/evaluate_model.py \
+    --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+    --model_type referenceless \
+    --src "activate a scanning directed acyclic graph to inspect the load-ready data" \
+    --mt "로드 준비 데이터를 검사하기 위해 스캐닝 지시 비순환 그래프를 활성화하는 것"
+```
+
+### 7.3 Evaluating with the comet-score CLI
+
+```bash
+# Evaluate text files (one sentence per line)
+comet-score \
+    -s source_sentences.txt \
+    -t mt_sentences.txt \
+    --model lightning_logs/version_X/checkpoints/best.ckpt
+```
+
+### 7.4 Using the Python API
+
+```python
+from comet import load_from_checkpoint
+
+# Load the model
+model = load_from_checkpoint("lightning_logs/version_X/checkpoints/best.ckpt")
+
+# Predict
+data = [
+    {
+        "src": "The method according to claim 13",
+        "mt": "청구항 13에 따른 방법은"
+    },
+    {
+        "src": "A device for controlling fluid flow",
+        "mt": "유체 흐름을 제어하기 위한 장치"
+    }
+]
+
+output = model.predict(data, batch_size=8, gpus=1)
+print(output.scores)       # e.g. [0.82, 0.91] (illustrative values)
+print(output.system_score) # e.g. 0.865 (mean of the scores)
+```
+
+---
+
+## 8. 하이퍼파라미터 튜닝
+
+### 8.1 Main Hyperparameters
+
+| Parameter | Description | Default | Tuning range |
+|---------|------|--------|-----------|
+| `encoder_learning_rate` | Encoder learning rate | 1e-6 | 1e-7 ~ 5e-6 |
+| `learning_rate` | Head learning rate | 1.5e-5 | 1e-5 ~ 5e-5 |
+| `batch_size` | Batch size | 16 | 8, 16, 32 |
+| `accumulate_grad_batches` | Gradient accumulation | 4 | 2, 4, 8, 16 |
+| `nr_frozen_epochs` | Encoder freeze period | 0.3 | 0.1 ~ 0.9 |
+| `layerwise_decay` | Layer-wise learning rate decay | 0.95 | 0.9 ~ 1.0 |
+| `dropout` | Dropout rate | 0.1 | 0.05 ~ 0.3 |
+| `hidden_sizes` | Head hidden layers | [2048, 1024] | Various |
+| `max_epochs` | Maximum number of epochs | 5 | 3 ~ 10 |
+| `warmup_steps` | Warmup steps | 0 | 0 ~ 500 |
+
+### 8.2 Tuning Priorities
+
+1. **Effective batch size** (`batch_size * accumulate_grad_batches * devices`)
+   - 64~256 usually works well
+   - Too large a value hurts generalization
+
+2. **Learning rate ratio** (`learning_rate / encoder_learning_rate`)
+   - Usually a 10~15x difference
+   - For fine-tuning, halve both
+
+3. **Encoder freeze period** (`nr_frozen_epochs`)
+   - From scratch: 0.3 (unfreeze after 30% of the first epoch)
+   - Fine-tuning: 0.5~0.9 (freeze longer)
+
+4. **Dropout** - raise to 0.15~0.2 if signs of overfitting appear
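+
+The effective batch size formula from item 1 also determines how many optimizer steps a run takes. The sketch below is a rough estimate that ignores how the trainer handles the last partial batch on each device; `optimizer_steps` is a hypothetical helper, and the row count is the training-set size quoted earlier in this guide.
+
+```python
+import math
+
+
+def optimizer_steps(num_samples, batch_size, accumulate_grad_batches, devices, max_epochs):
+    """Return (effective batch size, approximate total optimizer steps)."""
+    effective_batch = batch_size * accumulate_grad_batches * devices
+    steps_per_epoch = math.ceil(num_samples / effective_batch)
+    return effective_batch, steps_per_epoch * max_epochs
+
+
+effective, steps = optimizer_steps(
+    num_samples=9_663_341, batch_size=16, accumulate_grad_batches=8, devices=1, max_epochs=2
+)
+print(effective)  # 128
+print(steps)      # 150990
+```
+
+Two epochs over ~9.66M rows therefore mean about 19.3M samples seen, but only about 151K optimizer steps at an effective batch size of 128.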
+
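+### 8.3 Tips for Training on Large Data
+
+Because the dataset is very large (~9.6M rows):
+
+```yaml
+# 1. Increase the effective batch size with gradient accumulation
+accumulate_grad_batches: 8    # 16 * 8 = 128 effective batch
+
+# 2. Use fewer epochs; each epoch already covers a lot of data
+max_epochs: 2                 # 9.66M * 2 = ~19.3M samples seen (~151K optimizer steps at batch 128)
+
+# 3. Use multiple GPUs
+devices: 2
+strategy: ddp                 # distributed training
+
+# 4. Mixed precision (saves VRAM and speeds up training)
+# in trainer.yaml:
+precision: 16-mixed           # FP16 mixed precision (Lightning 2.x spelling; 1.x uses 16)
+```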
+
+### 8.4 How to Change the YAML Settings
+
+You can edit the config file directly or override values on the command line:
+
+```bash
+# Override parameters without editing the YAML file
+comet-train \
+    --cfg configs/models/en-ko-qe/approach1_referenceless_scratch.yaml \
+    --seed_everything 42 \
+    --referenceless_regression_metric.init_args.batch_size 32 \
+    --referenceless_regression_metric.init_args.learning_rate 2e-5
+```
+
+---
+
+## 9. 문제 해결 (Troubleshooting)
+
+### 9.1 CUDA Out of Memory
+
+```
+RuntimeError: CUDA out of memory
+```
+
+**Fixes:**
+1. Reduce `batch_size` (16 → 8 → 4)
+2. Increase `accumulate_grad_batches` to keep the effective batch size
+3. Check that `keep_embeddings_frozen: True` is set
+4. Add `precision: 16-mixed` (mixed precision; Lightning 1.x uses `16`)
+5. If some sentences are very long, truncate them during data preprocessing
+
+```yaml
+# Example memory-saving settings
+batch_size: 4
+accumulate_grad_batches: 16   # effective batch = 64
+keep_embeddings_frozen: True
+# add to the trainer:
+precision: 16-mixed
+```
+
+### 9.2 Training Does Not Converge (Loss Does Not Decrease)
+
+**Causes and fixes:**
+1. **Learning rate too high or too low** → adjust `learning_rate`
+2. **Encoder frozen for too long** → reduce `nr_frozen_epochs`
+3. **Data problems** → check the `score` distribution and remove outliers
+4. **Batch size problems** → bring the effective batch size into the 64~256 range
+
+### 9.3 val_kendall Does Not Improve
+
+1. **Overfitting**: increase `dropout`, reduce `max_epochs`
+2. **Too little training data**: include the pairwise data (`--include_pairwise`)
+3. **Encoder issues**: lower `layerwise_decay` to 0.9
+
+### 9.4 Out of Memory While Loading Data
+
+With ~9.6M rows, loading the data into RAM can be a problem.
+
+```bash
+# Train on a reduced dataset
+python scripts/prepare_data.py \
+    --input_dir /path/to/train_data \
+    --output_dir data/en-ko-qe \
+    --max_train_rows 2000000    # limit to 2M rows
+```
+
+### 9.5 Checkpoint Fails to Load
+
+```
+RuntimeError: Error(s) in loading state_dict
+```
+
+**Fixes:**
+- Run without the `--strict_load` option (strict=False is the default)
+- Check that the architecture matches the checkpoint
+  - COMETKiwi → UnifiedMetric (OK)
+  - COMETKiwi → ReferencelessRegression (not compatible)
+
+### 9.6 Multi-GPU Training Issues
+
+```yaml
+# DDP settings
+trainer:
+  init_args:
+    accelerator: gpu
+    devices: 2                  # number of GPUs
+    strategy: ddp               # Distributed Data Parallel
+    use_distributed_sampler: true
+```
+
+```bash
+# Set the environment variable
+export CUDA_VISIBLE_DEVICES=0,1
+comet-train --cfg your_config.yaml
+```
+
+---
+
+## 10. 참고 자료
+
+### 10.1 Official Documentation
+
+- **COMET documentation**: https://unbabel.github.io/COMET/html/index.html
+- **COMET training guide**: https://unbabel.github.io/COMET/html/training.html
+- **COMET model catalog**: https://unbabel.github.io/COMET/html/models.html
+- **COMET GitHub**: https://github.com/Unbabel/COMET
+- **COMET model list (MODELS.md)**: https://github.com/Unbabel/COMET/blob/master/MODELS.md
+
+### 10.2 Papers
+
+- **COMET (original)**: Rei et al., "COMET: A Neural Framework for MT Evaluation" (EMNLP 2020)
+  - https://aclanthology.org/2020.emnlp-main.213/
+- **COMETKiwi**: Rei et al., "COMETKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task" (WMT 2022)
+  - https://aclanthology.org/2022.wmt-1.60/
+- **COMET-22**: Rei et al., "COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task" (WMT 2022)
+  - https://aclanthology.org/2022.wmt-1.52/
+- **xCOMET**: Guerreiro et al., "xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection" (2023)
+  - https://arxiv.org/abs/2310.10482
+- **UniTE**: Wan et al., "UniTE: Unified Translation Evaluation" (ACL 2022)
+  - https://arxiv.org/abs/2204.13346
+- **MTQE.en-he** (COMETKiwi LoRA/BitFit fine-tuning): "Machine Translation Quality Estimation for English-Hebrew" (2026)
+  - https://arxiv.org/abs/2602.06546
+  - Demonstrates the risks of full fine-tuning and the advantage of PEFT methods
+- **AfriCOMET** (COMET for African languages): "AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages" (2023)
+  - https://arxiv.org/abs/2311.09828
+- **Pitfalls in Using COMET**: Zouhar et al., "Pitfalls and Outlooks in Using COMET" (WMT 2024)
+  - https://arxiv.org/abs/2408.15366
+  - Analyzes domain bias and score distribution issues in fine-tuned metrics
+- **COMET for Low-Resource MT Evaluation** (LREC 2024)
+  - https://aclanthology.org/2024.lrec-main.315/
+- **Cometoid** (QE via knowledge distillation): "Cometoid: Distilling Strong Reference-based Machine Translation Metrics into Even Stronger Quality Estimation Metrics" (WMT 2023)
+  - https://aclanthology.org/2023.wmt-1.62/
+- **BitFit**: Ben Zaken et al., "BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models" (ACL 2022)
+  - https://aclanthology.org/2022.acl-short.1/
+- **LoRA**: Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (ICLR 2022)
+  - https://arxiv.org/abs/2106.09685
+
+### 10.3 Blogs and Korean-Language Resources
+
+- **COMET, a neural translation quality metric (in Korean)**: https://velog.io/@judy_choi/NMT-COMET-%EC%8B%A0%EA%B2%BD%EB%A7%9D-%EA%B8%B0%EB%B0%98-%EB%B2%88%EC%97%AD-%ED%92%88%EC%A7%88-%ED%8F%89%EA%B0%80-%EC%A7%80%ED%91%9C
+
+### 10.4 HuggingFace Models and Fine-tuning Examples
+
+- **wmt22-cometkiwi-da**: https://huggingface.co/Unbabel/wmt22-cometkiwi-da
+- **wmt22-comet-da**: https://huggingface.co/Unbabel/wmt22-comet-da
+- **XCOMET-XL**: https://huggingface.co/Unbabel/XCOMET-XL
+- **wmt23-cometkiwi-da-xl**: https://huggingface.co/Unbabel/wmt23-cometkiwi-da-xl
+- **ComeTH (EN-TH fine-tuned)**: https://huggingface.co/wasanx/ComeTH
+  - A successful COMETKiwi fine-tuning (Spearman +4.9%)
+- **ComeTH dataset**: https://huggingface.co/datasets/wasanx/cometh_finetune
+- **ComeTH model collection**: https://huggingface.co/collections/wasanx/cometh-model-682c410d3ba4cfbca8a07c9d
+
+### 10.5 Key Source Code Paths
+
+| File | Description |
+|------|------|
+| `comet/cli/train.py` | Training CLI entry point |
+| `comet/models/base.py` | Base class for all models |
+| `comet/models/regression/referenceless.py` | ReferencelessRegression implementation |
+| `comet/models/multitask/unified_metric.py` | UnifiedMetric (COMETKiwi) implementation |
+| `comet/encoders/xlmr.py` | XLM-RoBERTa encoder |
+| `comet/modules/feedforward.py` | Feed-forward head |
+| `comet/modules/layerwise_attention.py` | Layer-wise attention |
+| `configs/models/en-ko-qe/` | Training configs for this guide |
+| `scripts/prepare_data.py` | Data preprocessing script |
+| `scripts/evaluate_model.py` | Model evaluation script |
+| `scripts/run_training.sh` | Training launcher script |
+| `scripts/finetune_lora.py` | LoRA/BitFit/FTHead fine-tuning script |
+| `scripts/download_checkpoint.py` | Checkpoint download |
+
+---
+
+## Quick Start Summary
+
+The whole process at a glance:
+
+```bash
+# 1. Environment setup
+cd /path/to/COMET
+pip install unbabel-comet scipy peft
+
+# 2. Data preparation
+python scripts/prepare_data.py \
+    --input_dir /path/to/train_data \
+    --output_dir data/en-ko-qe \
+    --include_pairwise
+
+# 3. Pipeline test
+comet-train --cfg configs/models/en-ko-qe/approach_mini_test.yaml
+
+# 4. Download the COMETKiwi checkpoint
+python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da
+CKPT="path_to_checkpoint"
+
+# 5. [Safe option] FTHead fine-tuning (head only, try this first)
+python scripts/finetune_lora.py \
+    --base_model "$CKPT" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-fthead \
+    --mode fthead --epochs 5
+
+# 6. [Exploit the large dataset] Full fine-tuning (same strategy as ComeTH)
+comet-train \
+    --cfg configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml \
+    --load_from_checkpoint "$CKPT" \
+    --seed_everything 12
+
+# 7. [Overfitting protection] LoRA fine-tuning (alternative if overfitting occurs)
+python scripts/finetune_lora.py \
+    --base_model "$CKPT" \
+    --train_data data/en-ko-qe/train.csv \
+    --val_data data/en-ko-qe/val.csv \
+    --output_dir outputs/cometkiwi-lora \
+    --mode lora --lora_rank 16 --epochs 3
+
+# 8. Evaluation (compare the models)
+python scripts/evaluate_model.py \
+    --checkpoint outputs/cometkiwi-fthead/best_model.ckpt \
+    --test_data data/en-ko-qe/val.csv \
+    --model_type unified
+
+# 9. Actual use
+python -c "
+from comet import load_from_checkpoint
+model = load_from_checkpoint('outputs/cometkiwi-fthead/best_model.ckpt')
+data = [{'src': 'The method of claim 1', 'mt': '청구항 1의 방법'}]
+print(model.predict(data, batch_size=1, gpus=1).scores)
+"
+```
diff --git a/configs/models/en-ko-qe/approach1_referenceless_scratch.yaml b/configs/models/en-ko-qe/approach1_referenceless_scratch.yaml
new file mode 100644
index 0000000..78004b7
--- /dev/null
+++ b/configs/models/en-ko-qe/approach1_referenceless_scratch.yaml
@@ -0,0 +1,100 @@
+# ============================================================
+# Approach 1: ReferencelessRegression - training from scratch
+# ============================================================
+# Trains a reference-free QE model from scratch on an XLM-RoBERTa-large encoder.
+# A new feed-forward regression head is trained on top of the encoder.
+#
+# Architecture:
+#   - Encoder: XLM-RoBERTa-large (frozen -> unfreeze)
+#   - Input: src + mt (each encoded separately)
+#   - Features: [mt_emb, src_emb, mt*src, |mt-src|] (4 * 1024 = 4096 dim)
+#   - Head: 4096 -> 2048 -> 1024 -> 1
+#
+# GPU needed: 1-2x A100 (80GB) or 2-4x V100 (32GB)
+# Expected VRAM: ~20-30GB (batch_size=16, fp32)
+#
+# Run:
+#   comet-train --cfg configs/models/en-ko-qe/approach1_referenceless_scratch.yaml
+# ============================================================
+
+referenceless_regression_metric:
+  class_path: comet.models.ReferencelessRegression
+  init_args:
+    # --- 인코더 설정 ---
+    encoder_model: XLM-RoBERTa
+    pretrained_model: xlm-roberta-large       # 560M params
+
+    # --- 프리징 전략 ---
+    nr_frozen_epochs: 0.3                      # 첫 에폭의 30%는 인코더 동결
+    keep_embeddings_frozen: True               # 임베딩 레이어는 항상 동결 (메모리 절약)
+
+    # --- 옵티마이저 ---
+    optimizer: AdamW
+    encoder_learning_rate: 1.0e-06             # 인코더 학습률 (매우 작게)
+    learning_rate: 1.5e-05                     # Head 학습률
+    layerwise_decay: 0.95                      # 하위 레이어일수록 학습률 감소
+    warmup_steps: 0
+
+    # --- 레이어 설정 ---
+    pool: avg                                  # 평균 풀링
+    layer: mix                                 # 모든 레이어 가중합
+    layer_transformation: sparsemax            # sparse normalization of the layer-mix weights (some layers can get exactly zero weight)
+    layer_norm: False
+
+    # --- 손실 함수 ---
+    loss: mse                                  # Mean Squared Error
+
+    # --- 회귀 Head ---
+    hidden_sizes:
+      - 2048
+      - 1024
+    activations: Tanh
+    dropout: 0.1
+
+    # --- 데이터 ---
+    batch_size: 16
+    train_data:
+      - data/en-ko-qe/train.csv
+    validation_data:
+      - data/en-ko-qe/val.csv
+
+# --- 트레이너 설정 ---
+trainer:
+  class_path: pytorch_lightning.trainer.trainer.Trainer
+  init_args:
+    accelerator: gpu
+    devices: 1                                 # GPU 수 (환경에 맞게 조정)
+    # strategy: ddp                            # multi-GPU시 주석 해제
+    accumulate_grad_batches: 4                 # 유효 배치: 16 * 4 = 64
+    max_epochs: 5
+    min_epochs: 1
+    gradient_clip_val: 1.0
+    gradient_clip_algorithm: norm
+    check_val_every_n_epoch: 1
+    log_every_n_steps: 100
+    enable_progress_bar: true
+    enable_model_summary: true
+    num_sanity_val_steps: 3
+    deterministic: false
+
+# --- 얼리 스토핑 ---
+early_stopping:
+  class_path: pytorch_lightning.callbacks.early_stopping.EarlyStopping
+  init_args:
+    monitor: val_kendall                       # Kendall tau 상관계수 모니터링
+    min_delta: 0.0
+    patience: 2                                # 2 에폭 동안 개선 없으면 중단
+    mode: max
+    verbose: False
+
+# --- 모델 체크포인트 ---
+model_checkpoint:
+  class_path: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
+  init_args:
+    filename: '{epoch}-{step}-{val_kendall:.4f}'
+    monitor: val_kendall
+    save_top_k: 3
+    mode: max
+    save_weights_only: True
+    every_n_epochs: 1
+    verbose: True
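Review note (not part of the patch): the header of this config describes the head input as `[mt_emb, src_emb, mt*src, |mt-src|]` with 4 * 1024 = 4096 dimensions. A small sketch of that combination on plain Python lists may help readers who have not seen it before. `combine_features` is a hypothetical name used only for illustration; the real model operates on pooled tensors inside `ReferencelessRegression`.

```python
def combine_features(mt_emb, src_emb):
    """Concatenate [mt, src, mt * src, |mt - src|], with the last two computed element-wise."""
    if len(mt_emb) != len(src_emb):
        raise ValueError("mt and src embeddings must have the same dimension")
    prod = [m * s for m, s in zip(mt_emb, src_emb)]
    diff = [abs(m - s) for m, s in zip(mt_emb, src_emb)]
    return list(mt_emb) + list(src_emb) + prod + diff


# Toy 3-dimensional embeddings; the pooled vectors in the config are 1024-dimensional.
features = combine_features([1.0, -2.0, 0.5], [0.5, 1.0, 0.5])
print(features)

# With 1024-dimensional inputs the head sees a 4096-dimensional vector.
print(len(combine_features([0.0] * 1024, [0.0] * 1024)))  # 4096
```

The product and absolute-difference terms give the head direct access to how the two sentence vectors agree or disagree, which a plain concatenation would leave for the first hidden layer to learn.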
diff --git a/configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml b/configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml
new file mode 100644
index 0000000..5e7c0a0
--- /dev/null
+++ b/configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml
@@ -0,0 +1,109 @@
+# ============================================================
+# 접근법 2: UnifiedMetric QE 모드 - 처음부터 학습 (From Scratch)
+# ============================================================
+# COMETKiwi와 동일한 UnifiedMetric 아키텍처를 사용하되
+# input_segments: [mt, src]로 설정하여 reference-free 모드로 학습합니다.
+#
+# 아키텍처:
+#   - 인코더: InfoXLM-large (XLM-R 계열, COMETKiwi에서 사용)
+#   - 입력: [mt SEP src] (하나의 시퀀스로 연결)
+#   - 특징: CLS 토큰 임베딩 (1024 dim)
+#   - Head: 1024 -> 3072 -> 1024 -> 1
+#
+# ReferencelessRegression과의 차이:
+#   - src/mt를 하나의 시퀀스로 연결 (cross-attention 효과)
+#   - CLS 토큰 사용 (평균 풀링 대신)
+#   - COMETKiwi와 동일 구조 (fine-tuning으로 전환 용이)
+#
+# 필요 GPU: 1-2x A100 (80GB) 또는 2-4x V100 (32GB)
+#
+# 실행:
+#   comet-train --cfg configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml
+# ============================================================
+
+unified_metric:
+  class_path: comet.models.UnifiedMetric
+  init_args:
+    # --- 인코더 설정 ---
+    encoder_model: XLM-RoBERTa
+    pretrained_model: microsoft/infoxlm-large  # COMETKiwi와 동일 인코더
+
+    # --- 프리징 전략 ---
+    nr_frozen_epochs: 0.3
+    keep_embeddings_frozen: True
+
+    # --- 옵티마이저 ---
+    optimizer: AdamW
+    encoder_learning_rate: 1.0e-06
+    learning_rate: 1.5e-05
+    layerwise_decay: 0.95
+    warmup_steps: 0
+
+    # --- 레이어 설정 ---
+    sent_layer: mix                            # 문장 레벨: 모든 레이어 가중합
+    layer_transformation: sparsemax
+    layer_norm: True
+    word_layer: 24                             # 단어 레벨 (미사용이지만 설정 필요)
+
+    # --- 손실 함수 ---
+    loss: mse
+
+    # --- 회귀 Head ---
+    hidden_sizes:
+      - 3072
+      - 1024
+    activations: Tanh
+    dropout: 0.1
+
+    # --- QE 설정 (핵심!) ---
+    input_segments:                            # reference-free: mt + src만 사용
+      - mt
+      - src
+    word_level_training: False                 # 단어 레벨 학습 비활성
+
+    # --- 데이터 ---
+    batch_size: 16
+    train_data:
+      - data/en-ko-qe/train.csv
+    validation_data:
+      - data/en-ko-qe/val.csv
+
+# --- 트레이너 설정 ---
+trainer:
+  class_path: pytorch_lightning.trainer.trainer.Trainer
+  init_args:
+    accelerator: gpu
+    devices: 1
+    accumulate_grad_batches: 4
+    max_epochs: 5
+    min_epochs: 1
+    gradient_clip_val: 1.0
+    gradient_clip_algorithm: norm
+    check_val_every_n_epoch: 1
+    log_every_n_steps: 100
+    enable_progress_bar: true
+    enable_model_summary: true
+    num_sanity_val_steps: 3
+    deterministic: false
+
+# --- 얼리 스토핑 ---
+early_stopping:
+  class_path: pytorch_lightning.callbacks.early_stopping.EarlyStopping
+  init_args:
+    monitor: val_kendall
+    min_delta: 0.0
+    patience: 2
+    mode: max
+    verbose: False
+
+# --- 모델 체크포인트 ---
+model_checkpoint:
+  class_path: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
+  init_args:
+    filename: '{epoch}-{step}-{val_kendall:.4f}'
+    monitor: val_kendall
+    save_top_k: 3
+    mode: max
+    save_weights_only: True
+    every_n_epochs: 1
+    verbose: True
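Review note (not part of the patch): the two from-scratch configs use different head shapes (1024 -> 3072 -> 1024 -> 1 here, 4096 -> 2048 -> 1024 -> 1 for approach 1). The arithmetic below compares their sizes, assuming each arrow is a fully connected layer with a bias term and ignoring the layer-mix scalars. `mlp_param_count` is a hypothetical helper for this calculation only.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


unified_head = mlp_param_count([1024, 3072, 1024, 1])
referenceless_head = mlp_param_count([4096, 2048, 1024, 1])

print(f"UnifiedMetric head:           {unified_head:,}")        # 6,296,577
print(f"ReferencelessRegression head: {referenceless_head:,}")  # 10,489,857
```

Either head is small next to the roughly 560M-parameter encoder noted in the approach 1 config, so the choice between the two approaches is mostly about how the inputs are encoded rather than about head capacity.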
diff --git a/configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml b/configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml
new file mode 100644
index 0000000..dbb7e58
--- /dev/null
+++ b/configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml
@@ -0,0 +1,114 @@
+# ============================================================
+# 접근법 3: COMETKiwi 기반 Fine-tuning (추천)
+# ============================================================
+# Loads the pretrained COMETKiwi (wmt22-cometkiwi-da) checkpoint and continues
+# training on English-Korean (EN-KO) patent-domain data.
+#
+# 이 접근법의 장점:
+#   - 이미 QE에 최적화된 가중치에서 시작
+#   - WMT 학습 데이터의 다국어 지식 활용
+#   - 적은 에폭으로도 도메인 적응 가능
+#   - 가장 높은 성능 기대
+#
+# 주의사항:
+#   - 체크포인트 경로를 실제 경로로 수정 필요
+#   - nr_frozen_epochs를 높여서 catastrophic forgetting 방지
+#   - learning_rate를 낮게 설정
+#
+# 필요 GPU: 1-2x A100 (80GB) 또는 2-4x V100 (32GB)
+#
+# 실행 (2단계):
+#   1. 먼저 체크포인트 다운로드:
+#      python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da
+#
+#   2. 학습 실행:
+#      comet-train --cfg configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml \
+#          --load_from_checkpoint checkpoints/wmt22-cometkiwi-da/checkpoints/model.ckpt
+# ============================================================
+
+unified_metric:
+  class_path: comet.models.UnifiedMetric
+  init_args:
+    # --- 인코더 설정 (COMETKiwi 원본과 동일) ---
+    encoder_model: XLM-RoBERTa
+    pretrained_model: microsoft/infoxlm-large
+
+    # --- 프리징 전략 (Fine-tuning용으로 조정) ---
+    nr_frozen_epochs: 0.5                      # 더 오래 동결 (forgetting 방지)
+    keep_embeddings_frozen: True
+
+    # --- 옵티마이저 (Fine-tuning용으로 학습률 감소) ---
+    optimizer: AdamW
+    encoder_learning_rate: 5.0e-07             # 인코더: 원본의 절반
+    learning_rate: 1.0e-05                     # Head: 원본보다 낮게
+    layerwise_decay: 0.95
+    warmup_steps: 100                          # 워밍업 추가 (안정성)
+
+    # --- 레이어 설정 ---
+    sent_layer: mix
+    layer_transformation: sparsemax
+    layer_norm: True
+    word_layer: 24
+
+    # --- 손실 함수 ---
+    loss: mse
+
+    # --- 회귀 Head ---
+    hidden_sizes:
+      - 3072
+      - 1024
+    activations: Tanh
+    dropout: 0.1
+
+    # --- QE 설정 ---
+    input_segments:
+      - mt
+      - src
+    word_level_training: False
+
+    # --- 데이터 ---
+    batch_size: 16
+    train_data:
+      - data/en-ko-qe/train.csv
+    validation_data:
+      - data/en-ko-qe/val.csv
+
+# --- 트레이너 설정 (Fine-tuning) ---
+trainer:
+  class_path: pytorch_lightning.trainer.trainer.Trainer
+  init_args:
+    accelerator: gpu
+    devices: 1
+    accumulate_grad_batches: 4
+    max_epochs: 3                              # Fine-tuning이므로 적은 에폭
+    min_epochs: 1
+    gradient_clip_val: 1.0
+    gradient_clip_algorithm: norm
+    check_val_every_n_epoch: 1
+    log_every_n_steps: 100
+    enable_progress_bar: true
+    enable_model_summary: true
+    num_sanity_val_steps: 3
+    deterministic: false
+
+# --- 얼리 스토핑 ---
+early_stopping:
+  class_path: pytorch_lightning.callbacks.early_stopping.EarlyStopping
+  init_args:
+    monitor: val_kendall
+    min_delta: 0.0
+    patience: 2
+    mode: max
+    verbose: False
+
+# --- 모델 체크포인트 ---
+model_checkpoint:
+  class_path: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
+  init_args:
+    filename: 'cometkiwi-ft-{epoch}-{step}-{val_kendall:.4f}'
+    monitor: val_kendall
+    save_top_k: 3
+    mode: max
+    save_weights_only: True
+    every_n_epochs: 1
+    verbose: True
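Review note (not part of the patch): with `batch_size: 16`, `accumulate_grad_batches: 4`, `nr_frozen_epochs: 0.5` and `warmup_steps: 100`, it helps to see what these settings mean in steps. The sketch below assumes the roughly 9,663,341 rows listed for the pointwise training file in section 1.3 (the prepared `train.csv` may be smaller) and treats the frozen phase as a plain fraction of the first epoch's batches, which approximates rather than reproduces the trainer's exact bookkeeping. `schedule` is a hypothetical helper.

```python
import math


def schedule(n_rows, batch_size, accumulate, nr_frozen_epochs, devices=1):
    """Rough per-epoch step counts for a single training configuration."""
    batches_per_epoch = math.ceil(n_rows / (batch_size * devices))
    optimizer_steps = math.ceil(batches_per_epoch / accumulate)
    return {
        "effective_batch": batch_size * accumulate * devices,
        "batches_per_epoch": batches_per_epoch,
        "optimizer_steps_per_epoch": optimizer_steps,
        "approx_frozen_batches": int(batches_per_epoch * nr_frozen_epochs),
    }


for key, value in schedule(9_663_341, batch_size=16, accumulate=4, nr_frozen_epochs=0.5).items():
    print(f"{key}: {value:,}")
```

At this data size one epoch is about 150,990 optimizer steps, so a 100-step warmup covers well under 0.1% of the first epoch; if the warmup is meant to stabilize the start of fine-tuning, a larger value may be worth testing.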
diff --git a/configs/models/en-ko-qe/approach4_referenceless_finetune_qe.yaml b/configs/models/en-ko-qe/approach4_referenceless_finetune_qe.yaml
new file mode 100644
index 0000000..fe91636
--- /dev/null
+++ b/configs/models/en-ko-qe/approach4_referenceless_finetune_qe.yaml
@@ -0,0 +1,100 @@
+# ============================================================
+# 접근법 4: ReferencelessRegression - 기존 QE 모델 Fine-tuning
+# ============================================================
+# 사전학습된 wmt21-comet-qe-da 체크포인트를 로드하고
+# ReferencelessRegression 구조로 추가 학습합니다.
+#
+# 접근법 3(COMETKiwi)과의 차이:
+#   - 더 단순한 아키텍처 (src/mt 별도 인코딩)
+#   - 더 가벼운 모델
+#   - wmt20/21 QE 모델 계열 활용
+#
+# 실행:
+#   1. 체크포인트 다운로드 (wmt21-comet-qe-da):
+#      python scripts/download_checkpoint.py --model wmt21-comet-qe-da --legacy
+#
+#   2. 학습:
+#      comet-train --cfg configs/models/en-ko-qe/approach4_referenceless_finetune_qe.yaml \
+#          --load_from_checkpoint checkpoints/wmt21-comet-qe-da/checkpoints/model.ckpt
+# ============================================================
+
+referenceless_regression_metric:
+  class_path: comet.models.ReferencelessRegression
+  init_args:
+    # --- 인코더 설정 ---
+    encoder_model: XLM-RoBERTa
+    pretrained_model: xlm-roberta-large
+
+    # --- 프리징 전략 ---
+    nr_frozen_epochs: 0.5                      # Fine-tuning: 더 오래 동결
+    keep_embeddings_frozen: True
+
+    # --- 옵티마이저 ---
+    optimizer: AdamW
+    encoder_learning_rate: 5.0e-07
+    learning_rate: 1.0e-05
+    layerwise_decay: 0.95
+    warmup_steps: 100
+
+    # --- 레이어 설정 ---
+    pool: avg
+    layer: mix
+    layer_transformation: sparsemax
+    layer_norm: False
+
+    # --- 손실 함수 ---
+    loss: mse
+
+    # --- 회귀 Head ---
+    hidden_sizes:
+      - 2048
+      - 1024
+    activations: Tanh
+    dropout: 0.1
+
+    # --- 데이터 ---
+    batch_size: 16
+    train_data:
+      - data/en-ko-qe/train.csv
+    validation_data:
+      - data/en-ko-qe/val.csv
+
+# --- 트레이너 설정 ---
+trainer:
+  class_path: pytorch_lightning.trainer.trainer.Trainer
+  init_args:
+    accelerator: gpu
+    devices: 1
+    accumulate_grad_batches: 4
+    max_epochs: 3
+    min_epochs: 1
+    gradient_clip_val: 1.0
+    gradient_clip_algorithm: norm
+    check_val_every_n_epoch: 1
+    log_every_n_steps: 100
+    enable_progress_bar: true
+    enable_model_summary: true
+    num_sanity_val_steps: 3
+    deterministic: false
+
+# --- 얼리 스토핑 ---
+early_stopping:
+  class_path: pytorch_lightning.callbacks.early_stopping.EarlyStopping
+  init_args:
+    monitor: val_kendall
+    min_delta: 0.0
+    patience: 2
+    mode: max
+    verbose: False
+
+# --- 모델 체크포인트 ---
+model_checkpoint:
+  class_path: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
+  init_args:
+    filename: 'referenceless-ft-{epoch}-{step}-{val_kendall:.4f}'
+    monitor: val_kendall
+    save_top_k: 3
+    mode: max
+    save_weights_only: True
+    every_n_epochs: 1
+    verbose: True
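Review note (not part of the patch): the configs set `layerwise_decay: 0.95`, and the approach 1 comment says lower layers get a smaller learning rate. The sketch below shows one common formulation, in which the top layer keeps the encoder learning rate and each layer below it is scaled by the decay once more. This is an assumption made for illustration; COMET's own parameter grouping may differ in detail. `layerwise_lrs` is a hypothetical helper, and 24 layers corresponds to the large XLM-R encoder.

```python
def layerwise_lrs(top_lr, decay, num_layers):
    """Learning rate per layer (index 0 = bottom) under lr_k = top_lr * decay ** (num_layers - 1 - k)."""
    return [top_lr * decay ** (num_layers - 1 - k) for k in range(num_layers)]


lrs = layerwise_lrs(top_lr=5.0e-07, decay=0.95, num_layers=24)
print(f"bottom layer: {lrs[0]:.3e}")
print(f"top layer:    {lrs[-1]:.3e}")
print(f"bottom/top ratio: {lrs[0] / lrs[-1]:.3f}")
```

Under this formulation the bottom layer trains at roughly 0.31 times the top layer's rate, so with an encoder learning rate of 5.0e-07 the lowest layers move very little during fine-tuning.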
diff --git a/configs/models/en-ko-qe/approach_mini_test.yaml b/configs/models/en-ko-qe/approach_mini_test.yaml
new file mode 100644
index 0000000..8f4c952
--- /dev/null
+++ b/configs/models/en-ko-qe/approach_mini_test.yaml
@@ -0,0 +1,73 @@
+# ============================================================
+# 미니 테스트: 파이프라인 동작 확인용 설정
+# ============================================================
+# 소규모 데이터와 빠른 설정으로 학습 파이프라인이 정상 동작하는지
+# 확인합니다. 실제 학습 전 반드시 이것부터 실행하세요.
+#
+# 실행:
+#   comet-train --cfg configs/models/en-ko-qe/approach_mini_test.yaml
+# ============================================================
+
+referenceless_regression_metric:
+  class_path: comet.models.ReferencelessRegression
+  init_args:
+    encoder_model: XLM-RoBERTa
+    pretrained_model: xlm-roberta-large
+    nr_frozen_epochs: 0.3
+    keep_embeddings_frozen: True
+    optimizer: AdamW
+    encoder_learning_rate: 1.0e-06
+    learning_rate: 1.5e-05
+    layerwise_decay: 0.95
+    pool: avg
+    layer: mix
+    layer_transformation: sparsemax
+    layer_norm: False
+    loss: mse
+    hidden_sizes:
+      - 2048
+      - 1024
+    activations: Tanh
+    dropout: 0.1
+    batch_size: 4                              # 작은 배치로 VRAM 절약
+    train_data:
+      - data/en-ko-qe/mini_train.csv
+    validation_data:
+      - data/en-ko-qe/mini_val.csv
+
+trainer:
+  class_path: pytorch_lightning.trainer.trainer.Trainer
+  init_args:
+    accelerator: gpu
+    devices: 1
+    accumulate_grad_batches: 1
+    max_epochs: 1                              # 1 에폭만
+    min_epochs: 1
+    gradient_clip_val: 1.0
+    gradient_clip_algorithm: norm
+    check_val_every_n_epoch: 1
+    log_every_n_steps: 10
+    enable_progress_bar: true
+    enable_model_summary: true
+    num_sanity_val_steps: 1
+    fast_dev_run: False
+    limit_train_batches: 20                    # 20 배치만 학습
+    limit_val_batches: 5                       # 5 배치만 검증
+    deterministic: false
+
+early_stopping:
+  class_path: pytorch_lightning.callbacks.early_stopping.EarlyStopping
+  init_args:
+    monitor: val_kendall
+    min_delta: 0.0
+    patience: 2
+    mode: max
+
+model_checkpoint:
+  class_path: pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint
+  init_args:
+    filename: 'mini-test-{epoch}-{val_kendall:.4f}'
+    monitor: val_kendall
+    save_top_k: 1
+    mode: max
+    save_weights_only: True
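Review note (not part of the patch): the mini test reads `data/en-ko-qe/mini_train.csv` and `mini_val.csv`. With `batch_size: 4`, `limit_train_batches: 20` and `limit_val_batches: 5`, about 80 training rows and 20 validation rows are enough. If those files are not already produced by the data preparation step, a sketch like the one below could create them. `write_head_subset` is a hypothetical helper; it uses the `csv` module rather than a line-based `head` so that quoted fields containing newlines stay intact.

```python
import csv
import itertools
import os
import tempfile


def write_head_subset(src_path, dst_path, n_rows):
    """Copy the header and the first n_rows records of a CSV; return the number of records written."""
    with open(src_path, newline="", encoding="utf-8") as fin, \
         open(dst_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # header row
        written = 0
        for row in itertools.islice(reader, n_rows):
            writer.writerow(row)
            written += 1
    return written


# Demo on a throwaway file; a real call would point at data/en-ko-qe/train.csv.
with tempfile.TemporaryDirectory() as tmp:
    full = os.path.join(tmp, "train.csv")
    mini = os.path.join(tmp, "mini_train.csv")
    with open(full, "w", newline="", encoding="utf-8") as f:
        rows = [["src", "mt", "score"]] + [[f"s{i}", f"m{i}", "0.5"] for i in range(10)]
        csv.writer(f).writerows(rows)
    print(write_head_subset(full, mini, n_rows=3))  # 3
```

Taking the first rows is fine for a smoke test, but if the source file is sorted by score the subset will not be representative, so the resulting metrics should not be read as a quality signal.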
diff --git a/scripts/analyze_and_rebalance.py b/scripts/analyze_and_rebalance.py
new file mode 100644
index 0000000..d560f5e
--- /dev/null
+++ b/scripts/analyze_and_rebalance.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+"""
+COMET 학습 데이터 점수 분포 분석 및 리밸런싱 스크립트
+====================================================
+
+점수 분포의 편향이 COMET 학습에 미치는 영향:
+  - 고점수 구간(0.7-1.0)에 데이터가 집중되면 모델이 해당 범위 주변으로만 예측
+  - 저점수 구간(0.0-0.3)이 부족하면 나쁜 번역을 식별하지 못함
+  - "score distribution collapse" 발생: 모델 출력이 평균 근처로 수렴
+
+해결 전략:
+  1. Stratified Sampling: 점수 구간별 균등 샘플링
+  2. Inverse-Frequency Weighting: 희소 구간 가중치 증가
+  3. Equal-Mass Binning: 동일 샘플 수 구간 분할
+
+참고 논문:
+  - "Pitfalls and Outlooks in Using COMET" (WMT 2024, arXiv:2408.15366)
+    → COMET은 학습 데이터 점수 분포에 매우 민감
+  - "Delving into Deep Imbalanced Regression" (ICML 2021)
+    → Label Distribution Smoothing (LDS) 기법 제안
+  - "COMET for Low-Resource MT Evaluation" (LREC 2024)
+    → "COMET is highly susceptible to the distribution of scores"
+  - Kim et al. (ACM TALLIP 2020)
+    → 불균형 QE 학습 데이터가 고품질 편향 점수 유발
+
+사용법:
+    # 분포 분석만
+    python scripts/analyze_and_rebalance.py \
+        --input data/en-ko-qe/train.csv \
+        --analyze_only
+
+    # Equal rebalancing (every bin is cut down to the size of the smallest non-empty bin; note that the default --strategy is soft)
+    python scripts/analyze_and_rebalance.py \
+        --input data/en-ko-qe/train.csv \
+        --output data/en-ko-qe/train_balanced.csv \
+        --strategy equal
+
+    # 역빈도 가중치 CSV 생성 (학습 시 가중치 적용)
+    python scripts/analyze_and_rebalance.py \
+        --input data/en-ko-qe/train.csv \
+        --output data/en-ko-qe/train_weighted.csv \
+        --strategy weighted
+
+    # 소프트 리밸런싱 (과소 구간 오버샘플 + 과다 구간 언더샘플)
+    python scripts/analyze_and_rebalance.py \
+        --input data/en-ko-qe/train.csv \
+        --output data/en-ko-qe/train_soft.csv \
+        --strategy soft \
+        --target_total 3000000
+"""
+
+import argparse
+import os
+import sys
+from collections import Counter
+
+import numpy as np
+import pandas as pd
+
+
+# ============================================================
+# 분석 함수
+# ============================================================
+
+def analyze_distribution(df: pd.DataFrame, name: str = "Dataset", n_bins: int = 10):
+    """점수 분포 상세 분석"""
+    scores = df["score"].values
+
+    print(f"\n{'='*70}")
+    print(f"  점수 분포 분석: {name}")
+    print(f"  총 샘플 수: {len(df):,}")
+    print(f"{'='*70}")
+
+    # 기본 통계
+    print(f"\n  [기본 통계]")
+    print(f"  Mean:   {np.mean(scores):.4f}")
+    print(f"  Std:    {np.std(scores):.4f}")
+    print(f"  Median: {np.median(scores):.4f}")
+    print(f"  Min:    {np.min(scores):.4f}")
+    print(f"  Max:    {np.max(scores):.4f}")
+    print(f"  Skew:   {pd.Series(scores).skew():.4f}")
+
+    # 구간별 분포
+    bin_edges = np.linspace(0, 1, n_bins + 1)
+    counts = np.zeros(n_bins, dtype=int)
+    for i in range(n_bins):
+        if i == n_bins - 1:
+            mask = (scores >= bin_edges[i]) & (scores <= bin_edges[i + 1])
+        else:
+            mask = (scores >= bin_edges[i]) & (scores < bin_edges[i + 1])
+        counts[i] = mask.sum()
+
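+    max_count = max(int(counts.max()), 1)  # guard the bar scaling below against an all-empty histogram (e.g. scores outside [0, 1])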
+    print(f"\n  [구간별 분포]")
+    print(f"  {'구간':>10}  {'샘플 수':>12}  {'비율':>7}  히스토그램")
+    print(f"  {'-'*60}")
+
+    for i in range(n_bins):
+        pct = counts[i] / len(df) * 100
+        bar_len = int(counts[i] / max_count * 40)
+        bar = "█" * bar_len
+        label = f"{bin_edges[i]:.1f}-{bin_edges[i+1]:.1f}"
+        print(f"  {label:>10}  {counts[i]:>10,}  ({pct:>5.1f}%)  {bar}")
+
+    # 불균형 정도 분석
+    imbalance_ratio = max(counts) / max(min(counts), 1)
+    print(f"\n  [불균형 지표]")
+    print(f"  최대/최소 구간 비율: {imbalance_ratio:.1f}x")
+    print(f"  최대 구간: {bin_edges[np.argmax(counts)]:.1f}-{bin_edges[np.argmax(counts)+1]:.1f} ({max(counts):,})")
+    print(f"  최소 구간: {bin_edges[np.argmin(counts)]:.1f}-{bin_edges[np.argmin(counts)+1]:.1f} ({min(counts):,})")
+
+    # 학습 영향 진단
+    print(f"\n  [학습 영향 진단]")
+
+    lo_end, hi_start = round(0.3 * n_bins), round(0.7 * n_bins)  # bin indices of the 0.3 / 0.7 cut-offs (exact when n_bins is a multiple of 10)
+    low_n, mid_n, high_n = counts[:lo_end].sum(), counts[lo_end:hi_start].sum(), counts[hi_start:].sum()
+    low_pct, mid_pct, high_pct = (n / len(df) * 100 for n in (low_n, mid_n, high_n))
+
+    print(f"  Low quality  (0.0-0.3): {low_n:>10,} ({low_pct:>5.1f}%)")
+    print(f"  Mid quality  (0.3-0.7): {mid_n:>10,} ({mid_pct:>5.1f}%)")
+    print(f"  High quality (0.7-1.0): {high_n:>10,} ({high_pct:>5.1f}%)")
+
+    if high_pct > 50:
+        print(f"\n  ⚠️  경고: 고품질 구간이 {high_pct:.0f}%로 과다 → 모델이 높은 점수로 편향될 위험")
+    if low_pct < 10:
+        print(f"  ⚠️  경고: 저품질 구간이 {low_pct:.1f}%로 부족 → 나쁜 번역 식별 능력 저하")
+    if imbalance_ratio > 10:
+        print(f"  ⚠️  경고: 구간 간 불균형 비율 {imbalance_ratio:.0f}x → 리밸런싱 강력 권장")
+
+    return counts, bin_edges
+
+
+# ============================================================
+# 리밸런싱 전략
+# ============================================================
+
+def rebalance_equal(df: pd.DataFrame, n_bins: int = 10, seed: int = 42) -> pd.DataFrame:
+    """
+    전략 1: 완전 균등 리밸런싱
+    각 구간에서 최소 구간의 샘플 수만큼만 샘플링합니다.
+
+    장점: 완벽히 균등한 분포
+    단점: 데이터 손실이 클 수 있음 (최소 구간에 맞춤)
+    """
+    np.random.seed(seed)
+    bin_edges = np.linspace(0, 1, n_bins + 1)
+
+    bin_dfs = []
+    min_count = float("inf")
+
+    # 각 구간의 데이터 분리 및 최소 카운트 찾기
+    for i in range(n_bins):
+        if i == n_bins - 1:
+            mask = (df["score"] >= bin_edges[i]) & (df["score"] <= bin_edges[i + 1])
+        else:
+            mask = (df["score"] >= bin_edges[i]) & (df["score"] < bin_edges[i + 1])
+        bin_df = df[mask]
+        bin_dfs.append(bin_df)
+        if len(bin_df) > 0:
+            min_count = min(min_count, len(bin_df))
+
+    print(f"\n  [균등 리밸런싱]")
+    print(f"  각 구간 목표 샘플 수: {min_count:,}")
+    print(f"  총 목표: {min_count * n_bins:,} (원본: {len(df):,})")
+
+    # 각 구간에서 min_count만큼 샘플링
+    result_dfs = []
+    for bin_df in bin_dfs:
+        if bin_df.empty:
+            # Skip empty bins: sampling n > 0 rows from an empty frame raises ValueError.
+            continue
+        # min_count is the size of the smallest non-empty bin, so every remaining bin has enough rows.
+        sampled = bin_df.sample(n=min_count, random_state=seed)
+        result_dfs.append(sampled)
+
+    return pd.concat(result_dfs, ignore_index=True).sample(frac=1, random_state=seed).reset_index(drop=True)
+
+
+def rebalance_soft(df: pd.DataFrame, target_total: int = 3000000,
+                   n_bins: int = 10, seed: int = 42,
+                   smoothing: float = 0.5) -> pd.DataFrame:
+    """
+    전략 2: 소프트 리밸런싱 (권장)
+    과다 구간은 언더샘플링, 과소 구간은 오버샘플링하되
+    완전 균등은 아닌 '부드러운' 균형을 만듭니다.
+
+    smoothing=1.0: 완전 균등 (equal과 동일)
+    smoothing=0.5: 제곱근 역빈도 (sqrt inverse frequency)
+    smoothing=0.0: 원본 분포 유지
+
+    장점: 데이터 손실 최소화하면서 분포 개선
+    단점: 완벽히 균등하지는 않음
+    """
+    np.random.seed(seed)
+    bin_edges = np.linspace(0, 1, n_bins + 1)
+
+    bin_dfs = []
+    bin_counts = []
+
+    for i in range(n_bins):
+        if i == n_bins - 1:
+            mask = (df["score"] >= bin_edges[i]) & (df["score"] <= bin_edges[i + 1])
+        else:
+            mask = (df["score"] >= bin_edges[i]) & (df["score"] < bin_edges[i + 1])
+        bin_df = df[mask]
+        bin_dfs.append(bin_df)
+        bin_counts.append(len(bin_df))
+
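+    bin_counts = np.array(bin_counts, dtype=float)
+    non_empty = bin_counts > 0  # empty bins cannot be sampled, so they get no share of the target
+
+    # Target share per bin is proportional to count ** (1 - smoothing), matching the docstring:
+    #   smoothing=0.0 -> proportional to count (original distribution)
+    #   smoothing=0.5 -> proportional to sqrt(count)
+    #   smoothing=1.0 -> equal share for every non-empty bin
+    weights = np.where(non_empty, np.maximum(bin_counts, 1) ** (1.0 - smoothing), 0.0)
+    bin_counts = np.maximum(bin_counts, 1)  # avoid division by zero in the ratio printed below
+
+    # Normalize the weights so the per-bin targets sum to (at most) target_total
+    weights = weights / weights.sum()
+    target_per_bin = (weights * target_total).astype(int)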
+
+    print(f"\n  [소프트 리밸런싱] smoothing={smoothing}")
+    print(f"  목표 총 샘플 수: {target_total:,}")
+    print(f"  {'구간':>10}  {'원본':>10}  {'목표':>10}  {'비율변화':>10}")
+    print(f"  {'-'*45}")
+    for i in range(n_bins):
+        label = f"{bin_edges[i]:.1f}-{bin_edges[i+1]:.1f}"
+        ratio = target_per_bin[i] / max(bin_counts[i], 1)
+        change = "▲ 오버샘플" if ratio > 1 else "▼ 언더샘플" if ratio < 1 else "= 유지"
+        print(f"  {label:>10}  {int(bin_counts[i]):>10,}  {target_per_bin[i]:>10,}  {change}")
+
+    # 샘플링 실행
+    result_dfs = []
+    for i, bin_df in enumerate(bin_dfs):
+        target = target_per_bin[i]
+        if target == 0:
+            continue
+        if len(bin_df) >= target:
+            sampled = bin_df.sample(n=target, random_state=seed)
+        else:
+            # 오버샘플링 (복원 추출)
+            sampled = bin_df.sample(n=target, replace=True, random_state=seed)
+        result_dfs.append(sampled)
+
+    return pd.concat(result_dfs, ignore_index=True).sample(frac=1, random_state=seed).reset_index(drop=True)
+
+
+def add_sample_weights(df: pd.DataFrame, n_bins: int = 10,
+                       smoothing: float = 0.5) -> pd.DataFrame:
+    """
+    전략 3: 가중치 컬럼 추가
+    데이터 자체는 변경하지 않고, sample_weight 컬럼을 추가합니다.
+    학습 시 커스텀 loss에서 이 가중치를 사용합니다.
+
+    장점: 원본 데이터 보존, 유연한 적용
+    단점: COMET 코드 수정 필요 (커스텀 loss)
+    """
+    bin_edges = np.linspace(0, 1, n_bins + 1)
+    scores = df["score"].values
+
+    # 각 샘플의 구간 찾기
+    bin_indices = np.digitize(scores, bin_edges[1:])  # 0-indexed
+    bin_indices = np.clip(bin_indices, 0, n_bins - 1)
+
+    # 구간별 카운트
+    bin_counts = np.bincount(bin_indices, minlength=n_bins).astype(float)
+    bin_counts = np.maximum(bin_counts, 1)
+
+    # 역빈도 가중치
+    bin_weights = (1.0 / bin_counts) ** smoothing
+    bin_weights = bin_weights / bin_weights[bin_indices].mean()  # normalize so the mean weight over samples (not over bins) is 1
+
+    # 가중치 할당
+    df = df.copy()
+    df["sample_weight"] = bin_weights[bin_indices]
+
+    print(f"\n  [가중치 부여] smoothing={smoothing}")
+    print(f"  {'구간':>10}  {'샘플 수':>10}  {'가중치':>8}")
+    print(f"  {'-'*35}")
+    for i in range(n_bins):
+        label = f"{bin_edges[i]:.1f}-{bin_edges[i+1]:.1f}"
+        print(f"  {label:>10}  {int(bin_counts[i]):>10,}  {bin_weights[i]:>8.3f}")
+
+    return df
+
+
+# ============================================================
+# 메인
+# ============================================================
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="COMET 학습 데이터 점수 분포 분석 및 리밸런싱"
+    )
+    parser.add_argument("--input", type=str, required=True, help="입력 CSV 경로")
+    parser.add_argument("--output", type=str, help="출력 CSV 경로")
+    parser.add_argument(
+        "--strategy",
+        type=str,
+        choices=["equal", "soft", "weighted"],
+        default="soft",
+        help="리밸런싱 전략: equal(완전균등), soft(소프트, 권장), weighted(가중치 부여)",
+    )
+    parser.add_argument(
+        "--target_total",
+        type=int,
+        default=3000000,
+        help="소프트 리밸런싱 시 목표 총 샘플 수",
+    )
+    parser.add_argument(
+        "--smoothing",
+        type=float,
+        default=0.5,
+        help="리밸런싱 강도 (0=원본유지, 0.5=제곱근역빈도, 1.0=완전균등)",
+    )
+    parser.add_argument("--n_bins", type=int, default=10, help="점수 구간 수")
+    parser.add_argument("--seed", type=int, default=42, help="랜덤 시드")
+    parser.add_argument(
+        "--analyze_only", action="store_true", help="분석만 수행 (리밸런싱 없이)"
+    )
+    args = parser.parse_args()
+
+    # 데이터 로드
+    print(f"[INFO] Loading: {args.input}")
+    df = pd.read_csv(args.input)
+    if "score" not in df.columns:
+        print("[ERROR] 'score' 컬럼이 없습니다.")
+        sys.exit(1)
+
+    # 분석
+    counts, bin_edges = analyze_distribution(df, "원본 데이터", args.n_bins)
+
+    if args.analyze_only:
+        print("\n[DONE] 분석 완료 (--analyze_only)")
+        return
+
+    if not args.output:
+        print("[ERROR] --output 경로를 지정해주세요.")
+        sys.exit(1)
+
+    # 리밸런싱 실행
+    if args.strategy == "equal":
+        result = rebalance_equal(df, n_bins=args.n_bins, seed=args.seed)
+    elif args.strategy == "soft":
+        result = rebalance_soft(
+            df,
+            target_total=args.target_total,
+            n_bins=args.n_bins,
+            seed=args.seed,
+            smoothing=args.smoothing,
+        )
+    elif args.strategy == "weighted":
+        result = add_sample_weights(df, n_bins=args.n_bins, smoothing=args.smoothing)
+    else:
+        print(f"[ERROR] Unknown strategy: {args.strategy}")
+        sys.exit(1)
+
+    # 결과 분석
+    analyze_distribution(result, f"리밸런싱 후 ({args.strategy})", args.n_bins)
+
+    # 저장
+    os.makedirs(os.path.dirname(args.output) or ".", exist_ok=True)
+    result.to_csv(args.output, index=False)
+    print(f"\n[SAVED] {args.output} ({len(result):,} rows)")
+
+    print(f"\n{'='*70}")
+    print(f"  리밸런싱 전략 비교 가이드")
+    print(f"{'='*70}")
+    print(f"  equal:    각 구간 동일 샘플 수 → 가장 공격적, 데이터 손실 큼")
+    print(f"  soft:     소프트 리밸런싱 → 권장, 균형과 데이터 보존 타협")
+    print(f"  weighted: 가중치 추가 → 데이터 손실 없음, 코드 수정 필요")
+    print(f"{'='*70}")
+
+
+if __name__ == "__main__":
+    main()
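Review note (not part of the patch): the `rebalance_soft` docstring describes `smoothing` as moving from the original distribution (0.0) to a fully uniform one (1.0). One allocation rule with exactly that property gives each bin a share proportional to `count ** (1 - smoothing)`. The sketch below shows the rule on plain lists; `target_per_bin` here is a hypothetical standalone function, and giving empty bins a zero share is an assumption made so that nothing has to be sampled from an empty bin.

```python
def target_per_bin(counts, target_total, smoothing):
    """Split target_total across bins in proportion to count ** (1 - smoothing)."""
    if not 0.0 <= smoothing <= 1.0:
        raise ValueError("smoothing must be in [0, 1]")
    # Empty bins get weight 0 so that no rows are requested from them.
    weights = [c ** (1.0 - smoothing) if c > 0 else 0.0 for c in counts]
    total = sum(weights)
    if total == 0:
        return [0] * len(counts)
    # Truncation means the targets can sum to slightly less than target_total.
    return [int(target_total * w / total) for w in weights]


counts = [100, 0, 900]
for s in (0.0, 0.5, 1.0):
    print(f"smoothing={s}: {target_per_bin(counts, 1000, s)}")
```

With `smoothing=0.0` the targets follow the original 100:900 split, with `smoothing=1.0` both non-empty bins get the same target, and intermediate values fall in between, which is the behavior the `--smoothing` help text describes.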
diff --git a/scripts/download_checkpoint.py b/scripts/download_checkpoint.py
new file mode 100644
index 0000000..03e14e9
--- /dev/null
+++ b/scripts/download_checkpoint.py
@@ -0,0 +1,98 @@
+#!/usr/bin/env python3
+"""
+COMET 사전학습 모델 체크포인트 다운로드 스크립트
+================================================
+
+HuggingFace Hub 또는 레거시 S3에서 COMET 체크포인트를 다운로드합니다.
+
+사용법:
+    # HuggingFace 모델 다운로드 (권장)
+    python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da
+
+    # 레거시 모델 다운로드
+    python scripts/download_checkpoint.py --model wmt21-comet-qe-da --legacy
+
+    # 체크포인트 경로만 확인
+    python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da --print_path
+"""
+
+import argparse
+import os
+import sys
+
+
+def download_from_huggingface(model_name: str, output_dir: str) -> str:
+    """HuggingFace Hub에서 모델 다운로드"""
+    try:
+        from comet import download_model
+    except ImportError:
+        print("[ERROR] comet 패키지를 먼저 설치하세요: pip install unbabel-comet")
+        sys.exit(1)
+
+    print(f"[INFO] Downloading {model_name} from HuggingFace Hub...")
+    model_path = download_model(model_name, saving_directory=output_dir)  # output_dir was previously accepted but ignored
+    print(f"[INFO] Model downloaded to: {model_path}")
+    return model_path
+
+
+def download_legacy(model_name: str) -> str:
+    """레거시 COMET 모델 다운로드 (S3)"""
+    try:
+        from comet import download_model
+    except ImportError:
+        print("[ERROR] comet 패키지를 먼저 설치하세요: pip install unbabel-comet")
+        sys.exit(1)
+
+    print(f"[INFO] Downloading legacy model: {model_name}...")
+    model_path = download_model(model_name)
+    print(f"[INFO] Model downloaded to: {model_path}")
+    return model_path
+
+
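+def find_checkpoint(model_path: str) -> "str | None":
+    """Return the .ckpt path for model_path (a checkpoint file or a model directory), or None if none is found."""
+    # Recent unbabel-comet releases return the checkpoint file itself from download_model(), so accept a file path too.
+    if os.path.isfile(model_path):
+        return model_path if model_path.endswith(".ckpt") else None
+    hits = (os.path.join(root, f) for root, _dirs, files in os.walk(model_path) for f in files if f.endswith(".ckpt"))
+    return next(hits, None)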
+
+
+def main():
+    parser = argparse.ArgumentParser(description="COMET 체크포인트 다운로드")
+    parser.add_argument(
+        "--model",
+        type=str,
+        required=True,
+        help="모델 이름 (예: Unbabel/wmt22-cometkiwi-da, wmt21-comet-qe-da)",
+    )
+    parser.add_argument(
+        "--legacy",
+        action="store_true",
+        help="레거시 모델 다운로드 (S3)",
+    )
+    parser.add_argument(
+        "--print_path",
+        action="store_true",
+        help="체크포인트 경로만 출력",
+    )
+    args = parser.parse_args()
+
+    if args.legacy:
+        model_path = download_legacy(args.model)
+    else:
+        model_path = download_from_huggingface(args.model, "checkpoints")
+
+    ckpt = find_checkpoint(model_path)
+    if ckpt:
+        print(f"\n[SUCCESS] Checkpoint file: {ckpt}")
+        print(f"\n사용 예시:")
+        print(f"  comet-train --cfg YOUR_CONFIG.yaml \\")
+        print(f"      --load_from_checkpoint {ckpt}")
+    else:
+        print(f"\n[INFO] Model path: {model_path}")
+        print(f"  (.ckpt 파일을 직접 확인하세요)")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/evaluate_model.py b/scripts/evaluate_model.py
new file mode 100644
index 0000000..2e7e9e0
--- /dev/null
+++ b/scripts/evaluate_model.py
@@ -0,0 +1,169 @@
+#!/usr/bin/env python3
+"""
+학습된 COMET 모델 평가 스크립트
+================================
+
+학습된 체크포인트를 사용하여 테스트 데이터에서 번역 품질 점수를 예측하고
+성능 지표를 계산합니다.
+
+사용법:
+    # 기본 평가 (CSV 파일)
+    python scripts/evaluate_model.py \
+        --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+        --test_data data/en-ko-qe/val.csv
+
+    # UnifiedMetric 평가
+    python scripts/evaluate_model.py \
+        --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+        --test_data data/en-ko-qe/val.csv \
+        --model_type unified
+
+    # 개별 문장 평가
+    python scripts/evaluate_model.py \
+        --checkpoint lightning_logs/version_X/checkpoints/best.ckpt \
+        --src "activate a scanning directed acyclic graph" \
+        --mt "스캐닝 지시 비순환 그래프를 활성화하는 것" \
+        --model_type referenceless
+"""
+
+import argparse
+import sys
+
+import numpy as np
+import pandas as pd
+import torch
+from scipy import stats
+
+
+def load_model(checkpoint_path: str, model_type: str):
+    """체크포인트에서 모델 로드"""
+    if model_type == "referenceless":
+        from comet.models import ReferencelessRegression
+        model = ReferencelessRegression.load_from_checkpoint(checkpoint_path)
+    elif model_type == "unified":
+        from comet.models import UnifiedMetric
+        model = UnifiedMetric.load_from_checkpoint(checkpoint_path)
+    else:
+        raise ValueError(f"Unknown model type: {model_type}")
+
+    model.eval()
+    return model
+
+
+def evaluate_csv(model, test_path: str):
+    """Evaluate a CSV test set."""
+    df = pd.read_csv(test_path)
+    print(f"[INFO] Test data: {test_path} ({len(df)} samples)")
+
+    # Build samples in the format the model expects
+    samples = []
+    for _, row in df.iterrows():
+        sample = {"src": str(row["src"]), "mt": str(row["mt"])}
+        samples.append(sample)
+
+    # Predict (fall back to CPU when CUDA is unavailable)
+    print("[INFO] Running predictions...")
+    output = model.predict(samples, batch_size=32, gpus=1 if torch.cuda.is_available() else 0)
+    predicted_scores = output.scores
+
+    # Compute correlations when ground-truth scores are available
+    if "score" in df.columns:
+        actual_scores = df["score"].values
+
+        # Pearson correlation
+        pearson_r, pearson_p = stats.pearsonr(actual_scores, predicted_scores)
+        # Spearman correlation
+        spearman_r, spearman_p = stats.spearmanr(actual_scores, predicted_scores)
+        # Kendall tau
+        kendall_tau, kendall_p = stats.kendalltau(actual_scores, predicted_scores)
+
+        # MSE & MAE
+        mse = np.mean((np.array(actual_scores) - np.array(predicted_scores)) ** 2)
+        mae = np.mean(np.abs(np.array(actual_scores) - np.array(predicted_scores)))
+
+        print(f"\n{'='*60}")
+        print(f"[RESULTS] Evaluation Metrics")
+        print(f"{'='*60}")
+        print(f"  Pearson r:     {pearson_r:.4f} (p={pearson_p:.2e})")
+        print(f"  Spearman rho:  {spearman_r:.4f} (p={spearman_p:.2e})")
+        print(f"  Kendall tau:   {kendall_tau:.4f} (p={kendall_p:.2e})")
+        print(f"  MSE:           {mse:.6f}")
+        print(f"  MAE:           {mae:.6f}")
+        print(f"{'='*60}")
+
+        # Compare score distributions
+        print(f"\n  Score Distribution:")
+        print(f"  {'':>15}  {'Actual':>10}  {'Predicted':>10}")
+        print(f"  {'Mean':>15}  {np.mean(actual_scores):>10.4f}  {np.mean(predicted_scores):>10.4f}")
+        print(f"  {'Std':>15}  {np.std(actual_scores):>10.4f}  {np.std(predicted_scores):>10.4f}")
+        print(f"  {'Min':>15}  {np.min(actual_scores):>10.4f}  {np.min(predicted_scores):>10.4f}")
+        print(f"  {'Max':>15}  {np.max(actual_scores):>10.4f}  {np.max(predicted_scores):>10.4f}")
+
+    else:
+        print(f"\n[INFO] Predicted scores (no ground truth available):")
+        print(f"  Mean:  {np.mean(predicted_scores):.4f}")
+        print(f"  Std:   {np.std(predicted_scores):.4f}")
+        print(f"  Min:   {np.min(predicted_scores):.4f}")
+        print(f"  Max:   {np.max(predicted_scores):.4f}")
+
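+    # Save predictions to CSV; strip only a trailing ".csv" so the input file is never overwritten
+    df["predicted_score"] = predicted_scores
+    output_path = (test_path[:-4] if test_path.endswith(".csv") else test_path) + "_predictions.csv"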
+    df.to_csv(output_path, index=False)
+    print(f"\n[SAVED] Predictions saved to: {output_path}")
+
+    return predicted_scores
+
+
+def evaluate_single(model, src: str, mt: str):
+    """Evaluate a single sentence pair."""
+    samples = [{"src": src, "mt": mt}]
+    output = model.predict(samples, batch_size=1, gpus=1 if torch.cuda.is_available() else 0)
+
+    print(f"\n{'='*60}")
+    print(f"[RESULT] Single Sentence Evaluation")
+    print(f"{'='*60}")
+    print(f"  Source: {src}")
+    print(f"  MT:     {mt}")
+    print(f"  Score:  {output.scores[0]:.4f}")
+    print(f"{'='*60}")
+
+    return output.scores[0]
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Evaluate a COMET model")
+    parser.add_argument(
+        "--checkpoint", type=str, required=True, help="Path to the model checkpoint"
+    )
+    parser.add_argument(
+        "--model_type",
+        type=str,
+        choices=["referenceless", "unified"],
+        default="referenceless",
+        help="Model type",
+    )
+    parser.add_argument("--test_data", type=str, help="Path to the test CSV")
+    parser.add_argument("--src", type=str, help="Source sentence (single evaluation)")
+    parser.add_argument("--mt", type=str, help="Translated sentence (single evaluation)")
+    parser.add_argument(
+        "--gpus", type=int, default=1, help="Number of GPUs (parsed but not yet forwarded to predict)"
+    )
+
+    args = parser.parse_args()
+
+    # Load the model
+    print(f"[INFO] Loading model from: {args.checkpoint}")
+    model = load_model(args.checkpoint, args.model_type)
+
+    if args.test_data:
+        evaluate_csv(model, args.test_data)
+    elif args.src and args.mt:
+        evaluate_single(model, args.src, args.mt)
+    else:
+        print("[ERROR] Specify either --test_data or both --src and --mt.")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
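The rank metrics that `evaluate_csv` reports can be sanity-checked by hand on a few rows. The following is a minimal sketch, not part of the script: `pearson` and `kendall_tau_a` are hypothetical helper names, the scores are made-up illustration values, and the Kendall variant is tau-a (no tie correction), which should agree with scipy's default tau-b only when the data contain no ties.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up scores for illustration only (not taken from the dataset).
actual = [0.10, 0.40, 0.55, 0.80, 0.95]
predicted = [0.20, 0.35, 0.70, 0.60, 0.90]

tau = kendall_tau_a(actual, predicted)  # one swapped pair out of ten
r = pearson(actual, predicted)
```

Here exactly one of the ten pairs is ordered differently in the predictions, so `tau` comes out at 0.8. A large gap between Pearson and Kendall on real data is usually worth a look at the score distribution table that the script prints.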
diff --git a/scripts/finetune_lora.py b/scripts/finetune_lora.py
new file mode 100644
index 0000000..86622d8
--- /dev/null
+++ b/scripts/finetune_lora.py
@@ -0,0 +1,540 @@
+#!/usr/bin/env python3
+"""
+COMETKiwi LoRA fine-tuning script
+=================================
+
+Uses HuggingFace PEFT (LoRA) to train only low-rank adapters
+on the COMETKiwi encoder.
+
+Advantages over full fine-tuning:
+  - Trainable parameters drop to roughly 1% or less → lower overfitting risk
+  - Lower VRAM usage (runs even on a V100 32GB)
+  - Preserves COMETKiwi's pretrained knowledge (mitigates catastrophic forgetting)
+  - Stable training even on small datasets
+
+References:
+  - MTQE.en-he (arXiv:2602.06546): full FT actually degrades COMETKiwi,
+    while LoRA/BitFit achieve a +2~3pp improvement
+  - ComeTH (wasanx/ComeTH): fine-tuning COMETKiwi gives +4.9% on EN-TH
+
+Usage:
+    # LoRA fine-tuning
+    python scripts/finetune_lora.py \
+        --base_model Unbabel/wmt22-cometkiwi-da \
+        --train_data data/en-ko-qe/train.csv \
+        --val_data data/en-ko-qe/val.csv \
+        --output_dir outputs/cometkiwi-lora-en-ko \
+        --lora_rank 16 \
+        --epochs 3
+
+    # Head-only Fine-tuning (FTHead)
+    python scripts/finetune_lora.py \
+        --base_model Unbabel/wmt22-cometkiwi-da \
+        --train_data data/en-ko-qe/train.csv \
+        --val_data data/en-ko-qe/val.csv \
+        --output_dir outputs/cometkiwi-fthead-en-ko \
+        --mode fthead \
+        --epochs 5
+
+    # BitFit (bias terms only)
+    python scripts/finetune_lora.py \
+        --base_model Unbabel/wmt22-cometkiwi-da \
+        --train_data data/en-ko-qe/train.csv \
+        --val_data data/en-ko-qe/val.csv \
+        --output_dir outputs/cometkiwi-bitfit-en-ko \
+        --mode bitfit \
+        --epochs 5
+
+Requirements:
+    pip install "peft>=0.6.0"
+"""
+
+import argparse
+import logging
+import os
+import sys
+
+import numpy as np
+import pandas as pd
+import torch
+from scipy import stats
+from torch.utils.data import DataLoader, Dataset, RandomSampler
+from torch.utils.tensorboard import SummaryWriter
+
+logger = logging.getLogger(__name__)
+
+
+class QEDataset(Dataset):
+    """Reference-free QE dataset."""
+
+    def __init__(self, csv_path: str, max_rows: int = 0):
+        df = pd.read_csv(csv_path)
+        df = df[["src", "mt", "score"]].dropna()
+        df["src"] = df["src"].astype(str)
+        df["mt"] = df["mt"].astype(str)
+        df["score"] = df["score"].astype(float)
+
+        if max_rows > 0 and len(df) > max_rows:
+            df = df.sample(n=max_rows, random_state=42).reset_index(drop=True)
+
+        self.data = df.to_dict("records")
+        logger.info(f"Loaded {len(self.data)} samples from {csv_path}")
+
+    def __len__(self):
+        return len(self.data)
+
+    def __getitem__(self, idx):
+        return self.data[idx]
+
+
+def freeze_encoder_keep_bias(model):
+    """BitFit: train only bias terms, plus the head and layer-wise attention."""
+    trainable_count = 0
+    frozen_count = 0
+
+    for name, param in model.named_parameters():
+        if "estimator" in name:
+            # The head is always trained
+            param.requires_grad = True
+            trainable_count += param.numel()
+        elif "bias" in name:
+            # Train bias terms only
+            param.requires_grad = True
+            trainable_count += param.numel()
+        elif "layerwise_attention" in name:
+            # Also train the layer-wise attention weights
+            param.requires_grad = True
+            trainable_count += param.numel()
+        else:
+            param.requires_grad = False
+            frozen_count += param.numel()
+
+    total = trainable_count + frozen_count
+    logger.info(
+        f"BitFit: {trainable_count:,} trainable ({trainable_count/total*100:.2f}%) / "
+        f"{frozen_count:,} frozen ({frozen_count/total*100:.2f}%)"
+    )
+
+
+def freeze_encoder_head_only(model):
+    """FTHead: train only the head (estimator) and layerwise_attention."""
+    trainable_count = 0
+    frozen_count = 0
+
+    for name, param in model.named_parameters():
+        if "estimator" in name or "layerwise_attention" in name:
+            param.requires_grad = True
+            trainable_count += param.numel()
+        else:
+            param.requires_grad = False
+            frozen_count += param.numel()
+
+    total = trainable_count + frozen_count
+    logger.info(
+        f"FTHead: {trainable_count:,} trainable ({trainable_count/total*100:.2f}%) / "
+        f"{frozen_count:,} frozen ({frozen_count/total*100:.2f}%)"
+    )
+
+
+def apply_lora(model, rank: int = 16, alpha: int = 32, target_modules: list = None):
+    """Apply LoRA adapters to the encoder."""
+    try:
+        from peft import LoraConfig, get_peft_model
+    except ImportError:
+        logger.error('The peft package is required: pip install "peft>=0.6.0"')
+        sys.exit(1)
+
+    if target_modules is None:
+        # Attention + FFN layers of XLM-RoBERTa / InfoXLM
+        target_modules = [
+            "query", "key", "value",  # self-attention
+            "dense",  # output projection + FFN
+        ]
+
+    # LoRA is applied to encoder.model
+    encoder_model = model.encoder.model
+
+    lora_config = LoraConfig(
+        r=rank,
+        lora_alpha=alpha,
+        target_modules=target_modules,
+        lora_dropout=0.1,
+        bias="none",
+        task_type=None,  # custom task
+    )
+
+    # Apply PEFT
+    model.encoder.model = get_peft_model(encoder_model, lora_config)
+
+    # Keep the head and layerwise_attention trainable
+    for name, param in model.named_parameters():
+        if "estimator" in name or "layerwise_attention" in name:
+            param.requires_grad = True
+
+    # Log parameter statistics
+    trainable_count = sum(p.numel() for p in model.parameters() if p.requires_grad)
+    total_count = sum(p.numel() for p in model.parameters())
+    logger.info(
+        f"LoRA (rank={rank}, alpha={alpha}): "
+        f"{trainable_count:,} trainable ({trainable_count/total_count*100:.2f}%) / "
+        f"{total_count:,} total"
+    )
+
+    return model
+
+
+def _move_inputs_to_device(batch_input, device):
+    """Move batch_input to device. Handles both a tuple of dicts and a single dict."""
+    if isinstance(batch_input, (tuple, list)):
+        # UnifiedMetric: tuple of dicts (one per forward pass)
+        return tuple(
+            {k: v.to(device) for k, v in inp.items()} for inp in batch_input
+        )
+    else:
+        # ReferencelessRegression: a single dict
+        return {k: v.to(device) for k, v in batch_input.items()}
+
+
+def train_epoch(model, dataloader, optimizer, device, epoch,
+                writer=None, global_step=0, log_interval=100,
+                val_loader=None, eval_fn=None, eval_interval=0):
+    """Train for one epoch.
+
+    Args:
+        eval_interval: run a mid-epoch validation every N steps (0 = disabled)
+    """
+    model.train()
+    total_loss = 0
+    num_batches = 0
+
+    for batch_idx, batch in enumerate(dataloader):
+        model_inputs, targets = batch
+        model_inputs = _move_inputs_to_device(model_inputs, device)
+        targets_score = targets["score"].to(device)
+
+        optimizer.zero_grad()
+
+        # UnifiedMetric yields a tuple of dicts (several forward passes);
+        # ReferencelessRegression yields a single dict
+        if isinstance(model_inputs, (tuple, list)):
+            # UnifiedMetric path: one forward pass per input sequence
+            loss = torch.tensor(0.0, device=device)
+            for input_seq in model_inputs:
+                prediction = model(**input_seq)
+                loss = loss + torch.nn.MSELoss()(prediction.score, targets_score)
+        else:
+            prediction = model(**model_inputs)
+            loss = torch.nn.MSELoss()(prediction.score, targets_score)
+
+        loss.backward()
+
+        # Clip gradients; clip_grad_norm_ returns the total norm measured before clipping
+        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
+        optimizer.step()
+
+        total_loss += loss.item()
+        num_batches += 1
+        global_step += 1
+
+        # TensorBoard: per-step logging
+        if writer is not None:
+            writer.add_scalar("train/step_loss", loss.item(), global_step)
+            writer.add_scalar("train/grad_norm", grad_norm.item(), global_step)
+            writer.add_scalar("train/lr", optimizer.param_groups[0]["lr"], global_step)
+
+        if (batch_idx + 1) % log_interval == 0:
+            avg_loss = total_loss / num_batches
+            logger.info(f"  Epoch {epoch} [{batch_idx+1}/{len(dataloader)}] "
+                        f"loss={avg_loss:.6f} grad_norm={grad_norm:.4f}")
+
+        # Mid-epoch validation
+        if eval_interval > 0 and val_loader is not None and eval_fn is not None:
+            if (batch_idx + 1) % eval_interval == 0:
+                logger.info(f"  [Mid-epoch validation at step {global_step}]")
+                mid_metrics = eval_fn(model, val_loader, device)
+                logger.info(f"    Pearson={mid_metrics['pearson']:.4f} "
+                            f"Spearman={mid_metrics['spearman']:.4f} "
+                            f"Kendall={mid_metrics['kendall']:.4f} "
+                            f"MSE={mid_metrics['mse']:.6f}")
+                if writer is not None:
+                    writer.add_scalar("val_mid/pearson", mid_metrics["pearson"], global_step)
+                    writer.add_scalar("val_mid/spearman", mid_metrics["spearman"], global_step)
+                    writer.add_scalar("val_mid/kendall", mid_metrics["kendall"], global_step)
+                    writer.add_scalar("val_mid/mse", mid_metrics["mse"], global_step)
+                model.train()  # restore train mode, since evaluate() switches to eval()
+
+    return total_loss / max(num_batches, 1), global_step
+
+
+def evaluate(model, dataloader, device):
+    """Evaluate on the validation data."""
+    model.eval()
+    all_preds = []
+    all_targets = []
+
+    with torch.no_grad():
+        for batch in dataloader:
+            model_inputs, targets = batch
+            model_inputs = _move_inputs_to_device(model_inputs, device)
+
+            # UnifiedMetric: use the prediction from the last forward pass
+            if isinstance(model_inputs, (tuple, list)):
+                prediction = None
+                for input_seq in model_inputs:
+                    prediction = model(**input_seq)
+            else:
+                prediction = model(**model_inputs)
+
+            all_preds.extend(prediction.score.cpu().tolist())
+            all_targets.extend(targets["score"].tolist())
+
+    preds = np.array(all_preds)
+    targets = np.array(all_targets)
+
+    pearson_r, _ = stats.pearsonr(targets, preds)
+    spearman_r, _ = stats.spearmanr(targets, preds)
+    kendall_tau, _ = stats.kendalltau(targets, preds)
+    mse = np.mean((targets - preds) ** 2)
+    mae = np.mean(np.abs(targets - preds))
+
+    return {
+        "pearson": pearson_r,
+        "spearman": spearman_r,
+        "kendall": kendall_tau,
+        "mse": mse,
+        "mae": mae,
+        "preds": preds,
+        "targets": targets,
+    }
+
+
+def log_epoch_metrics(writer, metrics, train_loss, epoch):
+    """Per-epoch TensorBoard logging (all metrics)."""
+    preds = metrics["preds"]
+    targets = metrics["targets"]
+
+    # --- 1. Core performance metrics ---
+    writer.add_scalar("train/epoch_loss", train_loss, epoch)
+    writer.add_scalar("val/pearson", metrics["pearson"], epoch)
+    writer.add_scalar("val/spearman", metrics["spearman"], epoch)
+    writer.add_scalar("val/kendall", metrics["kendall"], epoch)
+    writer.add_scalar("val/mse", metrics["mse"], epoch)
+    writer.add_scalar("val/mae", metrics["mae"], epoch)
+
+    # --- 2. Distribution collapse detection ---
+    # A narrowing prediction distribution means the model predicts only a limited score range (collapse)
+    pred_std = float(np.std(preds))
+    target_std = float(np.std(targets))
+    writer.add_scalar("collapse/pred_std", pred_std, epoch)
+    writer.add_scalar("collapse/target_std", target_std, epoch)
+    writer.add_scalar("collapse/std_ratio", pred_std / (target_std + 1e-8), epoch)
+    writer.add_scalar("collapse/pred_range",
+                      float(np.max(preds) - np.min(preds)), epoch)
+    writer.add_scalar("collapse/pred_iqr",
+                      float(np.percentile(preds, 75) - np.percentile(preds, 25)), epoch)
+
+    # --- 3. Score bias detection ---
+    # A predicted mean far from the target mean indicates a bias toward high or low scores
+    writer.add_scalar("bias/pred_mean", float(np.mean(preds)), epoch)
+    writer.add_scalar("bias/target_mean", float(np.mean(targets)), epoch)
+    writer.add_scalar("bias/mean_diff",
+                      float(np.mean(preds) - np.mean(targets)), epoch)
+    writer.add_scalar("bias/pred_skewness",
+                      float(pd.Series(preds).skew()), epoch)
+
+    # --- 4. Prediction distribution histograms ---
+    writer.add_histogram("distribution/predictions", preds, epoch)
+    writer.add_histogram("distribution/targets", targets, epoch)
+    writer.add_histogram("distribution/errors", preds - targets, epoch)
+
+    # --- 5. Quantile tracking ---
+    writer.add_scalar("quantile/pred_q25", float(np.percentile(preds, 25)), epoch)
+    writer.add_scalar("quantile/pred_q50", float(np.percentile(preds, 50)), epoch)
+    writer.add_scalar("quantile/pred_q75", float(np.percentile(preds, 75)), epoch)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="COMETKiwi LoRA/BitFit/FTHead Fine-tuning")
+    parser.add_argument("--base_model", type=str, default="Unbabel/wmt22-cometkiwi-da",
+                        help="Base COMET model name or checkpoint path")
+    parser.add_argument("--train_data", type=str, required=True, help="Training CSV path")
+    parser.add_argument("--val_data", type=str, required=True, help="Validation CSV path")
+    parser.add_argument("--output_dir", type=str, default="outputs/cometkiwi-lora",
+                        help="Output directory for checkpoints")
+    parser.add_argument("--mode", type=str, default="lora",
+                        choices=["lora", "bitfit", "fthead"],
+                        help="Fine-tuning mode")
+    parser.add_argument("--lora_rank", type=int, default=16, help="LoRA rank")
+    parser.add_argument("--lora_alpha", type=int, default=32, help="LoRA alpha")
+    parser.add_argument("--learning_rate", type=float, default=1e-4,
+                        help="Learning rate (PEFT methods tolerate a higher LR than full fine-tuning)")
+    parser.add_argument("--batch_size", type=int, default=16, help="Batch size")
+    parser.add_argument("--epochs", type=int, default=3, help="Number of epochs")
+    parser.add_argument("--max_train_rows", type=int, default=0,
+                        help="Max training rows (0=all)")
+    parser.add_argument("--eval_interval", type=int, default=0,
+                        help="Mid-epoch validation interval in steps (0 = evaluate only at epoch end)")
+    parser.add_argument("--seed", type=int, default=42, help="Random seed")
+
+    args = parser.parse_args()
+
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(asctime)s [%(levelname)s] %(message)s",
+    )
+
+    # Seed
+    torch.manual_seed(args.seed)
+    np.random.seed(args.seed)
+
+    os.makedirs(args.output_dir, exist_ok=True)
+
+    # ========================================
+    # 1. Load the model
+    # ========================================
+    logger.info(f"Loading base model: {args.base_model}")
+
+    if os.path.exists(args.base_model):
+        # Local checkpoint
+        from comet import load_from_checkpoint
+        model = load_from_checkpoint(args.base_model)
+    else:
+        # HuggingFace model
+        from comet import download_model, load_from_checkpoint
+        model_path = download_model(args.base_model)
+        model = load_from_checkpoint(model_path)
+
+    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    logger.info(f"Device: {device}")
+
+    # ========================================
+    # 2. Apply the fine-tuning method
+    # ========================================
+    if args.mode == "lora":
+        logger.info(f"Applying LoRA (rank={args.lora_rank}, alpha={args.lora_alpha})")
+        model = apply_lora(model, rank=args.lora_rank, alpha=args.lora_alpha)
+    elif args.mode == "bitfit":
+        logger.info("Applying BitFit (bias-only fine-tuning)")
+        freeze_encoder_keep_bias(model)
+    elif args.mode == "fthead":
+        logger.info("Applying FTHead (head-only fine-tuning)")
+        freeze_encoder_head_only(model)
+
+    model = model.to(device)
+
+    # ========================================
+    # 3. Load the data
+    # ========================================
+    train_dataset = QEDataset(args.train_data, max_rows=args.max_train_rows)
+    val_dataset = QEDataset(args.val_data, max_rows=10000)  # validation is capped at 10k rows
+
+    def collate_fn(batch):
+        return model.prepare_sample(batch, stage="fit")
+
+    def collate_fn_val(batch):
+        return model.prepare_sample(batch, stage="validate")
+
+    train_loader = DataLoader(
+        train_dataset, batch_size=args.batch_size,
+        sampler=RandomSampler(train_dataset),
+        collate_fn=collate_fn, num_workers=2,
+    )
+    val_loader = DataLoader(
+        val_dataset, batch_size=args.batch_size,
+        collate_fn=collate_fn_val, num_workers=2,
+    )
+
+    # ========================================
+    # 4. Optimizer
+    # ========================================
+    trainable_params = [p for p in model.parameters() if p.requires_grad]
+    optimizer = torch.optim.AdamW(trainable_params, lr=args.learning_rate, weight_decay=0.01)
+
+    logger.info(f"Trainable parameters: {sum(p.numel() for p in trainable_params):,}")
+    logger.info(f"Training samples: {len(train_dataset)}")
+    logger.info(f"Validation samples: {len(val_dataset)}")
+    logger.info(f"Epochs: {args.epochs}, LR: {args.learning_rate}")
+    if args.eval_interval > 0:
+        logger.info(f"Mid-epoch validation every {args.eval_interval} steps")
+
+    # ========================================
+    # 5. Initialize TensorBoard
+    # ========================================
+    tb_log_dir = os.path.join(args.output_dir, "tensorboard")
+    writer = SummaryWriter(log_dir=tb_log_dir)
+    logger.info(f"TensorBoard log dir: {tb_log_dir}")
+    logger.info(f"  -> tensorboard --logdir {tb_log_dir}")
+
+    # Record hyperparameters
+    writer.add_text("hparams", (
+        f"mode={args.mode}, base_model={args.base_model}, "
+        f"lr={args.learning_rate}, batch_size={args.batch_size}, "
+        f"epochs={args.epochs}, lora_rank={args.lora_rank}, lora_alpha={args.lora_alpha}, "
+        f"train_samples={len(train_dataset)}, val_samples={len(val_dataset)}, "
+        f"trainable_params={sum(p.numel() for p in trainable_params):,}"
+    ))
+
+    # ========================================
+    # 6. Training loop
+    # ========================================
+    best_kendall = float("-inf")
+    best_epoch = -1
+    global_step = 0
+
+    for epoch in range(args.epochs):
+        logger.info(f"\n{'='*60}")
+        logger.info(f"Epoch {epoch + 1}/{args.epochs}")
+        logger.info(f"{'='*60}")
+
+        train_loss, global_step = train_epoch(
+            model, train_loader, optimizer, device, epoch + 1,
+            writer=writer, global_step=global_step,
+            val_loader=val_loader, eval_fn=evaluate,
+            eval_interval=args.eval_interval,
+        )
+        logger.info(f"  Train loss: {train_loss:.6f}")
+
+        metrics = evaluate(model, val_loader, device)
+        logger.info(f"  Val Pearson:  {metrics['pearson']:.4f}")
+        logger.info(f"  Val Spearman: {metrics['spearman']:.4f}")
+        logger.info(f"  Val Kendall:  {metrics['kendall']:.4f}")
+        logger.info(f"  Val MSE:      {metrics['mse']:.6f}")
+        logger.info(f"  Val MAE:      {metrics['mae']:.6f}")
+
+        # TensorBoard: full per-epoch logging
+        log_epoch_metrics(writer, metrics, train_loss, epoch + 1)
+
+        # Save the best checkpoint
+        if metrics["kendall"] > best_kendall:
+            best_kendall = metrics["kendall"]
+            best_epoch = epoch + 1
+            save_path = os.path.join(args.output_dir, "best_model.ckpt")
+
+            # Saved as state_dict + hyper_parameters; in LoRA mode the keys carry PEFT prefixes, so load_from_checkpoint may need merged adapters
+            torch.save({
+                "state_dict": model.state_dict(),
+                "hyper_parameters": dict(model.hparams),
+            }, save_path)
+            logger.info(f"  -> Best model saved: {save_path} (kendall={best_kendall:.4f})")
+
+        # Per-epoch checkpoint
+        epoch_path = os.path.join(args.output_dir, f"epoch{epoch+1}_kendall{metrics['kendall']:.4f}.ckpt")
+        torch.save({
+            "state_dict": model.state_dict(),
+            "hyper_parameters": dict(model.hparams),
+        }, epoch_path)
+
+    # Close TensorBoard
+    writer.flush()
+    writer.close()
+    logger.info(f"\nTensorBoard logs saved to: {tb_log_dir}")
+
+    logger.info(f"\n{'='*60}")
+    logger.info(f"Training complete!")
+    logger.info(f"Best epoch: {best_epoch}, Best Kendall: {best_kendall:.4f}")
+    logger.info(f"Best model: {os.path.join(args.output_dir, 'best_model.ckpt')}")
+    logger.info(f"{'='*60}")
+
+
+if __name__ == "__main__":
+    main()
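The name-based rules in `freeze_encoder_keep_bias` and `freeze_encoder_head_only` can be checked without loading a model. The sketch below is an illustration only: `is_trainable` and `count_trainable` are hypothetical helpers that mirror the substring tests in the script, and the parameter names and sizes in `toy_params` are invented, not read from a real COMETKiwi checkpoint.

```python
def is_trainable(name: str, mode: str) -> bool:
    """Mirror the substring rules used by the BitFit and FTHead helpers."""
    if "estimator" in name or "layerwise_attention" in name:
        return True  # head and layer-wise attention train in both modes
    if mode == "bitfit":
        return "bias" in name  # BitFit additionally unfreezes bias terms
    return False  # everything else stays frozen


# Hypothetical parameter names and element counts, for illustration only.
toy_params = {
    "encoder.model.encoder.layer.0.attention.self.query.weight": 1024 * 1024,
    "encoder.model.encoder.layer.0.attention.self.query.bias": 1024,
    "layerwise_attention.scalar_parameters.0": 1,
    "estimator.ff.0.weight": 2048 * 1024,
    "estimator.ff.0.bias": 2048,
}


def count_trainable(params: dict, mode: str) -> int:
    """Sum the element counts of parameters the given mode would train."""
    return sum(size for name, size in params.items() if is_trainable(name, mode))


fthead_count = count_trainable(toy_params, "fthead")
bitfit_count = count_trainable(toy_params, "bitfit")
```

In this toy layout the two modes differ only by the 1,024 encoder bias elements. Note that the `"bias"` substring test also matches LayerNorm biases, so on a real model BitFit unfreezes those as well.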
diff --git a/scripts/prepare_data.py b/scripts/prepare_data.py
new file mode 100644
index 0000000..1eae9ef
--- /dev/null
+++ b/scripts/prepare_data.py
@@ -0,0 +1,239 @@
+#!/usr/bin/env python3
+"""
+Data preprocessing script for COMET reference-free training
+===========================================================
+
+Converts the raw training data into the (src, mt, score) format used for COMET training.
+The same output files can be used by every approach
+(ReferencelessRegression, UnifiedMetric, COMETKiwi, etc.).
+
+Usage:
+    # Basic conversion (pointwise data only)
+    python scripts/prepare_data.py \
+        --input_dir /path/to/train_data \
+        --output_dir data/en-ko-qe
+
+    # Also merge in the pairwise data
+    python scripts/prepare_data.py \
+        --input_dir /path/to/train_data \
+        --output_dir data/en-ko-qe \
+        --include_pairwise
+
+    # Quick experiments (capped at 1 million rows)
+    python scripts/prepare_data.py \
+        --input_dir /path/to/train_data \
+        --output_dir data/en-ko-qe \
+        --max_train_rows 1000000
+
+Required input files:
+    - en-ko-qe-patent-balanced_train.csv       (pointwise training data)
+    - en-ko-qe-patent-balanced_val.csv         (pointwise validation data)
+    - en-ko-qe-patent-balanced_pairwise_train.csv  (pairwise, with --include_pairwise)
+    - en-ko-qe-patent-balanced_pairwise_val.csv    (pairwise, with --include_pairwise)
+
+Output files:
+    - train.csv           # training data (src, mt, score)
+    - val.csv             # validation data (src, mt, score)
+    - mini_train.csv      # pipeline smoke test (1000 rows)
+    - mini_val.csv        # pipeline smoke test (200 rows)
+"""
+
+import argparse
+import os
+import sys
+
+import numpy as np
+import pandas as pd
+
+
+def load_csv(filepath: str) -> pd.DataFrame:
+    """Load a CSV file."""
+    print(f"[INFO] Loading: {filepath}")
+    df = pd.read_csv(filepath)
+    print(f"  -> {len(df):,} rows, columns: {list(df.columns)}")
+    return df
+
+
+def extract_src_mt_score(df: pd.DataFrame) -> pd.DataFrame:
+    """Extract and clean only the (src, mt, score) columns."""
+    required_cols = ["src", "mt", "score"]
+    for col in required_cols:
+        if col not in df.columns:
+            raise ValueError(f"Missing required column: {col}")
+
+    # Drop missing values before casting; astype(str) would turn NaN into "nan"
+    result = df[required_cols].dropna(subset=required_cols).copy()
+    result["src"] = result["src"].astype(str)
+    result["mt"] = result["mt"].astype(str)
+    result["score"] = result["score"].astype(float)
+
+    result = result[result["src"].str.strip() != ""]
+    result = result[result["mt"].str.strip() != ""]
+
+    print(f"  -> Cleaned: {len(result):,} rows")
+    return result
+
+
+def expand_pairwise(df: pd.DataFrame) -> pd.DataFrame:
+    """
+    Pairwise → pointwise conversion
+    (src, mt_good, mt_bad, score_good, score_bad) → (src, mt, score) x 2
+    """
+    good = df[["src", "mt_good", "score_good"]].rename(
+        columns={"mt_good": "mt", "score_good": "score"}
+    )
+    bad = df[["src", "mt_bad", "score_bad"]].rename(
+        columns={"mt_bad": "mt", "score_bad": "score"}
+    )
+
+    # Drop missing values before casting; astype(str) would turn NaN into "nan"
+    result = pd.concat([good, bad], ignore_index=True).dropna(subset=["src", "mt", "score"])
+    result["src"] = result["src"].astype(str)
+    result["mt"] = result["mt"].astype(str)
+    result["score"] = result["score"].astype(float)
+
+    result = result[result["src"].str.strip() != ""]
+    result = result[result["mt"].str.strip() != ""]
+
+    before = len(result)
+    result = result.drop_duplicates(subset=["src", "mt"], keep="first")
+    print(f"  -> Pairwise expanded: {before:,} -> {len(result):,} rows (after dedup)")
+    return result
+
+
+def stratified_sample(df: pd.DataFrame, max_rows: int, seed: int = 42) -> pd.DataFrame:
+    """Sample evenly across score bins."""
+    if len(df) <= max_rows:
+        return df
+
+    print(f"  -> Sampling {max_rows:,} from {len(df):,} rows (stratified)")
+    np.random.seed(seed)
+
+    bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
+    labels = ["0.0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1.0"]
+    df = df.copy()
+    df["_bin"] = pd.cut(df["score"], bins=bins, labels=labels, include_lowest=True)
+
+    per_bin = max_rows // len(labels)
+    sampled = df.groupby("_bin", group_keys=False).apply(
+        lambda x: x.sample(n=min(len(x), per_bin), random_state=seed)
+    )
+
+    remaining = max_rows - len(sampled)
+    if remaining > 0:
+        not_sampled = df.drop(sampled.index)
+        extra = not_sampled.sample(n=min(remaining, len(not_sampled)), random_state=seed)
+        sampled = pd.concat([sampled, extra])
+
+    return sampled.drop(columns=["_bin"]).reset_index(drop=True)
+
+
+def print_stats(df: pd.DataFrame, name: str) -> None:
+    """Print dataset statistics."""
+    scores = df["score"]
+    print(f"\n{'='*60}")
+    print(f"  {name} ({len(df):,} rows)")
+    print(f"{'='*60}")
+    print(f"  mean={scores.mean():.4f}  std={scores.std():.4f}  "
+          f"median={scores.median():.4f}  min={scores.min():.4f}  max={scores.max():.4f}")
+
+    bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
+    labels = ["0.0-0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "0.8-1.0"]
+    binned = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)
+    for label in labels:
+        count = (binned == label).sum()
+        pct = count / len(df) * 100
+        print(f"    {label}: {count:>10,} ({pct:>5.1f}%)")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Preprocess COMET training data")
+    parser.add_argument("--input_dir", type=str, required=True,
+                        help="Directory containing the raw data")
+    parser.add_argument("--output_dir", type=str, default="data/en-ko-qe",
+                        help="Output directory")
+    parser.add_argument("--max_train_rows", type=int, default=0,
+                        help="Maximum number of training rows (0 = all)")
+    parser.add_argument("--include_pairwise", action="store_true",
+                        help="Convert the pairwise data and merge it in")
+    parser.add_argument("--seed", type=int, default=42, help="Random seed")
+    args = parser.parse_args()
+
+    os.makedirs(args.output_dir, exist_ok=True)
+
+    # ========================================
+    # 1. Load pointwise data
+    # ========================================
+    train_path = os.path.join(args.input_dir, "en-ko-qe-patent-balanced_train.csv")
+    val_path = os.path.join(args.input_dir, "en-ko-qe-patent-balanced_val.csv")
+
+    for path in [train_path, val_path]:
+        if not os.path.exists(path):
+            print(f"[ERROR] File not found: {path}")
+            sys.exit(1)
+
+    train_df = extract_src_mt_score(load_csv(train_path))
+    val_df = extract_src_mt_score(load_csv(val_path))
+
+    # ========================================
+    # 2. Merge pairwise data (optional)
+    # ========================================
+    if args.include_pairwise:
+        pw_train_path = os.path.join(
+            args.input_dir, "en-ko-qe-patent-balanced_pairwise_train.csv"
+        )
+        pw_val_path = os.path.join(
+            args.input_dir, "en-ko-qe-patent-balanced_pairwise_val.csv"
+        )
+
+        if os.path.exists(pw_train_path):
+            pw_train = expand_pairwise(load_csv(pw_train_path))
+            train_df = pd.concat([train_df, pw_train], ignore_index=True)
+            train_df = train_df.drop_duplicates(subset=["src", "mt"], keep="first")
+            print(f"  -> Combined train: {len(train_df):,} rows")
+        else:
+            print(f"[WARN] Not found: {pw_train_path}")
+
+        if os.path.exists(pw_val_path):
+            pw_val = expand_pairwise(load_csv(pw_val_path))
+            val_df = pd.concat([val_df, pw_val], ignore_index=True)
+            val_df = val_df.drop_duplicates(subset=["src", "mt"], keep="first")
+            print(f"  -> Combined val: {len(val_df):,} rows")
+        else:
+            print(f"[WARN] Not found: {pw_val_path}")
+
+    # ========================================
+    # 3. Sampling (optional)
+    # ========================================
+    if args.max_train_rows > 0:
+        train_df = stratified_sample(train_df, args.max_train_rows, args.seed)
+
+    # ========================================
+    # 4. Save
+    # ========================================
+    train_out = os.path.join(args.output_dir, "train.csv")
+    val_out = os.path.join(args.output_dir, "val.csv")
+    train_df.to_csv(train_out, index=False)
+    val_df.to_csv(val_out, index=False)
+    print(f"\n[SAVED] train: {train_out} ({len(train_df):,} rows)")
+    print(f"[SAVED] val:   {val_out} ({len(val_df):,} rows)")
+
+    # Mini dataset
+    mini_train = stratified_sample(train_df, 1000, args.seed)
+    mini_val = stratified_sample(val_df, 200, args.seed)
+    mini_train.to_csv(os.path.join(args.output_dir, "mini_train.csv"), index=False)
+    mini_val.to_csv(os.path.join(args.output_dir, "mini_val.csv"), index=False)
+    print(f"[SAVED] mini_train: {len(mini_train):,} rows")
+    print(f"[SAVED] mini_val:   {len(mini_val):,} rows")
+
+    # ========================================
+    # 5. Print statistics
+    # ========================================
+    print_stats(train_df, "Train")
+    print_stats(val_df, "Val")
+
+    print("\n[DONE] Data preparation complete!")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/run_training.sh b/scripts/run_training.sh
new file mode 100755
index 0000000..c1d5bee
--- /dev/null
+++ b/scripts/run_training.sh
@@ -0,0 +1,171 @@
+#!/bin/bash
+# ============================================================
+# COMET reference-free training launcher
+# ============================================================
+# Usage:
+#   bash scripts/run_training.sh [approach] [options]
+#
+# approach:
+#   mini     - mini test (pipeline sanity check; default)
+#   scratch1 - Approach 1: ReferencelessRegression from scratch
+#   scratch2 - Approach 2: UnifiedMetric QE from scratch
+#   finetune - Approach 3: COMETKiwi fine-tuning (recommended; requires --checkpoint)
+#   ft-qe    - Approach 4: ReferencelessRegression fine-tuning (requires --checkpoint)
+#
+# options:
+#   --seed N           : random seed (default: 12)
+#   --checkpoint PATH  : checkpoint path for fine-tuning
+#   --gpus N           : number of GPUs (default: 1; only echoed, not passed to comet-train)
+#
+# Examples:
+#   bash scripts/run_training.sh mini
+#   bash scripts/run_training.sh scratch1 --seed 42
+#   bash scripts/run_training.sh finetune --checkpoint /path/to/model.ckpt
+# ============================================================
+
+set -e
+
+APPROACH=${1:-"mini"}
+shift || true
+
+# Defaults
+SEED=12
+CHECKPOINT=""
+GPUS=1
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --seed)
+            SEED="$2"
+            shift 2
+            ;;
+        --checkpoint)
+            CHECKPOINT="$2"
+            shift 2
+            ;;
+        --gpus)
+            GPUS="$2"
+            shift 2
+            ;;
+        *)
+            echo "[ERROR] Unknown argument: $1"
+            exit 1
+            ;;
+    esac
+done
+
+# Change to the project root
+cd "$(dirname "$0")/.."
+
+echo "============================================================"
+echo "COMET reference-free training"
+echo "============================================================"
+echo "  Approach:    $APPROACH"
+echo "  Random seed: $SEED"
+echo "  GPUs:        $GPUS"
+echo "  Checkpoint:  ${CHECKPOINT:-none (from scratch)}"
+echo "============================================================"
+
+# Select the config file
+case $APPROACH in
+    mini)
+        CONFIG="configs/models/en-ko-qe/approach_mini_test.yaml"
+        echo "[INFO] Running mini test (pipeline check)"
+        ;;
+    scratch1)
+        CONFIG="configs/models/en-ko-qe/approach1_referenceless_scratch.yaml"
+        echo "[INFO] Approach 1: ReferencelessRegression from scratch"
+        ;;
+    scratch2)
+        CONFIG="configs/models/en-ko-qe/approach2_unified_qe_scratch.yaml"
+        echo "[INFO] Approach 2: UnifiedMetric QE from scratch"
+        ;;
+    finetune)
+        CONFIG="configs/models/en-ko-qe/approach3_finetune_cometkiwi.yaml"
+        echo "[INFO] Approach 3: COMETKiwi fine-tuning"
+        if [ -z "$CHECKPOINT" ]; then
+            echo "[ERROR] Fine-tuning requires the --checkpoint option."
+            echo "  Download a checkpoint first:"
+            echo "  python scripts/download_checkpoint.py --model Unbabel/wmt22-cometkiwi-da"
+            exit 1
+        fi
+        ;;
+    ft-qe)
+        CONFIG="configs/models/en-ko-qe/approach4_referenceless_finetune_qe.yaml"
+        echo "[INFO] Approach 4: ReferencelessRegression fine-tuning"
+        if [ -z "$CHECKPOINT" ]; then
+            echo "[ERROR] Fine-tuning requires the --checkpoint option."
+            exit 1
+        fi
+        ;;
+    *)
+        echo "[ERROR] Unknown approach: $APPROACH"
+        echo "  Available: mini, scratch1, scratch2, finetune, ft-qe"
+        exit 1
+        ;;
+esac
+
+# Verify the config file exists
+if [ ! -f "$CONFIG" ]; then
+    echo "[ERROR] Config file not found: $CONFIG"
+    exit 1
+fi
+
+# Verify the data files exist
+echo ""
+echo "[CHECK] Verifying data files..."
+if [ "$APPROACH" == "mini" ]; then
+    TRAIN_FILE="data/en-ko-qe/mini_train.csv"
+    VAL_FILE="data/en-ko-qe/mini_val.csv"
+else
+    TRAIN_FILE="data/en-ko-qe/train.csv"
+    VAL_FILE="data/en-ko-qe/val.csv"
+fi
+
+if [ ! -f "$TRAIN_FILE" ]; then
+    echo "[ERROR] Training data not found: $TRAIN_FILE"
+    echo "  Run data preprocessing first:"
+    echo "  python scripts/prepare_data.py --input_dir /path/to/train_data --output_dir data/en-ko-qe"
+    exit 1
+fi
+if [ ! -f "$VAL_FILE" ]; then
+    echo "[ERROR] Validation data not found: $VAL_FILE"
+    exit 1
+fi
+
+TRAIN_LINES=$(wc -l < "$TRAIN_FILE")
+VAL_LINES=$(wc -l < "$VAL_FILE")
+echo "  Train: $TRAIN_FILE ($((TRAIN_LINES - 1)) samples)"
+echo "  Val:   $VAL_FILE ($((VAL_LINES - 1)) samples)"
+
+# Check GPU status
+echo ""
+echo "[CHECK] GPU status..."
+if command -v nvidia-smi &> /dev/null; then
+    nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
+else
+    echo "  nvidia-smi not available"
+fi
+
+# Build the training command as an array so paths with spaces survive intact
+CMD=(comet-train --cfg "$CONFIG" --seed_everything "$SEED")
+
+if [ -n "$CHECKPOINT" ]; then
+    CMD+=(--load_from_checkpoint "$CHECKPOINT")
+fi
+
+echo ""
+echo "[RUN] ${CMD[*]}"
+echo "============================================================"
+echo ""
+
+# Run training
+"${CMD[@]}"
+
+echo ""
+echo "============================================================"
+echo "[DONE] Training completed!"
+echo "  Checkpoints: lightning_logs/"
+echo "  Evaluate with: python scripts/evaluate_model.py --checkpoint <path>"
+echo "============================================================"