
Conversation

@taeungshin

No description provided.

claude and others added 14 commits February 8, 2026 23:43
- TRAINING_GUIDE_KO.md: A-Z guide for training reference-free QE models
  covering 4 approaches (ReferencelessRegression scratch, UnifiedMetric QE
  scratch, COMETKiwi fine-tuning, QE model fine-tuning)
- scripts/prepare_data.py: Data preprocessing for EN-KO patent QE data
- scripts/run_training.sh: Training execution wrapper script
- scripts/download_checkpoint.py: Pretrained checkpoint downloader
- scripts/evaluate_model.py: Model evaluation with correlation metrics
- configs/models/en-ko-qe/: Training configs for all 4 approaches + mini test
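For orientation, a minimal sketch of how a reference-free QE model trained with this setup would be queried through the unbabel-comet API; the checkpoint name and the sample sentence pair are illustrative, not outputs of this PR:

```python
from comet import download_model, load_from_checkpoint

# COMETKiwi checkpoints on HuggingFace are gated; authentication is required.
model_path = download_model("Unbabel/wmt22-cometkiwi-da")
model = load_from_checkpoint(model_path)

# QE input is (src, mt) only -- no reference translation.
data = [
    {"src": "The claimed invention relates to a battery module.",
     "mt": "청구된 발명은 배터리 모듈에 관한 것이다."},
]
output = model.predict(data, batch_size=8, gpus=1)
print(output.scores)        # per-segment quality scores
print(output.system_score)  # corpus-level average
```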

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
Major updates based on research of wasanx/ComeTH and MTQE.en-he paper:

- Compare our approach with ComeTH (COMETKiwi EN-TH fine-tune, +4.9%)
- Add MTQE.en-he findings: full fine-tuning degrades COMETKiwi with
  small data, but our 9.6M samples mitigate this risk
- Add approach 5 (FTHead) and approach 6 (LoRA) as safer alternatives
- scripts/finetune_lora.py: LoRA/BitFit/FTHead fine-tuning script
- Updated references with 7 new papers and HuggingFace models
- Revised recommendation order: FTHead first, then Full FT, then LoRA
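A minimal sketch of the LoRA approach (approach 6), assuming a peft wrapper around the XLM-R encoder that backs COMETKiwi; the rank, alpha, and target_modules values here are illustrative, not the defaults of scripts/finetune_lora.py:

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# XLM-R large is the encoder underlying COMETKiwi.
encoder = AutoModel.from_pretrained("xlm-roberta-large")

lora_cfg = LoraConfig(
    r=16,                               # low-rank adapter dimension
    lora_alpha=32,                      # adapter scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections in XLM-R
)
encoder = get_peft_model(encoder, lora_cfg)

# Only the adapter weights train; the frozen backbone is what makes this a
# safer alternative to full fine-tuning on skewed or small data.
encoder.print_trainable_parameters()
```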

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…e update

Critical finding: training data is heavily skewed (0.7-1.0 = 59.4%,
0.0-0.3 = 3.5%). Research confirms COMET is "highly susceptible to the
distribution of scores in the training data" (Pitfalls paper, WMT 2024).

- scripts/analyze_and_rebalance.py: 3 rebalancing strategies
  (equal, soft/sqrt-inverse-freq, weighted)
- TRAINING_GUIDE_KO.md: New section 4.5 on score distribution impact
  with diagnosis, solutions, and step-by-step rebalancing instructions
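A minimal sketch of the soft (sqrt-inverse-frequency) strategy described above; the "score" column name, bin count, and file paths are assumptions, not the exact interface of scripts/analyze_and_rebalance.py:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("train.csv")                      # assumed path and schema
edges = np.linspace(0.0, 1.0, 11)                  # ten equal-width score bins
bin_idx = np.digitize(df["score"], edges[1:-1])    # bin index (0..9) per row
freq = np.bincount(bin_idx, minlength=10) / len(df)
freq = np.maximum(freq, 1e-6)                      # guard against empty bins

# Soft inverse-frequency: rare bins are upweighted by 1/sqrt(freq), boosting
# the 0.0-0.3 tail (3.5%) without crushing the dominant 0.7-1.0 mass (59.4%).
weights = 1.0 / np.sqrt(freq[bin_idx])

rebalanced = df.sample(n=len(df), replace=True, weights=weights, random_state=42)
rebalanced.to_csv("train_rebalanced.csv", index=False)
```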

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
UnifiedMetric.prepare_sample returns (tuple_of_dicts, targets), not
(dict, targets). The training loop now iterates over the input sequences,
running one forward pass per input dict, matching UnifiedMetric.training_step.
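A minimal sketch of the corrected loop; loss_fn and optimizer are placeholders, and targets["score"] stands in for however the targets object exposes scores in the installed COMET version:

```python
# prepare_sample returns a tuple of input dicts (one per input ordering),
# so each dict gets its own forward pass, as in UnifiedMetric.training_step.
inputs, targets = model.prepare_sample(batch)

batch_loss = 0.0
for model_inputs in inputs:                 # one forward pass per input dict
    prediction = model(**model_inputs)
    batch_loss = batch_loss + loss_fn(prediction.score, targets["score"])
batch_loss = batch_loss / len(inputs)       # average over the input views

optimizer.zero_grad()
batch_loss.backward()
optimizer.step()
```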

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
Includes:
- Training guide (TRAINING_GUIDE_KO.md)
- Data preparation, evaluation, fine-tuning scripts
- 6 training approach configs (scratch, fine-tune, LoRA, FTHead)
- Score distribution analysis and rebalancing tools
- Initialize SummaryWriter (output_dir/tensorboard/)
- Log hyperparameters as text
- Log train/step_loss at each step
- Log train/epoch_loss, val/pearson, val/spearman, val/kendall, val/mse per epoch
- Flush/close the writer when training finishes (see the sketch below)
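A minimal sketch of this logging flow; train_loader, training_step, and validate are placeholders for the surrounding training code, and the hyperparameter string is illustrative:

```python
from torch.utils.tensorboard import SummaryWriter

output_dir = "outputs/qe-run"                          # placeholder path
writer = SummaryWriter(log_dir=f"{output_dir}/tensorboard")
writer.add_text("hparams", "lr=1e-5, batch_size=16, epochs=3")

global_step = 0
for epoch in range(3):
    epoch_losses = []
    for batch in train_loader:                         # placeholder loader
        loss = training_step(batch)                    # placeholder step fn
        writer.add_scalar("train/step_loss", loss, global_step)
        epoch_losses.append(loss)
        global_step += 1
    writer.add_scalar("train/epoch_loss", sum(epoch_losses) / len(epoch_losses), epoch)
    val_metrics = validate()                           # {"pearson": ..., ...}
    for name in ("pearson", "spearman", "kendall", "mse"):
        writer.add_scalar(f"val/{name}", val_metrics[name], epoch)

writer.flush()
writer.close()
```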

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…on-2iFB6

Claude/integrate comet evaluation 2i fb6
- prepare_data.py: remove the duplicate outputs (referenceless_*, unified_qe_*)
  and consolidate into a single train.csv/val.csv used by every approach
- With --include_pairwise, pairwise-converted data is automatically merged
  into train.csv (see the sketch below)
- Unify path references in all files (configs, scripts, guide) to
  train.csv/val.csv
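A minimal sketch of the unified output step, with to_pairwise standing in as a hypothetical name for the pairwise converter inside prepare_data.py:

```python
import pandas as pd

def write_splits(train_df, val_df, out_dir, include_pairwise=False):
    # Single canonical outputs; every training approach reads the same files.
    if include_pairwise:
        pairwise_df = to_pairwise(train_df)   # hypothetical converter
        train_df = pd.concat([train_df, pairwise_df], ignore_index=True)
    train_df.to_csv(f"{out_dir}/train.csv", index=False)
    val_df.to_csv(f"{out_dir}/val.csv", index=False)
```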

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…on-2iFB6

Simplify data pipeline: unify output to train.csv/val.csv
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- train/grad_norm: gradient norm per step (to monitor training stability)
- train/lr: learning rate per step
- val/pred_distribution, val/target_distribution: histograms (to detect score collapse)
- val/pred_std, val/pred_mean: prediction statistics
- --eval_interval N: mid-epoch validation (every N steps, logged under val_mid/*)
- Add grad_norm to the console log (see the sketch below)
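A minimal sketch of the added signals inside the training/validation loop; writer, model, optimizer, global_step, preds, and targets come from the surrounding code:

```python
import torch

# Step level: clip_grad_norm_ returns the total norm before clipping, so
# clipping and stability monitoring happen in one call.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
writer.add_scalar("train/grad_norm", grad_norm.item(), global_step)
writer.add_scalar("train/lr", optimizer.param_groups[0]["lr"], global_step)

# Validation level: histograms make score collapse visible as a shrinking
# prediction distribution relative to the targets.
writer.add_histogram("val/pred_distribution", preds, epoch)
writer.add_histogram("val/target_distribution", targets, epoch)
writer.add_scalar("val/pred_std", preds.std().item(), epoch)
writer.add_scalar("val/pred_mean", preds.mean().item(), epoch)
```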

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…on-2iFB6

Add enhanced TensorBoard logging to finetune_lora.py
Step-level: train/step_loss, train/grad_norm, train/lr
Epoch-level performance: val/pearson, spearman, kendall, mse, mae
Collapse detection: collapse/pred_std, std_ratio, pred_range, pred_iqr
Score bias: bias/pred_mean, target_mean, mean_diff, pred_skewness
Distribution: histogram of predictions, targets, errors
Quantiles: pred_q25, q50, q75
Mid-epoch validation: --eval_interval N (val_mid/* metrics)
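A minimal sketch of the collapse/bias statistics over 1-D numpy arrays of validation predictions and targets; variable names mirror the metric tags above:

```python
import numpy as np
from scipy import stats

# Collapse detection: a std_ratio near zero means the model is predicting a
# near-constant score regardless of input quality.
pred_std = preds.std()
std_ratio = pred_std / targets.std()
pred_range = preds.max() - preds.min()
q25, q50, q75 = np.percentile(preds, [25, 50, 75])   # pred_q25/q50/q75
pred_iqr = q75 - q25

# Score bias: systematic offset and asymmetry of the prediction mass.
mean_diff = preds.mean() - targets.mean()
pred_skewness = stats.skew(preds)
```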

https://claude.ai/code/session_012Au123PKF3mnutwZNGo52j
…on-2iFB6

Add comprehensive TensorBoard monitoring for QE fine-tuning