Main enkoqe #265
Open · taeungshin wants to merge 14 commits into Unbabel:master from taeungshin:main-enkoqe
Conversation
- TRAINING_GUIDE_KO.md: an A-to-Z guide to training reference-free QE models, covering four approaches (ReferencelessRegression from scratch, UnifiedMetric QE from scratch, COMETKiwi fine-tuning, QE model fine-tuning)
- scripts/prepare_data.py: data preprocessing for EN-KO patent QE data
- scripts/run_training.sh: training execution wrapper script
- scripts/download_checkpoint.py: pretrained checkpoint downloader
- scripts/evaluate_model.py: model evaluation with correlation metrics
- configs/models/en-ko-qe/: training configs for all four approaches plus a mini test config
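The correlation evaluation can be sketched as follows; the CSV layout, column names, and function are hypothetical stand-ins rather than what scripts/evaluate_model.py actually does:

```python
# Hypothetical sketch of correlation-based QE evaluation (not the actual
# scripts/evaluate_model.py). Assumed CSV columns: "pred" and "score".
import pandas as pd
from scipy.stats import kendalltau, pearsonr, spearmanr

def evaluate(pred_csv: str) -> dict:
    """Compare model predictions against gold QE scores."""
    df = pd.read_csv(pred_csv)
    preds, gold = df["pred"], df["score"]
    return {
        "pearson": pearsonr(preds, gold)[0],    # linear agreement
        "spearman": spearmanr(preds, gold)[0],  # rank (monotonic) agreement
        "kendall": kendalltau(preds, gold)[0],  # pairwise rank agreement
    }

if __name__ == "__main__":
    print(evaluate("predictions.csv"))
```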
Major updates based on research into wasanx/ComeTH and the MTQE.en-he paper:
- Compare our approach with ComeTH (a COMETKiwi EN-TH fine-tune, +4.9%)
- Add the MTQE.en-he finding that full fine-tuning degrades COMETKiwi with small data; our 9.6M samples mitigate this risk
- Add approach 5 (FTHead) and approach 6 (LoRA) as safer alternatives
- scripts/finetune_lora.py: LoRA/BitFit/FTHead fine-tuning script
- Update the references with 7 new papers and HuggingFace models
- Revise the recommendation order: FTHead first, then full fine-tuning, then LoRA
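For the LoRA variant, a minimal sketch of adapter injection with the peft library; the backbone checkpoint and target module names are assumptions (COMETKiwi builds on an InfoXLM-style encoder), not necessarily what scripts/finetune_lora.py configures:

```python
# Minimal LoRA sketch in the spirit of scripts/finetune_lora.py; only the adapter
# weights train, which is why the commit treats it as a safer alternative to full FT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

encoder = AutoModel.from_pretrained("microsoft/infoxlm-large")  # assumed backbone
lora_cfg = LoraConfig(
    r=8,                                # low-rank update dimension
    lora_alpha=16,                      # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["query", "value"],  # attention projections in XLM-R-style encoders
)
encoder = get_peft_model(encoder, lora_cfg)
encoder.print_trainable_parameters()    # typically well under 1% of the full model
```

Freezing everything except a small low-rank update is what makes this "safer": with limited or skewed data, the pretrained COMETKiwi weights cannot be catastrophically overwritten.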
…e update
Critical finding: the training data is heavily skewed (59.4% of scores fall in 0.7-1.0, only 3.5% in 0.0-0.3). Research confirms COMET is "highly susceptible to the distribution of scores in the training data" (Pitfalls paper, WMT 2024).
- scripts/analyze_and_rebalance.py: three rebalancing strategies (equal, soft/sqrt-inverse-frequency, weighted)
- TRAINING_GUIDE_KO.md: new section 4.5 on score-distribution impact, with diagnosis, solutions, and step-by-step rebalancing instructions
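A sketch of the soft (sqrt-inverse-frequency) strategy, one of the three; the bin count, column name, and resampling with replacement are assumptions, not analyze_and_rebalance.py's confirmed behavior:

```python
# Soft rebalancing sketch: over-represented score bins are down-weighted by the
# square root of their inverse frequency, a gentler correction than full inversion.
import numpy as np
import pandas as pd

def soft_rebalance(df: pd.DataFrame, n_bins: int = 10, seed: int = 42) -> pd.DataFrame:
    bins = np.clip((df["score"] * n_bins).astype(int), 0, n_bins - 1)
    freq = bins.value_counts(normalize=True)        # share of data per score bin
    weights = 1.0 / np.sqrt(freq[bins].to_numpy())  # sqrt-inverse-frequency weight
    weights /= weights.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(df.index, size=len(df), replace=True, p=weights)
    return df.loc[idx].reset_index(drop=True)
```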
UnifiedMetric.prepare_sample returns (tuple_of_dicts, targets), not (dict, targets). The training loop now correctly iterates over the input sequences for each forward pass, matching UnifiedMetric.training_step.
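A sketch of the corrected step, assuming COMET-style prepare_sample/compute_loss signatures; names here are illustrative, with UnifiedMetric.training_step as the reference behavior:

```python
import torch

def training_step(model, batch) -> torch.Tensor:
    """One corrected step: one forward pass per input dict in the tuple."""
    inputs_tuple, targets = model.prepare_sample(batch)  # (tuple_of_dicts, targets)
    total_loss = torch.zeros(())
    for model_inputs in inputs_tuple:                    # was: a single forward on a dict
        prediction = model.forward(**model_inputs)
        total_loss = total_loss + model.compute_loss(prediction, targets)
    return total_loss
```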
Includes:
- Training guide (TRAINING_GUIDE_KO.md)
- Data preparation, evaluation, and fine-tuning scripts
- Six training-approach configs (scratch, fine-tune, LoRA, FTHead)
- Score-distribution analysis and rebalancing tools
- Initialize SummaryWriter (output_dir/tensorboard/)
- Record hyperparameters as text
- Log train/step_loss per step
- Log train/epoch_loss, val/pearson, val/spearman, val/kendall, and val/mse per epoch
- Flush/close the writer when training completes (see the wiring sketch below)
…on-2iFB6 Claude/integrate comet evaluation 2iFB6
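A minimal wiring sketch for the logging above, using torch.utils.tensorboard; the output path, hyperparameter string, and loop values are illustrative stand-ins, not the script's actual code:

```python
from torch.utils.tensorboard import SummaryWriter

output_dir = "outputs/en-ko-qe"                                # hypothetical path
writer = SummaryWriter(log_dir=f"{output_dir}/tensorboard")
writer.add_text("hyperparameters", "lr=1e-5, batch_size=16")   # hparams as text

for step, loss in enumerate([0.42, 0.37, 0.33]):               # stand-in training loop
    writer.add_scalar("train/step_loss", loss, step)

epoch = 0
writer.add_scalar("train/epoch_loss", 0.35, epoch)
for name, value in {"pearson": 0.61, "spearman": 0.58,
                    "kendall": 0.43, "mse": 0.021}.items():
    writer.add_scalar(f"val/{name}", value, epoch)

writer.flush()                                                 # flush/close on completion
writer.close()
```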
- prepare_data.py: remove the duplicated outputs (referenceless_*, unified_qe_*) and consolidate into a single train.csv/val.csv used by every approach
- With --include_pairwise, the pairwise-converted data is merged into train.csv automatically (sketched below)
- Unify path references across all files (configs, scripts, guide) to train.csv/val.csv
…on-2iFB6 Simplify data pipeline: unify output to train.csv/val.csv
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
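The consolidation itself can be sketched with pandas; the function, paths, and the assumption that pairwise rows share the train.csv schema are hypothetical:

```python
from typing import Optional
import pandas as pd

def write_train_csv(base: pd.DataFrame, pairwise: Optional[pd.DataFrame],
                    include_pairwise: bool, out_path: str = "data/train.csv") -> None:
    """Emit the single unified training file that every config now points at."""
    frames = [base]
    if include_pairwise and pairwise is not None:
        frames.append(pairwise)  # pairwise-converted rows are appended in place
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)
```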
- train/grad_norm: gradient norm per step (to monitor training stability)
- train/lr: learning rate per step
- val/pred_distribution, val/target_distribution: histograms (to detect score collapse)
- val/pred_std, val/pred_mean: prediction statistics
- --eval_interval N: mid-epoch validation every N steps, logged under val_mid/*
- grad_norm added to the console log (see the sketch below)
…on-2iFB6 Add enhanced TensorBoard logging to finetune_lora.py
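A sketch of these additions, assuming PyTorch; the helper names are hypothetical, and clip_grad_norm_ is used here because it returns the pre-clipping global norm (the actual script may compute it differently):

```python
import torch

def log_step_diagnostics(writer, model, optimizer, step: int, max_norm: float = 1.0):
    # clip_grad_norm_ returns the total norm before clipping, so it doubles
    # as the logged stability signal.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    writer.add_scalar("train/grad_norm", grad_norm.item(), step)
    writer.add_scalar("train/lr", optimizer.param_groups[0]["lr"], step)

def log_val_distributions(writer, preds: torch.Tensor, targets: torch.Tensor, epoch: int):
    writer.add_histogram("val/pred_distribution", preds, epoch)   # score-collapse check
    writer.add_histogram("val/target_distribution", targets, epoch)
    writer.add_scalar("val/pred_std", preds.std().item(), epoch)
    writer.add_scalar("val/pred_mean", preds.mean().item(), epoch)
```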
- Step-level: train/step_loss, train/grad_norm, train/lr
- Epoch-level performance: val/pearson, spearman, kendall, mse, mae
- Collapse detection: collapse/pred_std, std_ratio, pred_range, pred_iqr (computed as sketched below)
- Score bias: bias/pred_mean, target_mean, mean_diff, pred_skewness
- Distribution: histograms of predictions, targets, and errors
- Quantiles: pred_q25, q50, q75
- Mid-epoch validation: --eval_interval N (val_mid/* metrics)
…on-2iFB6 Add comprehensive TensorBoard monitoring for QE fine-tuning
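The collapse and bias tags can all be derived from the validation predictions alone; this sketch uses the standard formulas, with names mirroring the logged tags but the implementation assumed rather than confirmed:

```python
import torch

def collapse_and_bias_metrics(preds: torch.Tensor, targets: torch.Tensor) -> dict:
    q25, q50, q75 = (torch.quantile(preds, q).item() for q in (0.25, 0.50, 0.75))
    std, mean = preds.std(), preds.mean()
    skew = (((preds - mean) ** 3).mean() / (std ** 3 + 1e-8)).item()  # sample skewness
    return {
        "collapse/pred_std": std.item(),
        "collapse/std_ratio": (std / (targets.std() + 1e-8)).item(),  # ~1.0 is healthy
        "collapse/pred_range": (preds.max() - preds.min()).item(),
        "collapse/pred_iqr": q75 - q25,
        "bias/pred_mean": mean.item(),
        "bias/target_mean": targets.mean().item(),
        "bias/mean_diff": (mean - targets.mean()).item(),
        "bias/pred_skewness": skew,
        "val/pred_q25": q25, "val/pred_q50": q50, "val/pred_q75": q75,
    }
```

A shrinking pred_std, or a std_ratio well below 1.0, indicates the model is regressing toward the mean score, which is the collapse mode these tags are meant to catch early.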