Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ base_model:
 
 Tiny Language Model For Japanese and English Bidirectional Translation
 
-- **Purrs on your lap** 🐱: Small and efficient! 0.8-
+- **Purrs on your lap** 🐱: Small and efficient! 0.8-7B models that run on edge devices.
 - **Swift and Feline Sharp** 🐾: Beats TranslateGemma-12B on text-to-text translation quality.
 - **Adopt and adapt** 🐈: Open source (MIT License) models you can customize and extend.
 
@@ -28,6 +28,7 @@ All models are available on Hugging Face:
 - [CAT-Translate-0.8B](https://huggingface.co/cyberagent/CAT-Translate-0.8b/)
 - [CAT-Translate-1.4B](https://huggingface.co/cyberagent/CAT-Translate-1.4b/)
 - [CAT-Translate-3.3B](https://huggingface.co/cyberagent/CAT-Translate-3.3b/)
+- [CAT-Translate-7B](https://huggingface.co/cyberagent/CAT-Translate-7b/)
 
 ## Evaluation
 
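As a quick-start reference, here is a minimal sketch of running one of the checkpoints above with Hugging Face `transformers`. It assumes the models load as standard causal LMs with a chat template; the prompt wording is hypothetical, so check each model card for the exact translation prompt format:

```python
# Minimal sketch, assuming the CAT-Translate checkpoints load as standard
# causal LMs with a chat template; the prompt wording below is hypothetical --
# see the model card of each release for the documented prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cyberagent/CAT-Translate-0.8b"  # any of the sizes listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "user", "content": "Translate the following English text to Japanese: The weather is nice today."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the generated continuation, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```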
@@ -44,22 +45,24 @@ We conducted evaluation on the translation subsets of the following benchmarks:
 We chose these tasks as benchmarks because (1) they are derived from real-world applications and (2) they are less overoptimized than popular datasets (e.g., WMT).
 
 The results are below.
-
-The 0.8B, 1.4B, and 3.3B-beta models achieved the best scores among all models (including closed source) within their respective sizes for both En-Ja and Ja-En translation tasks.
+All of our models achieved the best scores among all evaluated models (including closed-source ones) within their respective size classes for both En-Ja and Ja-En translation tasks.
 
 
 | Model | Avg. BLEU | Avg. BLEU Ja->En | Avg. BLEU En->Ja | BSD (Ja-En) | Court (Ja-En) | JMed (Ja-En) | PFMT (Ja-En) | wat-pat-2025 (Ja-En) | BSD (En-Ja) | JMed (En-Ja) | PFMT (En-Ja) | wat-pat-2025 (En-Ja) |
 |:-------------------------------------------------|----------:|-----------------:|-----------------:|------------:|--------------:|-------------:|-------------:|------------------:|------------:|-------------:|-------------:|------------------:|
+| CyberAgent/CAT-Translate-7B | 37.68 | 41.06 | 34.31 | 33.75 | 45.29 | 30.65 | 49.86 | 45.74 | 16.29 | 29.62 | 52.94 | 38.37 |
+| CyberAgent/CAT-Translate-3.3B | 36.16 | 37.51 | 34.80 | 26.51 | 42.44 | 24.47 | 49.93 | 44.23 | 17.21 | 28.67 | 53.88 | 39.44 |
 | CyberAgent/CAT-Translate-1.4B | 33.73 | 33.26 | 34.19 | 31.28 | 43.84 | 24.08 | 36.55 | 30.57 | 15.71 | 26.92 | 51.53 | 42.58 |
 | Unbabel/Tower-Plus-9B | 32.41 | 36.84 | 27.99 | 15.43 | 40.54 | 29.13 | 58.00 | 41.10 | 10.00 | 18.80 | 53.00 | 30.16 |
 | google/translategemma-12b-it | 32.24 | 35.81 | 28.68 | 31.58 | 34.30 | 23.46 | 48.75 | 40.97 | 15.92 | 21.79 | 52.53 | 24.47 |
-| CyberAgent/CAT-Translate-3.3B
+| CyberAgent/CAT-Translate-3.3B-beta | 30.60 | 30.32 | 30.88 | 17.20 | 38.65 | 23.96 | 40.58 | 31.22 | 16.63 | 26.68 | 53.40 | 26.80 |
 | CyberAgent/CAT-Translate-0.8B | 30.42 | 29.71 | 30.68 | 29.63 | 33.19 | 22.96 | 32.51 | 30.56 | 14.60 | 26.22 | 50.62 | 32.87 |
 | google/translategemma-4b-it | 28.09 | 29.41 | 26.76 | 28.86 | 25.89 | 21.50 | 42.65 | 28.16 | 14.14 | 20.68 | 51.99 | 20.23 |
 | LiquidAI/LFM2.5-1.2B-JP | 25.47 | 24.51 | 26.43 | 19.06 | 29.99 | 22.10 | 43.61 | 7.80 | 14.57 | 23.85 | 54.77 | 12.54 |
 | pfnet/plamo-2-translate | 25.24 | 25.92 | 24.57 | 25.55 | 28.63 | 22.90 | 29.02 | 23.48 | 17.35 | 24.98 | 32.04 | 23.89 |
 | LiquidAI/LFM2-350M-ENJP-MT | 24.95 | 24.91 | 25.00 | 10.94 | 29.56 | 21.48 | 41.40 | 21.17 | 8.11 | 22.84 | 47.53 | 21.52 |
 | mistralai/Ministral-8B-Instruct-2410 | 24.12 | 27.52 | 20.71 | 19.23 | 29.21 | 16.25 | 50.23 | 22.69 | 12.91 | 16.49 | 41.66 | 11.80 |
+| nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese | 22.97 | 22.77 | 23.18 | 9.62 | 34.98 | 18.01 | 38.44 | 12.81 | 10.62 | 20.41 | 42.55 | 19.13 |
 | Rakuten/RakutenAI-2.0-mini-instruct | 18.43 | 17.24 | 19.62 | 0.11 | 30.62 | 18.21 | 29.34 | 7.90 | 5.19 | 20.36 | 45.70 | 7.23 |
 | SakanaAI/TinySwallow-1.5B-Instruct | 15.74 | 14.99 | 16.49 | 4.96 | 18.93 | 15.83 | 26.67 | 8.58 | 6.30 | 17.58 | 34.07 | 8.00 |
 | llm-jp/llm-jp-3.1-1.8b-instruct4 | 15.18 | 16.26 | 14.11 | 18.82 | 2.44 | 15.67 | 30.65 | 13.72 | 15.38 | 4.91 | 25.47 | 10.65 |