PLaMo Translate: Developing a Large Language Model Specialized for Translation
DOI: https://doi.org/10.51094/jxiv.1461
Keywords: LLM, translation, MT, SFT, Iterative DPO
Abstract
While the development of large language models (LLMs) has dramatically improved performance in natural language processing tasks, optimizing models specifically for translation tasks remains an ongoing challenge.
In this study, we propose "plamo-2-translate," a large language model specialized for Japanese-English translation tasks.
Our proposed model achieves fluent and contextually appropriate translations by combining specialized input/output control through a dedicated format, fine-tuning on parallel corpora and synthetic data, and optimization via Iterative DPO.
Evaluation experiments demonstrate that our model achieves performance comparable to or better than base models and other LLMs across multiple metrics including BLEU, chrF, BERTScore, COMET, and GEMBA-MQM, with particularly significant improvements observed in GEMBA-MQM, which closely aligns with human evaluation standards.
Furthermore, the model incorporates features such as style specification and context preservation to cater to diverse translation requirements.
The model developed in this study has been made publicly available through Huggingface, with additional releases in various formats currently underway.
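As a rough illustration of how the released checkpoint can be used, the following Python sketch loads the model from Huggingface with the transformers library and wraps the source text in a simple translation prompt. The repository id pfnet/plamo-2-translate and the plain-text prompt below are assumptions for illustration only; the actual dedicated input/output format (including style specification and context preservation) is documented with the released model.

# Minimal usage sketch, not the official recipe: the repository id and the
# prompt layout are assumptions; consult the model card for the dedicated
# translation format described in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pfnet/plamo-2-translate"  # assumed Huggingface repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

def translate(text: str, src: str = "Japanese", tgt: str = "English") -> str:
    # Hypothetical plain-text prompt; the released model defines its own
    # control format for language, style, and context.
    prompt = f"Translate the following {src} text into {tgt}.\n{text}\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

print(translate("大規模言語モデルは翻訳品質を大きく向上させた。"))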
Conflicts of Interest Disclosure
There are no conflicts of interest to disclose.
References
Brown, T. B. et al.: Language models are few-shot learners, arXiv preprint arXiv:2005.14165 (2020).
DeepSeek-AI: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (2025).
Dong, H., Xiong, W., Pang, B., Wang, H., Zhao, H., Zhou, Y., Jiang, N., Sahoo, D., Xiong, C. and Zhang, T.: RLHF Workflow: From Reward Modeling to Online RLHF, Transactions on Machine Learning Research, (online), available from ⟨https://openreview.net/forum?id=a13aYUU9eU⟩ (2024).
Juraska, J., Deutsch, D., Finkelstein, M. and Freitag, M.: MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task, Proceedings of the Ninth Conference on Machine Translation (Haddow, B., Kocmi, T., Koehn, P. and Monz, C., eds.), Miami, Florida, USA, Association for Computational Linguistics, pp. 492–504 (online), available from ⟨https://aclanthology.org/2024.wmt-1.35⟩ (2024).
Kocmi, T. and Federmann, C.: GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4, Proceedings of the Eighth Conference on Machine Translation, Singapore, Association for Computational Linguistics, pp. 768–775 (online), DOI: 10.18653/v1/2023.wmt-1.64 (2023).
OpenAI: ChatGPT (2023).
OpenAI: GPT-4 Technical Report (2023).
Papineni, K., Roukos, S., Ward, T. and Zhu, W.-J.: Bleu: a method for automatic evaluation of machine translation, Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp. 311–318 (2002).
Popović, M.: chrF: character n-gram F-score for automatic MT evaluation, Proceedings of the tenth workshop on statistical machine translation, pp. 392–395 (2015).
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S. and Finn, C.: Direct preference optimization: Your language model is secretly a reward model, Advances in neural information processing systems, Vol. 36, pp. 53728–53741 (2023).
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W. and Liu, P. J.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Journal of Machine Learning Research, Vol. 21, No. 140, pp. 1–67 (online), available from ⟨http://jmlr.org/papers/v21/20-074.html⟩ (2020).
Rei, R., Stewart, C., Farinha, A. C. and Lavie, A.: COMET: A Neural Framework for MT Evaluation, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 2685–2702 (online), DOI: 10.18653/v1/2020.emnlp-main.213 (2020).
Sutskever, I., Vinyals, O. and Le, Q. V.: Sequence to sequence learning with neural networks, Advances in neural information processing systems, Vol. 27 (2014).
Suzuki, M., Sakaji, H., Hirano, M. and Izumi, K.: Constructing and analyzing domain-specific language model for financial text mining, Information Processing & Management, Vol. 60, No. 2, p. 103194 (online), DOI: 10.1016/j.ipm.2022.103194 (2023).
Suzuki, M., Sakaji, H., Hirano, M. and Izumi, K.: FinDeBERTaV2: Word-Segmentation-Free Pre-trained Language Model for Finance, Transactions of the Japanese Society for Artificial Intelligence, Vol. 39, No. 4, pp. FIN23-G_1–14 (online), DOI: 10.1527/tjsai.39-4_FIN23-G (2024).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. and Polosukhin, I.: Attention Is All You Need, Advances in Neural Information Processing Systems, Vol. 30, pp. 5999–6009 (2017).
Xu, J., Lee, A., Sukhbaatar, S. and Weston, J.: Some things are more cringe than others: Iterative preference optimization with the pairwise cringe loss, arXiv preprint arXiv:2312.16682 (2023).
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. and Artzi, Y.: BERTScore: Evaluating Text Generation with BERT, International Conference on Learning Representations, (online), available from ⟨https://openreview.net/forum?id=SkeHuCVFDr⟩ (2020).
Sudoh, K., Komachi, M., Kajiwara, T. and Mita, M.: NLP2025 Workshop: The Present and Future of Language Evaluation in the LLM Era, Journal of Natural Language Processing, Vol. 32, No. 2, pp. 738–745 (online), DOI: 10.5715/jnlp.32.738 (2025).
Posted
Submitted: 2025-08-20 12:03:32 UTC
Published: 2025-08-21 04:26:06 UTC
License
Copyright (c) 2025
Kentaro Imajo
Masanori Hirano
Kento Nozawa
Kaizaburo Chubachi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.