LoRA Tuning Conversational Japanese Large Language Models using Japanese Instruction Dataset
DOI: https://doi.org/10.51094/jxiv.422
Keywords: Large Language Model (LLM), Japanese, Instruction Tuning
Abstract
In this study, we performed LoRA tuning on Japanese- and English-based large language models (LLMs) using a Japanese instruction dataset and evaluated the tuned models from both quantitative and qualitative perspectives.
The evaluation confirmed the effectiveness of tuning with Japanese instruction data.
Furthermore, we identified remaining challenges for Japanese LLMs and Japanese language resources, such as the need for evaluation using a broader range of instruction data and for inspection of the models' actual output strings.
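The approach described above (attaching LoRA adapters to a pretrained causal language model and training them on instruction data) can be illustrated with a minimal sketch using the Hugging Face Transformers and PEFT libraries. The base model name, dataset file, prompt template, and hyperparameters below are illustrative assumptions and are not taken from the paper.

```python
# Minimal LoRA instruction-tuning sketch with Hugging Face Transformers + PEFT.
# NOTE: the base model, dataset file, prompt template, and hyperparameters
# are illustrative assumptions, not the settings used in the paper.
from datasets import load_dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "rinna/japanese-gpt-neox-3.6b"  # assumed Japanese base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=False)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Attach low-rank adapters; only the adapter weights are trained.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM,
                         r=8, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Assumed JSON file with "instruction" / "input" / "output" fields.
dataset = load_dataset("json", data_files="japanese_instructions.json")["train"]

def to_features(example):
    # Alpaca-style Japanese prompt; simplified (the prompt part is not masked out of the loss).
    prompt = (f"指示:\n{example['instruction']}\n\n"
              f"入力:\n{example['input']}\n\n"
              f"応答:\n{example['output']}")
    return tokenizer(prompt, truncation=True, max_length=512)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # stores only the small adapter weights
```

Because only the low-rank adapter matrices are updated and saved, the resulting artifact is small and can be loaded on top of the frozen base model at inference time.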
Conflicts of Interest Disclosure
The authors declare no conflict of interest.
Posted
Submitted: 2023-06-21 05:30:52 UTC
Published: 2023-06-23 00:11:37 UTC
License
Copyright (c) 2023 Masahiro Suzuki, Masanori Hirano, Hiroki Sakaji
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.