Preprint / Version 1

LoRA Tuning of Dialogue-Capable Japanese Large Language Models Using Japanese Instruction Data

DOI:

https://doi.org/10.51094/jxiv.422

Keywords:

Large language models, Japanese, Instruction tuning

Abstract

In this study, we applied LoRA tuning to large language models (LLMs) based on Japanese and on English, respectively, using Japanese instruction data. We evaluated the tuned models both quantitatively and qualitatively, and confirmed the effect of tuning with Japanese instruction data. We also identified issues surrounding Japanese LLMs and Japanese language resources, such as the need for a wider range of instruction data and for evaluation based on the strings that models actually output.
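As a rough illustration of the setup described in the abstract (not code from the paper), the sketch below shows how LoRA adapters might be attached to a causal language model with the Hugging Face PEFT library before fine-tuning on Japanese instruction data. The base model name, target module names, LoRA hyperparameters, and prompt template are assumptions for illustration only.

```python
# Minimal sketch of LoRA tuning for Japanese instruction data
# (assumed settings, not the paper's exact configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

BASE_MODEL = "rinna/japanese-gpt-neox-3.6b"  # hypothetical choice of Japanese base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA: freeze the base weights and train small low-rank adapter matrices instead.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                 # rank of the low-rank update
    lora_alpha=32,       # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection name in GPT-NeoX-style models
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable

def build_prompt(instruction: str, input_text: str, output: str) -> str:
    """Render one instruction example as a single training string
    (Alpaca-style template; one possible format for Japanese instruction data)."""
    return (
        "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n\n"
        f"### 指示:\n{instruction}\n\n"
        f"### 入力:\n{input_text}\n\n"
        f"### 応答:\n{output}"
    )
```

After the adapters are attached, training proceeds with the standard causal language modeling objective, and only the adapter weights need to be saved and distributed.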

Conflicts of Interest Disclosure

There are no conflicts of interest to declare.

Status: Published


Submitted: 2023-06-21 05:30:52 UTC

Published: 2023-06-23 00:11:37 UTC
Research field
General engineering / Integrated engineering