Exploring Open Large Language Models for the Japanese Language: A Practical Guide
DOI: https://doi.org/10.51094/jxiv.682
Keywords: Large Language Models, Japanese Language
Abstract
While large language models (LLMs) have demonstrated remarkable capabilities in handling Japanese, they are conventionally trained on English-centric corpora, which may limit their ability to understand and generate Japanese text. In response, researchers have been actively developing LLMs with a specific focus on Japanese, many of which have been made publicly available. This rapid growth has made it challenging to obtain a comprehensive overview of these developments. To address this issue, this report reviews open LLMs for Japanese, including instruction-tuned models and multimodal models. We also introduce existing LLM evaluation benchmarks for Japanese, aiming to offer a practical guide to choosing the most suitable model. We continually update our work at https://github.com/llm-jp/awesome-japanese-llm.
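As a practical starting point for trying one of the openly released Japanese models the report surveys, the following is a minimal sketch using the Hugging Face Transformers library. The model identifier and generation settings are illustrative assumptions only, not a recommendation from the report; any open Japanese LLM from the accompanying GitHub list could be substituted.

    # Minimal sketch: generating Japanese text with an openly released Japanese LLM
    # via Hugging Face Transformers. The model ID below is an assumed example.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "abeja/gpt-neox-japanese-2.7b"  # illustrative choice; swap for another open model

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Japanese prompt: "Artificial intelligence is ..."
    prompt = "人工知能とは、"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))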
Conflict of Interest Disclosure
The author declares no conflicts of interest associated with this manuscript.
Status: Published
Submitted: 2024-04-24 12:28:38 UTC
Published: 2024-04-26 11:22:08 UTC
License
Copyright (c) 2024 Sugimoto, Kaito
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.