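A large-scale collaborative study evaluating translations by large language models: A comparison of human and machine translations of 22 psychological scales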

山田, 祐樹; 小杉, 考司; 国里, 愛彦; 分寺, 杏介; 後藤, 崇志; 橋本, 泰央; 工藤, 大介; 李, 禕飛; 眞嶋, 良全; 向井, 智哉; 野村, 竜也; 小口, 真奈; 七條, 花恋; 下司, 忠大; 高松, 礼奈; 竹橋, 洋毅; 竹下, 昌志; 浅野, 良輔; 福田, 実奈; 古谷, 嘉一郎; 日道, 俊之; 平野, 寛樹; 五十嵐, 祐; 伊藤, 雅隆; 香川, 璃奈; 神野, 雄; 加藤, 弘通; 古村, 健太郎; 宮川, 裕基; 水野, 君平; 村浦, 新之助; 新谷, 優; 西村, 多久磨; 尾崎, 由佳; 佐藤, 秀樹; 佐藤, 奈月; 嶋, 大樹; 瀧川, 諒子; 田中, 勝則; 塚本, 早織; 山崎, 茜; 楊, 帆; 三浦, 麻子

doi:10.51094/jxiv.2056

Authors

山田, 祐樹 (九州大学基幹教育院) https://orcid.org/0000-0003-1431-568X https://researchmap.jp/YukiYamada
小杉, 考司 (専修大学人間科学部) https://researchmap.jp/kosugitti
国里, 愛彦 (専修大学人間科学部) https://researchmap.jp/ykunisato
分寺, 杏介 (神戸大学大学院経営学研究科) https://researchmap.jp/K-Bunji
後藤, 崇志 (大阪大学大学院人間科学研究科) https://researchmap.jp/g_ikuyakat
橋本, 泰央 (東北文教大学人間科学部) https://researchmap.jp/read0156815
工藤, 大介 (東北学院大学経営学部) https://researchmap.jp/dice_k
李, 禕飛 (東京都立大学人文科学研究科・帝京大学医学物理グループ) https://tmusocialpsy.sakura.ne.jp/li/home/
眞嶋, 良全 (北星学園大学社会福祉学部) https://researchmap.jp/yoshi-majima
向井, 智哉 (福山大学人間文化学部) https://researchmap.jp/mukait
野村, 竜也 (龍谷大学先端理工学部) https://researchmap.jp/read0062932
小口, 真奈 (沖縄科学技術大学院大学発達神経生物学ユニット) https://researchmap.jp/mana-oguchi
七條, 花恋 (九州大学大学院芸術工学府)
下司, 忠大 (立正大学心理学部) https://researchmap.jp/shimotsukasa
高松, 礼奈 (東京科学大学リベラルアーツ研究教育院) https://researchmap.jp/mirthy
竹橋, 洋毅 (奈良女子大学文学部) https://researchmap.jp/rfm
竹下, 昌志 (名古屋大学大学院情報学研究科) https://researchmap.jp/m_takeshita
浅野, 良輔 (久留米大学文学部) https://researchmap.jp/ryosuke_asano
福田, 実奈 (京都外国語大学国際貢献学部) https://researchmap.jp/minaf
古谷, 嘉一郎 (関西大学総合情報学部) https://researchmap.jp/kaichiro_furutani
日道, 俊之 (高知工科大学経済・マネジメント学群) https://researchmap.jp/thimichi
平野, 寛樹 (高知工科大学総合研究所)
五十嵐, 祐 (名古屋大学大学院教育発達科学研究科) https://researchmap.jp/tasukuigarashi
伊藤, 雅隆 (福島大学人間発達文化学類)
香川, 璃奈 (筑波大学医学医療系) https://researchmap.jp/rinabou
神野, 雄 (西武文理大学サービス経営学部) https://researchmap.jp/ykanno-kobe
加藤, 弘通 (北海道大学大学院教育学研究科) https://researchmap.jp/katozel
古村, 健太郎 (弘前大学人文学部) https://researchmap.jp/kentaro-comra
宮川, 裕基 (追手門学院大学心理学部) https://researchmap.jp/ymiyagawa
水野, 君平 (広島修道大学健康科学部) https://researchmap.jp/kumpei
村浦, 新之助 (上越教育大学大学院教育学研究科) https://researchmap.jp/shinnosuke_muraura
新谷, 優 (法政大学グローバル教養学部) https://researchmap.jp/read0147306
西村, 多久磨 (東京理科大学教育支援機構) https://researchmap.jp/tnishimu
尾崎, 由佳 (東洋大学社会学部) https://researchmap.jp/g0000213011
佐藤, 秀樹 (福島県立医科大学医学部) https://researchmap.jp/hidekis
佐藤, 奈月 (福山市立大学教育学部) https://researchmap.jp/natsuki_sato
嶋, 大樹 (追手門学院大学心理学部) https://researchmap.jp/taikishima
瀧川, 諒子 (金沢大学融合研究域融合科学系) https://researchmap.jp/r_takikawa
田中, 勝則 (北海学園大学経営学部) https://researchmap.jp/tnk
塚本, 早織 (愛知学院大学心理学部) https://agur1.acoffice.biz/aiguhp/KgApp/k03/resid/S000536
山崎, 茜 (広島大学大学院人間社会科学研究科) https://researchmap.jp/akaney
楊, 帆 (早稲田大学文学学術院) https://researchmap.jp/fan.yang
三浦, 麻子 (大阪大学大学院人間科学研究科) https://orcid.org/0000-0002-7563-7503 https://researchmap.jp/asarin

DOI:

https://doi.org/10.51094/jxiv.2056

Keywords:

large language models, machine translation, human translation, big team science, translation accuracy, psychological scales

Abstract

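Large language models (LLMs) are used in many aspects of psychological research, but whether they are as practical as human translators for translating psychological scales has not yet been tested. The ManyScales project aims to evaluate the validity and practicality of LLM translation from multiple angles and to compare it with human translation. The study is conducted as a large-scale collaboration within Japan (43 researchers, 36 institutions) and covers 22 English-language psychological scales. Every scale follows the same procedure: an LLM-translated version is produced with the R package LLMTranslate and a human-translated version is produced by translators, after which each version is back-translated using the same method. The two translated versions are compared in terms of (a) expert ratings of semantic fidelity, naturalness, and cultural appropriateness, (b) ratings of comprehensibility and naturalness by general respondents, and (c) psychometric analyses (factor structure, factor scores, measurement invariance, association coefficients, and so on). In addition, as an exploratory analysis, cosine similarity based on embedding representations is computed to examine the semantic distance from the original items. The study aims to clarify how far LLM translation can be used for translating psychological scales and in which respects it differs from human translation, and thereby to contribute to making the scale translation process more transparent and standardized.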
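
As a rough illustration of the exploratory analysis, cosine similarity between two embedding vectors can be computed as sketched below. This is a minimal sketch and not the project's pipeline: the abstract does not name the embedding model or state exactly which texts are embedded, so the toy vectors, the `cosine_similarity` helper, and the variable names are all hypothetical stand-ins.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings: made-up numbers, NOT output of any real model.
original_item = [0.20, 0.70, 0.10]
compared_versions = {
    "llm_version": [0.25, 0.65, 0.05],
    "human_version": [0.10, 0.60, 0.30],
}

# One similarity score per version, each measured against the original item.
similarities = {
    name: cosine_similarity(original_item, vector)
    for name, vector in compared_versions.items()
}

for name, score in similarities.items():
    print(f"{name}: {score:.3f}")
```

Values closer to 1 indicate vectors pointing in more similar directions; a semantic distance is commonly obtained as 1 minus the similarity.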

Conflict of Interest Disclosure

None of the authors has any conflict of interest to disclose.

References

References marked with an asterisk (*) report a scale targeted for translation. The number immediately following the asterisk corresponds to the ID in Table 1.

Abrams, E., Leone, P. V., Cambrosio, A., & Faraj, S. (2025). The governance of open science: A comparative analysis of two open science consortia. Research Policy, 54(3), 105195. https://doi.org/10.1016/j.respol.2025.105195

Adetula, A., Forscher, P. S., Basnight-Brown, D., Azouaghe, S., & IJzerman, H. (2022). Psychology should generalize from—not just to—Africa. Nature Reviews Psychology, 1(7), 370–371. https://doi.org/10.1038/s44159-022-00070-y

Affonso, F. M. (2025). Detecting vision-enabled AI respondents in behavioral research through cognitive traps. PsyArXiv. https://doi.org/10.31234/osf.io/enuqj_v1

Anastasi, A. (1996). Psychological testing (7th ed.). Prentice Hall.

Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351. https://doi.org/10.1017/pan.2023.2

Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589–596. https://doi.org/10.1001/jamainternmed.2023.1838

*3 Back, M. D., Küfner, A. C., Dufner, M., Gerlach, T. M., Rauthmann, J. F., & Denissen, J. J. (2013). Narcissistic admiration and rivalry: Disentangling the bright and dark sides of narcissism. Journal of Personality and Social Psychology, 105(6), 1013–1037. https://doi.org/10.1037/a0034431

Belton, I., MacDonald, A., Wright, G., & Hamlin, I. (2019). Improving the practical application of the Delphi method in group-based judgment: A six-step prescription for a well-founded and defensible process. Technological Forecasting and Social Change, 147, 72–82. https://doi.org/10.1016/j.techfore.2019.07.002

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z

Binz, M., Akata, E., Bethge, M., Brändle, F., Callaway, F., Coda-Forno, J., Dayan, P., Demircan, C., Eckstein, M. K., Éltető, N., Griffiths, T. L., Haridi, S., Jagadish, A. K., Ji-An, L., Kipnis, A., Kumar, S., Ludwig, T., Mathony, M., Mattar, M., … Schulz, E. (2025). A foundation model to predict and capture human cognition. Nature, 644(8078), 1002–1009. https://doi.org/10.1038/s41586-025-09215-4

Binz, M., Alaniz, S., Roskies, A., Aczel, B., Bergstrom, C. T., Allen, C., Schad, D., Wulff, D., West, J. D., Zhang, Q., Shiffrin, R. M., Gershman, S. J., Popov, V., Bender, E. M., Marelli, M., Botvinick, M. M., Akata, Z., & Schulz, E. (2025). How should the advancement of large language models affect the practice of science? Proceedings of the National Academy of Sciences of the United States of America, 122(5), e2401227121. https://doi.org/10.1073/pnas.2401227121

Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences of the United States of America, 120(6), e2218523120. https://doi.org/10.1073/pnas.2218523120

Bowers, J. S., Puebla, G., Thorat, S., Tsetsos, K., & Ludwig, C. J. H. (2025). Centaur: A model without a theory. PsyArXiv. https://doi.org/10.31234/osf.io/v9w37_v3

Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H. V., Adem, M., Adriaans, J., Alvarez-Benjumea, A., Andersen, H. K., Auer, D., Azevedo, F., Bahnsen, O., Balzer, D., Bauer, G., Bauer, P. C., Baumann, M., Baute, S., Benoit, V., Bernauer, J., Berning, C., … Żółtak, T. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences of the United States of America, 119(44), e2203150119. https://doi.org/10.1073/pnas.2203150119

*21 Briesch, A. M., Chafouleas, S. M., Neugebauer, S. R., & Riley-Tillman, T. C. (2013). Assessing influences on intervention implementation: Revision of the Usage Rating Profile-Intervention. Journal of School Psychology, 51(1), 81–96. https://doi.org/10.1016/j.jsp.2012.08.006

Brislin, R. W. (1986). The wording and translation of research instruments. In W. Lonner & J. Berry (Eds.), Field Methods in Cross-Cultural Research (pp. 137–164). Sage.

*20 Buckels, E. E., Jones, D. N., & Paulhus, D. L. (2013). Behavioral confirmation of everyday sadism. Psychological Science, 24(11), 2201–2209. https://doi.org/10.1177/0956797613490749

*11 Cantarella, I. A., Spielmann, S. S., Partridge, T., MacDonald, G., Joel, S., & Impett, E. A. (2023). Validating the fear of being single scale for individuals in relationships. Journal of Social and Personal Relationships, 40(9), 2969–2979. https://doi.org/10.1177/02654075231164588

Cao, Y., Sickles, R. C., Triebs, T. P., & Tumlinson, J. (2024). Linguistic distance to English impedes research performance. Research Policy, 53(4), 104971. https://doi.org/10.1016/j.respol.2024.104971

*8 Cartwright, F., & Stritzke, W. G. K. (2008). A multidimensional ambivalence model of chocolate craving: Construct validity and associations with chocolate consumption and disordered eating. Eating Behaviors, 9(1), 1–12. https://doi.org/10.1016/j.eatbeh.2007.01.006

Coles, N. A., Hamlin, J. K., Sullivan, L. L., Parker, T. H., & Altschul, D. (2022). Build up big-team science. Nature, 601(7894), 505–507. https://doi.org/10.1038/d41586-022-00150-2

Coles, N. A., March, D. S., Marmolejo-Ramos, F., Larsen, J. T., Arinze, N. C., Ndukaihe, I. L. G., Willis, M. L., Foroni, F., Reggev, N., Mokady, A., Forscher, P. S., Hunter, J. F., Kaminski, G., Yüvrük, E., Kapucu, A., Nagy, T., Hajdu, N., Tejada, J., Freitag, R. M. K., … Liuzza, M. T. (2022). A multi-lab test of the facial feedback hypothesis by the Many Smiles Collaboration. Nature Human Behaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9

*12 Crimston, C. R., Hornsey, M. J., Bain, P. G., & Bastian, B. (2018). Moral expansiveness short form: Validity and reliability of the MESx. PLoS One, 13(10), e0205373. https://doi.org/10.1371/journal.pone.0205373

Cronin, B. (2001). Hyperauthorship: A postmodern perversion or evidence of a structural shift in scholarly communication practices? Journal of the American Society for Information Science and Technology, 52(7), 558–569. https://doi.org/10.1002/asi.1097

Cross, J., Kayalackakom, T., Robinson, R. E., Vaughans, A., Sebastian, R., Hood, R., Lewis, C., Devaraju, S., Honnavar, P., Naik, S., Joseph, J., Anand, N., Mohammed, A., Johnson, A., Cohen, E., Adeniji, T., Nnenna Nnaji, A., & George, J. E. (2025). Assessing ChatGPT’s capability as a new age standardized patient: Qualitative study. JMIR Medical Education, 11(1), e63353. https://doi.org/10.2196/63353

Cruchinho, P., López-Franco, M. D., Capelas, M. L., Almeida, S., Bennett, P. M., Miranda da Silva, M., Teixeira, G., Nunes, E., Lucas, P., Gaspar, F., & Handovers4SafeCare. (2024). Translation, cross-cultural adaptation, and validation of measurement instruments: A practical guideline for novice researchers. Journal of Multidisciplinary Healthcare, 17, 2701–2728. https://doi.org/10.2147/JMDH.S419714

DELVE (DECam Local Volume Exploration) Survey. (2025). DELVE Policy Guidelines Version 2.3. https://delve-survey.github.io/docs/DELVE_PolicyGuidelines.pdf

Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences, 27(7), 597–600. https://doi.org/10.1016/j.tics.2023.04.008

Elyoseph, Z., Hadar-Shoval, D., Asraf, K., & Lvovsky, M. (2023). ChatGPT outperforms humans in emotional awareness evaluations. Frontiers in Psychology, 14, 1199058. https://doi.org/10.3389/fpsyg.2023.1199058

Epstein, J., Santo, R. M., & Guillemin, F. (2015). A review of guidelines for cross-cultural adaptation of questionnaires could not bring out a consensus. Journal of Clinical Epidemiology, 68, 435–441. https://doi.org/10.1016/j.jclinepi.2014.11.021

Forscher, P. S., Wagenmakers, E.-J., Coles, N. A., Silan, M. A., Dutra, N., Basnight-Brown, D., & IJzerman, H. (2023). The benefits, barriers, and risks of big-team science. Perspectives on Psychological Science, 18(3), 607–623. https://doi.org/10.1177/17456916221082970

Friese, S. P. (2025). Conversational analysis with AI - CA to the power of AI: Rethinking coding in qualitative analysis. OSF Preprints. https://doi.org/10.31219/osf.io/6b52m_v1

*10 Gabay, R., Hameiri, B., Rubel-Lifschitz, T., & Nadler, A. (2020). The tendency for interpersonal victimhood: The personality construct and its consequences. Personality and Individual Differences, 165, 110134. https://doi.org/10.1016/j.paid.2020.110134

*17 Giacalone, R. A., Valentine, S. R., Yin, B., & Promislo, M. D. (2025). Rage against the dying of the light: Employees’ moral outrage, anger expression, and generalized well-being. Journal of Business Ethics. https://doi.org/10.1007/s10551-024-05919-1

Google. (2025). Gmail's protections are strong and effective, and claims of a major Gmail security warning are false. https://blog.google/products/workspace/gmail-security-protections/

Götz, F. M., Maertens, R., Loomba, S., & van der Linden, S. (2024). Let the algorithm speak: How to use neural networks for automatic item generation in psychological scale development. Psychological Methods, 29(3), 494–518. https://doi.org/10.1037/met0000540

Granas, A. G., Nørgaard, L. S., & Sporrong, S. K. (2014). Lost in translation? Comparing three Scandinavian translations of the Beliefs about Medicines Questionnaire. Patient Education and Counseling, 96(2), 216–221. https://doi.org/10.1016/j.pec.2014.05.010

Grobelny, J., Szymański, K., & Strozyk, Z. (2025). Act as an expert in psychometry. The evaluation of large language models utility in psychological tests cross-cultural adaptations. Acta Psychologica, 261, 105813. https://doi.org/10.1016/j.actpsy.2025.105813

Harkness, J. (2003). Questionnaire translation. In J. A. Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp. 35–56). Wiley.

Harkness, J. A., Villar, A., & Edwards, B. (2010). Translation, adaptation, and design. In J. A. Harkness et al. (Eds.), Survey Methods in Multinational, Multicultural and Multiregional Contexts (pp. 117–140). John Wiley.

Heinz, M. V., Mackin, D. M., Trudeau, B. M., Bhattacharya, S., Wang, Y., Banta, H. A., Jewett, A. D., Salzhauer, A. J., Griffin, T. Z., & Jacobson, N. C. (2025). Randomized trial of a generative AI chatbot for mental health treatment. NEJM AI, 2(4). https://doi.org/10.1056/aioa2400802

*16 Hewitt, P. L., & Flett, G. L. (1991a). Perfectionism in the self and social contexts: Conceptualization, assessment, and association with psychopathology. Journal of Personality and Social Psychology, 60(3), 456–470. https://doi.org/10.1037/0022-3514.60.3.456

*16 Hewitt, P. L., Flett, G. L., Turnbull-Donovan, W., & Mikail, S. F. (1991b). The Multidimensional Perfectionism Scale: Reliability, validity, and psychometric properties in psychiatric samples. Psychological Assessment, 3(3), 464–468. https://doi.org/10.1037/1040-3590.3.3.464

Hoekman, J., & Rake, B. (2024). Geography of authorship: How geography shapes authorship attribution in big team science. Research Policy, 53(2), 104927. https://doi.org/10.1016/j.respol.2023.104927

Holcombe, A. (2019). Farewell authors, hello contributors. Nature, 571(7764), 147. https://doi.org/10.1038/d41586-019-02084-8

*18 Hopwood, C. J., Piazza, J., Chen, S., & Bleidorn, W. (2021). Development and validation of the motivations to Eat Meat Inventory. Appetite, 163, 105210. https://doi.org/10.1016/j.appet.2021.105210

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

Huang, M., Zhang, X., Soto, C., & Evans, J. (2024). Designing LLM-agents with personalities: A psychometric approach. arXiv. https://doi.org/10.48550/arXiv.2410.19238

*7 Katzir, M., Baldwin, M., Werner, K. M., & Hofmann, W. (2021). Moving beyond inhibition: Capturing a broader scope of the self-control construct with the Self-Control Strategy Scale (SCSS). Journal of Personality Assessment, 103(6), 762–776. https://doi.org/10.1080/00223891.2021.1883627

Kjell, O. N. E., Kjell, K., Garcia, D., & Sikström, S. (2019). Semantic measures: Using natural language processing to measure, differentiate, and describe psychological constructs. Psychological Methods, 24(1), 92–115. https://doi.org/10.1037/met0000191

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

*13 Konrath, S., James, C., Weinstein, E., & Tench, B. (2025). Development and validation of the Tech With Care Index for teens. Psychology of Popular Media, 14(4), 507–520. https://doi.org/10.1037/ppm0000593

Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

*14 Kruglanski, A. W., Thompson, E. P., Higgins, E. T., Atash, M. N., Pierro, A., Shah, J. Y., & Spiegel, S. (2000). To "do the right thing" or to "just do it": Locomotion and assessment as distinct self-regulatory imperatives. Journal of Personality and Social Psychology, 79(5), 793–815. https://doi.org/10.1037/0022-3514.79.5.793

Kunst, J. R. (2026). LLMTranslate: 'shiny' app for TRAPD/ISPOR survey translation with LLMs (R package version 0.3.0). Comprehensive R Archive Network (CRAN). https://doi.org/10.32614/CRAN.package.LLMTranslate

Kunst, J. R., & Bierwiaczonek, K. (2023). Utilizing AI questionnaire translations in cross-cultural and intercultural research: Insights and recommendations. International Journal of Intercultural Relations, 97, 101888. https://doi.org/10.1016/j.ijintrel.2023.101888

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Van Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168–171. https://doi.org/10.1038/s41562-018-0311-x

*4 LeFebvre, A., & Huta, V. (2021). Age and gender differences in eudaimonic, hedonic, and extrinsic motivations. Journal of Happiness Studies, 22(5), 2299–2321. https://doi.org/10.1007/s10902-020-00319-4

Maertens, R., Götz, F. M., Golino, H. F., Roozenbeek, J., Schneider, C. R., Kyrychenko, Y., Kerr, J. R., Stieger, S., McClanahan, W. P., Drabot, K., He, J., & van der Linden, S. (2024). The Misinformation Susceptibility Test (MIST): A psychometrically validated measure of news veracity discernment. Behavior Research Methods, 56(3), 1863–1899. https://doi.org/10.3758/s13428-023-02124-2

Massé, C. C., Krieger, V., Peró-Cebollero, M., Amador-Campos, J. A., & Guàrdia-Olmos, J. (2025). Measurement invariance and cross-linguistic validation of the PSS-4 in university context: Multidimensional analysis and associations with psychological and behavioral outcomes. Frontiers in Psychology, 16, 1648070. https://doi.org/10.3389/fpsyg.2025.1648070

Maslej, N., Fattorini, L., Perrault, R., Gil, Y., Parli, V., Kariuki, N., Capstick, E., Reuel, A., Brynjolfsson, E., Etchemendy, J., Ligett, K., Lyons, T., Manyika, J., Niebles, J. C., Shoham, Y., Wald, R., Walsh, T., Hamrah, A., Santarlasci, L., … Oak, S. (2025). Artificial Intelligence Index Report 2025. arXiv. https://doi.org/10.48550/arXiv.2504.07139

*19 McWilliam, A. M., Beattie, S., & Callow, N. (2025). The development and validation of the public speaking threats inventory (PSTI). Personality and Individual Differences, 246, 113322. https://doi.org/10.1016/j.paid.2025.113322

Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences of the United States of America, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121

三浦麻子・小林哲郎 (2015). オンライン調査モニタのSatisficeに関する実験的研究. 社会心理学研究, 31(1), 1–12. https://doi.org/10.14966/jssp.31.1_1

Mokkink, L. B., Elsman, E. B. M., & Terwee, C. B. (2024). COSMIN guideline for systematic reviews of patient-reported outcome measures version 2.0. Quality of Life Research, 33(11), 2929–2939. https://doi.org/10.1007/s11136-024-03761-6

Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., Antfolk, J., Castille, C. M., Evans, T. R., Fiedler, S., Flake, J. K., Forero, D. A., Janssen, S. M. J., Keene, J. R., Protzko, J., Aczel, B., … Chartier, C. R. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607

Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for conducting factor analyses. International Journal of Testing, 5(2), 159–168. https://doi.org/10.1207/s15327574ijt0502_4

Orr, M., Cranford, D., Ford, K., Gluck, K., Hancock, W., Lebiere, C., Pirolli, P., Ritter, F., & Stocco, A. (2025). Not even wrong: On the limits of prediction as explanation in cognitive science. arXiv. https://arxiv.org/abs/2510.03311

Ozolins, U., Hale, S., Cheng, X., Hyatt, A., & Schofield, P. (2020). Translation and back-translation methodology in health research – a critique. Expert Review of Pharmacoeconomics & Outcomes Research, 1–9. https://doi.org/10.1080/14737167.2020.1734453

Park, J. J., & Oh, J. (2025). Enhancing international research through alternative back translation methods leveraging artificial intelligence. Human Resource Development International, 1–22. https://doi.org/10.1080/13678868.2025.2558571

Parsons, S., Azevedo, F., Elsherif, M. M., Guay, S., Shahim, O. N., Govaart, G. H., Norris, E., O’Mahony, A., Parker, A. J., Todorovic, A., Pennington, C. R., Garcia-Pelegrin, E., Lazić, A., Robertson, O., Middleton, S. L., Valentini, B., McCuaig, J., Baker, B. J., Collins, E., … Aczel, B. (2022). A community-sourced glossary of open scholarship terms. Nature Human Behaviour, 6(3), 312–318. https://doi.org/10.1038/s41562-021-01269-4

*20 Paulhus, D. L., & Jones, D. N. (2015). Measures of dark personalities. In G. J. Boyle, D. H. Saklofske, & G. Matthews (Eds.), Measures of Personality and Social Psychological Constructs (pp. 562–594). Elsevier Academic Press. https://doi.org/10.1016/B978-0-12-386915-9.00020-6

*6 Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476–498. https://doi.org/10.1017/S1930297500007439

*15 Przybylski, A. K., Murayama, K., DeHaan, C. R., & Gladwell, V. (2013). Motivational, emotional, and behavioral correlates of fear of missing out. Computers in Human Behavior, 29(4), 1841–1848. https://doi.org/10.1016/j.chb.2013.02.014

Revelle, W. (2025). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.5.6, https://CRAN.R-project.org/package=psych.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237. https://doi.org/10.3758/PBR.16.2.225

Ruggeri, K., Panin, A., Vdovic, M., Većkalov, B., Abdul-Salaam, N., Achterberg, J., Akil, C., Amatya, J., Amatya, K., Andersen, T. L., Aquino, S. D., Arunasalam, A., Ashcroft-Jones, S., Askelund, A. D., Ayacaxli, N., Sheshdeh, A. B., Bailey, A., Barea Arroyo, P., Mejía, G. B., … García-Garzon, E. (2022). The globalizability of temporal discounting. Nature Human Behaviour, 6(10), 1386–1397. https://doi.org/10.1038/s41562-022-01392-w

Salama-Younes, M., Montazeri, A., Ismaïl, A., & Roncin, C. (2009). Factor structure and internal consistency of the 12-item General Health Questionnaire (GHQ-12) and the Subjective Vitality Scale (VS), and the relationship between them: A study from France. Health and Quality of Life Outcomes, 7(1), 22. https://doi.org/10.1186/1477-7525-7-22

Sanz, A., Tapia, J. L., García-Carpintero, E., Rocabado, J. F., & Pedrajas, L. M. (2025). ChatGPT simulated patient: Use in clinical training in psychology. Psicothema, 37(3), 23–32. https://doi.org/10.70478/psicothema.2025.37.21

佐々木研一・豊田秀樹 (2024). ChatGPTにより生成された心理尺度項目の信頼性・妥当性の評価. 日本テスト学会誌, 20(1), 111–133. https://doi.org/10.24690/jart.20.1_111

*22 Schepman, A., & Rodway, P. (2026). Validation of the short general attitudes towards artificial intelligence scale: The short GAAIS-10. International Journal of Human–Computer Interaction, 1–17. https://doi.org/10.1080/10447318.2025.2610446

Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y

Schröder, S., Morgenroth, T., Kuhl, U., Vaquet, V., & Paaßen, B. (2025). Large Language Models do not simulate human psychology. arXiv. https://doi.org/10.48550/arXiv.2508.06950

Seminara, D., Khoury, M. J., O’Brien, T. R., Manolio, T., Gwinn, M. L., Little, J., Higgins, J. P. T., Bernstein, J. L., Boffetta, P., Bondy, M., Bray, M. S., Brenchley, P. E., Buffler, P. A., Casas, J. P., Chokkalingam, A. P., Danesh, J., Smith, G. D., Dolan, S., Duncan, R., … Ioannidis, J. P. A. (2007). The emergence of networks in human genome epidemiology: Challenges and opportunities. Epidemiology, 18(1), 1–8. https://doi.org/10.1097/01.ede.0000249540.17855.b7

*5 Smith, M. M., Saklofske, D. H., Stoeber, J., & Sherry, S. B. (2016). The Big Three Perfectionism Scale: A new measure of perfectionism. Journal of Psychoeducational Assessment, 34(7), 670–687. https://doi.org/10.1177/0734282916651539

Sørensen, C. B., Gram-Hanssen, A., Rosenberg, J., & Baker, J. J. (2025). Comparing ChatGPT-4 and human translation of an outcome questionnaire: A randomized, double-blinded non-inferiority study. Cureus, 17(4), e82525. https://doi.org/10.7759/cureus.82525

*11 Spielmann, S. S., MacDonald, G., Maxwell, J. A., Joel, S., Peragine, D., Muise, A., & Impett, E. A. (2013). Settling for less out of fear of being single. Journal of Personality and Social Psychology, 105(6), 1049–1073. https://doi.org/10.1037/a0034628

Stefan, A. M., Evans, N. J., & Wagenmakers, E.-J. (2022). Practical challenges and methodological flexibility in prior elicitation. Psychological Methods, 27(2), 177–197. https://doi.org/10.1037/met0000354

Stefana, A., Damiani, S., Granziol, U., Provenzani, U., Solmi, M., Youngstrom, E. A., & Fusar-Poli, P. (2024). Psychological, psychiatric, and behavioral sciences measurement scales: Best practice guidelines for their development and validation. Frontiers in Psychology, 15, 1494261. https://doi.org/10.3389/fpsyg.2024.1494261

Symeonaki, M., Stamou, G., Kazani, A., Tsouparopoulou, E., & Stamatopoulou, G. (2024). Examining the development of attitude scales using Large Language Models (LLMs). arXiv. https://doi.org/10.48550/arXiv.2405.19011

Teixeira da Silva, J. A., & Yamada, Y. (2024). Could generative artificial intelligence serve as a psychological counselor? Prospects and limitations. Central Asian Journal of Medical Hypotheses and Ethics, 5(4), 297–303. https://doi.org/10.47316/cajmhe.2024.5.4.06

Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C. J., Beh, E. J., Bilgiç, Y. K., Bono, R., Bradley, M. T., Briggs, W. M., Cepeda-Freyre, H. A., Chaigneau, S. E., Ciocca, D. R., Correa, J. C., Cousineau, D., de Boer, M. R., Dhar, S. S., Dolgov, I., Gómez-Benito, J., Grendar, M., … Marmolejo-Ramos, F. (2018). Manipulating the alpha level cannot cure significance testing. Frontiers in Psychology, 9, 699. https://doi.org/10.3389/fpsyg.2018.00699

Vaidis, D. C., Sleegers, W. W. A., van Leeuwen, F., DeMarree, K. G., Sætrevik, B., Ross, R. M., Schmidt, K., Protzko, J., Morvinski, C., Ghasemi, O., Roberts, A. J., Stone, J., Bran, A., Gourdon-Kanhukamwe, A., Gunsoy, C., Moussaoui, L. S., Smith, A. R., Nugier, A., Fayant, M.-P., … Priolo, D. (2024). A multilab replication of the induced-compliance paradigm of cognitive dissonance. Advances in Methods and Practices in Psychological Science, 7(1), 25152459231213375. https://doi.org/10.1177/25152459231213375

Van Bavel, J. J., Cichocka, A., Capraro, V., Sjåstad, H., Nezlek, J. B., Pavlović, T., Alfano, M., Gelfand, M. J., Azevedo, F., Birtel, M. D., Cislak, A., Lockwood, P. L., Ross, R. M., Abts, K., Agadullina, E., Aruta, J. J. B., Besharati, S. N., Bor, A., Choma, B. L., … Boggio, P. S. (2022). National identity predicts public health support during a global pandemic. Nature Communications, 13, 517. https://doi.org/10.1038/s41467-021-27668-9

Visser, I., Bergmann, C., Byers-Heinlein, K., Dal Ben, R., Duch, W., Forbes, S., Franchin, L., Frank, M. C., Geraci, A., Hamlin, J. K., Kaldy, Z., Kulke, L., Laverty, C., Lew-Williams, C., Mateu, V., Mayor, J., Moreau, D., Nomikou, I., Schuwerk, T., … Zettersten, M. (2022). Improving the generalizability of infant psychological research: The ManyBabies model. Behavioral and Brain Sciences, 45, e35. https://doi.org/10.1017/S0140525X21000455

*1 Watson, D., O'Hara, M. W., Naragon-Gainey, K., Koffel, E., Chmielewski, M., Kotov, R., Stasik, S. M., & Ruggero, C. J. (2012). Development and validation of new anxiety and bipolar symptom scales for an expanded version of the IDAS (the IDAS-II). Assessment, 19(4), 399–420. https://doi.org/10.1177/1073191112449857

*9 Weinfurt, K. P., Lin, L., Bruner, D. W., Cyranowski, J. M., Dombeck, C. B., Hahn, E. A., … Flynn, K. E. (2015). Development and initial validation of the PROMIS® sexual function and satisfaction measures version 2.0. Journal of Sexual Medicine, 12(9), 1961–1974. https://doi.org/10.1111/jsm.12966

Werner, P., Eliyahu, E., & Krupat, E. (2025). Mapping the translation and psychometric characteristics of the Patient-Practitioner Oriented Scale: A scoping review. Patient Education and Counseling, 137, 108787. https://doi.org/10.1016/j.pec.2025.108787

Wild, D., Grove, A., Martin, M., Eremenco, S., McElroy, S., Verjee-Lorenz, A., Erikson, P., & ISPOR Task Force for Translation and Cultural Adaptation. (2005). Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: Report of the ISPOR task force for translation and cultural adaptation. Value in Health, 8(2), 94–104. https://doi.org/10.1111/j.1524-4733.2005.04054.x

山田祐樹 (2024). 心理学を遊撃する──再現性問題は恥だが役に立つ──. ちとせプレス.

*2 Yang, F., & Oshio, A. (2025). Using attachment theory to conceptualize and measure the experiences in human-AI relationships. Current Psychology, 44(11), 10658–10669. https://doi.org/10.1007/s12144-025-07917-6

Zhang, W., Balloo, K., Hosein, A., & Medland, E. (2024). A scoping review of well-being measures: Conceptualisation and scales for overall well-being. BMC Psychology, 12(1), 585. https://doi.org/10.1186/s40359-024-02074-0
