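A Large-Scale Collaborative Study Evaluating Large Language Model Translation
A Comparison of Human and Machine Translations of 22 Psychological Scales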
DOI:
https://doi.org/10.51094/jxiv.2056
Keywords:
large language models, machine translation, human translation, big team science, translation accuracy, psychological scales
Abstract
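Large language models (LLMs) are used in many aspects of psychological research, but whether they are as practical as human translation for translating psychological scales has not been tested. The ManyScales project aims to evaluate the validity and practicality of LLM translation from multiple angles and to compare it with human translation. The study is conducted as a large-scale collaboration within Japan (43 researchers, 36 institutions) and covers 22 English-language psychological scales. Every scale follows the same procedure: an LLM-translated version is produced with the R package LLMTranslate and a human-translated version is produced by translators, after which each version is back-translated using the same method. The two translated versions are compared in terms of (a) expert ratings of semantic fidelity, naturalness, and cultural appropriateness, (b) ratings of comprehensibility and naturalness by general respondents, and (c) psychometric analyses (factor structure, factor scores, measurement invariance, association coefficients, and so on). In addition, as an exploratory analysis, cosine similarity based on embedding representations is computed to examine semantic distance from the original items. The study aims to clarify how far LLM translation can be used for translating psychological scales and in which respects it differs from human translation, and thereby to help make the scale translation process more transparent and standardized.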
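The exploratory cosine-similarity analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which embedding model is used or exactly which text versions are embedded, so the three short vectors below are hypothetical toy values standing in for real embeddings, and the function name `cosine_similarity` is chosen here for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumed values, not output of any real model).
original_item = [0.2, 0.7, 0.1]   # embedding of an original scale item
version_a = [0.25, 0.65, 0.05]    # embedding of one translated version
version_b = [0.6, 0.3, 0.4]       # embedding of another translated version

sim_a = cosine_similarity(original_item, version_a)
sim_b = cosine_similarity(original_item, version_b)

# A higher value means the version lies closer to the original item
# in the embedding space.
print(f"version A: {sim_a:.3f}, version B: {sim_b:.3f}")
```

With real embeddings the vectors would have hundreds or thousands of dimensions, but the computation is the same. Semantic distance is often reported as one minus the similarity.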
Conflict of Interest Disclosure
None of the authors has a conflict of interest to disclose.
References
References marked with an asterisk (*) report the scales targeted for translation.
Abrams, E., Leone, P. V., Cambrosio, A., & Faraj, S. (2025). The governance of open science: A comparative analysis of two open science consortia. Research Policy, 54(3), 105195. https://doi.org/10.1016/j.respol.2025.105195
Adetula, A., Forscher, P. S., Basnight-Brown, D., Azouaghe, S., & IJzerman, H. (2022). Psychology should generalize from—not just to—Africa. Nature Reviews Psychology, 1(7), 370–371. https://doi.org/10.1038/s44159-022-00070-y
Affonso, F. M. (2025). Detecting vision-enabled AI respondents in behavioral research through cognitive traps. PsyArXiv. https://doi.org/10.31234/osf.io/enuqj_v1
Anastasi, A. (1996). Psychological testing (7th ed.). Prentice Hall.
Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3), 337–351. https://doi.org/10.1017/pan.2023.2
Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589–596. https://doi.org/10.1001/jamainternmed.2023.1838
*Back, M. D., Küfner, A. C., Dufner, M., Gerlach, T. M., Rauthmann, J. F., & Denissen, J. J. (2013). Narcissistic admiration and rivalry: Disentangling the bright and dark sides of narcissism. Journal of Personality and Social Psychology, 105(6), 1013–1037. https://doi.org/10.1037/a0034431
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
Binz, M., Akata, E., Bethge, M., Brändle, F., Callaway, F., Coda-Forno, J., Dayan, P., Demircan, C., Eckstein, M. K., Éltető, N., Griffiths, T. L., Haridi, S., Jagadish, A. K., Ji-An, L., Kipnis, A., Kumar, S., Ludwig, T., Mathony, M., Mattar, M., … Schulz, E. (2025). A foundation model to predict and capture human cognition. Nature, 644(8078), 1002–1009. https://doi.org/10.1038/s41586-025-09215-4
Binz, M., Alaniz, S., Roskies, A., Aczel, B., Bergstrom, C. T., Allen, C., Schad, D., Wulff, D., West, J. D., Zhang, Q., Shiffrin, R. M., Gershman, S. J., Popov, V., Bender, E. M., Marelli, M., Botvinick, M. M., Akata, Z., & Schulz, E. (2025). How should the advancement of large language models affect the practice of science? Proceedings of the National Academy of Sciences of the United States of America, 122(5), e2401227121. https://doi.org/10.1073/pnas.2401227121
Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences of the United States of America, 120(6), e2218523120. https://doi.org/10.1073/pnas.2218523120
Bowers, J. S., Puebla, G., Thorat, S., Tsetsos, K., & Ludwig, C. J. H. (2025). Centaur: A model without a theory. PsyArXiv. https://doi.org/10.31234/osf.io/v9w37_v3
Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H. V., Adem, M., Adriaans, J., Alvarez-Benjumea, A., Andersen, H. K., Auer, D., Azevedo, F., Bahnsen, O., Balzer, D., Bauer, G., Bauer, P. C., Baumann, M., Baute, S., Benoit, V., Bernauer, J., Berning, C., … Żółtak, T. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences of the United States of America, 119(44), e2203150119. https://doi.org/10.1073/pnas.2203150119
*Briesch, A. M., Chafouleas, S. M., Neugebauer, S. R., & Riley-Tillman, T. C. (2013). Assessing influences on intervention implementation: Revision of the Usage Rating Profile-Intervention. Journal of School Psychology, 51(1), 81–96. https://doi.org/10.1016/j.jsp.2012.08.006
Brislin, R. W. (1986). The wording and translation of research instruments. In W. Lonner & J. Berry (Eds.), Field Methods in Cross-Cultural Research (pp. 137–164). Sage.
*Buckels, E. E., Jones, D. N., & Paulhus, D. L. (2013). Behavioral confirmation of everyday sadism. Psychological Science, 24(11), 2201–2209. https://doi.org/10.1177/0956797613490749
*Cantarella, I. A., Spielmann, S. S., Partridge, T., MacDonald, G., Joel, S., & Impett, E. A. (2023). Validating the fear of being single scale for individuals in relationships. Journal of Social and Personal Relationships, 40(9), 2969–2979. https://doi.org/10.1177/02654075231164588
Cao, Y., Sickles, R. C., Triebs, T. P., & Tumlinson, J. (2024). Linguistic distance to English impedes research performance. Research Policy, 53(4), 104971. https://doi.org/10.1016/j.respol.2024.104971
*Cartwright, F., & Stritzke, W. G. K. (2008). A multidimensional ambivalence model of chocolate craving: Construct validity and associations with chocolate consumption and disordered eating. Eating Behaviors, 9(1), 1–12. https://doi.org/10.1016/j.eatbeh.2007.01.006
Coles, N. A., Hamlin, J. K., Sullivan, L. L., Parker, T. H., & Altschul, D. (2022). Build up big-team science. Nature, 601(7894), 505–507. https://doi.org/10.1038/d41586-022-00150-2
Coles, N. A., March, D. S., Marmolejo-Ramos, F., Larsen, J. T., Arinze, N. C., Ndukaihe, I. L. G., Willis, M. L., Foroni, F., Reggev, N., Mokady, A., Forscher, P. S., Hunter, J. F., Kaminski, G., Yüvrük, E., Kapucu, A., Nagy, T., Hajdu, N., Tejada, J., Freitag, R. M. K., … Liuzza, M. T. (2022). A multi-lab test of the facial feedback hypothesis by the Many Smiles Collaboration. Nature Human Behaviour, 6(12), 1731–1742. https://doi.org/10.1038/s41562-022-01458-9
*Crimston, C. R., Hornsey, M. J., Bain, P. G., & Bastian, B. (2018). Moral expansiveness short form: Validity and reliability of the MESx. PLoS One, 13(10), e0205373. https://doi.org/10.1371/journal.pone.0205373
Cronin, B. (2001). Hyperauthorship: A postmodern perversion or evidence of a structural shift in scholarly communication practices? Journal of the American Society for Information Science and Technology, 52(7), 558–569. https://doi.org/10.1002/asi.1097
Cross, J., Kayalackakom, T., Robinson, R. E., Vaughans, A., Sebastian, R., Hood, R., Lewis, C., Devaraju, S., Honnavar, P., Naik, S., Joseph, J., Anand, N., Mohammed, A., Johnson, A., Cohen, E., Adeniji, T., Nnenna Nnaji, A., & George, J. E. (2025). Assessing ChatGPT’s capability as a new age standardized patient: Qualitative study. JMIR Medical Education, 11(1), e63353. https://doi.org/10.2196/63353
Cruchinho, P., López-Franco, M. D., Capelas, M. L., Almeida, S., Bennett, P. M., Miranda da Silva, M., Teixeira, G., Nunes, E., Lucas, P., Gaspar, F., & Handovers4SafeCare. (2024). Translation, cross-cultural adaptation, and validation of measurement instruments: A practical guideline for novice researchers. Journal of Multidisciplinary Healthcare, 17, 2701–2728. https://doi.org/10.2147/JMDH.S419714
DELVE (DECam Local Volume Exploration) Survey. (2025). DELVE Policy Guidelines Version 2.3. https://delve-survey.github.io/docs/DELVE_PolicyGuidelines.pdf
Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences, 27(7), 597–600. https://doi.org/10.1016/j.tics.2023.04.008
Elyoseph, Z., Hadar-Shoval, D., Asraf, K., & Lvovsky, M. (2023). ChatGPT outperforms humans in emotional awareness evaluations. Frontiers in Psychology, 14, 1199058. https://doi.org/10.3389/fpsyg.2023.1199058
Epstein, J., Santo, R. M., & Guillemin, F. (2015). A review of guidelines for cross-cultural adaptation of questionnaires could not bring out a consensus. Journal of Clinical Epidemiology, 68, 435–441. http://dx.doi.org/10.1016/j.jclinepi.2014.11.021
Forscher, P. S., Wagenmakers, E.-J., Coles, N. A., Silan, M. A., Dutra, N., Basnight-Brown, D., & IJzerman, H. (2023). The benefits, barriers, and risks of big-team science. Perspectives on Psychological Science, 18(3), 607–623. https://doi.org/10.1177/17456916221082970
Friese, S. P. (2025). Conversational analysis with AI - CA to the power of AI: Rethinking coding in qualitative analysis. OSF Preprints. https://doi.org/10.31219/osf.io/6b52m_v1
*Gabay, R., Hameiri, B., Rubel-Lifschitz, T., & Nadler, A. (2020). The tendency for interpersonal victimhood: The personality construct and its consequences. Personality and Individual Differences, 165, 110134. https://doi.org/10.1016/j.paid.2020.110134
*Giacalone, R. A., Valentine, S. R., Yin, B., & Promislo, M. D. (2025). Rage against the dying of the light: Employees’ moral outrage, anger expression, and generalized well-being. Journal of Business Ethics. https://doi.org/10.1007/s10551-024-05919-1
Google. (2025). Gmail's protections are strong and effective, and claims of a major Gmail security warning are false. https://blog.google/products/workspace/gmail-security-protections/
Götz, F. M., Maertens, R., Loomba, S., & van der Linden, S. (2024). Let the algorithm speak: How to use neural networks for automatic item generation in psychological scale development. Psychological Methods, 29(3), 494–518. https://doi.org/10.1037/met0000540
Granas, A. G., Nørgaard, L. S., & Sporrong, S. K. (2014). Lost in translation? Comparing three Scandinavian translations of the Beliefs about Medicines Questionnaire. Patient Education and Counseling, 96(2), 216–221. https://doi.org/10.1016/j.pec.2014.05.010
Grobelny, J., Szymański, K., & Strozyk, Z. (2025). Act as an expert in psychometry. The evaluation of large language models utility in psychological tests cross-cultural adaptations. Acta Psychologica, 261, 105813. https://doi.org/10.1016/j.actpsy.2025.105813
Harkness, J. (2003). Questionnaire translation. In J. A. Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp. 35–56). Wiley.
Harkness, J. A., Villar, A., & Edwards, B. (2010). Translation, adaptation, and design. In J. A. Harkness et al. (Eds.), Survey Methods in Multinational, Multicultural and Multiregional Contexts (pp. 117–140). Hoboken, NJ: John Wiley.
Heinz, M. V., Mackin, D. M., Trudeau, B. M., Bhattacharya, S., Wang, Y., Banta, H. A., Jewett, A. D., Salzhauer, A. J., Griffin, T. Z., & Jacobson, N. C. (2025). Randomized trial of a generative AI chatbot for mental health treatment. NEJM AI, 2(4). https://doi.org/10.1056/aioa2400802
*Hewitt, P. L., & Flett, G. L. (1991a). Perfectionism in the self and social contexts: Conceptualization, assessment, and association with psychopathology. Journal of Personality and Social Psychology, 60(3), 456–470. https://doi.org/10.1037/0022-3514.60.3.456
*Hewitt, P. L., Flett, G. L., Turnbull-Donovan, W., & Mikail, S. F. (1991b). The Multidimensional Perfectionism Scale: Reliability, validity, and psychometric properties in psychiatric samples. Psychological Assessment, 3(3), 464–468. http://doi.org/10.1037/1040-3590.3.3.464
Hoekman, J., & Rake, B. (2024). Geography of authorship: How geography shapes authorship attribution in big team science. Research Policy, 53(2), 104927. https://doi.org/10.1016/j.respol.2023.104927
Holcombe, A. (2019). Farewell authors, hello contributors. Nature, 571(7764), 147. https://doi.org/10.1038/d41586-019-02084-8
*Hopwood, C. J., Piazza, J., Chen, S., & Bleidorn, W. (2021). Development and validation of the motivations to Eat Meat Inventory. Appetite, 163, 105210. https://doi.org/10.1016/j.appet.2021.105210
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Huang, M., Zhang, X., Soto, C., & Evans, J. (2024). Designing LLM-agents with personalities: A psychometric approach. arXiv. https://doi.org/10.48550/arXiv.2410.19238
*Katzir, M., Baldwin, M., Werner, K. M., & Hofmann, W. (2021). Moving beyond inhibition: Capturing a broader scope of the self-control construct with the Self-Control Strategy Scale (SCSS). Journal of Personality Assessment, 103(6), 762–776. https://doi.org/10.1080/00223891.2021.1883627
Kjell, O. N. E., Kjell, K., Garcia, D., & Sikström, S. (2019). Semantic measures: Using natural language processing to measure, differentiate, and describe psychological constructs. Psychological Methods, 24(1), 92–115. https://doi.org/10.1037/met0000191
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178
*Konrath, S., James, C., Weinstein, E., & Tench, B. (2025). Development and validation of the Tech With Care Index for teens. Psychology of Popular Media, 14(4), 507–520. https://doi.org/10.1037/ppm0000593
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012
*Kruglanski, A. W., Thompson, E. P., Higgins, E. T., Atash, M. N., Pierro, A., Shah, J. Y., & Spiegel, S. (2000). To "do the right thing" or to "just do it": Locomotion and assessment as distinct self-regulatory imperatives. Journal of Personality and Social Psychology, 79(5), 793–815. https://doi.org/10.1037/0022-3514.79.5.793
Kunst, J. R. (2026). LLMTranslate: 'shiny' app for TRAPD/ISPOR survey translation with LLMs (R package version 0.3.0). Comprehensive R Archive Network (CRAN). https://doi.org/10.32614/CRAN.package.LLMTranslate
Kunst, J. R., & Bierwiaczonek, K. (2023). Utilizing AI questionnaire translations in cross-cultural and intercultural research: Insights and recommendations. International Journal of Intercultural Relations, 97, 101888. https://doi.org/10.1016/j.ijintrel.2023.101888
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Van Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., … Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168–171. https://doi.org/10.1038/s41562-018-0311-x
*LeFebvre, A., & Huta, V. (2021). Age and gender differences in eudaimonic, hedonic, and extrinsic motivations. Journal of Happiness Studies, 22(5), 2299–2321. https://doi.org/10.1007/s10902-020-00319-4
Maertens, R., Götz, F. M., Golino, H. F., Roozenbeek, J., Schneider, C. R., Kyrychenko, Y., Kerr, J. R., Stieger, S., McClanahan, W. P., Drabot, K., He, J., & van der Linden, S. (2024). The Misinformation Susceptibility Test (MIST): A psychometrically validated measure of news veracity discernment. Behavior Research Methods, 56(3), 1863–1899. https://doi.org/10.3758/s13428-023-02124-2
Massé, C. C., Krieger, V., Peró-Cebollero, M., Amador-Campos, J. A., & Guàrdia-Olmos, J. (2025). Measurement invariance and cross-linguistic validation of the PSS-4 in university context: Multidimensional analysis and associations with psychological and behavioral outcomes. Frontiers in Psychology, 16, 1648070. https://doi.org/10.3389/fpsyg.2025.1648070
Maslej, N., Fattorini, L., Perrault, R., Gil, Y., Parli, V., Kariuki, N., Capstick, E., Reuel, A., Brynjolfsson, E., Etchemendy, J., Ligett, K., Lyons, T., Manyika, J., Niebles, J. C., Shoham, Y., Wald, R., Walsh, T., Hamrah, A., Santarlasci, L., … Oak, S. (2025). Artificial Intelligence Index Report 2025. arXiv. https://doi.org/10.48550/arXiv.2504.07139
*McWilliam, A. M., Beattie, S., & Callow, N. (2025). The development and validation of the public speaking threats inventory (PSTI). Personality and Individual Differences, 246, 113322. https://doi.org/10.1016/j.paid.2025.113322
Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A Turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences of the United States of America, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121
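Miura, A., & Kobayashi, T. (2015). オンライン調査モニタのSatisficeに関する実験的研究 [An experimental study of satisficing among online survey monitors]. 社会心理学研究 [Japanese Journal of Social Psychology], 31(1), 1–12. https://doi.org/10.14966/jssp.31.1_1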
Mokkink, L. B., Elsman, E. B. M., & Terwee, C. B. (2024). COSMIN guideline for systematic reviews of patient-reported outcome measures version 2.0. Quality of Life Research, 33(11), 2929–2939. https://doi.org/10.1007/s11136-024-03761-6
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., Grahe, J. E., McCarthy, R. J., Musser, E. D., Antfolk, J., Castille, C. M., Evans, T. R., Fiedler, S., Flake, J. K., Forero, D. A., Janssen, S. M. J., Keene, J. R., Protzko, J., Aczel, B., … Chartier, C. R. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607
Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for conducting factor analyses. International Journal of Testing, 5(2), 159–168. https://doi.org/10.1207/s15327574ijt0502_4
Orr, M., Cranford, D., Ford, K., Gluck, K., Hancock, W., Lebiere, C., Pirolli, P., Ritter, F., & Stocco, A. (2025). Not even wrong: On the limits of prediction as explanation in cognitive science. arXiv. http://arxiv.org/abs/2510.03311
Ozolins, U., Hale, S., Cheng, X., Hyatt, A., & Schofield, P. (2020). Translation and back-translation methodology in health research – a critique. Expert Review of Pharmacoeconomics & Outcomes Research, 1–9. https://doi.org/10.1080/14737167.2020.1734453
Park, J. J., & Oh, J. (2025). Enhancing international research through alternative back translation methods leveraging artificial intelligence. Human Resource Development International, 1–22. https://doi.org/10.1080/13678868.2025.2558571
Parsons, S., Azevedo, F., Elsherif, M. M., Guay, S., Shahim, O. N., Govaart, G. H., Norris, E., O’Mahony, A., Parker, A. J., Todorovic, A., Pennington, C. R., Garcia-Pelegrin, E., Lazić, A., Robertson, O., Middleton, S. L., Valentini, B., McCuaig, J., Baker, B. J., Collins, E., … Aczel, B. (2022). A community-sourced glossary of open scholarship terms. Nature Human Behaviour, 6(3), 312–318. https://doi.org/10.1038/s41562-021-01269-4
*Paulhus, D. L., & Jones, D. N. (2015). Measures of dark personalities. In G. J. Boyle, D. H. Saklofske, & G. Matthews (Eds.), Measures of Personality and Social Psychological Constructs (pp. 562–594). Elsevier Academic Press. https://doi.org/10.1016/B978-0-12-386915-9.00020-6
*Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476–498. https://doi.org/10.1017/S1930297500007439
*Przybylski, A. K., Murayama, K., DeHaan, C. R., & Gladwell, V. (2013). Motivational, emotional, and behavioral correlates of fear of missing out. Computers in Human Behavior, 29, 1841–1848. http://dx.doi.org/10.1016/j.chb.2013.02.014
Revelle, W. (2025). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.5.6, https://CRAN.R-project.org/package=psych.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237. https://doi.org/10.3758/PBR.16.2.225
Ruggeri, K., Panin, A., Vdovic, M., Većkalov, B., Abdul-Salaam, N., Achterberg, J., Akil, C., Amatya, J., Amatya, K., Andersen, T. L., Aquino, S. D., Arunasalam, A., Ashcroft-Jones, S., Askelund, A. D., Ayacaxli, N., Sheshdeh, A. B., Bailey, A., Barea Arroyo, P., Mejía, G. B., … García-Garzon, E. (2022). The globalizability of temporal discounting. Nature Human Behaviour, 6(10), 1386–1397. https://doi.org/10.1038/s41562-022-01392-w
Salama-Younes, M., Montazeri, A., Ismaïl, A., & Roncin, C. (2009). Factor structure and internal consistency of the 12-item General Health Questionnaire (GHQ-12) and the Subjective Vitality Scale (VS), and the relationship between them: A study from France. Health and Quality of Life Outcomes, 7(1), 22. https://doi.org/10.1186/1477-7525-7-22
Sanz, A., Tapia, J. L., García-Carpintero, E., Rocabado, J. F., & Pedrajas, L. M. (2025). ChatGPT simulated patient: Use in clinical training in psychology. Psicothema, 37(3), 23–32. https://doi.org/10.70478/psicothema.2025.37.21
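Sasaki, K., & Toyoda, H. (2024). ChatGPTにより生成された心理尺度項目の信頼性・妥当性の評価 [Evaluating the reliability and validity of psychological scale items generated by ChatGPT]. 日本テスト学会誌 [Japanese Journal for Research on Testing], 20(1), 111–133. https://doi.org/10.24690/jart.20.1_111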
*Schepman, A., & Rodway, P. (2026). Validation of the short general attitudes towards artificial intelligence scale: The short GAAIS-10. International Journal of Human–Computer Interaction, 1–17. https://doi.org/10.1080/10447318.2025.2610446
Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y
Schröder, S., Morgenroth, T., Kuhl, U., Vaquet, V., & Paaßen, B. (2025). Large Language Models do not simulate human psychology. arXiv. https://doi.org/10.48550/arXiv.2508.06950
Seminara, D., Khoury, M. J., O’Brien, T. R., Manolio, T., Gwinn, M. L., Little, J., Higgins, J. P. T., Bernstein, J. L., Boffetta, P., Bondy, M., Bray, M. S., Brenchley, P. E., Buffler, P. A., Casas, J. P., Chokkalingam, A. P., Danesh, J., Smith, G. D., Dolan, S., Duncan, R., … Ioannidis, J. P. A. (2007). The emergence of networks in human genome epidemiology: Challenges and opportunities. Epidemiology, 18(1), 1–8. https://doi.org/10.1097/01.ede.0000249540.17855.b7
*Smith, M. M., Saklofske, D. H., Stoeber, J., & Sherry, S. B. (2016). The Big Three Perfectionism Scale: A new measure of perfectionism. Journal of Psychoeducational Assessment, 34(7), 670–687. https://doi.org/10.1177/0734282916651539
Sørensen, C. B., Gram-Hanssen, A., Rosenberg, J., & Baker, J. J. (2025). Comparing ChatGPT-4 and human translation of an outcome questionnaire: A randomized, double-blinded non-inferiority study. Cureus, 17(4), e82525. https://doi.org/10.7759/cureus.82525
*Spielmann, S. S., MacDonald, G., Maxwell, J. A., Joel, S., Peragine, D., Muise, A., & Impett, E. A. (2013). Settling for less out of fear of being single. Journal of Personality and Social Psychology, 105(6), 1049–1073. https://doi.org/10.1037/a0034628
Stefan, A. M., Evans, N. J., & Wagenmakers, E.-J. (2022). Practical challenges and methodological flexibility in prior elicitation. Psychological Methods, 27(2), 177–197. https://doi.org/10.1037/met0000354
Stefana, A., Damiani, S., Granziol, U., Provenzani, U., Solmi, M., Youngstrom, E. A., & Fusar-Poli, P. (2024). Psychological, psychiatric, and behavioral sciences measurement scales: Best practice guidelines for their development and validation. Frontiers in Psychology, 15, 1494261. https://doi.org/10.3389/fpsyg.2024.1494261
Symeonaki, M., Stamou, G., Kazani, A., Tsouparopoulou, E., & Stamatopoulou, G. (2024). Examining the development of attitude scales using Large Language Models (LLMs). arXiv. https://doi.org/10.48550/arXiv.2405.19011
Teixeira da Silva, J. A., & Yamada, Y. (2024). Could generative artificial intelligence serve as a psychological counselor? Prospects and limitations. Central Asian Journal of Medical Hypotheses and Ethics, 5(4), 297–303. https://doi.org/10.47316/cajmhe.2024.5.4.06
Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C. J., Beh, E. J., Bilgiç, Y. K., Bono, R., Bradley, M. T., Briggs, W. M., Cepeda-Freyre, H. A., Chaigneau, S. E., Ciocca, D. R., Correa, J. C., Cousineau, D., de Boer, M. R., Dhar, S. S., Dolgov, I., Gómez-Benito, J., Grendar, M., … Marmolejo-Ramos, F. (2018). Manipulating the alpha level cannot cure significance testing. Frontiers in Psychology, 9, 699. https://doi.org/10.3389/fpsyg.2018.00699
Vaidis, D. C., Sleegers, W. W. A., van Leeuwen, F., DeMarree, K. G., Sætrevik, B., Ross, R. M., Schmidt, K., Protzko, J., Morvinski, C., Ghasemi, O., Roberts, A. J., Stone, J., Bran, A., Gourdon-Kanhukamwe, A., Gunsoy, C., Moussaoui, L. S., Smith, A. R., Nugier, A., Fayant, M.-P., … Priolo, D. (2024). A multilab replication of the induced-compliance paradigm of cognitive dissonance. Advances in Methods and Practices in Psychological Science, 7(1), 25152459231213375. https://doi.org/10.1177/25152459231213375
Van Bavel, J. J., Cichocka, A., Capraro, V., Sjåstad, H., Nezlek, J. B., Pavlović, T., Alfano, M., Gelfand, M. J., Azevedo, F., Birtel, M. D., Cislak, A., Lockwood, P. L., Ross, R. M., Abts, K., Agadullina, E., Aruta, J. J. B., Besharati, S. N., Bor, A., Choma, B. L., … Boggio, P. S. (2022). National identity predicts public health support during a global pandemic. Nature Communications, 13, 517. https://doi.org/10.1038/s41467-021-27668-9
Visser, I., Bergmann, C., Byers-Heinlein, K., Dal Ben, R., Duch, W., Forbes, S., Franchin, L., Frank, M. C., Geraci, A., Hamlin, J. K., Kaldy, Z., Kulke, L., Laverty, C., Lew-Williams, C., Mateu, V., Mayor, J., Moreau, D., Nomikou, I., Schuwerk, T., … Zettersten, M. (2022). Improving the generalizability of infant psychological research: The ManyBabies model. Behavioral and Brain Sciences, 45, e35. https://doi.org/10.1017/S0140525X21000455
*Watson, D., O'Hara, M. W., Naragon-Gainey, K., Koffel, E., Chmielewski, M., Kotov, R., Stasik, S. M., & Ruggero, C. J. (2012). Development and validation of new anxiety and bipolar symptom scales for an expanded version of the IDAS (the IDAS-II). Assessment, 19(4), 399–420. https://doi.org/10.1177/1073191112449857
*Weinfurt, K. P., Lin, L., Bruner, D. W., Cyranowski, J. M., Dombeck, C. B., Hahn, E. A., … Flynn, K. E. (2015). Development and initial validation of the PROMIS® sexual function and satisfaction measures version 2.0. Journal of Sexual Medicine, 12(9), 1961–1974. https://doi.org/10.1111/jsm.12966
Werner, P., Eliyahu, E., & Krupat, E. (2025). Mapping the translation and psychometric characteristics of the Patient-Practitioner Oriented Scale: A scoping review. Patient Education and Counseling, 137, 108787. https://doi.org/10.1016/j.pec.2025.108787
Wild, D., Grove, A., Martin, M., Eremenco, S., McElroy, S., Verjee-Lorenz, A., Erikson, P., & ISPOR Task Force for Translation and Cultural Adaptation (2005). Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: Report of the ISPOR task force for translation and cultural adaptation. Value in Health, 8(2), 94–104. https://doi.org/10.1111/j.1524-4733.2005.04054.x
Yamada, Y. (2024). 心理学を遊撃する──再現性問題は恥だが役に立つ── [Skirmishing with psychology: The reproducibility problem is shameful but useful]. Chitose Press.
*Yang, F., & Oshio, A. (2025). Using attachment theory to conceptualize and measure the experiences in human-AI relationships. Current Psychology, 44(11), 10658–10669. https://doi.org/10.1007/s12144-025-07917-6
Zhang, W., Balloo, K., Hosein, A., & Medland, E. (2024). A scoping review of well-being measures: Conceptualisation and scales for overall well-being. BMC Psychology, 12(1), 585. https://doi.org/10.1186/s40359-024-02074-0
Published
Submitted: 2025-12-01 14:12:21 UTC
Published: 2025-12-09 09:34:46 UTC (updated 2026-03-24 06:54:21 UTC)
Versions
- 2026-03-24 06:54:21 UTC (2)
- 2025-12-09 09:34:46 UTC (1)
Reason for revision
Revised in response to peer review.
License
Copyright (c) 2025
山田, 祐樹
小杉, 考司
国里, 愛彦
分寺, 杏介
後藤, 崇志
橋本, 泰央
工藤, 大介
李, 禕飛
眞嶋, 良全
向井, 智哉
野村, 竜也
小口, 真奈
七條, 花恋
下司, 忠大
高松, 礼奈
竹橋, 洋毅
竹下, 昌志
浅野, 良輔
福田, 実奈
古谷, 嘉一郎
日道, 俊之
平野, 寛樹
五十嵐, 祐
伊藤, 雅隆
香川, 璃奈
神野, 雄
加藤, 弘通
古村, 健太郎
宮川, 裕基
水野, 君平
村浦, 新之助
新谷, 優
西村, 多久磨
尾崎, 由佳
佐藤, 秀樹
佐藤, 奈月
嶋, 大樹
瀧川, 諒子
田中, 勝則
塚本, 早織
山崎, 茜
楊, 帆
三浦, 麻子
This work is licensed under a Creative Commons Attribution 4.0 International License.
