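Preprint / Version 2

A Large-Scale Collaborative Study Evaluating Translation by Large Language Models

A Comparison of Human and Machine Translations of 22 Psychological Scales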

DOI:

https://doi.org/10.51094/jxiv.2056

Keywords:

large language models, machine translation, human translation, big team science, translation accuracy, psychological scales

Abstract

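Large language models (LLMs) are used in many aspects of psychological research, but whether they are as practical as human translation for translating psychological scales has not yet been tested. The ManyScales project aims to evaluate the validity and practicality of LLM translation from multiple perspectives and to compare it with human translation. The study is conducted as a large-scale collaboration within Japan (43 researchers, 36 institutions) and covers 22 English-language psychological scales. Every scale follows the same procedure: an LLM-translated version is produced with the R package LLMTranslate and a human-translated version is produced by translators, after which each version is back-translated using the same method. The two translated versions are compared in terms of (a) expert ratings of semantic fidelity, naturalness, and cultural appropriateness; (b) lay respondents' ratings of comprehensibility and naturalness; and (c) psychometric analyses (e.g., factor structure, factor scores, measurement invariance, and association coefficients). In addition, as an exploratory analysis, cosine similarity based on embedding representations is computed to examine each version's semantic distance from the original items. The study seeks to clarify how far LLM translation can be used for translating psychological scales and in which respects it differs from human translation, and thereby to help make the scale-translation process more transparent and standardized.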
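The exploratory embedding analysis mentioned in the abstract rests on cosine similarity between item vectors. The following is a minimal sketch of that computation only, not the project's actual pipeline: the abstract does not say which embedding model is used or exactly which texts are embedded, so the function name, variable names, and the three-dimensional toy vectors below are all hypothetical placeholders.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption): real embeddings would come from
# an embedding model and have hundreds or thousands of dimensions.
original_item = [0.20, 0.70, 0.10]
version_a = [0.25, 0.65, 0.05]  # e.g., one translated version of the item
version_b = [0.10, 0.60, 0.40]  # e.g., the other translated version

similarities = {
    "version_a": cosine_similarity(original_item, version_a),
    "version_b": cosine_similarity(original_item, version_b),
}

for name, value in similarities.items():
    # A value closer to 1 means a smaller semantic distance from the original.
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the version whose vector yields the higher similarity would be read as semantically closer to the original item. How such values are aggregated across items and scales is not stated in the abstract.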

Conflict of Interest Disclosure

None of the authors has any conflict of interest to disclose.

Published


Submitted: 2025-12-01 14:12:21 UTC

Published: 2025-12-09 09:34:46 UTC (updated 2026-03-24 06:54:21 UTC)

Reason for revision

Revised in response to peer review.

Research field

Psychology, Education