Preprint / Version 1

Bigram Knowledge Extraction from GPT-2

Authors

  • Minoru Yoshida, Faculty of Science and Technology, Tokushima University
  • Kazuyuki Matsumoto, Faculty of Science and Technology, Tokushima University

DOI:

https://doi.org/10.51094/jxiv.752

Keywords:

GPT

Abstract

We propose a method for extracting bigram knowledge from GPT-2. The proposed method focuses on the attention heads in the first layer of the model and uses them to predict output words.

We also propose a method that uses backpropagation to extract, for a given bigram, the context words associated with it.

Experiments confirm that the proposed method achieves more accurate extraction than baseline methods.
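The abstract does not spell out the procedure; as a rough, hypothetical illustration of the kind of first-layer head analysis it describes (our assumption, not the paper's actual method), the sketch below reads next-token candidates for a given token off a single first-layer attention head of GPT-2, by passing the token embedding through that head's value and output projections and decoding with the unembedding matrix. Layer normalization, positional embeddings, and attention weights are ignored for simplicity, and the backpropagation-based extraction of context words is not sketched.

```python
# Hypothetical sketch: "bigram" predictions from one first-layer GPT-2 attention head.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

N_HEADS = model.config.n_head        # 12
D_MODEL = model.config.n_embd        # 768
D_HEAD = D_MODEL // N_HEADS          # 64

def head_bigram_predictions(word: str, head: int, top_k: int = 5):
    """Top-k next-word candidates suggested by one first-layer head's value/output circuit."""
    token_id = tok.encode(word)[0]
    with torch.no_grad():
        e = model.transformer.wte.weight[token_id]                 # (768,) token embedding
        attn = model.transformer.h[0].attn
        # c_attn packs the Q, K, V projections; take the V columns for this head.
        w_v = attn.c_attn.weight[:, 2 * D_MODEL + head * D_HEAD:
                                    2 * D_MODEL + (head + 1) * D_HEAD]   # (768, 64)
        # c_proj rows h*64:(h+1)*64 map this head's output back to the residual stream.
        w_o = attn.c_proj.weight[head * D_HEAD:(head + 1) * D_HEAD, :]   # (64, 768)
        logits = (e @ w_v @ w_o) @ model.lm_head.weight.T                # (vocab,)
        top = torch.topk(logits, top_k).indices.tolist()
    return [tok.decode([i]) for i in top]

# Example: candidate continuations of " New" according to head 0 of layer 0.
print(head_bigram_predictions(" New", head=0))
```

Restricting the computation to a single head's slice of the value and output matrices mirrors the abstract's focus on individual first-layer heads; a faithful reimplementation would need the definitions given in the paper itself.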

Conflicts of Interest Disclosure

Conflicts of Interest: None



Posted

Submitted: 2024-06-15 04:06:39 UTC

Published: 2024-06-19 04:39:09 UTC

Section

Information Sciences