Preprint / Version 1

Wikipedia Collaborative Filtering

Authors

  • Koki Takeuchi, Gunma University, Faculty of Social and Information Studies
  • Katsuhiko Hayashi, Hokkaido University, Faculty of Information and Science Technologies, https://researchmap.jp/katsuhiko-h/

DOI:

https://doi.org/10.51094/jxiv.423

Keywords:

Recommendation, Collaborative Filtering, Web Intelligence

Abstract

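Wikipedia contains high-quality articles on a wide variety of things (referred to here as "entities") and has been used in many research areas. Prior work has typically relied on Wikipedia's content information, such as summary text and hyperlinks. However, because Wikipedia content is edited to exclude users' subjective views, it differs from criticism and review text and can capture only superficial attribute information about entities. To address this issue, this paper proposes a collaborative filtering method that uses Wikipedia editor information. We applied the proposed method to estimating similarity between entities and evaluated it on a recommendation task, which confirmed its effectiveness.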
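
As a rough illustration of the idea in the abstract, the sketch below treats each entity as the set of editors who edited its page and scores entity pairs by the cosine similarity of those sets. This is only a minimal item-based collaborative filtering example under stated assumptions, not the authors' actual method: the toy data, the entity and editor names, and the functions cosine_similarity and most_similar are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: which editors edited which Wikipedia entity pages.
# All names are invented for illustration; they do not come from the paper.
EDITS = {
    "Entity_A": {"editor1", "editor2", "editor3"},
    "Entity_B": {"editor2", "editor3", "editor4"},
    "Entity_C": {"editor5"},
}


def cosine_similarity(editors_x, editors_y):
    """Cosine similarity between two binary editor vectors, given as sets."""
    if not editors_x or not editors_y:
        return 0.0
    shared = len(editors_x & editors_y)
    return shared / sqrt(len(editors_x) * len(editors_y))


def most_similar(entity, edits, top_n=2):
    """Rank the other entities by editor-overlap similarity to `entity`."""
    scores = [
        (other, cosine_similarity(edits[entity], edits[other]))
        for other in edits
        if other != entity
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(most_similar("Entity_A", EDITS))
```

Here Entity_A and Entity_B share two editors and therefore rank as more similar than Entity_A and Entity_C, which share none. A real system would build the editor sets from Wikipedia revision histories and would likely use a stronger similarity model than plain cosine.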

Conflicts of Interest Disclosure

The authors have no financial conflicts of interest to disclose concerning this study.

Posted

Submitted: 2023-06-21 08:59:36 UTC

Published: 2023-06-28 09:58:15 UTC

Section

Information Sciences