Case Study on well-known Topic Modeling Methods for Document Classification


Ozdemirci S. M., TURAN M.

6th International Conference on Inventive Computation Technologies, ICICT 2021, Coimbatore, India, 20 - 22 January 2021, pp.1304-1309

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/icict50816.2021.9358473
  • City of Publication: Coimbatore
  • Country of Publication: India
  • Page Numbers: pp.1304-1309
  • Keywords: Classification, topic modeling, LDA, BERT, TF-IDF
  • Affiliated with İstanbul Ticaret Üniversitesi: Yes

Abstract

Topic modeling has numerous applications, such as text categorization, topic clustering, document tagging, and feature extraction on large document collections. In this study, three methods were applied separately to the same document set: Latent Dirichlet Allocation (LDA), a practical exploratory topic modeling method; Bidirectional Encoder Representations from Transformers (BERT), a transformer-based machine learning method; and Term Frequency-Inverse Document Frequency (TF-IDF). The document set comprises 801 sport and education articles collected from the internet by graduate students. The purpose of the study is to observe which method is best suited to topic modeling and, if possible, to increase the accuracy rate through an ensemble of these methods. The study found that, despite some disadvantages, BERT assigned documents to the correct topic with an average success rate of 92.6%, outperforming the other methods.
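
For readers unfamiliar with the simplest of the three methods, the following is a minimal sketch of TF-IDF weighting used for two-topic classification (sport vs. education). It is not the authors' implementation: the four training sentences, the function names, the smoothed IDF formula, and the nearest-neighbour cosine rule are all assumptions chosen for illustration, and the abstract does not state which classifier or preprocessing was used.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the study's 801 articles are not reproduced here.
TRAIN = [
    ("the team won the football match", "sport"),
    ("the striker scored a goal in the match", "sport"),
    ("students passed the university exam", "education"),
    ("the teacher graded the school exam", "education"),
]

def fit_tfidf(token_docs):
    """Return (list of {term: weight} vectors, idf table) for tokenized documents."""
    n = len(token_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in token_docs for term in set(doc))
    # Assumed smoothing (+1) so terms present in every document keep some weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [to_vector(doc, idf) for doc in token_docs]
    return vectors, idf

def to_vector(tokens, idf):
    """Weight each known term by (relative term frequency) * idf."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(text, vectors, labels, idf):
    """Assign the label of the most similar training document."""
    query = to_vector(text.lower().split(), idf)
    best = max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))
    return labels[best]

train_vectors, idf_table = fit_tfidf([text.split() for text, _ in TRAIN])
train_labels = [label for _, label in TRAIN]

print(classify("a late goal decided the match", train_vectors, train_labels, idf_table))
print(classify("the school exam was hard", train_vectors, train_labels, idf_table))
```

On this toy data the first query shares its distinctive terms only with the sport sentences and the second only with the education sentences, so the two calls print the corresponding labels. LDA and BERT, the other two methods named in the abstract, replace the term-weight vectors with topic distributions and contextual embeddings respectively.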