Selection Informative Units for Extractive Summarization


TURAN M.

WSEAS Transactions on Systems, vol. 22, pp. 287-294, 2023 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 22
  • Publication Date: 2023
  • DOI: 10.37394/23202.2023.22.31
  • Journal Name: WSEAS Transactions on Systems
  • Indexed In: Scopus, INSPEC, zbMATH
  • Pages: pp. 287-294
  • Keywords: AI, Document Summarization, Informative Units, NLP, Paragraph Extraction, TF-IDF
  • Affiliated with İstanbul Ticaret Üniversitesi: Yes

Abstract

An extractive multi-document summarizer must select the most informative units and prevent duplication in what it extracts. To achieve this, this work proposes a new technique called "comprising at least one Representative Term at the Highest Frequency" (RTHF). Units that include representative terms, but only at low frequencies, are not considered for extraction; this serves the goal of selecting the most informative units. Conversely, units that have the RTHF property precede other similar units in the ranking; this serves the goal of preventing duplication. The heuristic behind RTHF is justified in probabilistic terms. RTHF was evaluated on a previously developed and tested paragraph-based extractive multi-document summarizer. The results show that it improves the original system by 0.8% to 3.2% in the average F-values of the ROUGE metrics.
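The abstract does not spell out how representative terms are chosen or how "highest frequency" is measured, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes (1) representative terms are the k terms with the highest summed TF-IDF across all units (the hypothetical helper `representative_terms`), and (2) a unit has the RTHF property if, for at least one representative term, its count of that term equals the maximum count of that term over all units (the hypothetical helper `rthf_units`). Units that contain representative terms only at lower frequencies are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case alphabetic tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z]+", text.lower())


def representative_terms(units, k=3):
    """Hypothetical: pick the k terms with the highest summed TF-IDF over all units.

    This selection criterion is an assumption; the abstract does not define it.
    """
    docs = [Counter(tokenize(u)) for u in units]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    scores = Counter()
    for d in docs:
        for term, tf in d.items():
            scores[term] += tf * math.log(n / df[term])
    return [term for term, _ in scores.most_common(k)]


def rthf_units(units, rep_terms):
    """Hypothetical reading of RTHF: return indices of units that contain at least
    one representative term at that term's highest frequency among all units."""
    if not units:
        return []
    counts = [Counter(tokenize(u)) for u in units]
    # Highest frequency of each representative term across all units.
    best = {t: max(c[t] for c in counts) for t in rep_terms}
    selected = []
    for i, c in enumerate(counts):
        # best[t] > 0 stops a term absent everywhere from matching every unit at 0.
        if any(best[t] > 0 and c[t] == best[t] for t in rep_terms):
            selected.append(i)
    return selected


units = [
    "summarization selects informative paragraphs. summarization avoids duplication.",
    "summarization is a task.",
    "duplication hurts summaries. duplication wastes space.",
]
print(rthf_units(units, ["summarization", "duplication"]))  # [0, 2]
```

In this toy input, unit 1 mentions "summarization" only once while unit 0 mentions it twice, so under this reading unit 1 is not considered for extraction. Units 0 and 2 each hold a representative term at its highest frequency and would be ranked ahead of similar units.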