Transformer-Based Approach for Automatic Semantic Financial Document Verification


Toprak A., TURAN M.

IEEE Access, 2024 (SCI-Expanded) identifier

  • Yayın Türü: Makale / Tam Makale
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1109/access.2024.3477270
  • Dergi Adı: IEEE Access
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Anahtar Kelimeler: abstract summarization, Document verification, Reuters financial datasets, semantic analysis, transformer
  • İstanbul Ticaret Üniversitesi Adresli: Evet

Özet

Document verification is the process of verifying an original summary document on the original full-text document. Semantic control is very critical in these verification processes. In this study, an automatic document verification system based on Natural Language Processing techniques was designed to semantically check the consistency of the abstract summary produced especially for the original document or documents of the financial type. Verification of abstract summaries on original full-text documents was done through the Transformer-based model. Since the reference documents to be verified in the study belong to the financial type, the Transformer model was created by training with Reuters financial dataset. The proposed Transformer-based semantic document verification approach was tested on the original full-text and summary documents. The full text and summary documents were subjected to data pre-processing and Spell Checker processes. Then, since the summary document will be verified on the full-text document, the sentences most similar to the summary document sentences from the full-text document sentences were determined by using Simhash and Cross Encoder text similarity algorithms. It is a heuristic approach and completes the proposed verification system. Two (experimentally) original full-text document sentences most similar to the summary document sentence were selected. Then, these original full-text document sentences were inputted as training data to the Transformer model. Finally, the transformer model produced an abstract summary of original full-text sentences. In the last stage, the original summary and the summary produced by the Transformer model were compared with both Simhash and Cross Encoder text similarity algorithms in terms of their similarities, and the average document verification accuracy was calculated. The proposed Transformer-based semantic document verification approach achieved an average of 84.1% semantic financial document verification accuracy on the financial documents in the Reuters financial dataset. In this study, we present several key contributions to the field of semantic document verification: Firstly, we introduce a Transformer-based model tailored for financial texts, trained on the Reuters financial dataset, which offers enhanced precision in understanding financial language. Secondly, our approach employs advanced Natural Language Processing techniques for deep semantic analysis to verify the consistency of document summaries. Thirdly, we propose a novel hybrid methodology that integrates Transformer models with sentence grouping techniques for generating accurate and informative abstract summaries. These innovations collectively mark a substantial advancement in the automation and precision of document verification processes.