Journal of Supercomputing, vol. 81, no. 4, 2025 (SCI-Expanded)
Automatic document summarization is a widely studied field that aims to generate brief and informative summaries of long documents. In this paper, we propose a hybrid approach to automatic document summarization that combines a Transformer model with sentence grouping. The Transformer model was trained on the BBC News dataset, which we first preprocessed by correcting logical and spelling errors in the original full-text and summary document pairs. The model's hyper-parameters were determined through experimentation. In the testing stage, each document was decomposed into sentences, and the similarity of each sentence to every other sentence was computed using the Simhash text similarity algorithm. The most similar sentences were grouped together, with the number of groups set to 25% of the total number of sentences in the document. Each group of sentences was then input to the Transformer model, which produced a new abstractive sentence for that group. The groups were ordered by the average position of their sentences in the original document, and the generated abstractive sentences were combined in that order to form the summary. Experimental results showed that the proposed approach achieved an average Simhash text similarity of 93.2% to the original full-text documents and, on average, 5% higher similarity to the reference summary documents. These results demonstrate the effectiveness of the proposed approach for automatic document summarization.
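The grouping stage described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a standard 64-bit Simhash over word tokens, a simple seed-and-assign grouping strategy (the paper does not specify the exact grouping procedure), and ordering of groups by the average original position of their sentences. All function names are illustrative.

```python
import hashlib
import re

def simhash(text, bits=64):
    # Standard Simhash: hash each word token, accumulate a signed
    # vote per bit position, keep bits with a positive total.
    v = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def similarity(a, b, bits=64):
    # Fraction of matching bits: 1 minus the normalized Hamming distance.
    return 1 - bin(a ^ b).count("1") / bits

def group_sentences(sentences, ratio=0.25):
    # Number of groups is a fixed ratio (25% in the paper) of the
    # sentence count; each group would later be fed to the Transformer.
    n = len(sentences)
    k = max(1, round(n * ratio))
    hashes = [simhash(s) for s in sentences]
    # Hypothetical strategy: seed k groups with the first k sentences,
    # then assign each remaining sentence to the most similar seed.
    groups = [[i] for i in range(k)]
    for i in range(k, n):
        best = max(range(k),
                   key=lambda g: similarity(hashes[i], hashes[groups[g][0]]))
        groups[best].append(i)
    # Order groups by the average original position of their sentences,
    # so the generated summary follows the source document's flow.
    groups.sort(key=lambda g: sum(g) / len(g))
    return [[sentences[i] for i in g] for g in groups]
```

Each returned group would then be passed to the trained Transformer model, and the resulting abstractive sentences concatenated in group order to produce the final summary.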