Deep Learning-Based Classification of Software Bugs Using Code Context and AST Features

Gokcen A., Arşık A., Akbulut A., Catal C.

13th International Symposium on Digital Forensics and Security, ISDFS 2025, Massachusetts, United States of America, 24-25 April 2025 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/isdfs65363.2025.11012056
City of Publication: Massachusetts
Country of Publication: United States of America
Keywords: abstract syntax tree, bug classification, Convolutional Neural Networks, defect prediction, fault prediction
Affiliated with İstanbul Ticaret Üniversitesi (Istanbul Commerce University): Yes

Abstract

A software component plagued with bugs is likely to suffer from both functional and non-functional problems, such as usability, performance, and security issues. Such bugs range from improper layouts to system crashes and security vulnerabilities. In this research, we developed a deep learning-based model that classifies software bugs from two inputs: the code statement preceding the fix and the corrected statement. Preprocessing extracts features from the Abstract Syntax Tree (AST) by traversing the tree to capture node types and their relationships, then encodes the AST as a numerical vector; padding brings these vectors to a common length so they can be fed to a Convolutional Neural Network (CNN). We evaluated the proposed CNN-based model with AST-based feature extraction on large-scale software bug classification, using 153,652 bug samples from 1,000 GitHub projects. Compared with traditional datasets, a dataset of this size makes it harder to build accurate prediction models. The model is particularly useful in continuous integration (CI) pipelines, where it can automatically detect problematic code during the build process, helping to identify bugs faster and reduce manual review.
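The abstract does not specify the parser, traversal order, vocabulary, or vector length, so the following is only a minimal sketch of the described preprocessing under stated assumptions. It uses Python's standard `ast` module on Python statements as a stand-in for whatever parser and source language the authors used, emits each node type plus a hypothetical `Parent->Child` token for every parent-child edge in a pre-order traversal, maps tokens to integer ids, and right-pads or truncates to a fixed length. The helper names (`ast_tokens`, `build_vocab`, `encode`, `encode_pair`), the `max_len` parameter, and the choice to concatenate the buggy and corrected encodings are all illustrative assumptions, not the paper's implementation.

```python
import ast
from typing import Dict, List, Optional

PAD_ID = 0  # reserved id used for padding (assumption)
UNK_ID = 1  # reserved id for tokens not seen when the vocabulary was built


def ast_tokens(statement: str) -> List[str]:
    """Pre-order AST traversal emitting node types and parent->child edge tokens."""
    tokens: List[str] = []

    def visit(node: ast.AST, parent: Optional[str]) -> None:
        name = type(node).__name__
        tokens.append(name)                      # node type
        if parent is not None:
            tokens.append(f"{parent}->{name}")   # structural relationship
        for child in ast.iter_child_nodes(node):
            visit(child, name)

    visit(ast.parse(statement), None)
    return tokens


def build_vocab(token_lists: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, after the reserved ids."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def encode_pair(before: str, after: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Concatenate fixed-length encodings of the buggy and corrected statements."""
    return (encode(ast_tokens(before), vocab, max_len)
            + encode(ast_tokens(after), vocab, max_len))


if __name__ == "__main__":
    buggy = "if x > 0: y = x"      # hypothetical buggy statement
    fixed = "if x >= 0: y = x"     # hypothetical corrected statement
    vocab = build_vocab([ast_tokens(buggy), ast_tokens(fixed)])
    vector = encode_pair(buggy, fixed, vocab, max_len=32)
    print(len(vector), vector)
```

Because every statement pair becomes a vector of exactly `2 * max_len` integers, pairs of different sizes can be stacked into one batch; in a deep learning framework such a batch would typically pass through an embedding layer and 1D convolutions before a classification layer. In this toy pair, the only differing node type is `Gt` versus `GtE`, which is the kind of structural change the AST features are meant to expose.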