14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025, İstanbul, Türkiye, 13-16 October 2025 (Full-Text Paper)
Monitoring student attention is crucial for academic success, yet it is challenging for teachers to do so while teaching at the same time. Machine learning tools, which are very successful at recognizing certain behaviors, can be employed to help teachers adapt their teaching in either online or traditional learning environments. This study proposes a transformer-based deep learning model, RT-DETR, to assess students' attention in classroom settings. First, we collected an attention/no-attention dataset and labeled it manually. Then, the RT-DETR and YOLO (v5, v8, and v11) models were trained. The test results show that RT-DETR outperforms all three YOLO models, with an accuracy of approximately 80% and an F1-score of 0.806, despite the small size of the dataset. RT-DETR also reaches higher accuracy levels than the YOLO models after a short training period. Nevertheless, it requires more space to store its parameters and more time to complete inference than the YOLO models. Furthermore, ensemble models of RT-DETR and YOLO do not improve the accuracy. RT-DETR is a promising model for assessing students' attention, so that instructors can adapt the delivery of course content based on the attention behavior predicted in real time, with the aim of improving students' academic success. Code is available at http://bit.ly/45E7NKO.
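The abstract reports accuracy and F1-score but does not say how they were computed. As a minimal sketch, assuming each detected student has already been matched to a ground-truth label and that "attention" is treated as the positive class, the two metrics could be derived as follows. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study or its dataset.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumption: 1 = attention (positive class), 0 = no-attention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not data from the study).
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and F1 can diverge when the two classes are imbalanced, which is one reason to report both on a small dataset.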