Scientific Reports, vol.16, no.1, 2026 (SCI-Expanded, Scopus)
Human parsing, a vital task in human-centric analysis, involves segmenting clothing and body parts and associating them with individual people. Existing methods often rely on auxiliary inputs such as detection and edge prediction, which limits their suitability for resource-constrained devices. To address this, we propose an end-to-end framework that integrates a transformer-based self-attention module to enhance contextual understanding while remaining optimized for low-resource environments. We also introduce bounding-polygon annotations to facilitate simultaneous detection and parsing. Our method achieves fine-grained results in a single pass, significantly improving inference speed without sacrificing accuracy. Real-world validation on a Raspberry Pi demonstrates its effectiveness and efficiency in resource-constrained scenarios.