WNet: A dual-encoded multi-human parsing network


HOSEN M. I., Aydin T., Islam M. B.

IET Image Processing, vol.18, no.12, pp.3316-3328, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 18 Issue: 12
  • Publication Date: 2024
  • DOI Number: 10.1049/ipr2.13176
  • Journal Name: IET Image Processing
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Page Numbers: pp.3316-3328
  • Keywords: computer vision, image processing, image segmentation
  • İstanbul Ticaret Üniversitesi Affiliation: Yes

Abstract

In recent years, multi-human parsing has become a focal point of research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit their efficiency in real-world settings. To address these challenges and enable real-time performance, a low-latency end-to-end network, WNet, is proposed. This approach combines a vision transformer and a convolutional neural network in a dual-encoded network, featuring a lightweight Transformer-based vision encoder and a convolution encoder based on Darknet. This combination captures both long-range dependencies and spatial relationships. A fuse block merges the features from the two encoders, and residual connections in the decoder improve information flow. Experimental validation on the Crowd Instance-level Human Parsing (CIHP) and Look Into Person (LIP) datasets demonstrates WNet's effectiveness, achieving high-speed multi-human parsing at 26.7 frames per second. Ablation studies further underscore WNet's efficiency and accuracy in complex multi-human parsing tasks.
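The abstract does not say how the fuse block combines the outputs of the two encoders. One common way to merge two feature maps of the same spatial size is to concatenate them along the channel axis and mix the channels with a 1x1 convolution followed by a nonlinearity. The sketch below illustrates that general idea in plain Python. The function names, the [channels][height][width] layout, the ReLU, and the example weights are hypothetical choices for illustration and are not taken from WNet's published design.

```python
from typing import List

# Hypothetical layout assumed for this sketch: [channels][height][width]
FeatureMap = List[List[List[float]]]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Stack two feature maps along the channel axis (spatial sizes must match)."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("encoder outputs must share the same spatial size")
    return a + b  # list concatenation == channel concatenation


def conv1x1(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: the same linear map over channels applied at every pixel."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per output channel")
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for w_row, b in zip(weights, bias):
        if len(w_row) != c_in:
            raise ValueError("weight row length must equal the number of input channels")
        out.append([
            [b + sum(w_row[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
            for i in range(h)
        ])
    return out


def relu(x: FeatureMap) -> FeatureMap:
    """Element-wise ReLU."""
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def fuse_block(vit_feat: FeatureMap, cnn_feat: FeatureMap,
               weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical fuse block: concatenate -> 1x1 conv -> ReLU."""
    return relu(conv1x1(concat_channels(vit_feat, cnn_feat), weights, bias))


if __name__ == "__main__":
    vit = [[[1, 2], [3, 4]]]        # one-channel 2x2 map from the transformer branch
    cnn = [[[4, 8], [12, 16]]]      # one-channel 2x2 map from the convolution branch
    print(fuse_block(vit, cnn, [[0.5, 0.25]], [-2.0]))
    # prints [[[0.0, 1.0], [2.5, 4.0]]]
```

In a real network the two branches would produce many channels and the 1x1 weights would be learned; the concatenate-then-mix pattern is shown here only because it lets every output channel draw on both the transformer features and the convolutional features at each pixel.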