Bag encoding strategies in multiple instance learning problems

Şeyma Küçükaşcı, EMEL; Gökçe Baydoğan, Mustafa

doi:10.1016/j.ins.2018.08.020

Şeyma Küçükaşcı E. Ş., Gökçe Baydoğan M.

Information Sciences, vol. 467, pp. 559-578, 2018 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 467
Publication Date: 2018
DOI: 10.1016/j.ins.2018.08.020
Journal Name: Information Sciences
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 559-578
Keywords: Multiple instance learning, Classification, Bag encoding, Decision trees
Affiliated with İstanbul Ticaret Üniversitesi: Yes

Abstract

Multiple instance learning (MIL) deals with supervised learning tasks where the aim is to learn from a set of labeled bags, each containing a certain number of instances. In the MIL setting, instance label information is unavailable, which makes it difficult to apply regular supervised learning. To resolve this problem, researchers devise methods that rely on certain assumptions regarding the instance labels. However, it is not a trivial task to determine which assumption holds for a new type of MIL problem. A bag-level representation based on instance characteristics does not require assumptions about the instance labels and has been shown to be successful in MIL tasks. These approaches mainly encode bag vectors using bag-of-features representations. In this paper, we propose tree-based encoding strategies that partition the instance feature space and represent the bags using the frequency of instances residing in each partition. Our encoding implicitly learns a generalized Gaussian Mixture Model (GMM) on the instance feature space and transforms this information into a bag-level summary. We show that bag representation using tree ensembles provides fast, accurate and robust representations. Our experiments on a large database of MIL problems show that tree-based encoding is highly scalable, and its performance is competitive with the state-of-the-art algorithms.
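
The encoding idea described in the abstract (partition the instance feature space with trees, then describe each bag by the fraction of its instances falling in each partition) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the trees are grown, so the sketch assumes unsupervised, random axis-aligned splits, and all names (`build_random_tree`, `leaf_id`, `leaves`, `encode_bags`) and parameters (`n_trees`, `depth`, `seed`) are hypothetical. It also assumes every bag is non-empty.

```python
import random
from collections import Counter


def build_random_tree(instances, depth, rng):
    """Grow a tree with random axis-aligned splits (an assumed, unsupervised scheme).

    A leaf is represented by None; an internal node is the tuple
    (feature index, threshold, left subtree, right subtree).
    """
    if depth == 0 or len(instances) < 2:
        return None
    feature = rng.randrange(len(instances[0]))
    values = [x[feature] for x in instances]
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # nothing to split on for this feature
    threshold = rng.uniform(lo, hi)
    left = [x for x in instances if x[feature] <= threshold]
    right = [x for x in instances if x[feature] > threshold]
    return (feature, threshold,
            build_random_tree(left, depth - 1, rng),
            build_random_tree(right, depth - 1, rng))


def leaf_id(tree, x):
    """Return the root-to-leaf path ("L"/"R" string) that instance x follows."""
    path, node = "", tree
    while node is not None:
        feature, threshold, left, right = node
        if x[feature] <= threshold:
            path, node = path + "L", left
        else:
            path, node = path + "R", right
    return path


def leaves(tree, prefix=""):
    """List the path identifiers of all leaves in a fixed left-to-right order."""
    if tree is None:
        return [prefix]
    _, _, left, right = tree
    return leaves(left, prefix + "L") + leaves(right, prefix + "R")


def encode_bags(bags, n_trees=3, depth=2, seed=0):
    """Encode each bag as concatenated per-tree leaf frequencies."""
    rng = random.Random(seed)
    pooled = [x for bag in bags for x in bag]  # trees see instances, not bag labels
    trees = [build_random_tree(pooled, depth, rng) for _ in range(n_trees)]
    encoded = []
    for bag in bags:
        vector = []
        for tree in trees:
            counts = Counter(leaf_id(tree, x) for x in bag)
            vector.extend(counts[leaf] / len(bag) for leaf in leaves(tree))
        encoded.append(vector)
    return encoded


# Two toy bags of 2-D instances (made-up values for illustration only).
bags = [
    [(0.0, 0.0), (0.1, 0.2)],
    [(5.0, 5.0), (5.1, 4.9), (4.8, 5.2)],
]
encoded = encode_bags(bags)
```

Each bag becomes a fixed-length vector regardless of how many instances it holds, so the encoded bags could then be passed, together with the bag labels, to any standard supervised classifier.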