Abstract:
In a multiple instance learning (MIL) scenario, labels are usually available only at the bag level, not for individual instances.
Owing to its simplicity and well-understood convergence properties, the lazy learning approach, i.e., k-nearest neighbors (kNN), plays a central role in predicting bag labels in the MIL domain.
Notably, two variations of the kNN algorithm tailored to the MIL framework have been introduced, namely Bayesian-kNN (BkNN) and Citation-kNN (CkNN).
These adaptations leverage the Hausdorff metric along with Bayesian or citation approaches.
However, neither BkNN nor CkNN explicitly integrates feature selection, so their generalization degrades when irrelevant or redundant features are present.
In the single-instance learning scenario, to overcome this limitation of kNN, a feature weighting algorithm named Neighborhood Component Feature Selection (NCFS) is often applied to find the optimal degree of influence of each feature.
To address the significant gap existing in the literature, we introduce the NCFS method for the MIL framework.
The proposed methodologies, i.e., NCFS-BkNN, NCFS-CkNN, and NCFS-Bayesian Citation-kNN (NCFS-BCkNN), learn the optimal feature-weighting vector by minimizing the regularized leave-one-out error over the training bags.
The label of an unseen bag is then predicted by combining the Bayesian and citation approaches on the basis of the minimum optimally weighted Hausdorff distance.
Through experiments with various benchmark MIL datasets in the biomedical informatics and affective computing fields, we provide statistical evidence that the proposed methods outperform state-of-the-art MIL algorithms that do not employ any a priori feature weighting strategy.
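To make the bag-level distance concrete, below is a minimal sketch of a minimum weighted Hausdorff distance between two bags and a plain kNN vote over bags. It assumes each bag is an array of instance feature vectors and that a non-negative weight vector (such as one learned by NCFS) is given; the function names are illustrative, and the Bayesian/citation refinements described in the abstract are omitted.

```python
import numpy as np

def weighted_min_hausdorff(bag_a, bag_b, w):
    """Minimum weighted Hausdorff distance between two bags.

    bag_a, bag_b: (n_i, d) arrays of instance feature vectors.
    w: (d,) non-negative feature-weight vector (e.g., NCFS-learned).
    Returns the smallest weighted Euclidean distance over all
    instance pairs (a, b) with a in bag_a and b in bag_b.
    """
    diff = bag_a[:, None, :] - bag_b[None, :, :]   # (n_a, n_b, d) pairwise differences
    d2 = np.sum((w ** 2) * diff ** 2, axis=-1)     # squared weighted distances per pair
    return np.sqrt(d2.min())

def predict_bag_knn(query_bag, train_bags, train_labels, w, k=3):
    """Plain kNN vote over bags (no Bayesian/citation step)."""
    # Rank training bags by weighted minimum Hausdorff distance
    # to the query bag, then majority-vote the k nearest labels.
    dists = [weighted_min_hausdorff(query_bag, b, w) for b in train_bags]
    nearest = np.argsort(dists)[:k]
    votes = np.asarray(train_labels)[nearest]
    return int(round(votes.mean()))  # binary labels, odd k assumed
```

Driving the weights of uninformative features toward zero effectively removes them from the bag distance, which is the mechanism by which NCFS-style weighting counteracts irrelevant and redundant features.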
Published:
22 August 2024
RAISE Affiliate:
Spoke 1
Conference name:
Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Publication type:
Contribution in conference proceedings
DOI:
10.1007