cwangrun/MGProto


This is the PyTorch implementation of the paper "Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition", published in IEEE TPAMI 2025.

This code repository is based on ProtoPNet (https://github.com/cfchen-duke/ProtoPNet).

Introduction: Prototypical-part methods (a), such as ProtoPNet, enhance interpretability in image recognition by linking predictions to training prototypes. They rely on a point-based learning of prototypes, which limits representation power, struggles with Out-of-Distribution (OoD) detection, and causes unstable performance in prototype projection. We propose a generative alternative, Mixture of Gaussian-distributed Prototypes (MGProto), to model prototype distributions (b) and address these limitations.

Methodology: In MGProto, we leverage Gaussian-distributed prototypes to explicitly characterise the underlying data density, thereby allowing both interpretable image classification and trustworthy recognition of OoD inputs. Interestingly, the learning of our Gaussian-distributed prototypes naturally incorporates a prototype projection step, effectively addressing the performance degradation issue.
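As a rough illustration of what a Gaussian-distributed prototype means in practice, the sketch below scores a feature vector under a single Gaussian prototype with a diagonal covariance. The function name, the diagonal-covariance parameterisation, and the log-variance inputs are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def log_gaussian_prototype(z, mu, log_sigma2):
    """Log-density of feature vector z under one Gaussian prototype.

    Illustrative sketch: assumes a diagonal covariance, parameterised
    by per-dimension log-variances (log_sigma2).
    """
    d = z.shape[-1]
    var = np.exp(log_sigma2)
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(log_sigma2)
                   + np.sum((z - mu) ** 2 / var))
```

Features close to the prototype mean receive a high log-density, which can serve as the (log-)similarity score in place of a point-prototype distance.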

Additionally, inspired by the ancient legend of Tian Ji's horse racing, we also present a new and generic prototype mining strategy to enhance learning from abundant, less-salient object regions.

Requirements: PyTorch, numpy, scipy, cv2, matplotlib, ...

  1. Download CUB-200-2011, Stanford Cars, Stanford Dogs, and Oxford-IIIT Pets.

  2. We primarily employ full images from these datasets and apply online augmentations to the training images.

  1. Provide the data paths in data_path, train_dir, test_dir, and train_push_dir in settings.py.
  2. Run python main.py. Our pre-trained CUB models are given here.

The repository supports online or offline evaluation for interpretable image classification and trustworthy recognition of OoD inputs. This is achieved by computing the overall data likelihood p(x): in-distribution data (a) yields high p(x), while out-of-distribution input (b) yields low p(x).
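The OoD score described above can be sketched as the log-likelihood of a feature under the full prototype mixture, log p(x) = logsumexp_m [log pi_m + log N(z; mu_m, Sigma_m)]. The function name, array shapes, and diagonal-covariance assumption are illustrative, not the repository's exact API:

```python
import numpy as np
from scipy.special import logsumexp

def log_px(z, priors, mus, log_sigma2s):
    """Overall data log-likelihood under a mixture of Gaussian prototypes.

    Illustrative sketch with diagonal covariances:
      z           : (d,)   feature vector
      priors      : (M,)   mixture weights (prototype priors), sum to 1
      mus         : (M, d) prototype means
      log_sigma2s : (M, d) per-dimension log-variances
    """
    d = z.shape[-1]
    var = np.exp(log_sigma2s)
    # Per-component Gaussian log-densities.
    log_comp = -0.5 * (d * np.log(2 * np.pi)
                       + log_sigma2s.sum(axis=1)
                       + (((z - mus) ** 2) / var).sum(axis=1))
    # Marginalise over prototypes: log p(x) = logsumexp_m [log pi_m + log N_m].
    return logsumexp(np.log(priors) + log_comp)
```

Thresholding this score gives a simple OoD detector: inputs whose features fall far from every prototype receive a low log p(x).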

Prototypes with a large prior, which dominate the decision making, consistently come from high-density regions of the distribution (in t-SNE) and localise the object (bird) parts well. Background prototypes tend to obtain a low prior and come from low-density regions. This observation enables model compression by pruning prototypes that hold a low prior/importance.
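The pruning idea above can be sketched as dropping mixture components whose prior falls below a threshold and renormalising the remaining priors. The function name and threshold value are hypothetical:

```python
import numpy as np

def prune_prototypes(priors, mus, threshold=0.01):
    """Illustrative prototype pruning by prior/importance.

    Drops mixture components whose prior is below `threshold`
    (hypothetical default) and renormalises the surviving priors.
    """
    keep = priors >= threshold
    new_priors = priors[keep]
    return new_priors / new_priors.sum(), mus[keep]
```

Because low-prior prototypes contribute little to both p(x) and the decision, this compresses the model with minimal impact on accuracy.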

@article{wang2025mixture,
  title={Mixture of gaussian-distributed prototypes with generative modelling for interpretable and trustworthy image recognition},
  author={Wang, Chong and Chen, Yuanhong and Liu, Fengbei and Liu, Yuyuan and McCarthy, Davis James and Frazer, Helen and Carneiro, Gustavo},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2025},
  publisher={IEEE}
}