GROOD - GRadient-aware Out-Of-Distribution detection


Out-of-Distribution (OOD) detection, the task of identifying inputs that a model has not been trained on, is fundamental for the safe and reliable deployment of deep learning models in real-world applications like autonomous driving and healthcare. Models that perform well on familiar, in-distribution (ID) data often produce dangerously overconfident predictions when faced with novel or unexpected inputs. Equipping models with this kind of uncertainty awareness is critical for preventing silent failures in safety-critical systems.

In this research, we address a critical but often overlooked aspect of OOD detection: the subtle signals hidden in the model’s gradient space. We propose a novel method, GROOD (GRadient-aware Out-Of-Distribution detection), to explicitly leverage these signals and improve the distinction between ID and OOD samples.

Our Core Intuition

Our approach is built on two key observations about how well-trained neural networks organize information.

First, due to a phenomenon called Neural Collapse (Papyan et al., 2020), the feature representations of in-distribution (ID) data for a given class tend to cluster tightly around a single point, their class prototype (the average of all features for that class). Second, out-of-distribution (OOD) samples naturally fall outside these tight clusters, in more dispersed and undefined regions of the feature space.

Instead of just measuring distances in this feature space, we asked a different question: How sensitive is a sample’s classification to a hypothetical “OOD” reference point? We reasoned that ID samples, being firmly anchored to their class prototype, would show very little sensitivity. In contrast, OOD samples, floating in ambiguous space, would be highly sensitive to the location of this OOD reference point. GROOD measures this sensitivity directly by calculating the gradient with respect to an artificial OOD prototype, turning this sensitivity into a powerful and robust OOD score.


What is Out-of-Distribution (OOD) Detection?

Imagine a self-driving car’s vision system trained to recognize ‘cats’ and ‘dogs’. It performs this task with high accuracy. Now, what happens when it encounters a ‘bird’ on the road for the first time? A standard classifier, forced to choose between its known categories, might predict ‘dog’ with 60% confidence or even ‘cat’ with 90% confidence. It lacks the ability to say, “I don’t know what this is, but it’s neither a cat nor a dog.” This is the goal of OOD detection: to equip models with the self-awareness to flag inputs that fall outside their training experience.

A major hurdle for OOD detection methods is achieving a clear separation between ID and OOD data, especially when OOD samples are semantically similar to the training data.

OOD Scenarios

The challenge of OOD detection can be categorized based on the similarity between the OOD and ID data:

  1. Far-OOD: In this scenario, the OOD samples are semantically and visually very different from the ID data. For a model trained on CIFAR-10 (airplanes, cars, birds, etc.), an example of Far-OOD data would be images of handwritten digits from the MNIST dataset. These are typically easier to detect.
  2. Near-OOD: This scenario is much more challenging. The OOD samples are semantically close to the ID classes. For a model trained on CIFAR-10, the CIFAR-100 dataset (which contains different types of animals and vehicles) would be considered Near-OOD. The model can easily confuse a new type of truck (from CIFAR-100) with a known type of truck (from CIFAR-10).

The Near-OOD scenario is a major focus for modern OOD research because it represents the most likely and dangerous failure mode for real-world systems. This is the core issue GROOD aims to solve.

The “Gradient Sensitivity” Challenge in OOD Detection

In OOD detection, many existing methods analyze the model’s feature representations or its final output probabilities. To save on training costs, most practical methods are post-hoc, meaning they work on an already-trained model without requiring retraining. However, these approaches face two key problems:

  1. Feature Space Ambiguity: Near-OOD samples can produce feature representations that are very close to the clusters of ID data, making them difficult to distinguish using distance-based metrics alone.
  2. Overconfidence: Deep neural networks are notoriously overconfident in their predictions, even for unrecognizable or novel inputs. This means that simple confidence scores (like the maximum softmax probability) are often unreliable for OOD detection.
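For context, here is a minimal sketch of the maximum-softmax-probability (MSP) baseline mentioned above, written in PyTorch; `model` stands for any classifier returning logits and is an illustrative name, not tied to a specific codebase:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(model, x):
    """Maximum softmax probability: high = confident / ID-like, low = possibly OOD."""
    logits = model(x)                      # (batch, num_classes)
    probs = F.softmax(logits, dim=-1)
    return probs.max(dim=-1).values        # (batch,)
```

Thresholding this score is the classic post-hoc detector, and its overconfidence on novel inputs is exactly the failure mode discussed above.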

This ambiguity in both feature and output spaces confuses the detection process, leading standard post-hoc techniques to fail or perform poorly, especially on Near-OOD data. Existing methods often struggle because they don’t capture the model’s internal sensitivity to perturbations.

Proposed Framework: GROOD

GROOD is designed to specifically leverage a model’s gradient sensitivity to distinguish ID and OOD samples. Instead of just looking at where a sample lands in feature space, we ask: “How much would we need to change our understanding of ‘OOD’ to explain this sample?”

Diagram of the GROOD framework

The core components of GROOD are:

  1. Synthetic OOD Prototype: We create an artificial reference point in the feature space, the “OOD prototype,” that represents a generic “outlier” region, far from the well-defined ID class clusters. This is done without requiring any real OOD data.
  2. Gradient-Space Projection: For any given input, we calculate the gradient of a classification loss with respect to this artificial OOD prototype. This gradient tells us the direction and magnitude of the change needed to move the OOD prototype to better accommodate the input sample.
  3. Nearest-Neighbor Scoring in Gradient Space: We observe that ID samples produce small, consistent gradients, while OOD samples produce gradients that are large and point in diverse directions. The final OOD score is the distance from a new sample’s gradient to the nearest gradient from the training set. A large distance means the sample is likely OOD.

GROOD Components in Detail

OOD Prototype Construction

At the start, we compute prototypes for each in-distribution class by averaging the feature representations of all its training samples. The key challenge is creating the OOD prototype without access to real OOD data. We solve this by synthesizing “hard” samples that lie on the decision boundaries between known classes using a technique called manifold mixup. Specifically, for a sample x, we interpolate its feature representation at an early layer, \(f^{\text{early}}(x)\), with the prototype of the second-closest class, \(p_{c_2}^{\text{early}}\):

\[\hat{h}(x) = f^{\text{mid}}(\lambda f^{\text{early}}(x) + (1-\lambda)p_{c_2}^{\text{early}})\]

These synthetic features, \(\hat{h}(x)\), are then averaged in the final feature space to form the single OOD prototype, \(p_{\text{ood}}^{\text{pen}}\).
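The sketch below illustrates one way these prototypes could be built, assuming the backbone is split into two callables, `f_early` (input to early features) and `f_mid` (early features to penultimate features), that features are flattened vectors, and that a fixed mixing coefficient `lam` is used for simplicity; this is a simplified illustration, not the authors’ exact implementation:

```python
import torch

@torch.no_grad()
def build_prototypes(f_early, f_mid, loader, num_classes, lam=0.5):
    """Per-class prototypes in the penultimate space plus one synthetic OOD prototype
    obtained by manifold mixup toward the second-closest class prototype."""
    early_feats, pen_feats, labels = [], [], []
    for x, y in loader:                               # loader yields (inputs, labels)
        e = f_early(x)                                # early-layer features
        early_feats.append(e)
        pen_feats.append(f_mid(e))                    # penultimate features
        labels.append(y)
    early_feats, pen_feats, labels = map(torch.cat, (early_feats, pen_feats, labels))

    # Class prototypes = per-class feature means, in both spaces.
    p_early = torch.stack([early_feats[labels == c].mean(0) for c in range(num_classes)])
    p_pen = torch.stack([pen_feats[labels == c].mean(0) for c in range(num_classes)])

    # Synthetic "hard" samples: mix each sample's early features with the prototype
    # of its second-closest class, push them through the remaining layers, and average.
    second = torch.cdist(pen_feats, p_pen).argsort(dim=1)[:, 1]   # second-closest class
    mixed = lam * early_feats + (1 - lam) * p_early[second]
    p_ood = f_mid(mixed).mean(0)                      # single OOD prototype, p_ood^pen
    return p_pen, p_ood
```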

Gradient Computation

We define our model’s logits not from a final linear layer, but as the negative squared Euclidean distances from a sample’s feature vector \(h\) to each class prototype and the OOD prototype:

\[L(h) = [-\|h-p_1\|^2, \dots, -\|h-p_C\|^2, -\|h-p_{\text{ood}}\|^2]\]

The key insight of GROOD is to analyze the gradient of the cross-entropy loss with respect to the OOD prototype \(p_{\text{ood}}^{\text{pen}}\). This gradient has a simple and intuitive closed-form expression:

\[\nabla H(h) = p_{\text{ood}}(h) \frac{h-p_{\text{ood}}^{\text{pen}}}{\|h-p_{\text{ood}}^{\text{pen}}\|_2}\]

The magnitude of this gradient is simply the softmax probability that the sample is OOD, \(p_{\text{ood}}(h)\), while its direction points from the OOD prototype toward the sample’s feature vector. This vector captures both the model’s confidence and the geometric relationship of the sample to the outlier region.
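In code, the distance-based logits, the OOD probability, and the closed-form gradient given above look roughly like this; a sketch for a single feature vector, with illustrative names:

```python
import torch

def grood_gradient(h, p_classes, p_ood):
    """Gradient of the cross-entropy loss w.r.t. the OOD prototype, following the
    closed form above: magnitude p_ood(h), direction from p_ood toward h.
    h: (d,) penultimate features, p_classes: (C, d), p_ood: (d,)."""
    # Logits are negative squared distances to the C class prototypes and the OOD prototype.
    d_classes = ((h - p_classes) ** 2).sum(dim=-1)          # (C,)
    d_ood = ((h - p_ood) ** 2).sum()                        # scalar
    logits = -torch.cat([d_classes, d_ood.unsqueeze(0)])    # (C + 1,)

    # Softmax probability assigned to the OOD "class".
    p_ood_prob = torch.softmax(logits, dim=-1)[-1]

    # Unit direction from the OOD prototype toward the sample's features.
    direction = (h - p_ood) / (h - p_ood).norm()
    return p_ood_prob * direction                           # (d,)
```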

Final OOD Score

A natural baseline would be to use only the gradient’s norm (its magnitude) as the score. However, GROOD leverages the full gradient vector, including its direction. Our final OOD score for a new sample \(x_{\text{new}}\) is its minimum distance to any gradient vector computed from the ID training set \(\mathcal{D}_{\text{in}}\):

\[S(x_{\text{new}}) = \min_{x \in \mathcal{D}_{\text{in}}} \|\nabla H(h(x_{\text{new}})) - \nabla H(h(x))\|_2\]

This score intuitively measures how “abnormal” a sample’s gradient sensitivity is compared to what was observed during training. A high score indicates the sample is likely OOD.
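A minimal sketch of the scoring step, reusing `grood_gradient` from the previous snippet; the training-set gradients are precomputed once, and a brute-force nearest-neighbor search stands in for whatever index an efficient implementation would use:

```python
import torch

def grood_score(h_new, train_grads, p_classes, p_ood):
    """OOD score = distance to the nearest training-set gradient (high = likely OOD).
    train_grads: (N, d) gradients precomputed on the ID training set D_in."""
    g_new = grood_gradient(h_new, p_classes, p_ood)             # (d,)
    return torch.cdist(g_new.unsqueeze(0), train_grads).min()   # scalar
```

In practice the nearest-neighbor search would typically be backed by an approximate index (e.g. FAISS) to keep inference cheap.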

Experiments

We evaluated GROOD on standard OOD detection benchmarks, following the comprehensive OpenOOD v1.5 protocol.

Results

GROOD consistently achieved state-of-the-art or highly competitive performance across all benchmarks, demonstrating its robustness and effectiveness. The table below summarizes the key results (AUROC %) from the OpenOOD benchmark:

| Method | CIFAR-10 Near | CIFAR-10 Far | CIFAR-100 Near | CIFAR-100 Far | IN-200 Near | IN-200 Far | IN-1K Near | IN-1K Far |
|---|---|---|---|---|---|---|---|---|
| MSP | 88.0 | 90.7 | 80.3 | 77.8 | 83.3 | 90.1 | 76.0 | 85.2 |
| ReAct | 87.1 | 90.4 | 80.7 | 80.4 | 81.9 | 92.3 | 77.4 | 93.7 |
| KNN | 90.6 | 93.0 | 80.2 | 82.4 | 81.6 | 93.2 | 71.1 | 90.2 |
| VIM | 88.7 | 93.5 | 75.0 | 81.7 | 78.7 | 91.3 | 72.1 | 92.7 |
| NCI | 88.8 | 91.3 | 81.0 | 81.3 | 83.5 | 93.7 | 78.6 | 95.5 |
| GROOD (Ours) | 91.2 | 93.8 | 78.9 | 84.4 | 83.4 | 92.2 | 78.9 | 94.8 |

(Table 1: Summary of main results (AUROC %) from the OpenOOD v1.5 benchmark. GROOD achieves the best or second-best AUROC on most benchmarks, with particularly strong Near-OOD performance on CIFAR-10 and ImageNet-1K.)

Qualitative results also demonstrate that GROOD produces a much clearer separation between the score distributions for ID, Near-OOD, and Far-OOD data compared to other methods.

Qualitative comparison of OOD score distributions

Discussion

The experiments validate that projecting samples into a gradient space provides a richer signal for OOD detection than feature or output spaces alone. By measuring sensitivity to a synthetic OOD prototype, GROOD effectively identifies samples that behave differently from the ID data, even if they appear similar in feature space. The approach is fully post-hoc, requires no retraining, and shows remarkable stability across different model checkpoints and architectures, including Transformers.

Our approach sets a new state-of-the-art on several challenging benchmarks, demonstrating its potential for building more reliable and safe machine learning systems for real-world deployment.

For a gentle introduction to the out-of-distribution detection problem, check the linked post. Also, feel free to check the paper for more technical details, ablation studies, and analysis (ElAraby et al., 2023).


References

  1. Papyan, V., Han, X. Y., & Donoho, D. L. (2020). Prevalence of neural collapse during the terminal phase of deep learning training. Proceedings of the National Academy of Sciences, 117(40), 24652–24663.
  2. ElAraby, M., Sahoo, S., Pequignot, Y., Novello, P., & Paull, L. (2023). GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds. arXiv preprint arXiv:2312.14427.