Metrics Framework

In the pursuit of next-generation autonomous systems, standardizing data is only the first step. To fully unlock the potential of FOOD’s fusion-oriented architecture, we must also revolutionize how we evaluate performance.

Traditional metrics often treat perception output as deterministic points, discarding the rich uncertainty information (e.g., covariance, existence probability) inherent in multi-sensor fusion algorithms. This leads to an incomplete assessment of an agent’s true capability.

Therefore, the FOOD metrics framework is designed to be Uncertainty-Aware. We prioritize metrics that can evaluate the full probabilistic output of fusion algorithms, ensuring that the confidence of the system is measured as rigorously as its accuracy.

P-GOSPA [1]

For multi-object tracking and fusion tasks within FOOD, we adopt Probabilistic GOSPA (P-GOSPA) as the primary performance metric. Unlike traditional metrics (such as OSPA or Hausdorff) or the standard GOSPA, which operate on deterministic sets, P-GOSPA extends evaluation into the space of Multi-Bernoulli (MB) densities.

This metric is particularly suitable for the FOOD platform because it accounts for the inherent uncertainty in fusion algorithms: it captures not just where an object is, but also how confident the system is about its existence and state.

Fig. 1: An exemplary scenario with two objects and two MB set densities. Each Bernoulli density has a Gaussian single-object density, and its existence probability is shown next to its Gaussian mean. A desirable metric should be able to answer: 1) what is the distance between each MB density and the ground truth object states? and 2) what is the distance between the two MB densities?

The Definition

We utilize the standard configuration of P-GOSPA (parameter \(\alpha = 2\)). According to Equation (8) in the original paper, this configuration offers a mathematically rigorous definition that can be exactly decomposed into four interpretable components.

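Let \(f_X\) denote the ground truth and \(f_Y\) the estimated Multi-Bernoulli density. The \(i\)-th Bernoulli component of \(f_X\) has existence probability \(r_x^i\) and single-object density \(p_x^i\); \(r_y^j\) and \(p_y^j\) are defined analogously for \(f_Y\). With \(\Gamma\) the set of all assignment sets \(\gamma\) between the two sets of components, \(d(\cdot,\cdot)\) a base distance between single-object densities, \(c > 0\) the cut-off distance, and \(p \geq 1\) the exponent, the metric is defined as: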

\[d_p^{(c, 2)}\left(f_X, f_Y\right) = \left[\min_{\gamma \in \Gamma} \left( \underbrace{\sum_{(i, j) \in \gamma}\text{LocErr}(i,j)}_{\text{Localization}} + \underbrace{\sum_{(i, j) \in \gamma}\text{UncErr}(i,j)}_{\text{Uncertainty}} + \underbrace{\frac{c^p}{2}\sum_{i \notin \gamma} r_x^i}_{\text{Missed}} + \underbrace{\frac{c^p}{2}\sum_{j \notin \gamma} r_y^j}_{\text{False}}\right)\right]^{\frac{1}{p}}\]

This decomposition provides a granular performance analysis for algorithms, highlighting the Soft Penalty mechanism:

Localization Error

\[\sum_{(i, j) \in \gamma} \min \left(r_x^i, r_y^j\right) d\left(p_x^i, p_y^j\right)^p\]

Measures spatial accuracy. Soft Penalty Feature: The error of each assigned pair is scaled by \(\min(r_x^i, r_y^j)\). A track with low existence probability therefore contributes significantly less to the error than a high-confidence track, preventing weak detections from dominating the localization score.

Existence Probability Mismatch

\[\sum_{(i, j) \in \gamma} \left|r_x^i - r_y^j\right| \frac{c^p}{2}\]

[Unique to P-GOSPA] Penalizes the “probability discrepancy.” If an object exists (\(r_x=1\)) but the tracker is unsure (e.g., \(r_y=0.6\)), a penalty proportional to the difference (\(0.4\)) is applied.
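As a worked illustration of how the localization and mismatch terms combine for a single assigned pair, take the assumed values \(c = 10\), \(p = 2\), \(r_x^i = 1\), \(r_y^j = 0.6\), and a base distance \(d(p_x^i, p_y^j) = 2\). These numbers are chosen for illustration only and do not come from the original paper:

\[\underbrace{\min(1, 0.6) \cdot 2^2}_{\text{Localization} \,=\, 2.4} + \underbrace{\left|1 - 0.6\right| \cdot \frac{10^2}{2}}_{\text{Mismatch} \,=\, 20} = 22.4\]

Leaving the same two components unassigned would instead cost \(\frac{10^2}{2}(1 + 0.6) = 80\) through the missed and false detection terms, so under these assumptions the minimization keeps the pair assigned.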

Missed Detection Error

\[\frac{c^p}{2} \sum_{i: \forall j,(i, j) \notin \gamma} r_x^i\]

Cost for unassigned ground truth objects. The penalty is proportional to the target’s existence probability.

False Detection Error

\[\frac{c^p}{2} \sum_{j: \forall i,(i, j) \notin \gamma} r_y^j\]

[Key Soft Penalty] Represents the cost for false positives (estimated tracks that do not correspond to any ground truth).

Standard GOSPA (Hard Penalty): A false positive is binary: it is either discarded before evaluation (cost \(= 0\)) or fully penalized (cost \(= c^p/2\)), depending on whether its existence probability passes a hard threshold.
P-GOSPA (Soft Penalty): The penalty scales linearly with \(r_y^j\). A false positive with low existence probability (\(r_y^j = 0.1\)) incurs only 10% of the maximum penalty \(c^p/2\), while a high-confidence false positive (\(r_y^j = 0.9\)) incurs 90%. This lets the filter report tentative objects without excessive punishment.
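The four components above can be sketched in a few lines of Python. This is a minimal illustration rather than the FOOD implementation: the function names (`w2_gauss_1d`, `assignments`, `p_gospa`), the representation of each Bernoulli component as `(r, (mean, std))`, the choice of the 2-Wasserstein distance between 1-D Gaussians as the base distance \(d\), and the values \(c = 10\), \(p = 2\) are all assumptions made for this example. The minimum over \(\gamma\) is found by brute force, which is only practical for a handful of objects.

```python
from math import sqrt

def w2_gauss_1d(a, b):
    """2-Wasserstein distance between 1-D Gaussians given as (mean, std).
    Assumed base distance; a ground-truth point is a Gaussian with std 0."""
    return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def assignments(n, m):
    """Yield every partial one-to-one assignment between n and m components
    as a list of (i, j) pairs."""
    def rec(i, used):
        if i == n:
            yield []
            return
        yield from rec(i + 1, used)            # leave component i unassigned
        for j in range(m):
            if j not in used:                  # assign i to a free component j
                for tail in rec(i + 1, used | {j}):
                    yield [(i, j)] + tail
    yield from rec(0, frozenset())

def p_gospa(X, Y, c=10.0, p=2, dist=w2_gauss_1d):
    """Brute-force P-GOSPA (alpha = 2) between two lists of (r, density).
    Returns the metric value and the four components of the best assignment."""
    half = c ** p / 2
    best_total, best_parts = None, None
    for gamma in assignments(len(X), len(Y)):
        assigned_i = {i for i, _ in gamma}
        assigned_j = {j for _, j in gamma}
        parts = {
            "localization": sum(min(X[i][0], Y[j][0]) * dist(X[i][1], Y[j][1]) ** p
                                for i, j in gamma),
            "uncertainty": sum(abs(X[i][0] - Y[j][0]) * half for i, j in gamma),
            "missed": half * sum(r for i, (r, _) in enumerate(X) if i not in assigned_i),
            "false": half * sum(r for j, (r, _) in enumerate(Y) if j not in assigned_j),
        }
        total = sum(parts.values())
        if best_total is None or total < best_total:
            best_total, best_parts = total, parts
    return best_total ** (1 / p), best_parts

# Two ground-truth objects (r = 1, zero spread) and two estimated components.
X = [(1.0, (0.0, 0.0)), (1.0, (20.0, 0.0))]
Y = [(0.6, (1.0, 1.0)), (0.1, (50.0, 1.0))]
value, parts = p_gospa(X, Y)
print(value, parts)
```

In this toy scenario only the first ground-truth object is matched; the second is counted as missed, and the low-confidence estimate far from both objects adds just 10% of \(c^p/2\) as false detection cost. For larger problems, the brute-force search would normally be replaced by an optimal assignment solver.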

Tip

For the full mathematical proofs and the derivation of the metric and its decomposition, please refer to the original paper [1].

Why P-GOSPA?

Mathematically Well-Defined: P-GOSPA satisfies all properties of a metric (identity, symmetry, and triangle inequality). This ensures rigorous and consistent comparisons across different datasets and algorithms.
Soft Penalties vs. Hard Thresholding: Standard metrics require “hard thresholding” (e.g., discarding tracks with existence probability \(r < 0.5\)) before evaluation. This leads to information loss. P-GOSPA evaluates the raw probabilistic output using Soft Penalties, where the cost is proportional to the confidence. This rewards algorithms that accurately model their uncertainty rather than making binary guesses.
Smoothness: As a result of soft penalties, the metric is continuous. As shown in the heatmap below, the error transitions smoothly with changes in existence probability (\(r\)) and variance (\(\sigma^2\)), avoiding the abrupt jumps that are typical for deterministic metrics.

Fig. 2: P-GOSPA versus \(r\) and \(\sigma^2\)
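The smoothness property can also be checked numerically. The sketch below sweeps the existence probability of a single estimate against a single ground-truth object and compares the largest step between neighbouring grid points for P-GOSPA and for a hard-thresholded GOSPA. The closed-form expressions follow from the definition above for this one-object case; the function names, the 1-D Gaussian model with a 2-Wasserstein base distance, the threshold of 0.5, and the values \(c = 10\), \(p = 2\) are assumptions for illustration.

```python
C, P = 10.0, 2  # assumed cut-off distance and exponent

def p_gospa_single(r_y, offset=1.0, sigma=1.0):
    """P-GOSPA for one true object (r_x = 1, point at 0) and one Gaussian
    estimate with mean `offset`, std `sigma`, existence probability r_y."""
    d_p = offset ** 2 + sigma ** 2            # squared W2 distance, point vs Gaussian
    # Capping at c^p reproduces the unassigned cost when the pair is too far apart.
    cost = r_y * min(d_p, C ** P) + (1.0 - r_y) * C ** P / 2
    return cost ** (1 / P)

def hard_gospa_single(r_y, offset=1.0, threshold=0.5):
    """Standard GOSPA after hard thresholding: the estimate is either
    discarded (one missed object) or kept as a point at its mean."""
    if r_y < threshold:
        return (C ** P / 2) ** (1 / P)
    return min(abs(offset), C)

rs = [k / 100 for k in range(101)]
soft = [p_gospa_single(r) for r in rs]
hard = [hard_gospa_single(r) for r in rs]

# Largest change between neighbouring grid points of the sweep.
soft_jump = max(abs(b - a) for a, b in zip(soft, soft[1:]))
hard_jump = max(abs(b - a) for a, b in zip(hard, hard[1:]))
print(f"max step, P-GOSPA: {soft_jump:.3f}; hard-thresholded GOSPA: {hard_jump:.3f}")
```

Under these assumptions, the P-GOSPA curve changes by a small amount at every step, whereas the thresholded metric has a single large jump where the estimate crosses the threshold.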

Probabilistic Trajectory GOSPA [2]

The probabilistic trajectory GOSPA (PT-GOSPA) metric extends the P-GOSPA framework to evaluate entire object trajectories over time, rather than just individual time steps. This is crucial for assessing multi-object tracking algorithms that must maintain consistent object identities.

The PT-GOSPA metric incorporates temporal consistency by evaluating the alignment of estimated trajectories with ground truth trajectories, considering both spatial accuracy and uncertainty over time. This allows for a more comprehensive assessment of tracking performance, capturing not only instantaneous errors but also the ability to maintain accurate tracks across multiple frames.

Fig. 3: An exemplary scenario with a single ground truth trajectory and two sets of trajectory estimates, where each trajectory estimate is a sequence of Bernoulli densities. Each Bernoulli density has a Gaussian single-object density, and its existence probability is shown next to its Gaussian mean. The true trajectory and the trajectory estimate in blue exist at time steps 1, 2, and 3. The set of trajectory estimates in orange consists of two single-trajectory estimates: one exists only at time step 1, and the other exists at time steps 2 and 3. A desirable metric should be able to answer: 1) what is the distance between each set of sequences of Bernoulli densities and the set of true trajectories? and 2) what is the distance between the two sets of sequences of Bernoulli densities?

Similar to P-GOSPA, PT-GOSPA decomposes the overall error into interpretable components. There are five: localization error, uncertainty error, missed detection error, false detection error, and track switch error, all evaluated over entire trajectories. This decomposition allows for detailed performance analysis of tracking algorithms, highlighting their strengths and weaknesses in maintaining accurate and confident tracks over time.

Tip

For more details on the mathematical formulation and properties of the PT-GOSPA metric, please refer to the original paper [2].