Hyperbolic Fusion Boosts Video Anomaly Detection

The new system, called PoinCLIP‑VAD, tackles the challenge of spotting unusual events in videos without detailed frame‑by‑frame labels. Traditional methods struggle because they embed visual and textual cues in a flat, Euclidean space, which makes it hard to tease apart subtle differences between normal and abnormal scenes. PoinCLIP‑VAD instead maps both video frames and their text descriptions into a curved, hyperbolic space modeled by the Poincaré ball. In this geometry, distances grow rapidly as points approach the boundary of the ball, so embeddings that sit close together in Euclidean terms can still be far apart, giving the model a richer way to encode hidden relationships. The approach does not depend on pre‑defined hierarchies, so it can learn structure directly from the data.
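To make the geometry concrete, here is a minimal sketch of the standard unit Poincaré ball (curvature -1), written with the Python standard library only. It is not taken from PoinCLIP‑VAD; the function names `expmap0` and `poincare_distance` and the toy 2‑D points are assumptions chosen for illustration. It shows how a Euclidean feature vector can be placed inside the ball and how the same Euclidean gap yields a much larger hyperbolic distance near the boundary.

```python
import math

def expmap0(v, eps=1e-9):
    """Exponential map at the origin: sends a Euclidean vector into the unit ball."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm < eps:
        return [0.0 for _ in v]
    scale = math.tanh(norm) / norm  # resulting norm is tanh(norm) < 1
    return [scale * x for x in v]

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_diff / max(denom, eps))

# Two pairs of points with the same Euclidean gap (0.1), at different radii.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
print(near_origin < near_boundary)  # the gap near the boundary is larger

# A hypothetical Euclidean feature vector mapped into the ball.
embedded = expmap0([0.3, 0.4])
```

With this convention, a vector of Euclidean length r lands at hyperbolic distance 2r from the origin, which is one way to see how the ball keeps "making room" as points move outward.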

The architecture splits into two parts. First, a classification module gives an overall score that flags potentially anomalous clips. Second, a fine‑grained alignment block compares video content with its textual description using negative Poincaré distance, tightening the link between what is seen and what is described. This two‑stage process helps the system learn from weak supervision more effectively than earlier methods. Testing on popular benchmarks shows clear gains: the model reaches a 90.62% AUC on UCF‑Crime and an 86.93% AP on XD‑Violence. These results indicate that the hyperbolic representation improves both detection accuracy and consistency when aligning visual and language signals under limited labeling. Overall, PoinCLIP‑VAD demonstrates that rethinking the underlying geometry can unlock better performance in video anomaly detection, especially when precise annotations are scarce.
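The alignment idea can be sketched as follows: treat the negative Poincaré distance between a video embedding and each text embedding as a similarity logit, then normalize with a softmax. This is only an illustration under stated assumptions, not the authors' implementation. The function `alignment_scores`, the `temperature` parameter, the class labels, and the toy 2‑D embeddings are all hypothetical, and the embeddings are assumed to lie inside the unit ball already.

```python
import math

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_diff / max(denom, eps))

def alignment_scores(video_emb, text_embs, temperature=1.0):
    """Softmax over negative Poincare distances (closer text => higher score)."""
    logits = [-poincare_distance(video_emb, t) / temperature for t in text_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings, assumed to be inside the unit ball already.
video = [0.30, 0.40]
texts = {
    "normal": [-0.50, 0.10],
    "fighting": [0.35, 0.45],
    "explosion": [0.10, -0.70],
}

probs = alignment_scores(video, list(texts.values()))
best_label = max(zip(texts, probs), key=lambda pair: pair[1])[0]
print(best_label)
```

Using the negated distance means the closest description receives the largest logit, so a standard cross‑entropy loss over these scores would pull matching video and text embeddings together in the ball.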
