Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning

Thu, 18 Jun 2026 00:00:00 +0000

Motivation

(a) Global guidance alone tends to attend to background regions. (b) Textual noise causes dense cross-modal scores to disperse across the image. (c) Standard token selection leads to feature fragmentation and selection redundancy.

Method Overview

EADP acts as a lightweight, plug-and-play module that compresses the original set of visual tokens into a smaller, more informative subset before the downstream LLM consumes them.
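To make the plug-and-play contract concrete, here is a minimal sketch of the interface such a module could expose to the LLM. It assumes the scoring and selection stages have already produced the indices to keep. The function name `compress_visual_tokens` and the choice to re-sort survivors into their original raster order are illustrative assumptions, not EADP's released API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compress_visual_tokens(
    visual_tokens: Sequence[T], keep_indices: Sequence[int], budget: int
) -> List[T]:
    """Hypothetical plug-and-play step: keep at most `budget` visual tokens.

    `keep_indices` is assumed to come from an upstream scoring/selection stage;
    this function only validates it and gathers the surviving tokens.
    """
    if len(keep_indices) > budget:
        raise ValueError(f"{len(keep_indices)} indices exceed the budget of {budget}")
    if len(set(keep_indices)) != len(keep_indices):
        raise ValueError("duplicate indices in the selection")
    for i in keep_indices:
        if not 0 <= i < len(visual_tokens):
            raise IndexError(f"index {i} is out of range")
    # Re-sort so the LLM receives the kept tokens in their original (raster) order.
    return [visual_tokens[i] for i in sorted(keep_indices)]


# Toy usage: 8 "tokens" reduced to a budget of 3.
print(compress_visual_tokens(list("abcdefgh"), keep_indices=[5, 1, 3], budget=3))
# ['b', 'd', 'f']
```

For scale, LLaVA-1.5's CLIP ViT-L/14 encoder at 336 px produces 576 visual tokens per image, so a hypothetical budget of 64 would pass roughly 11% of them to the LLM.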

Stage 1: Entropy-Aware Dense Scoring

EADP computes dense cross-modal similarities between non-EOS text tokens and visual tokens, then estimates the spatial entropy of each text token’s similarity distribution. High-entropy tokens are treated as dispersed textual noise and filtered or down-weighted. The remaining low-entropy dense guidance is fused with the global EOS score to produce an instruction relevance map with both local precision and global semantic stability.
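The scoring stage can be sketched in plain Python as below. This is a minimal illustration under assumptions of our own, not the paper's implementation: text and visual embeddings are assumed to share one space, similarities are cosine similarities turned into per-token distributions with a temperature-scaled softmax, entropy is normalized by log N so it lies in [0, 1], tokens above a hypothetical threshold `max_entropy` are dropped, surviving tokens are weighted by `1 - entropy`, and the dense and global (EOS) maps are fused with a hypothetical mixing weight `alpha`.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)  # epsilon guards against zero vectors


def _softmax(xs: Sequence[float], tau: float) -> List[float]:
    m = max(xs)
    exps = [math.exp((x - m) / tau) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def normalized_entropy(p: Sequence[float]) -> float:
    """Shannon entropy of p divided by log(N): 0 = one peak, 1 = uniform."""
    if len(p) < 2:
        return 0.0
    h = -sum(q * math.log(q) for q in p if q > 0)
    return h / math.log(len(p))


def relevance_map(
    text_tokens: Sequence[Vector],  # non-EOS text embeddings
    visual_tokens: Sequence[Vector],
    eos_token: Vector,
    tau: float = 0.1,  # hypothetical softmax temperature
    max_entropy: float = 0.9,  # hypothetical noise threshold
    alpha: float = 0.5,  # hypothetical dense/global mixing weight
) -> List[float]:
    # Global guidance: distribution of EOS-to-visual similarity.
    global_score = _softmax([_cosine(eos_token, v) for v in visual_tokens], tau)
    dense = [0.0] * len(visual_tokens)
    total_weight = 0.0
    for t in text_tokens:
        p = _softmax([_cosine(t, v) for v in visual_tokens], tau)
        h = normalized_entropy(p)
        if h > max_entropy:
            continue  # spatially dispersed: treat as textual noise and drop it
        weight = 1.0 - h  # down-weight moderately dispersed tokens
        total_weight += weight
        for i, q in enumerate(p):
            dense[i] += weight * q
    if total_weight == 0.0:
        return global_score  # no reliable dense guidance; fall back to EOS
    dense = [d / total_weight for d in dense]
    return [alpha * d + (1.0 - alpha) * g for d, g in zip(dense, global_score)]


# Toy usage: one focused text token and one degenerate, fully diffuse token.
visual = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
focused = [1.0, 0.0]  # peaks on visual token 0
diffuse = [0.0, 0.0]  # equal similarity everywhere -> entropy 1 -> filtered
scores = relevance_map([focused, diffuse], visual, eos_token=[1.0, 1.0])
print(max(range(len(scores)), key=scores.__getitem__))  # 0
```

Because both the dense and global maps are probability distributions, the fused map also sums to one, which keeps the downstream smoothing and selection steps on a consistent scale.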

Stage 2: Structured Token Selection

After scoring, EADP refines the relevance map with spatial smoothing and score polarization. Gaussian smoothing propagates local structure, while polarization sharpens core visual entities against the background. Instead of selecting tokens with naive Top-K, EADP formulates token selection as a facility-location submodular maximization problem, encouraging non-redundant coverage of the original visual content.
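Stage 2 can be sketched the same way. Three pieces here are our own assumptions rather than details stated above: the Gaussian kernel is truncated at radius ceil(2·sigma) and renormalized at image borders, polarization is modeled as min-max normalization followed by a power `gamma > 1`, and the facility-location similarity multiplies clipped cosine feature similarity by a Gaussian of grid distance (one possible reading of the spatial prior mentioned in the abstract). With nonnegative weights and similarities, F(S) = sum_i w_i · max_{j in S} sim(i, j) is monotone submodular, so the standard greedy algorithm used below carries the classic (1 - 1/e) approximation guarantee under a cardinality budget.

```python
import math
from typing import List, Sequence


def gaussian_smooth(scores: Sequence[float], h: int, w: int, sigma: float = 1.0) -> List[float]:
    """Smooth a row-major h x w score grid with a truncated Gaussian kernel."""
    r = max(1, math.ceil(2 * sigma))
    out = []
    for y in range(h):
        for x in range(w):
            acc = norm = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        k = math.exp(-(dy * dy + dx * dx) / (2 * sigma * sigma))
                        acc += k * scores[yy * w + xx]
                        norm += k
            out.append(acc / norm)  # renormalize so border cells are not darkened
    return out


def polarize(scores: Sequence[float], gamma: float = 2.0) -> List[float]:
    """Min-max normalize to [0, 1], then apply a power gamma > 1 to suppress weak scores."""
    lo, hi = min(scores), max(scores)
    if hi - lo < 1e-12:  # flat map: avoid amplifying floating-point noise
        return [1.0] * len(scores)
    return [((s - lo) / (hi - lo)) ** gamma for s in scores]


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def facility_location_greedy(
    features: Sequence[Sequence[float]], weights: Sequence[float], w: int,
    budget: int, length_scale: float = 2.0,  # hypothetical spatial-prior width
) -> List[int]:
    """Greedily maximize F(S) = sum_i weights[i] * max_{j in S} sim(i, j)."""
    n = len(features)

    def sim(i: int, j: int) -> float:
        (yi, xi), (yj, xj) = divmod(i, w), divmod(j, w)
        d2 = (yi - yj) ** 2 + (xi - xj) ** 2
        spatial = math.exp(-d2 / (2 * length_scale ** 2))
        return max(0.0, _cos(features[i], features[j])) * spatial

    S = [[sim(i, j) for j in range(n)] for i in range(n)]
    coverage = [0.0] * n  # coverage[i] = max over selected j of S[i][j]
    selected: List[int] = []
    for _ in range(min(budget, n)):
        best_j, best_gain = -1, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much better token j represents each token i.
            gain = sum(weights[i] * max(0.0, S[i][j] - coverage[i]) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        selected.append(best_j)
        coverage = [max(c, S[i][best_j]) for i, c in enumerate(coverage)]
    return sorted(selected)


def structured_selection(relevance, features, h, w, budget, sigma=1.0, gamma=2.0):
    """Smooth, polarize, then run facility-location selection on an h x w token grid."""
    weights = polarize(gaussian_smooth(relevance, h, w, sigma), gamma)
    return facility_location_greedy(features, weights, w, budget)


# Toy 2 x 4 grid: the left two columns share one feature, the right two another.
h, w = 2, 4
features = [[1.0, 0.0] if x < 2 else [0.0, 1.0] for y in range(h) for x in range(w)]
print(structured_selection([1.0] * (h * w), features, h, w, budget=2))
```

With uniform relevance, a Top-K rule has no way to tell the tokens apart and could return two neighbors from the same half; the facility-location objective instead picks one token from each half, because a second token from an already covered region adds almost no marginal coverage.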

Results

Results on LLaVA-1.5

Results on LLaVA-1.6

Results on Qwen2.5-VL

Results on LLaVA-Video

Efficiency Analysis

More results are provided in our paper.


ECCV 2026

Authors: Author A, Xuankun Yang, Author C, Author D, Author E

Affiliations: Institution A, Institution B, Institution C

TL;DR: EADP is a plug-and-play visual token pruning framework for VLMs/MLLMs. It combines entropy-aware dense scoring with submodular token selection to preserve fine-grained visual cues under strict token budgets.

Abstract

Visual token pruning is a crucial strategy for accelerating Vision-Language Models by compressing redundant image patches, yet existing methods often fail to preserve critical cues under dense instructions and fine-grained queries. In this paper, we investigate this failure and identify two underlying bottlenecks: the widespread dispersion of textual noise that corrupts dense cross-modal scoring, and the feature fragmentation inherent to standard token selection. To address these issues, we propose Entropy-Aware Dense Pruning (EADP), a framework that reformulates pruning as a structured compression problem. EADP first leverages statistical entropy to quantify and filter out textual noise, yielding a robust, fine-grained instruction relevance score. Subsequently, instead of naive Top-$K$ selection, EADP casts token selection as a submodular maximization problem with a spatial prior, explicitly guaranteeing a holistic and non-redundant visual representation. Extensive experiments demonstrate that EADP significantly improves the accuracy-efficiency trade-off of VLMs, robustly preserving fine-grained visual cues under strict token budgets while achieving state-of-the-art performance on challenging multimodal benchmarks.

Citation

@inproceedings{eadp2026,
 title = {Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning},
 author = {Author A and Xuankun Yang and Author C and Author D and Author E},
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2026},
 note = {Placeholder citation. Replace with the official camera-ready metadata.}
}
