Abstract: The rich motion and appearance cues between consecutive frames are crucial for robust visual tracking. However, most existing tracking methods are still limited in designing different ...
Abstract: Vision-Language Models (VLMs) have recently advanced the Visual Object Tracking (VOT) performance. In VLMs, a vision encoder is employed to obtain visual representation, and a text encoder ...