Abstract: Visual Grounding (VG) aims to locate the most relevant object or region in an image according to a natural language query. Existing methods in VG utilize fixed image and text representations ...