Language-guided robotic manipulation is advancing rapidly with Vision-Language-Action (VLA) models, yet it still faces fundamental challenges in 3D perception. This paper addresses two critical challenges: the ...