Abstract: Audio-driven simultaneous gesture generation is vital for human-computer communication, AI games, and film production. While previous research has shown promise, there are still limitations ...