Research
Bridging gesture and sound
How natural human movement can direct AI music generation—no keyboard, no learning curve.
Current work
Discrete Diffusion for Symbolic Music
Unlike image diffusion models, which denoise continuous pixel values, ours operates directly on discrete musical tokens. Each gesture influences the denoising process through cross-attention, giving you real-time melodic control.
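Concretely, a single reverse (denoising) step over token sequences might look like the sketch below. The denoiser's signature and the sampling scheme are assumptions made for illustration, not our exact implementation:

import torch

def reverse_step(model, x_t, t, cond):
    # Hypothetical denoiser: given noisy tokens x_t at step t and a
    # gesture condition, predict logits over the token vocabulary.
    logits = model(x_t, t, cond)                  # (batch, seq, vocab)
    probs = torch.softmax(logits, dim=-1)
    # Sample a slightly less noisy sequence x_{t-1}, one token per position.
    flat = probs.flatten(0, 1)                    # (batch * seq, vocab)
    return torch.multinomial(flat, 1).view(x_t.shape)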
Trained on 1 million+ melodies. 8 gesture types map to melodic direction and intensity.
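To make the gesture vocabulary concrete, a hypothetical mapping from gesture type to the two conditioning attributes could look like this. The gesture names and values are purely illustrative; only the shape of the table (eight types, two scalars) follows from the description above:

# Hypothetical gesture -> (melodic direction, intensity) table.
# Names and values are illustrative, not the shipped gesture set.
GESTURE_MAP = {
    "swipe_up":    (+1.0, 0.5),
    "swipe_down":  (-1.0, 0.5),
    "swipe_left":  (-0.5, 0.3),
    "swipe_right": (+0.5, 0.3),
    "push":        ( 0.0, 1.0),
    "pull":        ( 0.0, 0.2),
    "circle_cw":   (+0.3, 0.8),
    "circle_ccw":  (-0.3, 0.8),
}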
Try the demo →
Architecture
Cross-attention conditioning
Gesture embeddings are attended to at each diffusion step.
# Gesture conditioning: tokens attend to the gesture embedding
cond = self.gesture_embed(gesture)          # embed the raw gesture input
attn, _ = self.cross_attn(x, cond, cond)    # query: tokens; key/value: gesture
x = x + attn                                # residual connection
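For context, here is a minimal self-contained version of such a conditioning block in PyTorch. The module structure, names, and dimensions are assumptions for the sketch, not the production code:

import torch
import torch.nn as nn

class GestureConditioner(nn.Module):
    """Sketch: inject gesture information into token features via
    cross-attention. All names and sizes here are illustrative."""

    def __init__(self, d_model=256, n_gestures=8, n_heads=4):
        super().__init__()
        # One learned embedding per gesture type.
        self.gesture_embed = nn.Embedding(n_gestures, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, gesture):
        # x: (batch, seq_len, d_model) token features at the current step
        # gesture: (batch,) integer gesture IDs
        cond = self.gesture_embed(gesture).unsqueeze(1)   # (batch, 1, d_model)
        attn, _ = self.cross_attn(x, cond, cond)          # tokens attend to gesture
        return x + attn                                   # residual connection

Because the condition enters through attention rather than concatenation, the same block can be reused at every denoising step without changing the token sequence length.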