Research

Bridging gesture and sound

How natural human movement can direct AI music generation—no keyboard, no learning curve.

Current work

Discrete Diffusion for Symbolic Music

Unlike image diffusion models, which denoise continuous pixel or latent values, ours operates directly on discrete musical tokens. Each gesture steers the denoising process through cross-attention, giving you real-time melodic control.

Trained on 1 million+ melodies. Eight gesture types map to melodic direction and intensity (a rough sketch of this mapping follows below).

Try the demo →
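As a sketch of that mapping: each gesture type id becomes a learned conditioning vector the diffusion model can read. This assumes PyTorch; the embedding width of 256 is illustrative, and the real model's gesture vocabulary and dimensions are not specified here.

import torch
import torch.nn as nn

NUM_GESTURES, D_MODEL = 8, 256          # eight gesture types; width is illustrative

# Each gesture id maps to a learned vector encoding, in effect,
# melodic direction and intensity for the diffusion model.
gesture_embed = nn.Embedding(NUM_GESTURES, D_MODEL)

gesture = torch.tensor([[3]])           # a single gesture-type id
cond = gesture_embed(gesture)           # (1, 1, D_MODEL) conditioning vector
print(cond.shape)                       # torch.Size([1, 1, 256])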

Architecture

Cross-attention conditioning

At each diffusion step, the noisy token states attend to the gesture embedding through cross-attention.

# Gesture conditioning: token states (x) attend to the gesture embedding
cond = self.gesture_embed(gesture)        # look up the conditioning vector
attn, _ = self.cross_attn(x, cond, cond)  # query = x, key/value = cond
x = x + attn                              # residual update of token states
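For context, here is a self-contained sketch consistent with the snippet above. The module layout (nn.Embedding plus nn.MultiheadAttention), the dimensions, and the usage shapes are assumptions for illustration, not the published implementation.

import torch
import torch.nn as nn

class GestureConditioning(nn.Module):
    """Sketch: noisy token states attend to a learned gesture embedding."""

    def __init__(self, num_gestures=8, d_model=256, n_heads=4):
        super().__init__()
        self.gesture_embed = nn.Embedding(num_gestures, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, gesture):
        # x: (batch, seq_len, d_model) token states at the current diffusion step
        # gesture: (batch, 1) integer gesture-type ids
        cond = self.gesture_embed(gesture)        # (batch, 1, d_model)
        attn, _ = self.cross_attn(x, cond, cond)  # tokens attend to the gesture
        return x + attn                           # residual update

# Usage: condition two 64-token sequences on gesture types 3 and 5
block = GestureConditioning()
x = torch.randn(2, 64, 256)
out = block(x, torch.tensor([[3], [5]]))
print(out.shape)  # torch.Size([2, 64, 256])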