Human Motion and Animation | Open access

Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM

Nikolay Shvetsov, Maksim Bobrin, Nazar Buzun, Dmitry V. Dylov

arXiv (Cornell University) | May 28, 2026

Abstract

Text-to-motion (T2M) generation has broad applications in character animation, virtual avatars, and human-robot interaction. Existing methods typically generate pose trajectories or motion tokens directly from language, forcing a single model to handle semantic interpretation, long-horizon structure, and low-level physical realization. This coupling makes them costly and often unreliable for long, compositional, or semantically dense prompts. We propose Text2BFM, the first framework that aligns natural language with pretrained Behavioral Foundation Models (BFMs) for T2M generation without relying on heavy end-to-end motion generators. Text2BFM operates in the latent policy space of a frozen BFM, using it as an executable motion prior. A text-aligned variational behavioral bottleneck compresses BFM policy-latent sequences into compact motion representations that are compatible with language and preserve long-horizon behavioral structure. Generation is performed in this compact behavioral manifold with a lightweight conditional generator, and the resulting latent-encoded behaviors are decoded into policy latents that drive the frozen, pretrained BFM. By decoupling semantic planning from motion execution, Text2BFM achieves efficient, robust T2M generation and strong performance on long, compositional textual descriptions.
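
The three stages the abstract describes (generate compact behavior codes from text, decode them into policy latents, let a frozen policy execute them) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`generate_compact_latents`, `decode_policy_latents`, `FrozenBFM`, `text_to_motion`) is hypothetical, the "generator" is a seeded random sampler, the "decoder" is plain linear interpolation, and the "BFM" is a one-line proportional controller. It only shows how planning and execution can be separated.

```python
import hashlib
import random
from typing import List

Vector = List[float]


def generate_compact_latents(prompt: str, n_codes: int = 4, dim: int = 8) -> List[Vector]:
    """Hypothetical stand-in for the lightweight text-conditioned generator.

    A real system would use a learned model; here the prompt only seeds
    a random sampler so that the same text yields the same codes.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_codes)]


def decode_policy_latents(codes: List[Vector], steps_per_code: int = 5) -> List[Vector]:
    """Hypothetical stand-in for the bottleneck decoder.

    Expands each compact code into several policy latents by linearly
    interpolating toward the next code (an assumption made for illustration).
    """
    latents: List[Vector] = []
    for i, code in enumerate(codes):
        nxt = codes[min(i + 1, len(codes) - 1)]
        for s in range(steps_per_code):
            t = s / steps_per_code
            latents.append([(1 - t) * a + t * b for a, b in zip(code, nxt)])
    return latents


class FrozenBFM:
    """Toy stand-in for a pretrained, frozen policy pi(action | state, z)."""

    def act(self, state: Vector, z: Vector) -> Vector:
        # Proportional step toward the latent; a real BFM is a learned policy.
        return [0.1 * (zi - si) for si, zi in zip(state, z)]


def text_to_motion(prompt: str, dim: int = 8) -> List[Vector]:
    """Plan in the compact space, then let the frozen policy execute."""
    codes = generate_compact_latents(prompt, dim=dim)  # semantic planning
    policy_latents = decode_policy_latents(codes)      # compact -> policy latents
    bfm = FrozenBFM()                                  # never updated here
    state = [0.0] * dim
    trajectory: List[Vector] = []
    for z in policy_latents:
        action = bfm.act(state, z)
        state = [s + a for s, a in zip(state, action)]
        trajectory.append(state)
    return trajectory
```

In this sketch only `generate_compact_latents` depends on the text, which mirrors the decoupling the abstract argues for: the planner works over a handful of compact codes, while the executor is reused unchanged for every prompt.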

Authors

Nikolay Shvetsov | first author | ORCID 0000-0001-9526-7637
Maksim Bobrin | middle author
Nazar Buzun | middle author | ORCID 0000-0002-4649-2827
Dmitry V. Dylov | last author | ORCID 0000-0003-2251-3221

Research areas

Human Motion and Animation
Multimodal Machine Learning Applications
Generative Adversarial Networks and Image Synthesis

Citation

BibTeX

@article{Shvetsov2026Plan,
  title = {Plan, Don't Pose: Long Composite Motion Generation with Text-Aligned BFM},
  author = {Nikolay Shvetsov and Maksim Bobrin and Nazar Buzun and Dmitry V. Dylov},
  journal = {arXiv (Cornell University)},
  year = {2026},
  doi = {10.48550/arxiv.2605.29906},
  url = {https://doi.org/10.48550/arxiv.2605.29906}
}
