Context-Aware Activity Recognition Systems Open access Peer reviewed

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

Wilson Wongso, Zechen Li, Yonchanok Khaokaew, Hao Xue, and Flora D. Salim

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies | Jun 15, 2026

Scollr summary

What this paper is about

COMODO is a cross-modal self-supervised distillation framework that transfers semantic knowledge from video to IMU without requiring labels. It is compatible with diverse pretrained video and time-series models, which offers the potential to use more powerful teacher and student foundation models in future ubiquitous computing research.

Full abstract

The goal of creating intelligent, human-centered wearable systems for continuous activity understanding faces a fundamental trade-off: Egocentric video-based models capture rich semantic information and have demonstrated strong performance in human activity recognition (HAR), but their high power consumption, privacy concerns, and dependence on lighting limit their feasibility for continuous on-device recognition. In contrast, inertial measurement unit (IMU) sensors offer an energy-efficient, privacy-preserving alternative, yet lack large-scale annotated datasets, leading to weaker generalization. To bridge this gap, we propose COMODO, a cross-modal self-supervised distillation framework that transfers semantic knowledge from video to IMU without requiring labels. COMODO leverages a pretrained and frozen video encoder to construct a dynamic instance queue to align the feature distributions of video and IMU embeddings. This enables the IMU encoder to inherit rich semantic structure from video while maintaining its efficiency for real-world applications. Experiments on multiple egocentric HAR datasets show that COMODO consistently improves downstream performance, matching or surpassing fully supervised models, and demonstrating strong cross-dataset generalization. Benefiting from its simplicity and flexibility, COMODO is compatible with diverse pretrained video and time-series models, offering the potential to leverage more powerful teacher and student foundation models in future ubiquitous computing research. The code is available at this repository: https://github.com/cruiseresearchgroup/COMODO.
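The queue-based alignment described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it assumes one common formulation of queue-based distillation, in which teacher and student embeddings are each compared against a FIFO queue of past teacher embeddings and the student is trained to match the teacher's softmax similarity distribution via cross-entropy. The function names, the temperatures `tau_teacher` and `tau_student`, and the choice of cross-entropy are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def queue_distillation_loss(video_emb, imu_emb, queue,
                            tau_teacher=0.1, tau_student=0.1):
    """Cross-entropy between teacher and student similarity distributions.

    video_emb: (B, D) embeddings from the frozen video encoder (teacher).
    imu_emb:   (B, D) embeddings from the trainable IMU encoder (student).
    queue:     (K, D) previously stored teacher embeddings.
    The temperatures are hypothetical hyperparameters, not values from the paper.
    """
    v = l2_normalize(video_emb)
    s = l2_normalize(imu_emb)
    q = l2_normalize(queue)
    # Each row is a distribution over the K queue entries.
    p_teacher = np.exp(log_softmax(v @ q.T / tau_teacher))
    log_p_student = log_softmax(s @ q.T / tau_student)
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())


def update_queue(queue, video_emb):
    """FIFO update: append the newest teacher embeddings, keep the last K rows."""
    k = queue.shape[0]
    return np.concatenate([queue, l2_normalize(video_emb)], axis=0)[-k:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    queue = rng.normal(size=(16, 8))      # K=16 stored teacher embeddings
    video = rng.normal(size=(4, 8))       # batch of teacher embeddings
    imu = rng.normal(size=(4, 8))         # batch of student embeddings
    print("loss:", queue_distillation_loss(video, imu, queue))
    queue = update_queue(queue, video)
    print("queue shape:", queue.shape)
```

In a real training loop the loss would be backpropagated through the IMU encoder only, since the video encoder is frozen; that step needs an autodiff framework and is omitted here.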

Authors

Researchers on this paper

Wilson Wongso

middle | UNSW Sydney | ORCID 0000-0003-0896-1941

Zechen Li

middle | UNSW Sydney | ORCID 0000-0002-1810-1446

Yonchanok Khaokaew

middle | King Mongkut's University of Technology North Bangkok | ORCID 0000-0003-4297-6274

Hao Xue

middle | University of Hong Kong | ORCID 0009-0000-3369-8511

Flora D. Salim

last | UNSW Sydney | ORCID 0000-0002-1237-1664

Citation

BibTeX

@article{Wongso2026COMODO,
  title = {COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition},
  author = {Wilson Wongso and Zechen Li and Yonchanok Khaokaew and Hao Xue and Flora D. Salim},
  journal = {Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies},
  year = {2026},
  doi = {10.1145/3810218},
  url = {https://doi.org/10.1145/3810218}
}
