Scollr summary
What this paper is about
COMODO is a cross-modal self-supervised distillation framework that transfers semantic knowledge from video to IMU without requiring labels. It is compatible with diverse pretrained video and time-series models, so future ubiquitous computing research could pair it with more powerful teacher and student foundation models.
Full abstract
The goal of creating intelligent, human-centered wearable systems for continuous activity understanding faces a fundamental trade-off: Egocentric video-based models capture rich semantic information and have demonstrated strong performance in human activity recognition (HAR), but their high power consumption, privacy concerns, and dependence on lighting limit their feasibility for continuous on-device recognition. In contrast, inertial measurement unit (IMU) sensors offer an energy-efficient, privacy-preserving alternative, yet lack large-scale annotated datasets, leading to weaker generalization. To bridge this gap, we propose COMODO, a cross-modal self-supervised distillation framework that transfers semantic knowledge from video to IMU without requiring labels. COMODO leverages a pretrained and frozen video encoder to construct a dynamic instance queue to align the feature distributions of video and IMU embeddings. This enables the IMU encoder to inherit rich semantic structure from video while maintaining its efficiency for real-world applications. Experiments on multiple egocentric HAR datasets show that COMODO consistently improves downstream performance, matching or surpassing fully supervised models, and demonstrating strong cross-dataset generalization. Benefiting from its simplicity and flexibility, COMODO is compatible with diverse pretrained video and time-series models, offering the potential to leverage more powerful teacher and student foundation models in future ubiquitous computing research. The code is available at this repository: https://github.com/cruiseresearchgroup/COMODO.
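The abstract describes the mechanism only at a high level: a frozen video encoder fills a dynamic instance queue, and the IMU encoder is trained so that its embeddings relate to the queue the same way the video embeddings do. The following is a minimal sketch of one way such queue-based distribution alignment could look, not the authors' implementation. The softmax-over-similarities form, the cross-entropy loss, the temperature values, and every function name here (`similarity_distribution`, `distillation_loss`, `update_queue`) are assumptions made for illustration; the official code is in the repository linked above.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(embedding, queue, temperature):
    """Distribution of cosine similarities between one embedding and the queue."""
    query = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(query, key)) for key in queue]
    return softmax(sims, temperature)


def distillation_loss(video_emb, imu_emb, queue, t_teacher=0.05, t_student=0.1):
    """Cross-entropy between teacher (video) and student (IMU) distributions.

    Assumed loss form: the temperatures are illustrative defaults,
    not values taken from the paper.
    """
    teacher = similarity_distribution(video_emb, queue, t_teacher)
    student = similarity_distribution(imu_emb, queue, t_student)
    return -sum(p * math.log(q + 1e-12) for p, q in zip(teacher, student))


def update_queue(queue, video_emb):
    """Push the newest teacher embedding; a bounded deque evicts the oldest."""
    queue.append(l2_normalize(video_emb))


# Toy usage with 2-D embeddings and a queue that holds three instances.
queue = deque(maxlen=3)
for emb in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]):
    update_queue(queue, emb)

video = [0.0, 1.0]
aligned = distillation_loss(video, [0.0, 2.0], queue, 0.1, 0.1)
misaligned = distillation_loss(video, [-1.0, 0.0], queue, 0.1, 0.1)
print(aligned < misaligned)
```

In this sketch only the student embedding would receive gradients in a real training loop; the teacher side is treated as a fixed target, which matches the abstract's description of the video encoder as pretrained and frozen. No labels appear anywhere in the loss.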
Citation
BibTeX
@article{Wongso2026COMODO,
title = {COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition},
author = {Wilson Wongso and Zechen Li and Yonchanok Khaokaew and Hao Xue and Flora D. Salim},
journal = {Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies},
year = {2026},
doi = {10.1145/3810218},
url = {https://doi.org/10.1145/3810218}
}