
Using LLMs for Multimodal Sensor Fusion in Activity Recognition

This paper was accepted for presentation at the Time Series for Health workshop at NeurIPS 2025.

Sensors provide valuable information about a user's activities and context for downstream applications, but combining the relevant information across modalities can be challenging. We show that large language models (LLMs) can be used for late fusion of audio and motion time series data for activity recognition. We curate a subset of recognition data for diverse activities across contexts (e.g., household tasks, sports) from the Ego4D dataset. The evaluated LLMs achieved zero- and one-shot F1 scores significantly above chance, without any task-specific training. Zero-shot classification via LLM-based fusion of the outputs of modality-specific models can enable multimodal temporal applications where there is limited aligned training data for learning a shared embedding space. In addition, LLM-based fusion can enable model routing without requiring additional memory and computation for target-specific multimodal models.
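As a rough illustration of the late-fusion idea described above, the sketch below formats the (hypothetical) class scores from separate audio and motion classifiers into a zero-shot prompt for a chat-style LLM. The function name, prompt wording, and label set are all assumptions for illustration, not the paper's actual method.

```python
# Hedged sketch of LLM-based late fusion for activity recognition.
# The per-modality scores, labels, and prompt format are illustrative
# assumptions; the paper's exact prompting scheme is not specified here.

def build_fusion_prompt(audio_scores, motion_scores, labels):
    """Format modality-specific predictions as a zero-shot fusion prompt."""
    def fmt(scores):
        # Render each candidate label with its (possibly missing) score.
        return ", ".join(f"{label}: {scores.get(label, 0.0):.2f}" for label in labels)
    return (
        "You are fusing predictions from two sensor models.\n"
        f"Candidate activities: {', '.join(labels)}\n"
        f"Audio model scores: {fmt(audio_scores)}\n"
        f"Motion model scores: {fmt(motion_scores)}\n"
        "Answer with the single most likely activity."
    )

# Hypothetical outputs from two modality-specific classifiers.
labels = ["cooking", "cycling", "cleaning"]
audio = {"cooking": 0.62, "cleaning": 0.30, "cycling": 0.08}
motion = {"cooking": 0.55, "cleaning": 0.35, "cycling": 0.10}

prompt = build_fusion_prompt(audio, motion, labels)
# This prompt would be sent to any chat-style LLM API; the returned label
# is the fused zero-shot prediction, with no task-specific training.
print(prompt)
```

Because the LLM sees only compact per-modality scores rather than raw signals, this style of fusion adds no extra trained parameters, which is what makes the routing-without-extra-models argument plausible.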
