MM-Ego: Towards Building Egocentric Multimodal LLMs

This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, as there is a lack of QA data for egocentric video understanding, we automatically generate high-quality QA samples for egocentric videos of 30 seconds and longer from Ego4D. This is one of the largest egocentric QA datasets. Second, we contribute an egocentric QA benchmark with 629 videos and 7,026 questions to evaluate models' ability to recognize and memorize visual details across videos of varying lengths. We introduce a new de-biasing evaluation method to help mitigate the unavoidable language bias present in the models being evaluated. Third, we propose a specialized multimodal architecture featuring a novel "memory pointer" mechanism. This design identifies key visual information in the video and uses it to generate responses.
Hong Kong University of Science and Technology (HKUST)



