Apr 3, 2024 · First, we introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom. The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions, which can only be answered with the experience gained from watching the series.
Recently, KnowIT VQA [5] introduced a combination of detailed questions about scenes and knowledge-based questions about the story. The proposed model relied on human-generated annotations to understand the insights of the plot. On the contrary, our model exploits both specific and general story information
Knowledge-Based Video Question Answering with …
Oct 23, 2024 · KnowIT VQA: Answering Knowledge-Based Questions about Videos. We propose a novel video understanding task by fusing knowledge-based and video question …
Papers with Code - LiVLR: A Lightweight Visual-Linguistic …
Mar 26, 2024 · Our model outperforms the state of the art on the KnowIT VQA dataset by a large margin, without using question-specific human annotation or human-made plot summaries. It even outperforms human...