Paper Review/Multimodal (6)

[Paper Review] Learning to Prompt Your Domain for Vision-Language Models

[Paper Review] How Culturally Aware are Vision-Language Models?

[Paper Review] InstructBLIP: Toward General-purpose Vision-Language Models with Instruction Tuning

[Paper Review] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory

[Paper Review] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language

[Architecture] EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory
Keywords: Lightweight Image-captioning, Multimodality, Retrieval Augmentation
Venue: CVPR
Datasets: COCO, LVIS, Flickr30k, NoCaps, WHOOPS
Related work: SmallCap
Review date: July 1, 2024
Paper URL: https://arxiv.org/pdf/2311.15879
Year: 2024
Paper summary: As existing image captioning models adopted LLMs, their parameter counts grew, which made it difficult to keep their open-world knowledge maintained, and Retri..