20240605 [Prompt Learning in Vision] Guangyi Chen: Prompt Learning Meets Dense Context for …
Speaker: Guangyi Chen (Carnegie Mellon University / Mohamed bin Zayed University of Artificial Intelligence)

Time: Wednesday, June 5, 2024, 20:30 (Beijing time)

Title: Prompt Learning Meets Dense Context for Vision-Language Models

Speaker bio: Guangyi Chen is currently a Postdoctoral Research Fellow at Carnegie Mellon University, Pittsburgh, USA, and Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE. He received the B.S. and Ph.D. degrees from the Department of Automation, Tsinghua University, China, in 2016 and 2021, respectively. His research interests include computer vision and machine learning, with particular expertise in causal representation learning, attention learning, and video understanding. He has published around 20 papers as first or co-first author in top-tier journals and conferences, such as CVPR, ICCV, ICLR, ICML, ECCV, and IEEE TIP.

Homepage: https://chengy12.github.io/

Abstract: Recent advances indicate that large-scale pre-trained vision-language models (VLMs), such as CLIP, offer a promising alternative for high-quality visual representation learning using natural language supervision. Prompt learning, a key parameter-efficient fine-tuning method, has proven highly effective at eliciting the pre-trained knowledge of VLMs for downstream tasks. However, a gap remains: language with prompts typically conveys coarse, high-level overviews, whereas vision offers detailed, fine-grained context. This talk introduces how to bridge this gap and leverage dense visual context to enhance prompt learning. First, we demonstrate that multiple comprehensive prompts can be developed to describe diverse category characteristics, guided by dense visual context. Second, by transforming a pre-trained image-text matching task into a pixel-text matching task, we can learn prompts that facilitate dense prediction tasks, such as segmentation and detection.
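The step from image-text matching to pixel-text matching mentioned in the abstract can be illustrated with a toy sketch. This is not the speaker's implementation: the function names, the 2-dimensional embeddings, and the 2x2 feature map below are all hypothetical, and a real system would take both kinds of embeddings from a pre-trained VLM such as CLIP. The sketch only shows the general idea that the same cosine-similarity scoring used for one global image embedding can be applied to every spatial location, giving one score per pixel and per class.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def image_text_scores(image_embedding, text_embeddings):
    """Image-text matching: one score per class for a single global embedding."""
    return [cosine(image_embedding, t) for t in text_embeddings]

def pixel_text_scores(feature_map, text_embeddings):
    """Pixel-text matching: the same scoring applied at every spatial location.

    feature_map is a nested list of shape H x W x C; the result has shape
    H x W x K, where K is the number of text (class) embeddings.
    """
    return [[image_text_scores(pixel, text_embeddings) for pixel in row]
            for row in feature_map]

def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical toy data: two class embeddings and a 2x2 map of pixel features.
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
feature_map = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]

score_map = pixel_text_scores(feature_map, text_embeddings)
label_map = [[argmax(scores) for scores in row] for row in score_map]
```

Here `label_map` assigns a class index to each location, which is the kind of dense output that segmentation needs, whereas `image_text_scores` on a single pooled embedding would give only one prediction for the whole image.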