20240529 [Controllable Visual Content Generation] Kai Chen: Geometric-Controllable Visual Generation: A Systematic ……
Speaker: Kai Chen (The Hong Kong University of Science and Technology)
Time: Wednesday, May 29, 2024, 20:30 (Beijing time)
Title: Geometric-Controllable Visual Generation: A Systematic Solution

Speaker bio: Kai Chen is a PhD candidate at HKUST, supervised by Prof. Dit-Yan Yeung. His research aims at building generalizable AI systems from a data-centric perspective, especially in controllable generation for visual world modeling, Mixture-of-Experts (MoE), and (M)LLM self-alignment. He has published more than 10 papers in top conferences including CVPR, ICCV, and ECCV, and has actively served as an academic reviewer and workshop organizer to promote the development of the research community.

Homepage: https://kaichen1998.github.io/

Abstract: Controllability is essential for using generative models in real-life applications. Text prompts are currently considered the primary conditioning signal because of their superior interactivity with humans. However, unlike language, our visual world is a 3D environment with precise geometric constraints. A typical case is that a robot can "turn left" in various ways, but its trajectory cannot be determined without specific geometric information (e.g., angles and distances). In this talk, I will systematically discuss how to introduce geometric controls into foundational text-to-image generative models, and how these controls then generalize to controllable video generation and 3D scene generation, respectively. Finally, I will discuss several open problems raised in our ECCV 2024 W-CODA Workshop, which might eventually lead us toward unified visual world modeling.