TiC-CLIP: Continual Training of CLIP Models
Paper summary: In TiC-CLIP: Continual Training of CLIP Models, the authors propose an approach for continually training vision-language models. Keeping large foundation models up to date with the latest data by repeatedly retraining them from scratch is prohibitively expensive, and the problem is compounded by the lack of large-scale continual-learning benchmarks and baselines. To address this, the authors introduce a suite of web-scale Time-Continual (TiC) benchmarks for training vision-language models: TiC-DataComp, TiC-YFCC, and TiC-RedCaps, which together contain over 12.7 billion timestamped image-text pairs spanning 9 years (2014-2022). First, they use these benchmarks to curate dynamic evaluations that measure the temporal robustness of existing models, finding that OpenAI's CLIP (trained on data up to 2020) loses about 8% zero-shot accuracy on their curated 2021-2022 retrieval task compared with more recently trained models from the OpenCLIP repository. Next, they study how to train efficiently on time-continuous data and show that a simple rehearsal-based approach, which continues training from the last checkpoint and replays old data, cuts compute by 2.5x relative to the standard practice of retraining from scratch. Paper link: https://arxiv.org/pdf/2310.16226
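To make the rehearsal strategy above concrete, here is a minimal PyTorch sketch of a continual-training loop that warm-starts from the previous checkpoint and mixes replayed old image-text pairs with the newest time slice. It is an illustration only, not the paper's code: ToyCLIP, train_one_timestep, replay_ratio, and the random tensors standing in for image-text data are all hypothetical placeholders.

```python
# Minimal sketch of rehearsal-based continual training (assumptions: PyTorch, a toy
# dual encoder, random tensors standing in for image-text features).
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCLIP(nn.Module):
    """Stand-in for a CLIP-style dual encoder (hypothetical, not the paper's model)."""
    def __init__(self, dim=64, embed=32):
        super().__init__()
        self.image_enc = nn.Linear(dim, embed)
        self.text_enc = nn.Linear(dim, embed)

    def forward(self, images, texts):
        img = F.normalize(self.image_enc(images), dim=-1)
        txt = F.normalize(self.text_enc(texts), dim=-1)
        return img @ txt.t()  # cosine-similarity logits

def contrastive_loss(logits):
    # Symmetric InfoNCE: matched image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def train_one_timestep(model, new_data, replay_buffer, replay_ratio=0.5, steps=100, batch=32):
    """Continue training on the newest slice while replaying a fraction of old pairs."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(steps):
        n_replay = int(batch * replay_ratio) if replay_buffer else 0
        samples = random.sample(new_data, batch - n_replay)
        if n_replay:
            samples += random.sample(replay_buffer, n_replay)
        images = torch.stack([s[0] for s in samples])
        texts = torch.stack([s[1] for s in samples])
        loss = contrastive_loss(model(images, texts))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Simulated yearly data slices: lists of (image_feature, text_feature) pairs.
years = {y: [(torch.randn(64), torch.randn(64)) for _ in range(1000)] for y in range(2014, 2023)}

model = ToyCLIP()
replay_buffer = []
for year, data in years.items():
    # Warm start: reuse the previous year's weights (the `model` object as-is)
    # instead of re-initializing and retraining from scratch.
    model = train_one_timestep(model, data, replay_buffer)
    replay_buffer.extend(random.sample(data, 200))  # keep a subset of old pairs for rehearsal
    print(f"finished continual update for {year}")
```

The design choice this sketch tries to mirror is the warm start: the model carries over across time slices rather than being re-initialized, which is where the reported 2.5x compute savings come from, while the replay buffer limits forgetting of older data.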
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Installing Qwen2-VL 2B-Instruct Locally: The Best-Performing Vision-Language Model
CLEX: Continuous Length Extrapolation for Large Language Models
Moral Foundations of Large Language Models
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
CLIP as RNN: Segment Countless Visual Concepts without Training
Compressing Context to Enhance Inference Efficiency of Large Language Models
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
VILA: Exploring Effective Design Choices for Vision-Language Pre-training
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Offline Actor-Critic Reinforcement Learning Scales to Large Models
FLAP: Fast Language-Audio Pre-training
TrustLLM: Trustworthiness in Large Language Models
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
BitNet: Scaling 1-bit Transformers for Large Language Models
Can Large Language Models be Good Path Planners? A Benchmark and Investigation o
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network L
AutoMix: Automatically Mixing Language Models
Multimodal Co-Learning Models
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
ADaPT: As-Needed Decomposition and Planning with Language Models
Interactive Task Planning with Language Models
Making Large Language Models Perform Better in Knowledge Graph Completion
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Orca 2: Teaching Small Language Models How to Reason
System 2 Attention (is something you might need too)
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Large Language Models Cannot Self-Correct Reasoning Yet
Stable Score Distillation: A New Method for High-Quality 3D Generation
Mixed Image and Video Learning with a Unified Vision-Language Model
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-S
Memory Augmented Language Models through Mixture of Word Experts
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Localizing and Editing Knowledge in Text-to-Image Generative Models
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock
Densely Captioned Images: A New Benchmark for Evaluating Vision-Language Models