CV Paper Reading: OpenAI CLIP (1/3): Learning Transferable Visual Models From Natural Language
OpenAI's CLIP (1/3): Learning Transferable Visual Models From Natural Language Supervision

Abstract

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks.

We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.

#openai #clip #pretrain #vitransformer #computervision #coursera #ml #course #contrastivelearning #sota #imagenet
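The abstract's pre-training task, "predicting which caption goes with which image", and its zero-shot transfer step can be illustrated with a small sketch. This is not taken from the description above or from the released code: it assumes the task is posed as a symmetric cross-entropy over cosine similarities within a batch, and the function names (`contrastive_loss`, `zero_shot_classify`), the toy 2-D embeddings, and the fixed `temperature=0.07` are all illustrative choices. A real model would produce the embeddings with trained image and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target index (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: pair k of images and texts is the only positive in the batch.

    temperature is an assumed fixed value for this sketch.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text: each row should pick out its own caption.
    loss_images = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text-to-image: each column should pick out its own image.
    loss_texts = sum(
        cross_entropy([logits[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return (loss_images + loss_texts) / 2

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class prompt embedding closest to the image."""
    img = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t)))
              for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings standing in for encoder outputs (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

print(contrastive_loss(images, matched_texts) < contrastive_loss(images, shuffled_texts))
print(zero_shot_classify([0.9, 0.1], matched_texts))
```

In this toy setup the loss is small when each image lines up with its own caption and large when the captions are shuffled. Zero-shot classification then reuses the same similarity: each class name is embedded as text, and the image is assigned to the nearest class embedding, with no dataset-specific training.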