[Paper Quick Look] BLIP-2 ...with Frozen Image Encoders and Large Language Models [2301.12597]
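Paper title: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Paper: http://arxiv.org/abs/2301.12597
Code: https://github.com/salesforce/LAVIS/tree/main/projects/blip2
Related videos: ALBEF: BV1H1421Q7Ft, BLIP: BV1fx4y1U7ui, LLaVA: BV1n14y1k7B9, Flamingo: BV1pu411G7ce
* This video is meant to let you know the paper exists and to recommend it to interested readers; it is not a detailed walkthrough. Owing to the uploader's limitations, mixed Chinese and English and broken English appear often; please bear with it. If the coverage of the paper is inaccurate, feel free to call it out.
** To recommend new papers or look up previously covered ones, feel free to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the link in the pinned post.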
[Paper Quick Look] Bootstrapping Language-Image Pre-training...[2201.12086]
[LLM Frontier] A 6-hour in-depth course on four major multimodal large models (CLIP, BLIP, ViT, MLLM) plus chatbots and office assistants! A truly easy-to-follow tutorial on large-model applications!
[Paper Quick Look] RetNet: A Successor to Transformer for Large Language Models [2307.08621]
[Paper Quick Look] OpenVLA: An Open-Source Vision-Language-Action Model [2406.09246]
[Paper Quick Look] LoRA: Low-Rank Adaptation of Large Language Models [2106.09685]
[Paper Brief Analysis] Large Language Models as General Pattern Machines [2307.04721]
[Paper Quick Look] Denoising Diffusion Probabilistic Models / DDPM [2006.11239]
[Paper Quick Look] Flamingo: a Visual Language Model for Few-Shot Learning [2204.14198]
[Paper Quick Look] Ferret-v2: An Improved...for Referring and Grounding with LLMs [2404.07973]
[Paper Quick Look] Structured Denoising Diffusion Models in Discrete State-Spaces [2107.03006]
[Paper Quick Look] Drag Your GAN: Interactive Point-based Manipulation...[2305.10973]
[Paper Quick Look] Open-vocabulary Object Segmentation with Diffusion Models [2301.05221]
[Paper Brief Analysis] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos [2106.11297]
[Paper Brief Analysis] DAT: Vision Transformer with Deformable Attention [2201.00520]
[Paper Brief Analysis] DeiT: Data-efficient Image Transformers [2012.12877]
[Paper Quick Look] Taming Transformers for High-Resolution Image Synthesis [2012.09841]
[Paper Brief Analysis] Deep Unsupervised Learning using Nonequilibrium Thermodynamics [1503.03585]
[Paper Quick Look] Visual Prompt Tuning / VPT [2203.12119]
[Paper Quick Look] Denoising Diffusion Implicit Models / DDIM [2010.02502]
[Paper Quick Look] Open Vocab. Semantic Seg. with Patch Aligned Contrastive...[2212.04994]
[Paper Brief Analysis] Toolformer: Language Models Can Teach Themselves to Use Tools [2302.04761]
[Paper Quick Look] DDPG & TD3 [1509.02971][1802.09477]
[Paper Brief Analysis] PolyFormer: Referring Image Seg. as Sequential Polygon Gen [2302.07387]
[Paper Quick Look] Implicit Behavioral Cloning / IBC [2109.00137]
[Paper Quick Look] OWL-ViT: Simple Open-Vocabulary Object Detection with ViT [2205.06230]
[Paper Brief Analysis] Transformers are Sample Efficient World Models [2209.00588]
[Paper Brief Analysis] MobileNets: Efficient CNN for Mobile Vision Applications [1704.04861]
[Paper Brief Analysis] NeRF: Representing Scenes as Neural Radiance Fields...[2003.08934]
[Paper Quick Look] NeRF-RL: Reinforcement Learning with Neural Radiance Fields [2206.01634]
[Paper Brief Analysis] Tokens-to-Token ViT: Training ViT from Scratch on ImageNet [2101.11986]
[Paper Quick Look] Diffusion Policy: Visuomotor Policy Learning via Action Diff. [2303.04137]
[Paper Quick Look] Invariant Information Clustering for Unsupervised Image...[1807.06653]
[Paper Quick Look] LaFTer: Label-Free Tuning of Zero-shot Classifier...[2305.18287]
[Paper Quick Look] LLaMA-Adapter: Efficient Fine-tuning..Zero-init Attention [2303.16199]
[Paper Brief Analysis] GroupViT: Semantic Segmentation Emerges from Text Supervision [2202.11094]
[Paper Brief Analysis] TNT: Transformer in Transformer [2103.00112]
[Paper Quick Look] Masked-attention Mask Tr. for Universal Image Segmentation [2112.01527]
[Paper Brief Analysis] VAE: Auto-encoding Variational Bayes [1312.6114]
[Paper Brief Analysis] NAT: Neighborhood Attention Transformer [2204.07143]
[Paper Quick Look] Autoregressive Image Generation using Residual Quantization [2203.01941]