[Paper Brief] Visual Autoregressive Modeling: ...via Next-Scale Prediction [2404.02905]
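Paper title: Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper: https://arxiv.org/abs/2404.02905
Code: https://github.com/FoundationVision/VAR
VQ-VAE: BV1bb4y1i7j6
VQGAN: BV1ym4y1d7iP
MaskGIT: BV1qS4y1r7p1
RQ-Transformer: BV1gY411E7ge
* Owing to the uploader's limitations, the videos often mix Chinese and English and contain broken English; please bear with it. If my reading of a paper is off, feel free to call it out.
** To recommend new papers or look up past ones, feel free to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the link in the pinned post.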
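Kaiming He: Autoregressive Image Generation without Vector Quantization.
71. Code walkthrough of the VQGAN model and the VQ discretization module
[Paper Overview] LLaVA: Visual Instruction Tuning [2304.08485]
57. Paper walkthrough: Autoregressive Diffusion Model for sequence prediction
69. Detailed walkthrough of the principles behind VQGAN + Transformer autoregressive modeling for image generation
[Peking University, ByteDance] Visual Autoregressive Model (VAR), an autoregressive image generation model that generates images via next-scale prediction
[Xiaohongshu InstantX] InstantStyle paper walkthrough, stunning results
[Paper Brief] VQ-VAE: Neural discrete representation learning [1711.00937]
[Daily Paper] #2 Visual Autoregressive Modeling, an autoregressive image generation model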
[Paper Overview] Ferret-v2: An Improved...for Referring and Grounding with LLMs [2404.07973]
Paper Speed-Read 12: iKUN
[Paper Overview] Taming Transformers for High-Resolution Image Synthesis [2012.09841]
[Paper Overview] Bootstrapping Language-Image Pre-training... [2201.12086]
Paper Speed-Read 17: Mamba
Why do so many fools still keep pouring into computer vision?
[Paper Brief] Deconstructing Denoising Diffusion Models for SSL [2401.14404]
[Paper Brief] Contrastive Learning for Unpaired Image-to-Image Translation [2007.15651]
A master's student going into computer vision is a downright brain-dead move!
[Bilingual subtitles] 5 handy <string formatting> tricks in {Python}
[Paper Overview] Autoregressive Image Generation using Residual Quantization [2203.01941]
[Paper Overview] Visual Prompt Tuning / VPT [2203.12119]
A new SOTA for generative models? Visual Autoregressive Modeling, plus an introduction to VQ-VAE, VQ-GAN, and VQ-DDPM
[Paper Overview] OpenVLA: An Open-Source Vision-Language-Action Model [2406.09246]
Diffusion Illusions: a picture hidden in a picture (using diffusion models
[Paper Overview] Align before Fuse / ALBEF: ... [2107.07651]
[Paper Overview] Finite Scalar Quantization: VQ-VAE Made Simple [2309.15505]
[We made it onto CCTV!] Hugging Face calls for open-source AI
[Paper Brief] MaskGIT: Masked Generative Image Transformer [2202.04200]
[Paper Brief] Improving fine-grained understanding in image-text pre-training [2401.0986]
[Paper Brief] Large Language Models as General Pattern Machines [2307.04721]
[Paper Overview] BLIP-2 ...with Frozen Image Encoders and Large Language Models [2301.12597]
[Paper Brief] Vision Transformers Need Registers [2309.16588]
[Paper Overview] A Simple LLM Framework for Long-Range Video Question-Answering [2312.17235]
[Paper Overview] A Self-Improving Generalist Agent for Robotic Manipulation [2306.11706]
[Paper Overview] MixUp: Beyond Empirical Risk Minimization [1710.09412]
[Paper Overview] LLaRA: Supercharging Robot Learning Data for VLM Policy [2406.20095]
[Paper Overview] CRG: Improving Grounding in VLM w/o training [2403.02325]
[Paper Overview] FreeU: Free Lunch in Diffusion U-Net [2309.11497]
[KKND2024] An uploader visited his 15,320 followers; this is how the world has changed in their eyes