[Paper Quick Look] Taming Transformers for High-Resolution Image Synthesis[2012.09841]
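Paper title: Taming Transformers for High-Resolution Image Synthesis (VQGAN)
Paper: http://arxiv.org/abs/2012.09841
Code: https://git.io/JnyvK
VQVAE: BV1bb4y1i7j6
The second-stage Transformer can in fact also be found in DALL-E: https://arxiv.org/abs/2102.12092
* This video aims to let you know that a paper exists and to recommend it to anyone interested; it is not a detailed walkthrough. Owing to the uploader's limitations, mixed Chinese and English and broken English come up often; please bear with it. If the coverage of a paper goes astray, feel free to call it out.
** To recommend new papers or look up past ones, you are welcome to edit this document: https://docs.qq.com/sheet/DSUdOTG9xWUdydVB6
*** Slides are uploaded every 1-2 months to the address in the pinned post.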
[Paper Quick Look] SODA: Bottleneck Diffusion Models for Representation Learning[2311.17901]
[Paper Quick Look] Visual Prompt Tuning / VPT[2203.12119]
[Paper Quick Look] DDPG&TD3[1509.02971][1802.09477]
[Paper Quick Look] Diffusion Policy: Visuomotor Policy Learning via Action Diff.[2303.04137]
[Paper Brief Analysis] Location-Aware Self-Supervised Transformers for Semantic Seg.[2212.02400]
[Paper Quick Look] Ferret-v2: An Improved...for Referring and Grounding with LLMs[2404.07973]
[Paper Quick Look] LongLoRA: Efficient Fine-tuning of Long-Context LLMs[2309.12307]
[Paper Brief Analysis] Deep Unsupervised Learning using Nonequilibrium Thermodynamics[1503.03585]
[Paper Quick Look] OWL-ViT: Simple Open-Vocabulary Object Detection with ViT[2205.06230]
[Paper Quick Look] Open Vocab. Semantic Seg. with Patch Aligned Contrastive...[2212.04994]
[Paper Quick Look] OpenVLA: An Open-Source Vision-Language-Action Model[2406.09246]
[Paper Quick Look] Decision Transformer: RL via Sequence Modeling[2106.01345]
[Paper Quick Look] Open-vocabulary Object Segmentation with Diffusion Models[2301.05221]
[Paper Brief Analysis] Unsupervised Image-to-Image Translation Networks[1703.00848]
[Paper Quick Look] Drag Your GAN: Interactive Point-based Manipulation...[2305.10973]
[Paper Brief Analysis] PolyFormer: Referring Image Seg. as Sequential Polygon Gen [2302.07387]
[Paper Brief Analysis] Visual Autoregressive Modeling: ...via Next-Scale Prediction[2404.02905]
[Paper Quick Look] Personalizing Text2Img Generation using Textual Inversion[2208.01618]
[Paper Quick Look] Implicit Behavioral Cloning / IBC[2109.00137]
[Paper Quick Look] Theia: Distilling Diverse Vision Foundation Models for Robot..[2407.20179]
[Paper Quick Look] EViT: Expediting Vision Transformers via Token Reorganizations[2202.07800]
[Paper Quick Look] Scalable Video Object Segmentation with Simplified Framework[2308.09903]
[Paper Quick Look] Rethinking the Truly Unsupervised Image-to-Image Translation[2006.06500]
[Paper Quick Look] Denoising Diffusion Probabilistic Models / DDPM[2006.11239]
[Paper Brief Analysis] Regularized Vector Quantization for Tokenized Image Synthesis[2303.06424]
[Paper Brief Analysis] NeRF in the Wild: NeRF for Unconstrained Photo Collections[2008.02268]
[Paper Quick Look] NeRF-RL: Reinforcement Learning with Neural Radiance Fields[2206.01634]
[Paper Brief Analysis] XCiT: Cross-Covariance Image Transformers[2106.09681]
[Paper Quick Look] Flamingo: a Visual Language Model for Few-Shot Learning[2204.14198]
[Paper Brief Analysis] SAC: Soft Actor-Critic Part 2[1812.05905]
[Paper Quick Look] BLIP-2 ...with Frozen Image Encoders and Large Language Models[2301.12597]
[Paper Quick Look] CRG: Improving Grounding in VLM w/o training[2403.02325]
[Paper Brief Analysis] Swin Transformer: Hierarchical ViT using Shifted Windows[2103.14030]
[Paper Brief Analysis] NeRF: Representing Scenes as Neural Radiance Fields...[2003.08934]
[Paper Brief Analysis] Tokens-to-Token ViT: Training ViT from Scratch on ImageNet[2101.11986]
[Paper Brief Analysis] Multimodal Unsupervised Image-to-Image Translation[1804.04732]
[Paper Brief Analysis] Unified Transformer for Efficient Spatiotemporal...[2201.04676]
[Paper Brief Analysis] BEVT: BERT Pretraining of Video Transformers[2112.01529]
[Paper Brief Analysis] Finding an Unsupervised Image Segmenter in .. Generative Model[2105.08127]
[Paper Brief Analysis] Vision Transformers Need Registers[2309.16588]