[Long Review] Cascaded Diffusion Models for High Fidelity Image Generation
Join the 'Speech and Language Technologies' Meetup group at https://www.meetup.com/speech-and-language-technology-meetup-group/ to see weekly paper reading schedules and discussions.

Cascaded Diffusion Models for High Fidelity Image Generation
Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
https://arxiv.org/abs/2106.15282

We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality.

A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details.

We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep, and classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256x256, outperforming VQ-VAE-2.
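
The data flow described in the abstract can be sketched roughly as below. This is only an illustrative toy, not the authors' implementation: the function names (`base_sample`, `super_resolve`, `make_training_pair`, `cascade_sample`) are hypothetical, the "models" are random stand-ins rather than trained diffusion models, and the augmentation is assumed to be additive Gaussian noise on the low-resolution conditioning image. Images are plain nested lists so the sketch needs nothing beyond the standard library.

```python
import random

Image = list[list[float]]  # square grayscale image as rows of pixels (illustrative)


def upsample_nearest(img: Image, factor: int) -> Image:
    """Repeat every pixel `factor` times along both axes."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def downsample_mean(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks of a square image."""
    size = len(img) // factor
    return [[sum(img[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(size)]
            for r in range(size)]


def make_training_pair(high_res: Image, factor: int, noise_std: float,
                       rng: random.Random) -> tuple[Image, Image]:
    """Build (conditioning input, target) for one super-resolution stage.

    Conditioning augmentation is ASSUMED here to be additive Gaussian noise
    applied to the low-resolution conditioning image.
    """
    low_res = downsample_mean(high_res, factor)
    augmented = [[px + rng.gauss(0.0, noise_std) for px in row] for row in low_res]
    return augmented, high_res


def base_sample(size: int, rng: random.Random) -> Image:
    """Stand-in for the lowest-resolution diffusion model (hypothetical)."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def super_resolve(low_res: Image, factor: int, rng: random.Random) -> Image:
    """Stand-in for a super-resolution diffusion model conditioned on low_res."""
    upsampled = upsample_nearest(low_res, factor)
    # Placeholder for "adding higher resolution details".
    return [[px + rng.gauss(0.0, 0.01) for px in row] for row in upsampled]


def cascade_sample(base_size: int, factors: list[int], seed: int = 0) -> Image:
    """Run the cascade: base model first, then each super-resolution stage."""
    rng = random.Random(seed)
    image = base_sample(base_size, rng)
    for factor in factors:
        image = super_resolve(image, factor, rng)
    return image


sample = cascade_sample(base_size=64, factors=[2, 2])
print(len(sample), len(sample[0]))  # 256 256
```

The 64 -> 128 -> 256 chain mirrors the resolutions quoted for the FID scores. The point of `make_training_pair` is that each super-resolution stage is trained on a perturbed low-resolution input, so it is less sensitive to imperfections in the previous stage's samples at sampling time.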