[Long Review] Axial Attention in Multidimensional Transformers
Join the 'Speech and Language Technologies' Meetup group https://www.meetup.com/speech-and-language-technology-meetup-group/ to see weekly paper reading schedules and discussions.

Axial Attention in Multidimensional Transformers
Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans

We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high-dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high-dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements. Our architecture, by contrast, maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation and achieving state-of-the-art results on standard generative modeling benchmarks. Our models are based on axial attention, a simple generalization of self-attention that naturally aligns with the multiple dimensions of the tensors in both the encoding and the decoding settings. Notably, the proposed structure of the layers allows for the vast majority of the context to be computed in parallel during decoding without introducing any independence assumptions. This semi-parallel structure goes a long way to making decoding from even a very large Axial Transformer broadly applicable. We demonstrate state-of-the-art results for the Axial Transformer on the ImageNet-32 and ImageNet-64 image benchmarks as well as on the BAIR Robotic Pushing video benchmark. We open source the implementation of Axial Transformers.
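For a concrete picture of the axial attention idea in the abstract, here is a minimal sketch written in PyTorch (an assumption for illustration; this is not the authors' open-sourced implementation, and the module name AxialAttention2D and its parameters are hypothetical). It shows the core factorization: running standard self-attention along one axis of a (batch, height, width, channels) tensor at a time reduces the per-layer attention cost from O((HW)^2) for full 2D attention to roughly O(HW·(H+W)) for a row pass plus a column pass.

```python
# Minimal sketch of 2D axial attention, assuming PyTorch.
# Hypothetical module; not the paper's released implementation.
import torch
import torch.nn as nn


class AxialAttention2D(nn.Module):
    """Standard self-attention applied along a single axis (height or
    width) of a (batch, height, width, channels) tensor."""

    def __init__(self, dim: int, heads: int, axis: int):
        super().__init__()
        assert axis in (1, 2)  # 1 = attend along height, 2 = attend along width
        self.axis = axis
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, h, w, c = x.shape
        if self.axis == 2:
            # Each of the b*h rows becomes an independent length-w sequence.
            seq = x.reshape(b * h, w, c)
        else:
            # Each of the b*w columns becomes an independent length-h sequence.
            seq = x.transpose(1, 2).reshape(b * w, h, c)
        out, _ = self.attn(seq, seq, seq, need_weights=False)
        if self.axis == 2:
            return out.reshape(b, h, w, c)
        return out.reshape(b, w, h, c).transpose(1, 2)


# Stacking a row pass and a column pass gives every position access to its
# entire row and column; deeper stacks propagate full-grid context.
x = torch.randn(2, 32, 32, 64)
row_attn = AxialAttention2D(dim=64, heads=8, axis=2)
col_attn = AxialAttention2D(dim=64, heads=8, axis=1)
y = col_attn(row_attn(x))
print(y.shape)  # torch.Size([2, 32, 32, 64])
```

This unmasked version only illustrates the axis-wise factorization and its memory savings; the autoregressive decoder described in the paper additionally masks the row and column attention patterns so that each position attends only to previously generated positions.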
Ten-Minute Guide to Microsoft's 'Mighty Vajra Palm' WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack
Ten-Minute Guide to Facebook's 'Tai Chi' Wav2Vec2.0 -- Speech Pre-Training Models Are Like Walter White Teaching Jesse in Breaking Bad
[Long Review] Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Ten-Minute Guide to Google's 'Yijinjing' BERT
[Long Review] Transfer Learning from Speaker Verification to Multispeaker TTS
What Are the Core Concepts of the BERT Model? Transformer and Self-Attention: A Computer Science PhD Explains Building BERT from Scratch!
[Long Review] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
I'd Call This the Ceiling of [NLP Natural Language Processing] Tutorials: NLTK/Spacy/Visualization/Text Analysis/HMM Hidden Markov Models/LSTM Sentiment Analysis, Mastered in One Go!!!
Speech and Text Technology Paper Reading: RNN-T: Sequence Transduction with Recurrent Neural Networks
[Long Review] Kullback-Leibler Divergence: Listen, Attend, Spell and Adapt ASR
[Long Review] Fully Sharded Data Parallel: faster AI training with fewer GPUs
[Long Review] Deduplicating Training Data Makes Language Models Better
Three Weeks of Work: Finally a Curated Collection of Reproducible CVPR 2024 Papers! With Datasets, Code, and Original Texts to Help You Work Through This Year's CVPR Papers! - Artificial Intelligence, Computer Vision, NLP
This Is the Matlab Tutorial Researchers Should Actually Learn! Master Genetic Algorithms, Ant Colony Optimization, Simulated Annealing, and Particle Swarm Optimization in One Sitting! 100 Accessible Episodes, Far Better Than Grinding Through Books! Machine Learning | Deep Learning | NLP
Speech and Text Technology Paper Reading: Branchformer: Parallel MLP-Attention Architectures and E-Branchformer
ChatGPT in Three Minutes
Whisper's Terminator: Reverb ASR Sets a New Bar for Speech Recognition and Speaker Diarization, Trained on an Unprecedented 200,000 Hours of Human-Transcribed Data, with Customizable Verbatim Transcription
[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition
Ten-Minute Guide to Google's 'Golden Bell Shield' Transformer and the LAS Speech Model
Speech and Text Technology Paper Reading: Scaling Laws for Neural Language Models
Beginners, Rejoice! Absolutely the Most Beginner-Friendly NLP (Natural Language Processing) Tutorial on the Web! A Tsinghua Expert Takes You from Basics to Practice in 20 Hours!!!
Microsoft's Zero-Shot Speech Synthesis VALL-E in Three Minutes
[Paper Deep Dive] ACL 2024 Best Paper Explained: Is Literature the Endpoint of Natural Language Processing (NLP)? - SCI Papers/Papers
A Hands-On [Time Series Forecasting] Project You Can Finish in Half a Day: An ECUST PhD Explains LSTM, Informer, ARIMA Models, Pandas, and Stock Prediction in Detail; the Uploader Will Kneel If You Can't Learn It! Slides + Source Code Included
[Long Review] Towards Zero-Label Language Learning
Speech and Text Technology Paper Reading: RefineGAN - Universally Generating Waveform Better than Ground ...
[Short Review] Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
[Long Review] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Speech and Text Technology Paper Reading: Joint Unsupervised and Supervised Training for Multilingual ASR
Speech and Text Technology Paper Reading: SNRi Target Training for Joint Speech Enhancement and Recognition
Speech and Text Technology Paper Reading: UniSpeech-SAT - Universal Speech Representation Learning with Speaker Aware Pre-Training
[Long Review] Cascaded Diffusion Models for High Fidelity Image Generation
[Long Review] Xception: Deep Learning with Depthwise Separable Convolutions
Ten-Minute Guide to Google's 'Iron Shirt' BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised ...
[Short Review] Transfer Learning from Speaker Verification to Multispeaker TTS
Speech and Text Technology Paper Reading: One-Edit-Distance Network (OEDN) in Mispronunciation Detection & ASR
Training a Text Classification Model with Word Vectors and Neural Networks
[Highly Recommended] Use Huggingface Pre-Trained Models to Solve 90% of NLP Problems in Large-Model Tasks: Di Ge Walks You Through Huggingface's Core Modules from Scratch!
Speech and Text Technology Paper Reading: Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Ten Minutes on Why OpenAI's Whisper Speech Recognition Isn't as Easy to Use as ChatGPT [Speech & Language Paper Reading]