[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition
Join the 'Speech and Language Technologies' Meetup group (https://www.meetup.com/speech-and-language-technology-meetup-group/) to see weekly paper-reading schedules and discussions.

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Abstract: Recently, Transformer- and convolutional neural network (CNN)-based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and Transformers to model both the local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented Transformer for speech recognition, named Conformer. Conformer significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on test/test-other. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.
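To make the "convolution plus self-attention" idea concrete, here is a minimal NumPy sketch of the data flow through one Conformer block: a half-step feed-forward module, a self-attention module (global context), a convolution module (local features), another half-step feed-forward module, and a final LayerNorm, each wrapped in a residual connection. This is an illustration under simplifying assumptions, not the paper's exact model: it uses single-head attention without relative positional encoding, a plain ReLU feed-forward network instead of Swish, and a bare depthwise convolution in place of the full pointwise-conv/GLU/batch-norm convolution module; all weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 16  # toy sequence length and model dimension

def layer_norm(x, eps=1e-5):
    # normalize each time step over the feature dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def feed_forward(x, w1, w2):
    # position-wise FFN; ReLU here (the paper uses Swish)
    return np.maximum(layer_norm(x) @ w1, 0.0) @ w2

def self_attention(x, wq, wk, wv):
    # single-head scaled dot-product attention (the paper uses
    # multi-head attention with relative positional encoding)
    h = layer_norm(x)
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v

def conv_module(x, kernel):
    # depthwise 1-D convolution over time, one filter per channel
    # (the real module adds pointwise convs, GLU, and batch norm)
    h = layer_norm(x)
    k = kernel.shape[0]
    pad = np.pad(h, ((k // 2, k // 2), (0, 0)))
    out = np.zeros_like(h)
    for t in range(T):
        out[t] = (pad[t:t + k] * kernel).sum(0)
    return out

def conformer_block(x, params):
    x = x + 0.5 * feed_forward(x, *params["ffn1"])  # half-step FFN
    x = x + self_attention(x, *params["attn"])      # global interactions
    x = x + conv_module(x, params["conv"])          # local features
    x = x + 0.5 * feed_forward(x, *params["ffn2"])  # half-step FFN
    return layer_norm(x)

params = {
    "ffn1": (rng.normal(0, 0.1, (D, 4 * D)), rng.normal(0, 0.1, (4 * D, D))),
    "ffn2": (rng.normal(0, 0.1, (D, 4 * D)), rng.normal(0, 0.1, (4 * D, D))),
    "attn": tuple(rng.normal(0, 0.1, (D, D)) for _ in range(3)),
    "conv": rng.normal(0, 0.1, (3, D)),  # kernel size 3, per-channel filters
}

x = rng.normal(size=(T, D))
y = conformer_block(x, params)
print(y.shape)
```

The macaron-style pair of half-step feed-forward modules sandwiching attention and convolution is the block structure the paper proposes; stacking such blocks on top of a convolutional subsampling front end gives the full Conformer encoder.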