[Short Review] Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition

Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolutional neural networks and Transformers to model both the local and global dependencies of an audio sequence in a parameter-efficient way. To this end, we propose the convolution-augmented Transformer for speech recognition, named Conformer. Conformer significantly outperforms previous Transformer and CNN based models, achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves a WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/test-other. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.
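The abstract only sketches the design at a high level. As a reading aid, here is a minimal, hypothetical PyTorch-style sketch of one Conformer-style block, assuming the sandwich structure described in the paper: half-step feed-forward module, multi-head self-attention, depthwise-convolution module, second half-step feed-forward module, final layer norm. Module names, dimensions, and hyperparameters are illustrative, and the stock nn.MultiheadAttention stands in for the paper's relative-positional self-attention; this is not the authors' implementation.

import torch
import torch.nn as nn

class ConvModule(nn.Module):
    # Pointwise conv + GLU, depthwise conv + BatchNorm + Swish, pointwise conv,
    # wrapped in a residual connection.
    def __init__(self, dim, kernel_size=31):
        super().__init__()
        self.pointwise1 = nn.Conv1d(dim, 2 * dim, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.norm = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()                      # Swish activation
        self.pointwise2 = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):                         # x: (batch, time, dim)
        y = x.transpose(1, 2)                     # Conv1d expects (batch, dim, time)
        y = self.pointwise1(y)
        y = self.glu(y)
        y = self.depthwise(y)
        y = self.act(self.norm(y))
        y = self.pointwise2(y)
        return x + y.transpose(1, 2)              # residual

class ConformerBlock(nn.Module):
    def __init__(self, dim=256, heads=4, ff_mult=4):
        super().__init__()
        def feed_forward():
            return nn.Sequential(nn.LayerNorm(dim),
                                 nn.Linear(dim, ff_mult * dim),
                                 nn.SiLU(),
                                 nn.Linear(ff_mult * dim, dim))
        self.ff1 = feed_forward()
        self.attn_norm = nn.LayerNorm(dim)
        # Plain MHSA here; the paper uses relative-positional self-attention.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = ConvModule(dim)
        self.ff2 = feed_forward()
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                         # x: (batch, time, dim)
        x = x + 0.5 * self.ff1(x)                 # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]   # global context
        x = self.conv(x)                          # local context (residual inside)
        x = x + 0.5 * self.ff2(x)                 # half-step feed-forward
        return self.final_norm(x)

# Quick shape check: 2 utterances, 100 frames, 256-dim encoder features.
block = ConformerBlock(dim=256, heads=4)
out = block(torch.randn(2, 100, 256))
print(out.shape)                                  # torch.Size([2, 100, 256])

In the paper, blocks like this are stacked to form the encoder of a speech recognizer; the feature front end and decoder are omitted here.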
A detailed walkthrough of Microsoft's zero-shot speech synthesis model VALL-E
Microsoft's zero-shot speech synthesis model VALL-E in three minutes
[Long Review] Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
[Long Review] Xception: Deep Learning with Depthwise Separable Convolutions
Google's Transformer and the LAS speech model, explained in ten minutes
CV paper reading, OpenAI CLIP (1/3): Learning Transferable Visual Models From Natural Language Supervision
[Long Review] Transfer Learning from Speaker Verification to Multispeaker TTS
[Olewave's Long Review] Efficient Training of Neural Transducer for Speech Recognition
Speech/NLP paper reading: Token-level Sequence Labeling for SLU using Compositional E2E Models
Google's W2v-BERT, explained in ten minutes: Combining Contrastive Learning and Masked Language Modeling
[Long Review] Deduplicating Training Data Makes Language Models Better
Facebook's HuBERT, explained in ten minutes: Self-Supervised Speech Representation Learning
[Short Review] Xception: Deep Learning with Depthwise Separable Convolutions
Speech and text technology paper reading: Improving Speech Recognition Accuracy of Local POI Using Geographical
Google's BERT, explained in ten minutes
Boris Johnson's resignation speech, with an analysis of his dual-microphone setup
Speech and text technology paper reading: UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
[Short Review] Deduplicating Training Data Makes Language Models Better
[Long Review] Axial Attention in Multidimensional Transformers
Speech and text technology paper reading: Will OpenAI's new Whisper ASR catch on the way GPT-3 did?
CV paper reading, OpenAI CLIP (2/3): Learning Transferable Visual Models From Natural Language Supervision
Speech and text technology paper reading: Scaling Laws for Neural Language Models
[Long Review] GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
How does the radar on the carrier Fujian work, and what does it have to do with speech beamforming?
Reading Lee Smolin: current physical theories are all approximations and will not lead to an ultimate theory of the universe
[Short Review] Towards Zero-Label Language Learning
OpenAI GPT-3 explained in detail: Language Models are Few-Shot Learners (1/3)
[All 243 episodes] The most detailed Transformer tutorial on Bilibili in 2024! From beginner to advanced, all practical explanation (neural networks / NLP / deep learning / BERT / large models / GPT)
Microsoft's WavLM, explained in ten minutes: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
[Highly recommended!] A complete AI + healthcare machine learning course: 31 hours from an MIT professor (artificial intelligence | AI in healthcare | medical AI)
A hands-on time-series forecasting project you can finish in half a day: a PhD from East China University of Science and Technology walks through LSTM, Informer, ARIMA, Pandas, and stock prediction, with slides and source code
[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition
Ten minutes on why OpenAI's Whisper speech recognition isn't as easy to use as ChatGPT [speech and language paper reading]
The best Andrew Ng deep learning course of 2024, with a companion collection of neural network and Transformer papers
From zero to your own knowledge graph: a Zhejiang University professor covers the fundamentals and a hands-on project, step by step
[Long Review] Fully Sharded Data Parallel: faster AI training with fewer GPUs
Deep learning | ICML 2024 | NSA, an upgrade over multi-head self-attention (MHSA): a plug-and-play attention module for time-series tasks, computer vision (CV), and NLP
[Short Review] Fully Sharded Data Parallel: faster AI training with fewer GPUs
From OpenAI's Whisper model to your own speech recognition service: long-form audio and streaming recognition (part 3)
A landmark generative-AI paper: Google DeepMind's Variational Autoencoder (VAE) and Reparameterization