Speech and Text Technology Paper Reading: Branchformer: Parallel MLP-Attention Architectures and E-Branchformer
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
https://arxiv.org/abs/2207.02971

Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable, and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to have variable inference complexity in a single trained model. The weights learned for merging branches indicate how local and global dependencies are utilized in different layers, which benefits model design.

E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition
https://arxiv.org/abs/2210.00077

In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets without using any external training data.

#branchformer #e-branchformer #cnn #attention #globalattention #localattention #asr #nlp #wav2vec #hubert #transformer #google #meta #microsoft #icml #nips
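The two abstracts describe the core idea compactly: each encoder layer runs a global self-attention branch and a local cgMLP branch in parallel on the same input, then merges them, either with learned weights (Branchformer) or with a concatenate-then-depthwise-conv-then-project module (E-Branchformer). Below is a minimal PyTorch sketch of one such layer. The class names, hidden sizes, and kernel sizes are illustrative assumptions; the scalar merge weights stand in for the paper's learnable weighted average, and macaron-style feed-forward blocks and relative positional encoding are omitted for brevity, so this is not the papers' exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvolutionalGatingMLP(nn.Module):
    """Local branch (cgMLP): a channel MLP whose hidden units are gated by a
    depthwise convolution over time (the Convolutional Spatial Gating Unit)."""
    def __init__(self, d_model: int, d_hidden: int = 1024, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_hidden)
        self.gate_norm = nn.LayerNorm(d_hidden // 2)
        self.depthwise = nn.Conv1d(d_hidden // 2, d_hidden // 2, kernel_size,
                                   padding=kernel_size // 2, groups=d_hidden // 2)
        self.proj_out = nn.Linear(d_hidden // 2, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # (batch, time, d_model)
        x = F.gelu(self.proj_in(self.norm(x)))
        a, b = x.chunk(2, dim=-1)                  # split hidden channels in half
        b = self.depthwise(self.gate_norm(b).transpose(1, 2)).transpose(1, 2)
        return self.proj_out(a * b)                # linear gating: a ⊙ conv(b)

class BranchformerLayer(nn.Module):
    """One encoder layer: a global attention branch and a local cgMLP branch
    computed in parallel, then merged into the residual stream."""
    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 merge: str = "weighted"):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cgmlp = ConvolutionalGatingMLP(d_model)
        self.merge = merge
        # Branchformer-style merge: learned per-layer branch weights
        # (simplified here to two scalars).
        self.branch_logits = nn.Parameter(torch.zeros(2))
        # E-Branchformer-style merge: concat -> depthwise conv -> projection.
        self.merge_conv = nn.Conv1d(2 * d_model, 2 * d_model, 31,
                                    padding=15, groups=2 * d_model)
        self.merge_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xn = self.attn_norm(x)
        g, _ = self.attn(xn, xn, xn)               # global branch
        l = self.cgmlp(x)                          # local branch
        if self.merge == "weighted":               # Branchformer
            w = torch.softmax(self.branch_logits, dim=0)
            return x + w[0] * g + w[1] * l
        y = torch.cat([g, l], dim=-1)              # E-Branchformer
        y = y + self.merge_conv(y.transpose(1, 2)).transpose(1, 2)
        return x + self.merge_proj(y)

# Usage: a batch of 2 utterances, 100 frames of 256-dim features.
layer = BranchformerLayer(d_model=256)
out = layer(torch.randn(2, 100, 256))              # -> (2, 100, 256)
```

Logging the softmax of branch_logits per layer gives the kind of local-versus-global usage signal the first abstract refers to, and dropping one branch at inference time is one of the computation-reduction strategies the two-branch design enables.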
[Long Review] Axial Attention in Multidimensional Transformers
Detailed Explanation of Microsoft's Zero-Shot Speech Synthesis VALL-E
Understand Google's 'Iron Shirt' BigSSL in Ten Minutes: Exploring the Frontier of Large-Scale Semi-Supervised ...
Speech & NLP Paper Reading: Token-level Sequence Labeling for SLU using Compositional E2E Models
Speech and Text Technology Paper Reading: RNN-T: Sequence Transduction with Recurrent Neural Networks
Understand Google's 'Yi Jin Jing' BERT in Ten Minutes
Speech and Text Technology Paper Reading: UniSpeech-SAT - Universal Speech Representation Learning with Speaker Aware Pre-Training
[Long Review] Conformer: Convolution-augmented Transformer for Speech Recognition
Speech and Text Technology Paper Reading: Exploring Wav2vec 2.0 Fine-Tuning for Improved Speech Emotion Recognition
Detailed Explanation of OpenAI GPT-3: Language Models are Few-Shot Learners (2/3)
[Long Review] Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Speech and Text Technology Paper Reading: One-Edit-Distance Network (OEDN) in Mispronunciation Detection & ASR
Understand Microsoft's 'Vajra Palm' WavLM in Ten Minutes: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
CV Paper Reading: OpenAI CLIP (2/3): Learning Transferable Visual Models From Natural Language Supervision
[Long Review] Kullback-Leibler Divergence: Listen, Attend, Spell and Adapt ASR
Speech and Text Technology Paper Reading: Will OpenAI's New Whisper ASR Take Off the Way GPT-3 Did?
[Long Review] Deduplicating Training Data Makes Language Models Better
Microsoft's Zero-Shot Speech Synthesis VALL-E in Three Minutes
How Does the Radar on the Fujian Aircraft Carrier Work, and What Does It Have to Do with Speech Beamforming?
[Long Review] Towards Zero-Label Language Learning
Unlocking ChatGPT's 'Alien-Grade' Technology
[Long Review] Fully Sharded Data Parallel: faster AI training with fewer GPUs
ChatGPT in Three Minutes
Detailed Explanation of OpenAI GPT-3: Language Models are Few-Shot Learners (1/3)
Understand Google's 'Golden Bell' Transformer and the LAS Speech Model in Ten Minutes
[Short Review] Axial Attention in Multidimensional Transformers
Speech and Text Technology Paper Reading: Scaling Laws for Neural Language Models
[Long Review] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
[Short Review] Transfer Learning from Speaker Verification to Multispeaker TTS
[Long Review] Transfer Learning from Speaker Verification to Multispeaker TTS
Boris Johnson's Resignation Speech - With an Analysis of His Dual-Microphone Setup
Speech and Text Technology Paper Reading: Improving Speech Recognition Accuracy of Local POI Using Geographical
[Long Review] CLAS: Deep context: end-to-end contextual speech recognition