[Paper Reading] Zipformer: A faster and better encoder for automatic sp

The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) a reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm, which allows us to retain some length information; 4) new activation functions SwooshR and SwooshL, which work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explicitly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models.

https://arxiv.org/abs/2310.11230
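
The abstract names BiasNorm, SwooshR, SwooshL, and ScaledAdam without giving formulas. The sketch below is a minimal pure-Python illustration of those ideas, not the authors' implementation. The formulas and constants are taken from the paper as best recalled and should be checked against the arXiv text. All function names (`swoosh_r`, `swoosh_l`, `bias_norm`, `scaled_step`) are hypothetical, and `scaled_step` is a deliberate simplification: it only shows scaling an update by the tensor's RMS, and omits Adam's moment estimates and the learned parameter scale.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def swoosh_r(x):
    # Assumed form: log(1 + exp(x - 1)) - 0.08 * x - 0.313261687
    # The constant offset makes swoosh_r(0) approximately zero.
    return softplus(x - 1.0) - 0.08 * x - 0.313261687


def swoosh_l(x):
    # Assumed form: log(1 + exp(x - 4)) - 0.08 * x - 0.035
    return softplus(x - 4.0) - 0.08 * x - 0.035


def rms(values):
    # Root mean square of a list of floats.
    return math.sqrt(sum(v * v for v in values) / len(values))


def bias_norm(x, bias, log_scale):
    # Assumed form: x / RMS(x - bias) * exp(log_scale).
    # Unlike LayerNorm there is no mean subtraction, so with a nonzero
    # bias the output norm still depends on the input.
    denom = rms([xi - bi for xi, bi in zip(x, bias)])
    gain = math.exp(log_scale)
    return [xi / denom * gain for xi in x]


def scaled_step(param, direction, lr, eps=1e-8):
    # Simplified illustration of the scaling idea only: multiply the
    # update by the parameter tensor's current RMS, so the relative
    # change is about the same for small and large tensors.
    scale = max(rms(param), eps)
    return [p - lr * scale * d for p, d in zip(param, direction)]


if __name__ == "__main__":
    print(swoosh_r(0.0), swoosh_l(0.0))
    print(bias_norm([3.0, 4.0], [0.0, 0.0], 0.0))
    print(scaled_step([2.0, 2.0], [1.0, -1.0], lr=0.1))
    print(scaled_step([20.0, 20.0], [1.0, -1.0], lr=0.1))
```

In the two `scaled_step` calls, the second tensor is ten times larger than the first and its update is ten times larger as well, so both change by the same relative amount. That is the behavior the abstract describes as keeping "the relative change about the same".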
