Google Research Scientist Dr. Nan Ding presents: CausalLM is not optimal for in-context learning
Bio: Dr. Nan Ding is a research scientist at Google Research. He received his Bachelor's degree in Electrical Engineering from Tsinghua University in 2008 and his Ph.D. in Computer Science from Purdue University in 2013. He has published over 40 papers in machine learning and quantum computation, appearing in top conferences and journals including NeurIPS, ICML, CVPR, ICCV, ECCV, ACL, and Nature Physics.

Abstract: Recent empirical evidence indicates that transformer-based in-context learning performs better with a prefix language model (prefixLM), in which all in-context samples can attend to each other, than with a causal language model (causalLM), whose auto-regressive attention prevents in-context samples from attending to future samples. While this result is intuitive, it is not understood from a theoretical perspective. In this paper we take a theoretical approach and analyze the convergence behavior of prefixLM and causalLM under a certain parameter construction. Our analysis shows that both LM types converge to their stationary points at a linear rate, but that while prefixLM converges to the optimal solution of linear regression, the convergence dynamics of causalLM follow those of an online gradient descent algorithm, which is not guaranteed to be optimal even as the number of samples grows to infinity. We supplement our theoretical claims with experiments on synthetic and real tasks using various types of transformers. Our experiments verify that causalLM consistently underperforms prefixLM in all settings.
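The structural difference the abstract describes is the attention mask: causalLM restricts every token to attend only to earlier tokens, while prefixLM additionally lets the in-context samples in the prefix attend to each other bidirectionally. A minimal sketch of the two masks, assuming `n_prefix` in-context tokens followed by query tokens (function names and shapes are illustrative, not from the paper):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Auto-regressive: token i may attend only to tokens 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_mask(n: int, n_prefix: int) -> np.ndarray:
    # Prefix tokens (the in-context samples) attend to each other fully;
    # tokens after the prefix remain auto-regressive.
    mask = causal_mask(n)
    mask[:n_prefix, :n_prefix] = True
    return mask

# With 3 in-context tokens out of 5, the two masks differ only in the
# upper-left block: prefixLM opens it up, causalLM keeps it triangular.
print(causal_mask(5).astype(int))
print(prefix_mask(5, 3).astype(int))
```

Under the paper's analysis, that extra upper-left block is what lets prefixLM aggregate all in-context samples symmetrically (reaching the least-squares optimum), whereas the strictly triangular causal mask induces the sequential, online-gradient-descent-like dynamics.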