The unreasonable effectiveness of mathematics in large scale deep learning
Speaker Bio: Greg Yang is a researcher at Microsoft Research in Redmond, Washington. He joined MSR after obtaining a Bachelor's degree in Mathematics and a Master's degree in Computer Science from Harvard University, advised by S.T. Yau and Alexander Rush, respectively. He won the Hoopes Prize at Harvard for best undergraduate thesis, as well as an Honorable Mention for the AMS-MAA-SIAM Morgan Prize, the highest honor for an undergraduate in mathematics. He gave an invited talk at the International Congress of Chinese Mathematicians 2019.

Talk Abstract: Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning enormous neural networks that are too expensive to train more than once. For example, this allowed us to tune the 6.7-billion-parameter version of GPT-3 using only 7% of its pretraining compute budget, and, with some asterisks, obtain performance comparable to the original GPT-3 model of twice the parameter count. In this talk, I will explain the core insight behind this theory. In fact, this is an instance of what I call the *Optimal Scaling Thesis*, which connects infinite-size limits for general notions of "size" to the optimal design of large models in practice, illustrating a way for theory to reliably guide the future of AI. I'll end with several concrete key mathematical research questions whose resolutions will have incredible impact on how practitioners scale up their NNs.
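The core mechanism behind muTransfer, as described in the abstract, is that under the right parametrization (muP), optimal hyperparameters become approximately width-independent, so one can tune a small proxy model and rescale for the large one. The sketch below is a simplified, hedged illustration of the commonly stated muP scaling rules for Adam; the function name and hyperparameter layout are illustrative assumptions, not code from the talk:

```python
# Toy sketch of muTransfer-style hyperparameter scaling (assumed muP rules
# for Adam, for illustration only): tune on a small "proxy" model, then
# transfer to a large model by rescaling width-sensitive hyperparameters.

def mup_transfer_adam(base_hparams, base_width, target_width):
    """Rescale Adam hyperparameters from a proxy width to a target width.

    Under muP (maximal update parametrization), with Adam:
      - learning rates for hidden (matrix-like) weights scale as 1/width,
      - input-layer weight and bias learning rates stay width-independent,
      - the output-layer multiplier scales as 1/width.
    """
    ratio = base_width / target_width
    return {
        "lr_hidden": base_hparams["lr_hidden"] * ratio,
        "lr_input": base_hparams["lr_input"],        # width-independent
        "lr_bias": base_hparams["lr_bias"],          # width-independent
        "output_multiplier": base_hparams["output_multiplier"] * ratio,
    }

# Tune once at a small width, then reuse the result at a much larger width:
small = {"lr_hidden": 1e-2, "lr_input": 1e-2, "lr_bias": 1e-2,
         "output_multiplier": 1.0}
large = mup_transfer_adam(small, base_width=256, target_width=8192)
```

Here doubling the width halves the hidden-layer learning rate, while input-layer hyperparameters transfer unchanged; this is the sense in which tuning the small model "counts" for the large one.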