NEWTON: Are Large Language Models Capable of Physical Reasoning?
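Paper summary
Title: NEWTON: Are Large Language Models Capable of Physical Reasoning?
Abstract: This paper introduces NEWTON, a repository and benchmark for evaluating the physical reasoning skills of large language models (LLMs). To this end, the authors design a pipeline that lets researchers generate versions of the benchmark customized to the objects and attributes relevant to their application. The NEWTON repository contains 2,800 object-attribute pairs, which provide the foundation for generating assessment templates at unlimited scale. The NEWTON benchmark consists of 160K questions, curated from the NEWTON repository, that probe the physical reasoning capabilities of several mainstream language models across foundational, explicit, and implicit reasoning tasks. Through extensive empirical analysis, the authors characterize the capabilities of LLMs for physical reasoning. They find that LLMs such as GPT-4 demonstrate strong reasoning in scenario-based tasks but fall short of humans in object-attribute reasoning (50% for GPT-4 vs. 84% for humans). The NEWTON platform shows potential for evaluating and enhancing language models, paving the way for their integration into physically grounded settings such as robotic manipulation.
Paper link: https://arxiv.org/pdf/2310.07018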