NEWTON: Are Large Language Models Capable of Physical Reasoning?

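Paper summary

Title: NEWTON: Are Large Language Models Capable of Physical Reasoning?

Abstract: This paper introduces NEWTON, a repository and benchmark for evaluating the physical reasoning skills of large language models (LLMs). To support this, the authors designed a pipeline that lets researchers generate customized versions of the benchmark with the objects and attributes relevant to their own application. The NEWTON repository contains 2,800 object-attribute pairs, which serve as the basis for generating assessment templates at unlimited scale. The NEWTON benchmark consists of 160K questions curated from the NEWTON repository to probe the physical reasoning abilities of several mainstream language models across foundational, explicit, and implicit reasoning tasks. Through extensive empirical analysis, the authors characterize what LLMs can and cannot do in physical reasoning. They find that LLMs such as GPT-4 reason well on scenario-based tasks but perform worse than humans on object-attribute reasoning (50% for GPT-4 vs. 84% for humans). The NEWTON platform shows potential for evaluating and improving language models, laying the groundwork for integrating them into physically grounded settings such as robotic manipulation.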
Paper link: https://arxiv.org/pdf/2310.07018

