Video Language Planning
Paper summary: In the paper Video Language Planning, the authors propose VLP, a method that leverages recent advances in large generative models to perform visual planning for complex long-horizon tasks in the space of generated videos and language. To this end, VLP consists of a tree-search procedure in which (i) vision-language models are trained to serve as both policies and value functions, and (ii) text-to-video models serve as dynamics models. VLP takes as input a long-horizon task instruction and the current image observation, and outputs a long video plan that provides a detailed multimodal (video and language) specification of how to complete the final task. VLP scales with the compute budget: more compute time yields better video plans, and it can synthesize long-horizon video plans across different robot domains, from multi-object rearrangement to multi-camera bimanual dexterous manipulation. The generated video plans are translated into real robot actions via goal-conditioned policies conditioned on each intermediate frame of the generated video. Experiments show that, compared with prior methods, VLP substantially improves long-horizon task success rates both in simulation and on real robots (across 3 hardware platforms). Paper link: https://arxiv.org/pdf/2310.10625
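The tree search sketched above can be made concrete. The following is a minimal, hypothetical illustration of the planning loop, not the paper's implementation: the three model calls (`policy`, `dynamics`, `value`) are toy stand-ins for the VLM-as-policy, text-to-video dynamics model, and VLM-as-value-function, so the search logic itself is runnable.

```python
def policy(observation, goal, n_samples):
    """Stand-in for the VLM-as-policy: propose candidate language subgoals."""
    return [f"step-{i}:{goal}" for i in range(n_samples)]

def dynamics(observation, action):
    """Stand-in for the text-to-video model: 'render' the next observation."""
    return f"{observation}|{action}"

def value(observation, goal):
    """Stand-in for the VLM-as-value-function: score progress toward the goal."""
    return observation.count(goal)  # toy heuristic, not a learned model

def vlp_plan(observation, goal, horizon=3, branch=4, beam=2):
    """Beam-style tree search over simulated (video, language) rollouts."""
    frontier = [(observation, [])]  # (simulated observation, plan so far)
    for _ in range(horizon):
        candidates = []
        for obs, plan in frontier:
            for action in policy(obs, goal, branch):
                next_obs = dynamics(obs, action)
                candidates.append((value(next_obs, goal), next_obs, plan + [action]))
        # Keep only the highest-value branches. Raising `branch`/`beam`
        # spends more compute per step, mirroring the paper's observation
        # that larger compute budgets yield better video plans.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = [(obs, plan) for _, obs, plan in candidates[:beam]]
    return frontier[0][1]  # best plan: a sequence of language subgoals

plan = vlp_plan("init", "stack-blocks")
```

In the paper, each element of the returned plan would correspond to a generated video segment plus its language description, which a goal-conditioned policy then converts frame by frame into robot actions.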
Interactive Task Planning with Language Models
VideoCon: Robust Video-Language Alignment via Contrast Captions
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
How FaR Are Large Language Models From Agents with Theory-of-Mind?
Moral Foundations of Large Language Models
TrustLLM: Trustworthiness in Large Language Models
FlashDecoding++: Faster Large Language Model Inference on GPUs
LayoutPrompter: Awaken the Design Ability of Large Language Models
Making Large Language Models Perform Better in Knowledge Graph Completion
Language Models can be Logical Solvers
OceanGPT: A Large Language Model for Ocean Science Tasks
LEGO: Language Enhanced Multi-modal Grounding Model
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-
CLEX: Continuous Length Extrapolation for Large Language Models
Toward Joint Language Modeling for Speech Units and Text
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Controlled Decoding from Language Models
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
FLAP: Fast Language-Audio Pre-training
Llemma: An Open Language Model For Mathematics
The Consensus Game: Language Model Generation via Equilibrium Search
BitNet: Scaling 1-bit Transformers for Large Language Models
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Memory Augmented Language Models through Mixture of Word Experts
Tuna: Instruction Tuning using Feedback from Large Language Models
Stable Score Distillation: A New Method for High-Quality 3D Generation
Learning to Learn Faster from Human Feedback with Language Model Predictive Cont
Amortizing intractable inference in large language models
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Compressing Context to Enhance Inference Efficiency of Large Language Models
ADaPT: As-Needed Decomposition and Planning with Language Models
Can a student Large Language Model perform as well as its teacher?
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Orca 2: Teaching Small Language Models How to Reason
Improving the Reliability of Large Language Models in In-Context Learning: Incorporating Supervised Knowledge
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Large Language Models Cannot Self-Correct Reasoning Yet