RLVF: Learning from Verbal Feedback without Overgeneralization

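【Paper title】 RLVF: Learning from Verbal Feedback without Overgeneralization
【Paper summary】 This paper introduces Contextualized Critiques with Constrained Preference Optimization (C3PO), a method for adapting large language models to the varied contexts in which they are used. A convenient way to adjust a model is high-level verbal feedback, but simply giving the model such feedback often causes it to overgeneralize, applying the feedback in contexts where it is not relevant. To address this, C3PO uses a piece of high-level feedback to generate a small synthetic preference dataset that specifies how the feedback should (and should not) be applied. It then fine-tunes the model on this data while minimizing divergence from the original model in contexts where the feedback does not apply. Experiments indicate that C3PO applies verbal feedback effectively in relevant scenarios while preserving the model's behavior elsewhere. For both human-written and GPT-4-generated high-level feedback, C3PO follows the given feedback about as well as in-context baselines while reducing overgeneralization by 30%.
【Paper link】 https://arxiv.org/abs/2402.10893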
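
To make the fine-tuning step in the summary more concrete, here is a minimal Python sketch of a constrained preference objective. It is an illustration under assumptions, not the paper's implementation: the DPO-style preference term, the negative log-likelihood penalty used as a stand-in for "divergence from the original model", and the names `dpo_loss`, `drift_penalty`, `combined_loss`, `beta`, and `lam` are all hypothetical choices made for this example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one preference pair (assumed form, for illustration).

    Arguments are sequence log-probabilities under the fine-tuned policy and
    under the frozen original (reference) model.
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)


def drift_penalty(policy_logprob_of_original_response):
    """Assumed proxy for divergence on prompts where the feedback does not apply:
    the negative log-likelihood, under the policy, of the original model's response."""
    return -policy_logprob_of_original_response


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def combined_loss(in_scope_pairs, out_of_scope_logprobs, beta=0.1, lam=1.0):
    """Preference loss where the feedback applies, plus a weighted penalty that
    keeps behavior close to the original model where it does not apply."""
    preference = mean(dpo_loss(*pair, beta=beta) for pair in in_scope_pairs)
    drift = mean(drift_penalty(lp) for lp in out_of_scope_logprobs)
    return preference + lam * drift
```

In this sketch, `in_scope_pairs` would hold log-probabilities for prompts where the feedback applies (for example, emails to one's boss), and `out_of_scope_logprobs` would hold the policy's log-probabilities of the original model's responses on unrelated prompts. Raising `lam` trades adherence to the feedback for stronger preservation of existing behavior.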
