TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models

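Paper summary: In the paper TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models, the authors propose TRACE, a benchmark for evaluating continual learning in large language models (LLMs). The benchmark comprises 8 distinct datasets covering challenging tasks, including domain-specific tasks, multilingual capabilities, code generation, and mathematical reasoning. All datasets are standardized into a unified format so that LLMs can be evaluated automatically with little effort. Experiments show that, after training on TRACE, aligned LLMs decline significantly in both general ability and instruction following. For example, the accuracy of llama2-chat 13B on the gsm8k dataset falls from 28.8% to 2%. This illustrates how hard it is to reach task-specific performance while preserving an LLM's original capabilities. To address this challenge, the authors propose a method called Reasoning-augmented Continual Learning (RCL). RCL combines task-specific cues with meta-rationales, effectively reducing catastrophic forgetting in LLMs and speeding up convergence on new tasks.
Paper link: https://arxiv.org/pdf/2310.06762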
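
To put the gsm8k figures from the summary in perspective, the short sketch below works out the absolute and relative accuracy drop. The function name `accuracy_drop` is a hypothetical helper introduced here for illustration and is not part of the TRACE codebase; the only inputs are the two percentages quoted above.

```python
def accuracy_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    absolute = before - after
    relative = absolute / before
    return absolute, relative


# Figures quoted in the summary: llama2-chat 13B on gsm8k, 28.8% -> 2%.
absolute, relative = accuracy_drop(28.8, 2.0)
print(f"absolute drop: {absolute:.1f} points, relative drop: {relative:.1%}")
# prints: absolute drop: 26.8 points, relative drop: 93.1%
```

In other words, the reported drop of 26.8 percentage points amounts to losing roughly 93% of the model's original gsm8k accuracy.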
