TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
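Paper summary: In TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models, the authors propose TRACE, a benchmark for evaluating continual learning in large language models (LLMs). It comprises 8 distinct datasets spanning challenging tasks, including domain-specific tasks, multilingual capabilities, code generation, and mathematical reasoning. All datasets are standardized into a unified format so that LLMs can be evaluated automatically. Experiments show that, after training on TRACE, aligned LLMs decline significantly in both general ability and instruction following. For example, the accuracy of llama2-chat 13B on the gsm8k dataset falls from 28.8% to 2%. This illustrates how hard it is to achieve task-specific performance while preserving an LLM's original capabilities. To address this, the authors propose Reasoning-augmented Continual Learning (RCL), which combines task-specific cues with meta-rationales, effectively reducing catastrophic forgetting in LLMs and speeding up convergence on new tasks. Paper link: https://arxiv.org/pdf/2310.06762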
Beginning Llamafile for Local Large Language Models (LLMs)
Amortizing intractable inference in large language models
Making Large Language Models Perform Better in Knowledge Graph Completion
Can Large Language Models be Good Path Planners? A Benchmark and Investigation o
Duke University: Foundations of Local Large Language Models (Chinese and English subtitles)
BitNet: Scaling 1-bit Transformers for Large Language Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Retrieval meets Long Context Large Language Models
CLEX: Continuous Length Extrapolation for Large Language Models
Offline Actor-Critic Reinforcement Learning Scales to Large Models
FlashDecoding++: Faster Large Language Model Inference on GPUs
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
LayoutPrompter: Awaken the Design Ability of Large Language Models
Creative Robot Tool Use with Large Language Models
Language Models can be Logical Solvers
The FinBen: An Holistic Financial Benchmark for Large Language Models
Compressing Context to Enhance Inference Efficiency of Large Language Models
Memory Augmented Language Models through Mixture of Word Experts
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-
AutoMix: Automatically Mixing Language Models
The Consensus Game: Language Model Generation via Equilibrium Search
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
FLAP: Fast Language-Audio Pre-training
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-S
CogVLM: Visual Expert for Pretrained Language Models
Generating cluster labels with Ollama + CiteSpace
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Orca 2: Teaching Small Language Models How to Reason
Can a student Large Language Model perform as well as it's teacher?
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
An Image is Worth Multiple Words: Learning Object Level Concepts using Multi-Con
A Zero-Shot Language Agent for Computer Control with Structured Reflection
TrustLLM: Trustworthiness in Large Language Models
Exponentially Faster Language Modelling
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents
Ranking LLM-Generated Loop Invariants for Program Verification
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language M