TrustLLM: Trustworthiness in Large Language Models
【Paper Title】 TrustLLM: Trustworthiness in Large Language Models
【Paper Summary】 This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, covering principles of trustworthiness across different dimensions, an established benchmark, an evaluation and analysis of the trustworthiness of mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of trustworthiness principles spanning eight distinct dimensions. Based on these principles, we further establish a benchmark covering six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then study 16 mainstream LLMs in TrustLLM using more than 30 datasets. Our findings first show that, overall, trustworthiness and utility (i.e., functional effectiveness) are positively correlated. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs; however, a few open-source LLMs come very close to the proprietary ones. Third, it is important to note that some LLMs may be so over-calibrated toward demonstrating trustworthiness that they compromise their utility by mistakenly treating benign prompts as harmful and refusing to respond. Finally, we emphasize the importance of ensuring transparency, not only in the models themselves but also in the techniques that underpin trustworthiness; knowing which specific trustworthiness techniques have been used is essential for analyzing their effectiveness.
【Paper Link】 https://arxiv.org/abs/2401.05561
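The summary reports a positive correlation between trustworthiness and utility, with trustworthiness assessed across six benchmark dimensions. The following is a minimal illustrative sketch, not TrustLLM's actual toolkit or data: the model names and scores are invented, and aggregating trustworthiness as an unweighted mean over the six dimensions is an assumption, used only to show how such a correlation could be checked.

```python
# Illustrative sketch only: scores below are made up, not TrustLLM results.
from statistics import correlation, mean  # Python 3.10+

# Hypothetical per-dimension scores (0-1) for three models across the six
# TrustLLM dimensions: truthfulness, safety, fairness, robustness, privacy, ethics.
dimension_scores = {
    "model_a": [0.82, 0.91, 0.77, 0.69, 0.88, 0.84],
    "model_b": [0.74, 0.85, 0.70, 0.61, 0.79, 0.76],
    "model_c": [0.65, 0.72, 0.58, 0.55, 0.66, 0.63],
}
# Hypothetical utility scores (e.g., accuracy on a general-capability benchmark).
utility = {"model_a": 0.78, "model_b": 0.71, "model_c": 0.60}

# Aggregate trustworthiness as the unweighted mean over the six dimensions (assumption).
trustworthiness = {m: mean(s) for m, s in dimension_scores.items()}

# Pearson correlation between trustworthiness and utility across models.
models = sorted(trustworthiness)
r = correlation([trustworthiness[m] for m in models], [utility[m] for m in models])
print(f"trustworthiness vs. utility correlation: r = {r:.2f}")
```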
LayoutPrompter: Awaken the Design Ability of Large Language Models
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language M
Language Models can be Logical Solvers
Simple and Scalable Strategies to Continually Pre-train Large Language Models
RLHF-V: Trustworthy Multimodal Large Language Models via Fine-Grained Correctional Human Feedback
Tuna: Instruction Tuning using Feedback from Large Language Models
FLAP: Fast Language-Audio Pre-training
Orca 2: Teaching Small Language Models How to Reason
Secrets of RLHF in Large Language Models Part II: Reward Modeling
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Asynchronous Local-SGD Training for Language Modeling
Survey: Explainability Research on Large Models
Challenges in Discovering Latent Knowledge in Large Language Models
FlashDecoding++: Faster Large Language Model Inference on GPUs
A New Approach to Language Model Alignment: Judgment Feedback via Contrastive Unlikelihood Training
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
BitNet: Scaling 1-bit Transformers for Large Language Models
Large language models (LLMs) have demonstrated proficiency at tasks that require task planning and the use of external tools
Compressing Context to Enhance Inference Efficiency of Large Language Models
PALP: Prompt Aligned Personalization of Text-to-Image Models
Creative Robot Tool Use with Large Language Models
Large Language Models: A Comprehensive Survey from Training to Inference
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Lightning Attention-2: An Efficient Method for Handling Unlimited Sequence Lengths in Large Language Models
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
TextGenSHAP: Scalable Generative Explanations for Long Text
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Spla
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
A Survey of Multimodal Foundation Models
Retrieval meets Long Context Large Language Models
A Survey on the Efficiency of Large Language Models
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Thought Design: The Key to Cracking Complex Problems
Redefining LLM Quantization: A New FP6-Centric Strategy for Generative Tasks
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Retrieval-Augmented Generation Based on Context Tuning
Learning to Learn Faster from Human Feedback with Language Model Predictive Cont