A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection
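Paper summary: In "A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection", the authors propose a self-check method based on reverse validation that automatically detects hallucinations in model-generated text in a zero-resource setting. To support future research and allow different methods to be compared, they also build a hallucination detection benchmark whose passages are generated by ChatGPT and annotated by humans. Unlike earlier zero-resource hallucination detection work, both the method and the benchmark target passage-level rather than sentence-level detection. The authors empirically evaluate existing zero-resource detection methods and their own method on the different domains of the benchmark, to explore a possible relationship between hallucination and training data. A manual analysis of hallucination cases that the LM failed to catch then reveals limitations shared by zero-resource methods. The paper's main contributions are therefore the zero-resource detection method and the benchmark: the cross-domain experiments are presented as evidence that the method is effective, and the failure-case analysis points to directions for improvement.
Paper link: https://arxiv.org/pdf/2310.06498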
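The summary does not spell out how reverse validation works, so the following is only a minimal sketch under one assumed reading: hide the entity a passage is about, ask the model to name the entity from the remaining text, and flag the passage if the model cannot recover it. The function names, the prompt wording, and the `toy_model` stand-in are all hypothetical and are not taken from the paper.

```python
from typing import Callable


def reverse_validate(entity: str, passage: str,
                     ask_model: Callable[[str], str]) -> bool:
    """Return True if the passage is flagged as a possible hallucination.

    Assumed procedure (not confirmed by the summary): mask the entity,
    ask the model which entity the passage describes, compare the answer.
    """
    # Mask the entity name so the question cannot leak the answer.
    masked = passage.replace(entity, "[ENTITY]")
    prompt = (
        "The following passage describes an entity written as [ENTITY].\n"
        f"{masked}\n"
        "Which entity is it? Answer with the name only."
    )
    answer = ask_model(prompt).strip()
    # Flag the passage when the model fails to recover the original entity.
    return answer.casefold() != entity.casefold()


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: a single keyword lookup."""
    if "radioactivity" in prompt:
        return "Marie Curie"
    return "unknown"


passage_ok = "Marie Curie pioneered research on radioactivity."
passage_bad = "Marie Curie designed the first steam locomotive."

print(reverse_validate("Marie Curie", passage_ok, toy_model))
print(reverse_validate("Marie Curie", passage_bad, toy_model))
```

In practice `ask_model` would wrap a real LLM call, and the exact string comparison would likely need to be replaced by a more tolerant match, since a model may return aliases or partial names.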