The FinBen: An Holistic Financial Benchmark for Large Language Models
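[Paper title] The FinBen: An Holistic Financial Benchmark for Large Language Models
[Summary] This paper introduces FinBen, a comprehensive benchmark for evaluating language models in the financial domain. It comprises 35 datasets covering 23 financial tasks, organized into three spectrums of difficulty inspired by the Cattell-Horn-Carroll theory, with the aim of assessing the financial capabilities of language models as a whole. After evaluating 15 representative language models, including GPT-4, ChatGPT, and the latest Gemini, the researchers found that GPT-4 performs well in quantification, extraction, numerical reasoning, and stock trading, while Gemini performs well in generation and forecasting. Both, however, perform poorly on complex extraction and forecasting and need targeted improvement. The study also found that instruction tuning improves performance on simple tasks but has limited effect on complex reasoning and forecasting abilities. FinBen will continue to evaluate language models in the financial domain, with regular updates to its tasks and models intended to advance AI development.
[Paper link] https://arxiv.org/abs/2402.12659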
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Retrieval meets Long Context Large Language Models
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review
Interactive Task Planning with Language Models
Offline Actor-Critic Reinforcement Learning Scales to Large Models
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language M
MusicAgent: An AI Agent for Music Understanding and Generation with Large Langua
Are Large Language Models Post Hoc Explainers?
Learning to Learn Faster from Human Feedback with Language Model Predictive Cont
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
Secrets of RLHF in Large Language Models Part II: Reward Modeling
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Llemma: An Open Language Model For Mathematics
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Anthropic: What Should an AI's Character Be Like?
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
How FaR Are Large Language Models From Agents with Theory-of-Mind?
Can Large Language Models be Good Path Planners? A Benchmark and Investigation o
[2024] [Chinese-English subtitles] 7 Building AI Models in the Wild | MIT Introduction to Deep Learning 6.S191
Controlled Decoding from Language Models
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Moral Foundations of Large Language Models
AutoMix: Automatically Mixing Language Models
LayoutPrompter: Awaken the Design Ability of Large Language Models
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-S
Language Models can be Logical Solvers
Orca 2: Teaching Small Language Models How to Reason
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Knowledge Probing and Reasoning Methods Based on Language Models
FlashDecoding++: Faster Large Language Model Inference on GPUs
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-
Making Large Language Models Perform Better in Knowledge Graph Completion
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark
The Consensus Game: Language Model Generation via Equilibrium Search
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling