HellaSwag

Introduced by Zellers et al. in HellaSwag: Can a Machine Really Finish Your Sentence?

HellaSwag is a challenge dataset for evaluating commonsense natural language inference (NLI). It is especially hard for state-of-the-art models, even though its questions are trivial for humans (>95% accuracy).
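To make the accuracy figure concrete, here is a minimal sketch of how a multiple-choice completion benchmark of this kind is typically scored: for each context, the model rates every candidate ending, the top-rated ending is taken as the prediction, and accuracy is the fraction of predictions that match the gold label. The field names (`ctx`, `endings`, `label`), the toy examples, and the word-overlap scorer are all assumptions made for illustration. They are not taken from the dataset or from any official evaluation script, and a real evaluation would replace `overlap_score` with a language model's score for each ending.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer signature: (context, ending) -> plausibility score.
Scorer = Callable[[str, str], float]


def pick_ending(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the ending the scorer rates highest."""
    scores = [score(context, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: List[Dict], score: Scorer) -> float:
    """Fraction of examples where the top-rated ending is the gold one."""
    correct = sum(
        pick_ending(ex["ctx"], ex["endings"], score) == ex["label"]
        for ex in examples
    )
    return correct / len(examples)


def overlap_score(context: str, ending: str) -> float:
    """Toy stand-in for a model: count ending words that appear in the context."""
    context_words = set(context.lower().split())
    return float(sum(word in context_words for word in ending.lower().split()))


# Made-up examples in an assumed schema; not real HellaSwag items.
TOY_EXAMPLES = [
    {
        "ctx": "A man pours water into a kettle",
        "endings": [
            "and flies to the moon",
            "and puts the kettle on the stove",
            "then paints fences",
            "while singing opera loudly",
        ],
        "label": 1,
    },
    {
        "ctx": "She opens the umbrella",
        "endings": [
            "because it starts to rain",
            "and the umbrella eats the umbrella",
            "and swims away",
            "then bakes bread",
        ],
        "label": 0,
    },
]

if __name__ == "__main__":
    print(f"toy accuracy: {accuracy(TOY_EXAMPLES, overlap_score):.2f}")
```

The second toy example is built so that the word-overlap heuristic prefers an implausible ending that merely repeats words from the context, which is the kind of surface cue a commonsense benchmark is meant to look past.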

Benchmarks

Task	Dataset Variant	Best Model
Sentence Completion	HellaSwag	CompassMTL 567M with Tailor
Text Generation	HellaSwag (10-Shot)	LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
Parameter-Efficient Fine-Tuning	HellaSwag	LLaMA2-7b
Text Generation	HellaSwag TR	MARS