Reasoning about actions and change (RAC) is essential for understanding and interacting with an ever-changing environment. Previous AI research has shown the importance of fundamental and indispensable knowledge of actions, namely their preconditions and effects.
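The precondition/effect view of actions can be made concrete with a small STRIPS-style sketch. This is an illustrative assumption, not TRAC's problem generator or data format: fluents are plain strings, and `Action`, `executable`, `apply_action`, and `project` are hypothetical names. The sketch shows executability (do all preconditions hold?) and projection (which fluents hold after a sequence of actions?) on a two-block BlocksWorld state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action: fluents are plain strings."""
    name: str
    preconditions: frozenset  # fluents that must hold before the action
    add_effects: frozenset    # fluents the action makes true
    del_effects: frozenset    # fluents the action makes false


def executable(state: frozenset, action: Action) -> bool:
    # An action is executable when every precondition holds in the state.
    return action.preconditions <= state


def apply_action(state: frozenset, action: Action) -> frozenset:
    # Successor state: remove deleted fluents, then add new ones.
    if not executable(state, action):
        raise ValueError(f"{action.name} is not executable")
    return (state - action.del_effects) | action.add_effects


def project(state: frozenset, plan: list):
    """Return the final state after running the plan, or None if any step is not executable."""
    for action in plan:
        if not executable(state, action):
            return None
        state = apply_action(state, action)
    return state


# Illustrative BlocksWorld instance: block a sits on block b, which is on the table.
initial = frozenset({"clear(a)", "on(a,b)", "ontable(b)", "handempty"})

unstack_a_b = Action(
    name="unstack(a,b)",
    preconditions=frozenset({"clear(a)", "on(a,b)", "handempty"}),
    add_effects=frozenset({"holding(a)", "clear(b)"}),
    del_effects=frozenset({"clear(a)", "on(a,b)", "handempty"}),
)

putdown_a = Action(
    name="putdown(a)",
    preconditions=frozenset({"holding(a)"}),
    add_effects=frozenset({"ontable(a)", "clear(a)", "handempty"}),
    del_effects=frozenset({"holding(a)"}),
)

if __name__ == "__main__":
    print(executable(initial, putdown_a))                   # False: hand is not holding a
    print(sorted(project(initial, [unstack_a_b, putdown_a])))
```

Running `putdown(a)` first fails because its precondition `holding(a)` is absent, while the two-step plan ends with both blocks clear on the table. Representing states as immutable sets keeps each projection step side-effect free, which makes it easy to compare intermediate states.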
Nov 25, 2022: This work proposes four essential RAC tasks as a comprehensive textual benchmark and generates problems in a way that minimizes the influence of other...
Jul 9, 2023: We propose Textual Reasoning about Actions and Change (TRAC), a comprehensive suite of four fundamental and granular RAC reasoning tasks...
TRAC: A Textual Benchmark for Reasoning about Actions and Change. Preprint, Nov 2022. Weinan He, Canming Huang, ...
Oct 17, 2024: This benchmark rigorously evaluates LLMs across six key RAC dimensions: Fluent Tracking, State Tracking, Action Executability, Effects of ...
ARB is a novel benchmark dataset composed of advanced reasoning problems designed to evaluate LLMs on text comprehension and expert domain reasoning.
The paper proposes a benchmark for evaluating the ability of LLMs to plan and reason on classical AI problems (BlocksWorld and Logistics). Overall, all ...
A benchmark dataset generator to test relational reasoning on text. Code for generating data for our paper "CLUTRR: A Diagnostic Benchmark for Inductive ...
Oct 9, 2024: ACPBench is a research paper that introduces a new benchmark for evaluating the reasoning abilities of artificial intelligence (AI) systems...