
TextWorlds QA

TextWorlds QA is a new Question Answering dataset introduced in the paper cited below. The dataset is motivated by the vision that our future personal assistants will be able to learn new knowledge from explicit verbal instruction, and then answer questions over that knowledge.

A quick summary of what this dataset is about (from the paper):

We synthesize narratives in five diverse worlds, each containing a thousand narratives and where each narrative describes the evolution of a simulated user’s world from a first-person perspective. In each narrative, the simulated user may introduce new knowledge, update existing knowledge or express a state change (e.g., “Homework 3 is now due on Friday” or “Samantha passed her thesis defense”). Each narrative is interleaved with questions about the current state of the world, and questions range in complexity depending on the amount of knowledge that needs to be integrated to answer them. This allows us to benchmark a range of QA models at their ability to answer questions that require different extents of relational reasoning to be answered.
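The interleaving of knowledge updates and questions described above can be illustrated with a toy example. The sketch below is hypothetical and does not reflect the dataset's actual file format: it simply models a narrative as a sequence of statements that set or update facts, interleaved with questions answered from the current world state.

```python
# Hypothetical sketch of a TextWorlds-style narrative: statements update a
# world state, and interleaved questions are answered from the current state.
# The (entity, relation, value) encoding is illustrative only, not the
# dataset's actual format.

def run_narrative(events):
    """Process statements in order; answer each question from current state."""
    state = {}    # (entity, relation) -> value
    answers = []
    for event in events:
        if event[0] == "set":        # e.g. "Homework 3 is now due on Friday"
            _, entity, relation, value = event
            state[(entity, relation)] = value  # later statements overwrite earlier ones
        else:                        # ("ask", entity, relation)
            _, entity, relation = event
            answers.append(state.get((entity, relation)))
    return answers

events = [
    ("set", "Homework 3", "due", "Thursday"),
    ("set", "Homework 3", "due", "Friday"),           # knowledge update
    ("ask", "Homework 3", "due"),
    ("set", "Samantha", "thesis defense", "passed"),  # state change
    ("ask", "Samantha", "thesis defense"),
]
print(run_narrative(events))  # ['Friday', 'passed']
```

Answering the first question correctly requires using the most recent update ("Friday"), which is the simplest instance of the state-tracking the dataset probes; the actual benchmark questions can require integrating many such facts.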


You can download the dataset here.


If you use the dataset please cite:

Labutov, Igor, Bishan Yang, Anusha Prakash, and Amos Azaria. "Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).


Please email us directly to report new results on this dataset. We will maintain a list of publications that report performance on this data, and include their results in the table below:

Model     F1     F1
MemN2N    57.0    9.2
BiDAF     72.8   16.6
DrQA      79.4   23.9


If you find any issues with the dataset, please contact us as well.