SCDE: Sentence Cloze Dataset with High Quality Distractors from Examinations

Description

SCDE is a human-created sentence cloze dataset collected from public school English examinations in China. The task requires a model to fill in multiple blanks in a passage from a shared candidate set that includes distractors designed by English teachers.

Disclaimer

  1. The SCDE dataset is available strictly for non-commercial research purposes only.

  2. All questions, passages and answers are obtained from the Internet and are not the property of Carnegie Mellon University, the authors, or any associated employers, entities or institutions. We do not bear responsibility for either the content or meaning of these questions, passages and answers.

  3. By requesting and/or using this dataset, you agree not to reproduce, duplicate, copy, sell, trade, resell, rent or exploit for any commercial purpose, any portion of the contexts and any portion of derived data.

  4. We reserve the right to terminate your access to the SCDE dataset at any time.

Download

Please use this Google form to submit your information and request access to SCDE.

Paper

Sentence Cloze Dataset with High Quality Distractors From Examinations
Xiang Kong*, Varun Gangal* and Eduard Hovy.


@inproceedings{xiang2020sentence,
  title={SCDE: Sentence Cloze Dataset with High Quality Distractors from Examinations},
  author={Kong, Xiang and Gangal, Varun and Hovy, Eduard},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2020}
}

Contact

Please reach out to Xiang Kong and Varun Gangal with any questions about the dataset.
Do let us know if you find any issues in the data!
Also, let us know if you get an improved result over the current best one!

Leaderboard

The leaderboard for this task can be tracked here

Data Format and Evaluation Metrics

Refer to the Repository for more extensive instructions; we provide a brief overview here.
The dataset has three splits: train, dev and test. Each split is a JSON file.
Each JSON file is a list of examples, with each example containing the passage, the candidates and the answers.
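
As a rough illustration of the format described above, the following Python sketch loads one split and checks that each example carries the three expected fields. The key names "passage", "candidates" and "answers" are assumptions taken from the wording of this overview, not a confirmed schema, and the file path is hypothetical; please check the Repository for the exact field names. The blank_accuracy helper is likewise only a plausible per-blank accuracy, not necessarily the official evaluation metric.

```python
import json
from typing import Any, Dict, List, Sequence

# Assumed key names, based only on the overview text; the real schema may differ.
PASSAGE_KEY = "passage"
CANDIDATES_KEY = "candidates"
ANSWERS_KEY = "answers"
REQUIRED_KEYS = (PASSAGE_KEY, CANDIDATES_KEY, ANSWERS_KEY)


def load_split(path: str) -> List[Dict[str, Any]]:
    """Read one split file and check that it is a list of example objects."""
    with open(path, encoding="utf-8") as f:
        examples = json.load(f)
    if not isinstance(examples, list):
        raise ValueError(f"{path}: expected a JSON list of examples")
    for i, example in enumerate(examples):
        if not isinstance(example, dict):
            raise ValueError(f"{path}: example {i} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in example]
        if missing:
            raise ValueError(f"{path}: example {i} is missing keys {missing}")
    return examples


def blank_accuracy(
    predictions: Sequence[Sequence[Any]], gold: Sequence[Sequence[Any]]
) -> float:
    """Fraction of blanks filled correctly, pooled over all passages.

    Illustrative only: each inner sequence holds one passage's answers,
    in blank order, using whatever answer encoding the dataset uses.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold cover different numbers of passages")
    correct = 0
    total = 0
    for predicted, answers in zip(predictions, gold):
        if len(predicted) != len(answers):
            raise ValueError("a passage has a different number of predicted blanks")
        correct += sum(p == a for p, a in zip(predicted, answers))
        total += len(answers)
    return correct / total if total else 0.0
```

For example, load_split("train.json") (a hypothetical file name) would return the list of examples for the training split, and blank_accuracy can then compare a model's per-passage predictions against the stored answers.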