Introduction

This web page supports a dataset of natural language instructions for object specification in manipulation scenarios. It comprises 1582 individual written instructions collected via online crowdsourcing; each instruction was elicited with one of 28 scenario images. The dataset is particularly useful for researchers in natural language processing, human-robot interaction, and robotic tabletop manipulation. In addition to serving as a rich corpus of domain-specific language, it provides a benchmark of image/instruction pairs for system evaluations and highlights inherent challenges in tabletop object specification.

Associated Journal Publication

R. Scalise*, S. Li*, H. Admoni, S. Rosenthal, and S. Srinivasa. "Natural Language Instructions for Human-Robot Collaborative Manipulation". International Journal of Robotics Research, in press.

Data and access code

  1. Primary Dataset: Natural Language Instructions Corpus

    • Data in CSV

    • Downsampled Data in CSV

      (We downsampled the data from 1582 to 1400 instructions and used these 1400 instructions in the evaluation study.)

    • Access code in Python (a minimal loading sketch also appears at the end of this item)

    • Example of one row from the table containing the primary dataset:


      Instruction:           Pick up the yellow cube.
      Index:                 1341
      Scenario:              Configuration_1_v1.png
      AgentType:             human
      Difficulty:            1
      TimeToComplete:        00:00:16
      Strategy:              Tried to find something that would differ the specific cube from others
      Challenging:           Moderately challenging at the beginning but it get's easier with practice.
      GeneralComments:       (empty)
      Age:                   28
      Gender:                female
      Occupation:            Engineer
      ComputerUsage:         15-20
      DominantHand:          Right
      EnglishFirst:          1
      ExpWithRobots:         3
      ExpWithRCCars:         1
      ExpWithFPS:            5
      ExpWithRTS:            3
      ExpWithRobotComments:  Yes, I had to build one in one of my classes

      • Note that 'Scenario' denotes the corresponding stimulus image used to elicit the instruction. The first (of 28 total images) is Configuration_01_v1.png.

      • Please refer to Table 2 in the IJRR Data Paper for further details on each header field.
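
    • Below is a minimal sketch of loading the primary CSV with Python's standard library. It is illustrative only, not the released access code: the filename "corpus.csv" is an assumption (substitute whichever CSV you downloaded above), and the field names follow the example row shown.

      import csv
      from collections import Counter

      def load_corpus(path="corpus.csv"):
          """Load the corpus; each row becomes a dict keyed by the header fields."""
          with open(path, newline="", encoding="utf-8") as f:
              return list(csv.DictReader(f))

      rows = load_corpus()
      print(len(rows), "instructions loaded")  # 1582 for the full corpus

      # Count how many instructions each stimulus image elicited.
      per_scenario = Counter(row["Scenario"] for row in rows)
      for scenario, count in sorted(per_scenario.items()):
          print(scenario, count)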

  2. Supplementary Dataset: Instruction Evaluation

    • Full Data in JSON

    • Full Data in CSV

    • Averaged Data in JSON

    • Averaged Data in CSV

    • Python code to access JSON data

    • Python code to access CSV data

      • Note: in the access code for Study 2, r_target_block_index refers to the index of the target block. The indices of all blocks on the tabletop, and of the target blocks in both versions of each scenario, are annotated in images_code.pdf. (An aggregation sketch over the full data appears at the end of this item.)

    • Example of one row from the table containing the evaluation dataset (non-averaged):


      Instruction:           Pick up the yellow cube.
      Index:                 1341
      Scenario:              Configuration_1_v1.png
      NumOfWords:            5
      TargetBlockId:         1
      ClickedBlockId:        1
      Correctness:           1
      TimeToComplete:        3.593606
      DifficultyComm:        Nice Game and Fun
      ObsHardComm:           Nothing
      ObsEasyComm:           Nothing
      AddiComm:              (empty)
      Age:                   37
      Gender:                female
      Occupation:            SEO
      ComputerUsage:         >20
      DominantHand:          Right
      EnglishFirst:          1
      ExpWithRobots:         6
      ExpWithRCCars:         6
      ExpWithFPS:            6
      ExpWithRTS:            6
      ExpWithRobotComments:  No Idea
      InternalUserID:        165

      • Note that 'Scenario' denotes the stimulus image paired with the shown instruction when it was elicited during the first study. In the evaluation study, there is no distinction between 'v1' and 'v2' for each image, as there is no red arrow specifying a particular block. The first (of 14 total images) is Configuration_01.png.

      • Please refer to Table 3 in the IJRR Data Paper for further details on each header field.
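
    • Below is a minimal sketch of collapsing the full (non-averaged) evaluation CSV into per-instruction averages, similar in spirit to the "Averaged Data" files above. It is illustrative only: the filename "evaluation_full.csv" is an assumption, the field names follow the example row shown, and Correctness is treated as a 0/1 flag.

      import csv
      from collections import defaultdict

      def average_by_instruction(path="evaluation_full.csv"):
          """Aggregate mean correctness and completion time per instruction."""
          correct = defaultdict(list)
          times = defaultdict(list)
          with open(path, newline="", encoding="utf-8") as f:
              for row in csv.DictReader(f):
                  key = row["Index"]  # instruction index carried over from Study 1
                  correct[key].append(float(row["Correctness"]))
                  times[key].append(float(row["TimeToComplete"]))
          return {key: {"mean_correctness": sum(c) / len(c),
                        "mean_time": sum(times[key]) / len(times[key])}
                  for key, c in correct.items()}

      stats = average_by_instruction()
      for key, s in sorted(stats.items())[:5]:  # peek at the first few
          print(key, s)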

Stimulus Images

  1. The stimulus images used in Study 1

    • "Configuration_example_page.png" was only used in the example page of the initial study.

    • Images "Configuration_01_**.png" through "Configuration_14_**.png" are the images used as stimuli. For each of the 14 configurations, two possible target blocks were selected, each indicated by a red arrow ("Configuration_**_v1.png" and "Configuration_**_v2.png"). In total, there are 28 unique scenarios in the set of stimuli (a filename-parsing sketch appears at the end of this section).

    • An example: [image: stimulus_image_example_1]

  2. The stimulus images used in Study 2

    • Contains the image from the example page as well as 14 stimulus images (the 14 configurations underlying the original 28, shown without red arrows).

    • An example: [image: stimulus_image_example_2]
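
  Below is a minimal sketch of parsing the stimulus filename conventions described above. It is illustrative only: the helper parse_scenario is hypothetical. It accepts the zero-padded form ("Configuration_01_v1.png"), the unpadded form that appears in the CSV ("Configuration_1_v1.png"), and Study 2 filenames without a version suffix ("Configuration_01.png").

    import re

    # Matches "Configuration_<NN>_v<K>.png"; the "_v<K>" part is optional so
    # that Study 2 filenames (no red-arrow version) also parse.
    _PATTERN = re.compile(r"Configuration_(\d+)(?:_v(\d+))?\.png")

    def parse_scenario(filename):
        """Return (configuration_number, version_or_None) from a stimulus filename."""
        m = _PATTERN.fullmatch(filename)
        if m is None:
            raise ValueError(f"unrecognized scenario filename: {filename}")
        version = int(m.group(2)) if m.group(2) else None
        return int(m.group(1)), version

    print(parse_scenario("Configuration_01_v1.png"))  # (1, 1)
    print(parse_scenario("Configuration_1_v2.png"))   # (1, 2)
    print(parse_scenario("Configuration_01.png"))     # (1, None) -- Study 2 image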

Publications based on this dataset

  1. Conference Papers

    Shen Li*, Rosario Scalise*, Henny Admoni, Stephanie Rosenthal, and Siddhartha S. Srinivasa. Spatial references and perspective in natural language instructions for collaborative manipulation. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), 2016.

  2. Workshop Papers

    Shen Li*, Rosario Scalise*, Henny Admoni, Stephanie Rosenthal, and Siddhartha S. Srinivasa. Perspective in natural language instructions for collaborative manipulation. In Proceedings of the Robotics: Science and Systems Workshop on Model Learning for Human-Robot Communication, 2016.

  3. Posters

    Workshop at Robotics: Science and Systems 2016 - Model Learning for Human-Robot Communication

Contact

If you have any questions about the dataset or would like to collaborate with us on human-robot communication, please contact us! We are excited to hear from you!

You can reach either of us via email:

Rosario Scalise rscalise@andrew.cmu.edu

Shen Li shenli@cmu.edu

License

Source code is provided under the MIT License, and the CSV data is available under a CC BY-SA 4.0 license.
