HTP-Soy: An Aerial Image Set of Multi-category Soybean for High-Throughput Phenotyping (HTP)


By Beichen Lyu, Stuart D Smith, Keith Cherkauer, Katy Rainey

Purdue University

The “HTP-Soy” image set contains 1,728 aerial images of soybean plots in the field, and each image is categorized based on spatial, temporal, and genetic variations. The data are made available to encourage the development of HTP applications.

Version 1.0 - published on 06 Jan 2020 - doi:10.4231/ZAD3-MG98. Content may change until committed to the archive on 06 Feb 2020.

Licensed under CC0 1.0 Universal

180702_map.png 180706_map.png 180712_map.png


Using high-throughput data collection and analysis, High-Throughput Phenotyping (HTP) studies plant phenotypic traits and their interactions with Genotype, Environment, and Management (GxExM). Recently, HTP has empowered cross-cutting applications in agronomy such as plant breeding (Xavier et al., 2017), yield prediction (Hassan et al., 2019), and stress evaluation (Lyu et al., 2019). Furthermore, with food production under increased stress from growing populations and climate uncertainty, HTP can provide a cheaper and faster solution to secure global food production. 

Modern HTP has been evolving toward image-based and learning-based approaches, but publicly available image sets are scarce, particularly aerial image sets of plants in the field. Previously released image sets are limited either in granularity, stopping at the species level for trees (Kumar et al., 2012) and vegetables (Zheng et al., 2019), or in experimental environment, relying on greenhouses and stationary cameras (e.g., Cruz et al., 2016; Das Choudhury et al., 2018; Taghavi Namin et al., 2018). However, emerging HTP approaches show increasing interest in finer-grained plant images at the sub-species or genetic level, as well as in more versatile experimental environments that reflect how plants grow in practice. For example, one emerging approach is to plant crop plots with different genetics in the field and capture their images using Unmanned Aerial Systems (UAS) (Lyu et al., 2019). This approach allows HTP to scale to large numbers of crop plots at very high spatial and temporal resolutions, while raising new challenges such as accurate crop plot localization, efficient Vegetation Index computation, and unbiased modelling under spatial, temporal, and genetic variations.

To tackle these challenges, we have assembled an image set called “HTP-Soy”, which contains 1,728 aerial images of soybean plots in the field. Specifically, these images represent 96 soybean plots at 3 growth stages, and for each soybean plot at each growth stage there are 6 image replicates from different camera view angles (96 × 3 × 6 = 1,728). Each group of 6 neighboring soybean plots shares the same genetics. Therefore, each image is categorized based on spatial, temporal, and genetic variations. We collected these images using an eBee UAS from senseFly, which flew over a soybean field in Romney, Indiana on July 2, July 6, and July 12, 2018. We flew the UAS at a height of 120 meters with a high-resolution senseFly S.O.D.A. RGB camera onboard, which produced images at a Ground Sampling Distance of 2.5 cm. In addition, we flew the UAS with forward and side overlap ratios of over 85% and 75%, respectively, which provided multiple image replicates at different camera view angles. From the raw UAS images, we extracted all soybean plot images automatically using the approach of Hearst et al. (in review) and Lyu et al. (2019). Before publication, the extracted images were also manually verified and adjusted.
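The composition of the image set can be checked with a short enumeration. This is a minimal sketch: the counts (96 plots, 3 growth stages, 6 replicates, 6 plots per genetics) come from the description, while the variable names and the derived figure of 16 genetic groups (which assumes the 96 plots divide evenly into groups of 6) are introduced here only for illustration.

```python
from itertools import product

# Counts taken from the dataset description.
N_PLOTS = 96          # soybean plots in the field
N_STAGES = 3          # growth stages (flights on July 2, 6, and 12, 2018)
N_REPLICATES = 6      # image replicates per plot per growth stage
PLOTS_PER_GROUP = 6   # neighboring plots that share the same genetics

# Enumerate every (plot, stage, replicate) combination, one per image.
images = list(product(range(N_PLOTS), range(N_STAGES), range(N_REPLICATES)))
total_images = len(images)

# Assumption: the 96 plots divide evenly into groups of 6 plots each.
n_genetic_groups = N_PLOTS // PLOTS_PER_GROUP
images_per_group = PLOTS_PER_GROUP * N_STAGES * N_REPLICATES

print(total_images)      # 1728
print(n_genetic_groups)  # 16
print(images_per_group)  # 108
```

Under that assumption, a classifier trained at the genetic level would see 108 images per category, spread over 6 plots, 3 growth stages, and 6 view angles.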

The HTP-Soy image set is publicly available from the Purdue University Research Repository (PURR). It can be downloaded as a compressed file “”. Inside “”, the “original” folder contains the original 1,728 aerial images of soybean plots in the field, the “segmented” folder contains another version of these 1,728 images with the canopy segmented, and three additional image files describe the experiment site at the three growth stages. Inside each folder, images are grouped into three sub-folders based on their growth stages. Each image name follows the format “ - row - - range - - u - - rep - ”: each token of “” denotes a growth stage; each token combination of “row - - range - ” denotes a unique genetics; each token combination of “row - - range - - u - ” denotes a unique soybean plot in the field; and each token of “rep - ” denotes a camera view angle.
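The naming convention lends itself to automatic parsing, for example to group images by plot or by genetics. The sketch below is illustrative only: the exact file-name tokens are not reproduced in this description, so the pattern (a leading growth-stage token followed by integer values after "row", "range", "u", and "rep"), the example file name, and the function names `parse_name` and `group_by_plot` are all hypothetical and should be adapted to the actual files.

```python
import re
from collections import defaultdict

# HYPOTHETICAL file-name pattern: the exact tokens are not given in the
# description, so a leading stage token (e.g. a date) and integer values
# after "row", "range", "u", and "rep" are assumed here.
NAME_PATTERN = re.compile(
    r"^(?P<stage>[^-\s]+)\s*-\s*row\s*-\s*(?P<row>\d+)"
    r"\s*-\s*range\s*-\s*(?P<range>\d+)"
    r"\s*-\s*u\s*-\s*(?P<u>\d+)\s*-\s*rep\s*-\s*(?P<rep>\d+)"
)


def parse_name(file_name):
    """Split a file name into stage, genetics, plot, and replicate keys."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized file name: {file_name!r}")
    row, rng, unit = int(match["row"]), int(match["range"]), int(match["u"])
    return {
        "stage": match["stage"],      # growth stage
        "genetics": (row, rng),       # row + range identify the genetics
        "plot": (row, rng, unit),     # row + range + u identify the plot
        "rep": int(match["rep"]),     # rep identifies the camera view angle
    }


def group_by_plot(file_names):
    """Map each plot key to the file names of all its images."""
    groups = defaultdict(list)
    for name in file_names:
        groups[parse_name(name)["plot"]].append(name)
    return dict(groups)


# Hypothetical example name; real names may differ in detail.
example = "180702-row-3-range-12-u-4-rep-2.png"
print(parse_name(example))
```

Grouping by the plot or genetics key rather than by individual file keeps the six view-angle replicates of a plot together, which can matter when splitting images into training and test sets.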

We hope HTP-Soy can benefit research in HTP and related communities such as agronomy and computer science. For future research using HTP, we envision several applications, including: 1) a benchmark dataset to train and test classification algorithms on crop plot images, 2) a baseline dataset to explore the effect of spatial, temporal, and genetic variations on HTP models, and 3) an introductory dataset to help visualize crop plots in the field from the perspective of UAS.


This research is supported by USDA Agriculture and Food Research Initiative Award 2016-07982.
