Addressing food security at scale: standardized tools for open ground truth data
More than 700 million people around the world faced hunger in 2020, and this number has been rising since 2014. Strengthening agricultural production, especially for rural smallholder farmers, is an important tool for addressing food security. Innovative technological solutions, such as advanced analytics and forecasting models, can help strengthen agricultural production in a way that is scalable. But these methods require significant ground truth data to train and validate machine learning algorithms. Ground truth data represents farmers’ realities on the ground, and can include things like crop type, farm size and location, and crop yield. Predictive analytics can be incredibly powerful, but without quality ground truth data, models are harder to build and less accurate. For this data to be useful for analytic purposes, it must be standardized, so that it is comparable across farms, regions, and countries, and it must be made publicly available, something data owners aren’t always willing or able to do.
As the applications of artificial intelligence (AI) and machine learning (ML) in crop analytics become more readily accessible to public and private sector actors, the need for a core set of standardized ground truth data is growing. There are many effective data collection tools for in-field ground truth data, and these tools are commonly used by agricultural development projects supporting millions of smallholder farmers around the world. However, widely available standards for these common mobile data collection platforms do not exist. As a result, agricultural field data are collected in different formats, lack quality assurance, and are not interoperable, which makes them difficult to share, compare, and apply to new models. A lack of crop analytics in accessible, decision-ready formats leads to inefficiency, missed opportunities, and lower productivity.
Enabling Crop Analytics at Scale is a multi-phase initiative to standardize field data collection and create pipelines to make that field data publicly available. Kicking off in January 2020, the project was funded by the Bill & Melinda Gates Foundation and implemented by Tetra Tech. By leveraging existing, common tools and platforms, we can rapidly expand access to standardized ground truth data and accelerate innovation and scaling of crop analytics applications and services for smallholder farmers across a range of use cases.
To achieve the goal of accelerating innovation in ground-truthing and scaling dataset availability, the project partners [Radiant Earth Foundation and ODK]:
Created a set of tools to collect standardized ground truth training data for advanced crop analytics deployable through common mobile data collection devices, while exploring innovative strategies and methods for dramatically reducing the costs of collecting ground-truth data.
Created effective automated systems to process, quality control, and share ground truth data.
Improved access to repositories of regularly updated ground data and core insights layers (i.e., crop type, yield, and field boundaries) to enable scalable satellite-based analytics in emerging agricultural economies.
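To make the idea of automated quality control more concrete, here is a minimal sketch of the kind of check such a system might run on a single ground truth record. It is not the project's actual pipeline: the field names (crop_type, gps_accuracy_m, yield_kg_per_ha), the crop vocabulary, and the 10 m accuracy threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical controlled vocabulary and threshold, for illustration only.
ALLOWED_CROPS = {"maize", "rice", "sorghum", "cassava"}
MAX_GPS_ERROR_M = 10.0


@dataclass
class FieldRecord:
    """One in-field observation; field names are assumed, not the real standard."""
    crop_type: str
    latitude: float
    longitude: float
    gps_accuracy_m: float                     # accuracy reported by the device
    yield_kg_per_ha: Optional[float] = None   # yield is often unavailable


def validate_record(rec: FieldRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if rec.crop_type.strip().lower() not in ALLOWED_CROPS:
        problems.append("crop_type not in controlled vocabulary")
    if not -90.0 <= rec.latitude <= 90.0:
        problems.append("latitude out of range")
    if not -180.0 <= rec.longitude <= 180.0:
        problems.append("longitude out of range")
    if rec.gps_accuracy_m > MAX_GPS_ERROR_M:
        problems.append("gps_accuracy_m too coarse")
    if rec.yield_kg_per_ha is not None and rec.yield_kg_per_ha < 0:
        problems.append("yield_kg_per_ha is negative")
    return problems


good = FieldRecord("Maize", -1.2921, 36.8219, 4.5, 1800.0)
bad = FieldRecord("unknown", 95.0, 36.8219, 25.0)
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems rather than raising on the first failure lets a pipeline report every issue with a record in one pass, which is useful when feedback has to travel back to enumerators in the field.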
For major funders, increasing the availability of standardized ML-ready agricultural data will allow for improved comparability and scaling of crop analytic product use across projects and programs. For researchers, developers, and entrepreneurs, interoperable “pipelines” that allow these data to flow across platforms will provide geographically dispersed and analysis-ready information to decrease model and application development and testing costs, allowing innovations to scale. For governments, the private sector, and civil society organizations working on the ground with smallholder farmers, new tools and applications will help decrease the cost of collecting and processing data and generate analytics and insights that can be used to better identify and deliver extension or financial services to citizens or clients.
D4DInsights has extensive experience developing open data ecosystems that cut across sectors and regions. From strategy development to stakeholder engagement, we have been a trusted partner to multilateral organizations, governments, companies, and non-governmental organizations around the world, supporting the use of data, innovation, and technology for sustainable development.
Building on our expertise at the intersection of data and agricultural production, we supported:
Coordination and strategy development
Coordinated efforts between project partners Tetra Tech, ODK, Radiant Earth Foundation, and a range of stakeholders.
Provided guidance for building an open data ecosystem approach, ensuring the process was inclusive of all relevant stakeholders that needed to be at the table.
Supported the development of the value proposition articulating why this would be beneficial to stakeholders.
Network-building and engagement
Developed a network of public and private sector actors, aligning incentives to contribute ground-truth data and invest in the development and use of a common platform.
Conducted a stakeholder mapping to identify key users.
Conducted interviews with a range of potential users across the African continent, to understand user profiles, including how they collect and use data.
Building the data collection tool
Analyzed information from interviews with stakeholder organizations and companies working across Africa and India.
Created use cases and user personas to ensure the tool was a fit-for-purpose solution, built to work with the users’ processes and meet their needs.
Advised on data privacy standards and anonymization to protect stakeholders, increase trust across all participating organizations, and facilitate additional data sharing.
Outcomes and Impact
The Standardized Data Collection Tool Kit is now available and being tested in the field through a number of partners. Several partners will also use the tool for field data collection at scale and then provide feedback. This will allow the program to better understand what works and what doesn’t, and then adjust and optimize the tool where necessary. We anticipate that organizations will have additional data requirements, which will call for additional fields and for integrating the data standard into existing processes. The toolkit will therefore likely need to be customized in many instances. Currently, data collected via the tool will be ingested by Radiant Earth Foundation’s MLHub, an open library for geospatial training data. Even where the toolkit has been customized, MLHub will only ingest data associated with the standard.
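One way to picture how a customized form can still feed a standard-only repository is to separate each record into standard and custom fields before sharing. The sketch below is illustrative only: the field names in STANDARD_FIELDS and the split_record helper are hypothetical and do not reflect the actual data standard or MLHub's ingestion process.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of fields defined by the standard (illustrative names).
STANDARD_FIELDS = {"crop_type", "latitude", "longitude", "collection_date"}


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate fields covered by the standard from organization-specific ones."""
    standard = {k: v for k, v in record.items() if k in STANDARD_FIELDS}
    custom = {k: v for k, v in record.items() if k not in STANDARD_FIELDS}
    return standard, custom


# A record from a customized form with one extra, organization-specific field.
record = {
    "crop_type": "maize",
    "latitude": -1.2921,
    "longitude": 36.8219,
    "collection_date": "2021-06-01",
    "loan_id": "A-17",
}
standard, custom = split_record(record)
print(standard)
print(custom)
```

Under this assumption, only the standard part would be passed on for open sharing, while the custom part stays with the organization that collected it.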
Throughout the process of planning, engaging with partners, and developing the tool, we captured valuable lessons that will inform later stages of this project. Key lessons include:
There are many forms of field data collection, including mobile scratch cards, phone calls, WhatsApp, surveys, and mobile data collection. Of these, physical, in-field data collection through mobile devices was less common than the other methods.
In many cases, location is captured with very little precision, making it difficult to model associated data (such as crop type and yield) from satellite information.
Even if personal information is removed from data, in many cases permission is still required from the farmer to share the data. Generally, capturing consent information is complicated given the variety of data collection techniques and the differing protocols, security needs, and relationships of individuals and organizations. The current version of the tool accounts for this complexity, but the toolkit will likely need further adjustments to meet ongoing data requirements.
Stakeholders in the field used a variety of mobile data collection tools and platforms. Interoperability is therefore necessary so that the data collection standard works across these platforms.
There is no ideal method for capturing crop yield, for many reasons, including intercropping, growth stage, and user error. These inconsistencies can directly affect the reliability of yield data for ML purposes.
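The lesson about location precision can be illustrated with some simple arithmetic. The sketch below estimates how much ground uncertainty is introduced just by rounding coordinates to a given number of decimal places. It assumes roughly 111,320 m per degree of latitude (a common spherical approximation) and ignores GPS receiver error entirely; the function name is hypothetical.

```python
import math
from typing import Tuple

# Approximate length of one degree of latitude in meters (spherical Earth).
METERS_PER_DEGREE_LAT = 111_320.0


def rounding_error_m(decimals: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Worst-case (north-south, east-west) error in meters caused by rounding
    a coordinate to `decimals` decimal places at the given latitude."""
    half_step_deg = 0.5 * 10 ** (-decimals)
    north_south = half_step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for d in range(2, 6):
    ns, ew = rounding_error_m(d, latitude_deg=0.0)
    print(f"{d} decimals: up to {ns:.1f} m north-south, {ew:.1f} m east-west")
```

Under these assumptions, a coordinate rounded to three decimal places can be off by roughly 56 m north-south, which may exceed the size of a small plot and span several pixels of 10 m satellite imagery, so labels such as crop type can end up attached to the wrong field.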
The project partners will continue to refine the standard based on further feedback and continue developing the ecosystem enabling ground truth data to be made open and accessible to users across sectors. By October 2022, the tool should be fully developed and the network and data sharing facility activated. We look forward to this progress and future collaboration.
Tetra Tech is a leading, global provider of consulting and engineering services. We are differentiated by Leading with Science® to provide innovative technical solutions to our clients. We support global commercial and government clients focused on water, environment, sustainable infrastructure, renewable energy, and international development. With 21,000 associates worldwide, Tetra Tech provides clear solutions to complex problems.
Radiant Earth Foundation is a non-profit organization actively working to develop Earth observation machine learning libraries and models through an open source hub that supports global missions like agriculture, conservation, and climate change. Radiant Earth also fosters a community of practice to develop standards around machine learning for Earth observation and provide information on the progress and innovation in the Earth observation marketplace.
ODK began with a vision to make open-source mobile data collection software for resource-limited settings. Today, ODK helps millions of people collect data quickly, accurately, offline, and at scale. It’s the standard for offline data collection and is trusted by organizations like the World Health Organization, Red Cross, Carter Center, CGIAR, Google, and many more.