AI for Agriculture and Global Food Security with Nemo Semret

EPISODE 347

About this Episode

Today we begin our annual Black in AI series joined by Nemo Semret, CTO at Gro Intelligence.

While agriculture isn't normally considered a very sexy industry, it is certainly one of the most important in the world to anyone who eats, and it is a huge employer as well, with about 2 billion people involved, from production through distribution. Because of the industry's importance, a great deal of data is available about food production, from modern satellite imagery to historical (in some cases ancient) crop yield reports. Taken together, these factors create a tremendous opportunity to apply AI and generate insights and forecasts that help those in the agricultural industry make more informed decisions.

AI in agriculture traditionally operates on one of two scales: micro and macro. The micro scale, also called precision agriculture, is concerned with applying tech to increase the productivity of individual parcels of land. Macro-scale questions, on the other hand, look at entire markets or ecosystems and the impacts of changes to individual players in the food production supply chain.

Nemo Semret is the CTO of Gro Intelligence, a company providing an agricultural data platform dedicated to improving global food security, focused on applying AI at a macro scale. Nemo was previously a tech lead at Google until the founder of Gro, Sara Menker, brought him on board in 2015.

The company is focused on helping its customers answer macro-scale questions such as: Which types of crops are most suitable for southern Brazil? Or which environmental conditions are best suited to growing coffee beans?

ML Applications & Modeling Tasks

There are four main ways that Gro applies machine learning to agriculture:

  1. Agricultural Yield Models. This class of problems attempts to predict crop yields, answering questions like "How many tons of wheat will be produced in India a year from now?" Traditionally, the reports available to farmers and decision-makers in government and industry relied heavily on subjective estimates that often proved inaccurate and were produced manually on an infrequent basis, e.g. quarterly or semi-annually. Using machine learning regression models, Gro is able to use the wide variety of data sources it has collected to update key yield predictions on a daily basis. Gro has published papers on its yield models.
  2. Crop Masking. Name that tree! This is essentially a classification task in which Gro seeks to identify what type of crop is growing in each pixel of a satellite image. The challenge is that conditions change often, and distinguishing between an orange tree and a tangerine tree might be easier said than done.
  3. Droughts. Droughts are a major threat to farming and food production. To date, there is no standard international drought index that the world can agree on, and Gro wants to change that by analyzing environmental conditions to create an objective benchmark for severe droughts.
  4. Knowledge Graph Automation. Gro ingests data from dozens of sources, and that information needs to be organized into a common, structured ontology or knowledge graph. Gro uses machine learning models to automate this task: the models help extract data and update how it flows into the Gro knowledge graph.
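
To make the first item more concrete, here is a minimal sketch of a yield regression. It is not Gro's model: it fits an ordinary least-squares line with a single predictor, and the function names (`fit_line`, `predict`), the choice of seasonal rainfall as the feature, and all of the numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new feature value."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical history: seasonal rainfall (mm) and yield (tons per hectare).
rainfall_mm = [300.0, 400.0, 500.0, 600.0]
yield_t_per_ha = [2.0, 2.5, 3.0, 3.5]

model = fit_line(rainfall_mm, yield_t_per_ha)
forecast = predict(model, 450.0)
print(f"forecast yield: {forecast:.2f} t/ha")
```

Refitting on each new day's observations is one simple way to get the kind of daily update described above; a production model would use many more features and a more careful validation scheme.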

The Data is So Good

Gro ingests "wildly different data types" to support its models and get a sense of a dynamic agriculture market. The majority, at least by volume, is satellite data spanning the entire frequency range of the electromagnetic spectrum, including visible, ultraviolet, and infrared. This helps Gro deduce a wealth of information about crop growth and growing conditions around the globe.

In addition to satellite imagery, the company also collects a huge amount of time series data, much of it originating in PDFs or, worse, scanned paper reports issued by local governments.

The company's database currently has over 55 million data series, and that number is doubling every 6-9 months. Reproducibility and attribution are extremely important to the company, which ensures that each data point can be traced back to where it came from.

Despite the overwhelming number of data sources, the available data is not always sufficient. That's where Gro's own derived data series come into play. The company applies its machine learning models to data from multiple sources to create new, insightful data series, which helps users overcome inconsistencies that might be found in any individual source.
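
As a rough illustration of the idea of a derived series, the sketch below merges several overlapping time series into one. It is an assumption-heavy stand-in: a per-date median replaces the machine learning models Gro actually uses, and the function name `derive_series`, the source names, and the values are all hypothetical. Each derived point also records which sources contributed to it, in the spirit of the attribution requirement mentioned above.

```python
from statistics import median


def derive_series(sources):
    """Merge {source_name: {date: value}} into {date: (value, contributors)}.

    The derived value is the median of the values reported for that date.
    Missing dates and None values are skipped, so a gap or an outlier in
    one source does not break the derived series.
    """
    all_dates = sorted({d for series in sources.values() for d in series})
    derived = {}
    for date in all_dates:
        reported = {
            name: series[date]
            for name, series in sources.items()
            if series.get(date) is not None
        }
        if reported:
            derived[date] = (median(reported.values()), sorted(reported))
    return derived


# Hypothetical inputs: three sources that disagree and have gaps.
sources = {
    "ministry_report": {"2020-01": 10.0, "2020-02": 12.0},
    "trade_survey": {"2020-01": 11.0, "2020-02": None},
    "scanned_bulletin": {"2020-01": 30.0},
}

for date, (value, contributors) in derive_series(sources).items():
    print(date, value, contributors)
```

The median is used here only because it tolerates a single wildly inconsistent source; any other combination rule could be swapped in without changing the provenance bookkeeping.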

For the most part, the data Gro collects is surprisingly clean. As Nemo notes, it's "hard to lie to a satellite." Try me.

Modeling Lessons Learned

To deal with their scale, Gro has had to learn many lessons about developing effective machine learning models in agriculture. The keys to their success, according to Nemo, lie in:

  1. Choosing what to model. Gro carefully defines criteria for deciding whether a problem is important and economically interesting to its user base.
  2. Don't come at a problem with a solution. This involves remaining "agnostic to technology" and being prepared to try different approaches to each issue.
  3. Build for the masses. The company actively builds general frameworks that can be applied to different situations and geographic regions.
  4. Pause, then go. Before launching a set of models, they evaluate performance in unique ways, such as looking at how the error is distributed spatially or over time. They bring in domain expertise to guide feature engineering and the tweaks needed to arrive at a good model.
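
The evaluation step in the last item can be sketched as a breakdown of prediction error by region and by year. This is only an illustration under stated assumptions: the record layout, the field names (`region`, `year`, `predicted`, `actual`), the helper `error_breakdown`, and the numbers are invented, and mean absolute error is just one reasonable choice of metric.

```python
from collections import defaultdict
from statistics import mean


def error_breakdown(records, key):
    """Return the mean absolute error of the records, grouped by `key`."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(abs(record["predicted"] - record["actual"]))
    return {group: mean(errors) for group, errors in grouped.items()}


# Hypothetical backtest results (yields in tons per hectare).
records = [
    {"region": "north", "year": 2018, "predicted": 3.0, "actual": 2.5},
    {"region": "north", "year": 2019, "predicted": 3.0, "actual": 3.5},
    {"region": "south", "year": 2018, "predicted": 2.0, "actual": 4.0},
    {"region": "south", "year": 2019, "predicted": 2.0, "actual": 3.0},
]

by_region = error_breakdown(records, "region")  # spatial view of the error
by_year = error_breakdown(records, "year")      # temporal view of the error
print(by_region)
print(by_year)
```

A single overall error figure would hide the fact that, in this made-up data, the model is much worse in one region than the other; splitting by space and time is what surfaces that kind of pattern for a domain expert to investigate.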

Nemo points out that while they are still developing these methods, the past few years have already shown improvements in accuracy with more rigorous data acquisition.
