Today we begin our annual Black in AI series, joined by Nemo Semret, CTO at Gro Intelligence.
While agriculture isn't normally considered a very sexy industry, it is certainly one of the most important in the world to anyone who eats, and it is a huge employer as well, with about 2 billion people involved, from production through distribution. Because of the industry's importance, a great deal of data is available about food production, from modern satellite imagery to historical (in some cases ancient) crop yield reports. Taken together, these factors create a tremendous opportunity to apply AI and generate insights and forecasts that help those in the agricultural industry make more informed decisions.
AI in agriculture traditionally operates on one of two scales: micro and macro. The micro scale, also called precision agriculture, is concerned with applying tech to increase the productivity of individual parcels of land. Macro-scale questions, on the other hand, look at entire markets or ecosystems and at how changes to individual players ripple through the food production supply chain.
Nemo Semret is the CTO of Gro Intelligence, a company providing an agricultural data platform dedicated to improving global food security, focused on applying AI at a macro scale. Nemo was a tech lead at Google until the founder of Gro, Sara Menker, brought him on board in 2015.
The company is focused on helping its customers answer macro-scale questions such as: What types of crops are best suited to southern Brazil? Or which environmental conditions make the most sense for growing coffee beans?
ML Applications & Modeling Tasks
There are four main ways that Gro applies machine learning to agriculture:
The Data is So Good
Gro ingests "wildly different data types" to support its models and to get a sense of a dynamic agriculture market. The majority, at least in volume, comes from satellite data spanning the electromagnetic spectrum, including visible, ultraviolet, and infrared. This helps Gro deduce a wealth of information about crop growth and growing conditions around the globe.
In addition to satellite imagery, the company also collects a huge amount of time series data, much of it originating in PDFs or, worse, scanned paper reports issued by local governments.
The company's database currently has over 55 million data series, and that number is doubling every 6-9 months. Reproducibility and attribution are extremely important to the company, which ensures that each data point can be traced back to where it came from.
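To put the quoted doubling rate in perspective, here is a minimal arithmetic sketch. The function names are hypothetical and the only inputs are the two figures mentioned above (55 million series, doubling every 6 to 9 months); it simply assumes the doubling rate stays constant, which may not hold in practice.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


def projected_count(current: float, months_ahead: float, doubling_months: float) -> float:
    """Project a count forward, assuming a constant doubling time (an assumption)."""
    return current * 2 ** (months_ahead / doubling_months)


CURRENT_SERIES = 55_000_000  # figure quoted in the text

# Doubling every 6 months means 4x per year; every 9 months means roughly 2.5x.
fast_factor = annual_growth_factor(6)
slow_factor = annual_growth_factor(9)

# Range of series counts one year out under the two doubling times.
low_estimate = projected_count(CURRENT_SERIES, 12, 9)
high_estimate = projected_count(CURRENT_SERIES, 12, 6)
print(f"{low_estimate:,.0f} to {high_estimate:,.0f} series after one year")
```

Under these assumptions, the database would hold somewhere between roughly 2.5 and 4 times as many series after a year.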
Despite the overwhelming number of data sources, the data available is not always sufficient. That's where Gro's own derived data series come into play. This method applies the company's machine learning models to data from multiple sources to create new, insightful data series. This helps users overcome data inconsistencies that might be found in any individual source.
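The idea of a derived series can be illustrated with a deliberately simple stand-in. The sketch below is not Gro's method (the text says they use machine learning models); it just takes the per-period median across whichever sources report a value, and records which sources contributed so the result stays traceable. The source names, values, and the `derive_series` function are all made up for illustration.

```python
from statistics import median
from typing import Dict, Optional

# One source's series: period -> value, with None marking a missing report.
Series = Dict[str, Optional[float]]


def derive_series(sources: Dict[str, Series]) -> Dict[str, dict]:
    """Combine several sources into one series using the per-period median.

    Each derived point keeps the names of the sources it was built from,
    so it can be traced back to where it came from.
    """
    periods = sorted({period for series in sources.values() for period in series})
    derived = {}
    for period in periods:
        # Keep only the sources that actually report a value for this period.
        contributing = {
            name: series[period]
            for name, series in sources.items()
            if series.get(period) is not None
        }
        if contributing:
            derived[period] = {
                "value": median(contributing.values()),
                "sources": sorted(contributing),
            }
    return derived


# Hypothetical yield figures (tonnes per hectare) from three made-up sources.
example_sources = {
    "ministry_report": {"2019": 3.1, "2020": None, "2021": 3.4},
    "satellite_estimate": {"2019": 3.3, "2020": 3.0, "2021": 3.6},
    "trade_survey": {"2019": 3.2, "2020": 2.8},
}

derived = derive_series(example_sources)
for period, point in derived.items():
    print(period, round(point["value"], 2), point["sources"])
```

The median is used here only because it tolerates a single outlying source; a real system would likely weight sources by reliability or model them jointly.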
For the most part, the data Gro collects is surprisingly clean. As Nemo notes, it's "hard to lie to a satellite." Try me.
Modeling Lessons Learned
To deal with its scale, Gro has had to learn many lessons about developing effective machine learning models in agriculture. The keys to their success, according to Nemo, lie in:
Nemo points out that while they are still developing these methods, the past few years have already shown improvements in accuracy with more rigorous data acquisition.