The accuracy of an ML model depends on identifying the right set of features, and that process can be very time-consuming. Once those features have been engineered and validated as the best for a given model, they should ideally be made available to other team members for training other models. In a perfect world, whatever features were used for training would also be available for production inference. In many organizations, however, features are not stored, named, shared, or accessible across team members. Even when they are stored, they often live in two different places, one for training and one for production. In many cases, they have even been written in different languages!
Because training is often done in batch and inference is often done in real time, the two paths have very different latency requirements for storage, which leads many organizations to build two separate feature stores – one for training and one for production. At that point, it’s not hard to see how those systems can drift out of sync over time.
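The drift problem above comes from having two write paths. A minimal sketch of the alternative, a single write path that feeds both an offline (historical, batch) store and an online (latest-value, low-latency) store, might look like the following. The class and method names here are hypothetical, not from any particular feature store product:

```python
from dataclasses import dataclass, field

# Hypothetical minimal feature store: one write path shared by training
# (batch/offline) and inference (online), so the two views cannot drift.

@dataclass
class FeatureStore:
    # Online store: latest value per entity, optimized for point lookups.
    online: dict = field(default_factory=dict)
    # Offline store: full history of feature rows, used to build training sets.
    offline: list = field(default_factory=list)

    def write(self, entity_id: str, features: dict, ts: int) -> None:
        """Single write path updates both stores from the same record."""
        self.offline.append({"entity_id": entity_id, "ts": ts, **features})
        current = self.online.get(entity_id)
        if current is None or current["ts"] <= ts:
            self.online[entity_id] = {"ts": ts, **features}

    def get_online(self, entity_id: str) -> dict:
        """Low-latency lookup for real-time inference."""
        return self.online[entity_id]

    def get_training_rows(self) -> list:
        """Batch export of historical rows for model training."""
        return list(self.offline)


store = FeatureStore()
store.write("user_42", {"avg_order_value": 31.5}, ts=1)
store.write("user_42", {"avg_order_value": 33.0}, ts=2)

print(store.get_online("user_42")["avg_order_value"])  # 33.0 (latest value)
print(len(store.get_training_rows()))                  # 2 (full history)
```

Real systems back the online store with a key-value database and the offline store with a data warehouse, but the design point is the same: because both views are derived from one feature definition and one write path, training and serving see consistent features.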
Building and managing a feature store from scratch involves significant engineering, infrastructure, and operational effort that takes valuable time away from the machine learning work that actually delivers value to the organization. For that reason, many new startups in this space now deliver feature store functionality to customers, either stand-alone or as part of a larger end-to-end ML platform.