Model Portability and Compression

Organizations sometimes find that a model must be made smaller before it can run successfully in the target production environment. Approaches to compressing a model range from removing data from the training set, to removing layers of the neural network, to quantization, which lowers the floating-point precision used in calculations, for example from roughly 15 significant decimal digits (64-bit) to roughly 3 (16-bit). Regardless of the technique, the goal is to reduce the model's size, and therefore the storage, compute, and power costs of running it, while minimizing the loss of accuracy.
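
As a rough illustration of the precision trade-off described above, the following sketch casts a matrix of stand-in weights from 64-bit to 16-bit floats with NumPy and reports the storage saved and the rounding error introduced. The random matrix and the helper name `cast_weights` are assumptions made for the example, not part of any particular model or toolkit, and real quantization pipelines typically do more than a plain cast.

```python
import numpy as np


def cast_weights(weights: np.ndarray, dtype=np.float16) -> np.ndarray:
    """Return a copy of the weights stored at a lower precision (hypothetical helper)."""
    return weights.astype(dtype)


# Stand-in for a layer's weights; a real model would load these from a checkpoint.
rng = np.random.default_rng(0)
weights64 = rng.standard_normal((256, 256))  # float64 by default
weights16 = cast_weights(weights64)

# Approximate number of reliable decimal digits for each format.
digits64 = np.finfo(np.float64).precision
digits16 = np.finfo(np.float16).precision

# Storage saved, and the worst-case rounding error introduced by the cast.
size_ratio = weights64.nbytes / weights16.nbytes
max_abs_error = float(np.max(np.abs(weights64 - weights16.astype(np.float64))))

print(f"float64: {weights64.nbytes} bytes, ~{digits64} decimal digits")
print(f"float16: {weights16.nbytes} bytes, ~{digits16} decimal digits")
print(f"size ratio: {size_ratio:.0f}x, max abs error: {max_abs_error:.2e}")
```

The cast shrinks storage in proportion to the bit width, but each weight is rounded to the nearest representable 16-bit value, so whether the resulting accuracy loss is acceptable has to be checked against the model's own evaluation data.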

