The training process is central to machine learning. To produce a model, we apply a learning algorithm to a training dataset. If this process is successful, the model captures the relationship between features and labels in the training data. Features are the attributes or characteristics of the training data that are likely to help the model make accurate predictions. Labels are the answers we want the model to predict from a given set of input features. Once a model is trained, it can be used for inference, also called scoring: querying the model with new, unlabeled data to predict how that data should be labeled.
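To make the split between training and inference concrete, here is a minimal sketch. It assumes a nearest-centroid classifier, chosen only because it is short, and the data and the function names train and infer are invented for illustration. They do not come from any particular library.

```python
import math
from collections import defaultdict


def train(features, labels):
    """Training: learn one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)
    # The "model" is just a mapping from each label to its centroid.
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def infer(model, row):
    """Inference: return the label whose centroid is closest to the row."""
    return min(model, key=lambda label: math.dist(model[label], row))


# Hypothetical labeled training data: two numeric features per example.
train_features = [
    (1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
    (4.0, 4.2), (4.3, 3.9), (3.8, 4.1),
]
train_labels = ["small", "small", "small", "large", "large", "large"]

model = train(train_features, train_labels)

# New, unlabeled data: the model predicts how it should be labeled.
prediction = infer(model, (3.9, 4.0))
print(prediction)
```

Only train sees the labels. After that, infer needs nothing but the learned model and the features of the new example.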
For a simple example, consider the problem of predicting home values from available real estate data. We know intuitively that a relationship exists between the characteristics of a home, such as its size, number of bedrooms and baths, and location (i.e., our features), and the price the property ultimately sells for (i.e., our label). Without machine learning, it would be very difficult to capture this relationship accurately as a rule or set of rules, especially as the number of features (independent variables) grows.
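The home-value example can be sketched in a few lines. The sketch assumes ordinary least-squares linear regression as the learning algorithm, which is only one of many possible choices. Every number is invented, and the prices were constructed to follow an exact linear rule so the result is easy to check. Real sales data would be noisy. Location is left out to keep the sketch short, and the names HOMES, PRICES, fit_linear, and predict are hypothetical.

```python
# Hypothetical training data. Features: (size_sqft, bedrooms, baths).
HOMES = [
    (1400, 3, 2), (1600, 3, 2), (1700, 4, 2), (1875, 4, 3),
    (1100, 2, 1), (2350, 5, 3), (2450, 4, 3),
]
# Labels: invented sale prices in dollars, one per home above.
PRICES = [230000, 250000, 270000, 292500, 185000, 350000, 350000]


def fit_linear(features, labels):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 is the intercept term
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, labels))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution to recover the weights.
    w = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (a[i][n] - known) / a[i][i]
    return w


def predict(weights, feature_row):
    """Inference: intercept plus the weighted sum of the features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


weights = fit_linear(HOMES, PRICES)
estimate = predict(weights, (2000, 3, 2))  # a new, unlabeled home
print(round(estimate))
```

Nobody writes a pricing rule by hand here. The algorithm derives the weights from the examples, and adding a feature means adding a column to the data, not rewriting the rule.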