Data Preparation and Transformation

Once data has been acquired and initial visualization and exploration are complete, the training data must be prepared for model development. This step is known as Data Preparation and Transformation, or sometimes just “Data Wrangling.” It covers restructuring, cleaning, enriching, validating, and potentially publishing the cleaned-up data. Transformations may also be required to extract labels. For example, a model that predicts the likelihood of customer churn needs a label indicating which customers in the transactional database are examples of churn. Producing that label can, in turn, require a complex query against the data warehouse that considers factors such as the products or services the prediction is based on, the number of days without a transaction, the window in which predictions are to be made, and more.
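
As a rough illustration of label extraction, the sketch below derives a churn label from a list of transactions. It is only one possible definition, under stated assumptions: the function name `label_churn`, the 90-day inactivity threshold, the cutoff date, and the sample records are all hypothetical, and a real pipeline would more likely express this as a query against the data warehouse.

```python
from datetime import date, timedelta


def label_churn(transactions, cutoff, inactivity_days=90):
    """Label each customer as churned (True) or active (False) as of `cutoff`.

    `transactions` is an iterable of (customer_id, transaction_date) pairs.
    A customer counts as churned when their most recent transaction on or
    before the cutoff is more than `inactivity_days` old (an assumed rule).
    """
    last_seen = {}
    for customer_id, tx_date in transactions:
        if tx_date > cutoff:
            continue  # only consider activity up to the cutoff date
        if customer_id not in last_seen or tx_date > last_seen[customer_id]:
            last_seen[customer_id] = tx_date

    threshold = timedelta(days=inactivity_days)
    return {cid: (cutoff - last) > threshold for cid, last in last_seen.items()}


# Hypothetical sample data for illustration only.
transactions = [
    ("c1", date(2024, 1, 5)),
    ("c1", date(2024, 5, 20)),
    ("c2", date(2024, 1, 10)),
    ("c3", date(2024, 7, 1)),  # after the cutoff, so not counted
]
labels = label_churn(transactions, cutoff=date(2024, 6, 30), inactivity_days=90)
```

Here `c1` is labeled active and `c2` churned, while `c3` receives no label because it has no activity before the cutoff. Changing the threshold or the cutoff changes the labels, which is why these choices need to be agreed on before training.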

Data preparation can also include standardizing formats (e.g. mapping “California” and “CA” to a single value); deduplication; conversions (e.g. metric to imperial or currency to currency); breaking data up into bins or buckets (e.g. ages under 20, 20-29, 30-39, 40 and over); validating data (e.g. finding outliers or incorrect data such as birthdates in the future or too far in the past); and backfilling (imputing missing data).
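
Several of these steps can be sketched together in a few lines. The example below is a minimal illustration, not a recommended pipeline: the state mapping, the record fields (`customer_id`, `state`, `birthdate`, `income`), the 120-year age limit, and median imputation are all assumptions chosen for the example. Unit and currency conversions are left out.

```python
from datetime import date
from statistics import median

# Hypothetical lookup table; a real one would cover every state and spelling.
STATE_CODES = {"california": "CA", "ca": "CA", "new york": "NY", "ny": "NY"}


def standardize_state(value):
    """Map known spellings to a two-letter code; leave unknown values as-is."""
    return STATE_CODES.get(value.strip().lower(), value.strip())


def age_on(birthdate, today):
    """Age in whole years on the date `today`."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)


def age_bucket(age):
    """Assign an age to one of four non-overlapping bins."""
    if age < 20:
        return "under 20"
    if age < 30:
        return "20-29"
    if age < 40:
        return "30-39"
    return "40 and over"


def prepare(records, today, max_age=120):
    seen, cleaned = set(), []
    for rec in records:
        row = dict(rec, state=standardize_state(rec["state"]))  # standardize
        key = (row["customer_id"], row["state"], row["birthdate"])
        if key in seen:
            continue  # deduplicate after standardization
        seen.add(key)
        # Validate: drop birthdates in the future or implausibly far back.
        if row["birthdate"] > today or age_on(row["birthdate"], today) > max_age:
            continue
        row["age_bucket"] = age_bucket(age_on(row["birthdate"], today))  # bin
        cleaned.append(row)

    # Impute missing income with the median of the known values.
    known = [r["income"] for r in cleaned if r["income"] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r["income"] is None:
            r["income"] = fill
    return cleaned


# Hypothetical sample records for illustration only.
records = [
    {"customer_id": 1, "state": "California", "birthdate": date(1990, 3, 1), "income": 50000},
    {"customer_id": 1, "state": "CA", "birthdate": date(1990, 3, 1), "income": 50000},
    {"customer_id": 2, "state": "ny", "birthdate": date(2010, 6, 15), "income": None},
    {"customer_id": 3, "state": "NY", "birthdate": date(2090, 1, 1), "income": 70000},
    {"customer_id": 4, "state": "CA", "birthdate": date(1975, 1, 1), "income": 70000},
]
cleaned = prepare(records, today=date(2024, 6, 30))
```

Note that the second record is only detected as a duplicate because standardization runs first, so the order of the steps matters. Whether invalid rows should be dropped, flagged, or corrected is a design choice that depends on the downstream model.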

SAS Visual Data Mining and Machine Learning
Solve the most complex analytical problems with a single, integrated, collaborative solution
RapidMiner Studio
One platform, does everything
IBM Cloud Pak for Data
Reinvent how you work
Modern MLOps focused on speed and simplicity
Machine learning made beautifully simple for everyone
AI and machine learning model management and operations for enterprise data science teams
Oracle – Data Science Platform
Data science platform
Creating data science
Open-source version control system for machine learning projects
Find the smart data inside your big data