Introducing the first enterprise-ready feature store for machine learning. Built by the creators of Uber's Michelangelo, Tecton manages the complete lifecycle of features, from engineering new features to serving them online for real-time predictions.
• Combine batch, streaming, and real-time data. Use all your enterprise data to build high-quality features. Combine batch data (e.g. Amazon S3, Amazon Redshift, Snowflake), streaming data (e.g. Apache Kafka, Amazon Kinesis), and real-time data. Real-time data is passed to Tecton at the time of the feature request, and Tecton executes on-demand transformations to generate real-time values.
• Manage features as code. Define features in Python files that contain the transformation logic and metadata needed to compute features on an ongoing basis. Version-control your features (e.g. in Git) and integrate them with your existing code review and CI/CD processes.
• Develop with familiar data science tools. Build features using familiar programming languages and libraries including Python, SQL, and PySpark. Use Tecton’s Python SDK in your preferred notebook environment to create training datasets.
• Generate accurate training data. Create accurate training datasets with just a few lines of code. Use row-level time travel to deliver the right values at the right time, for each individual row.
• Plan feature changes with confidence. “Am I about to modify a feature used in production? How much will this feature cost to process? Is this new feature a duplicate?” Before applying your changes, Tecton generates a plan that allows you to answer these questions. Test changes in private workspaces before deploying them to your production environment, and integrate seamlessly with your existing CI/CD pipeline.
• Automate feature transformations. Tecton orchestrates data pipelines to generate backfills and continuously compute fresh feature values. Alternatively, ingest feature data from pipelines managed outside of Tecton.
• Ensure data consistency. Provide a single source of truth for feature data across your organization. Tecton stores historical data in offline storage and fresh data in online storage for low-latency retrieval. Ensure data consistency over time, eliminate training/serving skew, and reproduce historical datasets with row-level time travel.
• Serve features online. Retrieve feature values from Tecton’s REST/gRPC API to power real-time predictions in production.
• Monitor data quality and operational metrics. Monitor features at every step of their lifecycle. Validate ingested data and detect data drift. Monitor operational metrics including serving latencies, serving volumes, and storage consumption.
• Search and discover features. Build a unified catalog of features across your organization. Enable teams to search and discover existing features to enable re-use across teams and models. Track detailed information on each feature including data lineage, feature health, and feature value distribution to increase confidence in existing features.
• Enterprise-grade security. Tecton is deployed in your VPC, so your data never leaves your cloud account. Control access to individual features and data with Role-Based Access Control and ACLs. Integrate with AWS IAM, Okta, or any solution that supports SAML or OAuth.
• Built for scale. Based on Uber Michelangelo’s battle-tested architecture blueprint, Tecton is built for scale and can support thousands of models, tens of thousands of features, and millions of predictions per second. You can start small, and Tecton will scale with you.
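As a toy illustration of the on-demand transformations described above: a real-time feature combines a precomputed (batch or streaming) value with data that only arrives with the request. The function and field names below are invented for the sketch; this is not Tecton's API.

```python
# Sketch of an on-demand transformation: combine a precomputed
# feature with request-time data. Names are invented for
# illustration; this is not Tecton's API.
def amount_vs_average(request_data, precomputed_features):
    """Ratio of the current transaction amount (request-time data)
    to the user's historical average (precomputed feature)."""
    avg = precomputed_features["avg_transaction_amount_30d"]
    if avg == 0:
        return 0.0
    return request_data["transaction_amount"] / avg
```

At request time, only the division runs; the expensive aggregation behind the precomputed feature was already done by a batch or streaming pipeline.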
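The features-as-code idea above can be sketched in plain Python: a definition file pairs transformation logic with metadata, so it can be version-controlled and reviewed like any other code. The decorator and registry here are invented for the sketch and are not Tecton's SDK.

```python
# Toy feature registry: NOT Tecton's SDK. The decorator and
# registry names are invented to illustrate managing features
# as code (logic + metadata in one version-controlled file).
FEATURE_REGISTRY = {}

def feature_view(name, owner, description):
    """Register a transformation function along with its metadata."""
    def decorator(fn):
        FEATURE_REGISTRY[name] = {
            "owner": owner,
            "description": description,
            "transform": fn,
        }
        return fn
    return decorator

@feature_view(
    name="user_transaction_count",
    owner="fraud-team",
    description="Number of transactions per user",
)
def user_transaction_count(transactions):
    """Count rows per user_id from a list of transaction dicts."""
    counts = {}
    for tx in transactions:
        counts[tx["user_id"]] = counts.get(tx["user_id"], 0) + 1
    return counts
```

Because definitions are plain Python files, they slot naturally into Git, code review, and CI/CD.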
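Row-level time travel, as described above, amounts to a point-in-time join: for every training row, look up the feature value that was current at that row's timestamp, never a later one. A minimal sketch of the idea in plain Python (not Tecton's implementation):

```python
# Minimal point-in-time join: for each label row, pick the latest
# feature value whose timestamp is <= the label timestamp. This
# sketches the idea behind row-level time travel; it is not
# Tecton's implementation.
def point_in_time_join(label_rows, feature_history):
    """label_rows: list of (entity_id, label_ts, label).
    feature_history: list of (entity_id, feature_ts, value),
    assumed sorted by feature_ts ascending."""
    joined = []
    for entity_id, label_ts, label in label_rows:
        value = None
        for fid, feature_ts, v in feature_history:
            if fid == entity_id and feature_ts <= label_ts:
                value = v  # keep overwriting: history is time-sorted
        joined.append((entity_id, label_ts, value, label))
    return joined
```

Filtering on `feature_ts <= label_ts` is what prevents future feature values from leaking into training rows, which is the root cause of training/serving skew.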
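Online serving, as described above, is at its core a low-latency key-value lookup from an entity key to that entity's freshest feature values. Tecton's real interface is a REST/gRPC API; the in-memory store and keys below are invented to show the lookup pattern only.

```python
# Toy online store: a key-value lookup from entity keys to fresh
# feature values. Tecton's real interface is a REST/gRPC API; the
# store contents here are invented for illustration.
ONLINE_STORE = {
    ("user", "u1"): {"avg_transaction_amount_30d": 42.0,
                     "transaction_count_7d": 7},
}

def get_online_features(entity_type, entity_id):
    """Return the current feature vector for one entity, or {}."""
    return ONLINE_STORE.get((entity_type, entity_id), {})
```

A model service calls this lookup at prediction time and feeds the returned vector straight into the model, which is why online retrieval must stay in the low-millisecond range.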
• Develop high-quality features using real-time and batch data
• Deploy and serve features in production instantly
• Build accurate training data sets using time travel
• Monitor production features to detect breakages and drift
• Share, discover, and re-use features across your organization
• Built-in enterprise-grade service levels, governance, and security
• Integrates with common data infrastructure and ML platforms including Amazon SageMaker, Databricks, and Kubeflow