Pachyderm

Pachyderm is cost-effective at scale and enables data engineering teams to automate complex pipelines with sophisticated data transformations.

Pachyderm Overview

Pachyderm is a data-centric pipelining and versioning tool that lets ML and data science teams automate and collaborate on their organization’s data while maintaining full reproducibility.

System-wide Features:
Teamwork and Collaboration
Enterprise Security
Governance
Enterprise Support

MLOps Features:
Data Acquisition
Data Versioning
Data Visualization
Data Preparation
Data Pipelines
Data Labeling
AutoML
Featurization
Feature Store
ML Pipelines or Workflows
Model Registry
Model Marketplace
Model Training
Distributed Model Training
Model Debugging
Experiment Management
Deep Learning Support
Reinforcement Learning Support
Bias Detection and Mitigation
Model Explainability
Hyperparameter Optimization
Model Packaging
Model Deployment and Serving
Edge ML Support
Model Monitoring
Cost Management
ML Infrastructure Orchestration
Accelerator Support
Kubernetes Support

Additional Product Information

Deploys On

Amazon Web Services
Google Cloud Platform
Microsoft Azure
Other Public Cloud
Kubernetes
NVIDIA
Private Cloud or Datacenter
SaaS

Marketplace Links

Pachyderm on Microsoft Azure

Pachyderm Features and Benefits

Benefits

Data Lineage
Think "Git for data" but better. Pachyderm version-controls all data types and also delivers true data lineage: knowing, with certainty, the complete journey of your data, code, and models, and the relationships between them.
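
To make the idea of lineage concrete, here is a minimal Python sketch of a toy model in which every snapshot records the snapshots it was derived from. It is an illustration only, not Pachyderm's implementation or API: the `Commit` class, the `lineage` function, and the repo names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical immutable snapshot of data or code in a named repo."""
    repo: str
    content: bytes
    parents: tuple = ()  # commits this snapshot was derived from

    @property
    def id(self) -> str:
        # The ID covers the content and the parent IDs, so changing any
        # upstream input produces a different downstream ID.
        h = hashlib.sha256()
        h.update(self.repo.encode())
        h.update(self.content)
        for parent in self.parents:
            h.update(parent.id.encode())
        return h.hexdigest()[:12]


def lineage(commit):
    """Return every upstream commit, nearest first (breadth-first walk)."""
    seen, order, queue = set(), [], list(commit.parents)
    while queue:
        current = queue.pop(0)
        if current.id in seen:
            continue
        seen.add(current.id)
        order.append(current)
        queue.extend(current.parents)
    return order


# Hypothetical example: a model derived from features and training code,
# where the features were themselves derived from raw images.
raw = Commit("images", b"raw pixels")
code = Commit("train-code", b"def train(): ...")
features = Commit("features", b"feature table", parents=(raw,))
model = Commit("model", b"weights", parents=(features, code))

upstream_repos = [c.repo for c in lineage(model)]
print(upstream_repos)
```

Walking the parent links from `model` recovers the features, the training code, and the raw images behind it, which is the kind of question lineage is meant to answer.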

End-To-End Pipelines
Pachyderm makes it simple to build end-to-end data science workflows using any language or framework you want. Transform existing manual processes into fully automated event-driven workflows.
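
As a rough sketch of what defining such a pipeline can look like, the Python below builds a small JSON pipeline specification. The field names (`pipeline.name`, `transform.image`, `transform.cmd`, `input.pfs.repo`, `input.pfs.glob`) follow Pachyderm's published pipeline spec as best understood here and should be checked against the documentation for your version. The helper `make_pipeline_spec`, the image name, and the repo name are hypothetical.

```python
import json


def make_pipeline_spec(name, input_repo, image, cmd, glob="/*"):
    """Build a minimal pipeline spec as a plain dict (hypothetical helper)."""
    return {
        "pipeline": {"name": name},
        # The transform runs user code, in any language, inside a container.
        "transform": {"image": image, "cmd": list(cmd)},
        # The input names the versioned repo to read from; the glob pattern
        # controls how its files are split into units of work.
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
    }


# Hypothetical values: substitute your own repo, image, and command.
spec = make_pipeline_spec(
    name="edges",
    input_repo="images",
    image="example/edges:latest",
    cmd=["python3", "/edges.py"],
)

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

A spec like this is typically saved to a file and submitted with `pachctl create pipeline -f <file>`. The pipeline then runs when new data is committed to the input repo, which is what makes the workflow event-driven.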

Enterprise Scale
Kubernetes makes software scalable. We built Pachyderm on top of Kubernetes to provide you with a direct path to production, using your choice of infrastructure. Whether you're still in the proof-of-concept (POC) phase or processing petabytes of data, Pachyderm makes scaling simple.

Features

• Data versioning
• Containerized pipelines
• Data lineage
• Distributed workloads
• GPU support
• Pachyderm dashboard
• Advanced statistics
• User access controls
• S3 gateway
• Enterprise-grade support
• Custom deployments
• Hosted service

Pachyderm Vendor Information

Vendor Overview

Pachyderm is an enterprise-grade, open-source data science platform that makes explainable, repeatable, and scalable ML/AI a reality. The platform combines version control for data with the tools to build scalable end-to-end ML/AI pipelines, while letting users work in any language, framework, or tool they want.
Pachyderm is “Git for Data Science.” It offers complete version control for data and gives your data science team the same first-class development tools that software developers have. Pachyderm is well suited to building machine learning pipelines and ETL workflows because it traces every model and output directly back to the raw input datasets that produced it (also known as provenance).