Pachyderm is cost-effective at scale and enables data engineering teams to automate complex pipelines with sophisticated data transformations.
Pachyderm Overview
Pachyderm is a data-centric pipelining and versioning tool that lets ML and data science teams automate the processing of, and collaborate on, their organization’s data while maintaining full reproducibility.
System-wide Features:
  • Teamwork and Collaboration
  • Enterprise Security
  • Governance
  • Enterprise Support
MLOps Features:
  • Data Acquisition
  • Data Versioning
  • Data Visualization
  • Data Preparation
  • Data Pipelines
  • Data Labeling
  • AutoML
  • Featurization
  • Feature Store
  • ML Pipelines or Workflows
  • Model Registry
  • Model Marketplace
  • Model Training
  • Distributed Model Training
  • Model Debugging
  • Experiment Management
  • Deep Learning Support
  • Reinforcement Learning Support
  • Bias Detection and Mitigation
  • Model Explainability
  • Hyperparameter Optimization
  • Model Packaging
  • Model Deployment and Serving
  • Edge ML Support
  • Model Monitoring
  • Cost Management
  • ML Infrastructure Orchestration
  • Accelerator Support
  • Kubernetes Support

Additional Product Information
Deploys On
  • Amazon Web Services
  • Google Cloud Platform
  • Microsoft Azure
  • Other Public Cloud
  • Kubernetes
  • Private Cloud or Datacenter
  • SaaS
Pachyderm Features and Benefits
Data Lineage
Think "git for data" but better. Pachyderm version-controls all data types and also delivers true data lineage. Data lineage means knowing, with certainty, the complete journey of your data, code, and models, and the relationships between them.
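
As a rough illustration of what tracking lineage involves, the sketch below records each artifact under a content hash together with the artifacts it was derived from, then walks those links to recover everything upstream of a result. This is not Pachyderm's API or implementation: `LineageGraph`, `record`, `provenance`, and `content_id` are hypothetical names invented for this example.

```python
import hashlib
from dataclasses import dataclass, field


def content_id(payload: bytes) -> str:
    """Return a short content hash used as a version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageGraph:
    """Hypothetical lineage store; not part of any Pachyderm library."""

    # Maps each artifact ID to the IDs of the artifacts it was derived from.
    parents: dict = field(default_factory=dict)

    def record(self, payload: bytes, derived_from=()) -> str:
        """Register an artifact version and remember its direct inputs."""
        artifact_id = content_id(payload)
        self.parents[artifact_id] = list(derived_from)
        return artifact_id

    def provenance(self, artifact_id: str) -> set:
        """Return every upstream artifact that contributed to artifact_id."""
        seen = set()
        stack = list(self.parents.get(artifact_id, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, []))
        return seen


graph = LineageGraph()
raw = graph.record(b"raw,csv,rows")
code = graph.record(b"def train(): ...")
model = graph.record(b"model-weights", derived_from=[raw, code])
report = graph.record(b"evaluation report", derived_from=[model])
```

Here `graph.provenance(report)` returns the IDs of the model, the raw data, and the code, which is the "complete journey" idea in miniature: any output can be traced back to the exact inputs that produced it.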

End-To-End Pipelines
Pachyderm makes it simple to build end-to-end data science workflows using any language or framework you want. Transform existing manual processes into fully automated event-driven workflows.
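
To make "event-driven" concrete, here is a minimal sketch in which each new commit to an input repository automatically triggers a transformation whose result is committed to an output repository, so chained stages run without manual steps. It is a toy in-memory model, not Pachyderm's API: `Repo`, `subscribe`, `commit`, and `make_pipeline` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


class Repo:
    """Hypothetical in-memory repo that notifies subscribers on each commit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.commits: List[Dict[str, str]] = []
        self._subscribers: List[Callable[[Dict[str, str]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, str]], None]) -> None:
        """Register a callback to run whenever new data is committed."""
        self._subscribers.append(callback)

    def commit(self, files: Dict[str, str]) -> None:
        """Store a new commit, then notify every downstream subscriber."""
        self.commits.append(files)
        for callback in self._subscribers:
            callback(files)


def make_pipeline(source: Repo, output: Repo,
                  transform: Callable[[str], str]) -> None:
    """Run `transform` on every file of each new commit to `source`."""

    def on_commit(files: Dict[str, str]) -> None:
        output.commit({path: transform(data) for path, data in files.items()})

    source.subscribe(on_commit)


raw = Repo("raw")
cleaned = Repo("cleaned")
counts = Repo("counts")

# Two chained stages: normalize the text, then count its words.
make_pipeline(raw, cleaned, lambda text: text.strip().lower())
make_pipeline(cleaned, counts, lambda text: str(len(text.split())))

# A single commit to the first repo drives both stages.
raw.commit({"/notes.txt": "  Hello Pachyderm World  "})
```

The transform is an ordinary callable here; in a containerized setup the same role would be played by a command run inside an image, which is what allows any language or framework to be used.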

Enterprise Scale
Kubernetes makes software scalable. We built Pachyderm on top of Kubernetes to provide you with a direct path to production, using your choice of infrastructure. Whether you're still in the proof-of-concept (POC) phase or processing petabytes of data, Pachyderm makes scaling simple.
  • Data versioning
  • Containerized pipelines
  • Data lineage
  • Distributed workloads
  • GPU support
  • Pachyderm dashboard
  • Advanced statistics
  • User access controls
  • S3 gateway
  • Enterprise-grade support
  • Custom deployments
  • Hosted service
Pachyderm Vendor Information
Vendor Overview
Pachyderm is an enterprise-grade, open-source data science platform that makes explainable, repeatable, and scalable ML/AI a reality. Its platform brings together version control for data with the tools to build scalable end-to-end ML/AI pipelines while empowering users to use any language, framework, or tool they want.
Pachyderm is “Git for Data Science.” It offers complete version control for data and gives your data science team the same first-class development tools that software developers have. Pachyderm is ideal for building machine learning pipelines and ETL workflows because it traces every model and output directly back to the raw input datasets that created it (also known as provenance).
Vendor Details
Year Founded
HQ Location
San Francisco, California, United States