GitHub - datahub-project/datahub: The Context Platform for your Data and AI Stack
Open-source metadata platform for data discovery, lineage, and governance in your AI stack.
Overview
DataHub is an open-source metadata platform that helps organizations discover, manage, and govern their data. It serves as a central metadata hub and supports data discovery, lineage, and governance across the whole data stack. Originally built at LinkedIn and now a Linux Foundation project, DataHub connects to a wide range of data sources and has a scalable, extensible architecture. It helps data teams find trusted data, understand where it comes from, and stay compliant.
Key capabilities include:
- Automated metadata ingestion from databases, data lakes, warehouses, and more.
- Powerful search and discovery with faceted filters.
- Column-level lineage and impact analysis.
- Data governance with tags, terms, and ownership.
- Rich UI and API for metadata management.
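Every entity DataHub ingests (datasets, dashboards, users, and so on) is addressed in its metadata graph by a URN. The sketch below uses only the standard library to build and parse dataset URNs in the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` form that DataHub uses. The helpers `dataset_urn` and `parse_dataset_urn` are hypothetical illustrations, not part of DataHub's SDK, which provides its own `make_dataset_urn` in `datahub.emitter.mce_builder`.

```python
import re

# Hypothetical helpers illustrating DataHub's dataset URN layout; not the SDK API.
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN from a platform id, a qualified name, and an environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def parse_dataset_urn(urn: str) -> tuple[str, str, str]:
    """Split a dataset URN back into (platform, name, env); raise on malformed input."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN: {urn!r}")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    urn = dataset_urn("snowflake", "analytics.public.orders")
    print(urn)
    # urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)
    print(parse_dataset_urn(urn))
    # ('snowflake', 'analytics.public.orders', 'PROD')
```

Because the platform and environment are part of the identifier, the same table name on two platforms (or in PROD and DEV) yields two distinct entities in the graph.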
Key Features
- Metadata ingestion from 50+ sources
- Real-time metadata sync via Kafka
- Column-level lineage and impact analysis
- Rich search with full-text and faceted filters
- Data governance with tags, glossary terms, and ownership
- Role-based access control (RBAC)
- Active metadata graph with relationships
- GraphQL API and Python SDK
- UI for data discovery and curation
- Domain and data platform support
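Impact analysis amounts to walking the lineage graph downstream from a changed column to find everything that could break. The sketch below is a generic breadth-first traversal over a hypothetical, hard-coded lineage map; it does not call DataHub, which stores lineage and serves impact analysis itself through its UI and APIs. The column names and the `downstream_impact` function are illustrative assumptions.

```python
from collections import deque

# Hypothetical column-level lineage: each key is an upstream column and its
# value lists the columns derived directly from it.
LINEAGE: dict[str, list[str]] = {
    "raw.orders.amount": ["staging.orders.amount_usd"],
    "staging.orders.amount_usd": ["mart.revenue.daily_total", "mart.revenue.avg_order"],
    "mart.revenue.daily_total": ["dashboard.kpi.revenue"],
}


def downstream_impact(column: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every column transitively derived from `column`, in BFS order."""
    seen = {column}
    order: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # skip columns already reached (also guards cycles)
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream_impact("raw.orders.amount", LINEAGE))
# ['staging.orders.amount_usd', 'mart.revenue.daily_total', 'mart.revenue.avg_order', 'dashboard.kpi.revenue']
```

Tracking a `seen` set keeps the traversal linear in the size of the graph and stops it from looping if the lineage ever contains a cycle.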
Pros & Cons
Pros
- Open-source and free
- Excellent lineage and impact analysis
- Active community and frequent updates
- Supports many data sources
- Strong governance and search features
Cons
- Requires significant DevOps effort for self-hosting
- Documentation can be incomplete for advanced setups
- Custom transformations have a learning curve
- Some connectors lack maturity
Pricing Details
DataHub is completely open-source under the Apache 2.0 license. There is no cost to download, install, and use. Optional managed services are available through Acryl Data (the company behind DataHub) with additional enterprise support and features. Pricing for managed services is not publicly listed; contact Acryl for details.
FAQ
What is DataHub?
DataHub is an open-source metadata platform for data discovery, lineage, and governance.
How do I install DataHub?
You can run it locally with Docker (the `datahub docker quickstart` command from the `acryl-datahub` CLI starts a quickstart instance) or deploy it on Kubernetes with the official Helm charts.
Does DataHub support real-time ingestion?
Yes. In addition to scheduled ingestion, metadata changes can be pushed as they happen via change-data-capture (CDC) and Kafka events.
Is DataHub free?
Yes, the open-source edition is free. Managed services are paid.
How does DataHub compare to Amundsen?
DataHub offers more extensive lineage and governance features than Amundsen.
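Real-time ingestion works by pushing metadata change events rather than waiting for a scheduled crawl. The sketch below builds a dictionary shaped like a DataHub MetadataChangeProposal that upserts a dataset's description. The field names follow our understanding of the proposal schema, but the wire format (Avro on Kafka, JSON over REST) is handled by DataHub's own emitters, so treat this as an illustration of the event's contents only; `dataset_properties_proposal` is a hypothetical helper.

```python
import json


def dataset_properties_proposal(urn: str, description: str) -> dict:
    """Build an illustrative MetadataChangeProposal-shaped dict (hypothetical helper)."""
    aspect_value = {"description": description}
    return {
        "entityType": "dataset",
        "entityUrn": urn,
        "changeType": "UPSERT",  # create the aspect or overwrite the existing one
        "aspectName": "datasetProperties",
        # The aspect payload travels as a serialized JSON string plus its content type.
        "aspect": {
            "value": json.dumps(aspect_value),
            "contentType": "application/json",
        },
    }


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
event = dataset_properties_proposal(urn, "Orders fact table")
print(json.dumps(event, indent=2))
```

In practice, the `acryl-datahub` Python SDK wraps this in `MetadataChangeProposalWrapper` and sends it with a REST or Kafka emitter instead of hand-building dictionaries.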
Smart Alternatives & Comparison
Compare DataHub side by side with other tools.
| Criteria | DataHub (this tool) | | | |
|---|---|---|---|---|
| Overview | Open-source metadata platform for data discovery, lineage, and governance in your AI stack. | Your lifelong professional memory. Never forget a detail, insight, or connection again. | Real-time multiplayer markdown editor where people and AI agents collaborate seamlessly. | Turn any audio into structured notes, summaries, and more – with time-linked citations. Free daily, no signup required. |
| Pricing Model | Free (open source); paid managed services | Freemium | Freemium | Freemium |
| Developer API | Available (GraphQL API, Python SDK) | Not available | Not available | Not available |
| Open Source | Yes (Apache 2.0) | Proprietary | Proprietary | Proprietary |