
Bark-Data Quality Service on Cloud for both real-time and batch data - elligao
DQSolution is an open-source data quality solution for distributed data systems at any scale, in both streaming and batch contexts. Teams that use big data products (e.g. Hadoop, Spark, Kafka, Storm) need a data quality service to build confidence in the quality of the data processed by those platforms. DQSolution creates a unified process to define and construct data quality measurement pipelines across multiple data systems to provide:<p>FEATURES<p>- Accuracy Measurement - Accuracy of a data asset compared to a verifiable source<p>- Data Profiling - Statistical analysis and assessment of data values within a data asset for consistency, uniqueness and logic<p>- Anomaly Detection - Pre-built algorithm functions for identifying events that do not conform to an expected pattern in a data asset<p>- Visualization - Dashboards that report the state of data quality<p>KEY BENEFITS<p>- Real Time - Data quality checks can be executed in real time to detect issues faster<p>- Extensible - The solution can work with multiple data systems<p>- Scalable - The solution is designed to work on large volumes of data. It currently runs on ~1.2 PB of data<p>- Self-Service - The solution provides a simple user interface to define new data assets and rules. It also lets users view the data quality dashboards and personalize their view of them.<p>GitHub: https://ebay.github.io/DQSolution/
Please fork! Thanks!<p>Contact us: lzhixing@ebay.com
======
elligao
good!

