Challenge: Achieving Consistency at Scale
Whatever industry they work in, data professionals must ensure that everyone in the organization is working from the same datasets. This is easier said than done. The problem goes beyond keeping a record of the schemas at the data sources and data destinations and keeping them in sync. It is also necessary to understand how schema changes affect existing pipelines that expect a specific schema and now receive a different one.
Impact: Maintaining Consistency with a Central Record of Schemas
BDM Control identifies schema changes at the source and notifies every pipeline derived from that data source, so that all destinations can keep working with the most up-to-date data available.
Solution: Schema & Pipeline Consistency
We have developed a solution that addresses both schema and pipeline consistency:
The schemas of the data sources are read several times a day (the frequency is configurable, and schemas can also be checked on demand); any changes are recorded and stored in the central repository
All pipelines are registered in the central repository, together with a record of every schema they consume data from
When a change is recorded in one of these schemas, the pipeline owner is notified and can choose either to ignore the change or to upgrade the pipeline
Pipelines are also kept under version control, which records the data source each pipeline runs against
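The workflow in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BDM Control's implementation: the names `SchemaRepository`, `record_schema`, `register_pipeline`, `upgrade` and `schema_diff` are hypothetical, a schema is modelled as a simple column-to-type mapping, and notifications are collected in a list rather than delivered to owners. Scheduling is left out; each call to `record_schema` stands for one scheduled or on-demand read of a source.

```python
# Hypothetical sketch: none of these names are part of BDM Control.
from dataclasses import dataclass, field

# Assumption: a schema is modelled as a mapping of column name -> type name.
Schema = dict[str, str]


def schema_diff(old: Schema, new: Schema) -> dict[str, list[str]]:
    """Summarise added, removed and retyped columns between two schema versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


@dataclass
class Pipeline:
    name: str
    owner: str
    source: str          # data source the pipeline consumes from
    schema_version: int  # schema version the pipeline runs against (0 = none yet)


@dataclass
class SchemaRepository:
    """Central record of versioned source schemas and the pipelines consuming them."""
    history: dict[str, list[Schema]] = field(default_factory=dict)
    pipelines: dict[str, Pipeline] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)

    def latest_version(self, source: str) -> int:
        # Versions are numbered from 1; 0 means no schema has been recorded yet.
        return len(self.history.get(source, []))

    def schema_at(self, source: str, version: int) -> Schema:
        return self.history[source][version - 1] if version > 0 else {}

    def record_schema(self, source: str, schema: Schema) -> bool:
        """Store a schema read from `source`; return True if a new version was stored."""
        versions = self.history.setdefault(source, [])
        if versions and versions[-1] == schema:
            return False  # unchanged since the last read
        versions.append(dict(schema))
        if len(versions) > 1:  # a genuine change, not the first read
            self._notify_owners(source)
        return True

    def register_pipeline(self, name: str, owner: str, source: str) -> None:
        # Pin the new pipeline to the schema version currently on record.
        self.pipelines[name] = Pipeline(name, owner, source, self.latest_version(source))

    def _notify_owners(self, source: str) -> None:
        # Tell the owner of every pipeline that lags behind the latest version.
        latest = self.latest_version(source)
        for p in self.pipelines.values():
            if p.source == source and p.schema_version < latest:
                diff = schema_diff(self.schema_at(source, p.schema_version),
                                   self.schema_at(source, latest))
                self.notifications.append(
                    f"{p.owner}: '{p.name}' runs on {source} v{p.schema_version}, "
                    f"v{latest} is available: {diff}"
                )

    def upgrade(self, name: str) -> None:
        """The owner chose to upgrade: move the pipeline to the latest schema version."""
        p = self.pipelines[name]
        p.schema_version = self.latest_version(p.source)


# Usage: one source, one pipeline, then a schema change is detected.
repo = SchemaRepository()
repo.record_schema("crm.customers", {"id": "int", "email": "str"})
repo.register_pipeline("daily_export", owner="alice", source="crm.customers")
repo.record_schema("crm.customers", {"id": "bigint", "email": "str", "country": "str"})
print(repo.notifications[0])
repo.upgrade("daily_export")
```

In this sketch, ignoring a change needs no call: the pipeline simply stays pinned to its current schema version and its owner is notified again, with the cumulative diff, if the source changes further. Pinning one schema version per pipeline is just one simple way to model the version record described in the last point.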
Opportunity: Building Up-to-date Pipelines
BDM Schema Evolution guarantees consistency across the data. All users of the data lake inherit their permissions from the original data sources, and pipeline owners are always informed of changes to their data sources, which allows them to keep their pipelines up to date. The enhanced Audit Trail and data lineage traceability maintain data consistency between data sources and data destinations.