Inform pipeline owners of schema changes as they occur in source datasets
Challenge: Adopt Consistency at Scale
Whatever their industry, data professionals have to ensure that everyone in the organization is working from the same datasets. This is easier said than done. It is not enough to keep a record of the schemas at the data sources and data destinations and keep them in sync. It is also necessary to understand how a schema change affects existing pipelines, which expect to receive a specific schema and are now presented with a different one. The easiest way to maintain consistency is to automate the process, so that pipeline owners are informed of schema changes as soon as they happen.
Impact: Maintains Consistency with Central Record of Schemas
BDM Control identifies schema changes at the source and notifies the owners of all pipelines derived from that data source, so that all destinations are always working with the most up-to-date data available.
Solution: Schema & Pipeline Consistency
We have developed a solution that addresses both schema and pipeline consistency:
The schemas of the data sources are read several times each day (the frequency is configurable, and schemas can also be checked on demand). Any changes that have occurred are recorded and stored in the central repository.
All pipelines are recorded in the central repository, together with a record of every schema they consume data from.
As changes to these schemas are recorded, the owner of each affected pipeline is informed and given the option to ignore the changes or to upgrade the pipeline.
Pipelines are also kept under version control, which records which data source each pipeline is running against.
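The steps above can be illustrated with a minimal sketch. It is not the BDM implementation or API. Every name in it (`diff_schemas`, `SchemaRegistry`, `Pipeline`, the `orders` source, the owner address) is hypothetical, and it assumes a schema can be represented as a simple mapping from column name to type name. It shows one possible way to record schema versions centrally, detect a change, and produce a notice for each owner whose pipeline consumes the changed source.

```python
from dataclasses import dataclass, field

# Assumption: a schema is a mapping of column name -> type name.
Schema = dict[str, str]


def diff_schemas(old: Schema, new: Schema) -> dict[str, list[str]]:
    """Return the columns added, removed, or retyped between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


@dataclass
class Pipeline:
    name: str
    owner: str
    # Source name -> schema version the pipeline currently runs against.
    sources: dict[str, int]


@dataclass
class SchemaRegistry:
    """Hypothetical central repository of schema versions and pipelines."""
    versions: dict[str, list[Schema]] = field(default_factory=dict)
    pipelines: list[Pipeline] = field(default_factory=list)

    def record(self, source: str, observed: Schema) -> list[str]:
        """Store a new version if the schema changed; return owner notices."""
        history = self.versions.setdefault(source, [])
        if not history:
            history.append(observed)  # first observation, nothing to compare
            return []
        changes = diff_schemas(history[-1], observed)
        if not any(changes.values()):
            return []  # schema unchanged since the last check
        history.append(observed)
        latest = len(history) - 1
        return [
            f"{p.owner}: source '{source}' is now v{latest}, "
            f"pipeline '{p.name}' runs on v{p.sources[source]}: {changes}"
            for p in self.pipelines
            if source in p.sources
        ]

    def upgrade(self, pipeline: Pipeline, source: str) -> None:
        """Owner accepts the change; ignoring it means not calling this."""
        pipeline.sources[source] = len(self.versions[source]) - 1


# Usage: one source, one dependent pipeline, one schema change.
registry = SchemaRegistry()
registry.record("orders", {"id": "int", "total": "float"})
revenue = Pipeline("daily_revenue", "alice@example.com", {"orders": 0})
registry.pipelines.append(revenue)

notes = registry.record(
    "orders", {"id": "int", "total": "decimal", "currency": "string"}
)
for note in notes:
    print(note)

registry.upgrade(revenue, "orders")  # the owner chooses to upgrade
```

In this sketch the scheduled or on-demand check would simply call `record` with the schema it reads from the source. Keeping the version number on the pipeline rather than on the source is what lets an owner ignore a change and stay on the older version.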
Opportunity: Building Up-to-date Pipelines
BDM Schema Evolution guarantees consistency across the data. Users of the data lake inherit their permissions from the original data sources, and pipeline owners are always informed of changes to those sources, which allows them to keep their data pipelines up to date. With the enhanced Audit Trail, data consistency is guaranteed between data sources and data destinations.