
Search Results


  • Bluemetrix Data Manager: Harness the Power of Big Data

    BDM, together with Control-M from BMC, ensures the continuous integration of the data pipeline into a secure production environment, continually delivering data from the pipeline to the end user. With Bluemetrix Data Manager, you can fully automate the process of big data ingestion and provide a solid foundation for your Big Data projects, whether they are on-premise or in the cloud.

  • BDM Transformations Datasheet

    The BDM Transformations module unlocks the value of your big data in a quick and efficient manner. It allows your organisation to leverage the full potential of its data for relevant, accurate and meaningful insights, reusing transformations across the enterprise. From a marketing analyst to a data engineer, any data user can create and deploy complex data pipelines in minutes. Click here to learn more

  • Bluemetrix Data Masking and Tokenization Module

    As the world of data continues to evolve, organizations must find ways to safeguard sensitive information while adhering to privacy regulations. Modern data and analytics architectures demand practical and dependable data masking solutions. Bluemetrix provides a comprehensive platform for efficient data de-identification, striking a balance between preserving analytical capabilities and protecting privacy. Meet your enterprise's data masking and security requirements with Bluemetrix, leading to increased trust, streamlined processes and improved analytical models. Start your venture into data security confidently, powered by Bluemetrix's Data Masking and Tokenization Module. Download our free guide to get started.
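
    For illustration only, the sketch below shows one common approach to deterministic column tokenization in PySpark. It is not the implementation of the Bluemetrix module; the table name, column name and key handling are assumptions.

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("tokenization-sketch").enableHiveSupport().getOrCreate()

    # Hypothetical source table holding a sensitive "email" column.
    customers = spark.table("customers")

    # In practice the key material would come from a secrets manager, not the code.
    secret = "replace-with-a-managed-secret"

    # Deterministic tokenization: the same input always yields the same token,
    # so joins and counts on the tokenized column still behave as before.
    tokenized = (
        customers
        .withColumn("email_token", F.sha2(F.concat(F.lit(secret), F.col("email")), 256))
        .drop("email")
    )

    tokenized.write.mode("overwrite").saveAsTable("customers_deidentified")
    ```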

  • BDM Schema Evolution Datasheet

    One of the key business challenges facing data professionals across all industries today is ensuring that everyone within the organization is working from the same datasets. The issue is more complex than simply keeping a record of the schemas at the data sources and destinations and keeping them in sync. This datasheet explores how BDM Schema Evolution propagates all schema changes that occur at source, as they happen, to the corresponding data store in your Data Lake. Click here to learn more
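
    As a rough illustration of what propagating a source schema change to a Data Lake table can involve, the PySpark sketch below compares a source snapshot against a Hive table and adds any new columns. It is not BDM's actual mechanism; the paths and table names are hypothetical.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-evolution-sketch").enableHiveSupport().getOrCreate()

    # Latest snapshot landed from the source system (hypothetical path).
    source_df = spark.read.parquet("/landing/orders")

    # Columns already present in the corresponding Data Lake table.
    existing = {field.name for field in spark.table("lake.orders").schema.fields}

    # Any column that appeared at source but is missing in the lake table
    # is propagated with an ALTER TABLE as soon as it is detected.
    for field in source_df.schema.fields:
        if field.name not in existing:
            spark.sql(
                f"ALTER TABLE lake.orders ADD COLUMNS ({field.name} {field.dataType.simpleString()})"
            )
    ```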

  • BDM Data Governance and Compliance Datasheet

    Governance and Compliance are difficult to enforce in any environment, and even more difficult to enforce in a Data Lake, which has many moving parts where users can interact with the data. Traditional systems rely on manual enforcement, requiring all users to record the governance details of what they do on the lake and to keep that record up to date with their most recent activity. With BDM Data Governance and Compliance, we provide a world-class solution for Data Lakes. Click here to learn more

  • BDM Ingestion Datasheet

    BDM Ingestion automates the ingestion of data, at rest, streaming or both, in a secure and fast manner using Spark. With our automated solution, organizations can eliminate coding and architecture errors as data is moved from source to destination. Click here to learn more
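
    For context, the sketch below shows the kind of Spark ingestion job that such automation generates for data at rest: reading a table over JDBC and landing it as a partitioned Hive table. The connection details and table names are placeholders, and this is not BDM's generated code.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ingest-sketch").enableHiveSupport().getOrCreate()

    # Data at rest: pull a table from a relational source over JDBC
    # (URL, credentials and table names are placeholders).
    source = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://source-host:5432/sales")
        .option("dbtable", "public.transactions")
        .option("user", "ingest_user")
        .option("password", "****")
        .load()
    )

    # Land the data in the lake as a partitioned Hive table.
    (
        source.write
        .mode("append")
        .partitionBy("transaction_date")
        .saveAsTable("lake.transactions")
    )
    ```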

  • Bluemetrix GDPR Readiness Program

    Bluemetrix is an expert in security and governance for Hadoop. We have been working on Hadoop projects for Blue Chip clients for almost ten years and have developed the most comprehensive programmes and processes. This solution brief explores the engagement model of the Bluemetrix GDPR Readiness Program, so that you can work with your Big Data infrastructure with peace of mind. Click here to learn more

  • 10 Point Hadoop Security Audit

    A weakness in your Hadoop Cluster leads to a weakness in your overall IT security. Access control, privacy restrictions and regulatory requirements such as GDPR mean that without robust security controls and protocols, 'Big Data can become a Big Problem'. With the Bluemetrix Hadoop Security Audit, we provide you with an understanding of how secure your cluster is, along with a roadmap to fix any potential issues that are discovered. Click here to learn more

  • Bluemetrix Managed Services

    We understand how difficult it can be to keep a cluster operating at maximum efficiency and in a secure manner. For this reason, we have launched our Managed Services, which will focus on keeping your cluster performing fully, while you focus on developing applications and IP to grow your business. Click here to learn more

  • Bluemetrix Professional Services

    Since 2000, Bluemetrix has been providing Professional Services around the Hadoop stack to over 100 clients worldwide, across a broad range of industries including Financial Services, Insurance, Telecommunications, Retail, Automotive, Aviation and Manufacturing. Click here to learn more

  • Create, deploy & develop a Hadoop proof of concept in less than 1 month

    What is the problem we are trying to solve?

    Hadoop is topical at the moment as it is the platform of choice for Big Data projects. Most companies are beginning to run Machine Learning and AI projects on their data to gain better business insights, both for their own internal needs and for their clients. The vast majority of these projects are carried out on a Hadoop platform, as it is the platform best capable of handling the volumes of data required for analysis, and the platform most analytic tools have been developed for.

    The problem is that Hadoop is difficult. It is not one language or one operating system, but an ecosystem of disparate systems that work together in a distributed processing environment. So apart from the complexity of having to master several languages (Python, Java, scripting, SQL, etc.) to program over a dozen different modules (Hive, Spark, Sqoop, etc.), you also have to understand how the data is stored and processed in a distributed environment. It is not easy to make this work out of the box.

    Download the Ebook: Create, deploy & develop a Hadoop proof of concept in less than 1 month.

    Exposition of the problem

    The first step in any Hadoop project is a Proof of Concept (POC). This will determine whether there is value to be derived from the data and whether the project is worth pursuing to production. POCs are very straightforward for Hadoop specialists to design and plan, and there are a number of distinct stages to them:

    - Data Identification: Identify what data is required for the POC and ensure that it is available for the duration of the project.
    - Use Case: Establish the business use case to be implemented (e.g. using Machine Learning to identify customers for upselling opportunities) and what is required in terms of development and technology to prove it.
    - Data Platform: Decide on a platform of choice, on-premise or cloud, and the size of your Hadoop cluster (3, 4, 5 or more nodes).
    - Hadoop Distribution: Decide which distribution you will use for the POC: Apache, Cloudera, Hortonworks or MapR.
    - Data Ingestion: The nature of the data, static or streaming, structured or unstructured, will determine the ingestion method; the options could be Sqoop, Kafka, NiFi, StreamSets, etc.
    - Data Storage: The nature of the data and the type of processing you expect to carry out on it will determine the storage platform; options include HBase, Hive, MongoDB, Impala, etc.
    - Data Security: Do you develop for a Kerberos environment or not? It is certainly easier not to, but the work carried out on the POC will be of little use if you need to deploy Kerberos in production.
    - Data Transformation: Before you apply the use case solution to the data, you will typically need to combine and re-format the data to suit your processing requirements. This can be done using SQL, Spark or other options (a minimal Spark sketch follows this list).
    - Data Governance: Finally, you may or may not decide to implement data governance on your POC, depending on the nature of your data and use case.
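
    As noted in the Data Transformation stage above, the following is a minimal Spark SQL sketch of combining and re-formatting two ingested tables before the use case is applied. The table and column names are hypothetical, not taken from any real project.

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("transformation-sketch").enableHiveSupport().getOrCreate()

    # Combine two ingested tables into a single feature table for the use case.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS lake.customer_features AS
        SELECT c.customer_id,
               c.segment,
               SUM(t.amount) AS total_spend,
               COUNT(*)      AS transaction_count
        FROM   lake.customers    c
        JOIN   lake.transactions t ON t.customer_id = c.customer_id
        GROUP  BY c.customer_id, c.segment
    """)
    ```
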
    Designing a POC plan is the easy part; it is in the implementation that things can start to go wrong. In our experience, most people do not have access to the full skill set of technologies required to successfully set up a POC (we have been involved in the implementation of over 100 Hadoop projects at this stage), and without that skill set the project takes longer than expected and does not deliver the required results. The most common problems that we have encountered are:

    - Data Ingest: Writing Sqoop or Kafka code to move data from an EDW or a file into Hadoop is relatively straightforward; the problems occur when people do not understand the changes to data types and special characters that need to be made to the code and data to ensure that it can run successfully in Hadoop (see the sketch at the end of this section).
    - Cluster Security: Kerberos is tricky to implement, and developing applications for Kerberos environments is more difficult than for non-Kerberos environments. A lot of projects avoid Kerberos at the start in order to get up and running quickly, but this can be a false saving which appears later in the project.
    - Data Transformation: SQL for Hive is difficult, especially if the queries are complex, and even an experienced DBA takes time to get up to speed with it.
    - Infrastructure: As simple as possible is best, and cloud solutions that can be deployed quickly offer major time savings over on-premise hardware.

    We have seen simple projects with non-complex data sources and data transformations take weeks and months to get up and running correctly, leading to major delays. The biggest problem we have seen is that the original objective of the POC gets subsumed in the building of a Hadoop environment to prove the POC. The purpose of the POC is usually to determine if there is a business case to support the data use case that is being investigated. It is not to develop a Hadoop cluster; that should be the next phase of the project, once the use case has been proven and accepted.

    How to solve the problem

    It is possible to deliver a Hadoop POC within 1 month by following the steps below:

    - Hadoop Distribution: Select a distribution from one of the enterprise providers: Cloudera, Hortonworks or MapR.
    - Infrastructure: Deploy on one of the major cloud infrastructure providers, Azure or AWS, and use a virtualised environment for the POC. The BM Cloudburst product will deploy a fully kerberised cluster on Azure in less than 1 hour, giving you a platform to develop on.
    - Use Case: Focus all of your energies on developing the application to substantiate the use case.
    - Data Ingest: Use BM Data Ingest for ingestion of data onto your cluster. It has multiple connectors for different data sources and converts all of the data to work in Hadoop. It automatically generates the ingest code and has a drag and drop interface that can be easily understood and used by non-Hadoop experts. It is available to purchase on a monthly-use basis, and data can be ingested in less than 1 day.
    - Data Transformation: Use BM Data Transformer to combine and manipulate the data so it is available in Hive for your use case. All transformations are carried out in Spark using an extensive library, with a simple, easy to use drag and drop interface requiring no Hadoop knowledge. All of the underlying code is generated automatically, and most data transformations can be created and deployed in minutes.

    Following the above 5 steps will get a cluster deployed and operational, with data ingested and manipulated, within a matter of days, allowing you to spend the rest of the month working on your use case application. Apart from being the fastest solution on the market for a Hadoop POC deployment, it delivers extraordinary cost savings.
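
    As referenced in the Data Ingest problem above, the sketch below illustrates, with hypothetical file paths, column names and cleaning rules, the kind of data type and special character fixes a file-to-Hive ingest typically needs. It is an assumption-laden example, not BDM's generated code.

    ```python
    import re

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("ingest-cleanup-sketch").enableHiveSupport().getOrCreate()

    raw = spark.read.option("header", True).csv("/landing/export.csv")

    # Hive rejects spaces and punctuation in column names, so normalise them first.
    df = raw
    for old_name in raw.columns:
        df = df.withColumnRenamed(old_name, re.sub(r"[^0-9a-zA-Z_]", "_", old_name).lower())

    # CSV data arrives as strings; cast the columns the use case depends on
    # explicitly so downstream Hive queries do not fail or silently truncate values.
    df = (
        df.withColumn("amount", F.col("amount").cast("decimal(18,2)"))
          .withColumn("created_at", F.to_timestamp("created_at", "yyyy-MM-dd HH:mm:ss"))
    )

    df.write.mode("overwrite").saveAsTable("lake.export_clean")
    ```
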
    The approach uses low-cost tools to automate the process and removes the need for specialist Hadoop skills. Using this methodology, any Data Science team can prove the business case for a Hadoop Big Data project without ever having to be Hadoop experts. Download the Ebook: Create, deploy & develop a Hadoop proof of concept in less than 1 month and for under €15,000. #hadoop #proofofconcept #datapipeline #architecture

  • Creating A New Mandate For The Chief Data Officer

    How can the role of the Chief Data Officer (CDO) evolve to create business growth? Start with the framework of 'Offense' and 'Defense' to generate revenue and transform the business.

    Background

    "The CDO bears responsibility for the firm's data and information strategy, governance, control, policy development, and exploitation of data assets to create business value." (Source: Gartner)

    Chief Data Officers are a recent addition to the C-level suite, and their rise is directly related to the new emphasis and focus companies are placing on their data assets. The role of the CDO is relatively new and evolving rapidly. Companies have come to realise that to ensure their data assets are protected, and to maximise the return from those assets, they require a senior executive devoted exclusively to managing and protecting them. Even though it is a recent arrival, the role of the CDO already appears to be in transition, with a shift in focus from risk and regulatory issues to activities that support business growth.

    The impetus behind the role of the CDO is continuing: a recent survey by Gartner (Third Gartner CDO Survey – How Chief Data Officers are Driving Business Impact, December 2017) found that the adoption of this role is rising globally. The number of organisations implementing an office of the CDO also rose year on year, with 47% reporting that an office of the CDO was implemented in 2017, compared with 23% in 2016. Gartner predicts that by 2019, 90% of large organisations will have a Chief Data Officer. We are also seeing companies replace other roles in favour of a new CDO role. In January 2018 Easyjet replaced the CMO role with a CDO role, which they said "will give greater focus and weight to the airline's use of data to improve our customer proposition, drive revenue, reduce cost and improve operational reliability".

    Download Ebook: Creating A Mandate for the Chief Data Officer

    The Role of the Chief Data Officer

    Chief Data Officers have an important job where data is the currency of opportunity. A survey commissioned by PWC to understand the factors driving the growth of the CDO role, and how their mandate is evolving, was very clear: the scope of many CDOs has expanded from setting policy and rolling out foundational data management capabilities to owning platforms and having actual oversight and execution of data programs. Despite this expansion of the CDO's remit to new areas, respondents are still in the process of implementing foundational data management capabilities. However, PWC found there is a strong need to obtain buy-in and understanding of the CDO role across the enterprise, and to align the role with, and in support of, business strategy.

    CDOs are the custodians of an organisation's information assets. They must use this information as a catalyst for change, to automate business processes, understand and develop better relationships with stakeholders, and ultimately capture strategic value from data and deliver high-impact business outcomes. The role is an enterprise-wide role, with responsibility for developing a vision and strategy around the protection and use of a company's data assets, and for executing that vision and strategy.
    This means that, amongst other things, they are responsible for the following:

    - Data Protection, Privacy and Security
    - Data Governance
    - Information Management
    - Data Quality Management
    - Data Lifecycle Management
    - Definition and enforcement of standards

    The office of the CDO is a multi-disciplinary office, with professionals and expertise drawn from sectors as diverse as Compliance and IT. It is not unusual to see professionals with the following expertise in the office: Privacy & Policy Experts, Data Stewards, Data Analysts, Data Scientists, Information Architects, etc.

    Creating a new mandate for the Chief Data Officer

    The remit of the CDO is very broad but is changing its focus rapidly. Valerie Logan, Research Director at Gartner, says: 'while the early crop of CDOs was focused on data governance, data quality and regulatory drivers, today's CDOs are now also delivering tangible business value and enabling a data-driven culture'. Indeed, the latest piece of research from PWC states clearly that the scope of many CDOs has expanded from setting policy and rolling out foundational data management capabilities to owning platforms and having actual oversight and execution of data programs.

    What we are now seeing is an increased maturity in the CDO role, with a change in focus from a Defensive position, where the focus was on compliance, security and regulations, to an Offensive position, with an increased focus on working the data assets and generating new revenue and opportunities for the business. It is this distinction between 'Offensive' and 'Defensive' that is set to become the defining characteristic of the role of the Chief Data Officer. The use of the 'Offensive' and 'Defensive' approach means that CDOs can create a framework for understanding, obtain buy-in for their role and quickly align outcomes with business strategy.

    Rethinking how CDOs carry out their role: Defensive vs Offensive

    In the very early stages of creating their data strategy, a Defensive Strategy meant that the CDO was concerned primarily with preventing the risk of damage occurring in the business because of a loss or inappropriate use of data within the business. However, this definition is too narrow: today the CDO's Defensive Strategy means focusing on the following:

    - Creation and deployment of usage, security and protection policies, and regulatory compliance, around data assets
    - Ensuring all data held within the company is of the highest quality
    - Carrying out data validation as data is moved and transformed within the company, to ensure quality does not degrade as the data is used (a minimal validation sketch follows this list)
    - Controlling data access within the company
    - Implementing data security on a granular level, allowing data to be exploited and used, but within regulatory guidelines
    - Practicing data lifecycle management, ensuring that data is fresh and relevant and is not stored longer than necessary
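
    As referenced in the data validation point above, a minimal PySpark sketch of completeness, consistency and integrity checks might look like the following. The table names, key column and thresholds are placeholders, not BDM's validation algorithms.

    ```python
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("validation-sketch").enableHiveSupport().getOrCreate()

    source_count = spark.table("staging.orders").count()
    target = spark.table("lake.orders")

    # Completeness: every row that left the source should have arrived in the lake.
    assert target.count() >= source_count, "row count dropped during the move"

    # Consistency: key business columns must not be null.
    null_keys = target.filter(F.col("order_id").isNull()).count()
    assert null_keys == 0, f"{null_keys} rows are missing order_id"

    # Integrity: no duplicate primary keys after the transformation.
    duplicates = target.groupBy("order_id").count().filter(F.col("count") > 1).count()
    assert duplicates == 0, f"{duplicates} duplicate order_id values found"
    ```
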
    Adopting Offensive Strategies around data is a more recent trend. As a result, we are now seeing companies focus more on the following:

    - Using data assets to enhance existing products and services
    - Using data analytics to increase speed to market of new products
    - Using information to boost product development
    - Combining third party data with internal assets to create new assets, revenue and opportunities for the business
    - Building a strong data foundation and a data-driven culture within the business
    - Making changes in how C-level executives use data to drive culture change throughout the company

    How can Bluemetrix help with 'Defense' and 'Offense'?

    Using Bluemetrix Data Manager with Control-M, we can help in the following three areas.

    Creation and Deployment of a Hadoop Data Lake: Most Hadoop Data Lake projects take 12 months or more to implement; they often run over budget and take a lot longer than anticipated to come into operation. Using our automation tools, you can have a Hadoop data lake operational and in production within 3~4 months of project kick-off. By this we mean we can do the following:

    - Architect, design and deploy a secure data lake with Hadoop
    - Move structured and unstructured data onto the lake and make it available for processing and analytics
    - Embed Governance and GDPR compliance into all processing so that it happens automatically in the background
    - Make the data and the processing capabilities of the data lake available to business users and owners within the company

    Implementation of a Defensive Strategy: Use automation to deploy a Defensive Strategy and ensure day to day operations are functioning and compliant within 3~4 months. This will include implementing the following:

    - Automate the Ingestion of Data: apply governance by default to all data ingested onto the lake, and enable data to be ingested using a simple, easy to use drag and drop GUI, removing the need for any Hadoop knowledge.
    - Automate the Validation of Data: validate the data for completeness, consistency and integrity. Validation algorithms are easily customisable, work with different data sets, and are embedded into the process which moves and transforms the data.
    - Enforce and Measure Data Quality: record all data movement as it occurs on the platform, allow variable tolerance levels depending on the checks and data, and record and store all metrics for analysis and reporting.
    - Data Life Cycle Management: apply retention and expiration dates to all data stored, and automatically delete the data on a daily or weekly basis, or any time frame that is required.
    - Record History of Data Storage and Processing: record all archiving of data, record all storage of data across the processing cycle (intermediate and final stages), and record processing jobs, data processed, time processed, etc.
    - Embed Meta Data into Operations: create and deploy meta data as data is moved and transformed.
    - Automate the Deployment and Recording of Data Governance into Operations: build Data Governance into the movement and transformation of all data, ensure all data has an entity available with lineage applied to all processing of the entity, and customise the data stored to ensure it is fully GDPR compliant.
    - Deploy a Data Management System: deploy a dashboard to show a complete view of the data on the lake, provide drill-down access to all metrics, and highlight quality and governance issues with data as they occur.
    - Apply Data Masking & Security: enable data masking at a table, row or column level, and apply different types of masking depending on the underlying data, i.e. apply random values, replace values with xxxx, apply rotation methods, etc.
    - Enable User Level Data Access: this is all managed by the creation of correct access policies on Hadoop. It can be tied in to meta data and masking policies, and allows control of individual-level access to the data.

    Implementation of an Offensive Approach: Implementing all the preceding defensive strategies results in the following:

    - One view of the data being available: the integration of all data into one data lake, in a manner which controls the quality, life cycle, etc. of the data, ensures that only one copy of the data exists and every stakeholder is reading and processing the same data. The application of accuracy checks, validation checks, etc. as users access and process the data guarantees that the data is always kept accurate and up to date.
    - Easy access to the data: stakeholders can now access the data in a controlled manner using the BDM/Control-M GUI. This ensures that all relevant stakeholders can access the data they require without having to understand how the data lake works or having to involve any Developers or Administrators to help them use the system. Combining the control procedures of Hadoop and Control-M allows Bluemetrix Data Manager to provide this access.

    The automation of the Defensive Strategies frees up the CDO to focus on Offensive Strategies such as the following:

    - Develop a data-driven culture within the business
    - Create new product opportunities
    - Develop new features on existing products
    - Create new revenue streams by combining third party data with internal data
    - Upsell to existing customers

    Conclusion

    To a great extent, the CDO role is about change management. CDOs first need to define their role and manage expectations by considering available resources. CDOs will gain authority when they successfully verify that their organisation can own and control its data, and that it can create new, better and different outcomes. One of the most useful frames of reference for a CDO is to think in terms of 'Defensive' strategies and 'Offensive' strategies. Using this context, the role of a Chief Data Officer becomes a lot easier to define within the organisation, enabling them to obtain the budget and resources they need to be successful. Bluemetrix can help a Chief Data Officer on this path: we recommend that organisations serious about improving the realised value of their information assets start by considering the use of Bluemetrix Data Manager with Control-M within their overall data framework. #chiefdataofficer #datagovernance #datacontrol
