Understanding different approaches to metadata management
Metadata management, the ability to provide context and meaning to data, is a foundation of many other data management disciplines.
The broad range of applications for metadata means that approaches to metadata management vary widely, with no single tool set or platform addressing every need, particularly in the complex data landscapes of large, modern enterprises.
A data catalogue is quickly becoming the cornerstone of metadata management for data-driven businesses. The catalogue is intended to ensure that any knowledge worker can quickly and easily find the data they need to do their job, has the relevant context to make decisions, and can see how data is used within the organisation.
Key capabilities that differentiate data catalogues include:
- Business driven.
  - The tool is accessible to all knowledge workers, allowing them to reference data context, crowdsource content and collaborate within and across data silos.
  - The interface supports both technical and non-technical users.
  - It provides real-time insights into how data is being used to support business processes, compliance events, reports and metrics.
- Integrated data quality.
- Value-driven data scoring.
  - Data assets can be linked to business goals, objectives and outcomes to measure the ROI of metadata management activities.
- Governance driven.
  - Clearly understand and communicate accountability for data and data-related decisions, and automate key processes to reduce the data stewardship workload and the time wasted in meetings.
- Extensible data model, so you can quickly and easily add any metadata type to your metadata repository.
  - Business metadata may include data policies, report definitions, KPIs and metrics, business terms, curated data sets, business processes and similar assets that provide context to data.
  - Technical metadata documents physical systems, cloud data sources, data dictionaries, reference data and so on. Wherever possible, this metadata should be harvested automatically.
  - Relationships allow you to model your data landscape and understand how assets support each other.
- Open integration layer allowing metadata.
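To make the idea of an extensible data model with relationships more concrete, here is a minimal Python sketch, assuming a simple in-memory repository. The class and method names (Asset, MetadataRepository, add_asset, relate, related) and all asset names are hypothetical; this is not the API of Data360 Govern or any other product. Asset types are free-form strings, so adding a new metadata type needs no schema change.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A metadata asset; `kind` is free-form, so new types need no schema change."""
    name: str
    kind: str  # e.g. "Business Term", "Table", "KPI" (illustrative only)
    attributes: dict = field(default_factory=dict)


class MetadataRepository:
    """Hypothetical in-memory repository of assets and typed relationships."""

    def __init__(self):
        self.assets = {}
        # Each relationship is stored as (source name, relationship type, target name).
        self.relationships = []

    def add_asset(self, name, kind, **attributes):
        asset = Asset(name, kind, dict(attributes))
        self.assets[name] = asset
        return asset

    def relate(self, source, rel_type, target):
        # Refuse to link assets that have not been registered yet.
        if source not in self.assets or target not in self.assets:
            raise KeyError("both assets must exist before they can be related")
        self.relationships.append((source, rel_type, target))

    def related(self, name, rel_type=None):
        """Names of assets that `name` points to, optionally filtered by relationship type."""
        return [t for s, r, t in self.relationships
                if s == name and (rel_type is None or r == rel_type)]

    def assets_of_kind(self, kind):
        return sorted(a.name for a in self.assets.values() if a.kind == kind)


# Business and technical metadata live side by side and are linked by relationships.
repo = MetadataRepository()
repo.add_asset("Customer", "Business Term", definition="A party that buys our products")
repo.add_asset("crm.customers", "Table", source="CRM database")
repo.add_asset("Monthly churn", "KPI", owner="Customer Success")
repo.relate("Customer", "implemented by", "crm.customers")
repo.relate("Monthly churn", "calculated from", "crm.customers")
print(repo.related("Customer"))
```

Keeping relationships as typed triples rather than fixed foreign keys is one simple way to let the model grow as new asset types appear.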
Unified Data Lineage
Data catalogues such as Data360 Govern offer some form of automated metadata harvesting. For example, we can ingest database structures by connecting directly to the underlying database and reading its table definitions.
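As a rough illustration of harvesting structures directly from a database, the sketch below uses Python's built-in sqlite3 module against an in-memory SQLite database: it lists tables from sqlite_master and reads each table's columns with PRAGMA table_info. The harvest_schema function and the sample tables are hypothetical, and a real catalogue connector would target your own database platform rather than SQLite.

```python
import sqlite3


def harvest_schema(conn):
    """Return {table_name: [(column_name, declared_type, is_primary_key), ...]}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA cannot take bound parameters, so quote the identifier (escaping any ").
        quoted = '"' + table.replace('"', '""') + '"'
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [(name, col_type, bool(pk)) for _, name, col_type, _, _, pk in columns]
    return schema


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
""")

for table, columns in harvest_schema(conn).items():
    print(table, columns)
```

The harvested dictionary is the kind of technical metadata that would then be loaded into the catalogue and enriched with business context.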
However, for most catalogues, technical metadata and lineage are ingested via connectors to underlying ETL tools, data modelling tools and the like. In many cases these connectors are limited: for example, a metadata vendor that also provides an ETL tool may offer connectors for its own ETL tool but only very limited connectors for third-party tools.
In practice, most organisations depend on multiple Extract, Transform, Load (ETL) tools, processes and code to move data around. For example, we may move data from operational systems to the enterprise data warehouse (EDW) using an enterprise ELT tool. Once data is in the EDW, we may use stored procedures to manipulate it further, e.g. to aggregate raw data or to move it into the EDW schemas. Data may be manipulated again in the reporting layer. Tracking these lineages and keeping them up to date as they change is very difficult, but it is increasingly an organisational necessity if reporting is to be trusted.
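The multi-hop chain described above can be modelled as a directed graph in which each edge records which process (an ELT job, a stored procedure or a report query) moved data from one dataset to another. The sketch below, in which every dataset and process name is hypothetical, traces a report back to its original sources. Stitching edges harvested from different tools into one graph is roughly the idea behind a unified lineage view.

```python
from collections import defaultdict, deque

# Each edge: (source dataset, target dataset, process that moves the data).
# All names are hypothetical; each hop could come from a different tool.
EDGES = [
    ("crm.customers", "edw.stg_customers", "ELT job: load_crm"),
    ("erp.invoices", "edw.stg_invoices", "ELT job: load_erp"),
    ("edw.stg_customers", "edw.dim_customer", "stored procedure: sp_build_dim_customer"),
    ("edw.stg_invoices", "edw.fact_revenue", "stored procedure: sp_aggregate_revenue"),
    ("edw.dim_customer", "report.revenue_by_customer", "report query"),
    ("edw.fact_revenue", "report.revenue_by_customer", "report query"),
]


def build_upstream_index(edges):
    """Map each target dataset to the (source, process) pairs that feed it."""
    upstream = defaultdict(list)
    for source, target, process in edges:
        upstream[target].append((source, process))
    return upstream


def trace_upstream(dataset, upstream):
    """Breadth-first walk back from `dataset`; returns hops as (source, target, process)."""
    hops, seen, queue = [], {dataset}, deque([dataset])
    while queue:
        current = queue.popleft()
        for source, process in upstream.get(current, []):
            hops.append((source, current, process))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return hops


def original_sources(dataset, upstream):
    """Datasets in the lineage of `dataset` that have no upstream feed of their own."""
    return sorted({s for s, _, _ in trace_upstream(dataset, upstream) if s not in upstream})


upstream = build_upstream_index(EDGES)
for source, target, process in trace_upstream("report.revenue_by_customer", upstream):
    print(f"{source} -> {target} via {process}")
print(original_sources("report.revenue_by_customer", upstream))
```

Walking the same graph in the other direction (from a source to everything it feeds) would give a simple impact analysis.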
Our partner MANTA provides a specialist unified lineage platform. It makes it quick and easy to connect to and ingest metadata from most commonly used data sources, reporting tools, modelling tools and ETL tools, and it can even read code, such as Java, SQL and COBOL, to trace movements of and changes to data.
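Reading code to find lineage is much harder than it sounds, but a deliberately naive sketch shows the basic principle: scan SQL for the table being written and the tables being read. The regular expressions below only cope with simple INSERT ... SELECT statements and will miss subqueries, CTEs, dynamic SQL and much more; the sketch makes no claim about how MANTA's parsers work. The function and table names are hypothetical.

```python
import re

# Toy patterns: only simple "INSERT INTO target ... SELECT ... FROM a JOIN b" statements.
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return (source_table, target_table) pairs found in simple INSERT ... SELECT statements."""
    edges = []
    for statement in sql.split(";"):
        target = TARGET_RE.search(statement)
        if not target:
            continue  # not a write, so no lineage edge to record
        for source in SOURCE_RE.findall(statement):
            edges.append((source, target.group(1)))
    return edges


# Hypothetical stored-procedure body.
PROCEDURE_BODY = """
INSERT INTO edw.fact_sales (product_id, month, revenue)
SELECT o.product_id, o.month, SUM(o.amount)
FROM edw.stg_orders o
JOIN edw.dim_product p ON p.product_id = o.product_id
GROUP BY o.product_id, o.month;
"""

print(extract_lineage(PROCEDURE_BODY))
```

A production-grade tool would use a full SQL parser for each dialect instead of regular expressions, which is exactly why specialist lineage platforms exist.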
While MANTA provides its own lineage view of data, it also exposes this lineage to third-party data catalogues such as Infogix, Collibra, IBM, Informatica and more. MANTA enhances the data catalogue by harvesting data flows and keeping them synchronised with their business context.
Understanding your ERP or CRM
Another niche application we have discovered is providing business context and meaning to the metadata layers of common enterprise ERP and CRM packages such as SAP, Salesforce, and the Oracle and Microsoft stacks.
These platforms have large, complex table structures that may not be meaningful when accessed at the database level. Safyr from Silwood Technology provides self-service metadata discovery for your ERP or CRM systems. Safyr makes it easy to isolate, for instance, the tables and columns used for the customer master in your ERP or CRM, add business context and link it to the underlying tables, and present this metadata to your data catalogue.
For example, Safyr makes finding personal data in SAP much simpler. Another use case is understanding the impact of migrating from SAP ECC to SAP S/4HANA, or from PeopleSoft to Dynamics.
No one size fits all
Using these descriptions, you can make decisions that suit your organisation’s size, complexity and priorities. What is clear is that, for large organisations, a multifaceted approach to metadata management reduces manual effort and gives more accurate results.