How to Do Data Modeling the Right Way
Data modeling supports collaboration among business stakeholders – people with different job roles and skills – and keeps their work aligned with business objectives.
Data resides everywhere in a business, on-premises and in private or public clouds. And it exists across these hybrid architectures in different formats: big, unstructured data and traditional structured business data may physically sit in different places.
What’s desperately needed is a way to understand, in detail, the relationships and interconnections among the many entities in these data sets.
Visualizing data from anywhere – defined by its context and meaning in a central model repository, along with the rules governing the use of those data elements – unifies enterprise data management. A single source of data truth helps companies begin to leverage data as a strategic asset.
What, then, should users look for in a data modeling product to support their governance/intelligence requirements in the data-driven enterprise?
Nine Steps to Data Modeling
- Provide metadata and schema visualization regardless of where data is stored
Data modeling solutions need to account for metadata and schema visualization to mitigate complexity and increase collaboration and literacy across a broad range of data stakeholders. They should automatically generate data models, providing a simple, graphical display to visualize a wide range of enterprise data sources based on a common repository of standard data assets through a single interface.
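As a rough illustration of what automated model generation might look like, the Python sketch below turns a small, hypothetical set of tables and foreign keys into a Graphviz DOT description; any DOT renderer can then display the entities and their relationships graphically. The table and column names are illustrative assumptions.

```python
# Sketch: emit a Graphviz DOT entity-relationship diagram from schema
# metadata. The tables and foreign keys are hypothetical examples; in
# practice they would come from the central model repository.
tables = {
    "customer": ["id", "name", "region"],
    "orders":   ["id", "customer_id", "placed_at"],
}
foreign_keys = [("orders", "customer_id", "customer", "id")]

def to_dot(tables, foreign_keys):
    lines = ["digraph erd {", "  node [shape=record];"]
    for name, cols in tables.items():
        label = "|".join([name] + cols)
        lines.append(f'  {name} [label="{{{label}}}"];')
    for src, src_col, dst, dst_col in foreign_keys:
        lines.append(f'  {src} -> {dst} [label="{src_col} -> {dst_col}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(tables, foreign_keys))  # pipe the output to `dot -Tpng` to render
```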
- Have a process and mechanism to capture, document and integrate business and semantic metadata for data sources
Data models are among the best ways to view metadata in support of data governance and intelligence, since they can depict the metadata content for a data catalog. A data modeling solution should let users create business and semantic metadata that augments physical data for ingestion into a data catalog, giving IT and business users a way to work with the metadata and data structures underpinning source systems.
High-functioning data catalogs will provide a technical view of information flow as well as deeper insights into semantic lineage – that is, how a data asset’s metadata maps to its corresponding business usage.
Data stewards can associate business glossary terms, data element definitions, data models and other semantic details with different mappings, drawing upon visualizations that demonstrate where business terms are in use, how they are mapped to different data elements in different systems and the relationships among these different usage points.
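As a minimal sketch of how such term-to-element mappings might be represented, assuming Python and entirely hypothetical system, table and column names:

```python
from dataclasses import dataclass

@dataclass
class TermMapping:
    """Links a business glossary term to a physical data element."""
    term: str        # business glossary term
    definition: str  # agreed business definition
    system: str      # source system where the element lives
    table: str
    column: str

# Hypothetical mappings showing where the term "Customer" is in use.
mappings = [
    TermMapping("Customer", "A party that buys goods or services",
                "CRM", "crm.accounts", "account_name"),
    TermMapping("Customer", "A party that buys goods or services",
                "ERP", "erp.bill_to", "bill_to_name"),
]

# A steward can now answer: in which systems is this term in use?
print({m.system for m in mappings if m.term == "Customer"})
```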
- Create database designs from visual models
Time is saved and errors are reduced when visual data models can be used to translate the high-quality data sources that populate them into new relational and non-relational database designs, and to standardize, deploy and maintain those designs.
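A minimal sketch of that forward-engineering step, assuming SQLAlchemy is available; the table and column names are illustrative:

```python
# Forward engineering: render a logical model definition as database DDL.
# Requires SQLAlchemy; table and column names are illustrative.
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = MetaData()
customer = Table(
    "customer", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100), nullable=False),
)
orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    Column("customer_id", Integer, ForeignKey("customer.id"), nullable=False),
)

# Emit the same logical model as PostgreSQL DDL.
for table in metadata.sorted_tables:
    print(CreateTable(table).compile(dialect=postgresql.dialect()))
```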
- Reverse engineer databases into data models
Ideally a solution will let users create logical and physical data models by adroitly extracting information from an existing data source – ERP, CRM or other enterprise application – and choosing the objects to use in the model.
This capability translates the technical formats of the major database platforms into detailed physical entity-relationship models, rich in business and semantic metadata, that visualize and diagram complex database objects.
Database code reverse-engineering, integrated development environment connections and model exchange will ensure efficiency, effectiveness and consistency in the design, standardization, documentation and deployment of data structures for comprehensive enterprise database management. It also helps if the offline reverse-engineering process is automated so that modelers can focus on other high-value tasks.
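A rough sketch of that reverse-engineering step, assuming SQLAlchemy and an existing database at a placeholder connection URL:

```python
# Reverse engineering: extract a physical model from a live database.
# Requires SQLAlchemy; the connection URL is a placeholder.
from sqlalchemy import MetaData, create_engine

engine = create_engine("postgresql://user:password@host/dbname")
metadata = MetaData()
metadata.reflect(bind=engine)  # read tables, columns and keys from the catalog

for table in metadata.sorted_tables:
    cols = ", ".join(f"{c.name} {c.type}" for c in table.columns)
    fks = [f"{fk.parent.name} references {fk.column.table.name}.{fk.column.name}"
           for fk in table.foreign_keys]
    print(f"{table.name}({cols})", fks or "")
```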
- Harness model reusability and design standards
When data modelers can take advantage of intuitive graphical interfaces, they’ll have an easier time viewing data from anywhere in terms of its context, meaning and relationships, in support of artifact reuse for large-scale data integration, master data management, big data and business intelligence/analytics initiatives.
It’s typically the case that modelers will want to create models containing reusable objects such as modeling templates, entities, tables, domains, automation macros, naming and database standards, formatting options and so on.
Customizable functionality should make it possible to modify how data types are mapped to specific DBMS data types and to create reusable design standards across the business. Reuse helps lower the costs of development and maintenance and ensure data quality for governance requirements.
Additionally, templates should be available to help enable standardization and reuse while accelerating the development and maintenance of models. Standardization and reuse of models across data management environments will be possible when there is support for model exchange.
Consistency and reuse are more efficient when model development and assets are centralized. That makes it easier to publish models across various stakeholders and incorporate comments and changes from them as necessary.
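As a minimal sketch of reusable design standards, assuming SQLAlchemy; the “domains” (named, shared column types) and the naming rule are illustrative assumptions:

```python
# Sketch of reusable design standards: shared "domains" (named, reusable
# column types) plus a simple naming-standard check. Names are illustrative.
from sqlalchemy import Column, MetaData, Numeric, String, Table

# Define each domain once; every model reuses it.
DOMAINS = {
    "money":      Numeric(18, 2),
    "short_name": String(100),
    "code":       String(10),
}

def col(name, domain, **kwargs):
    """Create a column from a named domain, enforcing type reuse."""
    return Column(name, DOMAINS[domain], **kwargs)

metadata = MetaData()
product = Table(
    "product", metadata,
    col("product_code", "code", primary_key=True),
    col("product_name", "short_name", nullable=False),
    col("list_price", "money"),
)

# A minimal naming standard: lowercase identifiers with no spaces.
for table in metadata.sorted_tables:
    for column in table.columns:
        assert column.name == column.name.lower(), column.name
```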
- Enable user configuration and point-and-click report interfaces
A key part of data modeling is creating text-based reports of diagrams and metadata in a number of formats – HTML, PDF and CSV. By using point-and-click interfaces, a solution can make it easier to create detailed metadata reports of models and drill down into granular graphical views of reports that cover object types – tables, UDPs and more.
The process is even simpler when users can take advantage of out-of-the-box reports pertinent to their needs, as well as create their own for individual models or across multiple models.
When generic ODBC interfaces are included, options grow for querying metadata, regardless of where it is sourced, from a variety of tools and interfaces.
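A minimal sketch of such a generic metadata query, assuming pyodbc, a source that exposes the standard information_schema views, and a placeholder DSN:

```python
# Query schema metadata over a generic ODBC interface and write a CSV
# report. Requires pyodbc; the DSN and file name are placeholders.
import csv
import pyodbc

conn = pyodbc.connect("DSN=warehouse")  # any ODBC-accessible source
cursor = conn.cursor()
cursor.execute(
    "SELECT table_name, column_name, data_type "
    "FROM information_schema.columns ORDER BY table_name"
)

with open("metadata_report.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["table", "column", "type"])
    writer.writerows(cursor.fetchall())
conn.close()
```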
- Support an all-inclusive environment of collaboration
When solutions focus on model management in a centralized repository, they enable modular, bidirectional collaboration among all data generators – human or machine – as well as stewards and consumers across the enterprise.
Data silos, of course, are the enemies of data governance. They make it difficult to gain a clear understanding of where information resides and how data is commonly defined.
It’s far better to centralize and manage access to ordered assets – whether for particular internal staff roles or for business partners granted role-based, read-only access – to maintain security.
Such an approach supports coordinated version control, model change management and conflict resolution, and enables cross-model impact analysis across stakeholders. Modeler productivity and independence can be enhanced, too.
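As a small sketch of role-based, read-only access at the database level, assuming PostgreSQL and SQLAlchemy; the role and schema names are hypothetical:

```python
# Grant a business partner read-only access to published models.
# Assumes PostgreSQL; role and schema names are hypothetical.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://admin:password@host/modelrepo")
with engine.begin() as conn:
    conn.execute(text("CREATE ROLE partner_reader NOLOGIN"))
    conn.execute(text("GRANT USAGE ON SCHEMA published_models TO partner_reader"))
    conn.execute(text(
        "GRANT SELECT ON ALL TABLES IN SCHEMA published_models TO partner_reader"
    ))
```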
- Promote data literacy
Stakeholder collaboration, in fact, depends on and is optimized by data literacy, the key to creating an organization that is fluent in the language of data. Everyone in the enterprise – from data scientists to ETL developers to compliance officers to C-level executives – ought to be assured of having a dynamic view of high-quality data pipelines operating on common and standardized terms.
So, it is critical that solutions focus on making the pipeline data available and discoverable in a way that reflects different user roles. When consumers can view data relevant to their roles and understand its definition within the business context in which they operate, they are better able to produce accurate, actionable insights and to collaborate across the enterprise to enact them for the desired outcomes.
Data literacy built on business glossaries – which enable the collaborative definition of enterprise data in business terms and rules, with built-in accountability and workflow – promotes adherence to governance requirements.
- Embed data governance constructs within data models
Data governance should be integrated throughout the data modeling process. It manifests in a solution’s ability to adroitly discover and document any data from anywhere for consistency, clarity and artifact reuse across large-scale data integration, master data management, metadata management and big data requirements.
Ingesting metadata from data models yields data catalogs and business glossaries with properly defined data definitions in a controlled central repository, ready to support business intelligence and analytics initiatives.
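One way to embed such governance constructs directly in a model is sketched below with SQLAlchemy’s per-column info dictionary; the classification scheme and steward names are hypothetical:

```python
# Embed governance metadata directly in the data model so it travels
# with the schema. Classification values and stewards are hypothetical.
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()
customer = Table(
    "customer", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(100),
           info={"classification": "PII", "steward": "data-governance-team"}),
    Column("region", String(10), info={"classification": "public"}),
)

# Downstream tools can read the embedded tags, e.g. to list every PII
# column before granting access or generating compliance reports.
pii = [(t.name, c.name) for t in metadata.sorted_tables
       for c in t.columns if c.info.get("classification") == "PII"]
print(pii)
```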
You Don’t Know What You’ve Got
Bottom line, without centralized data models and a metadata hub, there is no efficient means to comply with industry regulations and business standards regarding security and privacy; set permissions for access controls; and consolidate information in easy-to-understand reports for business analysts.
Participating in data modeling to classify the data most important to the business, in terms that are meaningful to the business, and to break down complex data organization scenarios supports critical business reporting, intelligence and analytics tasks. That’s a clear need, as organizations today analyze and use less than 0.5 percent of the information they take in – a huge loss of potential value in the age of data-driven business.
Without illustrative data models, businesses may not even realize that they already have the data needed for a new report; time is lost and costs increase as data is gathered and interfaces are rebuilt.