Data governance focuses on improving data quality, protecting access to data, establishing business definitions, maintaining metadata and documenting data policies. Data governance relies on having the right people involved at the right time, using the right data, to make the right decisions.
Our role is to ensure that the highest quality data possible is delivered throughout the university and provides valuable information to serve individual and organizational needs.
The university's institutional information is a valuable asset and must be maintained and protected as such. It is vital to have accurate, trusted data in order to make sound decisions at all levels of an organization. Data governance helps to provide data transparency and results in confidence among university faculty, staff and management to trust and rely on data for information and decision support.
The following principles are set forth as minimum standards to govern the appropriate use and management of institutional data:
Institutional data is the property of the George Washington University and shall be managed as a key asset
Unnecessary duplication of institutional data is discouraged
Quality standards for institutional data shall be defined and monitored
Institutional data shall be protected
Institutional data shall be accessible according to defined needs and roles
Institutional metadata shall be recorded, managed and utilized
Institutional representatives will be held accountable to their roles and responsibilities
Necessary maintenance of institutional data shall be defined
Resolution of issues related to institutional data shall follow consistent processes
Data stewards are responsible for the subset of data in their charge
Data quality is a measure of reliability in data to provide insight and deliver business value. DAMA defines high quality data as data that meets the needs and expectations of its consumers.
Data quality dimensions and business rules are two concepts that help determine data quality. Dimensions provide a vocabulary to quantify data quality, while business rules define whether data is meeting the needs and expectations of data consumers. Formally, business rules “describe expectations about the quality characteristics of data.” A data quality dimension is “a measurable feature or characteristics of data”; dimensions “provide a basis for measurable rules” of data quality. There are many different sets of data quality dimensions, with significant overlap between dimensions, but some of the more common dimensions are:
An example of a business rule is: a physical address should be a valid mailing address. This rule involves the accuracy and validity dimensions. In terms of accuracy, the address must reflect a real physical location to which mail can be delivered. In terms of validity, it must conform to the specific format dictated by postal services. These dimensions can be used as the basis for metrics of data quality, e.g. address data needs to meet a certain threshold of accuracy and validity to be considered high quality.
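The address rule can be turned into a measurable metric along the following lines. This is a minimal sketch, not GW's actual rule: the format pattern, the sample addresses and the 95% threshold are all assumptions made for illustration. It covers only the validity dimension, since checking accuracy would require comparison against postal reference data.

```python
import re

# Hypothetical format rule: "<number> <street>, <city>, <ST> <ZIP or ZIP+4>".
# Real postal formats are richer; this pattern is an assumption for illustration.
ADDRESS_PATTERN = re.compile(
    r"^\s*\d+\s+[^,]+,\s*[^,]+,\s*[A-Z]{2}\s+\d{5}(-\d{4})?\s*$"
)

def is_valid_format(address: str) -> bool:
    """Validity dimension: does the address conform to the expected format?"""
    return ADDRESS_PATTERN.match(address) is not None

def validity_rate(addresses: list[str]) -> float:
    """Share of addresses passing the format rule (0.0 for an empty list)."""
    if not addresses:
        return 0.0
    return sum(is_valid_format(a) for a in addresses) / len(addresses)

# Hypothetical sample data.
sample = [
    "801 22nd Street NW, Washington, DC 20052",
    "22nd Street, Washington",
    "1600 Pennsylvania Ave NW, Washington, DC 20500-0003",
]

THRESHOLD = 0.95  # hypothetical threshold for "high quality"
rate = validity_rate(sample)
meets_threshold = rate >= THRESHOLD
print(f"validity rate: {rate:.0%}, meets threshold: {meets_threshold}")
```

A pattern check like this says nothing about whether mail can actually be delivered to the address, which is why the rule also involves the accuracy dimension.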
Why do Data Quality?
In a general sense, Gartner Inc. has estimated poor data quality costs the average enterprise about $15 million annually. IBM estimated poor data quality costs the US economy a staggering $3.1 trillion a year. Some ways poor data quality might impact the bottom line include:
Techniques & Services at GW
The Data Management team is available to support any staff or faculty group in managing their data quality. Data quality improvement is typically performed in a cycle with four general phases: discovery/analysis, definition, remediation, and monitoring. Discovery is principally done with a technique called data profiling. Definition is the phase where business rules for the data are established, based on both business processes and the characteristics of the data found in discovery. As noted, high quality data meets the needs of its everyday users and consumers. Therefore, business rules are always formulated by people working in the line of business rather than by technical staff. After the rules are laid out, they are used to validate the data. If the data does not conform to the rules, the data can be cleaned up and the issues remediated. Finally, data quality is continually monitored to ensure it complies with the business needs. One method of monitoring data quality is through data quality scorecards. The Data Management team can assist with any single phase, any combination of phases, or the entire data quality life cycle.
Data profiling is an analysis technique, typically using summary statistics, to discover characteristics like structure, content, and quality of a collection of data. Some of the measures and metadata you might expect in a profile include:
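As a rough illustration of such a profile, the sketch below computes a few summary statistics for a single column. The function name, the chosen measures and the sample values are assumptions made for this example; they are not the measures produced by any particular profiling tool.

```python
from collections import Counter

def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, lengths, top values."""
    total = len(values)
    # Treat None and blank strings as missing values.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    lengths = [len(str(v)) for v in non_null]
    counts = Counter(non_null)
    return {
        "row_count": total,
        "null_count": total - len(non_null),
        "distinct_count": len(counts),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "most_common": counts.most_common(3),
    }

# Hypothetical sample column of state codes with typical quality problems:
# missing values, inconsistent case, and a spelled-out state name.
states = ["DC", "VA", "MD", "DC", None, "dc", "", "Virginia"]
profile = profile_column(states)
for measure, value in profile.items():
    print(f"{measure}: {value}")
```

Even this small profile surfaces candidates for business rules: the length range of 2 to 8 and the case-sensitive distinct count both suggest the column is not consistently a two-letter code.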
Data Validation & Cleansing
In addition to rules defined by the lines of business, pre-built rules and reference information are available. Some of the most common existing rules include parsing, standardizing, and validating email addresses, physical addresses, and phone numbers. This also includes a USPS-certified tool to validate and correct mailing addresses in the United States. User-defined and pre-defined rules form the basis for transformations that change poor quality data into cleaned, high quality data. Sometimes, if the data is mostly static, it is only cleaned once. Most data is not static, however, and cleansing is scheduled at regular intervals to ensure data quality remains high.
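The kind of standardizing and validating transformation described above might look like the following sketch. The function names, the simplified email pattern and the NNN-NNN-NNNN target format are assumptions for illustration and do not represent the pre-built rules themselves. Lowercasing the whole email address is likewise an assumed house convention.

```python
import re

# Deliberately simple pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardize_email(raw):
    """Trim and lowercase; return None if the result fails the simple pattern."""
    cleaned = raw.strip().lower()
    return cleaned if EMAIL_PATTERN.match(cleaned) else None

def standardize_us_phone(raw):
    """Keep digits only, drop a leading country code 1, format as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # cannot be standardized; flag for manual remediation
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# Hypothetical raw values.
print(standardize_email("  Jane.Doe@Example.EDU "))
print(standardize_us_phone("+1 202.555.0100"))
print(standardize_us_phone("555-0100"))
```

Returning `None` rather than guessing keeps values that cannot be repaired visible, so they can be routed to remediation instead of being silently altered.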
Data Quality Monitoring & Scorecards
A data quality scorecard is a tool to monitor data quality and measure its business impact. The scorecard typically defines data quality metrics derived from data quality rules, the acceptable threshold for each metric, and the costs associated with failing to meet the thresholds. The scorecard then gives a picture of the organization’s data quality and its business impact. Finally, the scorecard is scheduled to run regularly and flag metrics that do not meet their thresholds.
An example of one metric on a scorecard: given the business rule that a customer ID must be unique, the metric might be the number of duplicate customer IDs, and the threshold is that 100% of IDs are unique (0% are duplicates). The cost associated with the metric is $35.00 per duplicate ID, because it takes a person an hour to clean up a duplicate ID and $35.00 is their hourly wage.
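The duplicate-ID metric can be sketched as follows. The function name and sample IDs are hypothetical, and the sketch assumes that each occurrence of an ID beyond its first counts as one duplicate instance; a real scorecard might count duplicates differently.

```python
from collections import Counter

def duplicate_id_metric(customer_ids, cost_per_duplicate=35.00):
    """Score the uniqueness rule: count duplicates, percent unique, and cost."""
    counts = Counter(customer_ids)
    # Assumption: every occurrence beyond the first is one duplicate instance.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    total = len(customer_ids)
    pct_unique = 100.0 * (total - duplicates) / total if total else 100.0
    return {
        "duplicates": duplicates,
        "pct_unique": pct_unique,
        "cost": duplicates * cost_per_duplicate,
        "flagged": pct_unique < 100.0,  # threshold: 100% of IDs are unique
    }

# Hypothetical sample data: C002 appears twice, C003 three times.
ids = ["C001", "C002", "C002", "C003", "C003", "C003"]
result = duplicate_id_metric(ids)
print(result)
```

With the sample above there are three duplicate instances, so the metric is flagged and the estimated clean-up cost is 3 × $35.00 = $105.00.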
DAMA-DMBOK: Data Management Body of Knowledge (Basking Ridge, NJ: Technics Publications, 2017), 454.
Saul Judah and Ted Friedman, “How to Create a Business Case for Data Quality Improvement,” Gartner, Inc., last modified April 23, 2018, https://www.gartner.com/document/code/347272?ref=ddisp&refval=347272.
The Four V’s of Big Data, infographic, https://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs....
Thomas C. Redman, “Bad Data Costs the U.S. $3 Trillion Per Year,” Harvard Business Review, last modified September 22, 2016, https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year.
Steve Lohr, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” New York Times, August 17, 2014, https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hu....
No one person, department, division, school or group "owns" data, even though specific units bear some responsibility for certain data. Several roles and responsibilities govern the management of, access to and accountability for institutional data.
A community is a grouping of users. Communities serve as taxonomies of high-level institutional data subject areas and are key to assigning accountability and responsibility. The following data governance communities have been established:
The Data Governance Committee (DGC) is composed of data stewards and data custodians from across all functions and departments of the university. The committee meets every other month to review data quality issues, discuss proposed business terms, review policies and discuss other institutional data-related topics. The Data Governance Committee is composed of the following data communities:
Office of the Registrar
Office of Military and Veterans Affairs
School of Medicine and Health Sciences (SMHS)
Office of Institutional Research and Planning
Office of Survey Research and Analysis
Compliance & Privacy
Diversity & Inclusion
Division of Operations
Housing & Residential Life
Safety and Security
University Budget Office
The Data Governance Center is the single source of truth for all of our data governance and stewardship activities. It is used to manage all business definitions and key performance indicators (KPIs), support our data stewards in their daily activities and provide traceability between business and technical assets, policies and rules. It is a vital step toward achieving our vision of commonly understood, consistent, trusted and high-quality data throughout the institution.
|Business Glossary||The business glossary is used to define, collaborate and align critical business definitions. The glossary helps to improve our understanding of business terminology so we can communicate more effectively across the institution.|
|Data Dictionary||The data dictionary leverages the terms from the business glossary and maps them to the actual tables and attributes in our systems like Banner and the Enterprise Accounting System and our analytical applications like the data warehouse.|
|Portfolio||The portfolio is used to document the details related to a report, cube or visualization.|
|Issue Management||Issue management is a centralized way to report data quality issues that can then be resolved by the data governance team. Report issues, resolve them through triage and review, escalation and assignment to the right resources, and then notify the appropriate parties of the proposed solution.|
|Policies, Rules and Standards||These tools provide the ability to define and manage policies in collaboration with stakeholders and break them down into a rules hierarchy.|
Log in to access the Data Governance Center using your GW NetID (the part of your GW email address before the "@") and password.
These policies, articles and tools make valuable resources for understanding data governance at GW.
801 22nd Street, NW B101
Washington, DC 20052