Data Governance

Data governance focuses on improving data quality, protecting access to data, establishing business definitions, maintaining metadata and documenting data policies. Data governance relies on the right people involved at the right time using the right data to make the right decisions. 

Our role is to ensure that the highest quality data possible is delivered throughout the university and provides valuable information to serve individual and organizational needs.

The university's institutional information is a valuable asset and must be maintained and protected as such. It is vital to have accurate, trusted data in order to make sound decisions at all levels of an organization. Data governance helps provide data transparency and gives university faculty, staff and management the confidence to rely on data for information and decision support.

The following principles are set forth as minimum standards to govern the appropriate use and management of institutional data:

  • Institutional data is the property of the George Washington University and shall be managed as a key asset

  • Unnecessary duplication of institutional data is discouraged

  • Quality standards for institutional data shall be defined and monitored

  • Institutional data shall be protected

  • Institutional data shall be accessible according to defined needs and roles

  • Institutional metadata shall be recorded, managed and utilized

  • Institutional representatives will be held accountable for their roles and responsibilities

  • Necessary maintenance of institutional data shall be defined

  • Resolution of issues related to institutional data shall follow consistent processes

  • Data stewards are responsible for the subset of data in their charge

Data quality is a measure of how reliably data can provide insight and deliver business value. DAMA defines high quality data as data that meets the needs and expectations of its consumers.[1]

Data quality dimensions and business rules are two concepts that help determine data quality. Dimensions provide a vocabulary for quantifying data quality, while business rules define whether data meets the needs and expectations of its consumers. Formally, business rules “describe expectations about the quality characteristics of data.”[2] A data quality dimension is “a measurable feature or characteristic of data,” and dimensions “provide a basis for measurable rules” of data quality.[3] There are many different sets of data quality dimensions, with significant overlap among them, but some of the more common dimensions are:

  • Accuracy – degree to which data correctly represents real-world values or entities
  • Completeness – whether required data is present
  • Consistency / Integrity – whether records and their attributes are consistent across systems and over time
  • Reasonability – whether data meets the assumptions and expectations of its domain
  • Timeliness – whether data is up to date and/or available when it is needed
  • Uniqueness – degree to which data is free of duplicate values
  • Validity / Conformity – whether data conforms to the defined domain of values in type, format, and precision

An example of a business rule is: a physical address should be a valid mailing address. This rule involves the accuracy and validity dimensions. In terms of accuracy, a valid mailing address reflects a real physical location to which mail can be delivered. For an address to be valid, it needs to conform to the specific format dictated by postal services. These dimensions can be used as the basis for metrics of data quality, e.g., address data must meet a certain threshold of accuracy and validity to be considered high quality.
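
A minimal sketch of how the validity half of this rule might be automated is shown below; the field names and patterns are illustrative assumptions, not GW's actual rules, and accuracy (actual deliverability) would require an external check such as a USPS-certified service:

    import re

    # Validity: does the record conform to the expected US address format?
    ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")    # 5-digit ZIP or ZIP+4
    STATE_PATTERN = re.compile(r"^[A-Z]{2}$")        # two-letter state code

    def is_valid_mailing_address(record: dict) -> bool:
        # Field names ("street", "city", "state", "zip") are hypothetical.
        return (
            bool(record.get("street", "").strip())
            and bool(record.get("city", "").strip())
            and bool(STATE_PATTERN.match(record.get("state", "")))
            and bool(ZIP_PATTERN.match(record.get("zip", "")))
        )

    # A conforming address passes the validity rule
    print(is_valid_mailing_address(
        {"street": "2121 I St NW", "city": "Washington",
         "state": "DC", "zip": "20052"}))  # True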

 

Why do Data Quality?

Gartner, Inc. has estimated that poor data quality costs the average enterprise about $15 million annually.[4] IBM estimated that poor data quality costs the US economy a staggering $3.1 trillion a year.[5] Some ways poor data quality might impact the bottom line include:

 

  • Poor quality data hampers analytical activity; business decisions made from bad data are less reliable, or not reliable at all
  • Wasted time and material resources
    • Some estimates suggest information workers spend as much as 50% of their time improving data quality,[6] and data scientists may spend between 50% and 80% of their time cleaning data.[7]
  • Opportunity costs
    • E.g., fewer leads generated, fewer conversions, grants not won, donations or gifts not received

 

Techniques & Services at GW

 

The Data Management team is available to support any staff or faculty group in managing their data quality. Data quality improvement is typically performed in a cycle with four general phases: discovery/analysis, definition, remediation, and monitoring. Discovery is principally done with a technique called data profiling. Definition is the phase where business rules for the data are established, based both on business processes and on characteristics of the data found in discovery. As noted, high quality data meets the needs of its everyday users and consumers; therefore, business rules are always formulated by people working in the line of business rather than by technical staff. After the rules are laid out, they are used to validate the data. If the data does not conform to the rules, it can be cleaned up and the underlying issues remediated. Finally, data quality is continually monitored to ensure it complies with business needs; one method of monitoring is the data quality scorecard. The Data Management team can assist with any single phase, any combination of phases, or the entire data quality life cycle.

 

Data Profiling

 

Data profiling is an analysis technique, typically using summary statistics, to discover characteristics such as the structure, content, and quality of a collection of data.[8] Some of the measures and metadata you might expect in a profile include:

  • Longest (or maximum) and shortest (or minimum) values
  • Frequency distributions of values
  • Distribution of null, duplicate, and unique values
  • Patterns of data
  • Inferred data type
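
As an illustration, the sketch below computes several of these measures for a single column using pandas; the sample data and column name are assumptions made for the example:

    import pandas as pd

    # A toy column to profile; in practice this would come from a real table.
    df = pd.DataFrame({"email": ["a@gwu.edu", "b@gwu.edu", None, "a@gwu.edu"]})
    col = df["email"]

    profile = {
        "count": len(col),
        "nulls": int(col.isna().sum()),
        "distinct": int(col.nunique(dropna=True)),
        "duplicates": int(col.dropna().duplicated().sum()),
        "min_length": int(col.dropna().str.len().min()),      # shortest value
        "max_length": int(col.dropna().str.len().max()),      # longest value
        "top_values": col.value_counts().head(3).to_dict(),   # frequency distribution
        "inferred_type": str(col.dtype),
    }
    print(profile)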

Data Validation & Cleansing

 

In addition to rules defined by the lines of business, pre-built rules and reference information are available. Some of the most common pre-built rules include parsing, standardizing, and validating email addresses, physical addresses, and phone numbers. These include a USPS-certified tool to validate and correct mailing addresses in the United States. User-defined and pre-defined rules form the basis for transformations that change poor quality data into cleaned, high quality data. Sometimes, if the data is mostly static, it is only cleaned once. Most data is not static, however, and cleansing is scheduled at regular intervals to ensure data quality remains high.
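
A minimal sketch of what such a parse/standardize/validate rule could look like for US phone numbers follows; the formatting rule here is an illustrative assumption, not one of the actual pre-built rules:

    import re
    from typing import Optional

    def standardize_us_phone(raw: str) -> Optional[str]:
        """Parse, standardize, and validate a US phone number."""
        digits = re.sub(r"\D", "", raw or "")           # strip non-digits
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]                         # drop country code
        if len(digits) != 10:
            return None                                 # invalid: route to remediation
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

    print(standardize_us_phone("202.994.1000"))     # (202) 994-1000
    print(standardize_us_phone("+1 202 994 1000"))  # (202) 994-1000
    print(standardize_us_phone("994-1000"))         # None, needs cleanup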

 

Data Quality Monitoring & Scorecards

 

A data quality scorecard is a tool to monitor data quality and measure its business impact. The scorecard typically defines data quality metrics derived from data quality rules, the acceptable threshold for each metric, and the costs associated with failing to meet the thresholds. Together these give a picture of the organization's data quality and its business impact. Finally, the scorecard is scheduled to run regularly and flag metrics that do not meet their thresholds.

 

An example of one metric on a scorecard: given the business rule that a customer id must be unique, the metric might be the number of duplicate customer ids, and the threshold is that 100% of ids are unique (0% are duplicates). The cost associated with the metric might be $35.00 per instance of a duplicate id, because it takes a person an hour to clean up a duplicate id and that is their hourly salary.
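
That metric could be computed along the following lines; the sample data is made up, and the $35.00 figure carries over from the example above:

    import pandas as pd

    COST_PER_DUPLICATE = 35.00  # one hour of cleanup time per duplicate id

    # Toy data: ids 101 and 103 each appear twice
    df = pd.DataFrame({"customer_id": [101, 102, 103, 103, 104, 101]})

    duplicates = int(df["customer_id"].duplicated().sum())  # rows beyond first occurrence
    pct_unique = 100 * (1 - duplicates / len(df))
    passed = duplicates == 0  # threshold: 100% unique, 0% duplicates

    print(f"duplicate ids: {duplicates} ({pct_unique:.1f}% unique), "
          f"threshold met: {passed}, "
          f"estimated cost: ${duplicates * COST_PER_DUPLICATE:.2f}")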

 

 

[1] DAMA-DMBOK: Data Management Body of Knowledge (Basking Ridge, NJ: Technics Publications, 2017), 454.

[2] DAMA-DMBOK, 475.

[3] DAMA-DMBOK, 454.

[4] Saul Judah and Ted Friedman, “How to Create a Business Case for Data Quality Improvement,” Gartner, Inc., last modified April 23, 2018, https://www.gartner.com/document/code/347272?ref=ddisp&refval=347272.

[6] Thomas C. Redman, “Bad Data Costs the U.S. $3 Trillion Per Year,” Harvard Business Review, last modified September 22, 2016, https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year.

[7] Steve Lohr, “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” New York Times, Aug. 17, 2014, https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hu....

[8] DAMA-DMBOK, 470.

No one person, department, division, school or group "owns" data, even though specific units bear some responsibility for certain data. Several roles and responsibilities govern the management of, access to and accountability for institutional data.

  • Data governance committee: This committee is comprised of functional data stewards from across all functions and departments of the university.
  • Data stewards: Data stewards are university business officials (outside GW IT) who have direct operational-level responsibility for the management of one or more types of institutional data and have the authority to make decisions.
  • Data trustees: Data trustees are defined as institutional officers (e.g., vice presidents, vice provosts, deans and chancellors) who have authority over policies and procedures regarding business definitions of data and the access and usage of that data within their delegations of authority. Each data trustee appoints data stewards for specific subject area domains.
  • Data custodians: Data custodians are system administrators responsible for the operation and management of systems and servers that collect, manage and provide access to institutional data.
  • Data users: Data users are university units or individual university community members who have been granted access to institutional data in order to perform assigned duties or in fulfillment of assigned roles or functions within the university; this access is granted solely for the conduct of university business.
  • Data Governance Office (DGO): The DGO facilitates and supports data governance and data stewardship activities, including:
    • keeping track of data stakeholders and stewards
    • providing liaisons to other disciplines and programs, such as data quality, compliance, privacy, security, architecture and IT governance
    • collecting and aligning policies, standards and guidelines from these stakeholder groups
    • arranging for the provision of information and analysis to IT projects as requested
    • facilitating and coordinating data analysis and issue analysis projects
    • facilitating and coordinating meetings of data stewards
    • collecting metrics and success measures and reporting on them to data stakeholders
    • providing ongoing stakeholder care in the form of communication, access to information, record keeping and education/support
    • articulating the value of data governance and stewardship activities
    • providing centralized communications for governance-led and data-related matters
    • maintaining governance records

A community is a grouping of users. Communities serve as taxonomies of high-level institutional data subject areas and are key to assigning accountability and responsibility. The following data governance communities have been established:

  • Academics
  • Advancement
  • Finance
  • Research
  • Human Resources
  • Services and Resources
  • Master Data

The Data Governance Committee (DGC) is comprised of data stewards and data custodians from across all functions and departments of the university. The committee meets every other month to review data quality issues, discuss proposed business terms, review policies and discuss other institutional data-related topics. The Data Governance Committee is comprised of the following data communities:

ACADEMICS
  • Office of the Registrar
  • Admissions
  • Enrollment Management
  • Financial Aid
  • Office of Military and Veterans Affairs
  • School of Medicine and Health Sciences (SMHS)

ADVANCEMENT
  • Development and Alumni Relations (DAR)

RESEARCH
  • Office of the Vice President for Research (OVPR)

HUMAN RESOURCES
  • HR Information Services (HRIS)

FINANCE
  • Accounts Payable
  • Finance Directors
  • Student Accounts
  • Office of the Comptroller
  • Procurement & Travel Services
  • SAIG Operations
  • Tax, Payroll and Benefits Admin

SERVICES
  • Office of Institutional Research and Planning
  • Office of Survey Research and Analysis
  • Academic Affairs
  • Career Services
  • Compliance & Privacy
  • Diversity & Inclusion
  • Division of Operations
  • External Relations
  • GW Libraries
  • Housing & Residential Life
  • Information Technology
  • Safety and Security
  • University Budget Office

The Data Governance Center is the single source of truth for all of our data governance and stewardship activities. It is used to manage all business definitions and key performance indicators (KPIs), support our data stewards in their daily activities, and provide traceability between business and technical assets, policies and rules. It is a vital step toward achieving our vision of commonly understood, consistent, trusted and high-quality data throughout the institution.

Features

Business Glossary

The business glossary is used to define, collaborate and align critical business definitions. The glossary helps to improve our understanding of business terminology so we can communicate more effectively across the institution.

Data Dictionary

The data dictionary leverages the terms from the business glossary and maps them to the actual tables and attributes in our systems, like Banner and the Enterprise Accounting System, and in our analytical applications, like the data warehouse.

Portfolio

The portfolio is used to document the details related to a report, cube or visualization.

Issue Management

Issue management is a centralized way to report data quality issues so they can be resolved by the data governance team. Issues are reported, resolved through triage and review, escalated and assigned to the right resources, and the appropriate parties are then notified of the proposed solution.

Policies, Rules and Standards

These tools provide the ability to define and manage policies in collaboration with stakeholders and break them down into a rules hierarchy.

Log in to access the Data Governance Center using your GW NetID (the part of your GW email address before the "@") and password. 

Data Governance Center Quick Reference Guide (PDF)

These policies, articles and tools are valuable resources for understanding data governance at GW.

Policies

Downloadable Resources