
HESA Data Futures: the data quality challenge

Higher Education Institutions (HEIs) have always needed to ensure the student data they capture is accurate, complete, and consistent. The introduction of HESA Data Futures this academic year (22/23) presents HEIs with new data quality challenges. The data required for HESA Data Futures is vast and varied and includes, for example, a student’s highest qualification on entry to the institution, their parents’ occupation, and their study pattern and location.

In our first blog, Why is everyone talking about HESA Data Futures?, we explained what HESA Data Futures is and how HEIs should prepare. In this blog, we examine the data quality challenges faced by HEIs and outline approaches to address them.

Why is it so challenging to develop a quality Data Futures return?

The size and complexity of the HESA Data Futures requirement means that before final submission to HESA in October 2023, providers must:

  • Resolve any existing issues they have with their student data.
  • Ensure their data aligns to the new data quality rules.
  • Ensure their data is consistent across the student record.
  • Collect and derive all new data fields.

Many HEIs experience data quality issues caused by disparate system landscapes, challenging data collection processes or a lack of system functionality. These need to be rectified or remediated to adhere to Data Futures requirements. Many HEIs will need to invest additional time and resource to ensure their data is reported in line with HESA’s revised quality rules.

Data Futures has over 1,000 business quality rules that HEIs need to adhere to when building their return. These rules define how each data field should be formatted, which values are accepted, and how they should align with other data points in the record. The specificity and interactions of the data quality rules make it challenging for HEIs to build a compliant return, particularly for those students or courses which are non-standard.
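
To make the three kinds of rule described above concrete (format, accepted values, and alignment with other fields), here is a minimal Python sketch. The field names, accepted values, and date format are hypothetical and chosen purely for illustration; they are not taken from the HESA Data Futures specification.

```python
import re

# Hypothetical field names and accepted values, for illustration only;
# they are NOT taken from the HESA Data Futures specification.
ACCEPTED_STUDY_MODES = {"full-time", "part-time"}
DATE_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_record(record: dict) -> list[str]:
    """Return a list of rule failures for one student record."""
    failures = []

    # Format rule: dates, when present, must look like YYYY-MM-DD.
    for field in ("start_date", "end_date"):
        value = record.get(field)
        if value is not None and not DATE_FORMAT.match(value):
            failures.append(f"{field}: expected YYYY-MM-DD, got {value!r}")

    # Accepted-values rule: study mode must be one of the permitted codes.
    mode = record.get("study_mode")
    if mode not in ACCEPTED_STUDY_MODES:
        failures.append(f"study_mode: {mode!r} is not an accepted value")

    # Cross-field rule: the end date must not precede the start date.
    # Well-formed YYYY-MM-DD strings sort in chronological order.
    start, end = record.get("start_date"), record.get("end_date")
    if start and end and DATE_FORMAT.match(start) and DATE_FORMAT.match(end):
        if end < start:
            failures.append("end_date: earlier than start_date")

    return failures
```

A non-standard record, such as one with an unusual study mode or overlapping dates, is exactly where a check like this tends to report several failures at once.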

What is the impact of poor data quality?

As the return is used for many different purposes, the quality of insight and reporting derived from it can be significantly impacted by poor data quality:

  • Poor quality data in the HESA Data Futures return poses a reputational risk, as the data is used in league tables, performance indices and annual publications. If the data is subsequently found to be inaccurate, it can create mistrust among stakeholders, including students, staff and regulators.
  • Statutory returns are important for HEI funding. If the data provided is not an accurate representation of the university, there may be funding implications.
  • Poor student data quality can lead to poor decision making. HEIs with an inaccurate and incomplete understanding of who their students are, what they are doing or how they interact with the institution may be at risk of making uninformed decisions.
  • A poor quality return can trigger an audit by the regulator, the Office for Students (OfS), and greater scrutiny of future data returns.

How can HEIs avoid poor data quality?

Monitoring and maintaining data quality in business-critical data fields, like those used to populate the HESA return, requires ongoing effort and activity. Poor data quality can be time-consuming and costly to remediate if it is not dealt with promptly.

In addressing data quality for HESA Data Futures, there is an opportunity for HEIs to build a rich data landscape that delivers wider business benefits, such as developing reliable insight into attrition to drive decision making, and delivering an autonomous and personalised student experience. High quality student data can be realised across an institution through an effective data quality management programme and embedded data governance.

Below is a cyclical data quality improvement process that can help institutions deliver higher quality data for HESA and beyond:

Step 0: embed data ownership (a prerequisite to improving data quality)

High quality data requires input and buy-in from across the institution. Without dedicated owners across the business who are incentivised to improve data quality, there may be a lack of time or resource to complete remediation tasks, leaving the institution with a list of unresolved issues. Before going ahead with data quality remediation and cleansing activities, institutions should aim to have:

  • A dedicated team with clear roles and responsibilities to drive the data quality assessment against defined and consistent data standards.
  • Clear and defined data ownership across the business.
  • Buy-in from business areas who own data upstream.
  • Clear and agreed roles and responsibilities for remediation activities.
  • Time assigned to drive remediation activities.

To support cross-business collaboration efforts, institutions should consider a communication strategy dedicated to ensuring that the business appreciates the significance of HESA Data Futures and can take ownership of data quality remediation activity.

Don’t have Data Owners? Take a look at our blog on Data Ownership: What’s in a Name?

Step 1: determine high priority HESA fields and business requirements

Institutions should understand their business critical and HESA-critical data points so that an informed, prioritised data quality plan can be developed. HEIs should:

  • Review the HESA Data Futures specification to determine which rules their data must satisfy to enable a quality, compliant return.
  • Identify the critical data points and respective data owners for populating the HESA return.
  • Engage with data SMEs to identify the potential data quality challenges to satisfy HESA Data Futures requirements.
  • Based on the outcomes from the three activities above, develop a prioritised list of business requirements.

Step 2: develop data quality rules

The data quality lead should collaborate with business data owners to translate the business requirements developed in step 1 into data quality rules that account for HESA’s requirements. Here are some example requirements to consider capturing in the data quality rules:

  • Do the values fall within the expected range defined by HESA Data Futures?
  • Are the values mapped correctly from the legacy return according to HESA Data Futures?
  • Are null values only present when no minimum requirement is set in HESA Data Futures?
  • Is the data complete, accurate, consistent, and valid? Is there clear data integrity and uniqueness where required?
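
The questions above can be expressed as executable checks. The following Python sketch covers three of them: completeness of mandatory fields, values within an expected range, and uniqueness. The field names and the year range are assumptions made for illustration and do not come from the HESA specification.

```python
from collections import Counter

# Hypothetical fields and limits, for illustration only;
# real rules would come from the HESA Data Futures specification.
MANDATORY_FIELDS = ("student_id", "entry_year")
ENTRY_YEAR_RANGE = (1990, 2023)


def validate(records: list[dict]) -> list[tuple[str, str]]:
    """Return (student_id, message) pairs for every rule failure found."""
    issues = []
    id_counts = Counter(r.get("student_id") for r in records)
    low, high = ENTRY_YEAR_RANGE

    for r in records:
        sid = r.get("student_id")
        label = str(sid) if sid is not None else "<missing id>"

        # Completeness: mandatory fields must not be null.
        for field in MANDATORY_FIELDS:
            if r.get(field) is None:
                issues.append((label, f"{field} is mandatory but null"))

        # Range: a populated value must fall within the expected bounds.
        year = r.get("entry_year")
        if year is not None and not low <= year <= high:
            issues.append((label, f"entry_year {year} outside expected range"))

        # Uniqueness: each identifier should appear only once.
        if sid is not None and id_counts[sid] > 1:
            issues.append((label, "student_id is not unique"))

    return issues
```

Keeping each rule as a small, separately commented check makes it easier for business data owners to review the logic alongside the data quality lead.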

Step 3: data profiling and reporting

Once the data quality rules are configured, institutions should profile their data. The aim of data profiling is to identify issues in the data that require remediation. The data profiling assessment should align to the business and data requirements defined in steps 1 and 2. As a minimum this should involve:

  • Analysing the data content and behaviour across different student and course scenarios.
  • Validating the data against data quality rules.
  • Checking consistency between interacting fields.
  • Testing the data consistency against: the data reported in the previous HESA return (21/22); other student returns, for example the HESES return; and internally and externally reported metrics, for example continuation rates.
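
As one illustration of the last check, the sketch below compares headcounts per course between a previous and a current return and flags large movements for investigation. The course names, the input shape, and the 10% tolerance are all assumptions for the example; an institution would choose its own grouping and thresholds.

```python
def reconcile(
    previous: dict[str, int],
    current: dict[str, int],
    tolerance: float = 0.10,  # hypothetical threshold: flag changes above 10%
) -> dict[str, str]:
    """Flag courses whose headcount differs notably between two returns."""
    flags = {}
    for course in sorted(set(previous) | set(current)):
        before, after = previous.get(course), current.get(course)
        if before is None:
            flags[course] = "new in current return"
        elif after is None:
            flags[course] = "missing from current return"
        elif before == 0 and after > 0:
            flags[course] = f"headcount changed 0 -> {after}"
        elif before > 0 and abs(after - before) / before > tolerance:
            flags[course] = f"headcount changed {before} -> {after}"
    return flags
```

A flag here is a prompt for review rather than proof of an error: a genuine change in recruitment will also trip the threshold.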

Once issues have been identified, data quality metrics should be developed and combined to create a data assessment report. The data assessment report will identify critical data elements that contain significant quality issues and require prioritised remediation action.
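
A data assessment report can start as something very simple. The sketch below, in Python, rolls individual rule results up into a pass rate per field and lists the worst-performing fields first. The input shape (pairs of field name and pass/fail) and the 95% remediation threshold are assumptions for illustration only.

```python
from collections import defaultdict


def assessment_report(results, threshold: float = 0.95) -> list[dict]:
    """Summarise (field, passed) results into pass rates, worst first.

    `threshold` is a hypothetical cut-off below which a field is
    marked as needing remediation.
    """
    totals = defaultdict(lambda: [0, 0])  # field -> [passed, checked]
    for field, passed in results:
        totals[field][0] += int(passed)
        totals[field][1] += 1

    rows = [
        {
            "field": field,
            "checked": checked,
            "pass_rate": passed / checked,
            "needs_remediation": passed / checked < threshold,
        }
        for field, (passed, checked) in totals.items()
    ]
    # Lowest pass rate first, so the report leads with the biggest problems.
    return sorted(rows, key=lambda row: row["pass_rate"])
```

Sorting by pass rate gives a first-cut priority order; in practice this would be weighted by how critical each field is to the return.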

To support data profiling, troubleshooting, and reporting activities, HEIs are likely to need dedicated data quality software. Although HEIs will have access to the HESA Data Platform to test their data against HESA’s quality rules and requirements, they should consider using additional data quality tools to perform ongoing data validation testing as a business-as-usual activity.

Step 4: engage with data owners to remediate issues

Once data quality issues have been identified, there should be a clear issue escalation process to follow. Organisations should:

  • Define the structures involved in issue management, prioritisation, mitigation, and resolution.
  • Engage with data owners and use the data assessment report developed in step 3 to drive root-cause analysis of data quality issues.
  • Collaborate with data owners and engineers to define activities, prioritised deliverables, timelines, and ownership for remediating data quality issues.
  • Design, build, test and implement remediations.
  • Perform post-issue rectification testing and report solution improvement metrics to stakeholders.

Step 5: repeat!

Improving data quality is not a one-off activity. Data quality assessment and remediation should be embedded in the organisation as a business-as-usual process, as part of an institution-wide strategy to become data-driven. Continuous monitoring and maintenance of student data quality will soon become even more pressing for HEIs, as Data Futures introduces twice-annual collections in 24/25 and will require near-live reporting in later years.

Are you prepared for HESA Data Futures? To find out, or for more support, please get in touch for a Deloitte HESA Readiness Assessment or a discussion with any of the contacts listed below.

This is the second in a series of HESA Data Futures blogs from Deloitte.