United Inscriptic Committee

Summary of the HEFCE Data Futures conference July 2017

1. Data Futures Project

  • There are three components to Data Futures:
  1. software platforms and processes to transact data
  2. student data collection from 2019/20 and specification of the collection
  3. aggregation of data collections, and capability built through them, to increase their utility and benefit to providers in the longer term.
  • Currently on time, on budget and within scope, but at an early stage of the project. Outcomes of the detailed design, including publication of the specification, are due in August 2017.
  • Alpha test under way to prove the concept.
  • Beta testing to be expanded to c.100 providers in 2018/19.
  • HESA intends not to ask for the same dataset twice during the transition, but this has not yet been planned in detail. Once live, work will start on the benefits case.
  • Project communication and interactions with institutions will increase moving forward.

2. Institutional Readiness

  • The data capability survey identified a lack of enterprise-wide design, management in silos, and duplication and inefficiency; as a result student-facing services suffer and internal and external reporting is undermined. There is a need for transactional data to sit closer to reporting data.
  • How: the technology depends on data integration and systems capability, and the mechanism will depend on the system an institution is using. (Slide with a matrix diagram.) High system capability but low integration will feel like three HESA returns a year. High integration but low capability, via a data warehouse, would allow business processes to be extended to report to HESA. High integration and high capability would enable overnight updates as data changes each day. This moves HESA data into business as usual, as a product of good data management and treating data as an asset.

3. New developments

  • The iterative data model means quality needs to accrue over time, and institutions need to be able to track changes to the data they submit. This will put pressure on planning teams in institutions.
  • The existing code of practice states how suppliers of data should behave. A new version takes its strategic elements and reworks them for the data demand side, so the two complement each other. A draft is to be published on the HESA website in the next couple of weeks. It considers ‘burden assessment’: the burden is agreed to be too high, but there is no shared understanding of what ‘burden’ is or how to measure it.
  • Burden assessment service – a new four-stage process for requesting, understanding and approving changes to the data model, in consultation with data suppliers.

Speakers with Q&A

Paul Clark

The day will provide details and plans for Data Futures, with a theme of preparedness and views from different angles both within and outside the sector. Data Futures anticipates greater use of data for regulatory purposes and will inform the use of data and the choices which underpin it. The agenda affects the whole of the UK: making greater use of integrated data to inform policy-making and to understand performance and efficiency as we enter a more complex environment. The need for reliable, accurate and timely data is increasing all the time and is being driven through Data Futures as a policy and strategic agenda.

Three components: software platforms and processes to transact data; student data collection from 2019/20 and the specification of that collection; and aggregation of data collections, and capability built through them, to increase their utility and benefit to providers in the longer term. There is significant value and benefit to the sector.

Steve Egan

Early feedback to HEFCE on the accountability burden required a response. There was variation in perceptions of burden, as operational requirements were not seen as burden. An initial review by HEFCE and its Board sought an understanding of the unintended consequences. Work following the review reduced the burden by c.60% and reduced the audit and verification burden through improved data quality.

The uses of data have changed.

Government use of data

  • Resource allocation: if data is used then its quality improves, because it matters.
  • Accountability: the HESA:HESES reconciliation and the calculation of non-completion led to variation in interpretation, and reconciliation took a large amount of time and resource, though this was dampened by HEFCE’s use of a tolerance band. Tuition fees mean that every student matters, and there are different external accountability frameworks which are not based on tolerance bands or a delay in the data.
  • Market information: use for this purpose has expanded and become more important. Data is used to change behaviours in the form of information, e.g. NSS, LEO, TEF, DLHE. A different form of accountability is required which is not about resource allocation but about a student-driven understanding of quality and provision. EMS data and the REF were given as examples of helping institutions to manage themselves better.
  • Policy development: use of data and statistics to understand issues better. An example was given of multiple data sources being used to identify the performance of BME students, which drove policy and became contentious but withstood scrutiny. The importance of the transformation and interpretation of data to inform policy was noted.

Government requirements

Accuracy; timeliness and frequency (immediate response to the market means different demands on timeliness); coverage (responding to wider needs in a context of cost and need); integrity (a robust time series and trust in the data, with varying levels of integrity from ‘some providers’ in the system); comparability (discontinuity in the data indicating problems in some instances); cost (minimising the cost of collection and the burden, but in tension with the market-information use and the need for greater granularity – the requirements on the OfS will be greater).

University requirements

Meet funding and regulatory requirements (for some institutions it stops there, but others are aware of the need to use data proactively); provide student information via dashboards, benchmarks and statistics; data-driven resource allocation within institutions, linked with performance management; internal accountability; data analytics for student performance and/or retention.

Challenges

‘Pooled sovereignty’, whereby users agree a set definition and specification for data based on stakeholder requirements; system alignment – data definitions, integration points and references between datasets; granularity, which becomes easier as systems develop; governance and change control to manage the system and data requests (a political, not technical, issue); audit and verification associated with student data and market regulation. The system needs to have consistent integrity and reliability. This also covers security and data protection, supported by reliable systems.

Many of these need a political solution in the regulatory context, as we recognise the importance of data and develop it for ourselves as accurate, timely and trusted data.

Qs: how do we manage senior teams’ understanding of the curation and use of data? Senior teams understand data use at a strategic level but have a more mixed understanding of the technical complexities and challenges of data.

Government needs data and market information, but what about commercial tensions in a competitive market? Transparency is good for good institutions, but commercial interests have to be protected.

Rob Phillpotts

Data collection mechanisms are no longer fit for purpose, so Data Futures moves to a hub and in-year collection for a wider customer base, with changes to the governance model. An overview was given of the project to date (noting the focus on the student record), of the 2017 engagement undertaken and of the project governance.

Alpha testing by providers and suppliers is managed via user groups; the group covers a spread of size, IT systems used, courses and geography. The AHUA contact group will keep an eye on benefits and enable higher-level engagement during the transition.

Engagement survey: enthusiasm is coming through, along with opportunities for tightening up processes; concern about external customers, which should be balanced with transparency; a need for the data specification; and a plan to reduce the burden on academic staff.

The project is currently on time, on budget and within scope, but is at an early stage. Outcomes of the detailed design, including publication of the specification, are due in August 2017.

Beta testing will be expanded to c.100 providers in 2018/19. HESA intends not to ask for the same dataset twice during the transition, but this has not yet been planned in detail. Once live, work will start on the benefits case.

Data Futures identified 215 student data collections – c.30 fit the new student model and a further 90 could do so with small changes.

Burden will be smoothed by aligning the student return with business processes: in-year data via dashboards; an improved and easier issue-management process; clarity about what data is available pre-sign-off; and scope for a new and greater range of data in future. The vision is tighter integration with institutional systems, with data being right at source and linked with the student administration system. A comparison was made with rich datasets in public services, which regularly report a single version of the truth to improve the service.

The iterative data model means quality needs to accrue over time, and institutions need to be able to track changes to the data they submit. This will put pressure on planning teams in institutions.

2019/20 will need changes at providers, software firms, statutory customers, HEFCE and HESA to be synchronised. What is good for institutions will be good for the sector.

What can we be doing? Consolidating organisational data (timely single version of the truth); data capability toolkit; Data Futures resources page; Data Futures on JISCMAIL.

Andy Youell

Explosion of computing, devices and the internet.

Data capabilities at sector level

The sector level is defined by the data landscape and managed via collective governance. The Data Landscape Steering Group brings together leaders from the supply and demand sides of data transactions to provide oversight and leadership of the data landscape. It oversees standardisation and rationalisation of data flows, development of a common data language, and work on a code of practice for data collectors.

The original code of practice states how suppliers of data should behave. The new version takes its strategic elements and reworks them for the data demand side, so the two complement each other as companion documents. A draft is to be published on the HESA website in the next couple of weeks. It considers ‘burden assessment’: the burden is agreed to be too high, but there is no shared understanding of what ‘burden’ is or how to measure it. An NHS concept and process has been appropriated and adapted for higher education, and a model to assess burden is being developed which will form part of the documentation.

Organisational level

The data capability toolkit includes an online assessment process developed from wider management thinking. Data harvested from over 100 institutions suggests the sector as a whole is in the ‘reactive’ stage of development – just above ‘chaotic’, and not yet ‘stable’, ‘proactive’ or ‘predictive’. The sector needs to be well into the stable space.

The data capability survey identified a lack of enterprise-wide design, management in silos, and duplication and inefficiency; as a result student-facing services suffer and internal and external reporting is undermined. There is a need for transactional data to sit closer to reporting data.

Data is an asset – so is money, and the control and audit applied to money could be compared with that applied to data. Accountants do this well: we have had money for centuries, but data is comparatively new.

Data Futures has engaged with the community via its website, seminars, communities of practice, training and sector bodies.

Individual level

There is greater demand for data skills. A recent NESTA study identified four skills: business/domain knowledge, systems/software skills, data/analysis skills and communication skills. The final one is the hardest, and NESTA described people with all four as ‘unicorns’. The cost of ‘data scientists’ has increased: there is no recognised data profession, no defined skillset, qualifications or professional standards, no defined career pathway and no professional body or regulator. Is this a challenge or an opportunity?

Data Futures is key to the rationalisation of the landscape and to governance that drives better behaviours, but institutional and individual capabilities also need to be leveraged.

Hetan Shah

There is a juxtaposition of big-data and post-truth narratives; the mid-point is the impact data can make in changing the current world. ‘Post-truth’ implies there was a ‘truth’ era. Trump is a major factor, but the UK is in a different place to the USA. Referendum statistics had an impact – there was less validated data use than in elections. Polling on trust in experts has remained much the same. During the referendum, Martin Lewis was the most trusted expert according to one poll, because he was ‘on our side’. There is a need to ensure expertise is seen to be used in the public interest.

Big data is used as a ‘mood word’ for the ubiquity of data. New phenomena have emerged as a result of big data, including professions and technical concepts such as the ‘internet of things’. Examples of new data use: Amazon algorithms for new products; Google Translate, based on language correlations; AI developed to beat humans at the Chinese game Go, which is more complex than chess; AI learning to negotiate, and to lie, as part of a game, and to work differently depending on context; the development of driverless cars; genomic data to inform healthcare; weather data to deal with short-term climate issues; and the use of Twitter to indicate where there are problems.

Some issues just need more data or for the technology to develop. There are powerful public-interest uses of data to guide intervention. There is an issue of privacy: different norms in China mean facial recognition is used differently, for example to stop the theft of toilet rolls. There are instances of bias in algorithms, depending on the training data used, which can lead to racial bias.

Problems and limitations of datasets can lead to misrepresentation. There is a need for accountability to inform decisions made by algorithms, and GDPR suggests there is more to do in this area. Data monopolies held by large corporations are of concern and require a new competition-law lens on monopolies, which suggests a need for a data commons.

There is a need for public trust to support development: for example, GM crops had a backlash but stem cell research did not. The same principle needs to apply to data: be transparent, competent and working in the public interest with data.

Qs: How will open data develop? Travel apps are an example of open data. The agenda continues and Government is supportive; the question is whether private holders of data will respond. A ‘reproducibility’ crisis in science is possible as access to data increases.

Is there a conflict between big data, FOI and data protection? There is a difference between protection and innovation in data; data work should be encouraged within protection frameworks. The audience noted that this needs to be communicated by the Information Commissioner.

Panel session

What is being envisaged for the beta pilots? End-to-end pilots are not needed yet; the alpha phase is testing the concept. Beta needs success to be defined, but it will need to test end-to-end data submission from student systems to the new HESA data platform. It could just be at the provider end and still be a success. Software suppliers have been involved in developing the data specification and will be finalising it in September; there is enough information to start building that capability.

The sector is working within conflicting narratives of competition and differentiation, but also a regulatory imperative to rationalise and standardise, and there could be disruption of the current model. What steps are being taken to avoid unintended consequences which also stifle innovation? There is change and upheaval in the sector: Data Futures coincides with a regulatory overhaul, so the environment is not static. The aim is for maximum flexibility, as future requirements cannot be predicted. Data Futures is in discussion with current and future regulators and the sector, as we move from co-regulation to a new regulatory model. Increased diversity across the sector in terms of scale and model has led to complex data models, so there is a need to be alive to changes in the sector in order to redefine and develop the data specification. Transparency will help to drive innovation. There is a move from a static snapshot model to a model of what is happening now.

What are the plans for alternative provider consultation and engagement? HESA has been good at institution-level engagement, but via groups, and alternative providers do not have the same types of bodies.

Comments on the practicalities of ‘what next’ excluded setting up projects and seeking funding. Are we not at that stage yet, or how should we get there? Data Futures will have a profound and variable impact on institutions, so an appropriate institutional response is needed. That response cannot be defined centrally for individual institutions; it needs to be defined by institutions themselves. There will be opportunities to learn from peers and understand what is happening elsewhere.

How will the feedback from the alpha pilot be used for institutions which are further from the standard model and may be using different software (e.g. Salesforce)? The aim is to achieve tight integration; there is a question about where we will be in 2019/20 and what to roadmap for later years. There is a need to think about future possibilities and the diversity of institutions and their capabilities.

The integrity of alternative providers is suggested by their attendance. What is HESA’s strategy for providing context for the data it releases? The sector is changing and all have a responsibility to raise the quality of the debate. LEO data was used as an example of cross-departmental data needing context for interpretation.

When will you know what the transition year will look like? The transition cannot yet be designed, as HESA has not finalised the data specification. The hope is to provide an outline in the September communications and not to ask for the same data twice in moving to a cumulative model. A transition plan will be part of the August outputs and is likely to be shared in September/October.

Could HESA contribute to education of the wider group who need to understand and interpret data? MOOC or similar? Yes.

Dan Cook

The logical model was developed to capture the observed landscape and reflects the business cycle. Funding and monitoring move from being integral to being a separate piece outside the model, which differs from the previously defined data model. The new model combines old concepts and data requirements with a new structure for collections. In-year collections are intended to be closer to business data and require translation, which may necessitate additional granularity or data items being removed from the return. 485 issues were identified in the collection responses; these suggest that the model is not broken, but that there is a need to focus on the detail to enable the next stage of development. A data dictionary and new data model will be developed, which will be the specification for the 2019/20 return. Providers can have confidence that the model as published is what will be required. The new model has heritage from the HESA student collection and will be applicable to all institutions within the fee cap; the OfS will need to determine how it applies to alternative providers.

The new model is transactional, follows business events and is continuous, building up a picture over time. It is perpetual and continuous – not three HESAs or three chunks; the full model is in scope at all times, though the scope for collection changes depending on the collection period. Data changes during the period, with a review and sign-off process at the end as providers move between phases, and data is experimental in the first instance. Phase (i), August to November: what does the new intake look like? Phase (ii), December to March: January starters, nurses etc. There will no longer be ‘two-thirds of a nurse’, and provision will not have to finish at the end of the academic year, which is fine in a continuous data model. Phase (iii), March to July: final detail on the student population.
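A minimal sketch of the phased pattern above, taking the month ranges at face value; the function name collection_phase and the exact boundaries are invented for illustration (the talk gave only month ranges, and March appears in both Phase ii and Phase iii, so March is assigned to Phase iii here), not part of the HESA specification.

```python
# Illustrative only: maps a calendar date to the collection phase described in
# the talk. Phase boundaries are indicative month ranges, not HESA's rules.
from datetime import date


def collection_phase(today: date) -> str:
    """Return the notional Data Futures phase for a given date."""
    month = today.month
    if 8 <= month <= 11:
        return "Phase i: new intake (August to November)"
    if month == 12 or month <= 2:
        return "Phase ii: January starters, nurses etc. (December to March)"
    return "Phase iii: final detail on the student population (March to July)"


if __name__ == "__main__":
    print(collection_phase(date(2019, 10, 1)))   # Phase i
    print(collection_phase(date(2020, 1, 15)))   # Phase ii
    print(collection_phase(date(2020, 5, 30)))   # Phase iii
```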

How: the technology depends on data integration and systems capability, and the mechanism will depend on the system an institution is using. (Slide with a matrix diagram.) High system capability but low integration will feel like three HESA returns a year. High integration but low capability, via a data warehouse, would allow business processes to be extended to report to HESA. High integration and high capability would enable overnight updates as data changes each day. This moves HESA data into business as usual, as a product of good data management and treating data as an asset.
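The matrix from the slide can be laid out as a simple lookup; the quadrant descriptions below paraphrase the talk, and the low-integration/low-capability cell was not described, so it is marked as such.

```python
# Paraphrased reconstruction of the integration/capability matrix; not an
# official artefact. Keys are (integration, system_capability).
MATRIX = {
    ("low", "high"): "Feels like three HESA returns a year",
    ("high", "low"): "Data warehouse extends business processes to report to HESA",
    ("high", "high"): "Overnight updates as data changes each day; HESA reporting becomes business as usual",
    ("low", "low"): "Not described on the slide",
}


def operating_mode(integration: str, capability: str) -> str:
    """Look up the likely Data Futures experience for an institution."""
    return MATRIX[(integration, capability)]


print(operating_mode("high", "high"))
```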

Data quality: a different approach, using data quality dimensions. Current quality mechanisms are good but not timely. Timeliness may override completeness of the dataset, which requires reasonableness in data quality rules based on a specified time period – rules based on distance from an event, or on data which is required by a specific point. There will be more time to understand the onward use of the data.
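As an illustration of a rule ‘based on distance from an event’, the sketch below uses hypothetical field names (mode_of_study, enrolment_date) and a made-up grace period; it is not the HESA rule set, only a sketch of the idea that completeness is enforced relative to a time window.

```python
# Illustrative time-windowed quality rule: a field may be missing until a
# grace period after its triggering event has elapsed. Field names and the
# 30-day window are invented for the example.
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class QualityRule:
    field: str         # field that must eventually be present, e.g. "mode_of_study"
    event_field: str   # event the deadline is measured from, e.g. "enrolment_date"
    grace_days: int    # days after the event during which the field may be empty


def check(record: dict, rule: QualityRule, today: date) -> Optional[str]:
    """Return an issue message if the rule is breached, otherwise None."""
    event = record.get(rule.event_field)
    if event is None:
        return None  # no triggering event yet, so nothing to enforce
    days_since_event = (today - event).days
    if days_since_event > rule.grace_days and not record.get(rule.field):
        return (f"{rule.field} still missing {days_since_event} days after "
                f"{rule.event_field} ({event}); allowed {rule.grace_days} days")
    return None


rule = QualityRule(field="mode_of_study", event_field="enrolment_date", grace_days=30)
record = {"enrolment_date": date(2019, 9, 23), "mode_of_study": None}
print(check(record, rule, today=date(2019, 11, 1)))  # breach: 39 days elapsed
```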

There is a move from a validation process to peers in the QA process. Anyone can raise an issue: problems with a data return can be raised as an issue in making the return, or regulators may spot features of the data needing further explanation. Data will be validated and cleaned before it is finally signed off.

Governing requirements: HESA plans to target a list of data collectors who could obtain data directly from HESA, and HESA needs to be more tactical in its communications. This will include a burden assessment service and governance capabilities to enable configuration and release in predictable windows. The burden assessment service is a new four-stage process for requesting, understanding and approving changes to the data model, in consultation with data suppliers.

John Hogan

There tends to be a mismatch between senior managers and the people who know what they are doing with data. Senior managers just want things to work; they recognise the role of planning teams in getting this right for us, but aren’t sure what it means, so they need clarity and a sense of direction.

It’s easy to ask for more data. Email and the internet have increased the number of requests but have not reduced the effort required to collect data, so people collect more to be on the safe side.

AHUA has engaged in understanding Data Futures and looks forward to greater engagement and sessions for feedback. Institutions will need good project management and will need to integrate the changes to the HESA student record to take advantage of the new return.

Strategic planning teams work in the context of the OfS, TEF, REF and Brexit, but also develop data usage to support priorities and efficiency. What is the benefit to students? How is the message on additional return points being communicated in universities? Planning teams need to do the right thing.

Shaun Briggs and Heidi Wright

Alternative providers have only been returning data to HESA since 2015; they are still learning and are flexible to HESA reporting requirements. They have small teams, small budgets and small numbers of students, and often work off Excel spreadsheets. International links can require reporting overseas as well as in the UK, and systems do not fit multiple requirements.

Kaplan Open Learning’s challenges: data quality; changes to the student system; the change to in-year collection; process changes; additional time commitment from staff; and managing the initial implementation and the speed of change. They are preparing via more discussions and an audit of their data, and are fortunate to have resource to commit to the HESA changes.

Nazarene Theological College: manual coding of data, with technical support for the conversion from Excel to XML for the return. Senior management has been made aware, but a new student system may be needed. Challenges include the student system, pressure to do reporting while other activities are taking place, and the need to consider whether the work is resourced appropriately.
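To make the Excel-to-XML step concrete, here is a minimal sketch assuming a single worksheet whose header row contains valid XML element names; the StudentReturn/Student element names, column layout and file paths are invented for illustration and are not the HESA Data Futures schema.

```python
# Hypothetical illustration of converting a spreadsheet of student rows into a
# simple XML return. Element names and file paths are invented; the real HESA
# schema is not reproduced here. Requires openpyxl (pip install openpyxl).
import xml.etree.ElementTree as ET
from openpyxl import load_workbook


def excel_to_xml(xlsx_path: str, xml_path: str) -> None:
    """Read the active worksheet and write one <Student> element per data row."""
    sheet = load_workbook(xlsx_path, read_only=True).active
    rows = sheet.iter_rows(values_only=True)
    headers = [str(h) for h in next(rows)]  # first row assumed to hold XML-safe names

    root = ET.Element("StudentReturn")      # invented root element
    for row in rows:
        student = ET.SubElement(root, "Student")
        for name, value in zip(headers, row):
            if value is not None:
                ET.SubElement(student, name).text = str(value)

    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)


# Usage (placeholder paths):
# excel_to_xml("students.xlsx", "return.xml")
```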

APs will be significantly impacted and are keen to have a voice.

Qs: did not take questions due to running late.

Dave Shepherd and Steve Longshaw

An overview was given of Civica Digital’s work and products in developing digital solutions with a range of industries. Civica was chosen because of its extensive experience of relevant solutions and organisations; it developed the Track solution for UCAS, which is used in Confirmation, Adjustment and Clearing, and gave an example where it had developed critical systems to manage enterprise-wide data. Data Futures is about data acquisition and storage.

Work so far has included the design process, which has developed the proposal for the user interface based on user personas. Work has also been done on the data quality strategy, addressing quality at the start of the process; historic data is needed to test the rules applied to the data.

An overview was given of the project timeline (covered earlier). The alpha pilot is also testing classroom-based training for the new requirements; this will be refined in the beta pilot and expanded to webinars as well. A readiness assessment is due by January 2018 for the testing kick-off. A high level of user engagement will be needed to deliver the project.

Paul Clark

Data Futures won’t succeed without communication and engagement. Direct engagement with HESA by institutions was encouraged. There will be more webinars and Q&A sessions to push out more detail, and the slides will be available.

More announcements on the detail in September and engagement with the sector by HESA will accelerate.