The data science behind HMRC’s COVID-19 response
Read our blog from Anna, who talks about her role leading the data science team during the COVID-19 pandemic.
Why HMRC and data science are the perfect fit
Eleven months ago, when I joined HMRC to lead our data science team and build on the department's data science capability, I could not have imagined what the next months would bring. Of course, I knew how important the work of the department is: we are the people who collect the money that pays for the UK's public services and help families and organisations with targeted financial support. Little did I know how large the need for that financial support would become.
I was excited about all the data HMRC hold. We are probably one of the biggest repositories of individual and company data in the UK and I looked forward to the opportunity and challenge of leading even more innovation and driving value across HMRC with my brilliant new colleagues. Just as we started making big plans, COVID-19 struck and practically overnight we had to adapt to our new reality and emerging data science challenges.
Data science behind the Chancellor's schemes
You will have heard of the COVID-19 schemes announced by the government to give financial support to individuals and businesses during the pandemic, but did you know that HMRC’s Data Science team were instrumental in supporting the schemes?
To name a few examples: for the Self-Employment Income Support Scheme (SEISS) we designed and delivered a Reproducible Analytical Pipeline (RAP) solution to accurately determine the eligibility status of the entire self-employed population. The process included rapid ingestion of multiple complex, frequently updated data sources, the modelling of eligibility, and the generation of outputs to feed the contact strategy, the live digital service and the internal advisor user interface. Given the anxieties and fears of those requiring support, providing certainty and accuracy was key, and we worked closely with colleagues from across HMRC as one team to achieve that.
We also provided the Treasury with detailed insights on the Coronavirus Job Retention Scheme (CJRS) to support higher-level furlough decision-making and evaluation, and our analysis fed directly into the CJRS national statistics published from June to September.
Analysis was also focused on HMRC operations. In a matter of days, we had gone from being a mainly office-based workforce to having 55,000 people working from home. We helped our IT service support team to identify emerging issues raised by HMRC colleagues on our Yammer chat. The chat was being manually monitored, but the team wanted to know what the next big thing would be that our new homeworkers needed IT support on.
We tried multiple parallel approaches to explore it. In the first, we used Latent Dirichlet Allocation (LDA) to identify conversation topics and then plotted those over time to see what new topics emerged. The second approach was simple, with no machine learning (ML): we explored word frequencies across different time periods and then used a chi-squared test to find any statistically significant differences over time.
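To give a flavour of the second approach, here is a minimal sketch in Python. It is not the code we ran: the function names, the tokenisation, the 0.05 threshold and the minimum word count are all assumptions made for illustration. For each word it compares how often the word appears in an earlier and a later batch of messages, using a chi-squared test on a 2x2 table (the word versus all other words, earlier versus later).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(messages):
    """Count every word token across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction is applied (an illustrative choice).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # a degenerate table carries no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-squared tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def shifted_words(earlier, later, alpha=0.05, min_count=5):
    """Return words whose share of all tokens differs between the two periods.

    alpha and min_count are hypothetical defaults, not values from the project.
    """
    before, after = word_counts(earlier), word_counts(later)
    total_before, total_after = sum(before.values()), sum(after.values())
    results = []
    for word in set(before) | set(after):
        a, c = before[word], after[word]
        if a + c < min_count:
            continue  # too rare for the test to be meaningful
        stat, p = chi_squared_2x2(a, total_before - a, c, total_after - c)
        if p < alpha:
            results.append((word, a, c, stat, p))
    # Largest statistic first, so the strongest shifts appear at the top.
    return sorted(results, key=lambda row: row[3], reverse=True)
```

Calling `shifted_words(last_week_messages, this_week_messages)` returns tuples of the word, its count in each period, the test statistic and the p-value. Because one test is run per word, some words will be flagged by chance alone, so in practice a correction for multiple comparisons would be worth considering.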
When you can't meet face to face
Deliverables are not the only thing the HMRC Data Science community get involved in. Last week we held our first fully digital Data Science Conference, DiGi-CON 2020. It was attended by over 500 colleagues from across various data science, analytical and performance areas as well as digital, technology and operations. Some were data science practitioners and others were just starting on their data science journey, joined by colleagues from affiliated disciplines and data fans.
We had nearly 30 speakers, and sessions ranged from practical demos and real-life case studies to those focused on our technology and data strategy going forward. It was fantastic to see the capabilities, cross-team co-operation, enthusiasm and real purpose expressed by my colleagues and the audiences.