Value for money: experimenting with algorithms for school comparisons

Part II

This week Reform published the first in a series of blogs outlining our research on value for money in schools. It described the growing budgetary constraints schools are facing and argued that improving value for money should be seen as a fundamental task.

This second blog describes the technique Reform experimented with to compare similar schools. The overarching idea was to create static groups of similar schools and then examine differences in the economy, efficiency and effectiveness of school types, such as academies and community schools, within those groups. This blog also presents the results of Reform's attempt to create homogeneous clusters of schools from samples of primary and secondary school-level data.

Previous research

Most studies of English schools have either evaluated just one aspect of value for money, such as economy, or the whole chain (i.e. economy, efficiency and effectiveness) for only certain school types.

In a government-commissioned report, Allen et al. analyse primary and secondary schools' financial returns to understand how much of the variation in spending can be attributed to a school’s circumstances and how much reflects specific decisions it has made. The authors find that 30 per cent of the variation in teacher expenditure between schools is unexplained, even after controlling for differing circumstances. More generally, they find that "many of the important operational financial decisions of schools are largely idiosyncratic", highlighting the difficulty of analysing school economy.

The Department for Education’s efficiency review of schools used a sophisticated mathematical technique, data envelopment analysis (DEA), to identify efficient schools. This was supplemented by case studies that qualitatively assessed what made those schools efficient.

Why context matters

To explore differences in the value for money of school types, it is crucial to compare schools with similar pupil intakes. This is the only way to make fair comparisons between schools, as it ensures that schools operating in relatively similar contexts are compared with each other. Pupil characteristics such as prior attainment and eligibility for free school meals have a large impact on both pupil performance and a school's position in league tables, yet these factors are outside the school's control.

In addition, schools receive per-pupil funding based on a variety of contextual factors and can receive additional funding through the Pupil Premium; these funding differences could in themselves explain differences in spending decisions. Controlling for pupil characteristics is therefore vital to making fair comparisons between schools on school economy.
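
The like-for-like comparison argued for here can be sketched as follows: once schools have been grouped, school types are compared only within each group, never across groups. This is a minimal illustration, not Reform's code; the function names `within_cluster_means` and `type_gap` and the spending figures are hypothetical.

```python
from collections import defaultdict


def within_cluster_means(clusters, school_types, values):
    """Average `values` for every (cluster, school type) pair.

    Comparing, say, academies with community schools only inside the same
    cluster means each comparison is between schools with similar intakes.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cluster, school_type, value in zip(clusters, school_types, values):
        totals[(cluster, school_type)] += value
        counts[(cluster, school_type)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def type_gap(means, cluster, type_a, type_b):
    """Difference between two school types' averages within one cluster.

    Returns None if either type is absent from that cluster.
    """
    a = means.get((cluster, type_a))
    b = means.get((cluster, type_b))
    return None if a is None or b is None else a - b


# Hypothetical example: five schools in two clusters, spend per pupil in £000s.
clusters = [0, 0, 0, 1, 1]
types = ["academy", "community", "community", "academy", "community"]
spend = [4.5, 4.8, 5.0, 6.1, 6.0]
means = within_cluster_means(clusters, types, spend)
gap_in_cluster_0 = type_gap(means, 0, "academy", "community")
```

Returning None rather than zero when a type is missing from a cluster keeps "no comparison possible" distinct from "no difference".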

The algorithm

Reform applied a specific version of K-means, one of the most widely used unsupervised clustering algorithms, to cluster schools according to specified characteristics. The term unsupervised refers to the fact that the algorithm is left to discover interesting structures in the data by itself: it has not been trained to identify a right answer or a specific output. For example, if the specified clustering characteristics are prior attainment, the percentage of students eligible for free school meals and the percentage of students with persistent absence, the final shape of the clusters (i.e. their size, the types of school they include, etc.) is unknown before the clustering starts.

The clustering procedure is as follows:

(1) The number of clusters is chosen by the user

(2) The clustering variables (and hence their number) are chosen by the user

(3) Each cluster is associated with a centre point (see Figure 1), which the algorithm selects arbitrarily at the start of the procedure

(4) Each data point is assigned to the cluster with the closest centre point

(5) Each cluster centre is recomputed as the mean of the data points currently assigned to it

(6) Steps 4 and 5 are repeated until the assignments no longer change [1]
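
The six steps above correspond to the classical "Lloyd" form of K-means. The following minimal NumPy sketch illustrates them under that assumption; it is not Reform's implementation, and the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Classical (Lloyd) K-means; returns (labels, centres).

    X is an (n_schools, n_variables) array whose columns are the clustering
    variables chosen by the user (step 2); k is also chosen by the user (step 1).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 3: pick k distinct schools at random as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 4: squared Euclidean distance from every school to every centre.
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop as soon as no school changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: move each centre to the mean of the schools assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels, centres


# Toy data: two obvious groups of "schools" measured on two variables.
X = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
labels, centres = kmeans(X, k=2)
```

Re-running with different `seed` values on real data is a quick way to see that classical K-means can depend on its random starting centres, which is one motivation for a deterministic initialisation such as iK-Means.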

When step 6 is reached, the algorithm has converged: repeating steps 4 and 5 again would not change which schools are clustered together. This convergence assures that the clustering has worked on a technical level, although classical K-means can converge to different clusters from different starting centres.

Figure 1 shows schools clustered by three characteristics. Schools within a cluster are similar in terms of the clustering characteristics specified in the algorithm and dissimilar to schools in other clusters. The algorithm compromises across all the characteristics, so clusters are not necessarily the most similar on any one characteristic but are similar on average across all of them. K-means is classified as a partitioning method because its clusters do not overlap: one school cannot be in two clusters at the same time.
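
The compromise described above can be made precise. Classical K-means, and the final K-means stage of iK-Means, look for a partition of the schools into clusters $S_1, \dots, S_K$ with centres $c_1, \dots, c_K$ that minimises the within-cluster sum of squared distances, summed over all $V$ clustering variables. In the notation used here (ours, not the original's), $y_{iv}$ is the value of variable $v$ for school $i$ and $c_{kv}$ is the centre of cluster $k$ on that variable:

$$W(S, c) = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v=1}^{V} \left( y_{iv} - c_{kv} \right)^2$$

Because every variable contributes to the same total, a cluster can be made tighter on one variable only at the expense of others, and each added variable adds another set of terms the partition has to balance. Steps 4 and 5 each never increase $W$, which is why the procedure converges. This is also why variables are usually standardised first, so that no variable dominates $W$ merely because of its scale.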

Reform used Boris Mirkin's intelligent K-means, also known as iK-Means. The iK-Means procedure differs from 'classical' K-means clustering in several ways. It detects anomalous patterns in the dataset and uses them both to choose the starting centres and to determine automatically "the number of clusters" that should be created given the nature and structure of the data, so the user does not have to specify it.[2] This feature makes iK-Means particularly appealing, as finding the appropriate number of clusters within a dataset can prove complicated.
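
A minimal sketch of the iK-Means idea, following Mirkin's description as we understand it: the data are standardised so that the grand mean becomes the origin (the "reference point"); the school farthest from the origin seeds an "anomalous pattern" made of every point closer to the pattern's centre than to the origin; that pattern is removed and the step repeated until every school is assigned; patterns below a size threshold are discarded; and the surviving centres, and their count, initialise ordinary K-means. This is not Reform's code. The range-based standardisation, the `min_size` threshold (discarding singletons is a common choice) and all function names are assumptions.

```python
import numpy as np


def standardise(X):
    """Centre each variable on its mean and divide by its range (one option of several)."""
    X = np.asarray(X, dtype=float)
    spread = X.max(axis=0) - X.min(axis=0)
    spread[spread == 0] = 1.0  # avoid dividing by zero for constant variables
    return (X - X.mean(axis=0)) / spread


def anomalous_pattern(Y, max_iter=100):
    """Boolean mask of one anomalous cluster in Y; the reference point is the origin."""
    dist_to_origin = (Y ** 2).sum(axis=1)
    if dist_to_origin.max() == 0:
        return np.ones(len(Y), dtype=bool)  # all remaining points sit on the origin
    # Seed the pattern with the point farthest from the reference point.
    centre = Y[dist_to_origin.argmax()].copy()
    members = None
    for _ in range(max_iter):
        # Keep every point that is closer to the tentative centre than to the origin.
        new_members = ((Y - centre) ** 2).sum(axis=1) < dist_to_origin
        if members is not None and np.array_equal(new_members, members):
            break
        members = new_members
        centre = Y[members].mean(axis=0)
    return members


def lloyd(Y, centres, max_iter=100):
    """Batch K-means started from the given centres; returns (labels, centres)."""
    centres = np.array(centres, dtype=float)
    labels = None
    for _ in range(max_iter):
        dists = ((Y[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = Y[labels == j].mean(axis=0)
    return labels, centres


def ik_means(X, min_size=2):
    """iK-Means sketch: anomalous patterns choose K and the starting centres."""
    Y = standardise(X)
    remaining = np.arange(len(Y))
    centres = []
    while len(remaining) > 0:
        members = anomalous_pattern(Y[remaining])
        if members.sum() >= min_size:  # discard very small anomalous patterns
            centres.append(Y[remaining][members].mean(axis=0))
        remaining = remaining[~members]
    if not centres:  # every pattern was too small: fall back to one cluster
        centres = [Y.mean(axis=0)]
    return lloyd(Y, centres)


# Toy data: two groups of three "schools"; K is not supplied by the user.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels, centres = ik_means(X)
n_clusters = len(centres)
```

Unlike the classical sketch, the user never supplies K here; it emerges from the number of anomalous patterns that survive the size threshold, and the starting centres are deterministic rather than random.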

One of the main limitations of this method is the trade-off the algorithm has to make between variables when it clusters similar schools together. This can reduce the homogeneity of the clusters, and the problem generally gets worse the more clustering variables are specified in the algorithm.

There is a range of factors that could affect the economy, efficiency or effectiveness of schools. Candidate factors were first identified through a thorough literature review and then tested to see which had the strongest relationship with attainment in primary and secondary schools. Several combinations of clustering variables were tested, and within-cluster homogeneity seemed to be least compromised when only two clustering variables were used. In the primary school sample, the two school-level variables with the strongest relationship with KS2 results were the percentage of pupils eligible for free school meals in the past 6 years and the percentage of pupils with persistent absence. In the secondary sample, levels of prior attainment and persistent absence had the strongest relationship with KS4 results.
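
The blog does not say exactly how candidate variables were tested against attainment. One simple possibility, shown purely as an assumption, is to rank each candidate by the absolute value of its Pearson correlation with KS2 or KS4 results and keep the top two. The names `rank_by_correlation` and `top_k` are hypothetical, and a one-variable-at-a-time correlation ignores overlap between candidates.

```python
import numpy as np


def rank_by_correlation(candidates, attainment):
    """Rank candidate variables by |Pearson r| with attainment, strongest first.

    candidates: dict mapping a variable name to a 1-D array (one value per school).
    attainment: 1-D array of results (e.g. KS2 or KS4) for the same schools.
    """
    scores = {
        name: abs(np.corrcoef(values, attainment)[0, 1])
        for name, values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def top_k(candidates, attainment, k=2):
    """Names of the k candidates most strongly correlated with attainment."""
    return [name for name, _ in rank_by_correlation(candidates, attainment)[:k]]
```

Taking the absolute value matters because variables such as free school meal eligibility or persistent absence are expected to correlate negatively with attainment; the sign says nothing about how useful a variable is for grouping schools.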

The clusters

Figures 2 and 3 show the iK-Means clustering results for primary and secondary schools according to the chosen pupil characteristics. Cluster sizes vary more among primary than secondary schools, ranging from 9 to 872 schools and from 43 to 230 schools respectively. Cluster 6, with only 9 primary schools, is removed from the results appearing in the next blog of this series, as it does not contain enough schools and still varies widely in the percentage of pupils with persistent absence.


As the figures below show, the clusters are relatively homogeneous in terms of the clustering variables, with small within-cluster variation.
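
A homogeneity check of the kind the figures illustrate can be sketched by computing each cluster's size and the standard deviation of each clustering variable inside it, and flagging clusters below a minimum size, as happened with Cluster 6. The function name `cluster_summary` and the default `min_size=30` are hypothetical; the blog does not state the rule it used.

```python
import numpy as np


def cluster_summary(X, labels, min_size=30):
    """Size, per-variable spread and a small-cluster flag for every cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for cluster in np.unique(labels):
        members = X[labels == cluster]
        summary[int(cluster)] = {
            "size": len(members),
            # Population standard deviation of each clustering variable.
            "std": members.std(axis=0),
            "too_small": len(members) < min_size,
        }
    return summary
```

Reporting the spread per variable, rather than a single pooled figure, shows which variable a cluster is compromising on, which is how a cluster can look tight on free school meals while still varying widely in persistent absence.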




Because adding clustering variables to the algorithm compromised the homogeneity of the clusters, it is possible that the school groups presented in this blog still do not sufficiently account for differences in schools' contexts, and so do not allow fully fair comparisons between schools. However, Reform is not aware of any other model that has clustered schools in a fairer way.

Further research

Finding the appropriate methodology to assess value for money in schools is crucial to research in this field. Previous studies have assessed the merits of different quantitative techniques for evaluating value for money in schools. Smith and Street warned that some methods should not be used to judge value for money in schools because of data quality issues. Other methods were found to produce school rankings that were extremely sensitive to the variables specified in the model. To Reform's knowledge, no other attempt has been made to use a machine learning technique (in this case, iK-Means) to cluster similar schools together. Despite the experimental nature of this work and the uncertainty about the appropriateness of the clusters, Reform believes that applying this innovative methodology is a useful contribution to research in this field, and that further experimentation would be valuable.



[1] Han, Jiawei, Micheline Kamber, and Jian Pei. 'Cluster Analysis: Basic Concepts and Methods' (Chapter 10). In Data Mining: Concepts and Techniques, 3rd edition. Boston: Morgan Kaufmann, 2012.

[2] Mirkin, Boris. Core Concepts in Data Analysis: Summarization, Correlation and Visualization. London: Springer, 2011.