When assessment publishers standardise assessments in Smartgrade, they apply three types of standardisation:
- Percentile Rank
  - This is the percentile rank of a student within a given assessment, with 1st being the bottom percentile and 100th being the top.
  - The median student will always sit in the 50th percentile.
  - Each percentile rank contains an equal number of students, as far as is possible based on the sample.
- Standardised Score
  - We use a scale for Standardised Scores that is widely used in education circles.
  - Its key properties are that the mean average raw score always converts to a standardised score of 100, and each standard deviation represents 15 points on the scale. The standardised score is calculated by subtracting the mean average score from the achieved score, dividing the result by the standard deviation of the assessment cohort, multiplying by 15 and adding 100 (a code sketch of this calculation follows the list below).
  - As far as is possible based on the sample, around 95% of students will achieve a score within two standard deviations of the average. Where the assessment follows a normal distribution, this would lead to roughly 95% of students receiving a score between 70 and 130. However, not all assessments in Smartgrade follow a normal distribution, so this pattern shouldn't be assumed.
  - The range of potential scores we allow is 60 to 140, though the range actually achieved on any given test could be significantly narrower, depending on the sample size and distribution of raw scores.
  - This Wikipedia article gives more information about the properties of standardised scores.
  - You'll find more information at the bottom of this article about comparing Percentile Ranks and Standardised Scores in Smartgrade.
- Smartgrade
  - Assessment publishers can apply one or more Smartgrades to an assessment. These are custom grades that may be specific to the context of that assessment or phase of education.
  - Smartgrades can have between 2 and 50 categories. They can be given their own custom names by the publisher, and the categories themselves can be labelled either as values (e.g. 100, 101) or as letters (e.g. EXS, WTS).
  - By default, Smartgrade proposes grade boundaries which divide results into equally sized groups. Publishers may then adjust these based on their professional judgment as to where the boundaries should lie.
NB Smartgrades are available for all kinds of assessments, including tiered assessments. Percentile Ranks are also calculated for some tiered assessments, but Standardised Scores never are. You can read more about tiered assessments here.
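To make the three calculations above concrete, here is a minimal sketch in Python. The mid-rank convention for ties, the use of the population standard deviation, and the exact quantile arithmetic for boundary proposals are all assumptions for illustration; Smartgrade's precise rounding and tie-handling rules may differ.

```python
import statistics

def percentile_rank(score, sample):
    # Share of the sample scoring below this score, counting ties as half,
    # scaled to a 1-100 band. The mid-rank tie convention is an assumption;
    # Smartgrade's exact rule may differ.
    below = sum(1 for s in sample if s < score)
    ties = sum(1 for s in sample if s == score)
    return min(100, max(1, round((below + 0.5 * ties) / len(sample) * 100)))

def standardised_score(score, sample):
    # The mean average raw score maps to 100 and each standard deviation is
    # worth 15 points, clamped to the allowed range of 60-140.
    z = (score - statistics.mean(sample)) / statistics.pstdev(sample)
    return min(140, max(60, round(100 + 15 * z)))

def proposed_boundaries(sample, n_categories):
    # One plausible way to divide results into equally sized groups:
    # a boundary at each k/n quantile of the ordered raw scores.
    ordered = sorted(sample)
    step = len(ordered) / n_categories
    return [ordered[round(k * step)] for k in range(1, n_categories)]

sample = [12, 15, 18, 20, 22, 22, 25, 28, 30, 35]
print(percentile_rank(22, sample))     # 50 -- the median student
print(standardised_score(22, sample))  # 98 -- just below the mean of 22.7
print(proposed_boundaries(sample, 4))  # [18, 22, 30]
```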
Global vs Org-wide standardisations
We standardise at two levels: across a global sample, or across your organisation.
- Global standardisations are available when an assessment is available to a broad sample of schools, and are designed to be as representative of the national cohort as possible.
- Org-wide standardisations are available on all assessments, and allow you to standardise using the sample of your school or MAT. This sample is unlikely to be nationally representative (unless your school or MAT has a very similar profile to the national cohort), but it can offer a consistent and normalised scale across subjects and year groups, particularly in MATs.
We often recommend comparing the global and org-wide percentile ranks for an assessment, to see how your school or MAT's rank order compares to a national cohort using an intuitive scale.
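The same raw score can sit at quite different percentile ranks against the two samples, which is exactly what the side-by-side comparison is designed to surface. A small illustration with made-up data, using the same mid-rank convention as the earlier sketch:

```python
def percentile_rank(score, sample):
    # Mid-rank tie convention, as in the earlier sketch.
    below = sum(1 for s in sample if s < score)
    ties = sum(1 for s in sample if s == score)
    return min(100, max(1, round((below + 0.5 * ties) / len(sample) * 100)))

# Hypothetical raw scores: a broad national sample vs one high-attaining school.
global_sample = [10, 14, 17, 19, 21, 23, 26, 29, 32, 36]
school_sample = [18, 21, 23, 24, 26, 27, 29, 31, 33, 35]

print(percentile_rank(26, global_sample))  # 65 -- above average nationally
print(percentile_rank(26, school_sample))  # 45 -- mid-table within the school
```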
Other standardised grade types available in Smartgrade
There are some other commonly used standardised scales in Smartgrade that you may come across, which are explained in more detail below:
- Performance Indicator
  - On partner Maths/Reading/GPS primary assessments, we usually calculate a Performance Indicator.
  - This converts the raw score into one of: BLW, WTS, EXS, HS.
  - The EXS and HS thresholds are set based on the percentage of students that achieved each designation in the given subject in the previous year's KS2 SATs series. The child's percentile rank in the live standardisation is then used to establish the appropriate performance indicator (the sketch at the end of this list illustrates the pattern).
  - BLW is applied to students scoring 0 (deemed to be working below the standard of the test), and WTS covers everyone between BLW and EXS.
- Scaled Score (May XXXX)
  - On Practice KS2 SATs assessments we provide the conversion from the raw score to the scaled score that corresponded to that raw score in the year the SATs paper was set.
  - This value is not adjusted to account for the live sample, meaning that a child may have a Performance Indicator of EXS (because their percentile rank is at or above the percentage of students achieving EXS in the previous year's SATs), but a Scaled Score (May XXXX) below 100, because in the actual SATs exam in May of the relevant year their score would not have reached the threshold for 100/EXS.
- Scaled Score Indicator
  - On Practice KS2 SATs assessments we also calculate a Scaled Score Indicator.
  - The thresholds for these indicators are based on the percentage of students achieving each scaled score in the given subject in the previous KS2 SATs series. The child's percentile rank in the live standardisation is then used to establish the appropriate scaled score indicator.
- Phonics Performance Indicator
  - On practice Phonics Screening Checks in Year 1, we usually calculate a Performance Indicator.
  - This converts the score to Wt (working towards the standard) or Wa (working at the standard).
  - The Wa threshold is set based on the percentage of students achieving Wa in the previous year's Phonics Screening Check. The child's percentile rank in the live standardisation is then used to establish whether the student is Wa.
  - Wt is applied to all students not achieving Wa.
- Phonics DfE Threshold (32)
  - On Phonics Screening Check assessments we calculate whether a child is Wt (working towards the standard) or Wa (working at the standard) based on the DfE pass mark from the year that Phonics Screening Check was sat.
  - The pass mark has been 32 ever since the tests were introduced, so to date the threshold is 32 regardless of the year of the practice assessment; if it changes in future we'll adjust the threshold for the relevant test.
  - Because this value is not adjusted for the live sample, a child may have a Performance Indicator of Wa (because their percentile rank is at or above the percentage of students passing the Phonics Screening Check in the previous year's tests), but a Phonics DfE Threshold (32) of Wt, because their score was below the DfE pass mark for the relevant year.
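The grade types above follow one of two patterns: a percentile-rank threshold taken from the previous year's national results, or a fixed raw-score threshold such as the phonics pass mark. A minimal sketch of both patterns follows; the function names are hypothetical, and the default percentages are illustrative stand-ins for the real prior-year figures, which vary by year and subject.

```python
def performance_indicator(percentile, raw_score, pct_hs=10, pct_exs_or_above=70):
    # pct_hs / pct_exs_or_above stand in for the percentages of students who
    # achieved HS / EXS-or-above in the previous year's SATs
    # (illustrative values only -- not real thresholds).
    if raw_score == 0:
        return "BLW"  # working below the standard of the test
    if percentile > 100 - pct_hs:
        return "HS"   # top of the live rank order, sized by last year's HS share
    if percentile > 100 - pct_exs_or_above:
        return "EXS"
    return "WTS"      # everyone between BLW and EXS

def phonics_dfe_threshold(raw_score, pass_mark=32):
    # Fixed DfE pass mark -- deliberately NOT adjusted for the live sample.
    return "Wa" if raw_score >= pass_mark else "Wt"

print(performance_indicator(95, raw_score=42))  # HS
print(performance_indicator(55, raw_score=30))  # EXS
print(performance_indicator(20, raw_score=12))  # WTS
print(phonics_dfe_threshold(31))                # Wt
```

This is also why the two phonics measures can disagree for the same child: the Performance Indicator depends on the live percentile rank, while the DfE threshold is a fixed mark.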
Comparing Percentile Ranks and Standardised Scores
People sometimes assume that there is a fixed relationship between Percentile Ranks and Standardised Scores, but this is only the case where an assessment follows a normal distribution. A number of assessments in Smartgrade will not follow a normal distribution, because they were not designed to do so. There are good reasons for this - for example, we standardise results from past papers (e.g. Practice SATs), which have quite different distributions at different points in the year. Some of our partners also offer curriculum-linked assessments, where ensuring the assessment covers the relevant parts of the curriculum is considered more important than achieving a perfectly "normal" bell curve of results.
When an assessment does not have a normal distribution, the relationship between Percentile Ranks and Standardised Scores can vary, because of the way they are calculated. As explained above, Percentile Ranks are anchored around the median raw score whereas Standardised Scores are anchored around the mean average. The median and mean may equate to different raw scores where the distribution of marks is not normal.
Another thing to understand about the two scales is that they are sensitive in different ways. For many distributions (including normal distributions), a wide range of percentile ranks around the median will correspond to a narrow range of raw scores. Or to put it another way, the difference between a percentile rank of 45 and 55 may not be that great when viewed as a raw score. The opposite is true at the extremes, so for example the 98th - 100th percentile ranks could cover a large range of raw scores if not many students achieved scores at the top end of the scale.
Standardised Scores behave differently. For most distributions (including normal distributions), the achieved standardised scores will be more bunched around the mean average (100) than with percentile ranks, and will be distributed more widely around the extremes. For example, on a normal distribution, the range of standardised scores from 95 to 105 maps to a range of percentile ranks from 37 to 63. So on the Standardised Scores, a range covering just 12.5% of the available grades maps to a range of percentile ranks covering 26% of the available grades. In contrast, at the extremes on a normal distribution the 98th to 100th percentile ranks equate to Standardised Scores of 130-140.
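The figures quoted above can be checked against the standard normal CDF; here is a quick sketch using only the Python standard library:

```python
from math import erf, sqrt

def normal_percentile(standardised_score):
    # Percentile rank implied by a standardised score on a perfectly
    # normal distribution with mean 100 and standard deviation 15.
    z = (standardised_score - 100) / 15
    phi = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF
    return round(phi * 100)

print(normal_percentile(95), normal_percentile(105))  # 37 63
print(normal_percentile(130))                         # 98
```

On a true bell curve the published mapping holds exactly; for the non-normal distributions described above, no such fixed mapping exists.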
Which to use when analysing Smartgrade data: Percentile Ranks or Standardised Scores?
While both scales have value, on balance we prefer percentile ranks for analysing data, because:
- They are less likely to be confused with DfE Scaled Scores.
- Most people intuitively understand percentiles.
- The relationship between percentile ranks and performance indicators is fixed, because the latter is derived from the former, whereas the relationship between standardised scores and performance indicators is fluid.
- Where assessments do not follow a normal distribution, users may assume a closer relationship between standardised scores and percentile ranks than actually holds.