Our Quality Checker has been developed in partnership with our friends at Evidence Based Education. For a detailed understanding of the concepts that feed into the report, you may wish to consider their range of assessment training products. This support article provides an overview of the most important concepts.
Once you've uploaded data into the Quality Report, you'll be directed to a summary report that looks like this:
Reports are divided into three sections:
- Reliability gives you a score between 0 and 1 indicating the reliability of your assessment. The score is based on a statistical measure of internal consistency called Cronbach's Alpha. In other words, we look at the interrelationships between all the item responses and produce a reliability score indicating whether the test appears to be measuring what you set out to measure (a sketch of the calculation appears after this list). There is no hard and fast rule as to what constitutes a "good" reliability score, and what you find acceptable should depend in large part on how you plan to use the assessment, but we colour code scores using the following rough guidelines:
- 0.9+: "Excellent". This is a highly reliable assessment with strong internal consistency.
- 0.8-0.89: "Good". This is a reliable assessment that can be used for a fairly wide range of purposes, though that does not necessarily mean it is sufficient on its own for high-stakes decisions (e.g. setting).
- 0.7-0.79: "Acceptable". This assessment can be used for a range of tasks, but it is not of the highest reliability and we would not suggest using it on its own for high-stakes decisions (e.g. setting).
- 0.6-0.69: "Questionable". The assessment may have value for some purposes, but we suggest you examine the format of the assessment - and under-performing questions in particular - before using it for higher-stakes purposes.
- 0.59 and below: "Poor". The assessment's reliability may not be strong enough to support extensive usage, particularly for higher-stakes analysis.
- Precision shows you the potential variance of scores from the "true score", expressed as a percentage. For example, on the report above, a precision score of 6% means that the Standard Error of Measurement (SEM) is 6% of the available marks. In other words, a student's true score would be expected to fall within a range of roughly ±6% of the actual score they received. You therefore want your precision score to be as low as possible, since a lower score equates to a smaller margin of error on the marks a student could achieve (the sketch after this list shows how SEM is derived from reliability).
- Misclassification shows the percentage of students who would be misclassified if you were to use the assessment to divide students into nine equally sized groups. This helps you understand how accurate your grades are likely to be if you are using a 9-1 scale. Note: misclassification scores can be surprisingly high - numbers above 50% are not uncommon! This doesn't mean that the underlying assessments can't be used for such purposes, but higher scores should prompt reflection on how the assessment will be used. For example, it may be unwise to conduct setting on the basis of an assessment with a misclassification score above 25% alone (the second sketch below simulates why these figures run so high).
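If you're curious about the mechanics behind the Reliability and Precision figures, both come from classical test theory and can be reproduced in a few lines. The sketch below is a minimal illustration, assuming a simple students x items matrix of marks; the function names and example data are our own, and the Quality Checker's exact implementation may differ.

```python
import numpy as np

def cronbachs_alpha(scores: np.ndarray) -> float:
    """Cronbach's Alpha for a (students x items) matrix of item marks."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def precision_percent(scores: np.ndarray, max_marks: float) -> float:
    """Standard Error of Measurement (SEM) expressed as a % of available marks."""
    sd = scores.sum(axis=1).std(ddof=1)              # spread of total scores
    sem = sd * np.sqrt(1 - cronbachs_alpha(scores))  # classical SEM formula
    return 100 * sem / max_marks

# Tiny illustrative dataset: 5 students x 4 items, each item marked out of 1
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
])
print(f"Reliability (alpha): {cronbachs_alpha(responses):.2f}")
print(f"Precision (SEM %):   {precision_percent(responses, max_marks=4):.1f}%")
```

With a dataset this small, alpha will be low and unstable; in practice these figures only become meaningful with a real cohort's worth of responses.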
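The Misclassification figure is easiest to understand as a thought experiment: if every student's observed score is their true score plus some measurement error, how many end up in the wrong group when you cut the cohort into equally sized groups? The sketch below simulates this, assuming normally distributed scores and the classical relationship between reliability and error; it illustrates the idea, and is not necessarily the exact method used in the report.

```python
import numpy as np

def misclassification_percent(reliability: float, n_groups: int = 9,
                              n_students: int = 100_000, seed: int = 0) -> float:
    """Estimate the % of students placed in the wrong group when observed
    scores (true score + error) are cut into equally sized groups."""
    rng = np.random.default_rng(seed)
    # Classical test theory: reliability is the share of observed-score
    # variance that comes from true scores; the rest is measurement error.
    true = rng.normal(0.0, np.sqrt(reliability), n_students)
    observed = true + rng.normal(0.0, np.sqrt(1 - reliability), n_students)

    def group(x: np.ndarray) -> np.ndarray:
        ranks = x.argsort().argsort()           # rank each student 0..n-1
        return ranks * n_groups // n_students   # equally sized group index

    return 100 * np.mean(group(true) != group(observed))

print(f"9 groups, reliability 0.8: {misclassification_percent(0.8, 9):.0f}%")
print(f"2 groups, reliability 0.8: {misclassification_percent(0.8, 2):.0f}%")
```

Even at a "Good" reliability of 0.8, nine narrow groups leave many students sitting close to a boundary, which is why scores above 50% are not unusual; with fewer, wider groups the rate falls sharply, something you can explore in the Detailed Misclassification Report described below.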
In addition to the summary page, more detailed reports can be found by accessing each of the three areas above:
- The Detailed Reliability Report includes RICK Item Analysis. Here you'll find a table of all the assessment's questions, categorised as "Replace", "Improve", "Check" or "Keep". This helps you understand how to improve your test for future use, or what to learn from this assessment and incorporate into future design. We suggest you focus in particular on items marked "Replace", since these appeared to be the most problematic in terms of reliability (a rough sketch of this kind of item analysis follows this list).
- The Detailed Precision Report includes Grade Distribution Analysis. This helps you see the distribution of all achieved scores. A well-designed assessment will typically produce a visualisation that looks something like a bell curve (i.e. a normal distribution). The area equating to the top and bottom 10% of potential grades is shaded in grey - it is good practice to design an assessment in such a way that not too many students score within the shaded range.
- The Detailed Misclassification Report includes the misclassification % scores for equally sized groups from 2 to 9. This can help you decide how to use the assessment. For example, a high misclassification percentage should make you wary of using the test for setting purposes.
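The precise rules behind the RICK categories aren't set out in this article, but item analyses of this kind are typically driven by statistics such as the corrected item-total correlation: how well each question agrees with the rest of the test. The sketch below shows the general idea; the thresholds are assumptions for illustration, not the Quality Checker's actual cut-offs.

```python
import numpy as np

def corrected_item_total(scores: np.ndarray) -> np.ndarray:
    """Correlation of each item with the total of the *other* items.
    Low or negative values flag questions that disagree with the rest
    of the test and so drag reliability down."""
    totals = scores.sum(axis=1)
    result = np.empty(scores.shape[1])
    for i in range(scores.shape[1]):
        rest = totals - scores[:, i]  # total score excluding this item
        result[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return result

def rick_label(r: float) -> str:
    """Assumed thresholds for illustration only - not the actual RICK rules."""
    if r < 0.1:
        return "Replace"
    if r < 0.2:
        return "Improve"
    if r < 0.3:
        return "Check"
    return "Keep"

# Label each question in a small (students x items) matrix of marks
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
])
for item, r in enumerate(corrected_item_total(responses), start=1):
    print(f"Q{item}: item-total r = {r:.2f} -> {rick_label(r)}")
```

Whatever the exact cut-offs, the intuition holds: a question that bears little relation to performance on the rest of the paper is the first candidate to replace.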