Quality Assessment Guidelines


Introduction

Assessment forms an integral part of the ACT Senior Secondary System, so developing quality assessment tasks is important for the integrity of that system. To support teachers and ensure quality, these assessment guidelines are based on contemporary research and are designed to build a common understanding and language for developing assessment that meets the needs of students. In addition, the guidelines will inform the work of the Board of Senior Secondary Studies (BSSS) in the areas of moderation and assessment.

The BSSS Quality Assessment Criteria (BSSS QAC) can be used for designing and reviewing tasks, applied either to individual tasks or holistically across the assessment of a unit.

Quality Assessment Guidelines (770 kb)

A web version of this tool is available here: https://sites.google.com/view/bsss-quality-assessment

Validity

It is impossible to directly assess the knowledge and understandings in the brain of a student. Instead, teachers use carefully selected proxies (assessment tasks) to provide evidence from which to make valid inferences about the knowledge and understandings of the student (Christodoulou, 2016).

The idea of validity is the lynchpin of all assessment tasks and of the inferences drawn from assessment data (Christodoulou, 2016). Three perspectives are considered in determining validity: “the form of the measure, the purpose of the assessment, and the population for which it is intended” (Dirksen, 2013). Masters (2013) argues that validity focuses on how fit for purpose the assessment is for the domain being assessed. Darr (2005a) notes that “Judging validity cannot be reduced to a simple technical procedure. Nor is validity something that can be measured on an absolute scale. The validity of an assessment pertains to particular inferences and decisions made for a specific group of students.” (p.55). The inferences drawn from the data that assessment generates are the foundation of the ACT system. Bennett (2011) argues that for an assessment to be valid, it should be supported with data showing that different observers would draw the same inferences from the same evidence.

Validity can be affected by six factors which form the core of the quality assessment guidelines:

The Criteria

Coverage of BSSS Accredited Courses

Wiliam (2014) outlines two threats to validity: assessment which is ‘too small’ (construct under-representation) and fails to assess what it should, and assessment which is ‘too big’ (construct-irrelevant variance) and assesses things it should not. A video presentation assignment in a History class on a narrow historical topic may raise both issues: some teachers may argue that the assignment is ‘too small’, assessing only a small part of the unit, while others may argue that it is ‘too big’, assessing things it should not, such as students’ video editing and presentation skills. This is not to say that such assessment should not take place. The assignment could provide a fantastic opportunity for students, but the teacher should try to address these concerns across the entirety of the unit assessment.

The domain of a subject’s knowledge, skills and understandings is usually far too large to assess in its entirety. Even at the unit level there are often goals or descriptions that could be interpreted and assessed in countless ways. As a result, assessment is almost always a construct under-representation, yet it is used to make inferences about students’ performance in the construct as a whole. For these inferences to be valid, teachers should ensure that appropriate breadth and depth are assessed (Christodoulou, 2016).

A New Zealand meta-analysis of the effects of curricula and assessment on pedagogical approaches shows that high-stakes assessment can limit the classroom curriculum for students, particularly lower achievers and minority students (Carr, McGee, Jones, McKinley, Bell, Barr & Simpson, 2005). It is easy for teachers to fall into the trap of assessing what is easy to assess and ignoring more difficult-to-assess skills or content. Wiliam (2014) uses the example of assessing practicals in science: it had previously been shown that skills in science practicals were highly correlated with scores in science tests, but when practical assessment was removed from the formal assessment program this correlation no longer held. Assessment type and scope should not be allowed to distort curriculum delivery (Carr et al., 2005).

Further information on designing and assessing Coverage of BSSS Accredited Courses in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Reliability

To make valid inferences about student knowledge, skills and understandings in the domain, assessment measurements need to minimise the influence of irrelevant factors. This is called reliability.

To understand what reliability means we need to understand that all assessment measurements (observed scores) have an error contained within them such that:

Observed Score = True Score + Error

The True Score in the above equation does not imply that a student’s ability is predetermined or fixed; it represents what the student would score on average if the task were given repeatedly (with an appropriate ‘memory wipe’ between attempts), or across multiple parallel assessments of exactly the same difficulty on the same material (Bramley & Dhawan, 2010). It is not possible to remove this error completely: improving the reliability of assessment means aiming to minimise the error to improve the stability of results, but there will always be some variation (Dirksen, 2013). Increased reliability increases our certainty that, for example, a student who receives 80 on an assessment has achieved more highly than a student who receives 70.
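
To make the equation concrete, the following is a minimal sketch in Python (assuming NumPy and invented score distributions). It simulates observed scores as true scores plus random error and estimates reliability as the proportion of observed-score variance attributable to true-score variance:

    import numpy as np

    rng = np.random.default_rng(0)

    n_students = 1000
    true_scores = rng.normal(70, 10, n_students)  # hypothetical 'true' abilities
    error = rng.normal(0, 5, n_students)          # random measurement error
    observed = true_scores + error                # Observed Score = True Score + Error

    # Classical test theory: reliability is the share of observed-score
    # variance that comes from true-score variance.
    reliability = true_scores.var() / observed.var()
    print(f"Estimated reliability: {reliability:.2f}")  # close to 100/125 = 0.80

With a true-score spread of 10 marks and an error spread of 5, roughly 80 per cent of the variation in observed scores reflects genuine differences between students; shrinking the error term is what improving reliability means in practice.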

Reliability can be thought of in terms of consistency:

  • across time (would students receive the same result from the task if it were completed at a different time or under different conditions?)
  • across tasks (would students receive the same result from different tasks assessing this material?)
  • and across markers (would students receive the same result from different markers? A simple agreement check is sketched below) (Christodoulou, 2016; Darr, 2005b).
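
One well-established way to quantify agreement between markers is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The following is a minimal sketch using invented A–E grades; values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance:

    from collections import Counter

    def cohen_kappa(rater_a, rater_b):
        """Agreement between two markers, corrected for chance."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        counts_a, counts_b = Counter(rater_a), Counter(rater_b)
        expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / n ** 2
        return (observed - expected) / (1 - expected)

    # Hypothetical grades awarded to the same eight scripts by two markers
    marker_1 = ["A", "B", "B", "C", "D", "B", "A", "C"]
    marker_2 = ["A", "B", "C", "C", "D", "B", "B", "C"]
    print(f"kappa = {cohen_kappa(marker_1, marker_2):.2f}")  # 0.65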

Within an assessment item such as a test, reliability can also be thought of as the consistency of a question compared with all the other questions in the task assessing the same material (Dirksen, 2013).
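
A common statistic for this kind of internal consistency is Cronbach’s alpha. The sketch below is illustrative only, assuming NumPy and a small invented matrix of marks (students by questions); values closer to 1 suggest the questions rank students consistently:

    import numpy as np

    def cronbach_alpha(item_scores):
        """Internal consistency of a test; item_scores is (students x questions)."""
        scores = np.asarray(item_scores, dtype=float)
        k = scores.shape[1]                          # number of questions
        item_vars = scores.var(axis=0, ddof=1)       # variance of each question
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical marks for five students on four questions of one test
    marks = [[3, 4, 3, 4],
             [2, 2, 3, 2],
             [4, 4, 4, 5],
             [1, 2, 1, 1],
             [3, 3, 4, 3]]
    print(f"alpha = {cronbach_alpha(marks):.2f}")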

Reliability can really only be determined through the examination of assessment results, but the factors that decrease error are well known. These include: standardising assessment conditions; designing questions of suitable difficulty for the students involved; having questions that lead to a spread of scores; and having quality rubrics and marking schemes leading to consistent marking and moderation (Darr, 2005b; Masters, 2013).
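
Two of these factors, suitable difficulty and a spread of scores, can be screened after the event with simple item statistics. The sketch below, again using invented marks, reports each question’s difficulty (the mean mark as a fraction of the marks available) and its discrimination (the correlation between a question and the rest of the test, which rewards questions that separate stronger from weaker students):

    import numpy as np

    def item_analysis(item_scores, max_marks):
        """Difficulty and discrimination per question; item_scores is
        (students x questions), max_marks lists the marks available."""
        scores = np.asarray(item_scores, dtype=float)
        totals = scores.sum(axis=1)
        for i in range(scores.shape[1]):
            difficulty = scores[:, i].mean() / max_marks[i]  # 0 = hard, 1 = easy
            rest = totals - scores[:, i]                     # total excluding question i
            discrimination = np.corrcoef(scores[:, i], rest)[0, 1]
            print(f"Q{i + 1}: difficulty = {difficulty:.2f}, "
                  f"discrimination = {discrimination:.2f}")

    # Hypothetical marks for six students on questions worth 5, 5 and 10 marks
    marks = [[4, 3, 8], [5, 4, 9], [2, 2, 4], [3, 3, 6], [1, 1, 3], [4, 4, 7]]
    item_analysis(marks, max_marks=[5, 5, 10])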

Further information on designing and assessing Reliability in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Bias Awareness

A biased assessment is one which favours a student or students over others based on factors other than the key knowledge, skills, and understandings of the unit. Bias plays a role in how inferences are drawn, so to make assessment more principled, teachers need to recognise “that our characterisations of students are inferences and that, by their very nature, inferences are uncertain and also subject to unintentional biases.” (Bennett, 2011, p.18). Bias can be built into the construction of assessment tasks, which means that teachers need to design assessment with, for example, gender, socio-economic and cultural considerations in mind in order to make valid inferences from the data.

The most common way classroom teachers introduce bias into assessment is through assumptions about background knowledge, or the privileging of certain types of background knowledge (OECD, 2013). An individual assessment task may require a level of background knowledge to engage with fully; teachers should be aware of this and provide easy access to that information to lessen the impact of advantage or disadvantage, and should avoid compounding any advantage or disadvantage in other assessment items. The Illinois Guiding Principles of Assessment (2015) highlight the importance of classroom assessment practices being responsive to and respectful of the cultural and linguistic diversity of students, and mention unnecessary linguistic complexity as an example of bias. The NSW Centre for Education Statistics & Evaluation (2015) refers to assessment that does not “tacitly or explicitly privilege students from high socio-economic backgrounds” (p.6).

Under the Disability Standards for Education (2005), teachers are required to make reasonable adjustments to assessment for students with a disability. Reasonable adjustments are those that maintain the assessment of a student against the Achievement Standards, unit goals and unit content descriptions while mitigating the effect of a disability on the assessment. Identifying the key knowledge, skills and understandings is essential to ensuring that the validity of the assessment is maintained.

Formal assessment in senior secondary should assess the student’s objective performance and not incorporate judgements of character, effort, behaviour or potential (Hanover Research, 2011). This can be difficult for some teachers. Teachers can, however, take steps to ensure that unconscious biases do not cloud their judgement, such as using transparent and explicit marking schemes and processes, de-identifying student work, or having teachers who do not teach the unit mark the assessment (Stevens, Ructtinger, Liyanage & Crawford, 2017; Masters & Forster, 1996).

Bias in assessment can really only be detected through the analysis of assessment results.
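
One simple screen of this kind is to compare each group’s mean on a task with the overall mean. The sketch below uses invented marks, invented group labels and a hypothetical flagging threshold; a flagged gap is a prompt to examine the task, not proof of bias, since genuine differences in achievement can also produce gaps:

    import numpy as np

    def flag_possible_bias(task_scores, groups, threshold=5.0):
        """Flag tasks where a group's mean departs from the overall mean
        by more than `threshold` marks."""
        scores = np.asarray(task_scores, dtype=float)  # (students x tasks)
        groups = np.asarray(groups)
        for t in range(scores.shape[1]):
            overall = scores[:, t].mean()
            for g in np.unique(groups):
                gap = scores[groups == g, t].mean() - overall
                if abs(gap) > threshold:
                    print(f"Task {t + 1}, group '{g}': gap of {gap:+.1f} marks")

    # Hypothetical percentages for six students on two tasks, with group labels
    marks = [[72, 60], [68, 55], [75, 58], [70, 82], [66, 85], [73, 80]]
    flag_possible_bias(marks, groups=["X", "X", "X", "Y", "Y", "Y"])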

Further information on designing and assessing Bias Awareness in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Levels of Thinking

There are a number of proposed theories for how students learn and how their thinking about concepts progresses. The most widely known general theoretical frameworks are Bloom’s Taxonomy (1956), Anderson and Krathwohl’s Taxonomy (Bloom’s revised taxonomy) (2001) and the SOLO Taxonomy (Biggs & Collis, 1982). These generally aim to describe phases of understanding and application, and their interconnectedness with other concepts or ideas.

Individual concepts from a domain can be mapped out to describe the sequence in which ideas and practices develop. These maps are generally called ‘learning progressions’ (Furtak, Morrison & Kroog, 2014). The best-developed learning progressions aim to be both ‘top-down’, involving the views of content experts, and ‘bottom-up’, seeking to understand how student learning intersects with the content (Stevens, Ructtinger, Liyanage & Crawford, 2017). Ideally, they are linear, so that students cannot achieve higher elements without satisfying earlier ones. For this reason, learning progressions work best when focused on an appropriately small concept and locally adapted to the students (Wiliam, 2014).

Providing assessment that addresses a range of thinking levels gives students access to the assessment task as well as the opportunity to develop and extend their thinking. Teachers face increasing diversity in classrooms (Moon, 2005), so using assessment tasks that span a range of thinking levels, from low to high, will allow for a spread of results. In addition, a range of assessment tasks allows students to demonstrate different thinking levels, skills and abilities, and different assessment tools such as group work, oral tests or debates can help to improve their learning (Murillo & Hidalgo, 2017).

All assessment tasks in the ACT are based on the Achievement Standards, which cater for the needs of diverse learners. Rubrics developed for each task are specific and should use verbs from the theoretical framework to define levels of achievement (Griffin, 2018).
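
As a rough illustration of how framework verbs can be used to audit a rubric, the sketch below uses an abbreviated, hypothetical verb list loosely based on the revised taxonomy (a school would substitute the verbs its chosen framework prescribes) and reports which thinking levels a set of rubric descriptors reaches:

    # Hypothetical verb lists; substitute those of the framework in use.
    TAXONOMY_VERBS = {
        "remember":   {"list", "recall", "identify", "define"},
        "understand": {"describe", "explain", "summarise", "classify"},
        "apply":      {"use", "demonstrate", "solve", "implement"},
        "analyse":    {"compare", "differentiate", "organise", "examine"},
        "evaluate":   {"justify", "critique", "judge", "assess"},
        "create":     {"design", "construct", "compose", "formulate"},
    }

    def levels_in_rubric(descriptors):
        """Report which thinking levels the descriptors' leading verbs reach."""
        found = set()
        for text in descriptors:
            verb = text.lower().split()[0]
            for level, verbs in TAXONOMY_VERBS.items():
                if verb in verbs:
                    found.add(level)
        return found

    rubric = ["Identify the key events of the period",
              "Explain the causes of the conflict",
              "Justify your interpretation using sources"]
    print(levels_in_rubric(rubric))  # remember, understand and evaluate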

Further information on designing and assessing Levels of Thinking in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Student Engagement

Students who are unmotivated to complete an assessment will not produce reliable or valid assessment results (Nuthall, 2007), which makes student engagement an important aspect of quality assessment.

Transparent and clear assessment instructions which describe what success looks like allow students to participate fairly in the assessment process and increase reliability (Wiliam, 2014). Students need to feel equipped to complete the task with the knowledge, understanding and skills gained from the classroom.

In addition, designing assessments that are embedded in contemporary issues and relevant to the students also improves engagement. Authentic tasks promote realistic problem-solving (Masters, 2014; Bae & Kokka, 2016) and allow students to think as an expert in a discipline area would. Bae and Kokka also outline how student autonomy can improve engagement when students are given decision-making opportunities in regard to their assessment. Collaborative opportunities are also often popular with students.

A student’s engagement with assessment is not affected by these factors alone. Indeed, family, peer and internal pressures can have a greater impact on a student’s motivation than the formal assessment requirements (Nuthall, 2007). Schools, leaders and classroom teachers need to promote positive student wellbeing, ensuring that students feel supported.

Further information on designing and assessing Student Engagement in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Academic Integrity

Academic integrity is the assurance that student work is the genuine product of the student being assessed. It is of the utmost importance for ensuring that results allow valid inferences to be made about student achievement.

Using ‘test conditions’ that prevent communication between students is a common approach for appropriate tasks. The test conditions should be clearly communicated to students to remove any ambiguity or confusion. Maintaining test security and ensuring tasks are not reused will further assist academic integrity.

Teachers can build academic integrity into their assessments through:

  • designing a wide range of assessment types;
  • changing tasks regularly;
  • using a recent or local context rather than a general context;
  • incorporating classroom experiences that outside agents would not be privy to;
  • including personal reflection/opinion;
  • using interdependent tasks; and
  • requiring drafting or evidence of planning, check points and clear tracking (Charles Sturt University, 2020; University of Waterloo, n.d.; University of Tasmania, 2018).

Further information on designing and assessing Academic Integrity in assessment is available in the full Quality Assessment Guidelines Document (770 kb)

Quality Assessment Guidelines (770 kb)