The Role of Testing and Assessment in Education

Ensure knowledge being shared by teacher has been received by students
Helping students better their learning skills —> administer tests pinpointing possible areas of learning difficulty
Determine if students are prepared to learn more advanced material —> ‘readiness’ or ‘aptitude’ tests
Testing required by law

Response to Intervention

Background

Mid 1970s – specific learning disability (SLD) was diagnosed if a significant discrepancy existed between the child’s measured intellectual ability (usually on an intelligence test) and the level of achievement that could reasonably be expected from the child in one or more areas (including oral expression, listening comprehension, written expression, basic reading skills, reading comprehension, mathematics calculation and mathematics reasoning)
2007 – SLD: a disorder in one or more of the basic psychological processes involved in understanding or in using language, spoken or written, which disorder may manifest itself in the imperfect ability to listen, think, speak, read, write, spell or do mathematical calculations.
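
The mid-1970s discrepancy criterion can be sketched as a simple difference between two standard scores. This is only an illustration: the function name, the assumption that both scores sit on a mean-100, SD-15 scale, and the 1.5 SD cut-off are hypothetical and are not taken from any regulation or test manual.

```python
def has_significant_discrepancy(ability: float, achievement: float,
                                cutoff: float = 22.5) -> bool:
    """Return True if achievement falls well below measured ability.

    Both arguments are assumed to be standard scores on the same scale
    (mean 100, SD 15). The default cutoff of 22.5 points (1.5 SD) is a
    made-up value chosen purely for illustration.
    """
    return (ability - achievement) >= cutoff


print(has_significant_discrepancy(110, 85))  # gap of 25 points
print(has_significant_discrepancy(100, 95))  # gap of 5 points
```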

RtI model

Response to intervention model: a multilevel prevention framework applied in educational settings that is designed to maximise student achievement through the use of data that identifies students at risk for poor learning outcomes, combined with evidence-based intervention and teaching that is adjusted on the basis of student responsiveness
(a) teachers provide evidence-based instruction
(b) student learning of that instruction is regularly evaluated
(c) intervention, if required, occurs in some form of appropriate adjustment to the instruction
(d) reevaluation of learning takes place
(e) intervention and reassessment occur as necessary
Model is multilevel because there are at least 3 levels of intervention (or teaching)
1. Classroom environment – all students are being taught what it is that the teacher is teaching
2. Small group of learners who have failed to make adequate progress in the classroom have been segregated for special teaching
3. Individually tailored and administered instruction for students who have failed to respond to the second level of intervention.
Objective of RtI: accelerate the learning process for all students by providing intervention appropriate to the level of each student’s needs
The model also identifies students with learning disabilities
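
The evaluate-then-adjust cycle and the three levels can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `Student` class, the `reevaluate` function, and the score and benchmark values are all hypothetical, and a real RtI programme would use its own progress measures and decision rules.

```python
from dataclasses import dataclass

# Labels paraphrase the three levels listed above.
TIERS = {
    1: "classroom instruction",
    2: "small-group teaching",
    3: "individually tailored instruction",
}


@dataclass
class Student:
    name: str
    tier: int = 1  # every student starts at level 1


def reevaluate(student: Student, score: float, benchmark: float) -> Student:
    """One evaluate-then-adjust cycle (steps b to e above).

    `score` and `benchmark` stand in for whatever progress measure and
    cut-off a school actually uses; both are assumptions of this sketch.
    """
    if score < benchmark and student.tier < 3:
        student.tier += 1  # inadequate progress: intensify the intervention
    return student


student = Student("A")
for score in (40, 45, 70):  # three successive progress checks
    reevaluate(student, score, benchmark=60)

print(student.tier, TIERS[student.tier])
```

The sketch only ever moves a student up a level; whether and when a student returns to a lower level is left out because the notes do not describe it.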

Diagnostic tests do not necessarily provide information that will answer questions concerning why a learning difficulty exists —> Other tests are needed to answer that question.
In general, diagnostic tests are administered to students who have already demonstrated their problem with a particular subject area through their poor performance either in the classroom or on some achievement test.
For this reason, diagnostic tests may contain simpler items than achievement tests designed for use with members of the same grade.

Reading Tests

The Woodcock Reading Mastery Tests-Revised (WRMT-R)

For ages 5 through adulthood (age 75 and beyond).
Subtests
- Letter Identification: Items that measure the ability to name letters presented in different forms. Both cursive and printed as well as uppercase and lowercase letters are presented.
- Word Identification: Words in isolation arranged in order of increasing difficulty. The student is asked to read each word aloud.
- Word Attack: Nonsense syllables that incorporate phonetic as well as structural analysis skills. The student is asked to pronounce each nonsense syllable.
- Word Comprehension: Items that assess word meaning by using a four-part analogy format.
- Passage Comprehension: Phrases, sentences, or short paragraphs, read silently, in which a word is missing. The student must supply the missing word.
Three subtests new to the third edition (WRMT-III) are Phonological Awareness, Listening Comprehension, and Oral Reading Fluency
All of the subtests taken together are used to derive a picture of the testtaker’s reading-related strengths and weaknesses, as well as an actionable plan for reading remediation where necessary

Math Tests

The Stanford Diagnostic Mathematics Test, Fourth Edition (SDMT4) and the KeyMath 3 Diagnostic System (KeyMath3-DA) are two of the many tests that have been developed to help diagnose difficulties with arithmetic and mathematical concepts
Items of such tests typically test everything from knowledge of basic concepts and operations through applications entailing increasingly advanced problem solving skills.
KeyMath3-DA
- Age 4.5 – 21
- Test comes in 2 forms, each containing 10 subtests
SDMT4
- Standardised test that can provide useful diagnostic insights with regard to the mathematical abilities of children just entering school to just entering college
- Individual or group administration
- Contains multiple choice and (optional) free-response items
Some instruments assess attitudes in specific subject areas, whereas others, such as the Survey of School Attitudes and the Quality of School Life Scales, are more general in scope.
The Survey of Study Habits and Attitudes (SSHA) and the Study Attitudes and Methods Survey combine the assessment of attitudes with the assessment of study methods.
The SSHA, intended for use in grades 7 through college, consists of 100 items tapping poor study skills and attitudes that could affect academic performance.
Two forms are available, Form H for grades 7 to 12 and Form C for college, each requiring 20 to 25 minutes to complete.
Students respond to items on the following 5-point scale: rarely, sometimes, frequently, generally, or almost always.
Test items are divided into six areas: Delay Avoidance, Work Methods, Study Habits, Teacher Approval, Education Acceptance, and Study Attitudes.
The test yields a study skills score, an attitude score, and a total orientation score.

The Woodcock-Johnson III (WJ III)

A psychoeducational test package consisting of two co-normed batteries:
- Tests of Achievement
- Tests of Cognitive Abilities
Based on the Cattell-Horn-Carroll (CHC) theory of cognitive abilities.
Use with persons as young as 2 and as old as “90+,” according to the test manual.
The WJ III yields a measure of general intellectual ability (g) as well as measures of specific cognitive abilities, achievement, scholastic aptitude, and oral language. Along with other assessment techniques, the WJ III may be used to diagnose SLDs and to plan educational programs and interventions.
The Tests of Achievement are packaged in parallel forms designated A and B, each of which is divided into a standard battery (twelve subtests) and an extended battery (ten additional subtests).
As illustrated in Table 11–4, interpretation of an achievement test is based on the testtaker’s performance on clusters of tests in specific curricular areas.
The Tests of Cognitive Abilities may also be divided into a standard battery (ten subtests) and an extended battery (ten additional subtests).
As illustrated in Table 11–5, the subtests tapping cognitive abilities are conceptualized in terms of broad cognitive factors, primary narrow abilities, and cognitive performance clusters.
When using either the achievement or cognitive abilities tests, the standard battery might be appropriate for screenings or brief reevaluations.
The extended battery would likely be used to provide a more comprehensive and detailed assessment, complete with diagnostic information.
In any case, cluster scores are used to help evaluate performance level, gauge educational progress, and identify individual strengths and weaknesses.
The WJ III was normed on a sample of 8,818 subjects from ages 24 months to “90+” years who were representative of the population of the United States.
Age-based norms are provided from ages 24 months to 19 years by month and by year after that.
Grade-based norms are provided for kindergarten through grade 12, two-year college, and four-year college, including graduate school.
Procedures for analysis of reliabilities for each subtest were appropriate, depending upon the nature of the tests.

e.g. the reliability of tests that were not speeded and that did not have multiple-point scoring systems was analyzed by means of the split-half method, corrected for length using the Spearman-Brown correction formula.
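
The split-half method with the Spearman-Brown correction can be sketched as follows. The notes do not say how the items were split, so the odd/even split used here is an assumption, and the item scores are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Estimate full-length reliability from a half-test correlation."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per testtaker."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_half, even_half))


# Made-up right/wrong (1/0) scores for four testtakers on six items.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
]
print(round(spearman_brown(0.6), 2))  # a half-test r of .60 is corrected to .75
print(split_half_reliability(scores))
```

The correction is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable.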

The test manual also presents concurrent validity data.
Support for the validity of various aspects of the test has also come from independent researchers.
e.g. Floyd et al. (2003) found that certain cognitive clusters were significantly related to mathematics achievement in a large, nationally representative sample of children and adolescents.
Scoring of the WJ III is accomplished with the aid of software provided in the test kit —> Data from the raw scores are entered, and the program produces a summary report and a table of scores, including all derived scores for tests administered as well as clusters of tests.

OTHER TOOLS OF ASSESSMENT IN EDUCATIONAL SETTINGS

Performance, Portfolio, and Authentic Assessment

The term performance assessment has been used loosely to refer to any type of assessment that requires the examinee to do more than choose the correct response from a small group of alternatives.
e.g. essay questions and the development of an art project are examples of performance tasks. By contrast, true-false questions and multiple-choice test items would not be considered performance tasks.
Portfolio assessment: evaluation of one’s work samples.
In many educational settings, dissatisfaction with some more traditional methods of assessment has led to calls for more performance-based evaluations —> Authentic assessment
When used in the context of like-minded educational programs, portfolio assessment and authentic assessment are techniques designed to target academic teachings to real-world settings external to the classroom.
e.g. to gauge their progress in a high-school algebra course, students could be instructed to devise their own personal portfolios illustrating all they have learned about algebra.
An important aspect of portfolio assessment is the freedom of the person being evaluated to select the content of the portfolio.
e.g. Some students might include narrative accounts of their understanding of various algebraic principles. Other students might reflect in writing on the ways algebra can be used in daily life. Still other students might attempt to make a convincing case that they can do some types of algebra problems that they could not do before taking the course.
Benefits
- Engaging students in the assessment process, giving them the opportunity to think generatively, and encouraging them to think about learning as an ongoing and integrated process.
Drawbacks
- Penalty such a technique may levy on the noncreative student. Typically, exceptional portfolios are creative efforts. A person whose strengths do not lie in creativity may have learned the course material but be unable to adequately demonstrate that learning in such a medium.
- Evaluation of portfolios: a great deal of time and thought must be devoted to their evaluation. In a lecture class of 300 people, for example, portfolio assessment would be impractical.
- Difficult to develop reliable criteria for portfolio assessment, given the great diversity of work products. Hence, inter-rater reliability in portfolio assessment can become a problem.
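
One common way to quantify inter-rater reliability for categorical ratings is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The notes do not name a specific index, so kappa is a choice made for this sketch, and the portfolio ratings below are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same portfolios.

    Undefined (division by zero) when chance agreement equals 1, which
    happens if both raters use one and the same category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up pass/fail judgements of six portfolios by two teachers.
teacher_1 = ["pass", "pass", "fail", "pass", "fail", "fail"]
teacher_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(teacher_1, teacher_2), 2))
```

Here the teachers agree on four of six portfolios, but half of that agreement would be expected by chance, so kappa is much lower than the raw agreement rate.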

Authentic assessment (performance-based assessment) – Evaluation of relevant, meaningful tasks that may be conducted to evaluate learning of academic subject matter but that demonstrate the student’s transfer of that study to real-world activities.
- e.g. Authentic assessment of students’ writing skills would be based on writing samples rather than on responses to multiple-choice tests.
- e.g. Authentic assessment of students’ reading would be based on tasks that involve reading, preferably “authentic” reading, such as an article in a local newspaper as opposed to a piece contrived especially for the purposes of assessment.
- e.g. Students in a college-level psychopathology course might be asked to identify patients’ psychiatric diagnoses on the basis of videotaped interviews with the patients.
- Benefit
- Authentic assessment is thought to increase student interest and the transfer of knowledge to settings outside the classroom.
- Drawback
- The assessment might assess prior knowledge and experience, not simply what was learned in the classroom. E.g., students from homes where there has been a long-standing interest in legislative activities may well do better on an authentic assessment of reading skills that employs an article on legislative activity.
- Authentic assessment may inadvertently entail the assessment of some skills that have little to do with classroom learning. E.g. authentic assessment of learning a cooking school lesson on filleting fish may be confounded with an assessment of the would-be chef’s perceptual-motor skills.

Peer Appraisal Techniques

One method of obtaining information about an individual is by asking that individual’s peer group to make the evaluation.
Peer appraisals can help call needed attention to an individual who is experiencing academic, personal, social, or work-related difficulties—difficulties that, for whatever reason, have not come to the attention of the person in charge.
Peer appraisals allow the individual in charge to view members of a group from a different perspective: the perspective of those who work, play, socialize, eat lunch, and walk home with the person being evaluated.
Peer appraisals supply information about the group’s dynamics: who takes which roles under what conditions.

Knowledge of an individual’s place within the group is an important aid in guiding the group to optimal efficiency.

Peer appraisal techniques tend to be most useful in settings where the individuals doing the rating have functioned as a group long enough to be able to evaluate each other on specific variables.
The nature of peer appraisals may change as a function of changes in the assessment situation and the membership of the group. E.g. an individual who is rated as the shyest in the classroom can theoretically be quite gregarious—and perhaps even be rated the rowdiest—in a peer appraisal undertaken at an after-school center.
One method of peer appraisal that can be employed in elementary school (as well as other) settings is called the Guess Who? technique. Brief descriptive sentences (such as “This person is the most friendly”) are read or handed out in the form of questionnaires to the class, and the children are instructed to guess who. Whether negative attributes should be included in the peer appraisal (for example, “This person is the least friendly”) must be decided on an individual basis, considering the potential negative consequences such an appraisal could have on a member of the group.
The nominating technique is a method of peer appraisal in which individuals are asked to select or nominate other individuals for various types of activities. A child being interviewed in a psychiatric clinic may be asked, “Who would you most like to go to the moon with?” as a means of determining which parent or other individual is most important to the child. Members of a police department might be asked, “Who would you most like as your partner for your next tour of duty and why?” as a means of finding out which police officers are seen by their peers as especially competent or incompetent.
The results of a peer appraisal can be graphically illustrated. One graphic method of organizing such data is the sociogram. Figures such as circles or squares are drawn to represent different individuals, and lines and arrows are drawn to indicate various types of interaction. At a glance, the sociogram can provide information such as who is popular in the group, who tends to be rejected by the group, and who is relatively neutral in the opinion of the group. Nominating techniques have been the most widely researched of the peer appraisal techniques, and they have generally been found to be highly reliable and valid. Still, the careful user of such techniques must be aware that an individual’s perceptions within a group are constantly changing. As some members leave the group and others join it, the positions and roles the members hold within the group change. New alliances form, and as a result, all group members may be looked at in a new light. It is therefore important to periodically update and verify information.
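
The data behind a sociogram can be sketched as a small directed graph built from nominations. The names and nominations below are made up, and the helper functions are hypothetical; drawing the circles and arrows is left to whatever plotting tool is at hand.

```python
from collections import Counter

# Made-up nominations: each child names the classmates they would choose.
nominations = {
    "Ana": ["Ben", "Cai"],
    "Ben": ["Ana"],
    "Cai": ["Ben"],
    "Dee": ["Ben", "Ana"],
}


def build_edges(noms):
    """Directed (nominator, nominee) pairs: the arrows of the sociogram."""
    return [(src, dst) for src, picks in noms.items() for dst in picks]


def nominations_received(noms):
    """How often each group member was chosen (0 if never chosen)."""
    counts = Counter(dst for _, dst in build_edges(noms))
    return {person: counts[person] for person in noms}


def mutual_choices(noms):
    """Pairs who nominated each other."""
    edges = set(build_edges(noms))
    return sorted({tuple(sorted(e)) for e in edges if (e[1], e[0]) in edges})


print(nominations_received(nominations))
print(mutual_choices(nominations))
```

Members with many incoming arrows are the popular ones and members with none may be overlooked or rejected. Because group membership and alliances change, the nominations would need to be collected again periodically.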

Measuring Study Habits, Interests, and Attitudes

Academic performance is the result of a complex interplay of a number of factors.
Ability and motivation are inseparable partners in the pursuit of academic success.
A number of instruments designed to look beyond ability and toward factors such as study habits, interests, and attitudes have been published.
e.g. the Study Habits Checklist, designed for use with students in grades 9 through 14, consists of 37 items that assess study habits with respect to note taking, reading material, and general study practices.
If a teacher knows a child’s areas of interest, instructional activities engaging those interests can be employed.
The What I Like to Do Interest Inventory consists of 150 forced-choice items that assess four areas of interests: academic interests, artistic interests, occupational interests, and interests in leisure time (play) activities.
Included in the test materials are suggestions for designing instructional activities that are consonant with the designated areas of interest.
Attitude inventories used in educational settings assess student attitudes toward a variety of school-related factors.
Interest in student attitudes is based on the premise that “positive reactions to school may increase the likelihood that students will stay in school, develop a lasting commitment to learning, and use the school setting to advantage”

PSYCHOEDUCATIONAL TEST BATTERIES

Psychoeducational test batteries are test kits that generally contain two types of tests: those that measure abilities related to academic success and those that measure educational achievement in areas such as reading and arithmetic.
Data derived from these batteries allow for normative comparisons (how the student compares with other students within the same age group), as well as an evaluation of the testtaker’s own strengths and weaknesses—all the better to plan educational interventions.
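
A normative comparison can be sketched as converting a raw score into a standard score and a percentile rank relative to the testtaker's age group. The norm-group mean and SD below are made up, the mean-100, SD-15 reporting scale is an assumption, and the percentile calculation assumes scores in the age group are normally distributed.

```python
from statistics import NormalDist


def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Re-express a raw score on a reporting scale via its z score."""
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


def percentile_rank(raw, norm_mean, norm_sd):
    """Percentage of the age group expected to score at or below `raw`."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw)


# Hypothetical reading subtest: age-group mean 30, SD 4, student scored 34.
print(standard_score(34, norm_mean=30, norm_sd=4))
print(round(percentile_rank(34, norm_mean=30, norm_sd=4), 1))
```

Evaluating the testtaker's own strengths and weaknesses works differently: each subtest score is compared with that same testtaker's other scores rather than with the age group.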

The Kaufman Assessment Battery for Children (K-ABC) and the Kaufman Assessment Battery for Children, Second Edition (KABC-II)

K-ABC

Developed by a husband-and-wife team of psychologists, the K-ABC was designed for use with testtakers from age 2½ through age 12½
Subtests measuring both intelligence and achievement
The K-ABC intelligence subtests are divided into two groups, reflecting the two kinds of information-processing skills: simultaneous skills and sequential skills
Table 11–3 presents the particular learning and teaching styles that reflect the two types of intelligence measured by the K-ABC.
Scores on the simultaneous and sequential subtests are combined into a Mental Processing Composite (analogous to the IQ measure calculated on other tests).
Factor-analytic studies of the K-ABC have confirmed the presence of 2 factors:
- Simultaneous processing
- Sequential processing
Researchers have had difficulty finding an achievement factor and have labelled the third factor of the K-ABC differently: Good and Lane (1988) identified it as verbal comprehension and reading achievement, Kaufman and McLean (1986) identified it as achievement and reading ability, and Keith and Novak (1987) identified it as reading achievement and verbal reasoning.
Whatever the factor is, the K-ABC Achievement Scale has been shown to predict achievement
Recommendations for teaching based on Kaufman and Kaufman’s (1983a, 1983b) concept of processing strength can be derived from the K-ABC test findings.
It may be recommended, for example, that a student whose strength is processing sequentially should be taught using the teaching guidelines for sequential learners.
Students who do not have any particular processing strength may be taught using a combination of methods.

KABC-II

The next generation of the K-ABC, the KABC-II, was published in 2004.
There are changes in the age range covered, the structure of the test, and even the conceptual underpinnings of the test.
The age range for the second edition of the test was extended upward (ages 3 to 18) in order to expand the possibility of making ability/achievement comparisons with the same test through high school.
Structurally, ten new subtests were created, eight of the existing subtests were removed, and only eight of the original subtests remained.
Such significant structural changes in the test must be kept in mind by test users making comparisons between testtaker K-ABC scores and KABC-II scores.
Conceptually, the grounding of the K-ABC in Luria’s theory of sequential versus simultaneous processing was expanded.
In addition, a grounding in the Cattell-Horn-Carroll (CHC) theory was added.
This dual theoretical foundation provides the examiner with a choice as to which model of test interpretation is optimal for the particular situation.
You can choose the Cattell-Horn-Carroll model for children from a mainstream cultural and language background; if Crystallized Ability would not be a fair indicator of the child’s cognitive ability, then you can choose the Luria model, which excludes verbal ability.
Administer the same subtests on four or five ability scales.
Then, interpret the results based on your chosen model.
Either approach gives you a global score that is highly valid and that shows small differences between ethnic groups in comparison with other comprehensive ability batteries.
In general, reviewers of the KABC-II found it to be a psychometrically sound instrument for measuring cognitive abilities.
However, few evidenced ease with its new, dual theoretical basis. E.g. Thorndike (2007) wondered aloud about assessing two distinct sets of processes and abilities without adequately explaining “how a single test can measure two distinct constructs” (p. 520). Braden and Ouzts (2007) expressed their concern that combining the two interpretive models “smacks of trying to have (and market) it both ways”. Bain and Gray (2008) were disappointed that the test manual did not contain sample reports based on each of the models.
Some reviewers raised questions about the variable (or variables) that were actually being measured by the KABC-II. For example, Reynolds et al. (2007) questioned the extent to which certain supplemental tests could best be conceived as measures of specific abilities or measures of multiple abilities.
In general, however, they were satisfied that for “school-age children, the KABC-II is closely aligned with the five CHC broad abilities it is intended to measure”