Validity - defined and described:

  • The degree to which a test measures what it is supposed to measure
  • The most important issue in psychological measurement
  • More formally, the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test

 

  • A test itself is neither valid nor invalid

o Validity concerns the interpretations and uses of a measure’s scores
  • Is related to the proposed uses of the scores
  • E.g. final exam scores are used principally to order students from most knowledgeable/competent to least knowledgeable/competent
  • There is some level of interpretation of degree of knowledge/competence

o E.g. someone who scored 90 on the final would be expected to have knowledge/competence of most of the material. In contrast, someone who scored 20 on the final would be expected to have knowledge/competence of less than 50% of the material
  • People do tend to refer to a test as valid
  • This is incorrect:

o Naivety
o Laziness

  • The test itself isn’t valid; rather, the interpretations of the test scores are valid

Validity is a matter of degree:

Validity is not an all or none issue

  • The validity of test score interpretations should be conceived in terms of strong versus weak rather than valid versus invalid
  • When you choose a psychological test, you should choose the test that will support the interpretations that you want to make from the test scores
  • Typically there are several tests on the market from which to choose; validity should be one of the primary considerations
  • Validity is based on empirical evidence and theory
  • It’s not good enough to hear someone say that a test (or its scores) is valid based on their experience
  • There are many popular tests out there that have little or no validity

o E.g. writing analysis as an indicator of someone’s personality (no evidence)
o E.g. colour quiz (no evidence)

How is validity determined empirically?

  • Unlike internal consistency reliability, there is no single analysis that can be used to represent the degree to which the interpretations of test scores are valid
  • Instead, several different types of analyses are conducted
  • Some validity analyses are quantitative and do involve statistical analyses
  • The pursuit of establishing the validity of the interpretation of test scores revolves around the concept of construct validity
  • Construct validity refers to the degree to which test scores can be interpreted as reflecting a particular psychological construct

Test content:

  • Most fundamental type of validity
  • Represents the match between the actual content of the test and the content that should be included in the test
  • If test scores are to be interpreted as indicators of a particular construct of interest, then the items included in the test should reflect the important facets of the construct
  • The description of the nature of the construct should help define the appropriate content of the test
  • Two types of validity related to test content:

o Content validity
o Face validity

Content validity:

A test may be said to have good content validity when its items cover the entire breadth of the construct

However, the items cannot exceed the boundaries of the construct

  • E.g. on a final exam, there should be items from all lectures in the semester, but there should be no items from a different unit

Face validity:

  • The degree to which the items associated with a measure appear to be related to the construct of interest
  • This appearance is in the judgement of ‘non-experts’
  • Isn’t crucial from a fundamental psychometric perspective; it is more a practical consideration for respondents
  • Respondents need to be made to feel that they are responding to items that are relevant to the task at hand
  • E.g. when trying to hire introverts for a traffic controller job, candidates are asked whether they are ‘the life of the party’. Some applicants may say this question is unrelated to the job and not want to answer, even though their responses would be useful
  • Disadvantages:

o People can respond in a way that they think is most advantageous for them

Factorial validity:

  • When a test is designed, it is typically done so in such a way that the number of dimensions and facets are specified
  • Use a technique known as factor analysis to evaluate the factorial validity of the scores derived from a test
  • There are two types of factor analysis:

o Unrestricted factor analysis
o Restricted factor analysis

Response processes:

  • There should be a close match between the psychological processes that respondents actually use when completing a measure and the processes that they should use
  • You can’t just assume that people are going to do what you expect them to do
  • E.g. respondents answering favourably to questions because they want the job, not because they possess the attribute

Association with other variables:

  • Another type of validity involves the association between test scores and other variables

Our understanding of scores from a test will be, in part, shaped by the association between those scores and other measures or variables

We would expect a particular pattern of associations

Emotional intelligence:

  • Several researchers have created psychological inventories designed to measure emotional intelligence
  • To establish the validity of the scores derived from the inventories, they specified that the EI scores should:

o Correlate positively with intellectual intelligence
o Correlate negatively with the neuroticism personality dimension
o Correlate positively with age
o Not correlate with a measure of morningness/eveningness

Convergent evidence:

  • Usually described as convergent validity
  • The degree to which test scores are correlated with tests of related constructs
  • Emotional intelligence should correlate positively with intellectual intelligence
  • There should be a positive relationship between your self-reported scores and the rater-reported scores. This evidence is known as consensual validity

Discriminant evidence:

  • Also known as discriminant validity
  • The degree to which test scores are uncorrelated with tests of unrelated constructs
  • It often helps to know what a construct is not in the process of its validation
  • Constructs should not correlate with everything under the sun; if they do, their boundaries are overly expansive
  • Researchers do not hypothesise the correlation to be zero, but just generally low
  • When the correlation is substantial, discriminant validity is lacking

Concurrent validity:

  • Observed when the scores from one measure correlate in a theoretically meaningful way with the scores from another measure that is considered the ‘gold standard’
  • E.g. correlating scores from a new IQ test with the WAIS
  • Least compelling evidence; in some cases there is no gold standard test

Predictive validity

The degree to which test scores are correlated with relevant variables that are measured at a future point in time

E.g. correlate university grades with future annual earnings

  • Most impressive evidence, but relatively rare because of the time and resources required to keep track of people over time

Consequential validity:

  • The social/personal consequences associated with using a particular test
  • E.g. if two tests were equally predictive of a criterion of interest, but one of the tests tended to yield scores that were biased against women, then we would consider the non-biased test to be associated with greater consequential validity

Criterion validity:

  • Non-theoretical approach to validation
  • E.g. a psych lab exam: administer the test to students, staff, and post-grad students
  • The SPSS lab exam scores would be associated with validity if there was a linear trend in the means across the three groups (with staff scoring highest)
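The known-groups check above can be sketched as a comparison of group means; the scores and group labels here are hypothetical illustrations, not real data:

```python
# Hedged sketch: criterion (known-groups) validity as an ordering of
# group means. All scores are invented for illustration.
from statistics import mean

scores = {
    "undergraduates": [55, 60, 48, 62, 58],
    "postgraduates":  [68, 72, 65, 75, 70],
    "staff":          [82, 88, 79, 85, 90],
}

means = {group: mean(vals) for group, vals in scores.items()}

# Evidence of validity here would be an increasing trend in the means:
# undergraduates < postgraduates < staff
ordered = means["undergraduates"] < means["postgraduates"] < means["staff"]
print(means, "trend holds:", ordered)
```

A formal analysis would test the linear trend statistically (e.g. a trend contrast in ANOVA); this sketch only checks the ordering of means.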

Induction-construct development interplay:

  • There are occasions where a measure is developed solely from an inductive perspective
  • E.g. create a measure of personality by including all of the ‘person-descriptive’ adjectives in the dictionary (moody, unpredictable)
  • People rate the degree to which all of the adjectives describe them
  • Then the researcher would factor analyse all of the responses to help uncover the common dimensions

Contrasting reliability and validity:

  • Very related but very distinct concepts
  • Reliability is pertinent to consistency in measurement
  • Differences in test scores from the perspective of reliability reflect differences among people in their levels of the trait that affects test scores- whatever the trait may be
  • Validity, by contrast, is directly related to the nature of the trait supposedly being assessed by the measure

o Reliability is a property of test scores
o Validity is a property of test score interpretations
o Validity is closely tied to psychological theory (reliability is not)
  • Reliability is a necessary but not sufficient condition for validity (need consistency for any hope of validity)