• Is pertinent to the consistency of measurement
  • There are various types of reliability, all of which are estimated quantitatively
  • Most basic way to understand the concept is through repeated measurements
  • Reliability and measurement error are essentially antonymous
  • We calculate various reliability estimates to understand the nature and magnitude of the measurement error associated with scores obtained from instruments and tests (tools and tests don’t have reliability, only scores)

Importance of reliability:

  • A very large percentage of psychology involves estimating associations between variables, broadly defined (regression or ANOVA).
  • If the variables we use to test our hypotheses are associated with a large amount of measurement error, then the whole exercise is pointless.

Classical test theory (CTT):

  • Is a measurement theory that defines the conceptual basis of reliability
  • Also specifies procedures for estimating the reliability of scores derived from a psychological test or instrument
  • A person’s observed score on a test is a function of that person’s true score, plus error:

XO = XT + XE
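A minimal simulation sketch of this decomposition, assuming hypothetical true-score and error distributions (the sample size, means, and standard deviations below are invented purely for illustration):

```python
import numpy as np

# Hypothetical illustration of XO = XT + XE for a sample of people.
rng = np.random.default_rng(0)
n_people = 1000

true_scores = rng.normal(loc=50, scale=10, size=n_people)   # XT: unobservable true scores
error_scores = rng.normal(loc=0, scale=5, size=n_people)    # XE: random measurement error
observed_scores = true_scores + error_scores                # XO: what the test actually yields

print(observed_scores[:5])
```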

True scores:

  • A hypothetical score devoid of measurement error; we are never able to actually measure one
  • Conceived in the context of a particular test or instrument, not a construct
  • Note that true scores are not ‘construct scores’, there is no such thing
  • True scores may be perfect but this is only true in the context of measurement error associated with data derived from a particular test or instrument
  • True scores can be perfect from error standpoint but absolutely terrible from a valid representation of a construct standpoint
  • e.g. a person steps on a scale 10 times and it reads the same weight each time. This score is perfectly reliable, a true score. However, how do you know that the weight the scale displays is actually the person’s weight? The weight could be consistently wrong
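A toy illustration of the scale example (the weights below are invented): the readings are perfectly consistent, yet systematically wrong.

```python
true_weight = 70.0            # the person's actual weight (unknown to the scale user)
readings = [74.5] * 10        # ten identical readings from a miscalibrated scale

spread = max(readings) - min(readings)   # 0.0 -> perfectly consistent (reliable)
bias = readings[0] - true_weight         # +4.5 -> consistently wrong (not valid)

print(f"spread across readings: {spread}, systematic bias: {bias}")
```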

Observed scores:

The scores we obtain from tests or instruments; the actual measurements.

 

We want our observed scores to be as close as possible to their corresponding true scores.

  • Discrepancy between observed scores and true scores is considered to be due to measurement error
  • The smaller this discrepancy, the more reliable the scores

Observed, true and error scores:

  • All other things equal, you want there to be a large positive correlation between observed scores and true scores
      o This correlation exists only in theory
  • By contrast, you want observed scores and error scores to be as close to uncorrelated as possible (see the simulation sketch after the next list)
  • If the observed scores and the error scores are correlated highly, it means they are largely measuring the same process: error.

Error scores:

  • Should have a mean of zero
      o This is because there should be just as many people with an observed score that is too large as with one that is too small
  • Should be a random process
      o As they are random, they should not correlate with anything (except possibly their corresponding observed scores)
  • Error scores should be uncorrelated with true scores
      o Whether you are genuinely high or low on self-esteem, the ‘extraneous’ error-related factors should affect people equally
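A simulation sketch of the correlational pattern described in the two lists above, again using invented distributions; with zero-mean errors that are independent of the true scores, the expected pattern falls out directly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

true_scores = rng.normal(50, 10, n)        # XT
error_scores = rng.normal(0, 5, n)         # XE: zero-mean, independent of XT
observed = true_scores + error_scores      # XO

print(round(error_scores.mean(), 2))                           # ~0: error scores average out
print(round(np.corrcoef(true_scores, error_scores)[0, 1], 2))  # ~0: errors uncorrelated with true scores
print(round(np.corrcoef(observed, true_scores)[0, 1], 2))      # large positive correlation (rot)
print(round(np.corrcoef(observed, error_scores)[0, 1], 2))     # nonzero, but should be as small as possible
```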

R² between observed and true scores:

  • When you square the reliability index (the correlation between observed and true scores, rot), you get an estimate of the reliability

Rxx = r²ot

  • The resulting value (e.g. Rxx = 0.48) indicates that 48% of the variance in observed scores is shared with true scores
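A sketch of this conceptualisation on simulated data (true scores can never actually be observed, so this only works with invented data); with the standard deviations chosen below, Rxx should come out at roughly .80:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

true_scores = rng.normal(50, 10, n)
observed = true_scores + rng.normal(0, 5, n)

r_ot = np.corrcoef(observed, true_scores)[0, 1]   # reliability index: observed-true correlation
reliability = r_ot ** 2                           # Rxx = r²ot
print(round(reliability, 2))                      # ~0.80: 80% of observed variance shared with true scores
```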

Interpretation guidelines for reliability:

  • .60 – too low for any purpose
  • .70 – bare minimum acceptable for beginning-stage research
  • .80 – good level for research purposes
  • .90+ – necessary in applied contexts where important decisions are made about individuals

Ratio of true score variance to observed variance:

This conceptualisation is similar to eta squared: the ratio of SSEFFECT to SSTOTAL

Conceptually, in the reliability case it is the ratio of SSTRUE to SSOBSERVED

  • Formulated as: Rxx = s²t / s²o
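The same kind of simulated data expressed as a variance ratio; with the invented values used here, true-score variance is 10² = 100 and observed variance is about 100 + 25 = 125, so the ratio should again be roughly .80:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

true_scores = rng.normal(50, 10, n)               # s²t ≈ 100
observed = true_scores + rng.normal(0, 5, n)      # s²o ≈ 125

reliability = true_scores.var() / observed.var()  # Rxx = s²t / s²o
print(round(reliability, 2))                      # ~0.80
```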

Lack of correlation between O and E:

  • If reliability reflects the correlation between true scores and observed scores, then it can equivalently be expressed as the relative absence of a correlation between observed scores and error scores
  • Rxx = 1 − r²oe

Relative lack of error variance:

  • Instead of the ratio of true score variance to observed variance, in this case we speak of the ratio of error variance to observed variance
  • We subtract this ratio from 1 to place it in the context of reliability (rather than error)
  • Rxx = 1 − s²e / s²o

Conceptualisations review:

  • Rxx can be expressed equivalently as r²ot, s²t / s²o, 1 − r²oe, or 1 − s²e / s²o
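A sketch checking that all four conceptualisations give approximately the same answer on the same simulated data; the distributions are the same invented ones used above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

t = rng.normal(50, 10, n)    # true scores
e = rng.normal(0, 5, n)      # error scores
o = t + e                    # observed scores

r_ot = np.corrcoef(o, t)[0, 1]
r_oe = np.corrcoef(o, e)[0, 1]

print(round(r_ot ** 2, 2))               # r²ot
print(round(t.var() / o.var(), 2))       # s²t / s²o
print(round(1 - r_oe ** 2, 2))           # 1 − r²oe
print(round(1 - e.var() / o.var(), 2))   # 1 − s²e / s²o
```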

Parallel tests:

  • Two tests are considered parallel if they are identical to each other psychometrically, but differ in the actual items that make up each test
  • Tau-equivalence assumption:
      o Implies that the true scores associated with each test represent the same construct
      o Thus a person’s true score on one test would be expected to be identical on the other test
  • Assumes equal error variance between the two tests as well (a simulation sketch at the end of this section illustrates these assumptions)

Parallel tests and reliability:

According to CTT, the correlation between the composite scores on test 1 and the composite scores on test 2 represents the reliability associated with the scores.

The closer the correlation is to 1.0 the more reliable we consider the scores

  • Note that scores can be highly reliable (a high correlation between tests) yet still invalid as representations of the attribute or construct of interest
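A simulation sketch of parallel tests under the assumptions above (shared true scores, equal error variance, invented distributions); the correlation between the two sets of observed scores should recover the reliability of roughly .80 built into this example:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

t = rng.normal(50, 10, n)        # shared true scores (tau-equivalence)
x1 = t + rng.normal(0, 5, n)     # observed composite scores on test 1
x2 = t + rng.normal(0, 5, n)     # observed composite scores on test 2 (equal error variance)

r_12 = np.corrcoef(x1, x2)[0, 1] # correlation between the two parallel tests
print(round(r_12, 2))            # ~0.80, matching s²t / s²o = 100 / 125
```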