|
This example analyzes 32
items selected from the 48-item version of the Jenkins Activity
Survey for Health Prediction, Form B (Jenkins, Rosenman, and
Zyzanski, 1972). The data are responses of 598 men from central
Finland drawn from a larger survey sample. Most of the items
are rated on three-point scales representing little or no,
occasional, or frequent occurrence of the activity or behavior
in question. For purposes of the present analysis, the scales
have been dichotomized near the median. Wording in the positive
or negative direction varies from item to item as follows
(item numbers are those of the original pool of items from
which those of the present form were selected):
-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,+Q251,+Q252,
+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,+Q262,+Q263,+Q264,
+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,-Q273,-Q274,-Q275,+Q276,
+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,+Q310,+Q311,-Q312,-Q313,-Q314.
The first 7 lines of the
data file exampl03.dat are shown below.
201000220122112221022212202112211101122112222000
001221211011100111111111111110111102211111211020
0010.02100222122021221222112112212.0011111222001
002020220212012120011112112221221022211111222202
201000221000211221221112012211122112211111222000
001001221022011120022222212222211101121112222101
102100111022112120021212212221121212111022200021
The first 10 columns of
each record are used as case identification and are read first.
Starting again in the first column by using the 'T' operator,
the responses to the 48 items are read as single fields (48A1).
(10A1,T1,48A1)
The SELECT keyword on the
PROBLEM command indicates that 32 items are selected from
the original 48 items. The SELECT command provides the selected
items in the order in which they will be used. The RESPONSE
command lists the 5 response codes announced on the PROBLEM command
(RESPONSES keyword), and the KEY command provides the keyed
response for each of the 48 items. The NOTPRESENTED option
on the PROBLEM command is required if one of the response
codes identifies not presented items. The '.' code on the
RESPONSE command identifies these responses.
The TETRACHORIC command
requests the printing of the coefficients to 3 decimal places
(NDEC = 3) in the printed output file (LIST option). The tetrachoric
correlation matrix,
item parameters, rotated factor loadings, and the factor scores
will be saved in the files exampl03.cor, exampl03.par,
exampl03.rot, and exampl03.fsc, respectively,
as specified on the SAVE command. The FACTOR and FULL commands
are used to specify parameters for the full-information item
factor analysis. Three factors and ten latent roots are to
be extracted, as indicated by the NFAC and NROOT keywords
respectively. A VARIMAX rotation is requested. Note that this
keyword may not be abbreviated in the FACTOR command. A maximum
of 80 EM cycles will be performed (CYCLES keyword on the FULL
command). The convergence criterion for the EM cycles is given
by the PRECISION keyword on the TECHNICAL command.
Cases will be scored by
EAP (Expected A Posteriori, or Bayes) estimation
with adaptive quadrature (METHOD = 2 on the SCORE command).
Posterior standard deviations will also be computed. Results
will be saved in the exampl03.fsc file (FSCORE option
on the SAVE command). The factor scores for the first 20 cases
will be listed in the output file (LIST = 20). See next
example for MAP (Maximum A Posteriori, or Bayes Modal)
estimation for the same cases.
>TITLE
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
(10A1,T1,48A1)
>STOP

 |
The first part of the output
contains the name of the command file (exampl03.tsf)
and the name of the output file (exampl03.out). Each
TESTFACT run produces output under one or more of the following
headings, depending on the type of analysis.
The analysis specified
in exampl03.tsf produces Phase 0, Phase 1, Phase 2,
Phase 5 and Phase 7 output.

Regardless of the type
of analysis, a Phase 0
output is produced, being an echo of the input commands contained
in the *.tsf file.
PHASE 0: INPUT COMMANDS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
This example analyzes 32 items
selected from the 48-item version
of the Jenkins Activity Survey
for Health Prediction, Form B (Jenkins,
Rosenman, and Zyzanski, 1972).
The data are responses of 598 men from
central Finland drawn from a larger
survey sample. Most of the items
are rated on three-point scales
representing little or no, occasional,
or frequent occurance of the activity
or behavior in question. For
purposes of the present analysis,
the scales have been dichotomized
near the median. Wording in the
positive or negative direction varies
from item to time as follows (item
numbers are those of the original
pool of items from which those
of the present form was selected):
-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,
+Q251,+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,
+Q262,+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,
-Q273,-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,
+Q310,+Q311,-Q312,-Q313,-Q314.
The tetrachoric correlation matrix,
item parameters, rotated factor
loadings, and the factor scores
will be saved in the files EXAMPL03.COR,
EXAMPL03.PAR, EXAMPL03.ROT, and
EXAMPL03.FSC, respectively.
Cases will be scored by EAP (Expected
A Posteriori, or Bayes)
estimation with adaptive quadrature
(Method 2). Posterior standard
deviations will also be computed.
Results will be saved in the
EXAMPL03.FSC file.
See Exampl3a.tsf for MAP (Maximum A Posteriori,
or Bayes Modal) estimation for
the same cases.
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
DATA FILE NAME IS EXAMPL03.DAT
DATA FORMAT=
(10A1,T1,48A1)

Values of the response
categories (8, 0, 1, 2, .), the answer key, the contents of the
first observation, the sum of weights, and the number of records
are given. This information enables you to verify that the
data values were read correctly from the data file exampl03.dat.
The response categories indicate a code of '8' for omitted
responses (first value) and a code of '.' for not-presented
items (last value).
Thirty-two items were selected
from the 48-item test. Based on the answer key values, a total
score is computed for each of the 598 respondents. Each item
response is classified as right, wrong, omitted, or not presented.
For item j, j = 1, 2, ..., 32, the response of
person i, i = 1, 2, ..., 598, can be written as
$x_{ij} = 1$ if the response is correct,
and
$x_{ij} = 0$ if the response is incorrect.
At your option, omitted
items can be considered either wrong or not presented. The
total test score
for person i is
$X_i = \sum_{j=1}^{32} x_{ij}$.
Respondent 1, for example,
has a total score of 19 correct out of a possible 32 as shown
below.
Answer key:
20020222220022222022222002002200
Respondent 1:
10020221121022212021121101211200
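The selection and scoring steps described above can be checked by hand. The following is a minimal Python sketch, not TESTFACT code: the helper names `expand_select`, `pick`, and `total_score` are hypothetical, and the sketch assumes that an omitted response simply counts as wrong. It applies the SELECT list to the 48-character key and to the first data record, then counts matches.

```python
def expand_select(spec):
    """Expand a SELECT list such as '3,5,11(1)14' into 1-based item numbers."""
    items = []
    for token in spec.split(","):
        token = token.strip()
        if "(" in token:
            # 'first(step)last' denotes an inclusive run of item numbers.
            first, rest = token.split("(")
            step, last = rest.split(")")
            items.extend(range(int(first), int(last) + 1, int(step)))
        else:
            items.append(int(token))
    return items


def pick(codes, items):
    """Keep only the characters of `codes` at the selected item positions."""
    return "".join(codes[i - 1] for i in items)


def total_score(responses, key, not_presented="."):
    """Return (number correct, number presented) for one selected record."""
    presented = sum(r != not_presented for r in responses)
    correct = sum(r == k for r, k in zip(responses, key))
    return correct, presented


SELECT = "3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48"
KEY = "002000220022222220022222202222220002220022222000"
RECORD_1 = "201000220122112221022212202112211101122112222000"

items = expand_select(SELECT)
selected_key = pick(KEY, items)
selected_responses = pick(RECORD_1, items)
correct, presented = total_score(selected_responses, selected_key)

print(selected_key)        # key after selection
print(selected_responses)  # respondent 1 after selection
print(correct, presented)  # number correct and number presented
```

The selected key and responses reproduce the two 32-character strings shown above, and the count of matches reproduces the total score of 19.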
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
RESPONSE CATEGORIES: 8 0 1 2 .
ANSWER KEY: 20020222220022222022222002002200
CONTENTS OF FIRST OBSERVATION:
ID=2010002201
WEIGHT= 1
ITEM RESPONSES= 201000220122112221022212202112211101122112222000
ITEM RESPONSES AFTER SELECTION = 10020221121022212021121101211200
SUM OF WEIGHTS = 598
NUMBER OF RECORDS= 598
Using this information,
a frequency table of the score distribution is calculated
and presented graphically.
PHASE 1: HISTOGRAM AND BASIC STATISTICS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
MAIN TEST HISTOGRAM
[Text histogram not reproduced: FREQUENCY (percent of cases, axis marked at 0.0, 4.0 and 8.0) plotted against SCORES (total score, axis marked from 0 to 30 in steps of 5). The plotted values are those of the frequency table that follows.]
NUMBER OF OBSERVATIONS AT EACH SCORE
SCORE  COUNT  FREQ | SCORE  COUNT  FREQ | SCORE  COUNT  FREQ
    0      0   0.0 |    11     35   5.9 |    22     21   3.5
    1      0   0.0 |    12     40   6.7 |    23     10   1.7
    2      0   0.0 |    13     38   6.4 |    24      8   1.3
    3      0   0.0 |    14     52   8.7 |    25      6   1.0
    4      1   0.2 |    15     54   9.0 |    26      1   0.2
    5      2   0.3 |    16     54   9.0 |    27      1   0.2
    6      1   0.2 |    17     56   9.4 |    28      0   0.0
    7      5   0.8 |    18     57   9.5 |    29      0   0.0
    8      7   1.2 |    19     36   6.0 |    30      0   0.0
    9     18   3.0 |    20     43   7.2 |    31      0   0.0
   10     20   3.3 |    21     32   5.4 |    32      0   0.0
The last portion of the
Phase 1
output gives the mean (15.9) and standard deviation (4.0)
of the Total Scores.
TEST   RECORD   NUMBER   MEAN   S.D.   PROPORTION   S.D.
MAIN      598      598   15.9    4.0        0.497  0.500
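The proportion of correct
responses, p, is the mean total score divided by the number of items,
$p = 15.91/32 = 0.497$,
with a standard deviation
$\sqrt{p(1 - p)} = \sqrt{0.497 \times 0.503} = 0.500$.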
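The Phase 1 summary values can be recomputed from the frequency table. The sketch below is illustrative only: it assumes the standard deviation uses N (not N - 1) in the denominator, which the output does not state; either choice rounds to the same printed values here.

```python
import math

# Counts for total scores 0..32, copied from the Phase 1 frequency table.
counts = [0, 0, 0, 0, 1, 2, 1, 5, 7, 18, 20,
          35, 40, 38, 52, 54, 54, 56, 57, 36, 43, 32,
          21, 10, 8, 6, 1, 1, 0, 0, 0, 0, 0]
n_items = 32

n_cases = sum(counts)
mean = sum(score * c for score, c in enumerate(counts)) / n_cases
variance = sum(c * (score - mean) ** 2 for score, c in enumerate(counts)) / n_cases
sd = math.sqrt(variance)

# Proportion correct and its standard deviation, as in the MAIN test line.
p = mean / n_items
sd_p = math.sqrt(p * (1.0 - p))

print(n_cases, round(mean, 2), round(sd, 2), round(p, 3), round(sd_p, 3))
```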

For each item, eight statistics
are produced. The Number, Mean and S.D. for item 2, for example,
are 590, 15.92, and 4.03 respectively. These values are obtained
by 'deleting' each row of the data if a not presented code
is encountered for item 2. Since 8 rows contain not-presented
codes, the mean and standard deviation of the Total Scores
are calculated for the remaining 590 cases. Note, for example,
that item 1 was presented to all 598 persons, while item 4
was presented to 592 persons.
PHASE 2: ITEM STATISTICS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
-----------------------------------------------------------
MAIN TEST ITEM STATISTICS
ITEM      NUMBER   MEAN   S.D.  RMEAN  FACILITY   DIFF     BIS   P.BIS
 1 Q158      598  15.91   4.01  14.46     0.206  16.29  -0.262  -0.185
 2 Q166      590  15.92   4.03  17.13     0.653  11.43   0.532   0.413
 3 Q167      596  15.90   4.01  16.35     0.790   9.77   0.305   0.215
 4 Q247      592  15.93   4.01  16.71     0.694  10.97   0.384   0.292
 5 Q249      594  15.92   4.01  15.89     0.466  13.34  -0.008  -0.006
 6 Q251      598  15.91   4.01  17.16     0.532  12.68   0.417   0.332
 7 Q252      598  15.91   4.01  17.39     0.490  13.10   0.451   0.360
 8 Q253      598  15.91   4.01  18.16     0.410  13.91   0.591   0.467
 9 Q254      597  15.91   4.02  18.99     0.203  16.33   0.551   0.387
10 Q257      597  15.92   4.01  17.99     0.449  13.51   0.585   0.466
.
31 Q313      597  15.91   4.02  16.31     0.843   8.98   0.349   0.231
32 Q314      594  15.93   4.02  16.86     0.586  12.13   0.351   0.278
The mean score for those
subjects who get a specific item correct is denoted by RMEAN.
For example, since 385 respondents selected the correct response
for item 2, RMEAN for item
2 is calculated as the mean of the corresponding 385 Total
Scores and equals 17.13.
The item facility (FACILITY)
is the proportion of correct responses for a specific item. For
example, 385 of the 590 respondents presented with item 2
selected the correct response, and hence
$p = 385/590 = 0.653$.
The delta statistic ($\Delta$,
or DIFF) is calculated as
$\Delta = -4\,\Phi^{-1}(p) + 13$,
where p is the item
facility and $\Phi^{-1}$
denotes the inverse normal
transformation. For item 2, $\Phi^{-1}(0.653) = 0.393$, which gives
the reported value of 11.43. This statistic has an effective range of 1
to 25, with a mean and standard deviation of 13 and 4 respectively.
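As a check on the FACILITY and DIFF columns, the following Python sketch computes both from the counts quoted in the text. It assumes the delta transformation 13 - 4 z, where z is the inverse normal transform of the facility; this form reproduces the printed DIFF values, but the function names are hypothetical and the code is not part of TESTFACT.

```python
from statistics import NormalDist


def facility(n_correct, n_presented):
    """Proportion of correct responses among the cases presented with the item."""
    return n_correct / n_presented


def delta(p):
    """Delta (DIFF) statistic: 13 minus 4 times the inverse normal transform of p."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


p_item2 = facility(385, 590)
print(round(p_item2, 3), round(delta(p_item2), 2))  # item 2
print(round(delta(0.790), 2))                       # item 3, from its printed facility
```

An item answered correctly by exactly half of the respondents has a delta of 13, and easier items have smaller values.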
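The last 2 statistics are
the biserial (BIS) and
point biserial (P.BIS)
correlations. The formula for the sample point biserial correlation
is
$r_{pbis} = \frac{\bar{X}_R - \bar{X}}{S}\sqrt{\frac{p}{1 - p}}$,
where $\bar{X}_R$ is RMEAN, $\bar{X}$ and $S$ are the mean and standard
deviation of the Total Scores for the cases presented with the item,
and p is the item facility.
For item 8, for example,
$r_{pbis} = \frac{18.16 - 15.91}{4.01}\sqrt{\frac{0.410}{0.590}} = 0.468$,
which agrees with the reported 0.467 up to rounding of the inputs.
The point biserial correlation
is the correlation between the item score and the total score,
or subtest score. Theoretically
$-1 \le r_{pbis} \le 1$,
but in practice the attainable range is narrower.
Therefore, 0.467 indicates
a relatively strong association between item 8 and the Total
Score.
The formula for calculating
the sample biserial correlation coefficient, BIS, is
$r_{bis} = \frac{\bar{X}_R - \bar{X}}{S}\cdot\frac{p}{h(z_p)}$,
where $z_p = \Phi^{-1}(p)$ and $h(z_p)$ is the standard normal density at $z_p$.
Consider, for example,
the item 3 facility, which equals 0.790. From the inverse
normal tables, this corresponds to a
z-value of 0.8062, at which the normal density equals 0.288.
For item 3,
$r_{bis} = \frac{16.35 - 15.90}{4.01}\cdot\frac{0.790}{0.288} = 0.308$,
which agrees with the reported 0.305 up to rounding of the inputs.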
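The two correlations can be recomputed from the printed item statistics. The sketch below uses the common textbook forms of the point biserial and biserial coefficients, which is an assumption on my part; it reproduces the BIS and P.BIS columns to within the rounding of the printed inputs. Function names are illustrative.

```python
import math
from statistics import NormalDist


def point_biserial(rmean, mean, sd, p):
    """Point biserial from RMEAN, MEAN, S.D. and FACILITY of one item."""
    return (rmean - mean) / sd * math.sqrt(p / (1.0 - p))


def biserial(rmean, mean, sd, p):
    """Biserial: same standardized mean difference, scaled by p / h(z_p)."""
    z = NormalDist().inv_cdf(p)   # z-value for the item facility
    h = NormalDist().pdf(z)       # standard normal density at that z-value
    return (rmean - mean) / sd * p / h


# Item 8 (Q253) and item 3 (Q167), values taken from the Phase 2 table.
print(round(point_biserial(18.16, 15.91, 4.01, 0.410), 3))
print(round(biserial(18.16, 15.91, 4.01, 0.410), 3))
print(round(biserial(16.35, 15.90, 4.01, 0.790), 3))
```

Because the inputs are rounded to two or three decimals, the recomputed values can differ from the output in the third decimal place.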

The first part of the output
contains, for each selected item, the Number of Cases, Percent
Correct, Percent Omitted, Percent Not Reached and Percent
Not Presented.
PHASE 5: TETRACHORIC
CORRELATIONS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
MAIN TEST MISSING RESPONSE INFORMATION
----------------------------------------------------------------------------
ITEM        NUMBER    PERCENT   PERCENT   PERCENT       PERCENT
           OF CASES   CORRECT   OMITTED   NOT REACHED   NOT PRESENTED
----------------------------------------------------------------------------
 1. Q158      598       20.6      0.0        0.0           0.0
 2. Q166      590       64.4      0.0        0.0           1.3
 3. Q167      596       78.8      0.0        0.0           0.3
 4. Q247      592       68.7      0.0        0.0           1.0
 5. Q249      594       46.3      0.0        0.0           0.7
.
31. Q313      597       84.1      0.0        0.0           0.2
32. Q314      594       58.2      0.0        0.0           0.7
----------------------------------------------------------------------------
This summary indicates
that there were no omitted codes in the data and that all
598 respondents could complete the test. The percent Not Presented
varies from 0.0 to a maximum of 1.3 for item 2. For item 2,
this percentage is calculated as
$100 \times 8/598 = 1.3$.
Note that the Percent Correct
is calculated here as the number of respondents who selected
the correct answer, divided by the total number of cases.
For item 2,
$100 \times 385/598 = 64.4$.
This value differs from
the facility estimate (385/590) given under Phase 2 of the
output.
Display
1: Tetrachoric correlation matrix
The tetrachoric correlation
coefficient is widely used as a measure of association between
two dichotomous items. Tetrachoric correlations are obtained
by hypothesizing, for each item, the existence of a continuous
'latent' variable underlying the 'right-wrong' dichotomy imposed
in scoring. It is additionally hypothesized that, for each
pair of items, the corresponding two continuous 'latent' variables
have a bivariate normal distribution.
AVERAGE TETRACHORIC CORRELATION = 0.0654
STANDARD DEVIATION = 0.2384
NUMBER OF VALID ITEM PAIRS = 496
DISPLAY 1. TETRACHORIC CORRELATION MATRIX
               1       2       3       4       5       6
            Q158    Q166    Q167    Q247    Q249    Q251
 1 Q158    1.000
 2 Q166   -0.383   1.000
 3 Q167   -0.145   0.124   1.000
 4 Q247   -0.535   0.368   0.054   1.000
 5 Q249    0.106  -0.019   0.016  -0.161   1.000
 6 Q251   -0.065   0.017   0.019   0.016  -0.126   1.000
.
In TESTFACT, use is made
of $n(n - 1)/2$ (n = number of items)
$2 \times 2$ frequency tables to calculate
the tetrachoric coefficients. From the computer output, the
number of valid item pairs is 496. Since the number of items
equals 32 and 32(32 - 1)/2 = 496, this data set contains no non-valid
pairs. Non-valid pairs have zero off-diagonal or marginal
frequencies. Examples of non-valid pairs are a table with an empty
row or column and a table with an empty off-diagonal cell.
The average tetrachoric
correlation equals 0.0654. Since the output contains both
negative and positive correlation coefficients, the average
value does not shed much light on the actual strength of association
between item pairs. Note that tetrachoric correlation matrices
are not necessarily positive definite.
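To make the bivariate normal assumption concrete, the sketch below estimates a tetrachoric correlation from a single 2 x 2 frequency table. It is an illustration only and is not the algorithm TESTFACT uses: it fixes the two thresholds from the marginal proportions, evaluates the bivariate normal quadrant probability with Simpson's rule, and finds the correlation by bisection. All function names are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def upper_quadrant_prob(a, b, rho, steps=400, upper=8.0):
    """P(X > a, Y > b) for a standard bivariate normal with correlation rho."""
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        # Density of X at x times P(Y > b | X = x).
        return N01.pdf(x) * N01.cdf((rho * x - b) / s)

    h = (upper - a) / steps
    total = f(a) + f(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def tetrachoric(n11, n10, n01, n00):
    """Tetrachoric correlation for counts (both right, first only, second only, neither)."""
    n = n11 + n10 + n01 + n00
    if 0 in (n11 + n10, n01 + n00, n11 + n01, n10 + n00, n10, n01):
        raise ValueError("non-valid pair: zero off-diagonal or marginal frequency")
    a = -N01.inv_cdf((n11 + n10) / n)  # threshold with P(X > a) = p(item 1 right)
    b = -N01.inv_cdf((n11 + n01) / n)  # threshold with P(Y > b) = p(item 2 right)
    target = n11 / n
    lo, hi = -0.999, 0.999
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        # The quadrant probability increases with rho, so bisection applies.
        if upper_quadrant_prob(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(tetrachoric(200, 100, 100, 200), 3))
```

The example table has both thresholds at zero, where the quadrant probability is 1/4 + arcsin(rho)/(2 pi), so the estimate can be compared with the closed-form value sin(pi/6) = 0.5.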

Display 2: The positive latent roots
of the correlation matrix
By definition, a symmetric
matrix is positive definite if all its characteristic roots
are positive. From the output below, it is seen that only
the first 31 of the 32 roots are positive, and therefore the
matrix of tetrachoric correlations
is not positive definite. This problem can be corrected by
replacing the negative roots of the matrix by zero or a small
non-zero quantity.
DISPLAY 2. THE POSITIVE LATENT ROOTS OF THE CORRELATION MATRIX
           1         2         3         4         5         6
 1  7.491350  3.442602  2.592276  1.745235  1.576302  1.442306
           7         8         9        10        11        12
 1  1.248438  1.118638  1.015248  0.971235  0.908476  0.835705
          13        14        15        16        17        18
 1  0.768426  0.719607  0.657375  0.638227  0.631485  0.555802
          19        20        21        22        23        24
 1  0.514488  0.461871  0.398661  0.375292  0.349726  0.312994
          25        26        27        28        29        30
 1  0.292964  0.243591  0.218973  0.183170  0.167582  0.117183
          31
 1  0.055375
Display 3: Number of items and sum
of latent roots and their ratio
This section of the output
shows the sum of positive roots and the ratio with which each
root has to be multiplied to obtain a sum of 'corrected roots'
which equals the number of items. To illustrate, consider
a $5 \times 5$
correlation matrix with
latent roots 3, 1, 0.8, 0.3, and -0.1. The sum of the roots
equals 5. In general, for any correlation matrix based on
n items, the sum of roots equals n.
Suppose the value of -0.1
is replaced by 0.0001, then the new sum of roots equals 5.1001.
However, by multiplying each root by the ratio 5/5.1001 =
0.9804, a 'corrected' set of roots is obtained in the sense
that their sum equals 5.
From the Display 3 part
of the output, the ratio required to obtain a corrected set
of latent roots equals 0.9984211. The corrected set is given
under the Display 4 heading.
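The root correction is simple enough to reproduce directly. The following sketch applies it to the five-root illustration and then checks the ratio printed in Display 3. The replacement value 0.0001 is the one used in the illustration above; whether TESTFACT uses this exact value is not stated, so treat it as an assumption.

```python
def correct_roots(roots, floor=0.0001):
    """Replace non-positive roots by `floor`, then rescale so the roots sum to n."""
    replaced = [r if r > 0 else floor for r in roots]
    ratio = len(roots) / sum(replaced)
    return ratio, [r * ratio for r in replaced]


ratio, corrected = correct_roots([3, 1, 0.8, 0.3, -0.1])
print(round(ratio, 4))            # ratio 5/5.1001
print(round(sum(corrected), 6))   # corrected roots sum to the number of items

# Display 3: 32 items and a sum of roots of 32.0506033.
display3_ratio = 32 / 32.0506033
print(round(display3_ratio, 7))
print(round(7.491350 * display3_ratio, 6))  # first corrected root of Display 4
```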
DISPLAY 3. NUMBER OF ITEMS AND SUM OF LATENT ROOTS AND THEIR RATIO
 32  32.0506033  0.9984211
Display 4: Corrected latent roots
DISPLAY 4. THE CORRECTED LATENT ROOTS OF THE CORRELATION MATRIX
           1         2         3         4         5         6
 1  7.479522  3.437167  2.588184  1.742479  1.573814  1.440029
           .         .         .         .         .         .
Display 5: Initial smoothed inter-item
correlation matrix
Any symmetric matrix $\mathbf{R}$ can
be decomposed as
$\mathbf{R} = \mathbf{X}\mathbf{D}\mathbf{X}'$,
where
$\mathbf{D}$ is a diagonal matrix with
diagonal elements the characteristic roots of $\mathbf{R}$.
As mentioned previously,
if all roots are positive, that is, all the diagonal elements
of $\mathbf{D}$ are positive, $\mathbf{R}$ is a positive definite
matrix. When this is not the case, a 'smoothed' correlation
matrix, $\mathbf{R}^*$, may be obtained by replacing
the elements of $\mathbf{D}$ with the corrected roots,
and negative roots with either 0 or some small positive quantity,
so that
$\mathbf{R}^* = \mathbf{X}\mathbf{D}^*\mathbf{X}'$,
where the columns of $\mathbf{X}$ are the eigenvectors and the
elements of $\mathbf{D}^*$ the corrected latent roots.
The elements of the smoothed correlation matrix for the first
6 of the 32 items are given below.
DISPLAY 5. INITIAL SMOOTHED INTER-ITEM CORRELATION MATRIX
               1       2       3       4       5       6
            Q158    Q166    Q167    Q247    Q249    Q251
 1 Q158    1.000
 2 Q166   -0.383   1.000
 3 Q167   -0.145   0.124   1.000
 4 Q247   -0.534   0.368   0.054   1.000
 5 Q249    0.106  -0.019   0.016  -0.161   1.000
 6 Q251   -0.066   0.017   0.019   0.016  -0.126   1.000
Display 6: Iterated communality
estimates
A communality is defined
as the squared multiple correlation between an observed variable
and the set of factors. The output below shows the estimated
communalities for iterations 1, 2, 3, and 4. Note the small
changes in the estimated values going from iteration 3 to
iteration 4.
At iteration 1, the squared
multiple correlation of an item with all other items is calculated
for each of the 32 items. The MINRES method (see Display 7)
is subsequently used to obtain post-solution improvements
to these initial multiple regression communality estimates.
DISPLAY 6. ITERATED COMMUNALITY ESTIMATES
              1      2      3      4
 1 Q158   0.413  0.373  0.371  0.371
 2 Q166   0.370  0.325  0.323  0.322
 3 Q167   0.156  0.116  0.115  0.115
 4 Q247   0.516  0.471  0.466  0.465
 5 Q249   0.142  0.088  0.087  0.087
 6 Q251   0.351  0.269  0.257  0.255
.
31 Q313   0.477  0.422  0.415  0.414
32 Q314   0.458  0.396  0.387  0.386
Display 7: The NROOT largest latent
roots of the correlation matrix
TESTFACT uses the minimum
squared residuals (MINRES) method to extract factors from
the smoothed correlation matrix
$\mathbf{R}^*$. The MINRES method minimizes the sum of squares of
the residuals in a matrix
$\mathbf{E}$, where
$\mathbf{E} = \mathbf{R}^* - (\mathbf{A}\mathbf{A}' + \mathbf{U}^2)$,
where
$\mathbf{A}$ is an $n \times nfac$
common factor matrix and
$u_i^2$ the diagonal elements
of the diagonal matrix
$\mathbf{U}^2$, the unique variances, i = 1, 2, ..., n.
If
$h_i^2$ denotes the communality
for item i, then
$u_i^2$ equals $1 - h_i^2$.
The sum of squares of the
residuals is expressed as a statistical function (see, e.g.
Tucker and MacCallum, 1997), which is minimized by the determination
of the matrix of factor loadings $\mathbf{A}$
and uniquenesses $\mathbf{U}^2$.
In this part of the output,
the NROOT largest roots of the matrix
$\mathbf{R}^* - \mathbf{U}^2$ are reported. Note that,
since
$u_i^2$ equals $1 - h_i^2$,
characteristic roots are actually obtained from the
smoothed correlation matrix with the unit diagonal elements
replaced by the communalities. In general, the matrix
$\mathbf{R}^* - \mathbf{U}^2$ will not be positive definite
and hence a subset of the roots will be negative.
If one replaces NROOT =
10 in the FACTOR command with, for example, NROOT = 20, the
output shows that roots with numbers 16, 17, 18 and higher
are all negative. An empirical rule for the selection of the
number of factors, k, is to set k equal to the
number of latent roots larger than 1. For the present example
it appears as if 3 or 4 factors are appropriate. Usually,
the number of factors is selected on the basis of some theoretical
framework concerning the items included in the analysis.
DISPLAY 7. THE NROOT LARGEST LATENT ROOTS OF THE CORRELATION MATRIX
           1         2         3         4         5         6
 1  6.886994  2.861018  1.961481  1.149766  0.934423  0.738751
           7         8         9        10
 1  0.582337  0.423875  0.326571  0.270941
Display 8: MINRES principal factor
loadings
The estimated factor loadings
at convergence of the MINRES method are given below. These
values are used to obtain starting values for the marginal
maximum likelihood procedure specified in the FULL (full information)
command.
Note that each communality
is equal to the sum of squares of the corresponding factor
loadings. For example, for item 12, the 3 factor loadings
are 0.406, 0.275, and 0.555. Hence, the communality of item 12 is
$0.406^2 + 0.275^2 + 0.555^2 = 0.548$
(see Display 6, communality
for item 12 at iteration 4).
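The relation between loadings and communalities can be verified against the listings. The short sketch below sums the squared MINRES loadings of Display 8 for a few items and compares them with the iteration 4 communalities of Display 6; the dictionary layout and function name are illustrative only.

```python
def communality(loadings):
    """Sum of squared factor loadings for one item."""
    return sum(a * a for a in loadings)


# MINRES principal factor loadings copied from Display 8.
minres_loadings = {
    "Q158": (-0.579, 0.189, 0.022),
    "Q166": (0.519, -0.230, -0.001),
    "Q167": (0.246, 0.215, -0.091),
    "Q247": (0.535, -0.420, -0.049),
}

for name, loadings in minres_loadings.items():
    print(name, round(communality(loadings), 3))

# Item 12, using the loadings quoted in the text.
print(round(communality((0.406, 0.275, 0.555)), 3))
```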
DISPLAY 8. MINRES PRINCIPAL FACTOR LOADINGS
               1       2       3
 1 Q158   -0.579   0.189   0.022
 2 Q166    0.519  -0.230  -0.001
 3 Q167    0.246   0.215  -0.091
 4 Q247    0.535  -0.420  -0.049
 5 Q249   -0.152  -0.022  -0.251
 6 Q251    0.250   0.245   0.364
.
31 Q313    0.431  -0.478  -0.018
32 Q314    0.338  -0.511   0.105
Display 9: Initial intercept and
slope estimates
The intercept and slope
estimates are functions of the item facility and factor loadings.
If the ROTATE keyword is
omitted in the FACTOR command,
the factor loadings are the MINRES factor loadings (see Display
8). Otherwise the initial rotated factor loadings are used
(not shown in the output).
Suppose the factor loadings
for item 1 and a 3-factor solution are denoted by
$\alpha_{11}$, $\alpha_{12}$, and $\alpha_{13}$
respectively. Let
$\sigma_1 = \sqrt{1 - \alpha_{11}^2 - \alpha_{12}^2 - \alpha_{13}^2}$
and denote the slopes corresponding
to item 1 by
$a_{11}$, $a_{12}$, and $a_{13}$
respectively. Then
$a_{1j} = \alpha_{1j}/\sigma_1$, j = 1, 2, 3.

Intercepts are computed
as
$c_i = z_i/\sigma_i$, where
$\sigma_i = \sqrt{1 - \sum_{j=1}^{nfac} \alpha_{ij}^2}$
and
$z_i$ is the z-value corresponding
to an area under the N(0,1) curve equal to the item i facility.
For item 1, for example,
facility equals 0.206 and the corresponding z-value
is -0.8202. For item 1,
$\sigma_1$ = 0.791 and therefore the item
1 intercept estimate is
$c_1 = -0.8202/0.791 = -1.037$,
which agrees with the -1.036 reported in Display 9 up to rounding.

Conversely, factor loadings
are related to the slopes. Let
$\alpha_{ij}$ and $a_{ij}$
respectively denote the j-th
factor loading and slope of item i,
j = 1, 2, ..., nfac. Then
$\alpha_{ij} = a_{ij}/d_i$,
where
$d_i = \sqrt{1 + \sum_{j=1}^{nfac} a_{ij}^2}$.

The initial intercept and
slope values are used as initial estimates for the full information
maximum likelihood procedure specified by the FULL CYCLES
= 80; command.
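The conversion between factor loadings and slopes can be sketched in a few lines. The code below assumes the usual normal-ogive relations (slope = loading divided by the square root of the uniqueness, and the reverse using the square root of one plus the sum of squared slopes); the function names are hypothetical. The round trip uses the item 1 MINRES loadings of Display 8 purely as sample input, since the rotated loadings that produced Display 9 are not printed.

```python
import math


def loadings_to_slopes(loadings):
    """Slopes a_j = loading_j / sqrt(1 - communality)."""
    sigma = math.sqrt(1.0 - sum(a * a for a in loadings))
    return [a / sigma for a in loadings]


def slopes_to_loadings(slopes):
    """Loadings = a_j / sqrt(1 + sum of squared slopes)."""
    d = math.sqrt(1.0 + sum(a * a for a in slopes))
    return [a / d for a in slopes]


def intercept(z, sigma):
    """Intercept from the facility z-value and sqrt(1 - communality)."""
    return z / sigma


# Round trip with sample loadings (item 1 of Display 8).
sample = [-0.579, 0.189, 0.022]
back = slopes_to_loadings(loadings_to_slopes(sample))
print([round(v, 3) for v in back])

# Communality implied by the item 1 slopes of Display 9.
item1 = slopes_to_loadings([0.387, 0.636, 0.191])
print(round(sum(v * v for v in item1), 3))

# Intercept for item 1 from the values quoted in the text.
print(round(intercept(-0.8202, 0.791), 3))
```

The communality implied by the Display 9 slopes for item 1 matches the iteration 4 value in Display 6, which is expected because a rotation does not change communalities.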
DISPLAY 9. INITIAL INTERCEPT AND SLOPE ESTIMATES
          INTERCEPT        SLOPES
                         1       2       3
 1 Q158     -1.036   0.387   0.636   0.191
 2 Q166      0.476  -0.285  -0.609  -0.156
 3 Q167      0.858  -0.341   0.023  -0.115
 4 Q247      0.695  -0.245  -0.900  -0.033
 5 Q249     -0.088  -0.030   0.092   0.293
 6 Q251      0.092  -0.097   0.025  -0.576
.
31 Q313      1.313  -0.083  -0.837   0.020
32 Q314      0.277   0.107  -0.784  -0.045
Display 10: The EM estimation of
parameters
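This part of the output
shows that parameter estimates will be based on the EM
(Expectation-Maximization) method and that the number of quadrature points
equals 4. Quadrature is a numeric integration method that
is often used in practice to calculate the value of an integral
when no closed-form solution exists.
For a one-factor analysis,
for example, the log-likelihood function can be expressed
as
$\log L = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} P(\mathbf{x}_i \mid \theta, \boldsymbol{\zeta})\, g(\theta)\, d\theta$,
where N denotes
the number of cases, $\mathbf{x}_i$ the response pattern of case i,
$g(\theta)$ the standard normal density, and
$\boldsymbol{\zeta}$ a set of unknown parameters.
The integrals, or so-called
marginal probabilities, are approximated by
$\sum_{k=1}^{q} P(\mathbf{x}_i \mid X_k, \boldsymbol{\zeta})\, A(X_k)$,
where
$A(X_k)$ denote the weights and
$X_k$ the quadrature points.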
Display 11: quadrature points and
weights
The numeric values of the
4 quadrature points and weights are listed. Note that the
weights are always positive and that the quadrature points
are symmetric.
DISPLAY 11. 4 QUADRATURE POINTS AND WEIGHTS:
 1   -2.334414   0.045876
 2   -0.741964   0.454124
 3    0.741964   0.454124
 4    2.334414   0.045876
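The points and weights of Display 11 behave like a discrete stand-in for the standard normal distribution. The sketch below checks that the weights sum to one and reproduce the unit variance, and then uses them to approximate one marginal probability for a hypothetical one-factor item with intercept c = 0.5 and slope a = 1.0 (these item values are invented for illustration). For a normal-ogive item this integral has the closed form Phi(c / sqrt(1 + a^2)), which gives a reference value.

```python
import math
from statistics import NormalDist

# Quadrature points and weights copied from Display 11.
points = [-2.334414, -0.741964, 0.741964, 2.334414]
weights = [0.045876, 0.454124, 0.454124, 0.045876]


def expect(f):
    """Approximate the N(0,1) expectation of f by the weighted sum over the points."""
    return sum(w * f(x) for x, w in zip(points, weights))


total_weight = expect(lambda x: 1.0)
second_moment = expect(lambda x: x * x)

# Hypothetical item: probability of a correct response is Phi(c + a * theta).
c, a = 0.5, 1.0
approx = expect(lambda x: NormalDist().cdf(c + a * x))
exact = NormalDist().cdf(c / math.sqrt(1.0 + a * a))

print(round(total_weight, 4), round(second_moment, 4))
print(round(approx, 3), round(exact, 3))
```

With only four points the approximation is coarse but already close to the exact value; the adaptive quadrature used in this example is designed to place the points where the posterior for each case is concentrated.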
The next part of the output
shows the progress of the iterative procedure. At each cycle,
-2 x LOG-LIKELIHOOD is
reported as well as the maximum change in the intercept and
slope values. For example, the maximum change in slope 1 estimates
at cycle 1 is equal to 0.098630. In other words, starting from the initial
slope 1 values of 0.387 (item 1), -0.285 (item 2), ..., 0.107 (item
32) given in Display 9, the differences between these values and the revised
cycle 1 slope 1 estimates are at most 0.098630 units.
Small maximum changes in
intercept and slope estimates are therefore an indication
of convergence.
Note that, starting from
cycle 6, the difference between -2 log L of the previous cycle
and the present cycle is reported. At cycle 19, for example,
this value, reported as CHANGE,
is 0.0726.
SUM OF MARGINAL PROBABILITIES = 0.17040D-02
CYCLE 1
- 2 X MARGINAL LOG LIKELIHOOD = 0.2084060567D+05
MAXIMUM CHANGE OF ESTIMATES
INTERCEPT = 0.038118
SLOPE = 0.098630 0.056828 0.037478
Number of patterns with zero probability = 0
.
SUM OF MARGINAL PROBABILITIES = 0.17167D-02
CYCLE 32
- 2 X MARGINAL LOG LIKELIHOOD = 0.2080175353D+05
CHANGE = -0.3000105835D-02
MAXIMUM CHANGE OF ESTIMATES
INTERCEPT = 0.002038
SLOPE = 0.005042 0.001369 0.003811
Number of patterns with zero probability = 0
Display 12: Chi-square and degrees
of freedom
The
$\chi^2$-statistic reported below is calculated as
$\chi^2 = 2\sum_{j=1}^{s} r_j \ln\left(\frac{r_j}{N P_j}\right)$,
where
s denotes the number of unique observed
response patterns,
$r_j$ the frequency of pattern j, N the sum of weights and
$P_j$ the marginal probability (marginal
likelihood function) for pattern j.
The degrees of freedom,
ndf, equal
$ndf = s - 1 - n(nfac + 1) + nfac(nfac - 1)/2$.
For this example,
s = 598, nfac = 3, and n (number of items) equals 32.
Hence
$ndf = 598 - 1 - 32(4) + 3(2)/2 = 472$.
This
$\chi^2$ statistic can be used to
test hypotheses of the form:
$H_0$: A k-factor model provides an adequate description
of the data.
$H_1$: A (k + 1)-factor model provides an adequate description
of the data.
The resultant test statistic
is the difference between the $\chi^2$
under $H_0$
and the $\chi^2$
under $H_1$
with degrees of freedom
equal to the difference in degrees of freedom for
$H_0$ and $H_1$.
If we replace the NFAC
= 3 keyword in the FACTOR command with NFAC = 2, then
$ndf = 598 - 1 - 32(3) + 2(1)/2 = 502$.
From the output below, $\chi^2 = 13155.03$
with 472 degrees of freedom.
The $\chi^2$ difference
for a 2-factor versus a
3-factor model is 13498.63 - 13155.01 = 343.62 with 502 -
472 = 30 degrees of freedom. Since this value is highly significant,
we reject the 2-factor model in favor of the 3-factor model.
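The degrees of freedom and the chi-square value can be reproduced from numbers already in the output. The sketch below is illustrative, with hypothetical function names. It assumes the likelihood-ratio form 2 * sum r ln(r / (N P)) over the observed patterns and that all 598 response patterns are distinct; the latter is inferred from the 472 degrees of freedom, not stated in the output. Under that assumption the statistic reduces to (-2 log L) - 2 N ln N.

```python
import math


def degrees_of_freedom(s, n, nfac):
    """ndf = s - 1 - n(nfac + 1) + nfac(nfac - 1)/2."""
    return s - 1 - n * (nfac + 1) + nfac * (nfac - 1) // 2


def chi_square(freqs, probs):
    """Likelihood-ratio statistic from pattern frequencies and marginal probabilities."""
    total = sum(freqs)
    return 2.0 * sum(r * math.log(r / (total * p))
                     for r, p in zip(freqs, probs) if r > 0)


def chi_square_all_unique(minus_two_log_l, n_cases):
    """Same statistic when every pattern occurs exactly once."""
    return minus_two_log_l - 2.0 * n_cases * math.log(n_cases)


df3 = degrees_of_freedom(598, 32, 3)
df2 = degrees_of_freedom(598, 32, 2)
g2 = chi_square_all_unique(20801.75353, 598)  # -2 log L at cycle 32

print(df3, df2, df2 - df3)
print(round(g2, 2))
print(round(13498.63 - 13155.01, 2))  # chi-square difference, 2 vs 3 factors
```

The value recomputed from the cycle 32 log likelihood lands within a few hundredths of the CHI-SQUARE printed in Display 12.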
DISPLAY 12. CHI-SQUARE = 13155.03  DF = 472.00  P = 0.000
Display 13: Untransformed item parameters
The output below shows
the estimated intercept and slope estimates after convergence
is attained, or alternatively, after the maximum number of
cycles specified is reached. The number of EM cycles can be
specified by one of the following commands:
>FULL CYCLES = ncycles;
>TECHNICAL ITER(a,b,c);
DISPLAY 13. UNTRANSFORMED ITEM PARAMETERS
          INTERCEPT    SLOPE ESTIMATES
                         1       2       3
 1 Q158     -1.048   0.280   0.620   0.264
 2 Q166      0.482  -0.244  -0.562  -0.181
 3 Q167      0.868  -0.294   0.038  -0.135
 4 Q247      0.693  -0.141  -0.853  -0.125
 5 Q249     -0.086  -0.028   0.083   0.277
 6 Q251      0.093  -0.066   0.015  -0.591
.
31 Q313      1.361   0.063  -0.863  -0.133
32 Q314      0.278   0.115  -0.757  -0.069
Display 14: Standardized difficulty,
communality and principal factors
Each communality is equal
to the sum of squared factor loadings for the corresponding
item. For example, for item 1 the factor loadings are -0.553,
-0.194 and 0.069. The communality is equal to
$(-0.553)^2 + (-0.194)^2 + 0.069^2 = 0.348$.
The standardized difficulty for
item i is calculated as
$-c_i/d_i$, where (see comments for Display
9) $c_i$ is the intercept,
$d_i = \sqrt{1 + \sum_{j=1}^{nfac} a_{ij}^2}$
and
$a_{ij}$ denotes the j-th slope
for item i. For item 1, for example,
$d_1 = \sqrt{1 + 0.280^2 + 0.620^2 + 0.264^2} = 1.238$.
Hence, the standardized
difficulty for item 1 = -( -1.048)/1.238 = 0.846.
An item with a standardized
difficulty of 0 can be regarded as an item with 'average'
difficulty. Standardized difficulty scores above 0 are associated
with the more difficult items and a value of 1.0, for example,
indicates that examinees can be expected to find this item
more difficult to answer than an item with standardized difficulty
of less than 1. On the other hand, items with standardized
difficulty of less than 0 (for example item 31) can be expected
to be much easier to answer correctly.
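The DIFF. and COMM. columns of Display 14 follow from the intercepts and slopes of Display 13. The sketch below recomputes them for two items; it assumes the relations described above (difficulty = -intercept / d and communality = 1 - 1/d^2, with d the square root of one plus the sum of squared slopes). The function name is illustrative.

```python
import math


def standardize(intercept, slopes):
    """Return (d, standardized difficulty, communality) for one item."""
    d = math.sqrt(1.0 + sum(a * a for a in slopes))
    difficulty = -intercept / d
    communality = 1.0 - 1.0 / (d * d)
    return d, difficulty, communality


# Intercepts and slopes copied from Display 13.
d1, diff1, comm1 = standardize(-1.048, (0.280, 0.620, 0.264))   # Q158
d31, diff31, comm31 = standardize(1.361, (0.063, -0.863, -0.133))  # Q313

print(round(d1, 3), round(diff1, 3), round(comm1, 3))
print(round(d31, 3), round(diff31, 3), round(comm31, 3))
```

Because the printed parameters are rounded to three decimals, recomputed values may differ from Display 14 in the last digit.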
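As mentioned earlier (see
Display 9), the relationship between slopes and unrotated
factor loadings is given by
$\alpha_{ij} = a_{ij}/d_i$,
where i is the item
number, j the slope number and
$d_i = \sqrt{1 + \sum_{j} a_{ij}^2}$ as defined above.
The principal factor loadings
given below are obtained as follows. Let
$\mathbf{A}$ be an $n \times nfac$
matrix of factor loadings
with typical elements
$\alpha_{ij}$ and define
$\mathbf{A}\mathbf{A}'$ as the $n \times n$
symmetric matrix
with rank equal to the
number of factors, nfac. This implies that
$\mathbf{A}\mathbf{A}'$ has a maximum of nfac non-zero
characteristic roots $\lambda_1, \lambda_2, \lambda_3$.
If we denote the corresponding eigenvectors by
$\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$,
then the principal factor loadings shown in the output below
are computed as
$\mathbf{p}_1 = \sqrt{\lambda_1}\,\mathbf{e}_1$,
$\mathbf{p}_2 = \sqrt{\lambda_2}\,\mathbf{e}_2$,
and
$\mathbf{p}_3 = \sqrt{\lambda_3}\,\mathbf{e}_3$,
where the elements of
$\mathbf{p}_j$ denote the factor loadings for
the j-th factor, j = 1, 2, 3.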
DISPLAY 14. STANDARDIZED DIFFICULTY, COMMUNALITY, AND PRINCIPAL FACTORS
            DIFF.   COMM.       FACTORS
                              1       2       3
 1 Q158     0.846   0.348  -0.553  -0.194   0.069
 2 Q166    -0.406   0.290   0.496   0.208  -0.030
 3 Q167    -0.825   0.096   0.215  -0.215   0.062
 4 Q247    -0.522   0.433   0.512   0.410  -0.050
 5 Q249     0.083   0.078  -0.146   0.032   0.235
 6 Q251    -0.080   0.261   0.246  -0.242  -0.377
.
31 Q313    -1.024   0.434   0.419   0.487  -0.145
32 Q314    -0.221   0.372   0.341   0.488  -0.131
Display 15: Percent of variance explained
The percentage variance
explained by factor j is calculated as
$100\,\lambda_j/n$,
where
$\lambda_j$ is the j-th characteristic
root of
$\mathbf{A}\mathbf{A}'$ (see Display 14) and
n the number of items.
From the values reported
in the output, it is seen that 20.31% of the total variance
is explained by the first factor, 8.64% by the second and
5.68% by the third factor. Since
$20.31 = 100\,\lambda_1/32$,
it follows that
$\lambda_1$ = 6.50.
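The percentages of Display 15 can be converted back into characteristic roots with one line of arithmetic. The sketch below does so and also reports the total percentage explained by the three factors. Since the roots of the loading cross-product matrix sum to the sum of the communalities, dividing that total by 100 gives the average communality; this last step is a standard identity, not something the output prints.

```python
# Percent of variance per factor, copied from Display 15.
percent = [20.31014, 8.64630, 5.68340]
n_items = 32

# Invert percent = 100 * root / n to recover the characteristic roots.
roots = [p * n_items / 100.0 for p in percent]
total_percent = sum(percent)
mean_communality = sum(roots) / n_items

print([round(r, 2) for r in roots])
print(round(total_percent, 2), round(mean_communality, 3))
```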
DISPLAY 15. PERCENT OF VARIANCE
           1         2         3
 1  20.31014   8.64630   5.68340
Display 16: Standardized difficulty,
communality and VARIMAX factors
The output below contains
the VARIMAX rotated factors. Note that the standardized difficulty
and communality estimates are the same as those given in Display
14. To determine which items are associated with a specific
factor, one may select, for each item, the column with the
highest loading (ignoring the sign of the loading). The following
items appear to be indicators of Factor 2, for example: items
1, 2, 4, 8, 20, 24, 25, 26, 31 and 32.
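The rule of assigning each item to the factor with the largest absolute loading is easy to apply mechanically. The following sketch does this for the eight items whose VARIMAX loadings are printed in Display 16; the dictionary and the function name `dominant_factor` are illustrative, and the remaining items cannot be checked because their loadings are elided in the listing.

```python
def dominant_factor(loadings):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(loadings)), key=lambda k: abs(loadings[k])) + 1


# VARIMAX loadings copied from Display 16 (items 1-6, 31 and 32).
varimax = {
    "Q158": (0.261, 0.499, 0.175),
    "Q166": (-0.234, -0.470, -0.117),
    "Q167": (-0.287, 0.043, -0.110),
    "Q247": (-0.138, -0.641, -0.058),
    "Q249": (-0.005, 0.091, 0.263),
    "Q251": (-0.092, -0.006, -0.503),
    "Q313": (0.014, -0.654, -0.075),
    "Q314": (0.063, -0.605, -0.035),
}

assignment = {name: dominant_factor(l) for name, l in varimax.items()}
for name, factor in assignment.items():
    print(name, factor)
```

For the printed items this reproduces the assignment of items 1, 2, 4, 31 and 32 to Factor 2.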
DISPLAY 16. STANDARDIZED DIFFICULTY, COMMUNALITY, AND VARIMAX FACTORS
            DIFF.   COMM.       FACTORS
                              1       2       3
 1 Q158     0.846   0.348   0.261   0.499   0.175
 2 Q166    -0.406   0.290  -0.234  -0.470  -0.117
 3 Q167    -0.825   0.096  -0.287   0.043  -0.110
 4 Q247    -0.522   0.433  -0.138  -0.641  -0.058
 5 Q249     0.083   0.078  -0.005   0.091   0.263
 6 Q251    -0.080   0.261  -0.092  -0.006  -0.503
.
31 Q313    -1.024   0.434   0.014  -0.654  -0.075
32 Q314    -0.221   0.372   0.063  -0.605  -0.035

The factor scores
are Bayes estimates computed under the assumption that the
corresponding ability factors are normally distributed in
the population from which the sample of examinees was drawn.
Let
$\theta_{ik}$ denote the k-th ability
score, k = 1, 2, ..., nfac, for examinee i,
i = 1, 2, ..., N. Then the factor scores are
$\bar{\theta}_{ik} = E(\theta_{ik} \mid x_{i1}, x_{i2}, \ldots, x_{in})$,
where
$x_{ij}$ is the item j score for
examinee i.
Display 17: Quadrature points and weights
To obtain these conditional
expectations, a 5-point quadrature formula is employed. The
points and weights are shown below.
DISPLAY 17. 5 FACTOR SCORE QUADRATURE POINTS AND WEIGHTS:
 1   -2.856970   0.011257
 2   -1.355626   0.222076
 3    0.000000   0.533333
 4    1.355626   0.222076
 5    2.856970   0.011257
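To show how such points and weights yield a conditional expectation, the sketch below computes an EAP score and posterior standard deviation for a one-factor model using the five points of Display 17. It is a simplification, not TESTFACT's procedure: the example in this section has three factors and uses adaptive quadrature, whereas the sketch uses the fixed points as they stand, and the item parameters in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

# Factor score quadrature points and weights copied from Display 17.
POINTS = [-2.856970, -1.355626, 0.0, 1.355626, 2.856970]
WEIGHTS = [0.011257, 0.222076, 0.533333, 0.222076, 0.011257]


def eap(responses, intercepts, slopes):
    """EAP estimate and posterior S.D. for one case under a one-factor normal ogive."""
    posterior = []
    for x, w in zip(POINTS, WEIGHTS):
        likelihood = 1.0
        for u, c, a in zip(responses, intercepts, slopes):
            p = NormalDist().cdf(c + a * x)  # P(correct | theta = x)
            likelihood *= p if u == 1 else 1.0 - p
        posterior.append(w * likelihood)
    norm = sum(posterior)
    mean = sum(x * q for x, q in zip(POINTS, posterior)) / norm
    var = sum((x - mean) ** 2 * q for x, q in zip(POINTS, posterior)) / norm
    return mean, math.sqrt(var)


# Hypothetical single item with intercept 0 and slope 1.
right_mean, right_sd = eap([1], [0.0], [1.0])
wrong_mean, wrong_sd = eap([0], [0.0], [1.0])
print(round(right_mean, 3), round(right_sd, 3))
print(round(wrong_mean, 3), round(wrong_sd, 3))
```

A correct response moves the estimate above zero and an incorrect one moves it below by the same amount, while the posterior standard deviation drops below the prior value of one.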
Display 18: Factor scores and standard
error estimates
The syntax file contains
the command
>SCORE METHOD=2, LIST = 20;
This command requests that
the factor scores for the first 20 cases should be
listed as part of the output. The full set of factor scores
is written
to the file exampl03.fsc. For each case, the case ID,
number of items presented, percent correct and percent omitted
are reported. Below these values, the ability scores for each
factor, with estimated standard errors marked with an asterisk,
are given. Case 3, for example, was presented with 30 items
of which 13 were answered correctly. Hence the percentage
correct for this case is
$100 \times 13/30 = 43.3$.

Case 10 answered 84.4
percent correctly and had factor scores
of 0.898, 1.234 and 1.710 respectively. Since the means of
the 598 factor scores
(see the last part of the output) are approximately 0 with
standard deviations of 0.86, 0.86 and 0.82 respectively, it
can be concluded that examinee 10 attained factor scores
that are
at least one standard deviation above average.
Factor scores
are not unique in the sense that multiplication of any column
of factor scores
by -1 does not affect the validity of the estimates. It may
therefore happen that negative scores are associated with
above average percent correct and vice versa for below average
percent correct. TESTFACT attempts to reverse the signs in such
a way that scores above zero are usually associated with above
average achievement.
DISPLAY 18. FACTOR SCORES AND STANDARD ERRORS (S.E.)
CASE HEADER:  CASE   NUMBER     PERCENT  PERCENT  CASE ID
                     PRESENTED  CORRECT  OMITTED
SCORES:           1        2        3
S.E.*
==============================================================
 1    32   59.4   0.0   2010002201
       0.264    1.018    0.120
       0.560*   0.543*   0.576*
 2    32   12.5   0.0   0012212110
      -1.329   -0.100   -1.495
       0.483*   0.469*   0.645*
 3    30   43.3   0.0   0010.02100
      -0.572    0.346    0.035
       0.420*   0.511*   0.527*
 4    32   43.8   0.0   0020202202
      -0.612   -1.378    0.584
       0.530*   0.521*   0.587*
 5    32   37.5   0.0   2010002210
      -0.901   -0.061   -0.123
       0.446*   0.482*   0.541*
 6    32   59.4   0.0   0010012210
       0.603    0.049    0.611
       0.543*   0.526*   0.605*
 7    32   59.4   0.0   1021001110
       0.548   -1.653    1.132
       0.456*   0.532*   0.611*
 8    32   34.4   0.0   0010012210
      -0.156   -0.332   -0.817
       0.436*   0.484*   0.574*
 9    32   28.1   0.0   2010011100
      -0.204   -0.590   -0.597
       0.433*   0.478*   0.556*
.

|