Scientific Software International (SSI) publishes statistical data analysis software: LISREL (structural equation model/SEM, survey generalized linear model/SGLIM), 
HLM (hierarchical linear modeling, multilevel model), SuperMix (mixed models, mixed-effects program, MIXREG, MIXOR, MIXNO and MIXPREG) and Item Response Theory/IRT (BILOG-MG, MULTILOG, PARSCALE)

M  TESTFACT: Adaptive item factor analysis and factor score (EAP) estimation

This example analyzes 32 items selected from the 48-item version of the Jenkins Activity Survey for Health Prediction, Form B (Jenkins, Rosenman, and Zyzanski, 1972). The data are responses of 598 men from central Finland drawn from a larger survey sample. Most of the items are rated on three-point scales representing little or no, occasional, or frequent occurrence of the activity or behavior in question. For purposes of the present analysis, the scales have been dichotomized near the median. Wording in the positive or negative direction varies from item to item as follows (item numbers are those of the original pool of items from which those of the present form were selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,+Q251,+Q252,
+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,+Q262,+Q263,+Q264,
+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,-Q273,-Q274,-Q275,+Q276,
+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,+Q310,+Q311,-Q312,-Q313,-Q314.

The first 7 lines of the data file exampl03.dat are shown below.

201000220122112221022212202112211101122112222000
001221211011100111111111111110111102211111211020
0010.02100222122021221222112112212.0011111222001
002020220212012120011112112221221022211111222202
201000221000211221221112012211122112211111222000
001001221022011120022222212222211101121112222101
102100111022112120021212212221121212111022200021

The first 10 columns of each record are used as case identification and are read first. Starting again in the first column by using the 'T' operator, the responses to the 48 items are read as single fields (48A1).

(10A1,T1,48A1)
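The effect of this format statement can be mimicked with string slicing. The sketch below is only an illustration of how the fields overlap; `read_record` is a hypothetical helper name, not a TESTFACT routine, and it assumes fixed-width records like those listed above.

```python
# Sketch of how the format (10A1,T1,48A1) carves up one record.
# read_record is an illustrative helper, not part of TESTFACT.

def read_record(line: str) -> tuple[str, list[str]]:
    """Return (case ID, list of 48 single-character item responses)."""
    case_id = line[0:10]       # 10A1: ten one-character fields, columns 1-10
    items = list(line[0:48])   # T1 tabs back to column 1; 48A1 reads columns 1-48
    return case_id, items

first = "201000220122112221022212202112211101122112222000"
case_id, items = read_record(first)
print(case_id)      # 2010002201
print(len(items))   # 48
```

Because of the T1 tab, the ten ID characters are also the responses to the first ten items.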

The SELECT keyword on the PROBLEM command indicates that 32 items are selected from the original 48 items. The SELECT command provides the selected items in the order in which they will be used. The RESPONSE command lists the 5 responses indicated on the PROBLEM command (RESPONSES keyword) and the KEY command provides the correct responses for each of the 48 items. The NOTPRESENTED option on the PROBLEM command is required if one of the response codes identifies not-presented items. The '.' code on the RESPONSE command identifies these responses.
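The item list on the SELECT command of this example uses entries such as 11(1)14. The sketch below reads a(s)b as "from a to b in steps of s"; that reading is an assumption, but it yields exactly the 32 items announced by SELECT=32. The function name `expand_select` is hypothetical.

```python
# Expand a TESTFACT-style item list, reading "a(s)b" as a, a+s, ..., b.
# This interpretation is an assumption checked only against this example.
import re

def expand_select(spec: str) -> list[int]:
    items = []
    for token in spec.split(","):
        token = token.strip()
        m = re.fullmatch(r"(\d+)\((\d+)\)(\d+)", token)
        if m:
            start, step, stop = map(int, m.groups())
            items.extend(range(start, stop + 1, step))
        else:
            items.append(int(token))
    return items

SELECT = "3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48"
KEY = "002000220022222220022222202222220002220022222000"

selected = expand_select(SELECT)
# Item numbers are 1-based, Python indices are 0-based.
selected_key = "".join(KEY[i - 1] for i in selected)

print(len(selected))   # 32
print(selected_key)    # 20020222220022222022222002002200
```

The 32-character key obtained this way is the ANSWER KEY string that TESTFACT echoes in its Phase 1 output.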

The TETRACHORIC command requests the printing of the coefficients to 3 decimal places (NDEC = 3) in the printed output file (LIST option). The tetrachoric correlation matrix, item parameters, rotated factor loadings, and the factor scores will be saved in the files exampl03.cor, exampl03.par, exampl03.rot, and exampl03.fsc, respectively, as specified on the SAVE command. The FACTOR and FULL commands are used to specify parameters for the full-information item factor analysis. Three factors and ten latent roots are to be extracted, as indicated by the NFAC and NROOT keywords respectively. A VARIMAX rotation is requested. Note that this keyword may not be abbreviated in the FACTOR command. A maximum of 80 EM cycles will be performed (CYCLES keyword on the FULL command). The convergence criterion for the EM cycles is given by the PRECISION keyword on the TECHNICAL command.

Cases will be scored by EAP (Expected A Posteriori, or Bayes) estimation with adaptive quadrature (METHOD = 2 on the SCORE command). Posterior standard deviations will also be computed. Results will be saved in the exampl03.fsc file (FSCORE option on the SAVE command). The factor scores for the first 20 cases will be listed in the output file (LIST = 20). See the next example for MAP (Maximum A Posteriori, or Bayes Modal) estimation for the same cases.

>TITLE
   ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
       Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
       Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
       Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE  '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT  3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
(10A1,T1,48A1)
>STOP


D  Discussion of output

The first part of the output contains the name of the command file (exampl03.tsf) and the name of the output file (exampl03.out). Each TESTFACT run produces output under one or more phase headings, depending on the type of analysis.

The analysis specified in exampl03.tsf produces Phase 0, Phase 1, Phase 2, Phase 5, Phase 6 and Phase 7 output.


P  Phase 0: Input commands

Regardless of the type of analysis, a Phase 0 output is produced, being an echo of the input commands contained in the *.tsf file.

PHASE 0: INPUT COMMANDS
    ITEMS FROM THE JENKINS ACTIVITY SURVEY
        ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------
 >PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
 This example analyzes 32 items selected from the 48-item version
 of the Jenkins Activity Survey for Health Prediction, Form B (Jenkins,
 Rosenman, and Zyzanski, 1972).  The data are responses of 598 men from
 central Finland drawn from a larger survey sample. Most of the items
 are rated on three-point scales representing little or no, occasional,
 or frequent occurance of the activity or behavior in question. For
 purposes of the present analysis, the scales have been dichotomized
 near the median. Wording in the positive or negative direction varies
 from item to time as follows (item numbers are those of the original
 pool of items from which those of the present form was selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,
+Q251,+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,
+Q262,+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,
-Q273,-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,
+Q310,+Q311,-Q312,-Q313,-Q314.

The tetrachoric correlation matrix, item parameters, rotated factor
 loadings, and the factor scores will be saved in the files EXAMPL03.COR,
 EXAMPL03.PAR, EXAMPL03.ROT, and EXAMPL03.FSC, respectively.

Cases will be scored by EAP (Expected A Posteriori, or Bayes)
 estimation with adaptive quadrature (Method 2). Posterior standard
 deviations will also be computed.  Results will be saved in the
 EXAMPL03.FSC file.  See Exampl3a.tsf for MAP (Maximum A Posteriori,
 or Bayes Modal) estimation for the same cases.

>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
        Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
        Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
        Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
 >RESPONSE  '8','0','1','2','.';
 >KEY 002000220022222220022222202222220002220022222000;
 >SELECT  3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
 >TETRACHORIC LIST, NDEC=3;
 >FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
 >FULL CYCLES=80;
 >TECHNICAL PRECISION=0.005;
 >SCORE METHOD=2,LIST=20;
 >SAVE CORR,PARM,FSCORE;
 >INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';

   DATA FILE NAME IS EXAMPL03.DAT

DATA FORMAT=
 (10A1,T1,48A1)


P  Phase 1: Data description

Values of the response categories (8, 0, 1, 2, .), the answer key, contents of the first observation, the sum of weights and number of records are given. This information enables you to verify that the data values were read correctly from the data file exampl03.dat. The response categories indicate a code of '8' for omitted responses (first value) and a code of '.' for not-presented items (last value).

Thirty-two items were selected from the 48-item test. Based on the answer key values, a total score is computed for each of the 598 respondents. Each item has a set of responses: right, wrong, omit, or not presented. For item j, j = 1, 2, ..., 32, the response of person i, i = 1, 2, ..., 598, can be written as

$x_{ij} = 1$ if the response is correct, and

$x_{ij} = 0$ if the response is incorrect.

At your option, omitted items can be considered either wrong or not presented. The total test score $X_i$ for person i is

$$X_i = \sum_{j=1}^{32} x_{ij}.$$

Respondent 1, for example, has a total score of 19 correct out of a possible 32 as shown below.

Answer key:

20020222220022222022222002002200

Respondent 1:

10020221121022212021121101211200
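The total score of respondent 1 can be checked by comparing the two strings position by position. This is a minimal sketch of the scoring rule described above (1 for a response equal to the key, 0 otherwise); the variable names are illustrative.

```python
# Score respondent 1 against the 32-item answer key.
key = "20020222220022222022222002002200"
resp = "10020221121022212021121101211200"

# x[j] = 1 if the response to selected item j+1 matches the key, else 0.
x = [1 if r == k else 0 for r, k in zip(resp, key)]
total = sum(x)

print(total)  # 19
```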

ITEMS FROM THE JENKINS ACTIVITY SURVEY
        ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------
      RESPONSE CATEGORIES: 8 0 1 2 .
      ANSWER KEY: 20020222220022222022222002002200

CONTENTS OF FIRST OBSERVATION:
      ID=2010002201
      WEIGHT=          1
      ITEM RESPONSES= 201000220122112221022212202112211101122112222000
      ITEM RESPONSES AFTER SELECTION =
                      10020221121022212021121101211200

SUM OF WEIGHTS  =        598
NUMBER OF RECORDS=       598

Using this information, a frequency table of the score distribution is calculated and presented graphically.

PHASE 1: HISTOGRAM AND BASIC STATISTICS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
        ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------

MAIN TEST HISTOGRAM

FREQUENCY :
       |
       |
       |                 **
       |               ****
       |              *****
    8.0+              *****
       |              *****
       |              *****
       |              ***** *
       |            * ***** *
       |            *********
       |           **********
       |           ***********
       |           ***********
       |           ***********
    4.0+           ***********
       |           ***********
       |          *************
       |         **************
       |         **************
       |         **************
       |         ***************
       |         ****************
       |       *******************
       |       *******************
    0.0+-----+----+----+----+----+----+----+----+----+----+----+----+----+--
       0.    5.   10.  15.  20.  25.  30.
SCORES

NUMBER OF OBSERVATIONS AT EACH SCORE
   SCORE     COUNT  FREQ |   SCORE     COUNT  FREQ |   SCORE     COUNT  FREQ
      0          0   0.0 |     11         35   5.9 |     22         21   3.5
      1          0   0.0 |     12         40   6.7 |     23         10   1.7
      2          0   0.0 |     13         38   6.4 |     24          8   1.3
      3          0   0.0 |     14         52   8.7 |     25          6   1.0
      4          1   0.2 |     15         54   9.0 |     26          1   0.2
      5          2   0.3 |     16         54   9.0 |     27          1   0.2
      6          1   0.2 |     17         56   9.4 |     28          0   0.0
      7          5   0.8 |     18         57   9.5 |     29          0   0.0
      8          7   1.2 |     19         36   6.0 |     30          0   0.0
      9         18   3.0 |     20         43   7.2 |     31          0   0.0
     10         20   3.3 |     21         32   5.4 |     32          0   0.0

 

The last portion of the Phase 1 output gives the mean (15.9) and standard deviation (4.0) of the Total Scores.

TEST     RECORD     NUMBER      MEAN     S.D.    PROPORTION     S.D.
MAIN        598        598      15.9      4.0      0.497       0.500

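The proportion of correct responses, p, is

$$p = \frac{\text{MEAN}}{n} = \frac{15.9}{32} = 0.497,$$

where n = 32 is the number of selected items, with a standard deviation

$$\sqrt{p(1 - p)} = \sqrt{0.497 \times 0.503} = 0.500.$$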
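The summary values in this table can be recovered from the score frequency table printed earlier in Phase 1. The sketch below assumes that the proportion is the mean score divided by the 32 items and that its standard deviation is sqrt(p(1 - p)); both assumptions reproduce the printed 0.497 and 0.500.

```python
import math

# COUNT column of the Phase 1 frequency table, indexed by score 0..32.
counts = [0, 0, 0, 0, 1, 2, 1, 5, 7, 18, 20,
          35, 40, 38, 52, 54, 54, 56, 57, 36, 43, 32,
          21, 10, 8, 6, 1, 1, 0, 0, 0, 0, 0]
n_items = 32

n_cases = sum(counts)
mean = sum(s * c for s, c in enumerate(counts)) / n_cases
var = sum(c * (s - mean) ** 2 for s, c in enumerate(counts)) / n_cases
sd = math.sqrt(var)            # divisor N; N - 1 gives the same value to one decimal

p = mean / n_items             # assumed definition of PROPORTION
sd_p = math.sqrt(p * (1 - p))  # assumed definition of its S.D.

print(n_cases, round(mean, 1), round(sd, 1), round(p, 3), round(sd_p, 3))
```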


P  Phase 2: Item statistics

For each item, eight statistics are produced. The Number, Mean and S.D. for item 2, for example, are 590, 15.92, and 4.03 respectively. These values are obtained by 'deleting' each row of the data if a not-presented code is encountered for item 2. Since 8 rows contain not-presented codes, the mean and standard deviation of the Total Scores are calculated for the remaining 590 cases. Note, for example, that item 1 was presented to all 598 persons, while item 4 was presented to 592 persons.

PHASE 2: ITEM STATISTICS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
-----------------------------------------------------------

MAIN TEST ITEM STATISTICS

  ITEM          NUMBER   MEAN   S.D.  RMEAN FACILITY   DIFF     BIS   P.BIS
 1 Q158            598  15.91   4.01  14.46    0.206  16.29  -0.262  -0.185
 2 Q166            590  15.92   4.03  17.13    0.653  11.43   0.532   0.413
 3 Q167            596  15.90   4.01  16.35    0.790   9.77   0.305   0.215
 4 Q247            592  15.93   4.01  16.71    0.694  10.97   0.384   0.292
 5 Q249            594  15.92   4.01  15.89    0.466  13.34  -0.008  -0.006
 6 Q251            598  15.91   4.01  17.16    0.532  12.68   0.417   0.332
 7 Q252            598  15.91   4.01  17.39    0.490  13.10   0.451   0.360
 8 Q253            598  15.91   4.01  18.16    0.410  13.91   0.591   0.467
 9 Q254            597  15.91   4.02  18.99    0.203  16.33   0.551   0.387
10 Q257            597  15.92   4.01  17.99    0.449  13.51   0.585   0.466
 .
31 Q313            597  15.91   4.02  16.31    0.843   8.98   0.349   0.231
32 Q314            594  15.93   4.02  16.86    0.586  12.13   0.351   0.278

The mean score for those subjects who get a specific item correct is denoted by RMEAN. For example, since 385 respondents selected the correct response for item 2, RMEAN for item 2 is calculated as the mean of the corresponding 385 Total Scores and equals 17.13.

The item facility (FACILITY) is the proportion of correct responses for a specific item. For example, 385 of the 590 respondents presented with item 2 selected the correct response, and hence

$$\text{FACILITY} = \frac{385}{590} = 0.653.$$

The delta statistic ($\Delta$, or DIFF) is calculated as

$$\Delta = -4\,\Phi^{-1}(p) + 13,$$

where p is the item facility and $\Phi^{-1}$ denotes the inverse normal transformation. This statistic has an effective range of 1 to 25, with a mean and standard deviation of 13 and 4 respectively.
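As a quick check, the FACILITY and DIFF values for item 2 can be recomputed with the Python standard library. The sketch assumes the delta statistic has the form 13 - 4 times the inverse normal of p, which reproduces the tabled DIFF values; `delta` is an illustrative function name.

```python
from statistics import NormalDist

def delta(p: float) -> float:
    """Delta (DIFF) statistic, assumed to be 13 - 4 * inverse-normal(p)."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)

facility_item2 = 385 / 590
diff_item2 = delta(facility_item2)

print(round(facility_item2, 3))  # 0.653
print(round(diff_item2, 2))      # 11.43
```

An item answered correctly by exactly half of the respondents has delta equal to 13, and easier items fall below that value.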

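The last 2 statistics are the biserial (BIS) and point biserial (P.BIS) correlations. The formula for the sample point biserial correlation is

$$r_{pb} = \frac{\bar{X}_R - \bar{X}}{s_X}\sqrt{\frac{p}{1 - p}},$$

where $\bar{X}_R$ is RMEAN, $\bar{X}$ and $s_X$ are the MEAN and S.D. of the Total Scores, and p is the item facility.

For item 8, for example,

$$r_{pb} = \frac{18.16 - 15.91}{4.01}\sqrt{\frac{0.410}{0.590}} \approx 0.468,$$

which agrees with the reported 0.467 up to rounding of the tabled values.

The point biserial correlation is the correlation between the item score and the total score, or subtest score. Theoretically $-1 \le r_{pb} \le 1$, but in practice the attainable range is narrower. Therefore, 0.467 indicates a relatively strong association between item 8 and the Total Score.

The formula for calculating the sample biserial correlation coefficient, BIS, is

$$r_{bis} = r_{pb}\,\frac{\sqrt{p(1 - p)}}{h(z_p)},$$

where $z_p = \Phi^{-1}(p)$ and $h(z_p)$ is the ordinate of the standard normal density at $z_p$.

Consider, for example, the item 3 facility, which equals 0.790. From the inverse normal tables, this corresponds to a z-value of 0.8062.

For item 3,

$$r_{bis} = 0.215 \times \frac{\sqrt{0.790 \times 0.210}}{h(0.8062)} = 0.215 \times \frac{0.4073}{0.2883} \approx 0.304,$$

which matches the reported 0.305 up to rounding.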
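The two correlations can be recomputed from the tabled item statistics. The sketch below assumes the usual textbook forms, P.BIS = (RMEAN - MEAN)/S.D. × sqrt(p/(1 - p)) and BIS = P.BIS × sqrt(p(1 - p))/h(z), with h the standard normal density at z = inverse-normal(p). Because the inputs are rounded to two or three decimals, the results agree with the output only to about two decimals.

```python
import math
from statistics import NormalDist

def point_biserial(rmean: float, mean: float, sd: float, p: float) -> float:
    """Point biserial from RMEAN, MEAN, S.D. and FACILITY (assumed form)."""
    return (rmean - mean) / sd * math.sqrt(p / (1 - p))

def biserial(r_pb: float, p: float) -> float:
    """Biserial from the point biserial and the facility (assumed form)."""
    z = NormalDist().inv_cdf(p)
    h = NormalDist().pdf(z)   # normal ordinate at the cut point
    return r_pb * math.sqrt(p * (1 - p)) / h

r_pb_8 = point_biserial(18.16, 15.91, 4.01, 0.410)   # item 8 (Q253)
r_bis_8 = biserial(r_pb_8, 0.410)
r_bis_3 = biserial(0.215, 0.790)                     # item 3 (Q167)

print(round(r_pb_8, 2), round(r_bis_8, 2), round(r_bis_3, 2))
```

The biserial is always larger in absolute value than the point biserial for the same item, as the BIS and P.BIS columns show.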


P  Phase 5: Tetrachoric correlations

The first part of the output contains, for each selected item, the Number of Cases, Percent Correct, Percent Omitted, Percent Not Reached and Percent Not Presented.

PHASE 5:  TETRACHORIC CORRELATIONS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
        ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------

MAIN TEST MISSING RESPONSE INFORMATION
----------------------------------------------------------------------------
 ITEM         NUMBER     PERCENT      PERCENT     PERCENT       PERCENT
             OF CASES    CORRECT      OMITTED   NOT REACHED  NOT PRESENTED
----------------------------------------------------------------------------
    1. Q158         598         20.6         0.0          0.0         0.0
    2. Q166         590         64.4         0.0          0.0         1.3
    3. Q167         596         78.8         0.0          0.0         0.3
    4. Q247         592         68.7         0.0          0.0         1.0
    5. Q249         594         46.3         0.0          0.0         0.7
        .
   31. Q313         597         84.1         0.0          0.0         0.2
   32. Q314         594         58.2         0.0          0.0         0.7
----------------------------------------------------------------------------

This summary indicates that there were no omitted codes in the data and that all 598 respondents could complete the test. The percent Not Presented varies from 0.0 to a maximum of 1.3 for item 2. For item 2, this percentage is calculated as

$$100 \times \frac{598 - 590}{598} = 1.3.$$

Note that the Percent Correct is calculated here as the number of respondents who selected the correct answer, divided by the total number of cases. For item 2,

$$100 \times \frac{385}{598} = 64.4.$$

This value differs from the facility estimate (385/590) given under Phase 2 of the output.

Display 1: Tetrachoric correlation matrix

The tetrachoric correlation coefficient is widely used as a measure of association between two dichotomous items. Tetrachoric correlations are obtained by hypothesizing, for each item, the existence of a continuous 'latent' variable underlying the 'right-wrong' dichotomy imposed in scoring. It is additionally hypothesized that, for each pair of items, the corresponding two continuous 'latent' variables have a bivariate normal distribution.

AVERAGE TETRACHORIC CORRELATION =  0.0654
     STANDARD DEVIATION =   0.2384
     NUMBER OF VALID ITEM PAIRS =    496

DISPLAY   1.   TETRACHORIC CORRELATION MATRIX

                      1        2        3        4        5        6
                     Q158     Q166     Q167     Q247     Q249     Q251
     1   Q158       1.000
     2   Q166      -0.383    1.000
     3   Q167      -0.145    0.124    1.000
     4   Q247      -0.535    0.368    0.054    1.000
     5   Q249       0.106   -0.019    0.016   -0.161    1.000
     6   Q251      -0.065    0.017    0.019    0.016   -0.126    1.000
                    .

In TESTFACT, n(n - 1)/2 (n = number of items) 2 × 2 frequency tables are used to calculate the tetrachoric coefficients. From the computer output, the number of valid item pairs is 496. Since the number of items equals 32 and 32(32 - 1)/2 = 496, this data set contains no non-valid pairs. Non-valid pairs have zero off-diagonal or marginal frequencies. Examples of non-valid pairs are shown below, where R and W denote right and wrong responses, 0 marks a zero frequency and * marks any other frequency:

        R  W            R  W                 R  W
    R   *  0        R   0  0      and    R   *  0
    W   0  *        W   *  *             W   *  0

The average tetrachoric correlation equals 0.0654. Since the output contains both negative and positive correlation coefficients, the average value does not shed much light on the actual strength of association between item pairs. Note that tetrachoric correlation matrices are not necessarily positive definite.


P  Phase 6: Factor analysis

Display 2: The positive latent roots of the correlation matrix

By definition, a symmetric matrix is positive definite if all its characteristic roots are positive. From the output below, it is seen that only the first 31 of the 32 roots are positive, and therefore the 32 × 32 matrix of tetrachoric correlations is not positive definite. This problem can be corrected by replacing the negative roots of the matrix by zero or a small non-zero quantity.

DISPLAY   2.   THE POSITIVE LATENT ROOTS OF THE CORRELATION MATRIX

              1         2         3         4         5         6
     1     7.491350  3.442602  2.592276  1.745235  1.576302  1.442306

              7         8         9        10        11        12
     1     1.248438  1.118638  1.015248  0.971235  0.908476  0.835705

             13        14        15        16        17        18
     1     0.768426  0.719607  0.657375  0.638227  0.631485  0.555802

             19        20        21        22        23        24
     1     0.514488  0.461871  0.398661  0.375292  0.349726  0.312994

             25        26        27        28        29        30
     1     0.292964  0.243591  0.218973  0.183170  0.167582  0.117183

             31
     1     0.055375

Display 3: Number of items and sum of latent roots and their ratio

This section of the output shows the sum of positive roots and the ratio by which each root has to be multiplied to obtain a sum of 'corrected roots' which equals the number of items. To illustrate, consider a 5 × 5 correlation matrix with latent roots 3, 1, 0.8, 0.3, and -0.1. The sum of the roots equals 5. In general, for any correlation matrix based on n items, the sum of roots equals n.

Suppose the value of -0.1 is replaced by 0.0001, then the new sum of roots equals 5.1001. However, by multiplying each root by the ratio 5/5.1001 = 0.9804, a 'corrected' set of roots is obtained in the sense that their sum equals 5.

From the Display 3 part of the output, the ratio required to obtain a corrected set of latent roots equals 0.9984211. The corrected set is given under the Display 4 heading.

DISPLAY   3.     NUMBER OF ITEMS AND SUM OF LATENT ROOTS
                  AND THEIR RATIO
                  32      32.0506033       0.9984211
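The correction can be reproduced in a few lines. The sketch below first works through the five-root illustration and then applies the same ratio to the 31 positive roots of Display 2. The helper `corrected_roots` and its `floor` argument are illustrative names; the replacement value 0.0001 comes from the illustration, not from TESTFACT itself.

```python
def corrected_roots(roots, n_items, floor=0.0001):
    """Replace non-positive roots by `floor`, then rescale to sum to n_items."""
    positive = [r if r > 0 else floor for r in roots]
    ratio = n_items / sum(positive)
    return ratio, [r * ratio for r in positive]

# Five-root illustration from the text.
ratio_toy, roots_toy = corrected_roots([3, 1, 0.8, 0.3, -0.1], 5)
print(round(ratio_toy, 4))          # 0.9804

# The 31 positive latent roots listed in Display 2.
display2 = [7.491350, 3.442602, 2.592276, 1.745235, 1.576302, 1.442306,
            1.248438, 1.118638, 1.015248, 0.971235, 0.908476, 0.835705,
            0.768426, 0.719607, 0.657375, 0.638227, 0.631485, 0.555802,
            0.514488, 0.461871, 0.398661, 0.375292, 0.349726, 0.312994,
            0.292964, 0.243591, 0.218973, 0.183170, 0.167582, 0.117183,
            0.055375]
ratio = 32 / sum(display2)
first_corrected = display2[0] * ratio
print(round(ratio, 5))              # 0.99842
print(round(first_corrected, 4))    # 7.4795
```

The rescaled first root agrees with the first entry of the corrected set of latent roots reported by the program.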

Display 4: Corrected latent roots

DISPLAY   4.   THE CORRECTED LATENT ROOTS OF THE CORRELATION MATRIX

      1         2         3         4         5         6
1     7.479522  3.437167  2.588184  1.742479  1.573814  1.440029
         .         .         .         .         .         .

Display 5: Initial smoothed inter-item correlation matrix

Any symmetric matrix $\mathbf{R}$ can be decomposed as

$$\mathbf{R} = \mathbf{X}\mathbf{D}\mathbf{X}',$$

where $\mathbf{D}$ is a diagonal matrix whose diagonal elements are the characteristic roots of $\mathbf{R}$. As mentioned previously, if all roots are positive, that is, all the diagonal elements of $\mathbf{D}$ are positive, $\mathbf{R}$ is a positive definite matrix. When this is not the case, a 'smoothed' correlation matrix, $\mathbf{R}^*$, may be obtained by replacing the elements of $\mathbf{D}$ with the corrected roots and negative roots with either 0 or some small positive quantity, so that

$$\mathbf{R}^* = \mathbf{X}\mathbf{D}^*\mathbf{X}',$$

where the columns of $\mathbf{X}$ are eigenvectors and the elements of $\mathbf{D}^*$ are the corrected latent roots. The elements of the smoothed correlation matrix for the first 6 of the 32 items are given below.

DISPLAY   5.   INITIAL SMOOTHED INTER-ITEM CORRELATION MATRIX

                     1        2        3        4        5        6
                    Q158     Q166     Q167     Q247     Q249     Q251
     1   Q158       1.000
     2   Q166      -0.383    1.000
     3   Q167      -0.145    0.124    1.000
     4   Q247      -0.534    0.368    0.054    1.000
     5   Q249       0.106   -0.019    0.016   -0.161    1.000
     6   Q251      -0.066    0.017    0.019    0.016   -0.126    1.000

Display 6: Iterated communality estimates

A communality is defined as the squared multiple correlation between an observed variable and the set of factors. The output below shows the estimated communalities for iterations 1, 2, 3, and 4. Note the small changes in the estimated values going from iteration 3 to iteration 4.

At iteration 1, the squared multiple correlation of an item with all other items is calculated for each of the 32 items. The MINRES method (see Display 7) is subsequently used to obtain post-solution improvements to these initial multiple regression communality estimates.

DISPLAY   6.   ITERATED COMMUNALITY ESTIMATES

                       1      2      3      4
     1   Q158        0.413  0.373  0.371  0.371
     2   Q166        0.370  0.325  0.323  0.322
     3   Q167        0.156  0.116  0.115  0.115
     4   Q247        0.516  0.471  0.466  0.465
     5   Q249        0.142  0.088  0.087  0.087
     6   Q251        0.351  0.269  0.257  0.255
         .
    31   Q313        0.477  0.422  0.415  0.414
    32   Q314        0.458  0.396  0.387  0.386

Display 7: The NROOT largest latent roots of the correlation matrix

TESTFACT uses the minimum squared residuals (MINRES) method to extract factors from the smoothed correlation matrix $\mathbf{R}^*$. The MINRES method minimizes the sum of squares of the residuals in a matrix $\mathbf{E}$, where

$$\mathbf{E} = \mathbf{R}^* - (\mathbf{A}\mathbf{A}' + \mathbf{\Psi}),$$

with $\mathbf{A}$ a $p \times k$ common factor matrix and $\mathbf{\Psi}$ a diagonal matrix whose diagonal elements $\psi_i$, i = 1, 2, ..., p, are the unique variances.

If $h_i^2$ denotes the communality for item i, then $\psi_i$ equals $1 - h_i^2$.

The sum of squares of the residuals is expressed as a statistical function (see, e.g. Tucker and MacCallum, 1997), which is minimized by the determination of the matrix of factor loadings $\mathbf{A}$ and uniquenesses $\mathbf{\Psi}$.

In this part of the output, the NROOT largest roots of the matrix

$$\mathbf{R}^* - \mathbf{\Psi}$$

are reported. Note that, since $\psi_i$ equals $1 - h_i^2$, characteristic roots are actually obtained from the smoothed correlation matrix with the unit diagonal elements replaced by the communalities. In general, the matrix $\mathbf{R}^* - \mathbf{\Psi}$ will be non-positive definite and hence a subset of the roots will be negative.

If one replaces NROOT = 10 in the FACTOR command with, for example, NROOT = 20, the output shows that roots with numbers 16, 17, 18 and higher are all negative. An empirical rule for the selection of the number of factors, k, is to set k equal to the number of latent roots larger than 1. For the present example it appears as if 3 or 4 factors are appropriate. Usually, the number of factors is selected on the basis of some theoretical framework concerning the items included in the analysis.

DISPLAY   7.   THE NROOT LARGEST LATENT ROOTS OF THE CORRELATION MATRIX

              1         2         3         4         5         6
     1     6.886994  2.861018  1.961481  1.149766  0.934423  0.738751

              7         8         9        10
     1     0.582337  0.423875  0.326571  0.270941

Display 8: MINRES principal factor loadings

The estimated factor loadings at convergence of the MINRES method are given below. These values are used to obtain starting values for the marginal maximum likelihood procedure specified in the FULL (full information) command.

Note that each communality is equal to the sum of squares of the corresponding factor loadings. For example, for item 12, the 3 factor loadings are 0.406, 0.275, and 0.555. Hence,

$$h_{12}^2 = 0.406^2 + 0.275^2 + 0.555^2 = 0.548$$

(see Display 6, communality for item 12 at iteration 4).

DISPLAY   8.   MINRES PRINCIPAL FACTOR LOADINGS

                       1      2      3
     1   Q158       -0.579  0.189  0.022
     2   Q166        0.519 -0.230 -0.001
     3   Q167        0.246  0.215 -0.091
     4   Q247        0.535 -0.420 -0.049
     5   Q249       -0.152 -0.022 -0.251
     6   Q251        0.250  0.245  0.364
           .
    31   Q313        0.431 -0.478 -0.018
    32   Q314        0.338 -0.511  0.105
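The relation between loadings and communalities can be checked for the items that appear in both Display 6 and Display 8. The sketch below sums the squared MINRES loadings for each listed item and compares the result with the iteration 4 communality; the dictionary names are illustrative.

```python
# MINRES principal factor loadings from Display 8 (rows shown in the listing).
loadings = {
    "Q158": (-0.579, 0.189, 0.022),
    "Q166": (0.519, -0.230, -0.001),
    "Q167": (0.246, 0.215, -0.091),
    "Q247": (0.535, -0.420, -0.049),
    "Q249": (-0.152, -0.022, -0.251),
    "Q251": (0.250, 0.245, 0.364),
    "Q313": (0.431, -0.478, -0.018),
    "Q314": (0.338, -0.511, 0.105),
}
# Iteration 4 communalities from Display 6 for the same items.
display6_iter4 = {
    "Q158": 0.371, "Q166": 0.322, "Q167": 0.115, "Q247": 0.465,
    "Q249": 0.087, "Q251": 0.255, "Q313": 0.414, "Q314": 0.386,
}

communality = {item: sum(a * a for a in row) for item, row in loadings.items()}
for item, h2 in communality.items():
    print(f"{item}  computed {h2:.3f}  reported {display6_iter4[item]:.3f}")
```

The computed and reported values agree to within 0.001, the precision allowed by loadings printed to three decimals.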

Display 9: Initial intercept and slope estimates

The intercept and slope estimates are functions of the item facility and factor loadings. If the ROTATE keyword is omitted in the FACTOR command, the factor loadings are the MINRES factor loadings (see Display 8). Otherwise the initial rotated factor loadings are used (not shown in the output).

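Suppose the factor loadings for item 1 and a 3-factor solution are denoted by $\alpha_{11}$, $\alpha_{12}$, and $\alpha_{13}$ respectively. Let

$$u_1 = \sqrt{1 - (\alpha_{11}^2 + \alpha_{12}^2 + \alpha_{13}^2)}$$

and denote the slopes corresponding to item 1 by $a_{11}$, $a_{12}$, and $a_{13}$ respectively. Then

$$a_{1j} = \frac{\alpha_{1j}}{u_1}, \quad j = 1, 2, 3.$$

Intercepts are computed as $c_i = z_i / u_i$, where

$$u_i = \sqrt{1 - \sum_{j} \alpha_{ij}^2}$$

and $z_i$ is the z-value corresponding to an area under the N(0,1) curve equal to the item i facility.

For item 1, for example, facility equals 0.206 and the corresponding z-value is -0.8202. For item 1, $u_1$ = 0.791 and therefore the item 1 intercept estimate is

$$c_1 = \frac{-0.8202}{0.791} = -1.037,$$

which agrees with the value -1.036 in Display 9 up to rounding.

Conversely, factor loadings are related to the slopes. Let $\alpha_{ij}$ and $a_{ij}$ respectively denote the j-th factor loading and slope of item i, j = 1, 2, ..., nfac. Then

$$\alpha_{ij} = \frac{a_{ij}}{d_i},$$

where

$$d_i = \sqrt{1 + \sum_{j=1}^{\text{nfac}} a_{ij}^2}.$$

The initial intercept and slope values are used as initial estimates for the full information maximum likelihood procedure specified by the FULL CYCLES=80; command.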

DISPLAY   9.     INITIAL INTERCEPT AND SLOPE ESTIMATES
              INTERCEPT    SLOPES
                            1       2       3
    1 Q158       -1.036   0.387   0.636   0.191
    2 Q166        0.476  -0.285  -0.609  -0.156
    3 Q167        0.858  -0.341   0.023  -0.115
    4 Q247        0.695  -0.245  -0.900  -0.033
    5 Q249       -0.088  -0.030   0.092   0.293
    6 Q251        0.092  -0.097   0.025  -0.576
       .
   31 Q313        1.313  -0.083  -0.837   0.020
   32 Q314        0.277   0.107  -0.784  -0.045
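The conversion between loadings and slopes can be sketched as follows. The code assumes slope = loading / sqrt(1 - sum of squared loadings), loading = slope / sqrt(1 + sum of squared slopes), and intercept = z / sqrt(1 - sum of squared loadings), with z the inverse normal of the facility. It uses the unrotated MINRES loadings of item 1 from Display 8, so the individual slopes differ from Display 9 (which is based on the rotated loadings), but the sum of squared slopes and the intercept are unaffected by an orthogonal rotation. All function names are illustrative.

```python
import math
from statistics import NormalDist

def unique_sd(loadings):
    """Square root of the unique variance, 1 - sum of squared loadings."""
    return math.sqrt(1.0 - sum(a * a for a in loadings))

def loadings_to_slopes(loadings):
    u = unique_sd(loadings)
    return [a / u for a in loadings]

def slopes_to_loadings(slopes):
    d = math.sqrt(1.0 + sum(a * a for a in slopes))
    return [a / d for a in slopes]

def intercept(facility, loadings):
    return NormalDist().inv_cdf(facility) / unique_sd(loadings)

minres_item1 = [-0.579, 0.189, 0.022]        # Display 8, item 1 (Q158)
slopes_item1 = loadings_to_slopes(minres_item1)
round_trip = slopes_to_loadings(slopes_item1)
c1 = intercept(0.206, minres_item1)

ss_unrotated = sum(a * a for a in slopes_item1)
ss_display9 = sum(a * a for a in (0.387, 0.636, 0.191))   # Display 9, item 1

print(round(c1, 2))                                   # close to the -1.036 in Display 9
print(round(ss_unrotated, 2), round(ss_display9, 2))  # both about 0.59
```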

Display 10: The EM estimation of parameters

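This part of the output shows that parameter estimates will be based on the EM (expectation-maximization) method and that the number of quadrature points equals 4. Quadrature is a numerical integration method that is often used in practice to calculate the value of an integral when no closed-form solution exists.

For a one-factor analysis, for example, the log-likelihood function can be expressed as

$$\log L = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} L_i(\theta; \boldsymbol{\xi})\, g(\theta)\, d\theta,$$

where N denotes the number of cases, $\boldsymbol{\xi}$ a set of unknown parameters, $L_i(\theta; \boldsymbol{\xi})$ the probability of the response pattern of case i given the factor score $\theta$, and $g(\theta)$ the N(0,1) density.

The integrals, or so-called marginal probabilities, are approximated by

$$\sum_{q=1}^{Q} w_q\, L_i(X_q; \boldsymbol{\xi}),$$

where $w_q$ denote the weights and $X_q$ the quadrature points.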

Display 11: quadrature points and weights

The numeric values of the 4 quadrature points and weights are listed. Note that the weights are always positive and that the quadrature points are symmetric.

DISPLAY   11.      4  QUADRATURE POINTS AND WEIGHTS:

      1         -2.334414          0.045876
      2         -0.741964          0.454124
      3          0.741964          0.454124
      4          2.334414          0.045876
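The listed points and weights coincide, to six decimals, with the 4-point Gauss-Hermite rule for a standard normal weight function. This is an observation about the numbers, not a statement about how TESTFACT generates them. The sketch below builds that rule from its closed form and uses it to integrate a few moments of N(0,1); `expect` is an illustrative helper.

```python
import math

# Closed form of the 4-point Gauss-Hermite rule for the N(0,1) density:
# points are +-sqrt(3 -+ sqrt(6)), weights are (3 +- sqrt(6)) / 12.
s6 = math.sqrt(6.0)
points = [-math.sqrt(3 + s6), -math.sqrt(3 - s6),
          math.sqrt(3 - s6), math.sqrt(3 + s6)]
weights = [(3 - s6) / 12, (3 + s6) / 12, (3 + s6) / 12, (3 - s6) / 12]

def expect(f):
    """Approximate E[f(X)] for X ~ N(0,1) as a weighted sum over the points."""
    return sum(w * f(x) for x, w in zip(points, weights))

for x, w in zip(points, weights):
    print(f"{x:10.6f} {w:10.6f}")

print(expect(lambda x: x ** 2))   # second moment of N(0,1), equal to 1
print(expect(lambda x: x ** 4))   # fourth moment of N(0,1), equal to 3
```

With only four points the rule is exact for polynomials up to degree 7, which is why so few points can already give a usable approximation of the marginal probabilities.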

The next part of the output shows the progress of the iterative procedure. At each cycle, -2 x LOG-LIKELIHOOD is reported as well as the maximum change in the intercept and slope values. For example, the maximum change in slope 1 estimates at cycle 1 is equal to 0.098630. In other words, starting from the initial slope 1 values of 0.387 (item 1), -0.285 (item 2), ..., 0.107 (item 32) given in Display 9, the differences between these values and the revised cycle 1 slope 1 estimates are at most 0.098630 units.

Small maximum changes in intercept and slope estimates are therefore an indication of convergence.

Note that, starting from cycle 6, the difference between -2 log L of the previous cycle and the present cycle is reported. At cycle 19, for example, this value, reported as CHANGE, is 0.0726.

SUM OF MARGINAL PROBABILITIES =     0.17040D-02

CYCLE  1  - 2 X MARGINAL LOG LIKELIHOOD =     0.2084060567D+05

MAXIMUM CHANGE OF ESTIMATES
              INTERCEPT =   0.038118  SLOPE =   0.098630
                                                0.056828
                                                0.037478

Number of patterns with zero probability =    0
.

SUM OF MARGINAL PROBABILITIES =     0.17167D-02

 CYCLE 32  - 2 X MARGINAL LOG LIKELIHOOD =     0.2080175353D+05
                                  CHANGE =    -0.3000105835D-02

MAXIMUM CHANGE OF ESTIMATES
              INTERCEPT =   0.002038  SLOPE =   0.005042
                                                0.001369
                                                0.003811

Number of patterns with zero probability =    0

Display 12: Chi-square and degrees of freedom

The -statistic reported below is calculated as

where  denotes the number of unique observed response patterns,  the sum of weights and  the marginal probability (marginal likelihood function) for pattern j.

The degrees of freedom, ndf, equal

For this example, , nfac = 3, and n (number of items) equals 32. Hence

This $\chi^2$ statistic can be used to test hypotheses of the form:

$H_0$: A k-factor model provides an adequate description of the data.

$H_1$: A (k + 1)-factor model provides an adequate description of the data.

The resultant test statistic is the difference between the $\chi^2$ under $H_0$ and the $\chi^2$ under $H_1$, with degrees of freedom equal to the difference in degrees of freedom for $H_0$ and $H_1$.

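If we replace the NFAC = 3 keyword in the FACTOR command with NFAC = 2, then

$$\text{ndf} = 598 - 1 - 32(3) + \frac{2(1)}{2} = 502.$$

From the output below, $\chi^2 = 13155.03$ with 472 degrees of freedom for the 3-factor model; the 2-factor run gives $\chi^2 = 13498.63$ with 502 degrees of freedom. The $\chi^2$ difference for a 2-factor versus a 3-factor model is 13498.63 - 13155.03 = 343.60 with 502 - 472 = 30 degrees of freedom. Since this value is highly significant, we reject the 2-factor model in favor of the 3-factor model.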

DISPLAY  12.     CHI-SQUARE =          13155.03  DF =     472.00  P = 0.000
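The degrees of freedom and the difference test can be checked with a few lines of Python. This is a sketch under stated assumptions, not TESTFACT code: it assumes ndf = s - 1 - n(nfac + 1) + nfac(nfac - 1)/2 with s = 598 unique response patterns, which reproduces the 472 and 502 degrees of freedom quoted above. The tail probability uses the closed form of the chi-square distribution that is exact only for an even number of degrees of freedom; the function names are hypothetical.

```python
import math


def degrees_of_freedom(n_patterns, n_items, n_factors):
    """Assumed form: s - 1 - n(nfac + 1) + nfac(nfac - 1)/2."""
    return (n_patterns - 1
            - n_items * (n_factors + 1)
            + n_factors * (n_factors - 1) // 2)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, exact closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


df_3 = degrees_of_freedom(598, 32, 3)
df_2 = degrees_of_freedom(598, 32, 2)

# 2-factor value quoted in the text, 3-factor value from Display 12.
chi2_diff = 13498.63 - 13155.03
df_diff = df_2 - df_3
p_value = chi2_sf_even_df(chi2_diff, df_diff)

print(df_3, df_2, df_diff)
print(f"chi-square difference = {chi2_diff:.2f}, p = {p_value:.3g}")
```

The resulting p-value is vanishingly small, which agrees with the conclusion that the 2-factor model is rejected.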

Display 13: Untransformed item parameters

The output below shows the intercept and slope estimates after convergence is attained or, alternatively, after the specified maximum number of cycles is reached. The number of EM cycles can be specified by one of the following commands:

>FULL CYCLES = ncycles;
>TECHNICAL ITER(a,b,c);

DISPLAY  13.     UNTRANSFORMED ITEM PARAMETERS
              INTERCEPT    SLOPE ESTIMATES
                            1       2       3
    1 Q158       -1.048   0.280   0.620   0.264
    2 Q166        0.482  -0.244  -0.562  -0.181
    3 Q167        0.868  -0.294   0.038  -0.135
    4 Q247        0.693  -0.141  -0.853  -0.125
    5 Q249       -0.086  -0.028   0.083   0.277
    6 Q251        0.093  -0.066   0.015  -0.591
      .
   31 Q313        1.361   0.063  -0.863  -0.133
   32 Q314        0.278   0.115  -0.757  -0.069

Display 14: Standardized difficulty, communality and principal factors

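Each communality is equal to the sum of squared factor loadings for the corresponding item. For example, for item 1 the factor loadings are -0.553, -0.194 and 0.069. The communality is equal to $(-0.553)^2 + (-0.194)^2 + (0.069)^2 = 0.348$. The standardized difficulty for item i is calculated as $-c_i / d_i$, where $c_i$ is the intercept of item i and (see comments for Display 9)

$$d_i = \sqrt{1 + \sum_{j=1}^{\text{nfac}} a_{ij}^2}$$

and $a_{ij}$ denotes the j-th slope for item i. For item 1, for example,

$$d_1 = \sqrt{1 + 0.280^2 + 0.620^2 + 0.264^2} = 1.238.$$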

Hence, the standardized difficulty for item 1 = -( -1.048)/1.238 = 0.846.
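The same calculation can be repeated for the other items listed in Display 13. The sketch below assumes that the divisor is d = sqrt(1 + sum of squared slopes), which reproduces the value 1.238 used above for item 1; the function name is hypothetical. Because it starts from the printed, rounded estimates, the results agree with the DIFF. column of Display 14 only to within rounding in the third decimal.

```python
import math


def standardized_difficulty(intercept, slopes):
    """Return -intercept / d, with d = sqrt(1 + sum of squared slopes)."""
    d = math.sqrt(1.0 + sum(a * a for a in slopes))
    return -intercept / d


# (intercept, slopes) as printed in Display 13.
display_13 = {
    "Q158": (-1.048, [0.280, 0.620, 0.264]),
    "Q166": (0.482, [-0.244, -0.562, -0.181]),
    "Q167": (0.868, [-0.294, 0.038, -0.135]),
    "Q313": (1.361, [0.063, -0.863, -0.133]),
}

difficulty = {
    name: standardized_difficulty(c, a) for name, (c, a) in display_13.items()
}

for name, value in difficulty.items():
    print(f"{name}: {value:.3f}")
```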

An item with a standardized difficulty of 0 can be regarded as an item with 'average' difficulty. Standardized difficulty scores above 0 are associated with the more difficult items and a value of 1.0, for example, indicates that examinees can be expected to find this item more difficult to answer than an item with standardized difficulty of less than 1. On the other hand, items with standardized difficulty of less than 0 (for example item 31) can be expected to be much easier to answer correctly.

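As mentioned earlier (see Display 9), the relationship between slopes and unrotated factor loadings is given by

$$\alpha_{ij} = \frac{a_{ij}}{d_i},$$

where i is the item number, j the slope number, $a_{ij}$ the j-th slope of item i and $d_i = \sqrt{1 + \sum_j a_{ij}^2}$ as defined above.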
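As a hedged illustration, the snippet below converts the slopes of Display 13 into unrotated loadings, assuming loading = slope / d with d = sqrt(1 + sum of squared slopes), and then sums the squared loadings. The unrotated loadings themselves differ from the principal factor loadings of Display 14, but the communalities should agree with the COMM. column up to rounding. The function names are hypothetical.

```python
import math


def unrotated_loadings(slopes):
    """Divide each slope by d = sqrt(1 + sum of squared slopes)."""
    d = math.sqrt(1.0 + sum(a * a for a in slopes))
    return [a / d for a in slopes]


def communality(loadings):
    """Sum of squared factor loadings for one item."""
    return sum(value * value for value in loadings)


# Slopes as printed in Display 13.
slopes_by_item = {
    "Q158": [0.280, 0.620, 0.264],
    "Q166": [-0.244, -0.562, -0.181],
    "Q247": [-0.141, -0.853, -0.125],
    "Q313": [0.063, -0.863, -0.133],
}

communalities = {
    name: communality(unrotated_loadings(slopes))
    for name, slopes in slopes_by_item.items()
}

for name, value in communalities.items():
    print(f"{name}: communality = {value:.3f}")
```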

The principal factor loadings given below are obtained as follows. Let $A$ be an $n \times \text{nfac}$ matrix of factor loadings with typical elements $\alpha_{ij}$ and consider the $n \times n$ symmetric matrix $AA'$, whose rank equals the number of factors, nfac. This implies that $AA'$ has a maximum of nfac non-zero characteristic roots, $\lambda_1 \ge \lambda_2 \ge \lambda_3$. If we denote the corresponding eigenvectors (normalized to unit length) by $e_1$, $e_2$ and $e_3$, then the principal factor loadings shown in the output below are computed as $\sqrt{\lambda_1}\,e_1$, $\sqrt{\lambda_2}\,e_2$ and $\sqrt{\lambda_3}\,e_3$, where the elements of $\sqrt{\lambda_j}\,e_j$ denote the factor loadings for the j-th factor, j = 1, 2, 3.

DISPLAY  14.  STANDARDIZED DIFFICULTY, COMMUNALITY, AND PRINCIPAL FACTORS

DIFF.   COMM.  FACTORS
                                    1       2       3
    1 Q158        0.846   0.348  -0.553  -0.194   0.069
    2 Q166       -0.406   0.290   0.496   0.208  -0.030
    3 Q167       -0.825   0.096   0.215  -0.215   0.062
    4 Q247       -0.522   0.433   0.512   0.410  -0.050
    5 Q249        0.083   0.078  -0.146   0.032   0.235
    6 Q251       -0.080   0.261   0.246  -0.242  -0.377
       .
   31 Q313       -1.024   0.434   0.419   0.487  -0.145
   32 Q314       -0.221   0.372   0.341   0.488  -0.131

Display 15: Percent of variance explained

The percentage of variance explained by factor j is calculated as

$$100 \times \frac{\lambda_j}{n},$$

where $\lambda_j$ is the j-th characteristic root of the matrix product $AA'$ of the factor loadings (see Display 14) and n the number of items.

From the values reported in the output, it is seen that 20.31% of the total variance is explained by the first factor, 8.64% by the second and 5.68% by the third factor. Since

$$100 \times \frac{\lambda_1}{32} = 20.31,$$

it follows that $\lambda_1$ = 6.50.

DISPLAY  15.   PERCENT OF VARIANCE

               1         2         3
     1     20.31014   8.64630   5.68340
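The characteristic roots can be recovered from the percentages in Display 15 by inverting the relation percent = 100 x root / n. The short sketch below does this for all three factors, assuming n = 32 items; it reproduces the value 6.50 quoted above for the first root.

```python
n_items = 32

# Percent of variance per factor, copied from Display 15.
percent = [20.31014, 8.64630, 5.68340]

# Invert percent = 100 * root / n to recover each characteristic root.
roots = [p * n_items / 100.0 for p in percent]

# Share of the total variance explained by the three factors together.
total_percent = sum(percent)

for j, (p, root) in enumerate(zip(percent, roots), start=1):
    print(f"factor {j}: {p:.2f}% of variance, root = {root:.3f}")
print(f"all three factors: {total_percent:.2f}%")
```

Together the three factors account for roughly a third of the total variance.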

Display 16: Standardized difficulty, communality and VARIMAX factors

The output below contains the VARIMAX rotated factors. Note that the standardized difficulty and communality estimates are the same as those given in Display 14. To determine which items are associated with a specific factor, one may select, for each item, the column with the highest loading (ignoring the sign of the loading). The following items appear to be indicators of Factor 2, for example: items 1, 2, 4, 8, 20, 24, 25, 26, 31 and 32.

DISPLAY  16.  STANDARDIZED DIFFICULTY, COMMUNALITY, AND VARIMAX FACTORS

DIFF.   COMM.  FACTORS
                                    1       2       3
    1 Q158        0.846   0.348   0.261   0.499   0.175
    2 Q166       -0.406   0.290  -0.234  -0.470  -0.117
    3 Q167       -0.825   0.096  -0.287   0.043  -0.110
    4 Q247       -0.522   0.433  -0.138  -0.641  -0.058
    5 Q249        0.083   0.078  -0.005   0.091   0.263
    6 Q251       -0.080   0.261  -0.092  -0.006  -0.503
      .
   31 Q313       -1.024   0.434   0.014  -0.654  -0.075
   32 Q314       -0.221   0.372   0.063  -0.605  -0.035
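Two statements above can be checked directly from the printed loadings: rotation leaves the communalities unchanged, and each item can be assigned to the factor with the largest absolute VARIMAX loading. The sketch below uses only the eight items listed in Displays 14 and 16; agreement is limited to the precision of the printed values, and the function names are hypothetical.

```python
# Loadings for the items printed in Display 14 (principal) and Display 16 (VARIMAX).
principal = {
    1: [-0.553, -0.194, 0.069], 2: [0.496, 0.208, -0.030],
    3: [0.215, -0.215, 0.062], 4: [0.512, 0.410, -0.050],
    5: [-0.146, 0.032, 0.235], 6: [0.246, -0.242, -0.377],
    31: [0.419, 0.487, -0.145], 32: [0.341, 0.488, -0.131],
}
varimax = {
    1: [0.261, 0.499, 0.175], 2: [-0.234, -0.470, -0.117],
    3: [-0.287, 0.043, -0.110], 4: [-0.138, -0.641, -0.058],
    5: [-0.005, 0.091, 0.263], 6: [-0.092, -0.006, -0.503],
    31: [0.014, -0.654, -0.075], 32: [0.063, -0.605, -0.035],
}


def communality(loadings):
    """Sum of squared loadings for one item."""
    return sum(value * value for value in loadings)


def dominant_factor(loadings):
    """1-based index of the factor with the largest absolute loading."""
    sizes = [abs(value) for value in loadings]
    return sizes.index(max(sizes)) + 1


# Largest difference between the communalities before and after rotation.
max_gap = max(
    abs(communality(principal[i]) - communality(varimax[i])) for i in principal
)
assignment = {item: dominant_factor(row) for item, row in varimax.items()}
factor_2_items = sorted(i for i, f in assignment.items() if f == 2)

print(f"largest communality difference: {max_gap:.4f}")
print("items loading mainly on factor 2:", factor_2_items)
```

Among the listed items, those assigned to Factor 2 are the subset of the Factor 2 indicators named in the text that appears in the printed part of the table.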


Phase 7: Factor scores using EAP estimates

The factor scores are Bayes estimates computed under the assumption that the corresponding ability factors are normally distributed in the population from which the sample of examinees was drawn.

Let $\theta_{ik}$ denote the k-th ability score, k = 1, 2, ..., nfac, for examinee i, i = 1, 2, ..., N. The factor scores are then the conditional expectations $E(\theta_{ik} \mid x_{i1}, x_{i2}, \ldots, x_{in})$, where $x_{ij}$ is the item j score for examinee i.
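The following Python sketch shows how such an expected a posteriori (EAP) score can be approximated on a fixed grid. It is an illustration under explicit assumptions, not TESTFACT's implementation: it assumes a normal-ogive response function, P(correct) = Phi(intercept + sum of slope x theta), independent standard normal factors, a fixed (non-adaptive) product grid built from the 5-point rule printed in Display 17, and that omitted items are simply skipped. The function names and the example responses are hypothetical.

```python
import math
from itertools import product

# 5-point rule for a standard normal factor (values printed in Display 17).
POINTS = [-2.856970, -1.355626, 0.000000, 1.355626, 2.856970]
WEIGHTS = [0.011257, 0.222076, 0.533333, 0.222076, 0.011257]


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def eap_scores(responses, intercepts, slopes):
    """Return (posterior means, posterior SDs) of the factors for one examinee.

    responses  : list of 1, 0 or None (None = omitted or not presented)
    intercepts : one intercept per item
    slopes     : one list of nfac slopes per item
    """
    nfac = len(slopes[0])
    norm = 0.0
    first = [0.0] * nfac
    second = [0.0] * nfac
    # Product grid: every combination of quadrature points across the factors.
    for idx in product(range(len(POINTS)), repeat=nfac):
        theta = [POINTS[i] for i in idx]
        prior = math.prod(WEIGHTS[i] for i in idx)
        likelihood = 1.0
        for x, c, a in zip(responses, intercepts, slopes):
            if x is None:
                continue  # assumption: skipped items carry no information
            p = normal_cdf(c + sum(ak * tk for ak, tk in zip(a, theta)))
            likelihood *= p if x == 1 else 1.0 - p
        mass = prior * likelihood
        norm += mass
        for k in range(nfac):
            first[k] += mass * theta[k]
            second[k] += mass * theta[k] ** 2
    means = [f / norm for f in first]
    sds = [math.sqrt(max(s / norm - m * m, 0.0)) for s, m in zip(second, means)]
    return means, sds


# Hypothetical usage: three items from Display 13 with made-up responses.
intercepts = [-1.048, 0.482, 0.868]
slopes = [[0.280, 0.620, 0.264], [-0.244, -0.562, -0.181], [-0.294, 0.038, -0.135]]
means, sds = eap_scores([1, 0, None], intercepts, slopes)
print([round(m, 3) for m in means], [round(s, 3) for s in sds])
```

The posterior standard deviations returned here play the role of the standard errors that accompany each factor score. With no responses at all, the function returns the prior mean 0 and a standard deviation of about 1 for every factor.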

Display 17: Quadrature points and weights

To obtain these conditional expectations, a 5-point quadrature formula is employed. The points and weights are shown below.

DISPLAY   17.      5  FACTOR SCORE QUADRATURE POINTS AND WEIGHTS:

      1           -2.856970       0.011257
      2           -1.355626       0.222076
      3            0.000000       0.533333
      4            1.355626       0.222076
      5            2.856970       0.011257
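The printed points and weights match a 5-point Gauss-Hermite rule for the standard normal density, which integrates polynomials up to degree 9 exactly. As a quick check, sketched here without any claim about how TESTFACT generates the rule, the snippet below compares the moments computed from the table with the exact moments of the standard normal distribution (1, 3, 15 and 105 for orders 2, 4, 6 and 8).

```python
# Points and weights copied from Display 17.
POINTS = [-2.856970, -1.355626, 0.000000, 1.355626, 2.856970]
WEIGHTS = [0.011257, 0.222076, 0.533333, 0.222076, 0.011257]


def quadrature_moment(order):
    """Approximate E[theta**order] for a standard normal theta."""
    return sum(w * x ** order for x, w in zip(POINTS, WEIGHTS))


def normal_moment(order):
    """Exact moment of the standard normal: 0 if odd, (order - 1)!! if even."""
    if order % 2:
        return 0.0
    result = 1.0
    for k in range(1, order, 2):
        result *= k
    return result


for order in range(0, 9):
    approx = quadrature_moment(order)
    exact = normal_moment(order)
    print(f"order {order}: quadrature = {approx:.5f}, exact = {exact:.0f}")
```

Small discrepancies are expected because the table is printed to six decimals.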

Display 18: Factor scores and standard error estimates

The syntax file contains the command

>SCORE METHOD=2, LIST = 20;

This command requests that the factor ability scores for the first 20 cases be listed as part of the output. The full set of factor scores is written to the file exampl03.fsc. For each case, the case ID, number of items presented, percent correct and percent omitted are reported. Below these values, the ability scores for each factor are given, followed by their estimated standard errors, which are marked with an asterisk. Case 3, for example, was presented with 30 items, of which 13 were answered correctly. Hence the percentage correct for this case is

$$\frac{13}{30} \times 100 = 43.3\%.$$

Case 10 answered 84.4% of the items correctly and had factor scores of 0.898, 1.234 and 1.710, respectively. Since the means of the 598 factor scores (see the last part of the output) are approximately 0 with standard deviations of 0.86, 0.86 and 0.82, respectively, it can be concluded that examinee 10 attained factor scores that are at least one standard deviation above average on all three factors.
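The comparison for case 10 amounts to dividing each factor score by the standard deviation of that factor, taking the means as 0 (the text describes them as approximately 0). A minimal sketch of the arithmetic:

```python
# Factor scores for case 10 and the standard deviations of the 598 scores.
scores = [0.898, 1.234, 1.710]
standard_deviations = [0.86, 0.86, 0.82]

# Distance from the (approximately zero) mean in standard deviation units.
z_scores = [s / sd for s, sd in zip(scores, standard_deviations)]

for factor, z in enumerate(z_scores, start=1):
    print(f"factor {factor}: {z:.2f} standard deviations above the mean")
```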

Factor scores are not unique in the sense that multiplying any column of factor scores by -1 does not affect the validity of the estimates. It may therefore happen that negative scores are associated with above-average percent correct, and positive scores with below-average percent correct. TESTFACT attempts to reverse the signs in such a way that scores above zero are usually associated with above-average achievement.
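The rule TESTFACT uses to choose the signs is not described here. Purely as an illustration of the idea, the sketch below applies one plausible heuristic: a column of factor scores is multiplied by -1 when it covaries negatively with percent correct. The function names are hypothetical; the data are the first nine cases and the factor 1 scores listed in Display 18.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def reflect_if_needed(scores, percent_correct):
    """Flip the sign of a score column that runs against percent correct."""
    if covariance(scores, percent_correct) < 0:
        return [-s for s in scores]
    return list(scores)


# Percent correct and factor 1 scores for cases 1 to 9 in Display 18.
percent_correct = [59.4, 12.5, 43.3, 43.8, 37.5, 59.4, 59.4, 34.4, 28.1]
factor_1 = [0.264, -1.329, -0.572, -0.612, -0.901, 0.603, 0.548, -0.156, -0.204]

# The listed column already points in the expected direction, so it is kept.
kept = reflect_if_needed(factor_1, percent_correct)

# A column with the opposite orientation is reflected back.
flipped = reflect_if_needed([-s for s in factor_1], percent_correct)

print(kept == factor_1, flipped == factor_1)
```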

DISPLAY   18.  FACTOR SCORES AND STANDARD ERRORS (S.E.)

CASE HEADER:
 CASE    NUMBER   PERCENT PERCENT   CASE ID
       PRESENTED  CORRECT OMITTED
 SCORES:            1           2           3
 S.E.*
==============================================================
    1      32      59.4    0.0  2010002201
                  0.264       1.018       0.120
                  0.560*      0.543*      0.576*
    2      32      12.5    0.0  0012212110
                 -1.329      -0.100      -1.495
                  0.483*      0.469*      0.645*
    3      30      43.3    0.0  0010.02100
                 -0.572       0.346       0.035
                  0.420*      0.511*      0.527*
    4      32      43.8    0.0  0020202202
                 -0.612      -1.378       0.584
                  0.530*      0.521*      0.587*
    5      32      37.5    0.0  2010002210
                 -0.901      -0.061      -0.123
                  0.446*      0.482*      0.541*
    6      32      59.4    0.0  0010012210
                  0.603       0.049       0.611
                  0.543*      0.526*      0.605*
    7      32      59.4    0.0  1021001110
                  0.548      -1.653       1.132
                  0.456*      0.532*      0.611*
    8      32      34.4    0.0  0010012210
                 -0.156      -0.332      -0.817
                  0.436*      0.484*      0.574*
    9      32      28.1    0.0  2010011100
                 -0.204      -0.590      -0.597
                  0.433*      0.478*      0.556*
    .



Copyright © 2005-2010, Scientific Software International, Inc., All rights reserved.
7383 N. Lincoln Ave., Suite 100, Lincolnwood, IL 60712-1747