Scientific Software International (SSI) publishes statistical data analysis software: LISREL (structural equation model/SEM, survey generalized linear model/SGLIM), 
HLM (hierarchical linear modeling, multilevel model), SuperMix (mixed models, mixed-effects program, MIXREG, MIXOR, MIXNO and MIXPREG) and Item Response Theory/IRT (BILOG-MG, MULTILOG, PARSCALE)


Parscale: Calibration and EAP Scoring with a Graded Model

This example illustrates calibration and scoring of a test or scale containing 20 multiple category items. Syntax is shown below.

EXAMPLE 1: ARTIFICIAL EXAMPLE:  MONTE CARLO DATA
GRADED RATING SCALE MODEL, NORMAL RESPONSE FUNCTION:  EAP SCALE SCORES
>COMMENT ;
>FILE   DFNAME='PSLDAT\EXAMPL01.DAT',SAVE;
>SAVE   PARM='PSLDAT\EXAMPL01.PAR',SCORE='EXAMPL01.SCO';
>INPUT  NIDCH=10,NTOTAL=20,NTEST=1,LENGTH=(20),NFMT=1;
(46X,10A1,/,20A1)
>TEST1  TNAME=SCALE1,ITEM=(1(1)20),NBLOCK=1;
>BLOCK1 BNAME=SBLOCK1,NITEMS=20,NCAT=4, CADJUST=0.0;
>CAL    GRADED,NQPTS=30,CYCLE=(25,2,2,2,2),
        NEWTON=5,CRIT=0.005,ITEMFIT=10;
>SCORE  EAP,NQPTS=30,SMEAN=0.0,SSD=1.0,NAME=EAP,PFQ=5;

The simulated data represent responses of 1000 examinees drawn randomly from a population with a mean trait score of 0.0 and standard deviation of 1.0. As the default for SAMPLE on INPUT is 1000, all generated data will be used as input by default.

The generating trait value of each examinee is used as the case ID. The case ID is 10 characters long and is indicated as such using the NIDCH keyword on the INPUT command. It is also reflected in the format statement as 10A1.

Data are read from the file exampl01.dat in the examples folder using the DFNAME keyword on the FILES command.

All 20 items are used in a single test (NTEST=1 on INPUT command, with LENGTH=20). All 20 items have common categories and are assigned to the same BLOCK (NBLOCK=1 on TEST; NITEMS=20 on BLOCK).

All items have four categories (NCAT=4 on the BLOCK command) and varying difficulties and discriminating powers. The graded model is assumed (GRADED on the CALIB command); the graded model is also the default. The response function of the graded model can be either the normal ogive or its logistic approximation, and the logistic form is requested with the LOGISTIC option on the CALIB command. LOGISTIC is not specified in this example, so the normal ogive response function is used, as the title line of the syntax and the MODEL SPECIFICATIONS section of the Phase 0 output confirm. If LOGISTIC is selected, the item parameters are by default in the natural metric of the logistic ogive; to express them in the normal metric, set the SCALE keyword on the CALIB command equal to 1.7. Neither LOGISTIC nor SCALE is needed when PARTIAL is selected. Because this model allows for varying item discriminating powers, both a slope and a threshold are estimated for each item. The slope parameters and all of the category parameters cannot be estimated simultaneously, so the CADJUST keyword on the BLOCK command is used to set the mean of the category parameters to 0.

The ITEMFIT keyword is used to set the number of frequency score groups for the computation of item fit statistics to 10. Note that there is no default value for the ITEMFIT keyword.

The CYCLES keyword specifies 25 EM iterations, with a maximum of 2 inner EM iterations for the item and category parameter estimation. Five Newton-Gauss iterations are requested (NEWTON=5 on CALIB), and a convergence criterion of 0.005 is specified using the CRIT keyword on CALIB.

Thirty quadrature points (NQPTS=30 on CALIB) are used in the EM and Newton estimation, instead of the default of 10 that applies when LENGTH on the INPUT command is less than or equal to 50. The calibration procedure depends on the evaluation of integrals using Gauss-Hermite quadrature. In general, the accuracy of numerical integration increases with the number of quadrature points used.
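The effect of the number of points can be checked numerically. The Python sketch below is an illustration only: it assumes the equally spaced, normal-weighted rule that the Phase 0 output prints (rather than true Gauss-Hermite nodes) and a hypothetical, deliberately steep normal-ogive item with slope a = 3 and threshold b = 0.5. For such an item the integral of Φ(a(θ - b)) over θ ~ N(0, 1) has the closed form Φ(-ab/√(1 + a²)), so the quadrature error can be measured directly. The helper names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def normal_quadrature(n_points, lo=-4.0, hi=4.0):
    """Equally spaced points on [lo, hi] with normalized N(0, 1) density weights."""
    step = (hi - lo) / (n_points - 1)
    points = [lo + i * step for i in range(n_points)]
    dens = [exp(-x * x / 2.0) for x in points]
    total = sum(dens)
    return points, [d / total for d in dens]


def expected_response(a, b, n_points):
    """Approximate E[Phi(a * (theta - b))] for theta ~ N(0, 1) by quadrature."""
    points, weights = normal_quadrature(n_points)
    return sum(w * PHI(a * (x - b)) for x, w in zip(points, weights))


# Hypothetical, deliberately steep item: slope a = 3, threshold b = 0.5.
a, b = 3.0, 0.5
exact = PHI(-a * b / sqrt(1.0 + a * a))  # closed form for the normal ogive
err10 = abs(expected_response(a, b, 10) - exact)
err30 = abs(expected_response(a, b, 30) - exact)
print(f"10 points: error {err10:.1e}; 30 points: error {err30:.1e}")
```

With these settings the 30-point error is far smaller than the 10-point error; for flatter items the gap narrows.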

The score estimation method is specified (EAP option on SCORE command). Scale scores for each subtest are estimated by the Bayes (EAP) method, and their posterior standard deviations serve as standard errors.
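As an illustration of EAP scoring, the posterior mean and posterior standard deviation can be computed over the quadrature points. This Python sketch is not PARSCALE's implementation. It assumes a normal-ogive graded model in Muraki's rating-scale form (boundary threshold b - c_k, with the slope multiplied by a scale constant of 1.7), and the same equally spaced normal prior that the program prints. The three items and all function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def quadrature(n_points=30, lo=-4.0, hi=4.0):
    """Equally spaced points with normalized N(0, 1) density weights."""
    step = (hi - lo) / (n_points - 1)
    points = [lo + i * step for i in range(n_points)]
    dens = [exp(-x * x / 2.0) for x in points]
    total = sum(dens)
    return points, [d / total for d in dens]


def graded_probs(theta, a, b, c, scale=1.7):
    """P(x = 1), ..., P(x = m) for one item under a normal-ogive graded model.

    ASSUMED parameterization: boundary k has threshold b - c[k], and the
    slope is multiplied by the scale constant.
    """
    upper = [1.0] + [PHI(scale * a * (theta - (b - ck))) for ck in c] + [0.0]
    return [upper[k] - upper[k + 1] for k in range(len(c) + 1)]


def eap_score(responses, items, n_points=30):
    """EAP estimate and posterior SD for responses coded 1..m."""
    points, weights = quadrature(n_points)
    posterior = []
    for theta, w in zip(points, weights):
        like = w  # prior weight times the likelihood of the response pattern
        for x, (a, b, c) in zip(responses, items):
            like *= graded_probs(theta, a, b, c)[x - 1]
        posterior.append(like)
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    mean = sum(t * p for t, p in zip(points, posterior))
    sd = sqrt(sum((t - mean) ** 2 * p for t, p in zip(points, posterior)))
    return mean, sd


# Hypothetical three-item block sharing one set of category parameters.
C = (0.947, 0.005, -0.952)
ITEMS = [(0.9, 0.0, C), (0.75, 0.45, C), (0.6, -0.5, C)]
print(eap_score([4, 4, 4], ITEMS))
print(eap_score([1, 1, 1], ITEMS))
```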

The scores, which are rescaled to zero mean and unit standard deviation in the sample (SMEAN and SSD on SCORE), are saved in the file exampl01.sco using the SCORE keyword on the SAVE command.
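The rescaling requested with SMEAN and SSD is a linear transformation of the scores in the sample. The Python sketch below shows one way to do it. Two details are assumptions rather than documented PARSCALE behavior: the divide-by-n standard deviation is used, and the standard errors are multiplied by the same factor as the scores. The function name and the five scores are hypothetical.

```python
from statistics import fmean, pstdev


def rescale_scores(scores, std_errors, smean=0.0, ssd=1.0):
    """Linearly rescale scores to mean `smean` and SD `ssd` in the sample.

    ASSUMPTIONS: the population (divide-by-n) SD is used, and standard
    errors are multiplied by the same slope as the scores.
    """
    m = fmean(scores)
    slope = ssd / pstdev(scores)
    new_scores = [smean + slope * (x - m) for x in scores]
    new_errors = [slope * se for se in std_errors]
    return new_scores, new_errors


# Hypothetical EAP estimates and posterior SDs for five respondents.
eaps = [0.41, -0.88, -0.52, 1.10, 0.05]
psds = [0.22, 0.24, 0.23, 0.26, 0.22]
scaled, scaled_se = rescale_scores(eaps, psds, smean=0.0, ssd=1.0)
print([round(x, 3) for x in scaled])
```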

The PFQ keyword on the SCORE command is also specified (PFQ=5). This keyword is usually used to make ML scores more computable, but it can also improve EAP estimates somewhat.

In addition, the estimated item parameters are saved in the file exampl01.par (PARM keyword on the SAVE command).

The first three records of the data file exampl01.dat are shown below.

Samp      Group          1      1  00001  1.0      .44739
42444232223343433332
Samp      Group          1      1  00002  1.0     -.93465
12221121122324121432
Samp      Group          1      1  00003  1.0     -.56465
32212212213342314121

Two lines of data are given for each respondent. On the first line, information concerning generation of the data is given. The generating value for each respondent is the last entry given on the first line: it is used here as the case ID. In the format statement

(46X,10A1,/,20A1)

46 columns on the first line are skipped, after which the case ID is read as a character string of length 10. The '/' indicates that following information should be read on the second line of data. From this line, the 20 item responses are read. Item responses are given in the first 20 columns of this line, and are read as character values of length 1 each. In the format statement this is indicated by 20A1.
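The column layout described by the format statement can be sketched in Python. This is a minimal illustration of what (46X,10A1,/,20A1) does, not PARSCALE's reader. The helper read_records is hypothetical, and converting the responses to integers is an assumption made for convenience (PARSCALE itself reads them as characters). The two sample records are the first two shown above.

```python
def read_records(lines, skip=46, id_width=10, n_items=20):
    """Read two-line records laid out as the format (46X,10A1,/,20A1).

    Line 1: skip `skip` columns, then read an `id_width`-character case ID.
    Line 2: read `n_items` one-character item responses from column 1.
    """
    records = []
    for first, second in zip(lines[0::2], lines[1::2]):
        case_id = first[skip:skip + id_width]  # columns 47-56 when skip=46
        responses = [int(ch) for ch in second[:n_items]]
        records.append((case_id, responses))
    return records


data = [
    "Samp      Group          1      1  00001  1.0      .44739",
    "42444232223343433332",
    "Samp      Group          1      1  00002  1.0     -.93465",
    "12221121122324121432",
]
for case_id, responses in read_records(data):
    print(repr(case_id.strip()), responses)
```

Because only 10 characters are read, the generating values are truncated to .4473 and -.9346, which is how they appear as IDs in the Phase 0 echo of the first two observations.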

Although the data for each respondent are spread out over 2 lines, the format statement in the syntax file occupies just one line, and thus NFMT is set to 1 on the INPUT command.


Phase 0 output

At the beginning of the output for Phase 0, the syntax file is echoed. Information on the number of tests, items, and type of model to be fitted as interpreted by PARSCALE is also given.

EXAMPLE 1: ARTIFICIAL EXAMPLE:  MONTE CARLO DATA                               
            GRADED MODEL, NORMAL METRIC:  EAP SCALE SCORES                      
 >COMMENT ;                                                                     
 >FILE   DFNAME='PSLDAT\EXAMPL01.DAT',SAVE;                                     
 >SAVE   PARM='PSLDAT\EXAMPL01.PAR',SCORE='EXAMPL01.SCO';                       
 >INPUT  NIDCH=10,NTOTAL=20,NTEST=1,LENGTH=(20),NFMT=1;                          

 SINGLE MAIN TEST IS USED.

 NUMBER OF ITEMS:    20

 FORMAT OF DATA INPUT IS
 (46X,10A1,/,20A1)                                                              

 >TEST1  TNAME=SCALE1,ITEM=(1(1)20),NBLOCK=1;                                   

 BLOCK CARD:   1
 >BLOCK1 BNAME=SBLOCK1,NITEMS=20,NCAT=4,CADJ=0.0;                               
 >CAL    GRADED,SCALE=1.7,NQPTS=30,CYCLE=(25,2,2,2,2),                          
         NEWTON=2,CRIT=0.00001,ITEMFIT=10;                                      

 MODEL SPECIFICATIONS
 ======================

 NORMAL OGIVE - GRADED ITEM RESPONSE MODEL IS SPECIFIED.
                SCALE CONSTANT  1.70 FOR SLOPE PARAMETERS.

This section of the output file contains information on the settings to be used during the item parameter estimation in Phase 2.

 CALIBRATION PARAMETERS
 ======================

 MAXIMUM NUMBER OF EM CYCLES:                 25
 MAXIMUM INNER EM CYCLES:                      2
 MAXIMUM CATEGORY ESTIMATION CYCLES:           2
 MAXIMUM ITEM PARAMETER ESTIMATION CYCLES:     2
 MAXIMUM NUMBER OF NEWTON CYCLES:              2
 CONVERGENCE CRITERION FOR EM CYCLES:         0.0000
 CONVERGENCE CRITERION FOR SLOPE:             0.0000
 CONVERGENCE CRITERION FOR THRESHOLD:         0.0000
 CONVERGENCE CRITERION FOR CATEGORY:          0.0000
 CONVERGENCE CRITERION FOR GEUSSING:          0.0000
 ORDER OF INNER EM CYCLES:                  CATEGORY - ITEM PARAMETERS
 ESTIMATION ACCELERATOR:                    NO (DEFAULT)
 RIDGE METHOD:                              NO (DEFAULT)

No prior distribution was requested in the CALIB command, and consequently the default prior, a normal distribution on equally spaced points, will be used (DIST=2 on CALIB). The number of quadrature points to be used during item parameter estimation was set to 30 (NQPTS on CALIB). The program-generated quadrature points and weights are printed to the Phase 0 output file, as shown below.

 

 THE FIXED PRIOR DISTRIBUTION FOR LATENT TRAITS
                                            MEAN     :  0.0000
                                            S.D.     :  1.0000

 QUADRATURE POINTS AND PRIOR WEIGHTS (PROGRAM-GENERATED NORMAL APPROXIMATION):

                1           2           3           4           5
 POINT    -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
 WEIGHT    0.3692E-04  0.1071E-03  0.2881E-03  0.7181E-03  0.1659E-02

                6           7           8           9          10
 POINT    -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
 WEIGHT    0.3550E-02  0.7042E-02  0.1294E-01  0.2205E-01  0.3481E-01

               11          12          13          14          15
 POINT    -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
 WEIGHT    0.5093E-01  0.6905E-01  0.8676E-01  0.1010E+00  0.1090E+00

               16          17          18          19          20
 POINT     0.1379E+00  0.4138E+00  0.6897E+00  0.9655E+00  0.1241E+01
 WEIGHT    0.1090E+00  0.1010E+00  0.8676E-01  0.6905E-01  0.5093E-01

               21          22          23          24          25
 POINT     0.1517E+01  0.1793E+01  0.2069E+01  0.2345E+01  0.2621E+01
 WEIGHT    0.3481E-01  0.2205E-01  0.1294E-01  0.7042E-02  0.3550E-02

               26          27          28          29          30
 POINT     0.2897E+01  0.3172E+01  0.3448E+01  0.3724E+01  0.4000E+01
 WEIGHT    0.1659E-02  0.7181E-03  0.2881E-03  0.1071E-03  0.3692E-04

 TOTAL WEIGHT: 1.00000
 MEAN        : 0.00000
 S.D.        : 0.99970
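The printed points and weights can be regenerated with a short Python sketch, assuming the rule they appear to follow: 30 equally spaced points from -4 to 4, with weights proportional to the standard normal density and normalized to sum to 1. That rule is inferred from the listing, not taken from PARSCALE documentation, and normal_quadrature is a hypothetical helper.

```python
from math import exp, sqrt


def normal_quadrature(n_points=30, lo=-4.0, hi=4.0):
    """Equally spaced points with weights proportional to the N(0, 1) density."""
    step = (hi - lo) / (n_points - 1)
    points = [lo + i * step for i in range(n_points)]
    dens = [exp(-x * x / 2.0) for x in points]
    total = sum(dens)
    return points, [d / total for d in dens]


points, weights = normal_quadrature()
mean = sum(x * w for x, w in zip(points, weights))
sd = sqrt(sum((x - mean) ** 2 * w for x, w in zip(points, weights)))
print(f"POINT 2 = {points[1]:.4f}, WEIGHT 1 = {weights[0]:.4e}")
print(f"TOTAL WEIGHT {sum(weights):.5f}  MEAN {mean:.5f}  S.D. {sd:.5f}")
```

Under this rule the first weight (0.3692E-04) and the S.D. of 0.99970 are reproduced to the displayed precision; the S.D. falls slightly below 1 because the grid stops at ±4.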

The control settings to be used during calibration are followed by settings to be used during the scoring phase (Phase 3). The EAP method of scoring is requested (EAP option) and, as in the calibration phase, 30 quadrature points were requested. Since no prior distribution was requested using the DIST keyword, by default a normal distribution on equally spaced points will be used (DIST = 2 on SCORE). Note that the DIST keyword applies only when EAP scoring has been selected.

 >SCORE  EAP,NQPTS=30,SMEAN=0.0,SSD=1.0,NAME=EAP,PFQ=5;                         

 PARAMETERS FOR SCORING AND TEST AND ITEM INFORMATION
 ====================================================

 METHOD OF SCORING SUBJECTS:                EXPECTATION A POSTERIORI
                                            (EAP; BAYES ESTIMATES)

 TYPE OF PRIOR:                             NORMAL APPROXIMATION

 NUMBER OF QUADRATURE POINTS                 30
 SCORES WRITTEN TO FILE                     EXAMPL01.SCO                   

 QUADRATURE POINTS AND PRIOR WEIGHTS (PROGRAM-GENERATED NORMAL APPROXIMATION):

                1           2           3           4           5
 POINT    -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
 WEIGHT    0.3692E-04  0.1071E-03  0.2881E-03  0.7181E-03  0.1659E-02

                6           7           8           9          10
 POINT    -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
 WEIGHT    0.3550E-02  0.7042E-02  0.1294E-01  0.2205E-01  0.3481E-01

               11          12          13          14          15
 POINT    -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
 WEIGHT    0.5093E-01  0.6905E-01  0.8676E-01  0.1010E+00  0.1090E+00

               16          17          18          19          20
 POINT     0.1379E+00  0.4138E+00  0.6897E+00  0.9655E+00  0.1241E+01
 WEIGHT    0.1090E+00  0.1010E+00  0.8676E-01  0.6905E-01  0.5093E-01

               21          22          23          24          25
 POINT     0.1517E+01  0.1793E+01  0.2069E+01  0.2345E+01  0.2621E+01
 WEIGHT    0.3481E-01  0.2205E-01  0.1294E-01  0.7042E-02  0.3550E-02

               26          27          28          29          30
 POINT     0.2897E+01  0.3172E+01  0.3448E+01  0.3724E+01  0.4000E+01
 WEIGHT    0.1659E-02  0.7181E-03  0.2881E-03  0.1071E-03  0.3692E-04

 TOTAL WEIGHT: 1.00000
 MEAN        : 0.00000
 S.D.        : 0.99970

The values assigned to the rescaling constants SMEAN and SSD in the SCORE command are shown:

 SET NUMBER      :    1
 SCORE NAME      : EAP    
 NUMBER OF ITEMS :   20
 RESCALE CONSTANT: MEAN =       0.00   S.D. =       1.00

 ITEMS           :   1    2    3    4    5    6    7    8    9   10
                    11   12   13   14   15   16   17   18   19   20

                   0001 0002 0003 0004 0005 0006 0007 0008 0009 0010
                   0011 0012 0013 0014 0015 0016 0017 0018 0019 0020

Input and output files as requested with the DFNAME keyword on the FILES command and the PARM and SCORE keywords on the SAVE command are listed:

 FILE ASSIGNMENTS AND DISPOSITIONS
 =================================

 [INPUT FILES]

 SUBJECT DATA INPUT FILE                    PSLDAT\EXAMPL01.DAT            
                                            SINGLE-SUBJECT DATA
                                            NO CASE WEIGHTS

 [OUTPUT FILES]

 ITEM PARAMETERS FILE                       PSLDAT\EXAMPL01.PAR            
 SUBJECT SCALE-SCORE FILE                   EXAMPL01.SCO                   

 [SCRATCH FILES]

 PARSCALE SYSTEM BINARY DATA FILE           Exampl01.MFL                   
 TEMPORARY FILE                             Exampl01.T99                   
 TEMPORARY FILE                             Exampl01.T98                   
 TEMPORARY FILE                             Exampl01.T97                    
 TEMPORARY FILE                             Exampl01.T96                   

To allow the user to verify that data have been read in correctly from the raw data file, the first two records from the data file are echoed in the output. The INPUT RESPONSES fields give the original responses while the RECODED RESPONSES reflect any recoding of the responses. Recoding of responses is controlled by the ORIGINAL and MODIFIED keywords on the BLOCK command.

 INPUT AND RECODED RESPONSE OF FIRST AND SECOND OBSERVATIONS

 OBSERVATION #      1
 GROUP:  1
 ID:      .4473
INPUT RESPONSES:  4  2  4  4  4  2  3  2  2  2  3  3  4  3  4  3  3  3  3  2
RECODED RESPONSES:4  2  4  4  4  2  3  2  2  2  3  3  4  3  4  3  3  3  3  2

 OBSERVATION #      2
 GROUP:  1
 ID:     -.9346
INPUT RESPONSES:  1  2  2  2  1  1  2  1  1  2  2  3  2  4  1  2  1  4  3  2
RECODED RESPONSES:1  2  2  2  1  1  2  1  1  2  2  3  2  4  1  2  1  4  3  2

Finally, the number of observations to be used in the analysis is recorded; by default, all observations will be used. The number of observations to be used can be manipulated using the SAMPLE or TAKE keywords on the INPUT command.

 [MAIN TEST: SCALE1  ]

      1000 OBSERVATIONS READ FROM FILE:   PSLDAT\EXAMPL01.DAT            
      1000 OBSERVATIONS WRITTEN TO FILE:  Exampl01.MFL                   


Phase 1 output

The title given in the TITLE command and name assigned to the test in the TEST command in the syntax file are echoed in the output file.

EXAMPLE 1: ARTIFICIAL EXAMPLE:  MONTE CARLO DATA                               
            GRADED MODEL, NORMAL METRIC:  EAP SCALE SCORES                      

 MAINTEST: SCALE1 

The master file created during Phase 0 is used as input. Note that the master file exampl01.mfl may be saved using the MASTER keyword on the SAVE command for use as input in a subsequent analysis (MFNAME keyword on the FILES command). The TAKE and SAMPLE keywords on the INPUT command control the number of records read from the raw data file. Neither keyword was used here; because the default value of SAMPLE is 1000, all 1000 records were used.

  1000 OBS.(WEIGHTS:  1000.000) WERE READ FROM Exampl01.MFL                   

Summary item statistics for the 20 items are given next. Since no not-presented key (NFNAME on FILES) or omit key (OFNAME on FILES) was used, the frequencies and percentages under the "NOT PRESENT" and "OMIT" headings are all zero. Under the "CATEGORIES" heading, frequencies and percentages of responses for each of the 4 categories are given item by item. Cumulative frequencies and percentages for the categories over all items are given at the end of the table.

Note that if empty categories are encountered, the user must recode the affected items before proceeding with the analysis.

  SUMMARY ITEM STATISTICS
 =======================

 BLOCK NO.:   1     NAME: SBLOCK1
 ---------------------------------------------------------------                
 ITEM   | TOTAL    NOT     OMIT |          CATEGORIES                          
        |        PRESENT        |                                              
        |                       |   1       2       3       4                  
 ---------------------------------------------------------------               
 0001   |                       |                                              
   FREQ.|   1000       0       0|    194     303     313     190               
   PERC.|            0.0     0.0|   19.4    30.3    31.3    19.0               
        |                       |                                              
 0002   |                       |                                              
   FREQ.|   1000       0       0|    204     284     310     202               
   PERC.|            0.0     0.0|   20.4    28.4    31.0    20.2               
        |                       |                                              
 0003   |                       |                                               
   FREQ.|   1000       0       0|    206     308     285     201               
   PERC.|            0.0     0.0|   20.6    30.8    28.5    20.1               
        .
 0020   |                       |                                               
   FREQ.|   1000       0       0|    305     211     212     272               
   PERC.|            0.0     0.0|   30.5    21.1    21.2    27.2               
        |                       |                                               
 ---------------------------------------------------------------               
 CUMMUL.|                       |                                              
   FREQ.|                       |   4844    5186    5204    4766                
   PERC.|                       |   24.2    25.9    26.0    23.8               
 ---------------------------------------------------------------               
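The FREQ. and PERC. rows, and the check for empty categories mentioned above, amount to a simple tabulation. In this Python sketch category_summary is a hypothetical helper, and item 1's responses are rebuilt from its printed frequencies because the raw data are not reproduced here.

```python
from collections import Counter


def category_summary(responses, n_categories=4):
    """Frequencies and percentages per category, plus any empty categories."""
    counts = Counter(responses)
    n = len(responses)
    freq = [counts.get(k, 0) for k in range(1, n_categories + 1)]
    perc = [round(100.0 * f / n, 1) for f in freq]
    empty = [k for k, f in zip(range(1, n_categories + 1), freq) if f == 0]
    return freq, perc, empty


# Item 1 rebuilt from its printed frequencies (194, 303, 313, 190).
item1 = [1] * 194 + [2] * 303 + [3] * 313 + [4] * 190
print(category_summary(item1))
# → ([194, 303, 313, 190], [19.4, 30.3, 31.3, 19.0], [])
```

A non-empty third element flags the categories that would need recoding before calibration.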

Item means, initial slope estimates, and Pearson and polyserial item-test correlations are shown in the next table.

Pearson

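The sample product-moment correlation of the test score,

$$T_i = \sum_{j=1}^{20} x_{ij},$$

and the m-category polytomous item score, $x_{ij}$, is the point polyserial correlation, $r_{PP,j}$, where

$$r_{PP,j} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_{ij} T_i - \bar{x}_j \bar{T}}{s_j \, s_T}, \qquad s_j^2 = \frac{1}{n}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad s_T^2 = \frac{1}{n}\sum_{i=1}^{n} (T_i - \bar{T})^2,$$

where n is the sample size, $\bar{T}$ is the mean test score and $\bar{x}_j$ the mean item score.

In this example n = 1000. For item 1,

$$\sum_{i=1}^{n} x_{i1} = 1(194) + 2(303) + 3(313) + 4(190) = 2499,$$

so that

$$\bar{x}_1 = 2499 / 1000 = 2.499.$$

Also

$$\sum_{i=1}^{n} T_i = 49892,$$

so that

$$\bar{T} = 49892 / 1000 = 49.892.$$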

Polyserial correlation

The polyserial correlation, $r_{P,j}$, can be expressed in terms of the point polyserial correlation as

$$r_{P,j} = \frac{r_{PP,j} \, s_j}{\sum_{k=1}^{m-1} h(z_{jk})},$$

where

  •  $z_{jk}$ is the z score corresponding to the cumulative proportion, $p_{jk}$, of the k-th response category to item j (for item 1, for example, the cumulative proportions are 0.194, 0.497, and 0.81 for categories 1, 2, and 3),
  •  $s_j$ is the standard deviation of item scores for item j (1.009 for item 1),
  •  $r_{PP,j}$ is the point polyserial correlation, and
  •  $h(z_{jk})$ is the ordinate of the standard normal distribution at the point $z_{jk}$; that is,

$$h(z_{jk}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_{jk}^2}{2}\right).$$
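For item 1 the conversion can be checked with Python's statistics.NormalDist. The helper polyserial_from_point is hypothetical. It uses the rounded inputs printed in the Phase 1 table (r_PP = 0.778, S.D. = 1.009), so its result agrees with the printed 0.830 only to within that rounding.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def polyserial_from_point(r_pp, item_sd, cumulative_props):
    """r_P = r_PP * s_j / sum_k h(z_jk), with z_jk = Phi^{-1}(p_jk)."""
    ordinate_sum = sum(STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p)) for p in cumulative_props)
    return r_pp * item_sd / ordinate_sum


# Item 1: cumulative proportions 0.194, 0.497, 0.81 from the frequencies.
freqs = [194, 303, 313, 190]
n = sum(freqs)
cumulative = [sum(freqs[: k + 1]) / n for k in range(len(freqs) - 1)]
r_p = polyserial_from_point(0.778, 1.009, cumulative)
print(f"{r_p:.2f}")  # about 0.83
```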

Initial slopes and location

The polyserial correlation estimates the item factor loading, $\alpha_j$, say. If the arbitrary scale of the item latent variable, $y_j$, is chosen so that the variance $\sigma^2(y_j)$ equals 1, then

$$y_j = \alpha_j \theta + \epsilon_j,$$

where $\theta$ is the factor score with mean 0 and variance 1, and the error, $\epsilon_j$, has mean 0 and variance $1 - \alpha_j^2$.

For purposes of MML parameter estimation in IRT, it is convenient to rescale the item latent variable so that the error variance equals 1. The factor loading then becomes the item slope,

$$a_j = \frac{\alpha_j}{\sqrt{1 - \alpha_j^2}}.$$

This provisional estimate of the slope is then used as the starting value in the iterative EM solutions of the marginal maximum likelihood equations for estimating the parameters of the polytomous item response models. The initial locations shown in the last column of the table are the averages of the category thresholds for each item.
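Taking the printed polyserial correlation as the loading estimate, the slope formula reproduces the INITIAL SLOPE column. This small Python check uses a hypothetical helper and the rounded correlations for items 1 and 2 from the Phase 1 table; the results agree with the printed 1.488 and 1.628 to within that rounding.

```python
from math import sqrt


def initial_slope(polyserial_r):
    """Slope a_j = alpha_j / sqrt(1 - alpha_j**2), taking alpha_j = r_P."""
    if not -1.0 < polyserial_r < 1.0:
        raise ValueError("polyserial correlation must lie strictly between -1 and 1")
    return polyserial_r / sqrt(1.0 - polyserial_r ** 2)


# Polyserial correlations printed for items 1 and 2 in the Phase 1 table.
for item, r in [(1, 0.830), (2, 0.852)]:
    print(item, round(initial_slope(r), 3))
```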

Initial item-category threshold parameters

Item-category threshold parameters can be calculated once the polyserial coefficients have been obtained. Following Lord & Novick (1968), the threshold parameter can be expressed in terms of the cumulative category proportions and the biserial correlation coefficient as

$$b_{jk} = \frac{z_{jk}}{r_j},$$

with $r_j$ the biserial correlation for item j and $z_{jk}$ the z score that cuts off the proportion $p_{jk}$ of the cases to item j in a unit-normal distribution; that is, $\Phi(z_{jk}) = p_{jk}$, where $\Phi$ is the standard normal distribution function and

$$p_{jk} = \frac{\sum_{v=1}^{k} f_{jv}}{\sum_{v=1}^{m} f_{jv}},$$

where $f_{jk}$ is the frequency of the categorical response for item j and category k. These provisional thresholds of the categories serve as starting values in MML estimation of the corresponding item parameters. For the rating scale model, where all items have the same category thresholds, the category proportions are computed from frequencies accumulated over all items; i.e.,

$$p_k = \frac{\sum_{j}\sum_{v=1}^{k} f_{jv}}{\sum_{j}\sum_{v=1}^{m} f_{jv}}.$$

In Muraki's (1990) formulation of the rating scale model, the category threshold parameter, $b_{jk}$, is expressed as a deviation, $c_k$, from the item threshold parameter, $b_j$; that is,

$$b_{jk} = b_j - c_k,$$

under the constraint that

$$\sum_{k=1}^{m-1} c_k = 0.$$

In the context of the rating scale model, $b_j$ is referred to as a "location" parameter. The INITIAL LOCATION column provides the values of the average of the category thresholds for each item.
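The threshold and proportion formulas can be sketched in Python. This is an illustration under stated assumptions, not PARSCALE's initialization code: the helper names are hypothetical, and the polyserial correlation printed for item 1 (0.830) is plugged in for r_j. The program's exact choice of correlation and any later adjustment are not shown in the output, so these values need not reproduce the INITIAL LOCATION column. The pooled proportions use the CUMMUL. FREQ. row of the summary table.

```python
from statistics import NormalDist

INV_PHI = NormalDist().inv_cdf


def cumulative_proportions(freqs):
    """p_k = (f_1 + ... + f_k) / (f_1 + ... + f_m) for k = 1, ..., m - 1."""
    n = sum(freqs)
    running, props = 0, []
    for f in freqs[:-1]:
        running += f
        props.append(running / n)
    return props


def initial_thresholds(freqs, r_biserial):
    """b_jk = z_jk / r_j with z_jk = Phi^{-1}(p_jk)."""
    return [INV_PHI(p) / r_biserial for p in cumulative_proportions(freqs)]


# Item 1 frequencies, with its printed polyserial correlation used for r_j.
item1 = initial_thresholds([194, 303, 313, 190], 0.830)
# Rating-scale variant: proportions pooled over all items (CUMMUL. FREQ. row).
pooled = cumulative_proportions([4844, 5186, 5204, 4766])
print([round(b, 3) for b in item1], [round(p, 4) for p in pooled])
```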

 ---------------------------------------------------------------------------
  BLOCK     |  RESPONSE   TOTAL SCORE | PEARSON  &  |  INITIAL      INITIAL  
     ITEM   |    MEAN         MEAN    | POLYSERIAL  |   SLOPE       LOCATION 
            |    S.D.*        S.D.*   | CORRELATION |                        
 ---------------------------------------------------------------------------
   SBLOCK1  |                         |             |                        
     1 0001 |     2.499      49.892   |     0.778   |     1.488      -0.009
            |     1.009*     14.754*  |     0.830   |
     2 0002 |     2.510      49.892   |     0.797   |     1.628      -0.028
            |     1.030*     14.754*  |     0.852   |
     3 0003 |     2.481      49.892   |     0.785   |     1.545       0.020
            |     1.031*     14.754*  |     0.839   |
     4 0004 |     2.515      49.892   |     0.805   |     1.695      -0.045
            |     1.037*     14.754*  |     0.861   |
     5 0005 |     2.511      49.892   |     0.811   |     1.739      -0.031
            |     1.032*     14.754*  |     0.867   |
     6 0006 |     2.137      49.892   |     0.728   |     1.293       0.844
            |     1.037*     14.754*  |     0.791   |
     7 0007 |     2.118      49.892   |     0.735   |     1.336       0.863
            |     1.033*     14.754*  |     0.801   |
     8 0008 |     2.144      49.892   |     0.754   |     1.426       0.765
            |     1.029*     14.754*  |     0.819   |
     9 0009 |     2.136      49.892   |     0.736   |     1.329       0.838
            |     1.029*     14.754*  |     0.799   |
    10 0010 |     2.128      49.892   |     0.730   |     1.293       0.889
            |     1.002*     14.754*  |     0.791   |
    11 0011 |     2.870      49.892   |     0.645   |     0.985      -1.160
            |     1.041*     14.754*  |     0.702   |
    12 0012 |     2.874      49.892   |     0.655   |     1.029      -1.087
            |     1.071*     14.754*  |     0.717   |
    13 0013 |     2.874      49.892   |     0.690   |     1.144      -1.009
            |     1.053*     14.754*  |     0.753   |
    14 0014 |     2.831      49.892   |     0.673   |     1.072      -0.946
            |     1.057*     14.754*  |     0.731   |
    15 0015 |     2.847      49.892   |     0.679   |     1.114      -0.930
            |     1.094*     14.754*  |     0.744   |
    16 0016 |     2.492      49.892   |     0.590   |     0.839       0.018
            |     1.161*     14.754*  |     0.643   |
    17 0017 |     2.541      49.892   |     0.548   |     0.738      -0.166
            |     1.125*     14.754*  |     0.594   |
    18 0018 |     2.463      49.892   |     0.589   |     0.834       0.109
            |     1.152*     14.754*  |     0.641   |
    19 0019 |     2.470      49.892   |     0.573   |     0.798       0.093
            |     1.160*     14.754*  |     0.624   |
    20 0020 |     2.451      49.892   |     0.583   |     0.830       0.050
            |     1.184*     14.754*  |     0.639   |
 ---------------------------------------------------------------------------
   CATEGORY |            |    MEAN    |     S.D.    | PARAMETER
      1     |            |    36.116  |    10.656   |     0.599
      2     |            |    46.091  |    11.156   |    -0.003
      3     |            |    54.107  |    11.165   |    -0.609
      4     |            |    63.427  |    10.739   |     0.000
----------------------------------------------------------------------------

At the end of this table, descriptive statistics for the raw total scores of examinees who responded in each of the 4 categories are given. The highest average total score of 63.427 was for respondents who responded in the 4th category.


Phase 2 output

An MML approach is used for estimation, and either a normal or an empirical latent distribution with mean 0 and standard deviation 1 is assumed. The type of distribution used is controlled by the DIST keyword on the CALIB command. By default, a normal distribution on equally spaced points is used and, for analyses where the LENGTH keyword on the INPUT command is set to a value less than or equal to 50, 10 quadrature points are used.

Because of the potentially wide spacing of category boundary parameters on the latent dimension, it is advisable to use a greater number of quadrature points than in BILOG-MG. In this example, the number of quadrature points was set to 30 (NQPTS on the CALIB command).

The EM algorithm is used to solve the likelihood equations for the parameters, starting from the initial values described in the Phase 1 output. At each iteration, -2 ln L is given, along with the parameter for which the largest change between cycles was observed. The number of EM cycles is controlled by the CYCLE keyword on the CALIB command, and the convergence criterion may be set using the CRIT keyword on the same command. By default, 10 EM cycles are performed when LENGTH on the INPUT command is less than or equal to 50. In this example, 25 EM cycles were requested, with a maximum of 2 inner EM iterations for the item and category parameter estimation. The default convergence criterion is 0.001; for this example, it was set to 0.005.

 [E-M CYCLES]   GRADED RESPONSE MODEL

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    0

   LARGEST CHANGE=  0.000
   -2 LOG LIKELIHOOD =      55997.850

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    1

   LARGEST CHANGE=  0.827 (  1.426->  0.599) at Slope    of Item:  8 0008
   -2 LOG LIKELIHOOD =      45361.390

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    2

   LARGEST CHANGE=  0.335 (  0.599->  0.934) at Slope    of Item:  8 0008
   -2 LOG LIKELIHOOD =      44285.065

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    3

.

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE   19

   LARGEST CHANGE=  0.005 (  1.010->  1.005) at Slope    of Item:  5 0005

The EM algorithm converged after 19 cycles. After reaching either the maximum number of EM cycles or convergence, the program performs the Newton-Gauss (Fisher scoring) cycles requested through the NEWTON keyword on the CALIB command; in this example, NEWTON was set to 5. The information matrix for all item parameters is approximated during each Newton step and, at convergence, is used to provide large-sample standard errors for the item parameter estimates.

 

 [NEWTON CYCLES]   GRADED RESPONSE MODEL

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    0

   LARGEST CHANGE=  0.000
   -2 LOG LIKELIHOOD =      44151.019

 CATEGORY AND ITEM PARAMETERS AFTER CYCLE    1

   LARGEST CHANGE=  0.015 (  0.451->  0.466) at Location of Item: 10 0010
   -2 LOG LIKELIHOOD =      44150.027
.
CATEGORY AND ITEM PARAMETERS AFTER CYCLE    3

   LARGEST CHANGE=  0.004 (  0.997->  0.993) at Slope    of Item:  5 0005

The Newton cycles converged after 3 iterations. As all items were assigned to the same BLOCK, only one table is printed to the output file.

At the top of the table, the estimated category parameters are given. For each m-category item, there are m - 1 category threshold parameters, $c_1, \ldots, c_{m-1}$, whose mean is fixed by the CADJUST keyword; in this example,

$$\frac{1}{m-1}\sum_{k=1}^{m-1} c_k = 0.$$

For a polytomous item response model, the discriminating power of a specific categorical response depends on the width of the adjacent category thresholds as well as on a slope parameter. Because of this property, the slope parameter and all m - 1 category parameters cannot be estimated simultaneously. If the model includes a slope parameter for each item j, as in this example, the location of the category parameters must be fixed. The CADJUST keyword on the BLOCK command was set to 0, and thus the mean of the category parameters is 0. A plot of the category response functions for item 2 is given below.

For each item, the slope and location parameters, along with the corresponding standard errors, are given. All guessing parameters are zero for this model.

   ITEM BLOCK   1  SBLOCK1

   CATEGORY PARAMETER  :     0.947     0.005    -0.952
   S.E.                :     0.010     0.009     0.010
+------+-----+---------+---------+---------+---------+---------+---------+
| ITEM |BLOCK|  SLOPE  |   S.E.  |LOCATION |   S.E.  |GUESSING |   S.E.  |
+======+=====+=========+=========+=========+=========+=========+=========+
| 0001 |   1 |   0.918 |   0.033 |   0.009 |   0.043 |   0.000 |   0.000 |
| 0002 |   1 |   0.949 |   0.036 |  -0.005 |   0.041 |   0.000 |   0.000 |
| 0003 |   1 |   0.918 |   0.035 |   0.026 |   0.042 |   0.000 |   0.000 |
| 0004 |   1 |   0.977 |   0.038 |  -0.013 |   0.041 |   0.000 |   0.000 |
| 0005 |   1 |   0.993 |   0.036 |  -0.004 |   0.040 |   0.000 |   0.000 |
| 0006 |   1 |   0.727 |   0.027 |   0.466 |   0.046 |   0.000 |   0.000 |
| 0007 |   1 |   0.753 |   0.029 |   0.488 |   0.045 |   0.000 |   0.000 |
| 0008 |   1 |   0.809 |   0.030 |   0.447 |   0.043 |   0.000 |   0.000 |
| 0009 |   1 |   0.759 |   0.028 |   0.470 |   0.045 |   0.000 |   0.000 |
| 0010 |   1 |   0.784 |   0.028 |   0.469 |   0.046 |   0.000 |   0.000 |
| 0011 |   1 |   0.614 |   0.021 |  -0.488 |   0.051 |   0.000 |   0.000 |
| 0012 |   1 |   0.589 |   0.022 |  -0.495 |   0.051 |   0.000 |   0.000 |
| 0013 |   1 |   0.648 |   0.024 |  -0.489 |   0.049 |   0.000 |   0.000 |
| 0014 |   1 |   0.635 |   0.024 |  -0.431 |   0.049 |   0.000 |   0.000 |
| 0015 |   1 |   0.594 |   0.023 |  -0.473 |   0.049 |   0.000 |   0.000 |
| 0016 |   1 |   0.461 |   0.017 |   0.023 |   0.058 |   0.000 |   0.000 |
| 0017 |   1 |   0.472 |   0.017 |  -0.054 |   0.058 |   0.000 |   0.000 |
| 0018 |   1 |   0.473 |   0.018 |   0.055 |   0.057 |   0.000 |   0.000 |
| 0019 |   1 |   0.451 |   0.017 |   0.051 |   0.059 |   0.000 |   0.000 |
| 0020 |   1 |   0.434 |   0.017 |   0.081 |   0.059 |   0.000 |   0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
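The category response functions for item 0002 can be reproduced approximately from the estimates above. The sketch below is an illustration, not PARSCALE's code: it assumes the normal-ogive graded form suggested by the title ("NORMAL RESPONSE FUNCTION"), in which boundary k is Phi(a_j(theta - b_j + c_k)) with a_j the slope, b_j the location and c_k the block's category parameters, and the helper names are hypothetical. It also shows why the location of the category parameters has to be fixed: shifting b_j and every c_k by the same amount leaves all probabilities unchanged.

```python
import math

# Category parameters shared by all items in block SBLOCK1 (from the listing above).
CATEGORY = [0.947, 0.005, -0.952]

def phi_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def category_probs(theta, slope, location, category=CATEGORY):
    """Graded (cumulative) model in ASSUMED normal-ogive form.

    Boundary k is P*_k(theta) = Phi(slope * (theta - location + c_k)),
    padded with P*_0 = 1 and P*_m = 0; category probabilities are the
    differences between adjacent boundaries.
    """
    bounds = [1.0] + [phi_cdf(slope * (theta - location + c)) for c in category] + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Item 0002: slope 0.949, location -0.005.
for theta in (-2.0, 0.0, 2.0):
    probs = category_probs(theta, 0.949, -0.005)
    print(theta, [round(p, 3) for p in probs])

# Location indeterminacy: moving the location and every category parameter
# by the same constant gives identical probabilities, so one of them must be
# pinned down (CADJUST fixes the mean of the category parameters).
shift = 0.5
shifted = [c + shift for c in CATEGORY]
original = category_probs(0.3, 0.949, -0.005)
moved = category_probs(0.3, 0.949, -0.005 + shift, shifted)
assert all(math.isclose(p, q) for p, q in zip(original, moved))
```

At low theta most of the probability falls in the lowest category and at high theta in the highest, which is the pattern a category response plot shows.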

Item information curves and boundary category curves for selected items are given below.
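The boundary curves and item information curves can be sketched under the same assumption as before (normal-ogive graded boundaries Phi(a_j(theta - b_j + c_k)); function names are hypothetical). Information is computed with Samejima's graded-model formula, the sum over categories of P_k'(theta)^2 / P_k(theta). Comparing item 0005 (slope 0.993) with item 0020 (slope 0.434) illustrates how strongly the slope drives information.

```python
import math

CATEGORY = [0.947, 0.005, -0.952]  # block SBLOCK1 category parameters

def phi_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def phi_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def boundary_curves(theta, slope, location):
    """Cumulative boundary curves P*_k(theta) (ASSUMED normal-ogive graded form)."""
    return [phi_cdf(slope * (theta - location + c)) for c in CATEGORY]

def item_information(theta, slope, location):
    """Samejima's graded-model information: sum_k P_k'(theta)**2 / P_k(theta)."""
    zs = [slope * (theta - location + c) for c in CATEGORY]
    bounds = [1.0] + [phi_cdf(z) for z in zs] + [0.0]
    # Derivatives of the boundaries; the fixed outer boundaries (1 and 0) have slope 0.
    dbounds = [0.0] + [slope * phi_pdf(z) for z in zs] + [0.0]
    info = 0.0
    for k in range(len(bounds) - 1):
        p = bounds[k] - bounds[k + 1]
        dp = dbounds[k] - dbounds[k + 1]
        if p > 0.0:
            info += dp * dp / p
    return info

# Information on a coarse theta grid from -3 to 3 for two contrasting items.
for item, slope, location in [("0005", 0.993, -0.004), ("0020", 0.434, 0.081)]:
    curve = [round(item_information(t / 2, slope, location), 3) for t in range(-6, 7)]
    print(item, curve)
```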

 

The average parameters over all 20 items are given next. If the items are regarded as random samples from a real or hypothetical universe, these quantities estimate the means and standard deviations of the parameters. They could serve as item parameter priors in future item calibrations in this universe.

    SUMMARY STATISTICS OF PARAMETER ESTIMATES

      +----------+---------+---------+----+
      |PARAMETER |   MEAN  | STN DEV |  N |
      +==========+=========+=========+====+
      |SLOPE     |    0.698|    0.189|  20|
      |LOG(SLOPE)|   -0.396|    0.280|  20|
      |LOCATION  |    0.007|    0.344|  20|
      |GUESSING  |    0.000|    0.000|   0|
      +----------+---------+---------+----+
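The summary statistics can be checked directly from the item table. This sketch assumes the standard deviations use the n - 1 divisor, which reproduces the printed values to within rounding; the GUESSING row is omitted because no guessing parameters were estimated (N = 0).

```python
import math
import statistics

# Slope and location estimates for items 0001-0020 (from the item table).
SLOPES = [0.918, 0.949, 0.918, 0.977, 0.993, 0.727, 0.753, 0.809, 0.759, 0.784,
          0.614, 0.589, 0.648, 0.635, 0.594, 0.461, 0.472, 0.473, 0.451, 0.434]
LOCATIONS = [0.009, -0.005, 0.026, -0.013, -0.004, 0.466, 0.488, 0.447, 0.470, 0.469,
             -0.488, -0.495, -0.489, -0.431, -0.473, 0.023, -0.054, 0.055, 0.051, 0.081]

def summarize(values):
    """Mean and sample standard deviation (statistics.stdev divides by n - 1)."""
    return statistics.mean(values), statistics.stdev(values)

rows = {
    "SLOPE": summarize(SLOPES),
    "LOG(SLOPE)": summarize([math.log(a) for a in SLOPES]),
    "LOCATION": summarize(LOCATIONS),
}
for name, (mean, sd) in rows.items():
    print(f"{name:<10} {mean:9.3f} {sd:9.3f} {len(SLOPES):4d}")
```

Used as priors, the LOG(SLOPE) row is the natural choice for slopes, since a normal prior on log(slope) keeps slopes positive.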

The estimated latent distribution is given next. This distribution is the sum of the posterior distributions of theta for all respondents in the sample. It is represented here as point masses, scaled to sum to 1.0, at 30 equally spaced points on the theta dimension. If the population distribution is normal and the test is sufficiently informative over the range of theta, the posterior distributions of the respondents, and hence the latent distribution, will approach normality.

                1           2           3           4           5
 POINT    -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
 WEIGHT    0.4056E-04  0.1186E-03  0.3209E-03  0.7976E-03  0.1794E-02

                6           7           8           9          10
 POINT    -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
 WEIGHT    0.3575E-02  0.6218E-02  0.9755E-02  0.1606E-01  0.3047E-01

               11          12          13          14          15
 POINT    -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
 WEIGHT    0.5170E-01  0.7091E-01  0.8433E-01  0.1025E+00  0.1188E+00

               16          17          18          19          20
 POINT     0.1379E+00  0.4138E+00  0.6897E+00  0.9655E+00  0.1241E+01
 WEIGHT    0.1172E+00  0.1049E+00  0.8685E-01  0.7017E-01  0.5187E-01

               21          22          23          24          25
 POINT     0.1517E+01  0.1793E+01  0.2069E+01  0.2345E+01  0.2621E+01
 WEIGHT    0.3443E-01  0.1973E-01  0.1009E-01  0.4569E-02  0.1812E-02

               26          27          28          29          30
 POINT     0.2897E+01  0.3172E+01  0.3448E+01  0.3724E+01  0.4000E+01
 WEIGHT    0.6602E-03  0.2278E-03  0.7509E-04  0.2378E-04  0.7269E-05

 TOTAL WEIGHT: 1.00000
 MEAN        : 0.00000
 S.D.        : 0.99970

The goodness-of-fit of the polytomous item response model can be tested item by item. The item fit statistics can also be summed to assess the goodness-of-fit of the test as a whole. The fit statistics are useful for comparing the fit of different models to the same response data when the models are nested in their parameters.

Respondents are assigned to $H$ intervals on the $\theta$-continuum. The number of intervals is set using the ITEMFIT keyword on the CALIB command. The expected a posteriori (EAP) score of each respondent is used for assigning respondents to the $H$ intervals. The observed frequency $r_{hjk}$ of the $k$-th category response to item $j$ in interval $h$, and $N_{hj}$, the number of respondents assigned to item $j$ in the $h$-th interval, are computed. The estimated $\theta$s are rescaled so that the variance of the sample distribution equals that of the latent distribution on which the MML estimation of the parameters is based.

Thus an $H \times m_j$ contingency table is obtained for each item $j$, where $m_j$ is the number of categories of item $j$. To avoid expected values less than 5, neighboring intervals and/or categories may be merged. For each interval, the interval mean $\bar{\theta}_h$ and the value of the fitted response function $P_{jk}(\bar{\theta}_h)$ are computed. Finally, a likelihood-ratio $\chi^2$ statistic for each item is computed by

$$G_j^2 = 2 \sum_{h=1}^{H_j^*} \sum_{k=1}^{m_j} r_{hjk} \ln \frac{r_{hjk}}{N_{hj}\, P_{jk}(\bar{\theta}_h)},$$

where $H_j^*$ is the number of intervals left after neighboring intervals are merged. The number of degrees of freedom is $\sum_{h=1}^{H_j^*} (m_{hj}^* - 1)$, where $m_{hj}^*$ is the number of categories left in interval $h$ after collapsing (equivalently $H_j^*(m_j^* - 1)$ when every interval retains the same $m_j^*$ categories).
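The per-item statistic can be sketched for one item's table after merging. This is a minimal illustration, not PARSCALE's routine: it computes G^2 = 2 * sum over intervals h and categories k of r_hk * ln(r_hk / (N_h * P_hk)), and it assumes the degrees of freedom are accumulated as (number of categories left - 1) in each merged interval. The table and fitted probabilities are hypothetical.

```python
import math

def likelihood_ratio_g2(observed, expected_probs):
    """Likelihood-ratio G^2 and degrees of freedom for one item.

    observed[h][k]       -- count r_hk of category k in interval h (after merging)
    expected_probs[h][k] -- fitted probability P_k(theta_bar_h) for interval h
    Cells with r_hk == 0 contribute 0 (the limit of r * ln r as r -> 0).
    """
    g2 = 0.0
    df = 0
    for counts, probs in zip(observed, expected_probs):
        n_h = sum(counts)  # N_h: respondents in this interval
        for r, p in zip(counts, probs):
            if r > 0:
                g2 += 2.0 * r * math.log(r / (n_h * p))
        df += len(counts) - 1  # (categories left - 1) for this interval
    return g2, df

# HYPOTHETICAL 3-interval table: the middle interval keeps 4 categories,
# the outer intervals have had a sparse category collapsed, leaving 3.
observed = [[30, 15, 5], [20, 30, 30, 20], [6, 14, 30]]
probs = [[0.55, 0.33, 0.12], [0.2, 0.3, 0.3, 0.2], [0.10, 0.30, 0.60]]
g2, df = likelihood_ratio_g2(observed, probs)
print(f"G2 = {g2:.3f} on {df} df")
```

Because the middle interval's counts equal N_h times the fitted probabilities, it contributes nothing to G^2; only the outer intervals add to the statistic.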

The likelihood-ratio $\chi^2$ statistic for the test as a whole is simply the sum of the separate item $\chi^2$ statistics. Its number of degrees of freedom is likewise the sum of the degrees of freedom for the items.

               ITEM FIT STATISTICS
 -----------------------------------------------
 |  BLOCK   | ITEM | CHI-SQUARE |  D.F. | PROB. |
 -----------------------------------------------
 | SBLOCK1  | 0001 |   21.77618 |   18. | 0.242 |
 |          | 0002 |   21.43210 |   18. | 0.258 |
 |          | 0003 |   26.15977 |   18. | 0.096 |
 |          | 0004 |   17.87777 |   17. | 0.397 |
 |          | 0005 |   21.00994 |   17. | 0.225 |
 |          | 0006 |   15.79930 |   19. | 0.671 |
 |          | 0007 |   35.01442 |   19. | 0.014 |
 |          | 0008 |   17.11320 |   19. | 0.583 |
 |          | 0009 |   21.75840 |   19. | 0.296 |
 |          | 0010 |   15.01418 |   19. | 0.722 |
 |          | 0011 |   29.25256 |   19. | 0.062 |
 |          | 0012 |   17.20233 |   19. | 0.577 |
 |          | 0013 |   21.39086 |   19. | 0.315 |
 |          | 0014 |   16.11206 |   19. | 0.650 |
 |          | 0015 |   20.23002 |   19. | 0.381 |
 |          | 0016 |    9.16042 |   22. | 0.992 |
 |          | 0017 |   32.65036 |   21. | 0.050 |
 |          | 0018 |   18.78548 |   22. | 0.659 |
 |          | 0019 |   11.37258 |   23. | 0.979 |
 |          | 0020 |   28.82264 |   23. | 0.186 |
 -----------------------------------------------
 |  TOTAL   |      |  417.93460 |  389. | 0.150 |
 -----------------------------------------------

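The null hypothesis tested here is that there are no differences between the expected and observed frequencies. A significant $\chi^2$ statistic indicates that the item parameters differ across the score groups (the $\theta$ intervals) and that the assumed model is not appropriate for the data. In this case, no item shows significant misfit at the 0.01 level. At the 0.05 level, item 0007 (p = 0.014) is flagged and item 0017 (p = 0.050) is borderline, while the test as a whole fits the model (p = 0.150).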
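The TOTAL row and the PROB. column can be checked from the item rows. The sketch below sums the item statistics and evaluates the upper-tail chi-square probability with the closed-form series for integer degrees of freedom, using only the Python standard library; SciPy's `scipy.stats.chi2.sf` computes the same quantity if it is available. It is an illustration, not PARSCALE's routine.

```python
import math

# (chi-square, d.f.) for items 0001-0020 from the ITEM FIT STATISTICS table.
ITEM_FIT = [
    (21.77618, 18), (21.43210, 18), (26.15977, 18), (17.87777, 17), (21.00994, 17),
    (15.79930, 19), (35.01442, 19), (17.11320, 19), (21.75840, 19), (15.01418, 19),
    (29.25256, 19), (17.20233, 19), (21.39086, 19), (16.11206, 19), (20.23002, 19),
    (9.16042, 22), (32.65036, 21), (18.78548, 22), (11.37258, 23), (28.82264, 23),
]

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for a positive integer df.

    Even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!
    Odd df:  erfc(sqrt(x/2)) + exp(-x/2) * sum_{i < (df-1)/2} (x/2)**(i+1/2) / Gamma(i+3/2)
    Terms are evaluated in log space to avoid overflow for large df.
    """
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    total = math.erfc(math.sqrt(half)) if df % 2 else 0.0
    a = 0.5 if df % 2 else 0.0  # exponent of (x/2) in the current term
    while a < df / 2.0:
        total += math.exp(-half + a * math.log(half) - math.lgamma(a + 1.0))
        a += 1.0
    return total

total_chi2 = sum(chi2 for chi2, _ in ITEM_FIT)
total_df = sum(df for _, df in ITEM_FIT)
print(f"TOTAL {total_chi2:.5f} on {total_df} df, p = {chi2_sf(total_chi2, total_df):.3f}")

flagged = [n for n, (chi2, df) in enumerate(ITEM_FIT, start=1) if chi2_sf(chi2, df) < 0.05]
print("items with p < 0.05:", flagged)
```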


P  Phase 3 output

The first information given in the output from the scoring phase concerns the scoring function used for scaling. The default is STANDARD, so the standard scoring function (1.0, 2.0, ..., here 1.0 through 4.0 for the four categories) is used even if a different scoring function was used for calibration. The scoring function may instead be set to CALIBRATION (SCORING keyword on the SCORE command) to use the calibration scoring function specified on the BLOCK command. Note that the scoring function applies only to the partial credit model.

 SCORING FUNCTION FOR SCALING

 BLOCK:   1  SBLOCK1

          1     1.000
          2     2.000
          3     3.000
          4     4.000

Bayes estimates are computed for each examinee with respect to his or her group latent distribution (controlled by the EAP option on the SCORE command used here). A discrete distribution on a finite number of points (see below) is used as the prior. The user may select the number of points and the type of prior using the NQPTS and DIST keywords on the SCORE command.

 [EAP SUBJECT ESTIMATION]

 QUADRATURE POINTS AND PRIOR WEIGHTS:

                1           2           3           4           5
 POINT    -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
 WEIGHT    0.3692E-04  0.1071E-03  0.2881E-03  0.7181E-03  0.1659E-02

                6           7           8           9          10
 POINT    -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
 WEIGHT    0.3550E-02  0.7042E-02  0.1294E-01  0.2205E-01  0.3481E-01

               11          12          13          14          15
 POINT    -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
 WEIGHT    0.5093E-01  0.6905E-01  0.8676E-01  0.1010E+00  0.1090E+00

               16          17          18          19          20
 POINT     0.1379E+00  0.4138E+00  0.6897E+00  0.9655E+00  0.1241E+01
 WEIGHT    0.1090E+00  0.1010E+00  0.8676E-01  0.6905E-01  0.5093E-01

               21          22          23          24          25
 POINT     0.1517E+01  0.1793E+01  0.2069E+01  0.2345E+01  0.2621E+01
 WEIGHT    0.3481E-01  0.2205E-01  0.1294E-01  0.7042E-02  0.3550E-02

               26          27          28          29          30
 POINT     0.2897E+01  0.3172E+01  0.3448E+01  0.3724E+01  0.4000E+01
 WEIGHT    0.1659E-02  0.7181E-03  0.2881E-03  0.1071E-03  0.3692E-04

 MEANS AND STANDARD DEVIATIONS OF ABILITY DISTRIBUTIONS

 SCORE        MEAN    STANDARD     TOTAL
 NAME                 DEVIATION FREQUENCIES
 ---------------------------------------------
 EAP            0.000     0.948      1000.00
 ---------------------------------------------
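EAP scoring against this prior can be sketched as follows. Normalizing the standard normal density over 30 equally spaced points on [-4, 4] appears to reproduce the prior weights listed above (for example 0.1090 at +/-0.1379). The rest is an illustration under stated assumptions: the normal-ogive graded form Phi(a_j(theta - b_j + c_k)), categories coded 0 to 3, and hypothetical response patterns, since the raw responses are not shown. The posterior SD returned here corresponds to the S.E. column of the score listing.

```python
import math

# Calibration estimates for items 0001-0020 and the block's category parameters.
SLOPES = [0.918, 0.949, 0.918, 0.977, 0.993, 0.727, 0.753, 0.809, 0.759, 0.784,
          0.614, 0.589, 0.648, 0.635, 0.594, 0.461, 0.472, 0.473, 0.451, 0.434]
LOCATIONS = [0.009, -0.005, 0.026, -0.013, -0.004, 0.466, 0.488, 0.447, 0.470, 0.469,
             -0.488, -0.495, -0.489, -0.431, -0.473, 0.023, -0.054, 0.055, 0.051, 0.081]
CATEGORY = [0.947, 0.005, -0.952]

def phi_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def quadrature(n=30, lo=-4.0, hi=4.0):
    """Equally spaced points with standard normal prior weights summing to 1."""
    points = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    density = [math.exp(-0.5 * x * x) for x in points]
    total = sum(density)
    return points, [d / total for d in density]

def category_prob(theta, slope, location, k):
    """P(category k), k = 0..3, under the ASSUMED normal-ogive graded form."""
    bounds = [1.0] + [phi_cdf(slope * (theta - location + c)) for c in CATEGORY] + [0.0]
    return bounds[k] - bounds[k + 1]

def eap(responses, points, weights):
    """Posterior mean (EAP) and posterior SD for one response pattern."""
    posterior = []
    for x, w in zip(points, weights):
        likelihood = 1.0
        for k, a, b in zip(responses, SLOPES, LOCATIONS):
            likelihood *= category_prob(x, a, b, k)
        posterior.append(w * likelihood)
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    mean = sum(x * p for x, p in zip(points, posterior))
    var = sum(p * (x - mean) ** 2 for x, p in zip(points, posterior))
    return mean, math.sqrt(var)

points, weights = quadrature()
# HYPOTHETICAL response patterns (categories coded 0-3).
for label, pattern in [("all lowest", [0] * 20), ("alternating 1/2", [1, 2] * 10),
                       ("all highest", [3] * 20)]:
    m, s = eap(pattern, points, weights)
    print(f"{label:>15}: EAP {m:7.3f}, posterior SD {s:.3f}")
```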

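In this example, the keywords SMEAN and SSD on the SCORE command were set to 0 and 1, respectively. Because the EAP estimates above have a mean of 0.000 and a standard deviation of 0.948, the rescaling constants used in this case are 0.000 (location) and 1.055 (scaling, approximately 1/0.948), as the following output shows.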

 RESCALING DONE WITH RESPECT TO USER SUPPLIED LINEAR TRANSFORMATION

 SCORE       LOCATION  SCALING      TOTAL
 NAME        CONSTANT  CONSTANT  FREQUENCIES
 ---------------------------------------------
 EAP            0.000     1.055      1000.00
 ---------------------------------------------
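The constants above can be reproduced with a simple linear transformation onto the SMEAN/SSD target. This sketch assumes rescaled = location + scale * score, with scale = SSD / SD and location = SMEAN - scale * mean; the function name and the demonstration scores are hypothetical, not PARSCALE's internal code.

```python
import statistics

def rescaling_constants(mean, sd, target_mean=0.0, target_sd=1.0):
    """Return (location, scale) such that location + scale * score has the target moments."""
    scale = target_sd / sd
    return target_mean - scale * mean, scale

# Moments of the EAP estimates from the listing above.
location, scale = rescaling_constants(0.000, 0.948)
print(f"location {location:.3f}, scale {scale:.3f}")

# HYPOTHETICAL scores: after rescaling, their mean and SD hit the targets.
scores = [-1.2, -0.3, 0.4, 1.1]
loc2, sc2 = rescaling_constants(statistics.fmean(scores), statistics.pstdev(scores))
rescaled = [loc2 + sc2 * x for x in scores]
print(round(statistics.fmean(rescaled), 6), round(statistics.pstdev(rescaled), 6))
```

This matches the final summary table, where the rescaled EAP scores have mean 0.000 and standard deviation 1.000.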

Scores are saved to an external file (keyword SCORE on the SAVE command), but the first three scores are printed to the output file for checking purposes. When EAP is used for scoring, the S.E. column represents the posterior standard deviation.

 SUBJECT IDENTIFICATION                 WEIGHT/FREQUENCY
 SCORE NAME    GROUP    WEIGHT MEAN CATEGORY ATTEMPTS    ABILITY       S.E.
----------------------------------------------------------------------------
   .4473          |        1  GROUP 01      1.00
1  EAP         1  |     1.00       3.00     1.00        0.6415      .2179
----------------------------------------------------------------------------
  -.9346          |        2  GROUP 01      1.00
1  EAP         1  |     1.00       1.95     1.00       -0.7410      0.2148
----------------------------------------------------------------------------
  -.5646          |        3  GROUP 01      1.00
1  EAP         1  |     1.00       2.10     1.00       -0.4371      0.2100
----------------------------------------------------------------------------

 MEANS AND STANDARD DEVIATIONS OF ABILITY DISTRIBUTIONS

 SCORE        MEAN    STANDARD     TOTAL
 NAME                 DEVIATION FREQUENCIES
 ---------------------------------------------
 EAP            0.000     1.000      1000.00
 ---------------------------------------------

When EAP is selected, an estimate of the population distribution of ability in the form of a discrete distribution of a finite number of points is obtained by accumulating the posterior densities over the subjects at each quadrature point. These sums are then normalized to obtain the estimated probabilities at the points. Improved estimates of the latent distribution may be obtained after one more iteration of the solution.
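The accumulation step can be sketched as follows: each respondent's posterior over the quadrature points is normalized, the posteriors are summed point by point, and the sums are rescaled to total 1.0. The likelihood curves below are hypothetical bell shapes standing in for real item-response likelihoods, and the function name is an assumption of this sketch.

```python
import math

def latent_distribution(likelihoods, prior):
    """Sum respondents' posteriors at each quadrature point, then normalize.

    likelihoods[i][q] -- likelihood of respondent i's responses at point q
    prior[q]          -- prior weight at point q
    """
    totals = [0.0] * len(prior)
    for like in likelihoods:
        joint = [lk * w for lk, w in zip(like, prior)]
        norm = sum(joint)  # marginal likelihood of this respondent
        for q, value in enumerate(joint):
            totals[q] += value / norm  # this respondent's posterior at point q
    grand = sum(totals)  # equals the number of respondents
    return [t / grand for t in totals]

# 30 equally spaced points on [-4, 4] with a normal prior, as in the listings.
points = [-4.0 + 8.0 * q / 29 for q in range(30)]
density = [math.exp(-0.5 * x * x) for x in points]
total_density = sum(density)
prior = [d / total_density for d in density]

# HYPOTHETICAL likelihoods for four respondents, centred on made-up trait values.
centres = [-1.0, -0.2, 0.2, 1.0]
likelihoods = [[math.exp(-0.5 * ((x - c) / 0.5) ** 2) for x in points] for c in centres]

weights = latent_distribution(likelihoods, prior)
mean = sum(x * w for x, w in zip(points, weights))
print(f"total weight {sum(weights):.5f}, mean {mean:.5f}")
```

Feeding the resulting weights back in as the prior and repeating the step corresponds to the "one more iteration" mentioned above.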

The program also computes the mean and standard deviation of the estimated latent distribution. Sheppard's correction for coarse grouping is used in the calculation of the standard deviation. The EAP estimate is the mean of the posterior distribution, and its standard error is the standard deviation of the posterior distribution. Posterior weights are given only when EAP is used. Note that the posterior distribution is based on all cases, not just on those used in calibration.

A plot of the quadrature points and posterior weights is given below.

 

 

 QUADRATURE POINTS AND POSTERIOR WEIGHTS: SCORE SET #    1

                1           2           3           4           5
 POINT    -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
 WEIGHT    0.5205E-04  0.1497E-03  0.3950E-03  0.9498E-03  0.2056E-02

                6           7           8           9          10
 POINT    -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
 WEIGHT    0.3954E-02  0.6721E-02  0.1056E-01  0.1753E-01  0.3232E-01

               11          12          13          14          15
 POINT    -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
 WEIGHT    0.5369E-01  0.7190E-01  0.8280E-01  0.9979E-01  0.1151E+00

               16          17          18          19          20
 POINT     0.1379E+00  0.4138E+00  0.6897E+00  0.9655E+00  0.1241E+01
 WEIGHT    0.1133E+00  0.1016E+00  0.8522E-01  0.7102E-01  0.5304E-01

               21          22          23          24          25
 POINT     0.1517E+01  0.1793E+01  0.2069E+01  0.2345E+01  0.2621E+01
 WEIGHT    0.3597E-01  0.2151E-01  0.1129E-01  0.5361E-02  0.2294E-02

               26          27          28          29          30
 POINT     0.2897E+01  0.3172E+01  0.3448E+01  0.3724E+01  0.4000E+01
 WEIGHT    0.8969E-03  0.3271E-03  0.1125E-03  0.3665E-04  0.1131E-04

 TOTAL WEIGHT: 1.00000
 MEAN        : 0.00027
 S.D.        : 0.97432

The mean and standard deviation of the latent posterior distribution calculated from posterior weights at quadrature points are also given. In these calculations, the formulas for the variance of grouped data are used, with quadrature points as class marks and posterior weights as class frequencies.
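The grouped-data calculation described above can be sketched directly from the listing, with the quadrature points as class marks and the posterior weights as class frequencies. The points are regenerated as 30 equally spaced values on [-4, 4], which matches the printed POINT rows; the helper name is hypothetical.

```python
import math

# Posterior weights at the 30 quadrature points (SCORE SET # 1 listing).
WEIGHTS = [
    0.5205e-04, 0.1497e-03, 0.3950e-03, 0.9498e-03, 0.2056e-02,
    0.3954e-02, 0.6721e-02, 0.1056e-01, 0.1753e-01, 0.3232e-01,
    0.5369e-01, 0.7190e-01, 0.8280e-01, 0.9979e-01, 0.1151e+00,
    0.1133e+00, 0.1016e+00, 0.8522e-01, 0.7102e-01, 0.5304e-01,
    0.3597e-01, 0.2151e-01, 0.1129e-01, 0.5361e-02, 0.2294e-02,
    0.8969e-03, 0.3271e-03, 0.1125e-03, 0.3665e-04, 0.1131e-04,
]
# Class marks: 30 equally spaced quadrature points from -4 to 4.
POINTS = [-4.0 + 8.0 * q / 29 for q in range(30)]

def grouped_mean_sd(marks, freqs):
    """Mean and SD of grouped data (class marks weighted by class frequencies)."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(marks, freqs)) / total
    var = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / total
    return total, mean, math.sqrt(var)

total, mean, sd = grouped_mean_sd(POINTS, WEIGHTS)
print(f"TOTAL WEIGHT: {total:.5f}  MEAN: {mean:.5f}  S.D.: {sd:.5f}")
```

With the printed weights this gives a mean of about 0.0003 and an S.D. of about 0.974, in line with the MEAN and S.D. lines of the listing.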



Copyright © 2005-2010, Scientific Software International, Inc., All rights reserved.
7383 N. Lincoln Ave., Suite 100, Lincolnwood, IL 60712-1747