VSTRUCTURE directive

Defines a variance structure for random effects in a REML model.


Options

TERMS = formula
Model terms for which the covariance structure is to be defined

FORMATION = string
Whether the structure is formed by direct product construction or by definition of the whole matrix (direct, whole); default dire

CORRELATE = string
Whether to impose correlation across the model terms if several are specified (none, positivedefinite, unrestricted); default none

CINITIAL = scalars
Initial values for covariance matrix across terms

COORDINATES = identifiers
Coordinates of the data points to be used in calculating distance-based models (list of variates or matrix)


Parameters

MODEL = strings
Type of covariance model associated with the term(s), or with individual factors in the term(s) if FORMATION=direct (identity, fixed, AR, MA, ARMA, power, banded, correlation, antedependence, unstructured, diagonal, uniform, FA, FAequal) default iden

ORDER = scalar
Order of model

HETEROGENEITY = string
Heterogeneity for correlation matrices (none, outside); default none

METRIC = string
How to calculate distances when MODEL=power (cityblock, squared, euclidean); default city

FACTOR = factors
Factors over which to form direct products

MATRIX = identifiers
To define matrix values for a term or the factors when MODEL=fixed

INVERSE = identifiers
To define values for matrix inverses (instead of the fixed matrices themselves) when MODEL=fixed

INITIAL = scalars, variates, matrices, symmetric matrices or pointers
Initial parameter values for each correlation matrix (supplied in the structures appropriate for the model concerned)

CONSTRAINTS = texts
Texts containing strings none, fix or positive to define constraints for the parameters in each model

EQUALITYCONSTRAINTS = variates
Non-zero values in the variate indicate groups of parameters whose values are to be constrained to be equal


Description

VSTRUCTURE can be used to define the form of covariance structure for any term in the random model defined for REML by VCOMPONENTS. By default, the effects for each random term are assumed to be independent with common variance σj2 for term j, that is, the random term has covariance matrix σj2I. VSTRUCTURE is used to define correlation between random effects within terms, to allow a changing variance within a term, and to define correlations between different random terms. These models are particularly useful when fitting linear models to repeated measurements or spatial data and for random coefficient regression.

   VSTRUCTURE can only be used after VCOMPONENTS has been used to define the fixed and random models. It can be used more than once to define different structures for different random terms. The information is accumulated within GenStat, and it will all be used by subsequent REML commands. You can check on the model and covariance structures defined at any time by using the VSTATUS directive. To cancel a covariance structure for a term you simply need to use VSTRUCTURE to change the model back to the scaled identity matrix σj2I. To cancel all covariance structures you can give a new VCOMPONENTS command.

   For a random term constructed from more than one factor, the covariance matrix can be formed either as a single matrix for the whole term, or as the direct product of several matrices corresponding to the factors. Consider an analysis of repeated measurements where data has been taken weekly from each subject, and one of several different treatments has been applied to each subject. It is likely that data taken from the same subject will be correlated, with correlation decreasing over time, but that subjects will be independent. This corresponds to an I (×) C covariance structure, where the identity matrix I corresponds to the independent subjects, and the covariance matrix C corresponds to the correlated measurements over time within subjects. If we take C to be an auto-regressive process of order 1, this can be defined and fitted as follows:

VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week

VSTRUCTURE [TERM=Subject.Week] MODEL=I,AR; ORDER=1; \

  FACTOR=Subject,Week

REML Y

The TERM option is used to specify the term to which the covariance structure is to be applied. For each factor in the term you can then specify the covariance model to be applied (see below for list of available models). However, it is not necessary to specify factors for which the default identity model is required, so the following is an equivalent specification:

VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week

VSTRUCTURE [TERM=Subject.Week] MODEL=AR; ORDER=1; FACTOR=Week

To cancel the covariance structure for the term, a null setting is sufficient:

VSTRUCTURE [TERM=Subject.Week]

Although the covariance structure for each term here is of the form Gj = I, the variance matrix for the data is of the form

V = σ2 ( Σj γjZjGjZj′ + I )

In this case the random subject term generates correlations that are equal across all the times within subjects. It is important to remember that including a random term in the model will generate uniform correlations between units with the same values of the random factor(s). Retaining these terms in the model as well as specifying a correlated structure may be appropriate for some data sets, but can sometimes lead to difficulties in parameter estimation.

   The possible settings for the MODEL parameter, generating symmetric covariance matrices C (Ci, j = Cj, i for all i, j), are listed below. Where more than one model order can be used, the default is shown in bold and can be changed by using the ORDER option. For the AR, MA, ARMA, power and banded models, the order is the same as the number of parameters to be fitted. For the banded, correlation, ante-dependence and unstructured models, the order is the number of non-zero off-diagonal bands in the matrix. For the FAequal and FA models, the order is the number of columns in the matrix Λ.

identity

identity matrix

Ci, i = 1, Ci,j = 0, for ij

fixed

fixed matrix

Ci, j specified

AR

auto-regressive

order 1 or 2

2=0 for order 1)

Ci, i = 1

Ci+1, i = φ1 / (1-φ2)

Ci, j = φ1 Ci-1, j + φ2 Ci-2, j,

i > j+1, -1 < φ1, φ2 < 1,

12|<1, φ21<1, φ2>-1

MA

moving average

order 1 or 2

2=0 for order 1)

Ci, i = 1

Ci+1, i = -θ1(1-θ2)/(1+θ1222)

Ci+2, i = -θ2 / (1+θ1222)

Ci, j = 0, i>j+2

-1 < θ1, θ2 < 1, θ21 < 1

ARMA

auto-regressive moving-average

order 1

Ci, i = 1

Ci+1, i = (θ-φ)(1-φθ)/(1+θ2-2φθ)

Ci, j = φCi-1, j , i>j+1

-1 < φ, θ < 1

power

based on distance

order 1 or 2

1 = φ2 for order 1)

Ci, i = 1

Ci, j = φ1d1φ2d2

d1, d2 = distance in 1st and 2nd dimensions

0 < φ1, φ2 < 1

banded

equal bands

1 < order < nrows-1

Ci, i = 1

Ci+k, i = θk , 1 < k < order

-1 < θk < 1

Ci+k, i = 0, otherwise

correlation

general correlation matrix

1 < order < nrows-1

Ci, i = 1

Ci, j = θij ,

1 < |i-j| ≤ order

Ci, j = 0, |i-j| > order

-1 < θij < 1

uniform

uniform matrix

Ci, j = θ for all i,j

diagonal

diagonal matrix

Ci, i = θi

Ci, j = 0, ij

antedependence

ante-dependence model

1 < order < nrows-1

C-1 = UD-1U

Di, i-1 = di-1,

Di, j = 0 for ij

Ui, i = 1,

Ui, j = uij ,

1 ≤ j-i ≤ order

Ui, j = 0, for i>j

unstructured

general covariance matrix

1 < order < nrows-1

Ci, j = θij ,

0 < |i-j| ≤ order

Ci, j = 0, |i-j| > order

FA

factor analytic

order = 1 or 2

C = ΛΛ′ + Ψ

Λ is an nrows × q matrix

order=q

Ψi = ψi for i=1...nrows

FAequal

factor analytic with common variance

order = 1 or 2

C = ΛΛ′ + Ψ

Λ is an nrows × q matrix

order=q

Ψi = ψ for i=1...nrows


   Initial parameter values can be specified using the INITIAL parameter. For most models, the number of initial values required is the number of parameters, and default values will be generated. However, for unstructured models, a full covariance matrix of initial values must be given, and for the correlation model a full correlation matrix must be provided. For the ante-dependence model, either a full covariance matrix can be provided, or a pointer to a U and a D-1 matrix of the correct forms. For the FA and FAequal models, a pointer must be used to give the initial Λ and Ψ matrices, otherwise default initial values are generated. The FAequal model can be used to get initial values for the FA model. Initial values are required for these models because the algorithm may not converge when many parameters are fitted if the starting values are not realistic. Initial values might be generated from covariance matrices estimated by fitting simpler models, or from residuals from a null variance model. A missing value in the initial values is taken to mean that the value is inestimable and it will be fixed at a small value for the analysis. Alternatively, a parameter can be fixed at its initial value using the CONSTRAINTS parameter. The codes (not case sensitive and able to be abbreviated) may take value fix to indicate the parameter is to be fixed at its initial value, positive to indicate it is to remain positive or none to indicate no constraints. The default is positive/no constraint depending on context, for example scaling parameters are always constrained to remain positive. The default is positive/no constraint depending on context, for example scaling parameters are always constrained to remain positive. The EQUALITYCONSTRAINTS parameter allows you to constrain some of the parameters to have the same value. The variate that it specifies contains a zero value if there is no constraint, and an identical integer value for any set of parameters whose values are to be equal. So, a variate containing the values (0,1,2,1,2) would constrain the second and parameter to be equal to the fourth parameter, and the third parameter to be equal to the fifth parameter.

   For the models defined in terms of correlation matrices, that is, the AR, MA, ARMA, uniform, power, banded and correlation models, it may sometimes be desirable to allow for unequal variances. This can be done by setting option HETEROGENEITY=outside. This means a diagonal matrix D of standard errors will be applied to the correlation matrix C to generate a matrix D½CD½. In this case, a number of extra parameters (equal to the number of effects in the factor or term) should be added to the vector of initial values. These models allow investigation of a structured correlation pattern for changing variances and are particularly useful in the analysis of repeated measurements data when variance increases over time. For example, to allow for changing variance over time in our example above, we can specify

VCOMPONENTS [FIXED=Tmt] RANDOM=Subject.Week

VSTRUCTURE [TERM=Subject.Week] MODEL=AR; ORDER=1;\

  FACTOR=Week; HETEROGENEITY=outside

REML Y

   In some circumstances, you may wish to define a single model to apply to the whole term, instead of using the direct product form illustrated above. In this case, you should set option FORM=whole. Note that, when a term consists of a single factor, it is not necessary to set the FACTOR option.

   When MODEL=fixed is used, you must either give the values of the covariance matrix C using option MATRIX, or give the inverse matrix using option INVERSE. Values for the matrix or its inverse can be supplied as diagonal matrices or symmetric matrices. In addition, values for the inverse matrix can be supplied in sparse form as a pointer. The output from VPEDIGREE is designed for input here, but you can also define the inverse matrix explicitly. The second element of the pointer should then be a variate containing the non-zero values of the inverse in lower triangular order. The first element should be a factor, with number of levels equal to the number of rows n(n+1)/2 of the matrix. This has firstly a block of n values giving the position in the variate of the first value stored for each row. There is then a block of values for each row in turn, giving the columns in which each non-zero value appears.

   When MODEL=power is used to define a distance-based model, the coordinates of the data points must be specified by the COORDINATES option using either a list of variates or a matrix. The number of units for the coordinates must be equal to the number of units in model factors/variates (and the data). Coordinates for groups of points will be taken as the mean value over the group. In the current release, only one- or two- dimensional distances can be used. The distance calculation is defined by the METRIC option. For coordinates {ri,ci} distances dij between points i and j are defined as

    dij = |ri-rj| + |ci-cj|
for METRIC=cityblock (the default);

    dij = (ri-rj)2 + (ci-cj)2
for METRIC=squared; and

    dij = [ (ri-rj)2 + (ci-cj)2 ]1/2
for METRIC=euclidean.

When ORDER=1 for a power model, the correlation pattern is the same in both dimensions and distances will be calculated simultaneously over all dimensions. When ORDER=2, distances will be calculated within each dimension separately and in this case Euclidean and city-block distances are equivalent.

   The CORRELATE option allows you to specify correlations between model terms which have equal numbers of effects. A common correlation will then be fitted between parallel effects. For example, consider a random coefficient regression model where the fixed model contains common response to covariate X and the random model allows for deviations in the intercept and slope about this line for each subject. The random intercept and slope for each subject may be correlated, but subjects are independent. This correlation across terms is defined using the CORRELATE option as follows:

VCOMPONENTS [FIXED=X] RANDOM=SUBJECT+SUBJECT.X

VSTRUCTURE [SUBJECT+SUBJECT.X; CORRELATE=positivedefinite;\

  CINITIAL=!(1,0.1,0.3); FORM=whole]

The CORRELATE option setting positivedefinite is used to ensure that the correlation matrix between the terms remains positive definite. This constraint can be relaxed using the setting unrestricted (an unstructured covariance matrix is then used to describe covariance across the terms). The model fitting is done here in terms of a covariance matrix, where the diagonal elements are the gammas for the correlated terms. The CINITIAL option is used to give initial values for this matrix. If no initial values are given, the initial values are taken from initial gamma values given in VCOMPONENTS when the model is declared. When correlations are declared between terms, you must set FORMATION=whole. In the random coefficient regression model above, no correlation structure is declared within terms since the subjects are independent. However, it is possible to declare correlation/covariance models within terms as usual. For example, an animal breeding model might use VPEDIGREE to set up covariances within terms as follows:

VPEDIGREE PROGENY=animal; FEMALE=dam; MALE=sire; INVERSE=Ainv

VCOMPONENTS [FIXED=Trt] RANDOM=animal+dam+env

VSTRUCTURE [animal+dam; CORRELATE=unr; FORM=whole] \

  MODEL=fixed; INVERSE=Ainv

These declarations set up random terms with covariance structures of the form: cov(animal) = σa2 A, cov(dam) = σd2 A, cov(animal, dam) = σad A.


Direct Products

Although the direct product construction used to build the covariance structures does not generally constrain the models that can be fitted to any data set, you should be aware of the implications that arise when defining covariance structures for the residual term. The REML algorithm used by GenStat detects the presence of the residual term in the model by searching for terms with number of levels equal to the number of data values, n. When no covariance structures are specified, the first term with number of levels > n is used as the residual. However, when covariance structures are defined, the form of the variance model is

V = σ2 ( Σj γjZjGjZj′ + R )

where matrix R corresponds to the residual term and has n rows. For this reason, any term found with > n rows will not be used as the residual if it has a covariance matrix. If no valid residual term is found, a residual term will automatically be added to the model. This may result in an extra error term being fitted unintentionally. An example where this may happen is in repeated measurements data where unequal numbers of measurements have been taken on subjects. If direct product construction is used, the matrix generated will have more rows than the data and cannot be used as R. A work-around is to put missing values in the data set to give equal replication and use REML option MVINCLUDE=yvariate to retain the missing values in the analysis. Alternatively, you could fix the residual component at a small value.

   Note that in the repeated measurements example above, if measurements are taken at different times for each subject, the direct product structure is not appropriate. In this case, a power model may be fitted over the whole term, constraining the between subject correlation to zero:

VSTRUCTURE [TERM=Subject.Week; FORM=whole;\

  COORD=subject,week] MODEL=power; ORDER=2;\

  INITIAL=!(0,0.1); CONSTRAIN=!T(Fix,None)

Note that the parameters run in the order of the coordinates vectors (variates not factors).

 

Options: TERMS, FORMATION, CORRELATE, CINITIAL, COORDINATES.

Parameters: MODEL, ORDER, HETEROGENEITY, METRIC, FACTOR, MATRIX, INVERSE, INITIAL, CONSTRAINTS, EQUALITYCONSTRAINTS.