TREE directive

Declares one or more tree data structures and initializes each one to have a single node known as its root.


No options


Parameter

IDENTIFIER = identifiers
Identifiers of the trees


Description

TREE declares and initializes GenStat tree structures. These can be used to represent hierarchical structures such as classification trees, identification keys and regression trees. These types of tree can be constructed by the special-purpose procedures BCLASSIFICATION, BKEY and BREGRESSION, respectively, and displayed by the procedures BGRAPH and BPRINT. Most users will need only these special-purpose procedures. They will not need to operate on trees directly, nor to know how trees are formed, stored or manipulated. The procedures are, however, built on a suite of directives, functions and procedures, summarized below, which provides the toolkit not only for the officially supported tree facilities but also for user enhancements and extensions.

   The tree structure is like a real tree, which starts from a root and then splits into branches, except that it is usually viewed as growing downwards instead of upwards. The branch-points in the tree are known as nodes, with the initial node being called the root (as in a real tree). There is also a node at the end of each branch, known as its terminal node. In GenStat a tree is similar to a pointer, with an element for each node. These elements are the identifiers of data structures which can be used to store information about the nodes. Usually the data structures will be pointers, so that several pieces of information can be stored for each node, but the precise contents depend on the type of tree (see, for example, procedures BCLASSIFICATION, BKEY and BREGRESSION).

   Each node thus has a number, corresponding to the index of its element in the tree. The root is always numbered one, and this is the only node that the tree contains when it is declared by TREE. Further nodes can be added by the BGROW or BJOIN directives, which form branches from a terminal node or join another tree to a terminal node, respectively. The converse process of cutting a tree at a defined node and discarding the nodes and information below it is provided by the BCUT directive.
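The following Python sketch models the node numbering described above. It is not GenStat code: the class `Tree` and its methods `grow`, `join` and `cut` are hypothetical stand-ins for TREE, BGROW, BJOIN and BCUT. The rule that each new node receives the next unused number, with numbers never reused after a cut, is an assumption made for illustration. Each node's `data` slot plays the role of the identifier stored in the corresponding element of the tree.

```python
from __future__ import annotations


class Tree:
    """Toy model of a GenStat-style tree (hypothetical; not GenStat syntax)."""

    def __init__(self) -> None:
        # Like TREE: a new tree holds a single node, the root, numbered 1.
        self.data: dict[int, object] = {1: None}  # one slot per node
        self.children: dict[int, list[int]] = {1: []}
        self.parent: dict[int, int | None] = {1: None}
        self._next_number = 2  # assumption: node numbers are never reused

    def is_terminal(self, node: int) -> bool:
        return not self.children[node]

    def _new_node(self, parent: int, data: object = None) -> int:
        number = self._next_number
        self._next_number += 1
        self.data[number] = data
        self.children[number] = []
        self.parent[number] = parent
        self.children[parent].append(number)
        return number

    def grow(self, node: int, nbranches: int) -> list[int]:
        """Form nbranches branches from a terminal node (cf. BGROW)."""
        if not self.is_terminal(node):
            raise ValueError(f"node {node} is not terminal")
        return [self._new_node(node) for _ in range(nbranches)]

    def join(self, node: int, other: Tree) -> None:
        """Attach a copy of `other` at a terminal node (cf. BJOIN).

        The root of `other` is merged with `node`, which keeps its own data.
        """
        if not self.is_terminal(node):
            raise ValueError(f"node {node} is not terminal")
        # Walk `other` top-down so each parent is renumbered before its children.
        mapping = {1: node}
        stack = [1]
        while stack:
            old = stack.pop()
            for child in other.children[old]:
                mapping[child] = self._new_node(mapping[old], other.data[child])
                stack.append(child)

    def cut(self, node: int) -> None:
        """Discard every node below `node`, leaving it terminal (cf. BCUT)."""
        stack = list(self.children[node])
        while stack:
            n = stack.pop()
            stack.extend(self.children[n])
            del self.data[n], self.children[n], self.parent[n]
        self.children[node] = []
```

For example, `Tree().grow(1, 2)` returns `[2, 3]`. In this model, cutting can leave gaps in the numbering, so the number of nodes and the largest node number can differ (compare BNNODES and BMAXNODE).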

   The numbers of the other nodes can be obtained using the functions provided for navigating around a tree:

    BNEXT
provides the numbers of the nodes below a node;

    BPREVIOUS
provides the number of the node immediately above a node;

    BTERMINAL
finds the next terminal node after a node;

    BSCAN
finds the number of the node immediately after a node in a standard branch-by-branch order that visits each node once.
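A minimal Python sketch of these four navigation functions follows. It works on a tree held as a dictionary that maps each node number to the list of node numbers immediately below it. The lowercase function names mirror the GenStat functions but are hypothetical Python stand-ins. In particular, the sketch assumes that the "standard branch-by-branch order" of BSCAN is a depth-first pre-order traversal (a node, then each of its branches in turn), and that BTERMINAL returns the next terminal node in that same order.

```python
from __future__ import annotations


def bnext(children: dict[int, list[int]], node: int) -> list[int]:
    """Numbers of the nodes immediately below `node`."""
    return list(children[node])


def bprevious(children: dict[int, list[int]], node: int) -> int | None:
    """Number of the node immediately above `node` (None for the root)."""
    for parent, below in children.items():
        if node in below:
            return parent
    return None


def bscan(children: dict[int, list[int]], node: int) -> int | None:
    """Next node in depth-first pre-order, or None after the last node."""
    if children[node]:
        return children[node][0]  # go down the first branch
    # Otherwise climb until some ancestor has a branch not yet visited.
    while (parent := bprevious(children, node)) is not None:
        siblings = children[parent]
        position = siblings.index(node)
        if position + 1 < len(siblings):
            return siblings[position + 1]
        node = parent
    return None


def bterminal(children: dict[int, list[int]], node: int) -> int | None:
    """Next terminal node after `node`, in the same scan order."""
    following = bscan(children, node)
    while following is not None and children[following]:
        following = bscan(children, following)
    return following


# Visit every node once, starting from the root.
tree = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
order = [1]
while (n := bscan(tree, order[-1])) is not None:
    order.append(n)
print(order)  # [1, 2, 4, 5, 3]
```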

Other useful functions include:

    BNBRANCHES
provides the number of branches below a node;

    BDEPTH
calculates the depth of a node (taking the root as being at depth 1);

    BPATH
provides a variate containing the numbers of the nodes on the branch to a node;

    BBRANCHES
provides a variate containing the numbers of the branches taken on the path to a node;

    BBELOW
provides a variate containing the numbers of all the nodes, or of all the terminal nodes, below a node;

    BNNODES
provides the number of nodes in a tree;

    BMAXNODE
provides the maximum node number in a tree.
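The remaining functions can be sketched in the same hypothetical Python model (a dictionary from node number to the list of node numbers below it). The example tree deliberately has a gap in its numbering, as might be left after cutting, so that the number of nodes and the maximum node number differ. Assumptions: the path returned by `bpath` runs from the root down to the node itself, inclusive; branch numbers in `bbranches` count from 1 within each node; and `bbelow` excludes the node itself.

```python
from __future__ import annotations


def parent_of(children: dict[int, list[int]], node: int) -> int | None:
    for parent, below in children.items():
        if node in below:
            return parent
    return None


def bnbranches(children, node):
    return len(children[node])


def bpath(children, node):
    """Nodes from the root down to `node`, inclusive."""
    path = [node]
    while (parent := parent_of(children, path[0])) is not None:
        path.insert(0, parent)
    return path


def bdepth(children, node):
    return len(bpath(children, node))  # the root is at depth 1


def bbranches(children, node):
    """Branch taken (counting from 1) at each step on the path to `node`."""
    path = bpath(children, node)
    return [children[p].index(c) + 1 for p, c in zip(path, path[1:])]


def bbelow(children, node, terminal_only=False):
    """All nodes (or only the terminal nodes) below `node`, in pre-order."""
    found, stack = [], list(reversed(children[node]))
    while stack:
        n = stack.pop()
        if not terminal_only or not children[n]:
            found.append(n)
        stack.extend(reversed(children[n]))
    return found


def bnnodes(children):
    return len(children)


def bmaxnode(children):
    return max(children)


# Nodes 4 and 5 are absent, as if they had been cut off earlier.
tree = {1: [2, 3], 2: [6, 7], 3: [], 6: [], 7: []}
print(bpath(tree, 7), bbranches(tree, 7), bdepth(tree, 7))  # [1, 2, 7] [1, 2] 3
print(bbelow(tree, 1), bbelow(tree, 1, terminal_only=True))  # [2, 6, 7, 3] [6, 7, 3]
print(bnnodes(tree), bmaxnode(tree))  # 5 7
```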

   There are also several utility procedures, which are used by the special-purpose tree procedures.

    BCONSTRUCT
constructs a tree (using subsidiary procedure BSELECT, which is customized according to the type of tree).

    BGRAPH
plots a tree.

    BPRINT
displays a tree.

    BPRUNE
prunes a tree by minimal cost-complexity pruning (assuming that "accuracy" values have been stored at each node of the tree, which can be done using the customized procedure BVALUES).
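This section does not spell out the algorithm used by BPRUNE, so the following is only a Python sketch of the standard weakest-link form of minimal cost-complexity pruning, as used for CART-style trees. It assumes that the value stored at each node behaves as a cost R(t), where smaller is better (for example, a misclassification rate or residual sum of squares), and that every internal node has at least two branches. For each internal node t it computes g(t) = (R(t) - R(T_t)) / (|T_t| - 1), where R(T_t) is the total cost of the terminal nodes below t and |T_t| is their number. It then cuts the tree at the node with the smallest g, and repeats until only the root is left. The function names are hypothetical.

```python
from __future__ import annotations


def terminals_below(children: dict[int, list[int]], node: int) -> list[int]:
    if not children[node]:
        return [node]
    return [t for c in children[node] for t in terminals_below(children, c)]


def prune_sequence(children: dict[int, list[int]],
                   cost: dict[int, float]) -> list[tuple[float, int]]:
    """Weakest-link pruning: returns (alpha, node cut at) pairs in order."""
    children = {n: list(below) for n, below in children.items()}  # work on a copy
    sequence = []
    while children[1]:  # stop when the root is the only node left
        best_node, best_g = None, float("inf")
        for node, below in children.items():
            if not below:
                continue  # terminal nodes cannot be pruned
            leaves = terminals_below(children, node)
            subtree_cost = sum(cost[t] for t in leaves)
            g = (cost[node] - subtree_cost) / (len(leaves) - 1)
            if g < best_g:
                best_node, best_g = node, g
        # Cut at the weakest link: discard every node below best_node.
        stack = list(children[best_node])
        while stack:
            n = stack.pop()
            stack.extend(children.pop(n))
        children[best_node] = []
        sequence.append((best_g, best_node))
    return sequence


tree = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
cost = {1: 20, 2: 4, 3: 6, 4: 1, 5: 2}
print(prune_sequence(tree, cost))  # [(1.0, 2), (10.0, 1)]
```

Each alpha in the sequence is the complexity penalty at which that cut becomes worthwhile, so the sequence describes a nested family of subtrees from which one can be chosen, for example by cross-validation.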

New tree-based analyses can thus be added by writing a main procedure (like BCLASSIFICATION or BREGRESSION) and defining an appropriate version of BSELECT.
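To show how a main procedure and a customized BSELECT might divide the work, here is a Python sketch of a generic construction loop with a pluggable selection function. Everything here is hypothetical. `construct` stands in for BCONSTRUCT, and `regression_select` stands in for a regression-style BSELECT that splits a set of (x, y) observations at the threshold on x giving the largest reduction in residual sum of squares. The parameters and stopping rules of the real procedures are not described in this section.

```python
from __future__ import annotations

from typing import Callable, Optional

# A select function returns the groups of units for the new branches,
# or None if the node should remain terminal.
Select = Callable[[list], Optional[list]]


def construct(units: list, select: Select) -> tuple[dict, dict]:
    """Grow a tree from the root, asking `select` how to split each node."""
    children: dict[int, list[int]] = {1: []}
    contents: dict[int, list] = {1: units}
    pending, next_number = [1], 2
    while pending:
        node = pending.pop(0)
        groups = select(contents[node])
        if not groups:
            continue  # leave this node terminal
        for group in groups:
            children[next_number], contents[next_number] = [], group
            children[node].append(next_number)
            pending.append(next_number)
            next_number += 1
    return children, contents


def rss(ys: list[float]) -> float:
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def regression_select(units, min_size=2, min_gain=1e-9):
    """Split (x, y) pairs at the best threshold on x, or return None."""
    if len(units) < 2 * min_size:
        return None
    units = sorted(units)
    total = rss([y for _, y in units])
    best = None
    for i in range(min_size, len(units) - min_size + 1):
        left, right = units[:i], units[i:]
        gain = total - rss([y for _, y in left]) - rss([y for _, y in right])
        if gain > min_gain and (best is None or gain > best[0]):
            best = (gain, left, right)
    return None if best is None else [best[1], best[2]]


data = [(1, 1.0), (2, 1.0), (3, 5.0), (4, 5.0)]
children, contents = construct(data, regression_select)
print(children)  # {1: [2, 3], 2: [], 3: []}
```

Swapping in a different select function, such as one that splits on class labels, changes the kind of tree that is built without touching `construct`.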

 
