Keywords: regression; PLS; path models; H-principle; multi-block data; directed network


The author has developed a unified theory of path and multi-block modelling of data. The data blocks are arranged in a directional path, in which each data block can lead to one or more other data blocks. A collection of input data blocks is assumed to be given, each of which is supposed to describe one or more intermediate data blocks. The output data blocks are those at the ends of the paths, which have no succeeding data blocks. The optimisation procedure finds weights for the input data blocks so that the size of the total loadings for the output data blocks is maximised. Once the optimal weight vectors have been determined, the score and loading vectors for the data blocks in the path are computed, and the data blocks are adjusted appropriately at each step. Regression coefficients are computed for each data block, showing how it is estimated by the data blocks that lead to it. Methods of standard regression analysis are extended to this type of modelling. Three types of ‘strength’ of relationship are computed for each pair of connected data blocks: first, the strength in the path; second, the strength when only the data blocks leading to the last one are used; and third, the strength when only the two data blocks are considered. Cross-validation and other standard methods of linear regression are carried out in a similar manner. In industry, processes are organised in different ways, and it can be useful to model the processes in the way they are carried out. By proper alignment of sub-processes, an overall model can be specified. There can be several useful path models during the process, where the data blocks in a path are those that are current or important at given stages of the process. Data collection equipment is becoming ever more advanced and cheaper. Data analysis needs to ‘catch up’ with the challenges that this new technology presents. Copyright © 2008 John Wiley & Sons, Ltd.
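
As an illustration only, the weight, score, loading and adjustment steps described above can be sketched for the simplest possible path, a single input block X leading to a single output block Y. This is not the author's full procedure. It assumes that the 'size' of the output loadings is measured by the squared Euclidean norm of Y^T X w with the weight vector w constrained to unit length, in which case w is the dominant right singular vector of Y^T X, as in ordinary PLS. The function name and the simulated data are hypothetical.

```python
import numpy as np

def first_weight_score_loadings(X, Y):
    """One step for the simplest path X -> Y (illustrative sketch only)."""
    # Assumption: w has unit length and maximises |Y^T X w|^2, so it is
    # the dominant right singular vector of Y^T X.
    _, _, vt = np.linalg.svd(Y.T @ X, full_matrices=False)
    w = vt[0]
    t = X @ w                  # score vector for the input block
    p = X.T @ t / (t @ t)      # loading vector for the input block
    q = Y.T @ t / (t @ t)      # loading vector for the output block
    # Adjust (deflate) both blocks before any further step.
    X_adj = X - np.outer(t, p)
    Y_adj = Y - np.outer(t, q)
    return w, t, p, q, X_adj, Y_adj

# Hypothetical simulated data: 20 samples, 5 input and 2 output variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
Y = X @ rng.standard_normal((5, 2)) + 0.1 * rng.standard_normal((20, 2))
X -= X.mean(axis=0)            # centre the columns of each block
Y -= Y.mean(axis=0)

w, t, p, q, X_adj, Y_adj = first_weight_score_loadings(X, Y)
```

After the adjustment, both blocks are orthogonal to the score vector t, so repeating the step on `X_adj` and `Y_adj` would extract a new component. Extending this to several connected blocks, and computing the three strengths of relationship, requires the path structure described in the paper and is not attempted here.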