# Ordinations

In a nutshell: Ordinations are a diverse set of approaches in multivariate statistics that aim to reduce multivariate data to its main gradients or variables.

### Background

Ordination techniques emerged in mathematics more than a century ago, and the reduction of information they allow keeps these approaches timely to this day. They rely strongly on a profound knowledge of the data format of the dataset being analysed. Since ordinations allow for both inductive and deductive analysis, they often pose a risk for beginners, who typically get confused by the diversity of approaches and by the answers these analyses may provide. This confusion is often increased by the model parameters available to evaluate the results, since many ordination techniques allow neither for probability-based assumptions nor for more advanced information-based techniques. What is more, ordinations are often deeply entangled in disciplinary cultures, with some approaches such as factor analysis being almost exclusive to some disciplines, and other approaches such as principal component analysis being utilised in quite diverse ways within different disciplines. As a result, the norms and rules on when and how to apply different techniques are widely scattered and intertwined with disciplinary norms, while the same disciplines are widely ignorant of approaches from other disciplines. Here, we try to deliver a diverse and reflected overview of the different techniques and their respective strengths and weaknesses. This will necessarily demand a certain simplification, and will in addition trigger controversy within certain branches of science; these controversies are rooted either in partial knowledge or in experiential identity. Unboxing the whole landscape of ordinations is also a struggle because these analyses are neither discrete nor always conclusive. Instead, they pose starting points that often serve as initial analyses, or alternatively enable analyses widely uncoupled from the vast landscape of univariate statistics.
To this end, we need to acknowledge that the diverse approaches differ vastly not only in their underlying mathematics, but also in how far this mathematics may be partly ignored. This is probably the hardest struggle, and one you can find neither in textbooks nor in articles. The empirical reality is that many applications of ordinations violate many of the mathematical assumptions or rules, yet the patterns derived from these analyses are still helpful, if not even valid. Mathematicians can choose to live in a world where ordination techniques are perfect in every way, yet the datasets the world gives to ordinations are simply not. Instead, we have to acknowledge that multivariate data is almost always messy and contains a high amount of noise, many redundancies, and even data errors. Safety comes in numbers. Ordinations are so powerful exactly because they can channel all these problems through the safety of the size of the data, and thus deliver either initial analyses or even results that serve as endpoints. However, there is a difference between initial and final results, and this will be our first starting point here.

Ordinations are one of the pillars of pattern recognition, and therefore play an important role not only in many disciplines, but also in data science in general. The most fundamental criterion for choosing an analysis is the data format. The difference between continuous data and categorical or nominal data is the most fundamental division that allows you to choose your analysis pathway. The next consideration is whether you see the ordination as a starting point to inspect the data, or whether you are planning to use it as an endpoint or a discrete goal within your path of analysis. Ordinations are indeed great for skimming through data, yet they can also reveal results you might not get through other approaches. Other considerations regarding ordinations relate to deeper matters of data formats, especially the question of the linearity of continuous variables. This already highlights the main problem of ordination techniques, namely that you need a decent overview in order to choose the most suitable analysis, because only through experience can you pick what serves your dataset best. This is associated with the reality that many analyses made with ordinations are indeed compromises. Ecology and psychology are two prominent examples of disciplines that widely use ordinations for diverse analyses and have thus established diverse traditions. From a mathematical standpoint, however, real-world analyses based on ordinations are a graveyard of mathematical assumptions, with violations of analytical foundations that border on ethical misconduct. In other words, many ordinations are messy. This is especially true because ordinations reveal mostly continuous results in the form of locations on ordination axes.
While multivariate analyses based on cluster analysis are hence more discrete, since their results are presented as groups, ordinations are typically nice to inspect graphically, but harder to embed analytically into a wider framework. More on this point later. Let us now begin with a presentation of the diverse ordination types and their respective origins.

## Specific ordinations

### Correspondence analysis

This ordination is one of the original ordination techniques, and builds in its underlying mechanics on the principal component analysis. However, since it is based on the chi-square test, it is mainly applied to categorical data, although it can also be applied to count data, given that the dataset contains enough statistical power for this. In a nutshell, the correspondence analysis creates orthogonal axes that represent a dimension reduction of the input data, thereby effectively reducing the multivariate categorical data into artificial axes, of which the first contains the most explanatory power. Typically, the second and third axes still contain meaningful information, yet for most datasets the first two axes may suffice. Today the correspondence analysis is mostly negligible in terms of its direct application, yet it serves as an important basis for other approaches, such as the Detrended Correspondence analysis or the Canonical Correspondence analysis. This is also partly related to the largest flaw of the Correspondence analysis, namely the so-called arch effect, where the information on the first two axes is distorted by the mathematical representation of the data. Still, the underlying calculation, mainly the reciprocal averaging approach, makes it stand out as a powerful tool to sort large multivariate datasets based on categorical or count data. Consequently, basic reciprocal averaging was initially very relevant for scientific disciplines such as ecology and psychology.
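The reciprocal averaging idea mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `reciprocal_averaging`, the small `plots_by_species` table and the convergence settings are assumptions made for the example. Plot scores are computed as weighted averages of species scores and vice versa, and the scores are re-centred and re-scaled in each round until the first axis stabilises.

```python
import math


def reciprocal_averaging(table, n_iter=500, tol=1e-12):
    """First correspondence-analysis axis by reciprocal averaging.

    `table` holds non-negative counts, rows (e.g. plots) by columns
    (e.g. species); every row and every column needs a non-zero total.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = sum(row_tot)

    def row_average(col_scores):
        # Each row score is the abundance-weighted mean of the column scores.
        return [sum(table[i][j] * col_scores[j] for j in range(n_cols)) / row_tot[i]
                for i in range(n_rows)]

    col_scores = [float(j) for j in range(n_cols)]  # arbitrary, non-constant start
    eigenvalue = 0.0
    for _ in range(n_iter):
        row_scores = row_average(col_scores)
        # Each column score is the abundance-weighted mean of the row scores.
        new_cols = [sum(table[i][j] * row_scores[i] for i in range(n_rows)) / col_tot[j]
                    for j in range(n_cols)]
        # Centre and scale with the column totals as weights.
        mean = sum(w * c for w, c in zip(col_tot, new_cols)) / grand
        new_cols = [c - mean for c in new_cols]
        spread = math.sqrt(sum(w * c * c for w, c in zip(col_tot, new_cols)) / grand)
        col_scores = [c / spread for c in new_cols]
        # The shrinkage per round converges to the eigenvalue of the axis.
        converged = abs(spread - eigenvalue) < tol
        eigenvalue = spread
        if converged:
            break
    return row_average(col_scores), col_scores, eigenvalue


# Hypothetical table: four plots along a gradient, four species.
plots_by_species = [
    [5, 3, 0, 0],
    [2, 4, 1, 0],
    [0, 2, 4, 1],
    [0, 0, 3, 5],
]
plot_scores, species_scores, eigenvalue = reciprocal_averaging(plots_by_species)
print([round(s, 2) for s in plot_scores], round(eigenvalue, 3))
```

For a table with a clear gradient like this one, the plot and species scores come out sorted along that gradient, which is exactly the "sorting" property described above. Further axes would require removing the first axis from the scores in each round, which this sketch leaves out.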

### Detrended Correspondence analysis

Hill can be credited with eliminating the arch or horseshoe effect by a move that mathematicians will probably criticise until the end of time. What this analysis does is simply take the geometric space that comes out of a Correspondence analysis and bend it into an even shape. In other words, the scores out of a CA are detrended. This has several benefits. For one, you look at a dimension reduction that is visually easier to interpret, because it does not follow the horseshoe effect of the CA. The second benefit of the detrending is that you can calculate a measure that is called a turnover. A full turnover in the data is defined as the two most extreme data points not sharing any information or data. This can be a highly useful tool to assess the heterogeneity of a dataset, and is indeed a unique measure to approximate diversity. While this has found raving success in ecology, there are few areas outside of this domain that have realised the value of calculating turnovers. Equally, the DCA is clearly less abundantly used outside of ecology, which is a loss for everyone with multivariate presence/absence or count data on their hands. Instead, other fields usually opt for other approaches, which is partly rooted in their data structure and may also be partly attributed to their specific philosophy of science (i.e. inductive vs. deductive), but is mostly due to different methodological developments in the respective fields. Another advantage of the DCA is the possibility to fit environmental parameters of the plots post hoc onto the axes of the ordination. This makes it possible to interpret how the environmental patterns relate to the respective ordination axes as well as to individual plots. Since this post hoc test is based on a permutation analysis, there is even a crude measure of whether relations between environmental parameters and ordination axes are significant.
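The permutation logic behind such a post hoc fit can be illustrated with a deliberately simplified sketch. It is not the procedure of any particular package: it relates one environmental variable to a single ordination axis through a plain Pearson correlation, and all names and numbers (`axis_scores`, `soil_ph`, `permutation_fit`) are hypothetical. The environmental values are shuffled across plots many times, and the p-value is the share of shuffles that correlate at least as strongly as the observed data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_fit(axis_scores, env, n_perm=999, seed=0):
    """Correlation of an environmental variable with one axis, plus a permutation p-value."""
    rng = random.Random(seed)
    observed = abs(pearson(axis_scores, env))
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between plots and environment
        if abs(pearson(axis_scores, shuffled)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical first-axis scores of eight plots and their soil pH.
axis_scores = [-1.8, -1.1, -0.4, 0.2, 0.9, 1.5, 2.1, 2.6]
soil_ph = [4.1, 4.4, 5.0, 5.2, 5.9, 6.3, 6.8, 7.0]
r, p_value = permutation_fit(axis_scores, soil_ph)
print(round(r, 3), p_value)
```

The p-value can never drop below 1 / (n_perm + 1), which is one reason why the text above calls this a crude measure.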

### Canonical correspondence analysis

This ordination method became really big around the millennium, and it flips the script in terms of the analysis. Instead of reducing the multivariate species/plot matrix into artificial axes and fitting environmental information onto these axes post hoc, as the DCA does, the CCA first derives a multivariate space based on the environmental information, and then fits the species data into this environmental ordination. In other words, it is assumed that the environment determines the species occurrence, which is closer to the assumptions of ecology, and the multivariate environment is reduced into a multivariate space, just like niches. The species information, or primary matrix, is then integrated within these multivariate axes. The CCA thus fits the species into the environmental space; put differently, the secondary matrix is used to calculate the axis reduction, and the primary matrix is implemented into this in a second step. This was a highly promising approach at the turn of the millennium, yet it has decreased in importance over the last years. While it clearly has its specific benefits, the original promise to "put things into even better order" (Palmer 1993 Ecol) has only been partly fulfilled. No one ordination is better than any other ordination, but they all have their specific preconditions and use cases.

### Non metric (multi)dimensional scaling

This type of analysis clearly has a huge benefit compared to other approaches: it is able to deal with non-linearity in data. NMDS can handle non-parametric as well as Euclidean data, and it is thus a powerful ordination algorithm that can deal not only with a variety of data and distributions, but also with missing data. Since it can be computationally quite demanding, the NMDS is still on the rise with rising computer capacity, yet the largest datasets remain beyond its reach so far. Still, since it is more flexible and adaptable in its approach, it has found raving success, and it is among the newer approaches in the pantheon of ordinations. The main difference to other techniques is that the user predefines the number of axes they expect from the analysis. NMDS then reduces the multivariate data and calculates a stress value, which is a measure of how well the analysis reduced the multivariate information into artificial axes. The NMDS aces many datasets when it comes to group differentiation, which is probably one of the main reasons why it has become a darling of data scientists, who often prefer groups over gradients. Some users propose to measure the R² value of the analysis, and there are rumours about thresholds that define an excellent fit. Just as with other such thresholds, this depends on the data and the aim of the analysis, and no single cut-off level can help here. Still, the NMDS is versatile and flexible, and it surpasses many of its alternatives, especially when the analysis is more rigorous and structured and not just some initial fishing.
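To make the stress value more tangible, here is a minimal sketch of how a Kruskal-type stress can be evaluated for a given configuration of points. It covers only this evaluation step, not the iterative search for the best configuration that a full NMDS performs, and it ignores ties among the dissimilarities. The function names and the toy data are assumptions made for illustration. The distances in the ordination are compared to their best monotone fit against the rank order of the original dissimilarities.

```python
import math


def isotonic_fit(values):
    """Pool-adjacent-violators: the closest non-decreasing sequence to `values`."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks as long as the previous block mean exceeds the last one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress(dissimilarities, points):
    """Stress-1 of a configuration `points` for a square dissimilarity matrix."""
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Only the rank order of the dissimilarities matters (non-metric).
    pairs.sort(key=lambda p: dissimilarities[p[0]][p[1]])
    dist = [math.dist(points[i], points[j]) for i, j in pairs]
    fitted = isotonic_fit(dist)
    numerator = sum((d - f) ** 2 for d, f in zip(dist, fitted))
    return math.sqrt(numerator / sum(d * d for d in dist))


# Hypothetical example: dissimilarities are squared gaps between four positions,
# i.e. a non-linear but monotone function of the distances on a line.
positions = [0.0, 1.0, 3.0, 7.0]
dissimilarities = [[(a - b) ** 2 for b in positions] for a in positions]

good_config = [(x, 0.0) for x in positions]              # preserves the rank order
bad_config = [(0.0, 0.0), (7.0, 0.0), (1.0, 0.0), (3.0, 0.0)]  # scrambles it

stress_good = kruskal_stress(dissimilarities, good_config)
stress_bad = kruskal_stress(dissimilarities, bad_config)
print(round(stress_good, 3), round(stress_bad, 3))
```

The first configuration reaches a stress of zero although the dissimilarities are not linear in the distances, which illustrates why the NMDS copes with non-linearity: only the ordering has to be preserved.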

### Principal component analysis

The PCA is probably the most abundantly used ordination technique. Focussed on a Euclidean approach, it is unmatched in analysing linear continuous data. Whenever you have a larger multivariate dataset, such as a table with many continuous variables, and are interested in the main gradients of the dataset, the PCA is your universal and versatile tool. It reduces multivariate data into artificial main components based on linear combinations, and this is a powerful approach to test any given dataset for redundancies. Therefore, if you want to analyse a larger dataset through univariate analysis, the PCA is a staple to test your predictor data for redundancies. It shows the predictor variables as vectors, where the length shows the correlative power with the artificial axes, and the direction shows with which axes a variable correlates. The first axis explains the most, and the others subsequently less. Often the first axis can explain up to 50 % of the variance of a dataset, and you shall see that from a certain point onward the following axes explain way less. This drop is the basis of the broken stick approach, which is used to eyeball how many axes need to be considered from the model. More often than not, it is two or three. Within trait analysis, e.g. in psychology or ecology, it can also be more, and there are other examples where more axes are needed to meaningfully represent the dataset. All this is again based on how homogeneous the dataset is. Hence the PCA can also serve as a dimension reduction that allows us to reduce highly collinear data into artificial axes, which is helpful e.g. in climate science, remote sensing or engineering, where multicollinearity is almost pathologically high.
In such branches of science, linear models are not even built with individual predictors, but instead with the PCA axes themselves, which are implemented in the univariate models. While this literally cures the problem of redundancy and collinearity, it can make the interpretation of the results much harder, as one always has to think around the corner about what the individual axes actually meant, and with which predictor variables they correlated. Still, or maybe exactly because of this, the PCA is one of the most powerful ordination techniques, and it has proven its value over time. A close relative of the PCA is the factor analysis, which does not focus on dimension reduction, but instead tries to identify latent variables. Such latent variables are often constructs that generalise several variables, such as GDP-related variables in the World Bank global datasets. These latent variables represent correlates of the predictor variables, and while the PCA is widely inductive, the factor analysis serves more as a modelling approach that aims to generate predictor variables in the form of constructs.
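As a rough illustration of the PCA workflow described in this section, the following sketch computes a PCA on standardised variables with NumPy and compares the explained variance of each axis with the expectation of the broken stick model. The simulated data (three redundant variables following one gradient plus two noise variables) and the helper names `pca` and `broken_stick` are assumptions for the example, not part of any specific package.

```python
import numpy as np


def pca(data):
    """PCA on standardised columns via singular value decomposition."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s ** 2 / (z.shape[0] - 1)
    explained = eigenvalues / eigenvalues.sum()
    scores = u * s      # positions of the rows on the artificial axes
    loadings = vt.T     # direction of each variable on each axis
    return scores, loadings, explained


def broken_stick(n_axes):
    """Expected share of variance per axis if variance were split at random."""
    return np.array([sum(1.0 / k for k in range(i, n_axes + 1)) / n_axes
                     for i in range(1, n_axes + 1)])


# Simulated data: three highly redundant variables and two independent ones.
rng = np.random.default_rng(42)
gradient = rng.normal(size=100)
redundant = [gradient + 0.1 * rng.normal(size=100) for _ in range(3)]
noise = [rng.normal(size=100) for _ in range(2)]
data = np.column_stack(redundant + noise)

scores, loadings, explained = pca(data)
stick = broken_stick(data.shape[1])

# Keep the leading axes that explain more than the broken stick expectation.
exceeds = explained > stick
n_keep = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
print(np.round(explained, 3), np.round(stick, 3), n_keep)
```

The three redundant variables collapse onto the first axis, which is the redundancy check mentioned above. The columns of `scores` are also what would be passed on as predictors when linear models are built with PCA axes instead of the original variables.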

### Redundancy analysis

In a nutshell, redundancy analyses are a quite special case of how several variables or sets of variables can be compared in how they explain each other or a dependent variable. This is a very special case of multivariate statistics that we do not go into too deeply here, but it is very helpful if you have groups of variables that can be tested as groups, and not as individual variables. Since the results can be visualised in the form of a Venn diagram, it can be worthwhile exploring. Since deep down the RDA is comparing matrices, it is surely multivariate, yet it is also about one metric relating to another, creating a vital (at least mechanical) link to univariate statistics. Since RDA is based on the regression model, it also shares its preconditions. While the PCA models an array of explanatory variables, the RDA models an array of predicted variables, which is the main difference.
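The regression basis of the RDA can be made concrete with a small NumPy sketch. It follows the common textbook description of RDA as a multivariate linear regression of the response matrix on the explanatory matrix, followed by a PCA of the fitted values. It is a simplified illustration under that assumption, with simulated data and hypothetical names (`rda`, `environment`, `species`), and it includes neither significance tests nor the partitioning of variance among groups of variables.

```python
import numpy as np


def rda(response, predictors):
    """Regress the response matrix on the predictors, then ordinate the fitted values."""
    y = np.asarray(response, dtype=float)
    x = np.asarray(predictors, dtype=float)
    y = y - y.mean(axis=0)
    x = x - x.mean(axis=0)

    # Step 1: multivariate least-squares regression of every response column.
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    fitted = x @ coef

    # Share of the total variance in the responses explained by the predictors.
    total = (y ** 2).sum()
    constrained = (fitted ** 2).sum() / total

    # Step 2: PCA (via SVD) of the fitted values gives the canonical axes.
    u, s, vt = np.linalg.svd(fitted, full_matrices=False)
    axis_share = s ** 2 / total
    return constrained, axis_share, u * s, vt.T


# Simulated data: four response variables driven by two predictors plus noise.
rng = np.random.default_rng(1)
environment = rng.normal(size=(50, 2))
effects = np.array([[1.0, 0.5, 0.0, -1.0],
                    [0.0, 1.0, 1.0, 0.5]])
species = environment @ effects + 0.3 * rng.normal(size=(50, 4))

constrained, axis_share, site_scores, species_loadings = rda(species, environment)
print(round(constrained, 3), np.round(axis_share, 3))
```

With two predictors, only the first two canonical axes carry variance, and their shares add up to the constrained fraction. Running the same function with different groups of predictors is the starting point for comparing how much each group explains.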

## Normativity

Ordinations are deeply rooted in many scientific traditions that revolve around the realms of the quantitative. Some of the approaches date back decades, and have been tested and integrated as a staple of statistics. However, all disciplines are deeply dogmatic and utilise only a few ordinations, while other approaches are virtually unknown in the respective discipline. Psychology has the factor analysis, ecology the DCA, remote sensing the PCA, and the more novel data science the NMDS. While all of these have been used in all of these disciplines, this is rarely the case, and there is a certain preference for the established tool(s) of choice. This is especially a reason for concern because all disciplines could rely on the diversity of these approaches and models. Another reason for concern is that all these approaches have quite different evaluative criteria, and these are hard to compare and to utilise properly. Ordinations are a bewildering jumble of almost uncountable statistical measures, and one would wish to use a dimension reduction to make heads or tails of them. Ordinations are far from the safe bets of univariate statistics. There is nothing to solve this but to learn all these measures and approaches point by point. Yet the last and most difficult challenge is to know how much these evaluative criteria mean, since variances and patterns differ between disciplines. In linguistics noise is high, hence ordinations explain little. In ecology, noise may be comparably lower in some ecosystems. Yet this does not mean that ecology can explain more, but that habitat patterns of plants in a well-defined ecosystem are less noisy than language. In other words, ordination results and pattern clarity depend on the context, and this cannot be generalised at all. This may be the reason why disciplines depend on a few ordinations, because these have become more tamed over the years.

## Outlook

While ordinations remain relevant in statistics, there is a slow trend of machine learning approaches taking over extremely inductive analyses. Yet dimension reduction is also highly relevant in data science, and this may actually lead to further developments that embed ordinations deeper into more diverse applications. The biggest hope is that ordinations are developed further, because there are still unsolved questions of model building, of the comparability of evaluative measures, and indeed of deductive interpretations of these models. Solving these would make ordinations highly attractive in times of big data, also since many of these approaches are computationally powerful.