Glossary

Median:

The median of a variable in a sample is the “middle” value: half of the observations have a higher value and half have a lower value.
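As a minimal sketch (Python with NumPy, using a hypothetical sample; neither is prescribed by this glossary):

    import numpy as np

    ages = np.array([7, 9, 10, 12, 14])  # hypothetical sample of ages
    print(np.median(ages))               # 10.0: two observations lie below, two above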

Regression:

Regression is a statistical method used to determine the direction and strength of the relationship between a dependent variable and one or more explanatory variables. The general form of a multiple linear regression is:

Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u

Where:

  • Y is the dependent variable the model tries to explain
  • X1, X2, X3, ... are the variables used to explain Y (explanatory or independent variables)
  • a is the intercept
  • b1, b2, b3, ... are the slope parameters
  • u is the regression residual, i.e. the part of Y not explained by the model

A regression model can include variables measured on different scales (nominal, ordinal, interval or ratio scale).

The values of the slope parameters and the intercept are then estimated from a data set which contains values of the dependent and the explanatory variables for a sample of observations. A statistical method, most commonly ordinary least squares, is applied to identify the values of the intercept and slope parameters which best “fit” the model, i.e. which minimise the sum of squared residuals.
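As an illustrative sketch of this estimation step, assuming Python with NumPy and statsmodels and simulated data (none of which are prescribed by this glossary), ordinary least squares can be run as follows:

    import numpy as np
    import statsmodels.api as sm

    # Simulate 100 observations with known parameters: a = 1.5, b1 = 2.0, b2 = -0.5
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                                    # X1, X2
    Y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)   # u is the noise term

    X = sm.add_constant(X)        # adds a column of ones for the intercept a
    fit = sm.OLS(Y, X).fit()      # minimises the sum of squared residuals
    print(fit.params)             # estimated a, b1, b2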

Logistic regression:

Logistic regression models are used when the dependent variable in a multivariate analysis is binary (i.e. can take only the values 0 or 1). Logistic regression is useful for analysing the probability of a certain event (e.g. the probability that a child stops doing hazardous work after receiving support) based on one or several explanatory variables. In a logistic regression, the equation which models the relationship between the dependent and the explanatory variables is non-linear, which accounts for the fact that the dependent variable can only take the values zero and one: the model's predictions are probabilities bounded between zero and one. The explanatory variables in a logistic regression model can be measured on any scale, and variables measured on different scales can be included in the same model.
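A minimal sketch of fitting such a model, again assuming Python with NumPy, statsmodels and simulated data:

    import numpy as np
    import statsmodels.api as sm

    # Simulate a binary outcome whose probability depends on one explanatory variable
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))     # logistic (non-linear) link keeps p in (0, 1)
    y = rng.binomial(1, p)                     # dependent variable: 0 or 1

    fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
    print(fit.params)                          # coefficients on the log-odds scale
    print(fit.predict(np.array([[1.0, 0.0]]))) # predicted probability at x = 0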

Multicollinearity:

In regression analysis, multicollinearity occurs when two or more explanatory variables in the dataset are highly correlated. Regression analysis is based on the idea that the value of one independent variable can be changed while the values of all other independent variables are held fixed; when changes in one explanatory variable are associated with changes in another, it becomes difficult to estimate the relationship between each explanatory variable and the dependent variable independently. Strong correlations between explanatory variables can therefore cause problems when fitting the regression model and interpreting the results.
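One common diagnostic (a sketch, not something this glossary prescribes) is the variance inflation factor, available in statsmodels:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Simulate two explanatory variables that are almost copies of each other
    rng = np.random.default_rng(2)
    x1 = rng.normal(size=100)
    x2 = x1 + rng.normal(scale=0.1, size=100)
    X = sm.add_constant(np.column_stack([x1, x2]))

    # A VIF well above 10 is a conventional warning sign of multicollinearity
    for i in (1, 2):
        print(variance_inflation_factor(X, i))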