The data were analyzed with a multinomial logistic regression model for ordered data (ordinal scale), using the cumulative logit link (PROC GENMOD, SAS Institute). Training level, profession of the data collector, and age were entered as continuous variables. Type of horse, discipline/intended use, sex, and site of collection were entered as fixed effects. Data collector was included as a repeated factor because differences between collectors were expected.
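To illustrate what a cumulative logit link implies, the following minimal Python sketch converts a linear predictor into probabilities for ordered categories. It is not the SAS procedure used in the study. The function name, the threshold values, and the sign convention logit P(Y ≤ j) = θ_j − η are assumptions made for illustration; sign conventions differ between software packages.

```python
import math


def cumulative_logit_probs(thresholds, eta):
    """Category probabilities under a cumulative logit model.

    Assumed convention (illustrative only): logit P(Y <= j) = thresholds[j] - eta,
    so a larger linear predictor eta shifts probability towards higher categories.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")

    # Cumulative probabilities P(Y <= j) for every category except the last.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # the highest category closes the distribution

    # Differences of successive cumulative probabilities give P(Y = j).
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical thresholds for a four-category ordinal outcome.
baseline = cumulative_logit_probs([-1.0, 0.0, 1.0], eta=0.0)
shifted = cumulative_logit_probs([-1.0, 0.0, 1.0], eta=2.0)
```

With three thresholds the function returns four probabilities that sum to one. Comparing `baseline` with `shifted` shows how a single coefficient moves the whole distribution along the ordinal scale, which is the proportional odds assumption behind this type of model.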
After an initial multinomial logistic regression analysis, “profession of data collector” was excluded because it was very highly correlated with “site of collection/type of consent”. Because of confounding between factors, the remaining factors were retained in the model even when they were not significant.
An additional analysis was performed in which BCS categories were merged into three groups: ideal (BCS 5 or 6), above ideal (BCS > 6), and below ideal (BCS < 5). The rationale was that the ordinal scale of BCS may not reflect the same mechanisms in horses scoring below the ideal BCS as in horses scoring above the ideal BCS. Mechanisms that increase the risk of being below ideal are likely different from the mechanisms that decrease the risk of being above ideal.
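The three-group recoding can be sketched as a small Python function. The function name and the group labels are illustrative, and treating any score from 5 to 6 inclusive (including a half score such as 5.5, if the scale allows one) as ideal is an assumption.

```python
def bcs_group(bcs):
    """Map a body condition score to one of three groups.

    Cut-offs follow the text: below ideal (BCS < 5), ideal (BCS 5 or 6),
    above ideal (BCS > 6). Scores between 5 and 6 are assumed to be ideal.
    """
    if bcs < 5:
        return "below ideal"
    if bcs > 6:
        return "above ideal"
    return "ideal"


# Hypothetical scores, for illustration only.
example_scores = [3, 5, 6, 7, 9]
example_groups = [bcs_group(score) for score in example_scores]
```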
First, simple contingency tables were constructed, with p-values based on chi-square tests (Fisher’s exact test was used if any cell contained fewer than five observations). A multinomial logistic regression model (PROC GLIMMIX, SAS Institute) was then fitted, with ideal BCS as the reference category and below and above ideal BCS as the other categories. The modelling approach followed the same procedure as the ordinal analyses. Data collector was included as a random effect.
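The choice between the two tests for a contingency table can be sketched in Python as follows. The function names and the example counts are hypothetical. The sketch computes only the Pearson chi-square statistic and its degrees of freedom, not the p-value, and it assumes that every row and column total is greater than zero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    `table` is a list of rows of observed counts. Every row and column
    total is assumed to be positive, otherwise an expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def needs_fisher(table, min_count=5):
    """True if any cell holds fewer than `min_count` observations."""
    return any(cell < min_count for row in table for cell in row)


# Hypothetical counts, for illustration only.
dense_table = [[10, 20], [20, 10]]
sparse_table = [[3, 20], [20, 10]]
```

Here `needs_fisher(dense_table)` is false, so the chi-square statistic would be used, whereas `sparse_table` contains a cell with three observations and would be routed to Fisher's exact test.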
For explanatory variables with more than two levels and a p-value below 0.10 for the overall effect, further analyses were performed to investigate whether any groups differed significantly from the other groups. This was done stepwise: the group whose BCS differed most from the average BCS was tested against the remaining groups. If this difference was significant, the procedure was repeated, comparing the next group with the remaining groups. The procedure continued until a comparison was no longer significant.
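One possible reading of this stepwise procedure is sketched below in Python. Several details are assumptions rather than statements from the text: groups are ordered once by the absolute difference between their mean BCS and the overall mean, a group found to differ is removed before the next comparison, the significance threshold `alpha` defaults to 0.05, and the test itself is supplied by the caller as a hypothetical `p_value_fn`.

```python
from statistics import mean


def stepwise_group_comparison(groups, p_value_fn, alpha=0.05):
    """Return the groups that differ from the rest, in the order tested.

    groups:      dict mapping a group name to a list of BCS values
    p_value_fn:  caller-supplied function (hypothetical) that takes two
                 lists of values and returns a p-value
    alpha:       significance threshold (assumed, not given in the text)
    """
    overall = mean(v for values in groups.values() for v in values)

    # Order groups by how far their mean lies from the overall mean.
    remaining = sorted(
        groups,
        key=lambda name: abs(mean(groups[name]) - overall),
        reverse=True,
    )

    distinct = []
    while len(remaining) > 1:
        candidate = remaining[0]
        # Pool all values from the groups not yet singled out.
        rest = [v for name in remaining[1:] for v in groups[name]]
        if p_value_fn(groups[candidate], rest) >= alpha:
            break  # stop at the first non-significant comparison
        distinct.append(candidate)
        remaining = remaining[1:]
    return distinct
```

In practice `p_value_fn` would wrap whatever test or model contrast is appropriate for the data; the loop only encodes the order of comparisons and the stopping rule.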