Predictive Modeling of Longitudinal Muscle and Bone Adaptations on Explosive Performance in Collegiate Athletes

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Health and Human Performance

Date of Award

Spring 5-1-2026

Abstract

Recent advances in sport science and data analytics have increased interest in applying machine learning techniques to better understand and predict athletic performance outcomes. Explosive lower-body performance, particularly vertical jump ability, is influenced by complex interactions between musculoskeletal composition and biomechanical factors. However, the extent to which these variables can be used to accurately predict explosive performance outcomes remains an area of ongoing investigation.

PURPOSE: The purpose of this study was to evaluate the effectiveness of multiple machine learning algorithms in predicting vertical jump performance outcomes using musculoskeletal and biomechanical variables collected from collegiate athletes.

METHODS: A total of 305 NCAA Division I athletes (age = 20.4 ± 1.8 years) representing multiple sports participated in this longitudinal analysis. Body composition variables were obtained using dual-energy X-ray absorptiometry (DXA), while performance and biomechanical variables were collected using a three-dimensional markerless motion capture system (DARI Motion). The dataset included 1,445 observations collected across repeated testing sessions. Predictive models were developed using several statistical and machine learning approaches, including Linear Regression, Lasso, Elastic Net, K-Nearest Neighbors, Decision Tree, Random Forest, LightGBM, and CatBoost. The dataset was partitioned into training (80%) and testing (20%) subsets. Model performance was evaluated using the coefficient of determination (R²) and root mean square error (RMSE). Additional analyses examined the effects of feature selection, feature engineering, and outlier removal on model performance.

RESULTS: Gradient boosting models demonstrated the strongest predictive performance. The optimized CatBoost model achieved the highest overall accuracy, with R² values of 0.852 for vertical jump height (RMSE = 0.0514) and 0.887 for peak power (RMSE = 495.65), while predictive accuracy for rate of force development (R² = 0.614; RMSE = 2868.27) was comparatively lower across models. Removing statistical outliers reduced predictive accuracy across all outcomes. Feature selection and feature engineering improved model performance in several cases.

CONCLUSIONS: These findings demonstrate that machine learning approaches, particularly gradient boosting algorithms, can effectively model relationships between musculoskeletal characteristics and explosive performance outcomes. Vertical jump height and peak power were predicted with high accuracy, while rate of force development exhibited greater unexplained variability. The results highlight the importance of dataset structure and feature engineering when applying machine learning in sport science and support the integration of predictive analytics into athlete monitoring and performance evaluation systems.
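The 80/20 partition and the two evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the abstract does not state which software was used or whether observations were split at random or by athlete, so the simple random shuffle below is an assumption, and the function names (`train_test_split`, `r_squared`, `rmse`) and the jump-height values are hypothetical and for illustration only.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) subsets.

    Assumption: a plain random split; the abstract does not say
    how the 80/20 partition was actually drawn.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the outcome."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# 1,445 observations split 80/20 gives 1,156 training and 289 testing rows.
train, test = train_test_split(range(1445))
print(len(train), len(test))

# Made-up jump heights and predictions, purely illustrative.
observed = [0.42, 0.55, 0.61, 0.48, 0.70]
predicted = [0.45, 0.52, 0.60, 0.50, 0.66]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

Because RMSE is expressed in the units of the outcome being predicted, RMSE values for jump height, peak power, and rate of force development are not directly comparable with one another, whereas R² is unitless and can be compared across outcomes.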

Advisor

Michael Oldham

Subject Categories

Kinesiology | Life Sciences
