Penalized Regression and Model Selection. Penalized regression methods are assessed here to determine the best fit for data on atmospheric conditions. Initial preprocessing dealt with missing values and collinearity. Alternative methods were then considered. Three penalized regression methods (ridge regression, elastic net, and LASSO) were assessed, with cross-validation used to tune the lambda values. A bootstrap comparison follows to confirm the reduction in variability of the estimated standard deviations. A final cross-validated assessment of the assessment process itself concludes the script.
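As a rough illustration of tuning lambda by cross-validation, the sketch below fits ridge regression with a single predictor and no intercept, then picks the lambda with the lowest k-fold error. It is not the script described above: the synthetic data, the lambda grid, and the helper names (`ridge_slope`, `cv_error`, `tune_lambda`) are all assumptions made for the example, and the one-predictor case is used only because its ridge estimate has a simple closed form.

```python
import random


def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for one predictor, no intercept:
    beta = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared prediction error over k interleaved folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta = ridge_slope(train_x, train_y, lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return total / n


def tune_lambda(xs, ys, grid, k=5):
    """Return the lambda in the grid with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))


# Synthetic data (hypothetical, not the atmospheric data set).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]  # assumed candidate lambda values
best_lam = tune_lambda(xs, ys, grid)
print("selected lambda:", best_lam)
```

Larger lambda values shrink the slope toward zero, which is the variance reduction the bootstrap comparison is meant to confirm. LASSO and elastic net have no closed form of this kind and are normally fitted with an iterative solver.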
Autoregressive Integrated Moving Average. A time series analysis using an autoregressive integrated moving average (ARIMA) model to predict incoming cardiovascular examinations at health centers located in Abbeville, Louisiana. Model performance is compared against an exponential smoothing state space model and evaluated by the Akaike information criterion (AIC).
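The AIC comparison can be sketched as follows. AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, and the model with the lower value is preferred. The log-likelihoods and parameter counts below are hypothetical placeholders, not results from this analysis, and the helper names are assumptions made for the example.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def pick_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params).
    Returns the name with the lowest AIC and the full score table."""
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fitted values, for illustration only.
candidates = {
    "arima": (-412.3, 4),
    "ets": (-415.0, 3),
}
best, scores = pick_by_aic(candidates)
print(best, scores)
```

The comparison is only meaningful when both likelihoods are computed on the same observations, which needs care when the ARIMA model differences the series.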
Variable Selection is intended to choose the “best” subset of predictor variables. Here is a simple illustration of the procedure applied to choose a parsimonious model from atmospheric data collected over the course of several years in a city in Taiwan.
Clustering Methods and Principal Component Analysis. Using a dataset of wines and their characteristics, this analysis assesses both hierarchical and nonhierarchical clustering methods. The results are then used in a principal component analysis.
Logistic Regression. A logistic regression model predicts the future status of loans in order to increase profitability. The data comprise a sample of 50,000 records of loans of less than $35,000, with 30 available variables. Methods include general summary statistics and transformations, visualizations, and an analysis of the model's predictions.