Predicting Life Expectancy
Intro
The focus here is on EDA (Exploratory Data Analysis) and on finding the best value of the regularisation hyperparameter for LASSO and Ridge regression.
We will be working with the Life Expectancy CSV data obtained from the WHO.
Peeking at Data
We begin by viewing the columns of the Life Expectancy Dataframe:
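A minimal sketch of that step, assuming the CSV is saved locally as 'Life Expectancy Data.csv' and loaded into a dataframe called life_df (both names are assumptions):

import pandas as pd

# Load the WHO life expectancy data; adjust the path to wherever the CSV lives.
life_df = pd.read_csv("Life Expectancy Data.csv")

# List the columns -- note the stray leading/trailing spaces the raw file ships with.
print(life_df.columns)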
Index(['Country', 'Year', 'Status', 'Life expectancy ', 'Adult Mortality', 'infant deaths', 'Alcohol', 'percentage expenditure', 'Hepatitis B', 'Measles ', ' BMI ', 'under-five deaths ', 'Polio', 'Total expenditure', 'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population', ' thinness 1-19 years', ' thinness 5-9 years', 'Income composition of resources', 'Schooling'], dtype='object')
We can then view the range of our life expectancy values with a box plot:
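Something along these lines with matplotlib (a sketch, not necessarily the exact plotting code used):

import matplotlib.pyplot as plt

# Box plot of the target column; note the trailing space in 'Life expectancy '.
plt.boxplot(life_df["Life expectancy "].dropna())
plt.ylabel("Life expectancy (years)")
plt.title("Distribution of life expectancy")
plt.show()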

From here we can glean:
- the outlying individuals who died early, in their late 30s to early 40s;
- the minimum and maximum excluding these outliers (the whiskers);
- the first and third quartiles; and
- the mean life expectancy.
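The dataframe's column types and missing-value counts are also worth a peek; the summary below comes from pandas' info():

# Per-column dtypes and non-null counts.
life_df.info()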
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2938 entries, 0 to 2937
Data columns (total 22 columns):
 #   Column                           Non-Null Count  Dtype
---  ------                           --------------  -----
 0   Country                          2938 non-null   object
 1   Year                             2938 non-null   int64
 2   Status                           2938 non-null   object
 3   Life expectancy                  2928 non-null   float64
 4   Adult Mortality                  2928 non-null   float64
 5   infant deaths                    2938 non-null   int64
 6   Alcohol                          2744 non-null   float64
 7   percentage expenditure           2938 non-null   float64
 8   Hepatitis B                      2385 non-null   float64
 9   Measles                          2938 non-null   int64
 10  BMI                              2904 non-null   float64
 11  under-five deaths                2938 non-null   int64
 12  Polio                            2919 non-null   float64
 13  Total expenditure                2712 non-null   float64
 14  Diphtheria                       2919 non-null   float64
 15  HIV/AIDS                         2938 non-null   float64
 16  GDP                              2490 non-null   float64
 17  Population                       2286 non-null   float64
 18  thinness 1-19 years              2904 non-null   float64
 19  thinness 5-9 years               2904 non-null   float64
 20  Income composition of resources  2771 non-null   float64
 21  Schooling                        2775 non-null   float64
dtypes: float64(16), int64(4), object(2)
memory usage: 505.1+ KB
Correlations
To produce a correlation matrix we need exclusively continuous (numeric) values, so I have dropped the columns that are not:
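A sketch of that step (one reasonable way to do it; the exact columns dropped may have differed):

# Keep only numeric columns, then compute the pairwise Pearson correlations.
numeric_df = life_df.select_dtypes(include="number")
corr = numeric_df.corr()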
Note: we only need to view the lower (or upper) triangular section of the matrix due to its symmetry.

Letting our libraries continue doing the heavy lifting for us:
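For instance, a seaborn heat map over the corr matrix from the sketch above, with the redundant upper triangle masked out (seaborn and the styling choices are assumptions):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Mask the upper triangle of the symmetric matrix.
mask = np.triu(np.ones_like(corr, dtype=bool))

plt.figure(figsize=(12, 10))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.title("Correlation matrix of numeric features")
plt.show()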

From here we learn that Adult Mortality is strongly negatively correlated with Life Expectancy (which makes sense). The three features most positively correlated with Life Expectancy are:
- HIV/AIDS: 0.753
- Income composition of resources: 0.729
- Schooling: 0.722
All of which are interesting in their own right.
Standard Scaling
Because we will be using regularised regression, and the penalties on the weights are affected by the magnitudes of the features, we must first standardise our data:
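A sketch with scikit-learn's StandardScaler; dropping rows with missing values here is an assumption (the original preprocessing may have imputed instead), and 'Life expectancy ' is the target column:

from sklearn.preprocessing import StandardScaler

# Separate target and features; drop rows with missing values for simplicity.
clean_df = numeric_df.dropna()
y = clean_df["Life expectancy "].to_numpy()
X = clean_df.drop(columns=["Life expectancy "]).to_numpy()

# Standardise every feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Feature Means:", X_scaled.mean(axis=0))
print("Feature Variances:", X_scaled.var(axis=0))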
Feature Means: [-7.96290464e-15 -7.75607595e-17 -4.30893108e-18 6.89428973e-17 2.58535865e-17 1.63739381e-16 -4.30893108e-18 -6.89428973e-17 -2.58535865e-17 2.57458632e-16 1.68048312e-16 1.29267933e-17 6.03250352e-17 8.61786217e-17 -3.01625176e-17 5.60161041e-17 -7.75607595e-17 1.72357243e-16 7.92843319e-16]
Feature Variances: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Splitting Training and Test Data
Now we split the data into training and test sets:
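For example, with scikit-learn's train_test_split (the 80/20 ratio and the seed are assumptions):

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)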
Empiricism on Lambda (Ridge)
We now train Ridge regression models with varying values of the hyperparameter lambda:
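A sketch of the sweep; scikit-learn calls the regularisation strength alpha, and the grid of candidate values below is an assumption:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Candidate lambda values, including 0 (plain least squares).
lambdas = np.linspace(0, 10, 101)

ridge_mses = []
for lam in lambdas:
    model = Ridge(alpha=lam)
    model.fit(X_train, y_train)
    ridge_mses.append(mean_squared_error(y_test, model.predict(X_test)))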

We then find the best Lambda:
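Picking whichever lambda minimises the held-out MSE:

best_ridge_lambda = lambdas[int(np.argmin(ridge_mses))]
print("Best Ridge lambda:", best_ridge_lambda)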

Empiricism on Lambda (LASSO)
We then repeat for our L1 regularisation model:
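The same sweep with scikit-learn's Lasso swapped in for Ridge (max_iter raised since coordinate descent can be slow to converge; sklearn warns when alpha is exactly 0):

from sklearn.linear_model import Lasso

lasso_mses = []
for lam in lambdas:
    model = Lasso(alpha=lam, max_iter=10_000)
    model.fit(X_train, y_train)
    lasso_mses.append(mean_squared_error(y_test, model.predict(X_test)))

best_lasso_lambda = lambdas[int(np.argmin(lasso_mses))]
print("Best LASSO lambda:", best_lasso_lambda)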

Thus we find the optimal lambda to be…

Results
It seems that in both cases something has gone wrong: the LASSO and Ridge hyperparameters are both being found to be 0. A quick fit of the training data with sklearn's LassoCV model may help clear up some of the confusion:
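A sketch of that fit (cv=5 matches the parameter dump below; the rest of the call is an assumption). Note that get_params() returns the constructor arguments, which is what "With Lambda as" prints here; the alpha actually chosen by cross-validation is stored in the fitted model's alpha_ attribute:

from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

# 5-fold cross-validated LASSO; sklearn searches its own alpha grid.
lasso_cv = LassoCV(cv=5)
lasso_cv.fit(X_train, y_train)

print("Mean Squared Error (TRAIN):", mean_squared_error(y_train, lasso_cv.predict(X_train)))
print("Mean Squared Error (TEST) :", mean_squared_error(y_test, lasso_cv.predict(X_test)))

# get_params() shows the constructor settings, not the fitted alpha_ value.
print("With Lambda as", lasso_cv.get_params())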
Mean Squared Error (TRAIN): 62.824553484016874
Mean Squared Error (TEST) : 68.37568737602082
With Lambda as {'alphas': None, 'copy_X': True, 'cv': 5, 'eps': 0.001, 'fit_intercept': True, 'max_iter': 1000, 'n_alphas': 100, 'n_jobs': None, 'positive': False, 'precompute': 'auto', 'random_state': None, 'selection': 'cyclic', 'tol': 0.0001, 'verbose': False}
Conclusion
The above shows that our analysis was not incorrect in attempting to determine the optimal lambdas, but rather that a mistake occurred in the preprocessing step, muddling the statistical significance of the data. In a later refactoring I may come back and improve the preprocessing, or abandon it completely in favour of a more homogeneous dataset.
Train Accuracy: -8.965044425866656
Test Accuracy: -7.17445447809699