10x Data Science Academy Certification by DataRobot
BY: RYAN ZERNACH
SUMMARY — As a member of the inaugural cohort of DataRobot’s 10x Data Science Academy, how did I learn to more effectively solve problems using machine learning?
PROJECTS
FEATURE IMPORTANCES SHOW “DevType” AS THE MOST INFLUENTIAL FEATURE IN PREDICTING THE SALARY OF A DEVELOPER
BLUEPRINT OF THE MODEL SHOWS THAT THERE ARE MANY TEXT FEATURES THAT ARE BEING NUMERICALLY ENCODED FOR DATA PREP
THE MODEL TENDS TO OVER-PREDICT FOR HIGHER SALARIES & UNDER-PREDICT FOR LOWER SALARIES
THE FEATURE IMPORTANCE MAP (IF YOU ZOOM IN) SHOWS CORRELATION “CLUSTERS” AMONG THE FEATURES THAT ARE COLORED (NOT WHITE), PRIMARILY ORANGES & GREENS
THE BEAUTY OF VISUALIZING THE PREDICTED VALUES VERSUS ACTUAL VALUES — ZOOM IN!
DESCRIPTION
BUILT-IN DASHBOARD FOR MAKING PREDICTIONS USING MODEL
This model predicts a vehicle’s fuel efficiency in miles per gallon (MPG) from the vehicle’s mechanical specifications. However, a growing number of vehicle manufacturers are switching to producing electric vehicles.
If I were to build a model to predict an electric vehicle’s efficiency in miles per kWh (kilowatt-hour), I would not use cylinders, transmissions, exhaust valves, or ethanol, which account for most of the features in the MPG dataset. Instead, I would focus primarily on the laws of physics: weight & aerodynamics. How much does the intended electric vehicle weigh, and how much air resistance does it meet as it slices through the atmosphere? Both questions could be answered by vehicle design software before the vehicle is even prototyped or produced.
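The weight-and-aerodynamics reasoning above can be made concrete with a rough physics sketch. It estimates miles per kWh at a constant speed on a flat road from rolling resistance and aerodynamic drag only. Every number in it (mass, drag coefficient, frontal area, rolling-resistance coefficient, drivetrain efficiency) is an illustrative assumption, not a value from the MPG dataset or from any DataRobot model.

```python
# Rough steady-speed estimate of electric efficiency (miles per kWh).
# All default values are illustrative assumptions, not measured data.

AIR_DENSITY = 1.225        # kg/m^3, air at sea level, about 15 C
GRAVITY = 9.81             # m/s^2
METERS_PER_MILE = 1609.344
JOULES_PER_KWH = 3.6e6


def miles_per_kwh(mass_kg, drag_coefficient, frontal_area_m2,
                  speed_mps, rolling_coefficient=0.01,
                  drivetrain_efficiency=0.9):
    """Estimate miles per kWh on a flat road at constant speed."""
    # Rolling resistance grows with vehicle weight.
    rolling_force = rolling_coefficient * mass_kg * GRAVITY
    # Aerodynamic drag grows with the square of speed.
    drag_force = (0.5 * AIR_DENSITY * drag_coefficient
                  * frontal_area_m2 * speed_mps ** 2)
    # Energy drawn from the battery to travel one mile.
    joules_per_mile = ((rolling_force + drag_force) * METERS_PER_MILE
                       / drivetrain_efficiency)
    return JOULES_PER_KWH / joules_per_mile


# Hypothetical vehicles at about 65 mph (29 m/s).
baseline = miles_per_kwh(1800, 0.24, 2.3, 29.0)
heavier = miles_per_kwh(2400, 0.24, 2.3, 29.0)   # same shape, more weight
boxier = miles_per_kwh(1800, 0.35, 2.3, 29.0)    # same weight, more drag
```

In this toy setup both the heavier and the boxier vehicle come out less efficient than the baseline, which is why weight and drag would be the natural starting features for such a model.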
MULTICOLLINEARITY AMONG FEATURES DOESN’T NECESSARILY HURT MODEL ACCURACY, BUT IT MAKES THE EFFECTS OF CORRELATED FEATURES ON THE TARGET HARDER TO INTERPRET
FEATURE CORRELATION MAP SHOWS 12 DIFFERENT CORRELATION “CLUSTERS” AMONG FEATURES
NOTE: the bolded have_confirmed_18 is the target feature to be predicted
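One way to find correlation clusters like those in the map is to compute pairwise Pearson correlations and group features whose absolute correlation exceeds a threshold. The sketch below does this with the Python standard library only. The feature names, the toy values, and the 0.8 threshold are hypothetical, and this is not necessarily how DataRobot builds its own map.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_clusters(columns, threshold=0.8):
    """Group features linked by |correlation| >= threshold.

    `columns` maps feature name -> list of numeric values.
    Features are merged transitively (single linkage) with union-find.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one feature count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy data: two near-duplicate pairs and one unrelated feature.
toy = {
    "count_a": [1, 2, 3, 4, 5, 6],
    "count_b": [2, 4, 6, 8, 10, 13],
    "temp_c": [20, 18, 25, 17, 22, 19],
    "temp_f": [68.0, 64.4, 77.0, 62.6, 71.6, 66.2],
    "noise": [3, 1, 2, 3, 1, 2],
}
found = correlation_clusters(toy)
```

Within a cluster, the features carry overlapping information, so the importance a model assigns to any single one of them should be read with caution.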
LINKS
CURRICULUM
GOALS
PROCESS
LINKS
THE PATH TO SUCCESSFUL A.I.
DATAROBOT DASHBOARD SCREENSHOTS
Endless Number of Tabs to Learn Everything You’d Ever Want to Know About Your Algorithm’s Model
AutoML Feature Trains 60+ Algorithmic Models on Your Data to Determine the Most Appropriate Match
LINKS
KEY POINTS
Target leakage can be found in datasets of every variety: classification, regression, time series, natural language processing, and computer vision. Target leakage occurs when a model is trained on a feature (a variable or column) whose value would not be known at the time of prediction.
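A simple guard against target leakage is to record when each feature’s value becomes known and keep only the features available at prediction time. The sketch below assumes a hand-maintained mapping from feature name to availability, measured in days relative to the moment of prediction. The feature names and the loan-default scenario are hypothetical.

```python
def split_leaky_features(availability_days, prediction_day=0):
    """Separate usable features from ones that would leak the target.

    `availability_days` maps feature name -> day on which its value
    becomes known, relative to the prediction moment (day 0).
    A feature known only after `prediction_day` cannot be used.
    """
    usable, leaky = [], []
    for name, day in availability_days.items():
        (usable if day <= prediction_day else leaky).append(name)
    return usable, leaky


# Hypothetical loan-default example: the prediction is made at application time.
features = {
    "income": -30,               # reported before the application
    "credit_score": 0,           # pulled at application time
    "collections_calls": 90,     # only exists after a missed payment
    "months_delinquent": 120,    # a consequence of the target itself
}
usable, leaky = split_leaky_features(features)
```

A check like this depends entirely on how accurately the availability of each feature is recorded, so it complements rather than replaces reviewing suspiciously strong features by hand.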
Organizations concerned about risk from machine learning models require interpretability and strong documentation, especially in highly regulated industries such as insurance and banking. Many organizations apply heightened scrutiny to models that might use sensitive characteristics such as race and gender.
LINKS