🤖 10x Data Science Academy Certification by DataRobot

BY: RYAN ZERNACH

SUMMARY — As a member of the inaugural cohort of DataRobot's 10x Data Science Academy, what projects did I build using DataRobot? What problems does DataRobot's AutoML platform solve?

PROJECTS


PREDICTING SALARIES OF STACKOVERFLOW USERS

FEATURE IMPORTANCES SHOW "DevType" AS THE MOST INFLUENTIAL FEATURE IN PREDICTING THE SALARY OF A DEVELOPER

BLUEPRINT OF THE MODEL SHOWS THAT THERE ARE MANY TEXT FEATURES THAT ARE BEING NUMERICALLY ENCODED FOR DATA PREP

THE MODEL TENDS TO OVER-PREDICT FOR HIGHER SALARIES & UNDER-PREDICT FOR LOWER SALARIES

ZOOM IN
ZOOM OUT

THE FEATURE IMPORTANCE MAP (IF YOU ZOOM IN) SHOWS THAT THERE IS CORRELATION "CLUSTERS" AMONG THE FEATURES THAT ARE COLORED (NOT WHITE) — MOST PRIMARILY ORANGES & GREENS

THE BEAUTY OF VISUALIZING THE PREDICTED VALUES VERSUS ACTUAL VALUES — ZOOM IN!

DESCRIPTION

ZOOM IN
ZOOM OUT

BUILT-IN DASHBOARD FOR MAKING PREDICTIONS USING MODEL

GENERATING FUEL EFFICIENCY PREDICTIONS FOR FUTURE VEHICLES

This prediction algorithm can be used to predict a vehicle's fuel efficiency, miles-per-gallon (MPG), based on the vehicle's mechanical specifications. However, a greater number of vehicle manufacturers are continuing to make the switch to producing electric vehicles.

If I were to generate a model to predict an electric vehicle's miles-per-kWh (kilowatt hour) electric efficiency, I would not use cylinders, transmissions, exhaust valves, nor ethanol — which are most of the features within the MPG dataset. Instead, I would primarily be focused on the laws of physics: weight & aerodynamics. How much does the intended electric vehicle weigh, and how much friction is caused with air particles as the vehicle slices through the atmosphere? These are all questions that could be answered by vehicle design software before it's even prototyped/produced.

BUILT-IN DASHBOARD FOR MAKING PREDICTIONS USING MODEL

CREATING PREDICTIONS FOR WHICH COUNTY WILL NEXT BE INFECTED WITH COVID-19

B

B

B

B

MULTICOLLINEARITY AMONGST FEATURES DOESN'T HAVE A NEGATIVE EFFECT ON MODEL ACCURACY, BUT IT REDUCES INTERPRETABILITY OF THE EFFECTS OF CORRELATED FEATURES ON THE TARGET ACCURACY

FEATURE CORRELATION MAP SHOWS (12) DIFFERENT CORRELATION "CLUSTERS" AMONG FEATURES

NOTE: bolded have_confirmed_18 s the target feature to be predicted

B

B

B

B

REDUCING BIAS IN THE HIRING PROCESS

B

B

B

B

MULTICOLLINEARITY AMONGST FEATURES DOESN'T HAVE A NEGATIVE EFFECT ON MODEL ACCURACY, BUT IT REDUCES INTERPRETABILITY OF THE EFFECTS OF CORRELATED FEATURES ON THE TARGET ACCURACY

FEATURE CORRELATION MAP SHOWS (12) DIFFERENT CORRELATION "CLUSTERS" AMONG FEATURES

NOTE: bolded have_confirmed_18 s the target feature to be predicted

B

B

DETECTING CHURN BEFORE IT HAPPENS TO PREVENT IT FROM HAPPENING

B

B

B

B

MULTICOLLINEARITY AMONGST FEATURES DOESN'T HAVE A NEGATIVE EFFECT ON MODEL ACCURACY, BUT IT REDUCES INTERPRETABILITY OF THE EFFECTS OF CORRELATED FEATURES ON THE TARGET ACCURACY

FEATURE CORRELATION MAP SHOWS (12) DIFFERENT CORRELATION "CLUSTERS" AMONG FEATURES

NOTE: bolded have_confirmed_18 s the target feature to be predicted

B

CATEGORIZING CONSUMER COMPLAINTS USING NATURAL LANGUAGE PROCESSING

B

B

B

B

MULTICOLLINEARITY AMONGST FEATURES DOESN'T HAVE A NEGATIVE EFFECT ON MODEL ACCURACY, BUT IT REDUCES INTERPRETABILITY OF THE EFFECTS OF CORRELATED FEATURES ON THE TARGET ACCURACY

FEATURE CORRELATION MAP SHOWS (12) DIFFERENT CORRELATION "CLUSTERS" AMONG FEATURES

NOTE: bolded have_confirmed_18 s the target feature to be predicted

B

CURRICULUM


WEEK 1 — GETTING STARTED

THE PATH TO SUCCESSFUL A.I.

DATAROBOT DASHBOARD SCREENSHOTS

Endless Number of Tabs to Learn Everything You'd Ever Want to Know About Your Algorithm's Model

AutoML Feature Trains 60+ Algorithmic Models on Your Data to Determine the Most Appropriate Match

WEEK 2 – EXPLAIN & EVALUATE YOUR MODEL
WEEK 3 – TARGET LEAKAGE & PARTITIONING

KEY POINTS

Target leakage can be found in datasets of every variety: classification, regression, time series, natural language processing, and computer vision. Target leakage is when a variable/feature/column is included in a model, but it's a variable that's not known at the time of prediction.

Organizations that are concerned about risks from machine learning models require interpretability and strong documentation--especially for highly regulated institutions such as insurance and banking. Many organizations exert heightened scrutiny around models that might have sensitive characteristics such as race and gender.

LINKS

WEEK 4 – BIAS, TRUST, & ETHICS

KEY POINTS

LINKS

WEEK 5 – MODEL DEPLOYMENT

KEY POINTS

LINKS

WEEK 6 – BEYOND AUTOPILOT

KEY POINTS

LINKS

WEEK 7 – MULTI-MODAL DATA / IMAGES-GEOSPATIAL / DEEP LEARNING

KEY POINTS

LINKS

WEEK 8 – UNDERSTANDING HOW TO USE TIME

KEY POINTS

LINKS