🤖 GUIDE: MACHINE LEARNING ALGORITHMS
BY: RYAN ZERNACH
SUMMARY — What flavors of machine learning algorithms are there? What are the differences between them? Which instances are most appropriate for each algorithm?
ABC’s — FOUNDATIONAL KNOWLEDGE
THERE ARE MANY DIFFERENT FLAVORS OF MACHINE LEARNING ALGORITHMS…
Like a magic spell, most machine learning algorithms can be called with only a few lines of code. However, each algorithm performs different mathematical computations behind the scenes. Therefore, it's important to know which machine learning algorithm would be best for your problem, determined by the features that are being used, the values that you're trying to predict, and the amount of data that you have.
We'll dig more into when it's best to use each algorithm, but first, here's a diagram illustrating the vast array of "flavors" of machine learning algorithms.
WHAT ARE THE DIFFERENCES BETWEEN REGRESSION & CLASSIFICATION MACHINE LEARNING ALGORITHMS?
When trying to predict a value that is numerically continuous (How much? How many?), such as the price of a home, you would use a regression model.
However, when trying to classify a row into one of n categories, such as detecting objects in an image, you would use a classification model.
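To make that distinction concrete, here's a minimal sketch using scikit-learn's synthetic data generators (the sample sizes and feature counts are arbitrary choices for illustration only):

from sklearn.datasets import make_regression, make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous number (e.g., the price of a home)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict(X_reg[:3]))   # continuous values

# Classification: predict one of n discrete categories
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=42)
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict(X_clf[:3]))   # class labels (0 or 1)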
WHAT ARE THE DIFFERENCES BETWEEN SUPERVISED & UNSUPERVISED MACHINE LEARNING ALGORITHMS?
In instances of supervised machine learning, we have prior knowledge of what the output values for our samples should be.
On the other hand, unsupervised machine learning does not have labeled outputs, so its goal is to infer structure within a set of data points.
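As a quick illustration of that difference, a supervised model is fit on features and labels together, while an unsupervised model only ever sees the features. The tiny arrays below are made up purely for this sketch:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y = np.array([0, 0, 1, 1])                    # labels we already know

supervised = LogisticRegression().fit(X, y)   # learns from X *and* y
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # sees only X
print(supervised.predict(X), unsupervised.labels_)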
123’s — MACHINE LEARNING ALGORITHMS
THIS IS A LIST OF FEWER THAN TEN (10) MACHINE LEARNING ALGORITHMS.
DATAROBOT AUTOMATICALLY TRAINS 60+ ALGORITHMS WITH THE CLICK OF A BUTTON.
LEARN MORE ABOUT MY DATAROBOT EXPERIENCE HERE.
from sklearn.linear_model import LinearRegression
Specifications?
1) Data is relatively linear
2) Instances have several attributes
3) Attributes are not strongly correlated with one another (low multicollinearity)
Applications?
1) Evaluate business trends to make estimates or forecasts
2) Predict housing prices from property characteristics
3) Estimate sales from advertising spend
Colab Notebooks
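Here's a minimal sketch of fitting LinearRegression, using scikit-learn's bundled diabetes dataset as a stand-in for whatever continuous target you're actually predicting:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Built-in dataset: predict a continuous disease-progression score
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
print("Coefficients:", model.coef_)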
from sklearn.linear_model import Ridge
Specifications?
1) Data is relatively linear
2) Features are numerous and/or highly correlated with one another
3) An L2 penalty is needed to shrink coefficients and reduce overfitting
Applications?
1) Same use cases as linear regression, but with many correlated features
Colab Notebooks
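A hedged sketch of Ridge in practice: the alpha below is only a starting guess for the strength of the L2 penalty, and in a real project you'd tune it (for example with RidgeCV or a grid search):

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# alpha controls how strongly the coefficients are shrunk toward zero
model = Ridge(alpha=1.0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Mean cross-validated R^2:", scores.mean())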
from sklearn.linear_model import LogisticRegression
Specifications?
1) Target is a discrete category (binary or multi-class)
2) Predicted outputs are probabilities between 0 and 1
3) Classes are roughly linearly separable in the feature space
Applications?
1) Email spam filtering
2) Customer churn prediction
3) Medical diagnosis (disease vs. no disease)
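A minimal sketch of LogisticRegression on scikit-learn's bundled breast-cancer dataset (the max_iter bump is only there to help the solver converge on this unscaled data):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Built-in binary classification dataset (malignant vs. benign tumors)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
print("Class probabilities:", model.predict_proba(X_test[:3]))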
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
Specifications?
1) Relationships between features and target may be nonlinear
2) An interpretable set of if/then decision rules is desirable
3) Minimal feature scaling or preprocessing is wanted
Applications?
1) Customer churn prediction
2) Document Categorization
3) Classifying News Articles
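A short sketch of DecisionTreeClassifier on the bundled iris dataset; the max_depth=3 cap is an arbitrary choice to keep the tree small and interpretable:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Limiting depth keeps the learned if/then rules readable and resists overfitting
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
print("Feature importances:", model.feature_importances_)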
from sklearn.naive_bayes import GaussianNB
Specifications?
1) Features are assumed conditionally independent given the class
2) Continuous features are assumed to follow a normal (Gaussian) distribution within each class
3) Trains quickly and works reasonably well even with limited data
Applications?
1) Sentiment Analysis
2) Document Categorization
3) Classifying News Articles
4) Email Spam Filtering
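For text applications like the ones above, MultinomialNB on word counts is usually the more natural Naive Bayes variant; the sketch below simply shows GaussianNB on scikit-learn's bundled wine dataset, where its normally-distributed-feature assumption is a better fit:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# GaussianNB assumes each feature is normally distributed within each class
model = GaussianNB()
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))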
from sklearn.cluster import KMeans
Specifications?
1) Data is unlabeled
2) The number of clusters (k) must be chosen up front
3) Clusters are roughly spherical and similar in size
Applications?
1) Unsupervised Categorization
2) Document Categorization
3) Classifying News Articles
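A minimal K-Means sketch on synthetic blobs. Note that no labels are ever passed to the model, and the choice of three clusters is an assumption baked into this example:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled points grouped around three centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(X)
print("Cluster sizes:", np.bincount(labels))
print("Cluster centers:", model.cluster_centers_)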
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
Specifications?
In a random forest, we grow multiple decision trees in one model. To classify a new object, each tree produces a classification, and we say that tree "votes" for that class. The forest then chooses the class with the most votes across all of its trees (for regression, it instead averages the outputs of the individual trees).
In general, a random forest builds multiple trees and combines them to get a more accurate result. Each tree is trained on a random subset of the data, and each split considers only a random subset of the features. Because every tree sees a slightly different view of the data, the combined prediction is usually more accurate and less prone to overfitting than any single tree.
Applications?
1) Feature-importance analysis
2) Document Categorization
3) Classifying News Articles
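A minimal RandomForestClassifier sketch; the 100-tree forest and the bundled breast-cancer dataset are both arbitrary choices for illustration:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 trees, each trained on a bootstrap sample with random feature subsets at each split
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
print("Feature importances:", model.feature_importances_)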
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neighbors import NearestNeighbors
Specifications?
1) Moderate/large training dataset
2) Instances have several attributes
3) Attributes are on comparable scales (or have been normalized)
Applications?
1) Recommendation systems (nearest-neighbor search)
2) Document Categorization
3) Classifying News Articles
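A short KNN sketch on the bundled iris dataset. Because predictions depend on raw distances between points, the features are standardized first; k=5 is just a common default, not a recommendation:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features so no single attribute dominates the distance calculation
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))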
from sklearn.cluster import DBSCAN
Specifications?
Given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away).
Applications?
1) Biotech wearable signal processing/classifying
2) EEG neuroelectrical transmissions
3) Noise reduction/filtering technologies
Click here to view DBSCAN’s CS-Unit-1-Build-Week Implementation in Google Colab
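Separately from the Colab notebook linked above, here's a stand-alone DBSCAN sketch on scikit-learn's synthetic "two moons" data; eps and min_samples are guesses you'd normally tune for your own data:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Two interleaved half-moons: a shape K-Means struggles with but DBSCAN handles well
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = StandardScaler().fit_transform(X)

# eps is the neighborhood radius; min_samples is the density threshold for a core point
model = DBSCAN(eps=0.3, min_samples=5)
labels = model.fit_predict(X)
print("Clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("Points labeled as noise:", int(np.sum(labels == -1)))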
THANKS FOR READING!
CHECK OUT ANOTHER PROJECT OR BLOG POST OF MINE…