TENSORFLOW DEVELOPER CERTIFICATION

Although I worked with TensorFlow at Lambda School, I wanted to earn an official TensorFlow Developer certification, so I've been using Coursera to study for the exam.

CERTIFICATE OF COMPLETION
TERMINOLOGY & TOOLS

Loss Functions — measure how good or bad the model's prediction was (see the worked sketch after this list)

  • 'mean_squared_error' — measures the average of the squares of the errors, i.e. the average squared difference between the estimated values and the actual values
  • 'sparse_categorical_crossentropy' — measures the dissimilarity between the distribution of observed class labels and the predicted probabilities of class membership. Categorical refers to the possibility of having more than two classes (instead of binary, which refers to two classes). Sparse refers to the labels being single integers from zero to the number of classes minus one
  • 'binary_crossentropy' — crossentropy for binary (two-class) classification, used with a single sigmoid output and labels of 0 or 1
  • 'categorical_crossentropy' — crossentropy for multi-class classification where the labels are one-hot encoded (contrast with sparse above, which takes integer labels)
  • Huber — less sensitive to outliers than squared-error loss functions: it is quadratic for small errors and linear for large ones
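
To make the differences concrete, here's a minimal sketch (my own, not course code) that evaluates a few of these losses on toy tensors:

import tensorflow as tf

y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([1.5, 1.5, 4.0])

# MSE: mean of squared errors -> (0.25 + 0.25 + 1.0) / 3 = 0.5
print(tf.keras.losses.MeanSquaredError()(y_true, y_pred).numpy())

# Huber: quadratic for small errors, linear beyond its delta,
# so big errors are penalized less harshly than in MSE
print(tf.keras.losses.Huber()(y_true, y_pred).numpy())

# Sparse categorical crossentropy: labels are plain integers
labels = tf.constant([0, 2])
probs = tf.constant([[0.9, 0.05, 0.05],
                     [0.1, 0.1, 0.8]])
print(tf.keras.losses.SparseCategoricalCrossentropy()(labels, probs).numpy())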

Optimizers — use feedback from the loss function to adjust the model's weights, so the next prediction is better than the last (see the sketch after this list)

  • sgd (stochastic gradient descent) — updates the weights step by step, following the gradient of the loss computed on each batch of data
  • adam — an adaptive variant of gradient descent that tunes the learning rate per parameter; a good default choice
  • RMSProp(lr=0.001) — if gradient descent is not working (if the cost J(θ) is increasing instead of decreasing), use a smaller learning rate; but if it's too small, it'll take a very long time to converge! Here's a video by Andrew Ng to demonstrate how to choose the proper learning rate.
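
Here's a sketch (a minimal model of my own) of how an optimizer and its learning rate get wired in at compile time:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=[1])])

# Plain stochastic gradient descent, referenced by name
model.compile(optimizer='sgd', loss='mean_squared_error')

# Adam: a good default, adapts the learning rate per parameter
model.compile(optimizer='adam', loss='mean_squared_error')

# RMSprop with an explicit learning rate; newer TF versions spell the
# argument learning_rate= where older course code used lr=
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss='mean_squared_error')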


Layers — define the output of a node given an input or set of inputs (see the sketch after this list)

  • Dense — the most common, regular, fully connected layer: computes output = activation(dot(input, kernel) + bias) on the input and returns the output
  • Conv2D — 2-dimensional convolution layer (e.g. spatial convolution over images) that filters for the most important features that determine the output
  • MaxPooling2D — max pooling operation for 2-dimensional spatial data: compresses the image while emphasizing features, by keeping only the maximum value in each pooling window
  • Conv1D — 1-dimensional convolution layer (e.g. temporal convolution over a sequence such as a sentence or time series)
  • SimpleRNN — a fully connected recurrent layer where the output of each step is fed back in as input to the next step
  • Lambda — wraps an arbitrary expression (such as a Python lambda) as a layer, handy for quick transformations inside a model
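
A minimal sketch wiring several of these layer types into one image classifier (the shapes are my assumption, sized for 28x28 greyscale images like Fashion MNIST):

import tensorflow as tf

model = tf.keras.Sequential([
    # filter for the most important spatial features
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # compress, keeping only the strongest activation in each 2x2 window
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    # regular, fully connected layers
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()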

Activation Functions

  • tf.nn.relu — effectively means "If X > 0 return X, else return 0", so it only passes values of 0 or greater to the next layer in the network
  • tf.nn.softmax — turns a set of values into probabilities that all sum to one, effectively picking the biggest one. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], softmax saves you from fishing through it looking for the biggest value: it turns it into [0, 0, 0, 0, 1, 0, 0, 0, 0]. The goal is to save a lot of coding!
  • 'sigmoid' — produces a result from 0 to 1, which is great for binary classifiers
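
A quick sketch showing these activations applied directly to tensors (the values are arbitrary):

import tensorflow as tf

x = tf.constant([-2.0, 0.0, 3.0])
print(tf.nn.relu(x).numpy())                        # [0. 0. 3.]: negatives zeroed

logits = tf.constant([0.1, 0.1, 9.5, 0.1])
probs = tf.nn.softmax(logits)
print(probs.numpy())                                # biggest logit -> ~1.0
print(tf.reduce_sum(probs).numpy())                 # probabilities sum to 1.0

print(tf.math.sigmoid(tf.constant([0.0])).numpy())  # [0.5]: squashed into (0, 1)
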
WEEK 1 — A NEW PROGRAMMING PARADIGM

QUIZ

Question 1 — The diagram for traditional programming had Rules and Data In, but what came out?

Answers

Question 2 — The diagram for Machine Learning had Answers and Data In, but what came out?

Rules

Question 3 — When I tell a computer what the data represents (i.e. this data is for walking, this data is for running), what is that process called?

Labelling

Question 4 — What is a Dense?

A layer of connected neurons

Question 5 — What does a Loss function do?

Measures how good the current ‘guess’ is

Question 6 — What does the optimizer do?

Generates a new and improved guess

Question 7 — What is Convergence?

The process of getting very close to the correct answer

Question 8 — What does model.fit do?

It trains the neural network to fit one set of values to another

WEEK 2 — INTRODUCTION TO COMPUTER VISION

QUIZ

Question 1 — What’s the name of the dataset of Fashion images used in this week’s code?

Fashion MNIST

Question 2 — What do the above mentioned Images look like?

28x28 Greyscale

Question 3 — How many images are in the Fashion MNIST dataset?

70,000

Question 4 — Why are there 10 output neurons?

There are 10 different labels

Question 5 — What does Relu do?

It only returns x if x is greater than zero

Question 6 — Why do you split data into training and test sets?

To test a network with previously unseen data

Question 7 — What method gets called when an epoch finishes?

on_epoch_end

Question 8 — What parameter do you set in your fit function to tell it to use callbacks?

callbacks=
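
Questions 7 and 8 refer to the callback pattern from this week. A sketch of how the two answers fit together (the accuracy threshold is my own choice):

import tensorflow as tf

class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # stop once training accuracy passes 95%
        if logs.get('accuracy', 0) > 0.95:
            print('\nReached 95% accuracy, cancelling training')
            self.model.stop_training = True

# model.fit(x_train, y_train, epochs=10, callbacks=[MyCallback()])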

WEEK 3 — ENHANCING VISION WITH CONVOLUTIONAL NEURAL NETWORKS

QUIZ

Question 1 — What is a Convolution?

A technique to isolate features in images

Question 2 — What is a Pooling?

A technique to reduce the information in an image while maintaining features

Question 3 — How do Convolutions improve image recognition?

They isolate features in images

Question 4 — After passing a 3x3 filter over a 28x28 image, how big will the output be?

26x26

Question 5 — After max pooling a 26x26 image with a 2x2 filter, how big will the output be?

13x13
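
Both shape answers are easy to verify with model.summary() on a throwaway model (a sketch, with a 28x28 greyscale input assumed):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),  # -> (None, 26, 26, 64)
    tf.keras.layers.MaxPooling2D(2, 2),               # -> (None, 13, 13, 64)
])
model.summary()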

Question 6 — Applying Convolutions on top of our Deep neural network will make training:

It depends on many factors. It might make your training faster or slower, and a poorly designed Convolutional layer may even be less efficient than a plain DNN!

WEEK 4 — USING REAL-WORLD IMAGES

QUIZ

Question 1 — Using Image Generator, how do you label images?

It’s based on the directory the image is contained in

Question 2 — What method on the Image Generator is used to normalize the image?

rescale

Question 3 — How did we specify the training size for the images?

The target_size parameter on the training generator
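
A sketch of how rescale and target_size come together on the generator (the directory path is a placeholder):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# rescale normalizes pixel values from the 0-255 range down to 0-1
train_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'horse-or-human/',        # labels come from the subdirectory names
    target_size=(300, 300),   # images are resized as they are loaded
    batch_size=128,
    class_mode='binary')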

Question 4 — When we specify the input_shape to be (300, 300, 3), what does that mean?

Every Image will be 300x300 pixels, with 3 bytes to define color

Question 5 — If your training data is close to 1.000 accuracy, but your validation data isn’t, what’s the risk here?

You’re overfitting on your training data

Question 6 — Convolutional Neural Networks are better for classifying images like horses and humans because:

All of the above (The distinguishable features may be in different parts of the frame, There’s a wide variety of horses, There’s a wide variety of humans)

Question 7 — After reducing the size of the images, the training results were different. Why?

We removed some convolutions to handle the smaller images

CERTIFICATE OF COMPLETION
TERMINOLOGY & TOOLS
WEEK 1 — EXPLORING A LARGER DATASET

QUIZ

Question 1 — What does flow_from_directory give you on the ImageGenerator?

All of the Above (ability to easily load images for training, ability to pick the size of training images, ability to automatically label images based on their directory name)

Question 2 — If my Image is sized 150x150, and I pass a 3x3 Convolution over it, what size is the resulting image?

148x148

Question 3 — If my data is sized 150x150, and I use Pooling of size 2x2, what size will the resulting image be?

75x75

Question 4 — If I want to view the history of my training, how can I access it?

Create a variable ‘history’ and assign it to the return of model.fit or model.fit_generator

Question 5 — What’s the name of the API that allows you to inspect the impact of convolutions on the images?

The model.layers API

Question 6 — When exploring the graphs, the loss levelled out at about .75 after 2 epochs, but the accuracy climbed close to 1.0 after 15 epochs. What's the significance of this?

There was no point training after 2 epochs, as we overfit to the training data

Question 7 — Why is the validation accuracy a better indicator of model performance than training accuracy?

The validation accuracy is based on images that the model hasn't been trained with, and thus a better indicator of how the model will perform with new images

Question 8 — Why is overfitting more likely to occur on smaller datasets?

Because there's less likelihood of all possible features being encountered in the training process.

WEEK 2 — AUGMENTATION: A TECHNIQUE USED TO AVOID OVERFITTING

QUIZ

Question 1 — How do you use Image Augmentation in TensorFlow?

Using parameters to the ImageDataGenerator

Question 2 — If my training data only has people facing left, but I want to classify people facing right, how would I avoid overfitting?

Use the ‘horizontal_flip’ parameter

Question 3 — When training with augmentation, you noticed that the training is a little slower. Why?

Because the image processing takes cycles

Question 4 — What does the fill_mode parameter do?

It attempts to recreate lost information after a transformation like a shear

Question 5 — When using Image Augmentation with the ImageDataGenerator, what happens to your raw image data on-disk?

Nothing, all augmentation is done in-memory
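
A sketch of the augmentation parameters these questions cover, all passed to the ImageDataGenerator (the values are typical ones from the course, not requirements):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# transforms happen in memory at load time; files on disk are untouched
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,     # e.g. make left-facing people face right
    fill_mode='nearest')      # fill in pixels lost to shears/shifts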

Question 6 — How does Image Augmentation help solve overfitting?

It manipulates the training set to generate more scenarios for features in the images

Question 7 — When using Image Augmentation my training gets...

Slower

Question 8 — Using Image Augmentation effectively simulates having a larger data set for training.

True

WEEK 3 — TRANSFER LEARNING

QUIZ

Question 1 — If I put a dropout parameter of 0.2, how many nodes will I lose?

20% of them

Question 2 — Why is transfer learning useful?

Because I can use the features that were learned from large datasets that I may not have access to

Question 3 — How did you lock or freeze a layer from retraining?

layer.trainable = False

Question 4 — How do you change the number of classes the model can classify when using transfer learning? (i.e. the original model handled 1000 classes, but yours handles just 2)

When you add your DNN at the bottom of the network, you specify your output layer with the number of classes you want

Question 5 — Can you use Image Augmentation with Transfer Learning Models?

Yes, because you are adding new layers at the bottom of the network, and you can use image augmentation when training these

Question 6 — Why do dropouts help avoid overfitting?

Because neighbor neurons can have similar weights, and thus can skew the final training

Question 7 — What would the symptom of a Dropout rate being set too high?

The network would lose specialization to the effect that it would be inefficient or ineffective at learning, driving accuracy down

Question 8 — Which is the correct line of code for adding Dropout of 20% of neurons using TensorFlow?

tf.keras.layers.Dropout(0.2)
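
A sketch pulling together the freezing, new-output-layer, and dropout answers above (InceptionV3 and the layer sizes are my choices for illustration):

import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3

pre_trained = InceptionV3(input_shape=(150, 150, 3),
                          include_top=False,
                          weights='imagenet')
for layer in pre_trained.layers:
    layer.trainable = False   # freeze: don't retrain the learned features

# add our own DNN at the bottom, sized for 2 classes instead of 1000
x = tf.keras.layers.Flatten()(pre_trained.output)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
x = tf.keras.layers.Dropout(0.2)(x)   # drop 20% of neurons during training
x = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(pre_trained.input, x)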

WEEK 4 — MULTICLASS CLASSIFICATION

QUIZ

Question 1 — The diagram for traditional programming had Rules and Data In, but what came out?

Answers

Question 2 — Why does the DNN for Fashion MNIST have 10 output neurons?

The dataset has 10 classes

Question 3 — What is a Convolution?

A technique to extract features from an image

Question 4 — Applying Convolutions on top of a DNN will have what impact on training?

It depends on many factors. It might make your training faster or slower, and a poorly designed Convolutional layer may even be less efficient than a plain DNN!

Question 5 — What method on an ImageGenerator is used to normalize the image?

rescale

Question 6 — When using Image Augmentation with the ImageDataGenerator, what happens to your raw image data on-disk?

Nothing

Question 7 — Can you use Image augmentation with Transfer Learning?

Yes. It's pre-trained layers that are frozen. So you can augment your images as you train the bottom layers of the DNN with them

Question 8 — When training for multiple classes what is the Class Mode for Image Augmentation?

class_mode='categorical'

CERTIFICATE OF COMPLETION
TERMINOLOGY & TOOLS

Layers:

  • Embedding — where the magic happens for using neural networks with NLP: each word's token is mapped to a dense, trainable vector that the network learns during training
  • GlobalAveragePooling1D — similar to a Flatten layer, but instead it averages across the sequence dimension to produce the flattened result
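
A sketch of a sentiment model using both layers (vocab size, embedding dimension, and sequence length are assumptions):

import tensorflow as tf

vocab_size, embedding_dim, max_length = 10000, 16, 120

model = tf.keras.Sequential([
    # each word index becomes a trainable dense vector
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=max_length),
    # average across the sequence dimension instead of flattening
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])
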
WEEK 1 — SENTIMENT IN TEXT

QUIZ

Question 1 — What is the name of the object used to tokenize sentences?

Tokenizer

Question 2 — What is the name of the method used to tokenize a list of sentences?

fit_on_texts(sentences)

Question 3 — Once you have the corpus tokenized, what’s the method used to encode a list of sentences to use those tokens?

texts_to_sequences(sentences)

Question 4 — When initializing the tokenizer, how do you specify a token to use for unknown words?

oov_token=

Question 5 — If you don’t use a token for out of vocabulary words, what happens at encoding?

The word isn’t encoded, and is skipped in the sequence

Question 6 — If you have a number of sequences of different lengths, how do you ensure that they are understood when fed into a neural network?

Use the pad_sequences object from the tensorflow.keras.preprocessing.sequence namespace

Question 7 — If you have a number of sequences of different length, and call pad_sequences on them, what’s the default result?

They’ll get padded to the length of the longest sequence by adding zeros to the beginning of shorter ones

Question 8 — When padding sequences, if you want the padding to be at the end of the sequence, how do you do it?

Pass padding=’post’ to pad_sequences when initializing it
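
The whole tokenizing pipeline from this quiz in one sketch (the sentences are course-style examples, not required inputs):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ['I love my dog',
             'Do you think my dog is amazing?']

tokenizer = Tokenizer(num_words=100, oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)                    # build the word index
sequences = tokenizer.texts_to_sequences(sentences)  # encode with those tokens

# default padding adds zeros at the beginning; padding='post' appends instead
padded = pad_sequences(sequences, padding='post')
print(padded)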

WEEK 2 — WORD EMBEDDINGS

QUIZ

Question 1 — What is the name of the TensorFlow library containing common data that you can use to train and test neural networks?

TensorFlow Datasets

Question 2 — How many reviews are there in the IMDB dataset and how are they split?

50,000 records, 50/50 train/test split

Question 3 — How are the labels for the IMDB dataset encoded?

Reviews are encoded as a number, 0 or 1

Question 4 — What is the purpose of the embedding dimension?

It is the number of dimensions for the vector representing the word encoding

Question 5 — When tokenizing a corpus, what does the num_words=n parameter do?

It specifies the maximum number of words to be tokenized, and picks the most common ‘n’ words

Question 6 — To use word embeddings in TensorFlow, in a sequential layer, what is the name of the class?

tf.keras.layers.Embedding

Question 7 — IMDB Reviews are either positive or negative. What type of loss function should be used in this scenario?

Binary crossentropy

Question 8 — When using IMDB Sub Words dataset, our results in classification were poor. Why?

Sequence becomes much more important when dealing with subwords, but we’re ignoring word positions

WEEK 3 — SEQUENCE MODELS

QUIZ

Question 1 — Why does sequence make a large difference when determining semantics of language?

Because the order in which words appear dictates their impact on the meaning of the sentence

Question 2 — How do Recurrent Neural Networks help you understand the impact of sequence on meaning?

They carry meaning from one cell to the next

Question 3 — How does an LSTM help understand meaning when words that qualify each other aren’t necessarily beside each other in a sentence?

Values from earlier words can be carried to later ones via a cell state

Question 4 — What keras layer type allows LSTMs to look forward and backward in a sentence?

Bidirectional

Question 5 — What’s the output shape of a bidirectional LSTM layer with 64 units?

(None, 128)

Question 6 — When stacking LSTMs, how do you instruct an LSTM to feed the next one in the sequence?

Ensure that return_sequences is set to True only on units that feed to another LSTM
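
A sketch tying Questions 4-6 together (the embedding sizes and sequence length are assumptions):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 64, input_length=120),
    # return_sequences=True because this LSTM feeds another LSTM
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),
    # 64 units, but bidirectional, so the output shape is (None, 128)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.summary()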

Question 7 — If a sentence has 120 tokens in it, and a Conv1D with 128 filters with a kernel size of 5 is passed over it, what’s the output shape?

(None, 116, 128)

Question 8 — What’s the best way to avoid overfitting in NLP datasets?

None of the above (Use LSTMs, Use GRUs, Use Conv1D)

WEEK 4 — SEQUENCE MODELS & LITERATURE

QUIZ

Question 1 — What is the name of the method used to tokenize a list of sentences?

fit_on_texts(sentences)

Question 2 — If a sentence has 120 tokens in it, and a Conv1D with 128 filters with a kernel size of 5 is passed over it, what’s the output shape?

(None, 116, 128)

Question 3 — What is the purpose of the embedding dimension?

It is the number of dimensions for the vector representing the word encoding

Question 4 — IMDB Reviews are either positive or negative. What type of loss function should be used in this scenario?

Binary crossentropy

Question 5 — If you have a number of sequences of different lengths, how do you ensure that they are understood when fed into a neural network?

Use the pad_sequences object from the tensorflow.keras.preprocessing.sequence namespace

Question 6 — When predicting words to generate poetry, the more words predicted the more likely it will end up gibberish. Why?

Because the probability that each word matches an existing phrase goes down the more words you create

Question 7 — What is a major drawback of word-based training for text generation instead of character-based generation?

Because there are far more words in a typical corpus than characters, it is much more memory intensive

Question 8 — How does an LSTM help understand meaning when words that qualify each other aren’t necessarily beside each other in a sentence?

Values from earlier words can be carried to later ones via a cell state

CERTIFICATE OF COMPLETION
TERMINOLOGY & TOOLS

Metrics

  • errors = forecasts - actual
  • mse = np.square(errors).mean()
  • rmse = np.sqrt(mse)
  • mae = np.abs(errors).mean()
  • mape = np.abs(errors / x_valid).mean()
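
As a runnable version of the formulas above (the arrays are placeholder data; the course divides MAPE by x_valid, i.e. the actual values):

import numpy as np

actual = np.array([10.0, 12.0, 11.0, 13.0])      # stand-in for x_valid
forecasts = np.array([11.0, 12.5, 10.0, 13.5])

errors = forecasts - actual
mse = np.square(errors).mean()         # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error
mae = np.abs(errors).mean()            # mean absolute error
mape = np.abs(errors / actual).mean()  # mean absolute percentage error
print(mse, rmse, mae, mape)
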
WEEK 1 — SEQUENCES & PREDICTION

QUIZ

Question 1 — What is an example of a Univariate time series?

Hour by hour temperature

Question 2 — What is an example of a Multivariate time series?

Hour by hour weather

Question 3 — What is imputed data?

A projection of unknown (usually past or missing) data

Question 4 — A sound wave is a good example of time series data

True

Question 5 — What is Seasonality?

A regular change in shape of the data

Question 6 — What is a trend?

An overall direction for data regardless of direction

Question 7 — In the context of time series, what is noise?

Unpredictable changes in time series data

Question 8 — What is autocorrelation?

Data that follows a predictable shape, even if the scale is different

Question 9 — What is a non-stationary time series?

One that has a disruptive event breaking trend and seasonality

WEEK 2 — DEEP NEURAL NETWORKS FOR TIME SERIES

QUIZ

Question 1 — What is a windowed dataset?

A fixed-size subset of a time series

Question 2 — What does ‘drop_remainder=True’ do?

It ensures that all rows in the data window are the same length by cropping data

Question 3 — What’s the correct line of code to split an n column window into n-1 columns for features and 1 column for a label

dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
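
That map call is the last step of this week's windowing helper; a sketch of the whole pattern (argument names follow the course convention):

import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    # window_size features plus 1 label; drop_remainder crops the stragglers
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda w: w.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer)
    # split each window: n-1 columns of features, 1 column for the label
    dataset = dataset.map(lambda window: (window[:-1], window[-1:]))
    return dataset.batch(batch_size).prefetch(1)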

Question 4 — What does MSE stand for?

Mean Squared Error

Question 5 — What does MAE stand for?

Mean Absolute Error

Question 6 — If time values are in time[], series values are in series[] and we want to split the series into training and validation at time 1000, what is the correct code?

split_time = 1000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]

Question 7 — If you want to inspect the learned parameters in a layer after training, what’s a good technique to use?

Assign a variable to the layer and add it to the model using that variable, then inspect its properties after training
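
A sketch of that technique (the window size of 20 is an assumption):

import tensorflow as tf

l0 = tf.keras.layers.Dense(1, input_shape=[20])  # keep a handle on the layer
model = tf.keras.Sequential([l0])
model.compile(loss='mse', optimizer='sgd')
# ... after model.fit(...) finishes:
print(l0.get_weights())                          # learned kernel and bias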

Question 8 — How do you set the learning rate of the SGD optimizer? 

Use the lr property

Question 9 — If you want to amend the learning rate of the optimizer on the fly, after each epoch, what do you do?

Use a LearningRateScheduler object in the callbacks namespace and assign that to the callback
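
A sketch of that callback; the schedule itself (growing the rate each epoch, the course's trick for finding a good rate) is one choice among many:

import tensorflow as tf

lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10 ** (epoch / 20))  # new rate for each epoch
# model.fit(dataset, epochs=100, callbacks=[lr_schedule])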

WEEK 3 — RECURRENT NEURAL NETWORKS FOR TIME SERIES

QUIZ

Question 1 — If X is the standard notation for the input to an RNN, what are the standard notations for the outputs?

Y(hat) and H

Question 2 — What is a sequence to vector if an RNN has 30 cells numbered 0 to 29

The Y(hat) for the last cell

Question 3 — What does a Lambda layer in a neural network do?

Allows you to execute arbitrary code while training

Question 4 — What does the axis parameter of tf.expand_dims do?

Defines the dimension index at which you will expand the shape of the tensor

Question 5 — A new loss function was introduced in this module, named after a famous statistician. What is it called?

Huber loss

Question 6 — What’s the primary difference between a simple RNN and an LSTM?

In addition to the H output, LSTMs have a cell state that runs across all cells

Question 7 — If you want to clear out all temporary variables that tensorflow might have from previous sessions, what code do you run?

tf.keras.backend.clear_session()

Question 8 — What happens if you define a neural network with these two layers?

tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
tf.keras.layers.Dense(1),

Your model will fail because you need return_sequences=True after the first LSTM layer
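
For reference, a sketch of the corrected stack:

import tensorflow as tf

model = tf.keras.Sequential([
    # the first LSTM must hand a full sequence to the second one
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(32, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(1),
])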

WEEK 4 — REAL WORLD TIME SERIES DATA

QUIZ

Question 1 — How do you add a 1-dimensional convolution to your model for predicting time series data?

Use a Conv1D layer type

Question 2 — What’s the input shape for a univariate time series to a Conv1D?

[None, 1]

Question 3 — You used a sunspots dataset that was stored in CSV. What’s the name of the Python library used to read CSVs?

csv

Question 4 — If your CSV file has a header that you don’t want to read into your dataset, what do you execute before iterating through the file using a ‘reader’ object?

next(reader)

Question 5 — When you read a row from a reader and want to cast column 2 to another data type, for example, a float, what’s the correct syntax?

float(row[2])
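
Questions 3-5 describe one loop; a sketch with a placeholder filename:

import csv

with open('sunspots.csv') as f:           # placeholder CSV path
    reader = csv.reader(f, delimiter=',')
    next(reader)                          # skip the header row
    time_step, sunspots = [], []
    for row in reader:
        time_step.append(int(row[0]))
        sunspots.append(float(row[2]))    # cast column 2 to float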

Question 6 — What was the sunspot seasonality?

11 or 22 years depending on who you ask

Question 7 — After studying this course, what neural network type do you think is best for predicting time series like our sunspots dataset?

A combination of all of the above (DNN, RNN / LSTM, Convolutions)

Question 8 — Why is MAE a good analytic for measuring accuracy of predictions for time series?

It doesn’t heavily punish larger errors the way squared errors do