Intro to ML: Neural Networks Lecture 2 Part 2


Similar activities

CALCULUS • University • 10 Qs

Straight Line Equation • 10th Grade - University • 10 Qs

Persiapan UTS Deep Learning • University • 10 Qs

KanyE WeSt • KG - Professional Development • 10 Qs

Kpop Clothing • KG - Professional Development • 10 Qs

Least Squares and RMSE • University • 8 Qs

Data Mining Quiz • University • 10 Qs

Gradient Descent Method • University • 10 Qs

Intro to ML: Neural Networks Lecture 2 Part 2

Assessment • Quiz

Created by Josiah Wang

Mathematics, Computers, Fun

University • 15 plays • Hard

6 questions


1.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

Is the following statement True or False? A neural network's weights can be randomly initialised, as gradient descent will always eventually find the optimal set of parameters and is therefore invariant to the initial set of parameters.

True

False

Answer explanation

A neural network's weights are indeed often randomly initialised. What is wrong here is that gradient descent has no theoretical guarantee of finding the optimal set of parameters, and in practice it often does not (recall local optima from the lecture slides).
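For illustration, a minimal NumPy sketch of this point (the objective and starting points are arbitrary): plain gradient descent run from two different initial points on a non-convex function settles in two different local minima, so the solution it finds depends on the initialisation.

```python
# Sketch: gradient descent on a non-convex 1-D objective from two
# different initialisations; each run ends in a different local minimum.
import numpy as np

def f(x):
    return np.sin(3 * x) + 0.1 * x ** 2      # non-convex objective

def grad_f(x):
    return 3 * np.cos(3 * x) + 0.2 * x       # its derivative

for start in (-2.0, 2.0):                    # two different initial parameters
    x = start
    for _ in range(200):
        x -= 0.05 * grad_f(x)                # plain gradient descent step
    print(f"start {start:+.1f} -> x = {x:+.2f}, f(x) = {f(x):.3f}")
```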

2.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

L1 regularisation favours a few non-zero weights, whereas L2 regularisation favours small values around zero.

True

False

Answer explanation

L1 regularisation favours a sparse solution by encouraging only a few non-zero weights, while L2 regularisation favours small values spread around zero. This means that L1 regularisation tends to select only the most important features, while L2 regularisation spreads the importance across all features. The correct choice is therefore 'True'.
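For illustration, a minimal NumPy sketch of the contrast (the toy data, penalty strength and step size are arbitrary): an L1-penalised fit via soft-thresholding leaves most of the irrelevant weights at exactly zero, while the corresponding L2-penalised fit only shrinks them towards zero.

```python
# Sketch: toy linear regression with 3 informative features out of 20,
# fitted with an L2 penalty (plain gradient steps) and with an L1 penalty
# (proximal gradient steps, i.e. soft-thresholding after each step).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 0.5]                # only 3 non-zero true weights
y = X @ true_w + 0.1 * rng.normal(size=100)

lam, lr, steps = 0.1, 0.01, 2000

# L2: gradient of  mean squared error + lam * ||w||^2
w_l2 = np.zeros(20)
for _ in range(steps):
    grad = 2 * X.T @ (X @ w_l2 - y) / len(y) + 2 * lam * w_l2
    w_l2 -= lr * grad

# L1: gradient step on the data term, then soft-threshold (prox of lam * ||w||_1)
w_l1 = np.zeros(20)
for _ in range(steps):
    grad = 2 * X.T @ (X @ w_l1 - y) / len(y)
    w_l1 -= lr * grad
    w_l1 = np.sign(w_l1) * np.maximum(np.abs(w_l1) - lr * lam, 0.0)

print("weights exactly zero with L2:", int(np.sum(w_l2 == 0.0)))          # typically 0
print("weights exactly zero with L1:", int(np.sum(np.abs(w_l1) < 1e-8)))  # typically most of the 17 irrelevant ones
```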

3.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following statements about dropout are correct?

Dropout prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.

Dropout is more similar to L2 regularisation than L1 regularisation during training.

Dropout is active during training and testing.

Dropout can be viewed as a form of ensemble learning.

The amount of dropout, p, can be optimised through standard stochastic gradient descent (SGD) methods.
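For reference, a minimal sketch of one common formulation of dropout ("inverted" dropout; the function below is illustrative, not from the lecture): units are randomly zeroed and the survivors rescaled during training, while at test time the layer does nothing.

```python
# Sketch: inverted dropout. Active only during training; identity at test time.
import numpy as np

def dropout(activations, p, training, rng):
    """Drop each activation with probability p; rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return activations                   # dropout is not applied at test time
    keep_prob = 1.0 - p
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob    # rescaling keeps the expected activation unchanged

rng = np.random.default_rng(0)
h = np.ones((2, 5))
print(dropout(h, p=0.5, training=True, rng=rng))   # random zeros, survivors scaled to 2.0
print(dropout(h, p=0.5, training=False, rng=rng))  # unchanged
```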

4.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

A dense multi-layer perceptron layer has 300 input values and 200 output values. Assuming no bias, how many parameters does the layer contain?

500

250

6000

60000

3000

Answer explanation

Since there is no bias, each of the 300 input values is connected to each of the 200 output values by one weight, giving 300 * 200 = 60000 parameters in the layer. This means there are 60000 weights to be learned during training.
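A one-line check of this count (illustrative):

```python
# Sketch: the weight matrix of a dense layer with 300 inputs, 200 outputs, no bias.
import numpy as np

W = np.zeros((300, 200))   # one weight per input-output pair
print(W.size)              # 60000 (with a bias vector it would be 60000 + 200 = 60200)
```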

5.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

If a neural network is overfitting, which of the following would not help?

Introducing dropout

Reducing the number of layers in the model

Increasing the learning rate

Increasing the size of the training data

None of the above

Answer explanation

Introducing dropout, reducing the number of layers in the model, and increasing the size of the training data are all effective ways to address overfitting. Increasing the learning rate, however, does not reduce overfitting: it only changes the size of each gradient descent step and, if set too high, can make optimisation overshoot the optimum and fail to generalise well. Increasing the learning rate is therefore the option that would not help.

6.

MULTIPLE SELECT QUESTION

2 mins • 1 pt

Which of the following statements are True?

Minibatch Gradient Descent (GD) updates the network based on an expectation of the gradient of the parameter space at that point.

Minibatch GD updates the network based on the exact gradient of the parameter space at that point.

The computational cost of batch GD and minibatch GD is the same.

The computational cost of batch GD is larger than that of minibatch GD.
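For illustration, a minimal NumPy sketch of the trade-off behind these statements (toy least-squares data, minibatch size 64): the gradient computed on a random minibatch is only an estimate of the full-batch gradient, but it touches far fewer examples and is therefore much cheaper per update.

```python
# Sketch: full-batch gradient vs. a minibatch estimate for a least-squares loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
w_true = rng.normal(size=50)
y = X @ w_true + 0.1 * rng.normal(size=10_000)
w = np.zeros(50)                                      # current parameters

def gradient(Xb, yb, w):
    """Gradient of the mean squared error over the examples in (Xb, yb)."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = gradient(X, y, w)                         # batch GD: all 10,000 examples

idx = rng.choice(len(X), size=64, replace=False)      # minibatch GD: 64 examples
mini_grad = gradient(X[idx], y[idx], w)               # noisy estimate, ~64/10000 of the cost

print("relative error of minibatch gradient:",
      float(np.linalg.norm(mini_grad - full_grad) / np.linalg.norm(full_grad)))
```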