10 questions
Which of the following statements justify the Maximum Likelihood approach?
It returns a model that assigns high probability to observed data
It minimises the KL divergence KL[p_data || p_model]
It minimises the KL divergence KL[p_model || p_data]
It minimises the reconstruction error of the data
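For reference, a short sketch of the identity these options appeal to, assuming the data are drawn i.i.d. from p_data:

```latex
\mathbb{E}_{x \sim p_\mathrm{data}}\!\left[\log p_\mathrm{model}(x)\right]
  \;=\; -\,\mathrm{KL}\!\left[p_\mathrm{data} \,\|\, p_\mathrm{model}\right]
  \;-\; \mathbb{H}\!\left[p_\mathrm{data}\right]
```

so maximising the expected log-likelihood is the same as minimising KL[p_data || p_model], since the entropy of p_data does not depend on the model parameters.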
Which of the following statements, when combined, explain why we cannot train VAEs using Maximum Likelihood Estimation?
The decoder is parameterised by a neural network so it is highly non-linear
The latent variable is continuous
MLE requires evaluating the marginal distribution on data
There are too many datapoints in the dataset
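For reference, the marginal (evidence) in question: with a continuous latent z and a neural-network decoder p_θ(x | z) (notation is ours), it involves an integral that generally has no closed form and is expensive to approximate:

```latex
p_\theta(x) \;=\; \int p_\theta(x \mid z)\, p(z)\, \mathrm{d}z
```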
Which of the following statements are true for the VAE objective?
It is a lower-bound to the maximum likelihood objective
The gap between the VAE objective and the maximum likelihood objective is KL[p(z)||q(z|x)]
The KL term can always be viewed as a regulariser for the VAE encoder
The optimum of the VAE decoder is also the MLE optimum
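For reference, the standard decomposition of the log-likelihood that these options refer to (notation p_θ, q_φ is ours):

```latex
\log p_\theta(x)
  \;=\; \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
        - \mathrm{KL}\!\left[q_\phi(z \mid x)\,\|\,p(z)\right]}_{\text{ELBO (the VAE objective)}}
  \;+\; \mathrm{KL}\!\left[q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right]
```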
In the famous “Chinese room” Turing-test example, a man sits inside a room performing English-to-Chinese translation, and volunteers outside the room are asked to guess, based on the translation results, whether the man in the room understands Chinese. You are one of the volunteers. You know the man in the room is English, so a priori you assume he does not understand Chinese with probability 0.8. Now, given that the translation result is correct, how would you guess whether he understands Chinese?
I’m sure he definitely understands Chinese
He probably doesn’t understand Chinese (with probability 0.8)
Give me more info about the correct translation rates for those who only speak English
Give me more info about the correct translation rates for those who speak both English and Chinese
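A short Bayes-rule sketch of what the guess depends on (the events C = “understands Chinese” and T = “translation is correct” are our notation; the prior is P(¬C) = 0.8 as stated):

```latex
P(C \mid T) \;=\; \frac{P(T \mid C)\,P(C)}{P(T \mid C)\,P(C) + P(T \mid \lnot C)\,P(\lnot C)}
```

This posterior cannot be evaluated without the correct-translation rates P(T | C) and P(T | ¬C) for the two groups of speakers.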
Which best represents the reparameterisation trick?
y = μ + σϵ where ϵ ∼ N(0, I)
y ∼ N(μ, σ)
y ∼ N(E(x), ϵ)
None of the above
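For reference, a minimal NumPy sketch of the trick (the function name reparameterise and the toy values are our own, purely illustrative):

```python
import numpy as np

def reparameterise(mu, sigma, rng=None):
    """Sample y ~ N(mu, sigma^2) via y = mu + sigma * eps, with eps ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(size=np.shape(mu))  # all randomness lives in eps
    return mu + sigma * eps                       # differentiable in mu and sigma

# Example: one sample from a 3-dimensional diagonal Gaussian.
mu = np.array([0.0, 1.0, -2.0])
sigma = np.array([1.0, 0.5, 0.1])
print(reparameterise(mu, sigma))
```

Because the noise is drawn separately and then transformed, gradients can flow through μ and σ during training.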
Which of the following statements are true for the encoder in a Variational Autoencoder?
It is an approximation function which outputs likely latent representations for a given input.
It is equivalent to the true posterior
It is an approximation of the true posterior
It is still required during the generation process
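To make the encoder’s role concrete, here is a hypothetical, untrained linear “encoder” in NumPy (the weights, shapes, and names are invented for illustration; a real VAE encoder is a trained neural network). It maps an input x to the parameters of an approximate posterior q(z | x), from which likely latent codes can be drawn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a single linear "encoder" layer (illustration only).
W_mu, b_mu = rng.standard_normal((2, 4)), np.zeros(2)
W_lv, b_lv = rng.standard_normal((2, 4)), np.zeros(2)

def encode(x):
    """Map an input x to the parameters (mu, log_var) of q(z | x)."""
    return W_mu @ x + b_mu, W_lv @ x + b_lv

x = rng.standard_normal(4)                    # a toy 4-dimensional input
mu, log_var = encode(x)                       # approximate-posterior parameters
z = mu + np.exp(0.5 * log_var) * rng.standard_normal(2)  # a likely latent code for x
```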
Heuristically, which of the two losses is the better choice for the Generator in a Generative Adversarial Network?
-log(D(G(z)))
log(1 - D(G(z)))
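As a quick aside on the usual heuristic (writing a = D(G(z)), which is typically close to 0 early in training when fake samples are easy to reject), compare the gradients of the two losses with respect to a:

```latex
\frac{\partial}{\partial a}\,\log(1 - a) \;=\; -\frac{1}{1 - a} \;\approx\; -1
\qquad\text{vs.}\qquad
\frac{\partial}{\partial a}\,\bigl[-\log a\bigr] \;=\; -\frac{1}{a} \;\approx\; -\frac{1}{\varepsilon}
```

so log(1 − D(G(z))) saturates exactly when the Generator most needs a learning signal, while −log D(G(z)) does not.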
Mode collapse is when...
The Generator learns a parameter setting where it only produces one or a select few points.
The Generator cannot learn as the Discriminator classifies all the Generator’s samples as fake thereby producing a useless learning signal.
The Generator is too deep and suffers from vanishing gradients.
None of the above
This figure shows the output signals produced by an optimal discriminator for an original GAN and for a Wasserstein GAN. The blue and green dots represent the 1D real and fake data instances. Match each coloured line to the type of GAN.
Red → Wasserstein, Teal → Original
Teal → Wasserstein, Red → Original
This plot confuses me
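For reference, a hedged note on what such a figure typically contrasts (the notation D*, p_g, f is ours): the original GAN’s optimal discriminator outputs a probability,

```latex
D^{*}(x) \;=\; \frac{p_\mathrm{data}(x)}{p_\mathrm{data}(x) + p_g(x)} \;\in\; [0, 1]
```

which saturates away from the decision boundary, whereas the Wasserstein critic f(x) is constrained only to be (approximately) 1-Lipschitz, so it outputs an unbounded score rather than a probability and does not saturate between the real and fake distributions.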
Which of the following statements are true about Beta-VAEs? (note: beta is the coefficient of the KL term)
When beta = 1 Beta-VAEs are equivalent to VAEs
Increasing beta increases the constraint on the latent bottleneck
Decreasing beta increases the constraint on the latent bottleneck
Increasing beta increases the level of disentanglement
Decreasing beta increases the level of disentanglement
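For reference, a sketch of the Beta-VAE objective implied by the note above (notation θ, φ, q_φ is ours):

```latex
\mathcal{L}_{\beta}(\theta, \phi; x)
  \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  \;-\; \beta\,\mathrm{KL}\!\left[q_\phi(z \mid x)\,\|\,p(z)\right]
```

Here beta simply rescales the KL penalty relative to the reconstruction term.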