2018 November 06

Location: GRLA 107

Attendees clarified a hodge-podge of confusing topics and jargon, outlined below. These terms are regularly encountered in ML literature, are rarely explained in papers, and can be a barrier to getting started in ML. Next week, Bane's project is the primary topic of focus; a VAE summary and a PyTorch vs. Keras discussion are backup topics. For the remainder of Fall 2018 we are meeting in Coors Tech 282.

Attending:
Andy, Antoine, Thomas, Iga, Jihyun, Jonah, Xiaodan, Hayden

TODO:

  • Post NN video (Thomas)
  • Compile slides for next week (Bane)
  • Begin compiling VAE summary (Iga, Andy, Thomas)
  • Begin compiling PyTorch vs. Keras materials (Jihyun)

Problems

  1. Input data spans several orders of magnitude, so the large-amplitude samples dominate the loss.
    Consider weighting the residuals (like a weighted norm!); see the weighted-loss sketch after this list.
  2. GANs are hard to train?
    Whomp whomp.
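
A minimal sketch of the residual-weighting idea from problem 1, assuming a simple regression setup; the inverse-amplitude weights below are just one plausible choice, not something decided at the meeting.

    import torch

    # Made-up batch whose targets span several orders of magnitude.
    y_true = torch.tensor([1e-3, 5e-2, 2.0, 8.0e2])
    y_pred = torch.tensor([2e-3, 4e-2, 1.5, 7.9e2])

    # Plain MSE: the largest-amplitude sample dominates the loss.
    mse = torch.mean((y_pred - y_true) ** 2)

    # Weighted residuals (a weighted norm): down-weight large amplitudes so
    # every sample contributes on a comparable scale.
    w = 1.0 / (y_true.abs() + 1e-8)
    weighted_mse = torch.mean((w * (y_pred - y_true)) ** 2)

    print(mse.item(), weighted_mse.item())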

Jargon: Variational method specific

  1. variational distribution: The approximating distribution q(z) chosen from a tractable family and fit by optimization to stand in for an intractable posterior. This optimization-based approach is the alternative to sampling methods such as Markov Chain Monte Carlo (MCMC).

  2. amortized inference: Training a single inference network (e.g., a VAE encoder) that maps any data point to the parameters of its approximate posterior, so the cost of inference is amortized across the dataset rather than re-running an optimization for each data point.

  3. mean field inference: A variational approximation in which the variational distribution is assumed to factorize over the latent variables, q(z) = q(z_1) q(z_2) ... q(z_K), which makes the optimization tractable at the cost of ignoring correlations between them.

  4. ELBO (Evidence Lower BOund): The objective maximized in variational inference: a lower bound on the log evidence log p(x), equal to the expected log-likelihood of the data under the variational distribution minus the KL divergence from the variational distribution to the prior. See the sketch after this list.
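
A minimal sketch of the ELBO for a VAE-style model, assuming a diagonal-Gaussian encoder and a standard normal prior; the tensor names and shapes below are made up for illustration, written in PyTorch since it is on the upcoming agenda.

    import torch
    import torch.nn.functional as F

    # Made-up tensors standing in for one batch of a VAE forward pass.
    x = torch.rand(8, 784)        # observed data, scaled to [0, 1]
    x_recon = torch.rand(8, 784)  # decoder output, i.e. p(x|z) probabilities
    mu = torch.randn(8, 20)       # encoder mean of q(z|x)
    log_var = torch.randn(8, 20)  # encoder log-variance of q(z|x)

    # ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)); it lower-bounds log p(x).
    recon = -F.binary_cross_entropy(x_recon, x, reduction='sum')    # log-likelihood term
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # closed-form Gaussian KL
    elbo = recon - kl
    loss = -elbo  # training maximizes the ELBO by minimizing its negative
    print(elbo.item())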

Jargon: General

  1. Convolutional vs. Deconvolutional layer meaning:
    Convolutional layers apply a learned filter via convolution. So-called deconvolutional layers are really just transposed convolutions (the transpose of the convolution's matrix form, usually used to upsample); they do not undo a convolution, so the term deconvolution is misused in many ML papers. See the first sketch after this list.

  2. Back propagation:
    Application of the chain rule to compute the gradient of the loss with respect to every weight in the network; these gradients drive the weight updates (the loss.backward() call in the training-loop sketch after this list).

  3. What is the difference between Stochastic Gradient Descent and Steepest Gradient Descent?
    Same thing(ish): steepest (full-batch) gradient descent computes the gradient over the entire dataset at each step, while Stochastic Gradient Descent estimates it from a random subset of the data, called a batch (or mini-batch), so every batch yields its own gradient estimate. Updating this way keeps large datasets manageable, and the gradient noise may help escape local minima. See the training-loop sketch after this list.

  4. Learning rate:
    The step length taken along the (negative) gradient direction at each update while minimizing the loss function.

  5. Epochs:
    An epoch is complete after all of the data has been used once in a set of gradient updates via Stochastic Gradient Descent, i.e., after every batch has had an opportunity to influence the weights in the network.

  6. Batch normalization layers:
    Normalize a layer's activations using the mean and variance of the current batch, then apply a learned scale and shift, so they add weights that must be learned during training. They rely on batches being large enough to give stable mean/variance estimates, and they can help regularize and stabilize training.

  7. Dropout layers:
    During training, a dropout layer zeroes out a random subset of activations on each forward pass, a somewhat odd form of regularization that is difficult to tune. Antoine discourages using it. Both batch norm and dropout appear in the training-loop sketch after this list.
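
A quick shape check for jargon item 1, using PyTorch layer names since PyTorch is on the upcoming agenda; the tensor sizes are arbitrary.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

    # A strided convolution downsamples; the "deconvolution" is really a
    # transposed convolution that upsamples back to the original spatial size.
    conv = nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1)
    deconv = nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1)

    y = conv(x)       # torch.Size([1, 16, 16, 16])
    x_up = deconv(y)  # torch.Size([1, 3, 32, 32]), same shape as x but not the same values
    print(y.shape, x_up.shape)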
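
And a minimal training-loop sketch tying together items 2 through 7 (backprop, SGD with a learning rate, epochs, batch norm, dropout); the toy network, random data, and hyperparameters are made up for illustration.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Made-up dataset: 256 small RGB images with 10 class labels.
    images = torch.randn(256, 3, 32, 32)
    labels = torch.randint(0, 10, (256,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.BatchNorm2d(16),  # item 6: normalize over the batch, adds a learned scale and shift
        nn.ReLU(),
        nn.Dropout(p=0.5),   # item 7: randomly zero activations during training
        nn.Flatten(),
        nn.Linear(16 * 32 * 32, 10),
    )

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # item 4: lr is the step length

    for epoch in range(2):                         # item 5: one epoch = one pass over all batches
        for batch_images, batch_labels in loader:  # item 3: each batch gives a gradient estimate
            optimizer.zero_grad()
            loss = criterion(model(batch_images), batch_labels)
            loss.backward()   # item 2: back propagation applies the chain rule
            optimizer.step()  # step the weights along the (stochastic) negative gradient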

Topics of discussion for next week

  • Watch neural network video, post and compile questions.
  • Bane - semester project
  • Touch on VAEs

Upcoming/backup topics of discussion

  • Variational Auto-encoders summary and application (Iga, Andy, and Thomas)
  • Arnab - existing project
  • PyTorch vs. Keras
