By Seongmin Park in nlp — Jun 30, 2021

Interesting Papers - May 2021

Deep dive into chickens.

Chicken Chicken Chicken: Chicken Chicken

Ken Thompson's Turing award acceptance speech.
Presents a way to plant a Trojan in the complier.
You build a faulty compiler A, and with it compile another compiler B. Distribute B.
An interesting and fairly short read. Highly recommended!

Short answer: No.
Encoder memorizes tokens toward the start and the end. This is measured by distribution of reconstruction loss along sequence length.
In other words, seq2seq VAEs tend to learn local features.
Most VAE LMs in literature report low KL values in their latent structure. We don't yet know if the reported values are acceptable or not.
Replacing LSTM with bag of words helps VAE LMs learn global features and decrease first word memorization.
Eliminating posterior collapse in VAEs is not sufficient. LMs must learn global features!

Another rarely seen Transformer VAE.
Follows the same latent injection scheme as OPTIMUS.
SOTA in two controlled story generation datasets: WritingPrompts and WikiPlots.
Compares perplexity, ROUGE-1, ROGUE-2, and ROGUE-L.

Yet again, we want to prevent posterior collapse in text VAEs.
KL thresholding combined with encoder pretraining (training with AE objective and resetting the decoder) was the most effective in language modeling.
However, encoder pretraining also exhibits posterior collapse if without KL annealing.
Better ELBO does not necessarily mean better latent representation quality, as measured by mutual information or active units.

Normally, we pass the last hidden state of LSTM encoders to the decoder in text VAEs.
This results in a dull latent space.
If we mean-pool or max-pool from all encoder hidden states, we can mitigate posterior collapse.
Somewhat reminiscent of Ray Mooney's famous quote.