Interesting Papers - May 2021
June 30, 2021
- Hits close to home.
- We Koreans love our fried chicken.
- Wish we had Chick-fil-A though.
- Ken Thompson’s Turing award acceptance speech.
- Presents a way to plant a Trojan horse in the compiler.
- You build a compromised compiler A and use it to compile another compiler B; B's source looks clean, yet the distributed B carries the Trojan.
- An interesting and fairly short read. Highly recommended!
- Short answer: No.
- The encoder memorizes tokens toward the start and the end of the sequence, as measured by the distribution of reconstruction loss along the sequence length.
- In other words, seq2seq VAEs tend to learn local features.
- Most VAE LMs in the literature report low KL values in their latent structure. We don't yet know whether the reported values are acceptable.
- Replacing the LSTM with a bag-of-words model helps VAE LMs learn global features and reduces first-word memorization.
- Eliminating posterior collapse in VAEs is not sufficient. LMs must learn global features!
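The memorization claim above is measured by looking at reconstruction loss per token position. A minimal NumPy sketch of that measurement, using synthetic losses (in a real VAE LM these would be the decoder's token-level negative log-likelihoods):

```python
import numpy as np

# Synthetic per-token reconstruction losses for a batch of sequences,
# shaped (batch_size, seq_len). Stand-in for real decoder NLLs.
rng = np.random.default_rng(0)
losses = rng.uniform(1.0, 5.0, size=(64, 20))

# Average loss at each sequence position. Dips near the first and last
# positions would indicate the encoder memorizes tokens at the edges,
# i.e. it is capturing local rather than global features.
per_position = losses.mean(axis=0)
print(per_position.shape)
```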
NLP-OSS Workshop, ACL 2020
- A survey on Korean NLP datasets for a variety of tasks (32 corpora).
- The document will be updated on arXiv as more datasets become available.
- Another rarely seen Transformer VAE.
- Follows the same latent injection scheme as OPTIMUS.
- SOTA on two controlled story generation datasets.
- Compares perplexity, ROUGE-1, ROUGE-2, and ROUGE-L.
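For reference, ROUGE-1 recall is essentially clipped unigram overlap against the reference. A simplified sketch (real ROUGE implementations add options like stemming and stopword removal):

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    # Clip each candidate count by the reference count so repeated words
    # are not over-credited.
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / sum(ref.values())

print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 3/6 = 0.5
```

ROUGE-2 and ROUGE-L are analogous but use bigrams and the longest common subsequence, respectively.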
- Yet again, we want to prevent posterior collapse in text VAEs.
- KL thresholding combined with encoder pretraining (training with an AE objective, then resetting the decoder) was the most effective for language modeling.
- However, encoder pretraining also exhibits posterior collapse without KL annealing.
- Better ELBO does not necessarily mean better latent representation quality, as measured by mutual information or active units.
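KL thresholding (often called the "free bits" trick) can be sketched as follows; the diagonal-Gaussian posterior and the threshold value are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """Per-dimension KL( q(z|x) || N(0, I) ) for a diagonal Gaussian posterior."""
    return 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar)

def thresholded_kl(mu, logvar, free_bits=0.5):
    """Clamp each dimension's KL at a floor, so the optimizer gains nothing
    from pushing the KL below the threshold (the pressure behind collapse)."""
    kl = kl_diag_gaussian(mu, logvar)
    return np.maximum(kl, free_bits).sum()

# A fully collapsed posterior (q equals the prior) still pays the floor:
mu, logvar = np.zeros(8), np.zeros(8)
print(thresholded_kl(mu, logvar))  # 8 dims * 0.5 free bits = 4.0
```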
- In text VAEs, we normally pass only the last hidden state of the LSTM encoder to the decoder.
- This results in a dull latent space.
- Mean-pooling or max-pooling over all encoder hidden states mitigates posterior collapse.
- Somewhat reminiscent of Ray Mooney’s famous quote.
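A minimal sketch of the pooling alternatives, with random arrays standing in for LSTM hidden states (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Encoder hidden states over a sequence: (seq_len, hidden_dim).
hidden = rng.standard_normal((20, 16))

last = hidden[-1]                  # the usual choice: final hidden state only
mean_pooled = hidden.mean(axis=0)  # mean-pool over all time steps
max_pooled = hidden.max(axis=0)    # max-pool over all time steps

# Pooled representations see every position directly, so the latent code is
# less dominated by whatever the LSTM happens to retain at the last step.
print(last.shape, mean_pooled.shape, max_pooled.shape)
```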