Today we're joined by Stephen Merity, startup founder and independent researcher, with a focus on NLP and Deep Learning.
Late last month, Stephen released his latest paper, Single Headed Attention RNN: Stop Thinking With Your Head, which we break down extensively in this conversation. Stephen details his primary motivations for writing the paper: NLP research has recently been dominated by transformer models, and these models are not the most accessible or trainable for broad use. We discuss the architecture of transformer models, how he came to the decision to use SHA-RNNs in his research, how he built and trained the model, his approach to benchmarking, and finally his goals for the research within the broader research community.