The Annotated Transformer
The Transformer from “Attention is All You Need” has been on a lot of people’s minds over the last year. Besides producing major improvements in translation quality, it provides a new architecture for many other NLP tasks. The paper itself is very clearly written, but the conventional wisdom has been that it is quite difficult to implement correctly. In this post I present an “annotated” version of the paper in the form of a line-by-line implementation. I have reordered and deleted some sections from the original paper and added comments throughout. This document itself is a working notebook and should be a completely usable implementation. In total there are 400 lines of library code, which can process 27,000 tokens per second on 4 GPUs.
Relations
Discusses: Transformer (machine learning model). A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data.
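Since the post implements the Transformer line by line, a short sketch of its core operation, scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V from the paper, may help ground the definition above. This is a minimal PyTorch sketch in the spirit of the post's implementation, not a verbatim excerpt; the function name and the optional mask argument are illustrative.

```python
import math
import torch

def attention(query, key, value, mask=None):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = query.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Block disallowed positions (e.g., padding or future tokens)
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = scores.softmax(dim=-1)
    # Weighted sum of values, plus the attention weights for inspection
    return torch.matmul(weights, value), weights

# Example usage with illustrative shapes: (batch, heads, seq_len, d_k)
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = attention(q, k, v)
```

The √d_k scaling comes from the paper itself: for large d_k the raw dot products grow large, pushing the softmax into regions with vanishingly small gradients.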
Ratings (from 1 review)
- Resource level: 6.0/10 (scale: beginner to advanced)
- Resource clarity: 4.0/10 (scale: hardly clear to perfectly clear)
- Reviewer's background: 5.0/10 (scale: none to expert)