[Figure: input h (d_model) → W_fc (keys: detect patterns) → σ(W_fc h) (d_ff = 4 × d_model) → W_proj (values: produce output) → output m (d_model)]
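
The figure labels describe a two-step computation, m = W_proj σ(W_fc h), that a small sketch can make concrete. The version below is only an illustration under stated assumptions: the figure does not say which nonlinearity σ is, so GELU (tanh approximation) is assumed; the sizes, the random weights, and the helper names `gelu`, `matvec`, and `mlp_block` are hypothetical; and biases are omitted because the figure shows none.

```python
import math
import random


def gelu(x: float) -> float:
    # Assumed stand-in for sigma: tanh approximation of GELU.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(W, v):
    # Plain matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mlp_block(h, W_fc, W_proj):
    # Keys: each row of W_fc is matched against h, then passed through sigma.
    a = [gelu(z) for z in matvec(W_fc, h)]  # length d_ff
    # Values: each column of W_proj is weighted by its key activation and summed.
    return matvec(W_proj, a)  # length d_model


# Hypothetical toy sizes; only the 4x ratio comes from the figure.
d_model = 8
d_ff = 4 * d_model

rng = random.Random(0)
W_fc = [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(d_ff)]    # (d_ff, d_model)
W_proj = [[rng.gauss(0.0, 0.02) for _ in range(d_ff)] for _ in range(d_model)]  # (d_model, d_ff)
h = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

m = mlp_block(h, W_fc, W_proj)
print(len(m))  # same width as the input h
```

Read this way, the output m is a weighted sum of the columns of W_proj, with the weights given by how strongly each row of W_fc responds to h.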