Tsinghua University and ByteDance jointly proposed DA-Transformer, which removes the reliance of traditional parallel generation models on knowledge distillation. On translation tasks it substantially outperforms all previous parallel generation models, with gains of up to 4.57 BLEU. It is also the first such model to match and even exceed the autoregressive Transformer, by up to 0.6 BLEU, while reducing decoding latency by a factor of 7.





