Surpassing the Transformer: Tsinghua and ByteDance significantly advance the state of the art in parallel text generation | ICML 2022

2022-07-25 07:58:00 【Zhiyuan community】

Tsinghua University and ByteDance have jointly proposed DA-Transformer, which frees parallel generation models from their traditional reliance on knowledge distillation. On translation tasks it substantially outperforms all previous parallel generation models, by up to 4.57 BLEU. It is also the first parallel model to match, and even exceed, the performance of the autoregressive Transformer: it improves on it by up to 0.6 BLEU while reducing decoding latency by a factor of 7.