ARR-addon-results&codebase-nr0034je9

Hu8kKo34

Model	SST-2 (Acc)	MNLI (Acc)	QNLI (Acc)	CoLA (Matthews corr.)	Avg. Score
BERT-Base	91.2	84.6	90.1	58.2	81.0
RoBERTa-Base	92.3	87.4	91.8	63.1	83.7
GPT-3 (175B)	94.1	88.9	93.0	66.4	85.6
Our Method	94.8	89.7	93.5	68.9	86.7
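
The Avg. Score column appears to be the unweighted mean of the four task scores. The sketch below is illustrative only and not taken from the released codebase. It recomputes that mean from the per-task numbers, assuming half-up rounding to one decimal. `RESULTS`, `REPORTED_AVG`, and `macro_average` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores copied from the main results table:
# SST-2 (Acc), MNLI (Acc), QNLI (Acc), CoLA (Matthews corr.).
# Strings are used so Decimal keeps the exact reported values.
RESULTS = {
    "BERT-Base": ("91.2", "84.6", "90.1", "58.2"),
    "RoBERTa-Base": ("92.3", "87.4", "91.8", "63.1"),
    "GPT-3 (175B)": ("94.1", "88.9", "93.0", "66.4"),
    "Our Method": ("94.8", "89.7", "93.5", "68.9"),
}

# Avg. Score values as reported in the table.
REPORTED_AVG = {
    "BERT-Base": "81.0",
    "RoBERTa-Base": "83.7",
    "GPT-3 (175B)": "85.6",
    "Our Method": "86.7",
}


def macro_average(scores):
    """Unweighted mean of the task scores, rounded half-up to one decimal.

    ASSUMPTION: the table does not state how the average or its rounding
    is computed; this is one reading that reproduces the reported column.
    """
    values = [Decimal(s) for s in scores]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for model, scores in RESULTS.items():
    print(f"{model}: computed {macro_average(scores)}, reported {REPORTED_AVG[model]}")
```

Under these assumptions, all four recomputed averages match the reported column. With round-half-to-even instead, RoBERTa-Base (exact mean 83.65) would come out as 83.6.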

Configuration	Attention Mechanism	Pretraining Corpus	MNLI (Acc)
Full Model	Multi-head Self-Attn	Custom + Public	89.7
– w/o Custom Corpus	Multi-head Self-Attn	Public Only	87.1
– w/o Attention Refinement Block	Basic Self-Attn	Custom + Public	86.5
– w/o Positional Embeddings	Multi-head Self-Attn	Custom + Public	85.2
– Random Initialization	—	—	72.4
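
To read the ablation as effect sizes, the sketch below subtracts each configuration's MNLI accuracy from the full model's 89.7. It is a minimal illustration, not part of the authors' code, and `FULL_MODEL`, `ABLATIONS`, and `drop_from_full` are hypothetical names.

```python
from decimal import Decimal

# MNLI accuracy copied from the ablation table above.
FULL_MODEL = Decimal("89.7")
ABLATIONS = {
    "w/o Custom Corpus": Decimal("87.1"),
    "w/o Attention Refinement Block": Decimal("86.5"),
    "w/o Positional Embeddings": Decimal("85.2"),
    "Random Initialization": Decimal("72.4"),
}


def drop_from_full(score, full=FULL_MODEL):
    """Absolute MNLI accuracy drop, in points, relative to the full model."""
    return full - score


# Print configurations from smallest to largest drop.
for name, score in sorted(ABLATIONS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score} (-{drop_from_full(score)})")
```

Read this way, the drops are 2.6, 3.2, 4.5, and 17.3 points. Removing positional embeddings costs the most of any single component (4.5 points), while random initialization without pretraining costs 17.3.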
