Data & Models
* See our GitHub for an updated list.
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic: [link]
MegaCov: A billion-scale dataset for COVID-19 in 100+ languages: [link]
DiaLex: A benchmark for evaluating dialectal Arabic word embeddings: [link]
NADI 2020 and 2021 shared task data (Arabic dialects): [link]
Arabic micro-dialects & models: [link]
Arabic manipulated and fake news data & models: [link]
English machine-generated data: [link]
Yoruba machine translation data: [link]
Arabic emotion detection data: [link]
Arabic dialect identification benchmarks: [link]
Arabic Twitter word embedding models (based on word2vec): [link]