swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 languages, generated using PaLM 2 and summarize-then-ask prompting.
Topics: cross-lingual, datasets, deep-learning, information-retrieval, machine-learning, multilingual, natural-language-processing, neural-information-retrieval, nlp, training-data
Created: 2023-11-06T19:26:39
Updated: 2025-03-18T13:56:25
https://arxiv.org/abs/2311.05800
Stars: 49
Stars increase: 0
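Summarize-then-ask prompting, roughly, asks the model to first write a short summary of a Wikipedia passage and then write a query in the target language grounded in that summary. The generated query is then paired with the source passage. The Python below is a minimal sketch under stated assumptions: the prompt wording, the QueryPassagePair fields, and the helpers build_summarize_then_ask_prompt and parse_completion are hypothetical. They are not the repository's code or the released dataset's schema, and the actual PaLM 2 call is left out.

```python
from dataclasses import dataclass


# Hypothetical record for one synthetic training pair; the field names are
# illustrative assumptions, not the released dataset's schema.
@dataclass
class QueryPassagePair:
    lang: str      # language of the generated query, e.g. "fr"
    query: str     # synthetic query produced by the LLM
    title: str     # title of the Wikipedia article the passage comes from
    passage: str   # source passage; serves as the positive for the query


def build_summarize_then_ask_prompt(passage: str, target_lang: str,
                                    exemplars: list[tuple[str, str, str]]) -> str:
    """Build a few-shot prompt (hypothetical wording) in which each exemplar
    shows passage -> summary -> query, so the model summarizes before asking."""
    parts = []
    for ex_passage, ex_summary, ex_query in exemplars:
        parts.append(f"Passage: {ex_passage}\nSummary: {ex_summary}\n"
                     f"Query ({target_lang}): {ex_query}\n")
    # End with the new passage and an open "Summary:" slot for the model to fill.
    parts.append(f"Passage: {passage}\nSummary:")
    return "\n".join(parts)


def parse_completion(completion: str, target_lang: str) -> tuple[str, str]:
    """Split a completion shaped like '<summary>\\nQuery (<lang>): <query>'
    into (summary, query); raise if the query line is missing."""
    marker = f"Query ({target_lang}):"
    summary, sep, query = completion.partition(marker)
    if not sep:
        raise ValueError("completion has no query line")
    return summary.strip(), query.strip()
```

As a usage example, parse_completion(" Iron tower in Paris.\nQuery (fr): Où se trouve la tour Eiffel ?", "fr") returns the summary and the French query. The query can then be stored with its source passage as a QueryPassagePair. Keeping the summary as an explicit intermediate step is what distinguishes this from asking for a query directly.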