Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

Paper: https://arxiv.org/abs/2210.09984

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge. It focuses on ad hoc retrieval across 18 different languages, which together have over three billion native speakers around the world. [1]

Footnotes

  1. Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages