We believe that large, publicly available voice datasets will foster innovation and healthy commercial competition in machine-learning based speech technology.
Source: tatoeba (8310556)
Ranked by relevance and common usage.
OpenGloss and ConceptNet supply richer edges like generalizations, collocations, and derivations.
3 total sentences available.
We believe that large, publicly available voice datasets will foster innovation and healthy commercial competition in machine-learning based speech technology.
Source: tatoeba (8310556)
Look to this page as a reference hub for other open source voice datasets and, as Common Voice continues to grow, a home for our release updates.
Source: tatoeba (8310561)
Health researchers increasingly seek to use climate and weather datasets in their studies, but often face challenges in accessing, wrangling, understanding, and applying climate data stored in complex and unfamiliar spatial data formats.
Source: tatoeba (12005940)
Data sourced from Wiktionary, WordNet, CMU, and other open linguistic databases. Updated March 2026.