C++ considers ::*, .* and ->* each to be a single token and a single operator. Some pre-Release 2.0 implementations mistokenize expressions involving pointer-to-pointer-to-member.
Source: wiktionary
These were mostly proper names, such as Ronny Johnsen, or foreign language items such as ambre solaire (French) and fairie queene (Middle English), as well as a few misspelt or mistokenized items.
Source: wiktionary
The sentences were tokenized into words using the regex tokenizer which avoided the problems of mistokenizing while using the default NLTK tokenizer.
Source: wiktionary
Similarly, all other two-character atomic representations in SMILES are being mistokenized.
Source: wiktionary
Data sourced from Wiktionary, WordNet, CMU, and other open linguistic databases. Updated March 2026.