Category: Linguistics » Word Lists (24 datasets)


Moby Project Word Lists | Added by Infochimps

6,213 acronyms (acronyms.txt): common acronyms and abbreviations.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

74,550 common dictionary words: a list of words common to two or more published dictionaries. This gives the developer of a custom spelling checker a good starting pool of relatively common words.
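
As a rough illustration of how such a list might seed a spelling checker, here is a minimal Python sketch. It assumes the file holds one word per line, which this listing does not confirm, and the helper names `load_words` and `unknown_words` are hypothetical, not part of the dataset. A tiny in-memory sample stands in for the real file.

```python
def load_words(lines):
    """Build a lookup set from an iterable of lines, one word per line (assumed layout)."""
    return {line.strip().lower() for line in lines if line.strip()}


def unknown_words(text, known):
    """Return the words in `text` that are absent from `known`, in order of appearance."""
    # Split on whitespace, then trim common punctuation from both ends of each token.
    tokens = (tok.strip(".,;:!?\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in known]


# Tiny stand-in for the real list; with the downloaded file you would pass
# an open file handle to load_words() instead of this sample.
sample = ["the", "quick", "brown", "fox"]
known = load_words(sample)
print(unknown_words("The quikc brown fox.", known))  # prints ['quikc']
```

A set gives constant-time membership tests, which matters once the pool grows to tens of thousands of words. Suggesting corrections is a separate problem this sketch does not attempt.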

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

1,185 King James Version frequent substrings (KJVfreq.txt): the 1,185 most frequently occurring substrings in the King James Version Bible, counted and ranked in order of frequency.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

This file consists of the 1,000 most frequently used English words from a wide variety of common texts, listed in decreasing order of frequency.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

21,986 names (names.txt) This database contains the most common names used in the United States and Great Britain. Spelling checkers may want to supplement their basic word list with this one.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

4,946 female names (names-f.txt) Frequent given names of females in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

3,800 male names: frequent given names of males in English-speaking countries. Spelling checkers may want to supplement their basic word list with this one.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

Over 256,700 hyphenated or other entries containing more than one word as well as all capitalized words and acronyms. Phrases were considered ‘common’ if they or variations of them occur in standard dictionaries or thesauruses.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

366 often misspelled words (oftenmis.txt): many of the most commonly misspelled words in English-speaking countries.

Linguistics » Word Lists

WordNet

Free

A large lexical database of English | The Comprehensive Knowledge Archive Network (CKAN) Collection | Added by Infochimps

“WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical …

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

113,809 official crosswords: a list of words permitted in crossword games such as Scrabble™. Compatible with the first edition of the Official Scrabble Players Dictionary™. Since this list includes all inflected forms of words (-ing, -ed, -s, and so on), it makes a good addition when building a custom spell …

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

Over 354,000 single words, excluding proper names, acronyms, or compound words and phrases. This list does not exclude archaic words or significant variant spellings.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

74,550 common dictionary words: a list of words common to two or more published dictionaries. This gives the developer of a custom spelling checker a good starting pool of relatively common words.

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

113,809 official crosswords: a list of words permitted in crossword games such as Scrabble™. Compatible with the first edition of the Official Scrabble Players Dictionary™. Since this list includes all inflected forms of words (-ing, -ed, -s, and so on), it makes a good addition when building a custom spell …

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

113,809 official crosswords: a list of words permitted in crossword games such as Scrabble™. Compatible with the first edition of the Official Scrabble Players Dictionary™. Since this list includes all inflected forms of words (-ing, -ed, -s, and so on), it makes a good addition when building a custom spell …

Linguistics » Word Lists

Moby Project Word Lists | Added by Infochimps

This file consists of the 1,000 most frequently used English words from a wide variety of common texts, listed in decreasing order of frequency.

Linguistics » Word Lists

Added by doncarlo

This is all the text from every Dinosaur Comics strip ever made, in convenient XML format. It was released by the author, Ryan North, as a tool to help solve an anagram presented in the comic for March 1, 2010. The text was also sort …

Computers » Internet | Linguistics » Word Lists | Linguistics » Text Corpora | Linguistics » Transcript Corpora

Added by mrflip

List of summonable objects from the Nintendo DS game Scribblenauts, from AARDVARK, ABOMINABLE SNOWMAN and ABSCONDER to ZOMBIE, ZUNICERATOPS and ZYGOTE.

via the Scribblenauts Wikipedia entry:

Scribbl …

Computers | Linguistics » Word Lists

Added by dhruv

An extremely comprehensive list of English stopwords, culled from multiple sources.

Includes

  • prepositions: “of”, “to”, “in”, “for”, “with”, &c.
  • articles: “the”, “a”, “an”, &c.
  • pronouns: “she”, “me”, “it”, “our”, “them”, “these”, “who”, &c.
  • contractions: “it’d”, “she’s”, &c.
  • comm …

Linguistics » Word Lists

Added by dhruv

A list of bad words.

Linguistics » Word Lists