Listing 34 datasets tagged with "text"

Linguistic Data Consortium (LDC) - Collection of Linguistic Corpora and Datasets *****

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC’s hos …

Linguistics

Enron Email Dataset *****

0.5M email messages among managers at Enron Corporation | The Comprehensive Knowledge Archive Network (CKAN) Collection | Added by Infochimps 10 months ago

From the CALO Project at Carnegie Mellon University, a massive dataset of emails recovered from discovery documents in the Enron trials.

  1. About

From distribution page:

> This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that …

Computers » Social Networks

Text Messages sent on 9/11/2001 (wikileaks.org) *****

500k+ pager (SMS) messages sent on September 11, 2001, published on Wikileaks.org | Added by mrflip 4 months ago

9/11 tragedy pager intercepts.

The following are more than half a million national US pager intercepts released by wikileaks.org. They cover the September 11 tragedy from 3am on the same day (Tuesday) until 3am the following day, a 24-hour period surrounding the attacks in New York and Washing …

Social Sciences » Sociology

Article Search API - NYTimes.com *****

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


TREC-9 Filtering Track Collections - MEDLINE Extract with Relevance Measures *****

Large text corpus, useful for evaluating text retrieval algorithms | Added by Infochimps 8 months ago

This README file describes all the data files associated with the OHSUMED document collection as it was used for the TREC-9 Filtering Track. Please see “The TREC-9 Filtering Track Final Report” by Stephen Robertson and David A. Hull in the TREC-9 proceedings for a description of the tasks per …

Medicine

Tweets during State of the Union address **

Audience reaction | Added by acordova00 about 1 month ago

A capture of all tweets from Twitter’s sample feed during the 2010 State of the Union address. Tweets are in JSON format. The feed is described here: http://apiwiki.twitter.com/Streaming-API-Documentation#statuses/sample.

Computers » Internet | Computers » Social Networks | Politics and Law » Political science

PMC FTP Service **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Enron Email Dataset **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Enron Dataset **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Data for Data Mining **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


phishingcorpus [JoseWiki] **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Summize Twitter Search API **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago

Computers » Internet

Courts.gov **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Email Datasets **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


DMOZ100k06 - Michael G. Noll **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago


Text Analytics Solutions from ClearForest **

Pete Skomoroch's Bookmarks | Added by Infochimps 11 months ago