Listing 6 datasets tagged with "huge"

Retrosheet: Game Logs (play-by-play) for Major League Baseball Games *****

A record of major league games played from 1871-2008 | Added by Infochimps 7 months ago

The game logs contain a record of major league games played from 1871-2008. At a minimum, it provides a listing of the date and score of each game. Where our research is more complete, we include information such as team statistics, winning and losing pitchers, linesc …

Sports » Baseball

Freebase Data Dump *****

Added by Infochimps 12 months ago

A data dump of all the current facts and assertions in the Freebase system.
Freebase is an open database of the world's information, covering millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, …

Encyclopedic » Encyclopedias

Freebase.com Wikipedia Extraction (WEX) *****

Added by Infochimps 12 months ago

The Freebase Wikipedia Extraction (WEX) is a processed dump of the English language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and r …

Encyclopedic » Encyclopedias

DBPedia Main *****

Added by Infochimps 12 months ago

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 mu …

Encyclopedic » Encyclopedias

Global Daily Weather Data from the National Climate Data Center (NCDC) *****

Federal Climate Complex GSOD (Global Surface Summary of Day) version 7 | Added by Infochimps 7 months ago

The GSOD (Global Daily) Data
The GSOD dataset comes from the National Climate Data Center and can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/gsod/
You can fetch your own copy with
wget -r -l3 --no-c …

Science » Meteorology

Twitter Census :: Conversation Metrics - One year of URLs, Hashtags, Smileys usage by hour *****

Occurrence counts of tweet tokens: hashtags, URLs, & smileys by hour or month | Twitter Census | Added by MonkeywrenchConsultancy 3 months ago

This data comes from a scrape of the Twitter social network conducted by the Monkeywrench Consultancy. The full scrape consists of 35 million users, 500 million tweets, and 1 billion relationships between users.
This dataset is a corpus of tokens collected from tw …

Computers » Social Networks

Austin Daily Weather (extracted from National Climate Data Center (NCDC) Data) ***

Federal Climate Complex GSOD (Global Surface Summary of Day) version 7 | Added by Infochimps 7 months ago

About
This is an extract from the “Global Daily Weather Data from the National Climate Data Center (NCDC)” dataset, covering Austin only.
Graphs
!h …

Science » Meteorology

FreeBase **

The Comprehensive Knowledge Archive Network (CKAN) Collection | Added by Infochimps 9 months ago

  1. Description
    “Freebase is an open database of the world’s information. It is built by the community and for the community, free for anyone to query, contribute to, build applications on top of, or integrate into their websites.”
  2. Openness: OPEN
  • L …

The Open Library **

The Comprehensive Knowledge Archive Network (CKAN) Collection | Added by Infochimps 9 months ago

  1. About
    > One web page for every book ever published. It’s a lofty, but achievable, goal.
    > To build it, we need hundreds of millions of book records, a brand new database infrastructure for handling huge amounts of dynamic information, a wiki interface, multi- …