Explore Collections

The best data from the web, all in one place

Check out our new MySpace data collection.


The Infochimps repository contains thousands of datasets. Many are unique but some are part of a larger collection. Some of the collections we're especially proud of are listed below.

Click one to explore datasets for that collection.

Collection Statistics

Collections: 12
Datasets in collections: 6970

The Comprehensive Knowledge Archive Network (CKAN) Collection

From their website:

CKAN is the Comprehensive Knowledge Archive Network, a registry of open knowledge packages and projects (and a few closed ones)…Those familiar with freshmeat, CPAN or PyPI can think of CKAN as providing an analogous service for open knowledge…CKAN is developed and maintained by the Open Knowledge Foundation. Both the CKAN code and data are open: free for anyone to use and reuse. To find out more check out the CKAN project at "knowledgef …

376 [Datasets in this collection]

Wikipedia Infoboxes

From Wikipedia

An infobox on Wikipedia is a consistently formatted table which is present in articles with a common subject to provide summary information consistently between articles or improve navigation to closely related articles in that subject. (An infobox is a generalization of a taxobox (from taxonomy) which summarizes information for an organism or group of organisms.)

Wikipedia Infoboxes are the small tables that appear on the rig …

3382 [Datasets in this collection]

Statistical Abstract of the United States

From the US Census Bureau

The Statistical Abstract of the United States, published since 1878, is the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States.

Use the Abstract as a convenient volume for statistical reference, and as a guide to sources of more information both in print and on the Web.

Sources of data include the Census Bureau, Bureau of Labor Statistics, Bure …

1357 [Datasets in this collection]

Pete Skomoroch's Bookmarks

Pete Skomoroch is President and Lead Consultant at Data Wrangling in Arlington, VA, a firm that specializes in mining large datasets to solve problems in search, finance, and recommendation systems.

He maintains an ever-expanding list of datasets (nearly 400 at last count!), which have now been incorporated into the Infochimps repository.

397 [Datasets in this collection]

Moby Project Word Lists

The Moby Project has assembled some of the world’s largest collections of word lists. Sixteen datasets, including common male and female first names, special words for crossword puzzles, and commonly misspelled words, are stored in the Infochimps repository along with many other collections.

23 [Datasets in this collection]

AggData

AggData sells aggregated lists of data, culled from the websites of major companies like Starbucks and Ace Hardware. Their lists are geolocated and include additional information on each branch of each company.

5 [Datasets in this collection]

Twitter Census

A collection of various datasets about the online phenomenon Twitter.

4 [Datasets in this collection]

Data.gov

The purpose of Data.gov is to increase public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government.

734 [Datasets in this collection]

IP Address to US Census Data

A collection of datasets that link IP address geolocation data from MaxMind to the United States Census 2000 data.

78 [Datasets in this collection]

Geolocation

A collection of datasets concerning the names, locations, and other information about places in the world.

411 [Datasets in this collection]

Datamob

Datamob aims to show, in a very simple way, how public data sources are being used.

Their listings emphasize the connection between data posted by governments and public institutions and the interfaces people are building to explore that data.

195 [Datasets in this collection]

MySpace Real-Time Stream

This is a collection of data from MySpace’s real-time stream API. Bulk dumps, derived datasets, and utility datasets are available here. Developers and academics should find this data useful.

8 [Datasets in this collection]