Share Data

Catalog the World's Data

The first step in connecting the world's data is to make it discoverable: it's no good if you can't find it. The most basic level of the Infochimps commons is a catalog of datasets on the web.

Here's how:

  1. Use the New Dataset form or (even easier) the Infochimp! bookmarklet to enter the URL for a dataset, along with its title, tags, and a brief description.
  2. That's it! Now you've made it easy for anyone in the world to find it.

Try our bookmarklet!

Drag this link: Infochimp! to your bookmarks folder, and click it when you spot a ripe data banana.

You'll have to be a registered user to use it, so please take a second and sign on up while you're here.

Upload Data

Infochimps will host and distribute any dataset under an open license, at no cost to anyone.

Here's how:

  1. Sign up: it only takes a second. (Or log in if you've already joined!)
  2. Briefly describe the dataset using the new dataset form.
  3. Upload the data: once the description has been saved, click the 'upload data' button on the dataset's page to attach your dataset. It can be a single file, or a package (.zip, .tar.gz, .tar.bz2). We'll bundle it up for distribution to everyone.
  4. Do you have huge files or a collection of many datasets? No problem! Just get in touch and we will make it easy for you to transfer and bulk load the data.
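
If your dataset is a folder of files, you may want to bundle it into one of the package formats listed above before uploading. Here is a minimal sketch using Python's standard tarfile module. The function name package_dataset and the paths in the usage note are hypothetical, chosen only for illustration. Nothing here is part of the Infochimps site or its tools.

```python
import tarfile
from pathlib import Path


def package_dataset(source_dir: str, archive_path: str) -> list[str]:
    """Bundle everything under source_dir into a gzip-compressed tar archive.

    Returns the sorted member names stored in the archive.
    (Hypothetical helper for illustration only.)
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source_dir} is not a directory")

    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps stored paths relative, e.g. "my_dataset/data.csv",
        # instead of embedding the absolute path from this machine.
        archive.add(source, arcname=source.name)

    # Re-open the archive to confirm what was actually written.
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())
```

For example, package_dataset("my_dataset", "my_dataset.tar.gz") would produce a single .tar.gz file to attach. Write the archive somewhere outside the folder being packaged so it does not end up inside itself.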

Infochimps Data Tools

Infochimps is an active contributor to the open-source community. We have created several tools to help data mechanics do their jobs more elegantly and efficiently and to share the results of their labors with the rest of the Infochimps community.

Infinite Monkeywrench

A Ruby library for downloading, parsing, summarizing, and transforming between common data formats like CSV, XML, YAML, HTML, &c.

Wukong

A Ruby library which simplifies common map/reduce patterns and provides an elegant interface to Hadoop streaming.
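
To give a feel for the map/reduce pattern itself, here is a rough word-count sketch in plain Python. It is not Wukong's API and does not use Hadoop. The function names are hypothetical, and a local sort stands in for the key-sorted hand-off that Hadoop streaming performs between the map and reduce steps.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum the counts for each key.

    Hadoop streaming delivers mapper output to the reducer sorted by key;
    sorted() is a local stand-in for that step (an assumption of this sketch).
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines: Iterable[str]) -> dict:
    """Run the map and reduce steps in a single process."""
    return dict(reduce_counts(map_words(lines)))
```

Calling word_count on a list of text lines returns a dictionary of lowercase words and their counts. In a real streaming job the two steps would run as separate processes that read from standard input and write to standard output.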

MachetEC2

A build of Ubuntu Linux customized for data processing, analysis, and visualization which can be used as an AMI from which to create instances on Amazon's Elastic Compute Cloud (EC2).

Wuclan

A Ruby library which coordinates industrial-strength web scraping tools and makes it easy to mine data from websites with APIs, especially social networking sites.