The first step in connecting the world's data is to make it discoverable: it's no good if you can't find it. The most basic level of the Infochimps commons is a catalog of datasets on the web.
Drag this link: Infochimp! to your bookmarks folder, and click it when you spot a ripe data banana.
You'll have to be a registered user to use it, so please take a second and sign on up while you're here.
Infochimps will host and distribute any dataset under an open license, at no cost to anyone.
Infochimps is an active contributor to the open-source community. We have created several tools to help data mechanics do their jobs more elegantly and efficiently and to share the results of their labors with the rest of the Infochimps community.
A Ruby library for downloading, parsing, and summarizing data, and for transforming between common formats like CSV, XML, YAML, HTML, etc.
A Ruby library which simplifies common map/reduce patterns and provides an elegant interface to Hadoop streaming.
A build of Ubuntu Linux customized for data processing, analysis, and visualization. It can be used as an AMI from which to create instances on Amazon's Elastic Compute Cloud (EC2).
A Ruby library which coordinates industrial-strength web scraping tools and makes it easy to mine data from websites with APIs, especially social networking sites.