Infochimps Data Tools

Infochimps is an active contributor to the open-source community. We have created several tools that help data mechanics do their jobs more elegantly and efficiently, and that make it easier to share the results of their labors with the rest of the Infochimps community.

Infinite Monkeywrench

A Ruby library for downloading, parsing, and summarizing data, and for transforming it between common formats such as CSV, XML, YAML, and HTML.

Wukong

A Ruby library that simplifies common map/reduce patterns and provides an elegant interface to Hadoop streaming.
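
For readers new to the pattern, here is a minimal plain-Ruby sketch of the kind of streaming map/reduce job that Wukong is meant to simplify. It does not use Wukong's API. The method names (map_line, reduce_records, word_count) are hypothetical, and the sort step only simulates locally what Hadoop streaming does between the mapper and the reducer.

```ruby
# Plain-Ruby sketch of a streaming word count. This is NOT Wukong's API;
# all method names here are made up for illustration.

# Map step: emit one tab-separated "word\t1" record per word in a line.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| "#{word}\t1" }
end

# Reduce step: sum the counts for each key. Hadoop streaming hands the
# reducer records sorted by key, so records with equal keys are adjacent.
def reduce_records(sorted_records)
  sorted_records
    .map { |record| record.split("\t") }
    .chunk_while { |a, b| a[0] == b[0] }
    .map { |group| "#{group[0][0]}\t#{group.sum { |_, n| n.to_i }}" }
end

# Local simulation of the whole job: map, sort by key, then reduce.
def word_count(lines)
  reduce_records(lines.flat_map { |line| map_line(line) }.sort)
end

puts word_count(["the quick brown fox", "the lazy dog"])
```

In a real Hadoop streaming job the mapper and reducer would run as separate scripts that read records from standard input and write them to standard output; the in-memory version above only shows the shape of the pattern.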

MachetEC2

A build of Ubuntu Linux customized for data processing, analysis, and visualization. It can be used as an Amazon Machine Image (AMI) from which to create instances on Amazon's Elastic Compute Cloud (EC2).

Wuclan

A Ruby library that coordinates industrial-strength web scraping tools and makes it easy to mine data from websites with APIs, especially social networking sites.