infochimps.org is a community to assemble and interconnect a giant free almanac, with tables on everything you can put in a table—things like a century of hourly weather, every major league baseball game, decades of stock prices, or every US patent filing. Built by data nerds and used by data nerds to house the information you need to power the projects the world needs.
Exploring rich data is fun, but finding it, formatting it, and tagging it with metadata is drudge work barely fit for a trained chimp. And if you want to share a large raw dataset online, you face two troubling prospects: a) that no one will find it, or b) that everyone will find it.
A central, community-driven repository solves these problems, and also presents amazing possibilities. Interconnect the datasets along concepts they share: instead of 100,000 datasets, there’s just one. Study the physics of baseball by comparing the hourly weather during every single baseball game to game outcomes. Uncover political campaign irregularities by comparing neighborhood per-capita income, historical voter trends, and public campaign finance records. Plan real-estate decisions based on what news-and-other-media keywords rank highly in each area. If you’ve read Freakonomics, you know the power of this approach—let’s start building tools that make this way of thinking available to everychimp.
Yes, but it’s often trapped behind large bureaucratic and monetary barriers. We’re talking 100- to 10,000-times markup (PriceOfFreeData) over the raw bandwidth charge for freely redistributable data gathered at taxpayer expense. Not to mention the hassles with formatting, and converting, and finding, and sharing, and …
Yes—freebase.com is, and so are swivel.com, numbrary.com, CKAN.net, dbpedia.com, and a bunch of others, and all in their own way much, much better than this site. There’s a community of us hanging out at theinfo.org, and we’re all working together, because this job is way too big to be solved by any one group or any finite number of monkeys.
The virtues of infochimps.org lie in its suckiness:
The other important feature of infochimps.org is its essential poverty. We’re a community effort beholden to no one, and everything we produce is and will remain free. Only the cooperation of the community (this means you, chimpy) can ensure its success—if you have resources or talent to provide, please Contact us.
Sharing a large raw dataset presents two troubling prospects: a) no one will find it, or b) everyone will find it.
Infochimps.org lets you make interesting data available to all. You get the credit, you get the bandwidth off your server, and the world gets a little bit smarter. Instead of hundreds of thousands of datasets scattered all over the web, there should be Just One Dataset, with open formats, interlocking fields, and a finite number of infochimps helping to organize and distribute it.
Painfully; here’s how (HOWTO Upload)
You can’t edit a dataset online, yet—but if you usefully convert or reformat a dataset, or add information to the dataset’s Infochimps Simple Schema file (the one ending in .icss.yaml), then for now just re-upload it.
If it’s broadly interesting, we want to host it. Unless it’s interesting and 20GB large, in which case we want to point to it.
Now “Broadly Interesting” has a certain restricted meaning considering what we’re talking about, but if you browse the existing collection you’ll get a sense of it.
Here’s a useful rule of thumb: would a motivated geeky person from a different field or region of interest find this useful?
A table giving the best known values for all the physical constants and fundamental particles is highly desirable; a petabyte of raw sensor values from the LHCb detector at CERN’s Large Hadron Collider is outside our scope. Weekly water consumption for each major metropolitan area is interesting, but a three-year table showing how much your water bill was each month is not. (Of course, if it were something awesomely obsessive like “everything you did, saw and spent money on for a year” then it’s interesting again.)
The broad goal is to build a repository of data that helps you discover, share and download raw data sets that are:
Understand that if it’s interesting we’ll take it how it stands. If other people agree it’s interesting, they’ll enrich it or make it more useful and share it back. All that’s really needed to host a dataset is its title; a brief description; and most importantly a pointer to who gathered the data and the source that distributed it.
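As a rough illustration of that minimum, here is a small Python sketch of a record holding the title, description, and the two attribution pointers, plus a check for anything left blank. The class and field names (DatasetRecord, gathered_by, distributed_by) are made up for this example; they are not the Infochimps Simple Schema and not anything the site itself defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a dataset (illustrative names only)."""
    title: str
    description: str
    gathered_by: str      # who collected the data
    distributed_by: str   # the source that distributed it


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of fields that are empty or only whitespace."""
    return [name for name, value in vars(record).items() if not value.strip()]


if __name__ == "__main__":
    # Placeholder values only; fill in the real details of your dataset.
    record = DatasetRecord(
        title="Example dataset title",
        description="One or two sentences saying what the table contains.",
        gathered_by="",
        distributed_by="Example distributing organization",
    )
    print(missing_fields(record))
```

Run as a script, this lists whichever of the four fields still needs filling in before the dataset is ready to share.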
Check the What works / What doesn’t list; there’s a lot of stuff that needs a-fixin’, as the site is still very new.
If some part of the site is obviously and horribly broken, please post a report or Contact us directly.
In order to fuel rapid growth and encourage the far-flung tribe of infochimps, we’re outsourcing across the whole simian family. Reportedly many humans are contributing to the site as well.