
Google Refine

A while ago, Google bought Metaweb, the company that made Freebase. Metaweb also made Freebase Gridworks, a tool for making sense of messy data, and earlier this week Google released version 2.0 of it, now renamed Google Refine. Watch the videos to see what it does.

This looks pretty darned impressive. For large chunks of my career, I’ve been doing work like that the hard way. I started my career in the 1980s doing data reduction in Fortran, but quickly graduated to sed and awk, and in the 2000s I used Perl and Ruby. Of course, “the hard way” is only in hindsight: each of those tools was an improvement over what I used before, and Refine looks like it could be a similar kind of improvement.

(I still do some of that kind of work even now. It’s been a couple of years, but I probably spent at least a week, spread across too many evenings and weekends, massaging the church directory from a text-format Word document into tabular spreadsheet data.)
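For a sense of what that kind of massaging involves when done “the hard way” in a script, here is a minimal sketch in Python using the standard `csv` module, not Google Refine. The directory layout is an assumption for illustration: each entry is a block of lines (name, street, city/state/zip, optional phone) separated by blank lines. The sample data and the names `SAMPLE`, `FIELDS`, `parse_directory`, and `to_csv` are all hypothetical.

```python
import csv
import io
import re

# Hypothetical input: one directory entry per block, blocks separated by blank lines.
# Each block is ASSUMED to be: name, street, city/state/zip, phone (phone optional).
SAMPLE = """\
Smith, John and Mary
123 Main St
Springfield, IL 62701
555-1234

Jones, Alice
45 Oak Ave
Shelbyville, IL 62565
"""

# Hypothetical column names for the spreadsheet.
FIELDS = ["name", "street", "city_state_zip", "phone"]


def parse_directory(text):
    """Split blank-line-separated blocks into dicts keyed by FIELDS."""
    records = []
    # Treat lines containing only whitespace as blank separators too.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Pad short blocks (e.g. no phone number) so every row has all columns.
        lines += [""] * (len(FIELDS) - len(lines))
        # Note: zip() silently drops any extra lines beyond len(FIELDS).
        records.append(dict(zip(FIELDS, lines)))
    return records


def to_csv(records):
    """Render records as CSV text suitable for importing into a spreadsheet."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(parse_directory(SAMPLE)), end="")
```

Real directories are rarely this regular, which is exactly where the week of evenings goes: every inconsistent entry needs another special case, and that is the sort of cleanup Refine is meant to make interactive.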