{"id":663,"date":"2010-11-13T05:00:34","date_gmt":"2010-11-13T13:00:34","guid":{"rendered":"https:\/\/accretiondisc.com\/blog\/?p=663"},"modified":"2010-11-13T06:27:53","modified_gmt":"2010-11-13T14:27:53","slug":"google-refine","status":"publish","type":"post","link":"https:\/\/accretiondisc.com\/blog\/2010\/11\/13\/google-refine\/","title":{"rendered":"Google Refine"},"content":{"rendered":"<p>A while ago, Google bought Metaweb, the company that made <a href=\"http:\/\/www.pcmag.com\/article2\/0,2817,2372550,00.asp\">Freebase<\/a> and Freebase Gridworks, a tool for making sense of messy data. Earlier this week, they released version 2.0 of that tool, now renamed <a href=\"http:\/\/code.google.com\/p\/google-refine\/\">Google Refine<\/a>. Watch the <a href=\"http:\/\/www.youtube.com\/watch?v=yNccGtn3Wb0\">videos<\/a> to see what it does.<\/p>\n<p>This looks pretty darned impressive. For great chunks of my career, I&#8217;ve been doing that kind of work the hard way. I started out in the 1980s doing data reduction in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Fortran\">Fortran<\/a>, but quickly graduated to <a href=\"http:\/\/www.grymoire.com\/Unix\/Sed.html\">sed<\/a> and <a href=\"http:\/\/www.vectorsite.net\/tsawk.html\">awk<\/a>, and in the 2000s I used <a href=\"http:\/\/www.perl.org\/\">Perl<\/a> and <a href=\"http:\/\/www.ruby-lang.org\/en\/\">Ruby<\/a>. Of course, when I say &#8220;the hard way,&#8221; that is in hindsight. Each of those was an improvement over what I used before, and Refine looks like it could be a similar step up.<\/p>\n<p>(I still do some of that kind of work even now. 
A couple of years ago, I probably spent at least a week, spread across too many evenings and weekends, massaging the church directory from a text-format <a href=\"http:\/\/en.wikipedia.org\/wiki\/DOC_(computing)\">Word document<\/a> into tabular <a href=\"http:\/\/en.wikipedia.org\/wiki\/Comma-separated_values\">spreadsheet data<\/a>.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A while ago, Google bought Metaweb, the company that made Freebase and Freebase Gridworks, a tool for making sense of messy data. Earlier this week, they released version 2.0 of that tool, now renamed Google Refine. Watch the videos to see what it does. This looks pretty darned impressive. For great chunks of my career, I&#8217;ve been doing [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[23,52],"tags":[201,199,203,197,204,198,200,202,205],"class_list":["post-663","post","type-post","status-publish","format-standard","hentry","category-life","category-technology","tag-analysis","tag-awk","tag-datamining","tag-google","tag-perl","tag-refine","tag-ruby-2","tag-screenscraping","tag-sed"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paRqpr-aH","_links":{"self":[{"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/posts\/663","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":tr
ue,"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/comments?post=663"}],"version-history":[{"count":0,"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/posts\/663\/revisions"}],"wp:attachment":[{"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/media?parent=663"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/categories?post=663"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/accretiondisc.com\/blog\/wp-json\/wp\/v2\/tags?post=663"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}