Category Archives: Technology

Diff Tools, Redux

Some time ago, I mentioned here how I like to use colordiff. Well, on a Mac I actually prefer Apple’s FileMerge GUI diff tool, which is part of the Xcode command line tools and accessible from the command line as opendiff.

I recently discovered diffy, which is similar to colordiff, but offers a -split option that gives you the side-by-side effect of opendiff.

Even more interesting to me, however, is diffr, which is a “diff postprocessor” that shows the differences within a specific line. You run diff -u ... | diffr to see what you want.

Thousands of words worth of pictures:

plain old diff
colordiff (aliased to diff because duh)
diffy -s split
diff -u … | diffr
FileMerge, invoked from the command line

Update: I appear to have been wrong. Evidently, opendiff is part of Xcode proper, not the command line tools. Which means I won’t be using it in general, because it’s freaking huge.

File renaming tools

Long ago, I wrote a utility (brename) that renames a set of files based on a supplied pattern. (Imagine you had an arbitrary set of JPEGs and you wanted to pretend they all came from a digital camera with names like IMG_0001, IMG_0002, etc. – that’s my favorite use case for brename. It’s really more of a re-numbering than a renaming tool.)

I also have a tool I call pmv, an alias for Larry Wall’s perl rename tool (version 3.0.1.2 from August of 1990, which was something like this “fork” of version 4.2 I found here). I use pmv when what I want to do is more complicated than brename will permit. (Interestingly, the version of perl rename I use will force filename case changes on Macs, which like to pretend that ABC.txt and abc.txt aren’t different names, while the newer version won’t.)

But I recently also stumbled across mmv. It’s like the perl rename tool but with error checking beforehand. The downside is that you can’t (easily?) limit the application of your pattern to some set of files. It’s like coming up with a rename expression s/before/after/ and applying it to *. (Not only that, but reading the man page leads me to think it’s over-eager to apply that pattern not just to * but to **.)

And what about renameutils? I have something like its qmv. The idea is you print a list of filenames and bring it into the editor for a human to fix there. Way back in the 80’s I used an awk command to do this; it was something like this:

$ ls -1 *.c | awk '{printf("mv %20s %s\n",$1,$1);}' > list ; $EDITOR list

The only problem with qmv is that the “plan” you create isn’t saved. Typically, the files I want to rename (especially when there are more than a few, which is when qmv should shine) are backed up somewhere else, and I’d prefer to apply the same plan to the backup folder, rather than copying the (same) files there and then deleting the originals.
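To make the “saved plan” idea concrete, here is a minimal sketch in plain sh. It is not how qmv works: the apply_plan function and the tab-separated plan file (old name, a tab, new name) are both made up for illustration, and the paths in the usage comments are placeholders.

```shell
# apply_plan PLAN DIR: rename files inside DIR according to PLAN.
# PLAN holds one "old<TAB>new" pair per line (a format assumed for this sketch).
apply_plan() {
    plan=$1
    dir=$2
    tab=$(printf '\t')
    while IFS=$tab read -r old new || [ -n "$old" ]; do
        # Ignore blank lines and names the editor left unchanged.
        if [ -z "$old" ] || [ -z "$new" ] || [ "$old" = "$new" ]; then
            continue
        fi
        # Never overwrite a file that already has the target name.
        if [ -e "$dir/$new" ]; then
            echo "skipping $old: $dir/$new already exists" >&2
            continue
        fi
        mv -- "$dir/$old" "$dir/$new"
    done < "$plan"
}

# Usage (paths are placeholders):
#   ls -1 | awk '{ printf("%s\t%s\n", $0, $0) }' > ~/plan.tsv
#   $EDITOR ~/plan.tsv            # edit only the second column
#   apply_plan ~/plan.tsv .
#   apply_plan ~/plan.tsv /path/to/backup
```

Because the plan lives in its own file, the same edits can be replayed against the backup copy without moving any file contents around.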

pipx

I’m not a python person. But some interesting CLI utilities are written in python, including eyeD3 and visidata.

Enter pipx – it creates little (huge?) pip sandboxes for your different utilities, so you don’t have to worry about one breaking if you need to upgrade the other.

Installation isn’t hard if you’re comfortable using pip, but I’m not a python person, so I used port install pipx.

Duff – duplicate file finder

I’ve got this folder called vast/todo/t.temp that’s got 100 GB of stuff from old computers in it. Typically, I just copy stuff there and tell myself I’ll get back to it. There are 61,287 files, none less than a year old, and (as of now) only 5 of those 60-thousand files are less than 2 years old.

How will I ever “get back to” making sense of all that junk? Enter duff – the CLI duplicate file finder. Just say:

$ duff *.txt

and it tells you something like this:

2 files in cluster 1 (19925 bytes, digest 8b5cc01edd340e91957b54f10c22d6d3283b7962)
ccc.txt
zzz.txt

Then you decide whether you want to nuke ‘ccc.txt’ or ‘zzz.txt’. Bob’s wife’s your aunt.
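If you’re curious what a tool like this does under the hood, here is a rough sketch of the clustering idea using only POSIX cksum and awk. This is not duff’s implementation: find_dupes is a hypothetical name, and cksum’s CRC plus the byte count is a much weaker test than the SHA digest duff reports, so treat a match as “probably identical” and compare before deleting anything.

```shell
# find_dupes FILE...: print groups of files whose checksum and size match.
# Hypothetical helper for illustration; a CRC match is a hint, not proof.
find_dupes() {
    cksum "$@" | awk '
        {
            key = $1 " " $2                    # checksum plus byte count
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)   # keep the whole file name, spaces included
            count[key]++
            names[key] = names[key] name "\n"
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%d files share cksum/size %s\n%s\n", count[key], key, names[key]
        }
    '
}
```

Something like find_dupes *.txt prints one block per group of matching files; file names containing newlines will confuse it.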

Installation is just port install duff.

Extracting Text from Word DOCX files using Pandoc

Back in the day, I would use antiword to extract the text from a Word .DOC file. But it only understands DOC. Over the years, more and more Word files have been using the “open” (ha ha) DOCX format, which antiword doesn’t read. So I found Pandoc, which does much, much more.

$ pandoc some.docx -t plain > some.txt

There’s also a convert-to-Markdown option:

$ pandoc some.docx -t markdown > some.md
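When there is a whole folder of Word files, a small loop saves typing. The sketch below is only an illustration: docx_to_text is a hypothetical name, it assumes pandoc is on your PATH, and it writes each result next to the original using pandoc’s -o option.

```shell
# docx_to_text FILE.docx...: write FILE.txt next to each input file.
# Hypothetical helper; assumes pandoc is installed and on the PATH.
docx_to_text() {
    for f in "$@"; do
        out="${f%.docx}.txt"       # some.docx becomes some.txt
        pandoc "$f" -t plain -o "$out" || echo "failed: $f" >&2
    done
}
```

Call it as docx_to_text *.docx; a file that fails to convert is reported on stderr and the loop carries on with the rest.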

I find, however, that the Markdown produced by docx2md is more to my liking. It’s less cluttered, as it doesn’t aim at fidelity to the Word document’s formatting to the same degree as pandoc, but only the basics.

To install pandoc you can use the MacPorts version, but lately, I’ve found it easier simply to install the official binary Mac OSX PKG.

MacPorts Cheatsheet

I used to use homebrew, but before that I used MacPorts. (And long, long, before that, fink.) The past year or two I’ve come back to MacPorts. But I forget what the commands are. (Honestly, I get them confused with apt, but that’s a separate problem.)

The usual thing to do is to search and then install:

  • port search whatever
  • port info whatever
  • port variants whatever
  • port install whatever +somevariant

The other thing is to update the stuff you’ve already installed:

  • port -d selfupdate
  • port list outdated
  • port upgrade outdated

And sometimes get rid of the old stuff:

  • port list inactive
  • port uninstall inactive

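Update: I left out a few important commands:

  • port list installed # everything that got installed, dependencies included
  • port list requested # what you explicitly installed
  • port list leaves # unrequested ports that nothing depends on anymore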
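The update-and-tidy commands above can be strung together into one routine. This is only a sketch: port_refresh and run_port are hypothetical names, and run_port exists purely so the routine can be dry-run by redefining it to print instead of execute.

```shell
# run_port is a hypothetical indirection; redefine it as
#   run_port() { echo "port $*"; }
# to see what would run without touching the system.
run_port() { sudo port "$@"; }

# port_refresh: update MacPorts, upgrade outdated ports, drop inactive ones.
port_refresh() {
    run_port selfupdate || return 1   # stop if the ports tree cannot be refreshed
    run_port upgrade outdated
    run_port uninstall inactive
}
```

Only the selfupdate step is treated as fatal here; the other two simply run in order, since either may have nothing to do.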

iTunes Misbehavior (Part 934)

I still use iTunes to sync my iPhone periodically to my computer (mainly so I can retrieve photos off the phone). Every time I do, I get to watch this:

It copies several hundred unchanged songs back onto my phone. This is apparently a bug. A known bug, known for years, that Apple just won’t fix because it’s Apple being Apple. Screw them.

(This bug is perhaps unrelated to a separate problem — honestly, a bug — wherein someone decided it would be a good idea to change the file’s “modification” time every time a song was played. This is an idea so stupid I simply cannot imagine how anyone thought it was clever.)

Useful gems, 2020 edition

Since the gem ecosystem keeps changing, and since I don’t write new programs very often, here’s a list of my favorite gems for developing command-line interface tools.

Option parsing gem: slop. (Since micro-optparse looks moribund; see here.) But (looking at programs I’ve written) I also seem to like trollop, a/k/a optimist. But I also like the fine-grained control of OptionParser.

Debugging output (not the same as logging): pastel

Invoking system functions gem: tty-command. (See also tty-config and tty-file.) But sadly, tty-command (or how I’ve used it) gets me warnings that bellyache about the Ruby 2.7 keyword argument splat problem.

Wrapper for ImageMagick: About a decade ago, I couldn’t get RMagick (rubymagick?) to compile and I’ve never gotten around to checking back. For a while I used %x<convert ...> or whatever, but now, if I’m working with images, I’ve sometimes found mini-magick helpful.

Proper Capitalization of Text Strings That Are Titles: titleize.

Parsing Biblical references (e.g., Romans 8:39 and Genesis 12:1-4): pericope.

Plus Kramdown and HAML and SASS (which is no longer written in Ruby).