I’ve been hoarding data for more than 20 years. For backups, I used to burn a CD periodically, but I outgrew those limits long ago. Today, my backups are hard drives. One reason the data keeps piling up is that I’ve moved between computers several times over those years, and each time I find stuff I don’t know what to do with. So I copy all that data into a new folder, typically called something like temp/backup/that-system-name/tmp/old/save/keep/t.files/save.d.
After 20 years, that starts to add up. So I’ve been looking at programs to help me find and get rid of duplicates. (I’ve been using rsync -n, and occasionally diff -qr, to compare folders. But the problem is deciding which folders, at which places in the directory structure, to compare — something like the sketch below.)
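For the record, this is the kind of folder-by-folder comparison I mean. The directory names here are made up, but the flags are standard rsync and diff options:

```
# Dry run: show what rsync *would* copy from old/ into new/,
# i.e. files in old/ that are missing from, or differ in, new/.
rsync -an --itemize-changes old/ new/

# Brief recursive diff: list files that differ or exist on only one side.
diff -qr old/ new/
```

That works fine once you’ve picked two candidate folders; the hard part is picking them in the first place.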
So far, I’ve focused on dupd. It does what I was thinking needed to be done: crawl the entire hierarchy and save the results in a database.
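In outline, the workflow looks something like this. The subcommand names are from dupd’s documentation, but I’m writing the flags from memory and they may differ between versions, so treat this as a sketch rather than a recipe (the paths are placeholders for my backup drives):

```
# Crawl one or more directory trees and record every file in dupd's
# SQLite database (kept under the home directory by default).
dupd scan --path /backups/disk1 --path /backups/disk2

# Afterwards, query that database to list the duplicate sets it found.
dupd report
```

The appeal is that the expensive crawl happens once, and you can then ask questions of the database as many times as you like.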