Tag: antiword

  • Extracting Text from Word DOCX files using Pandoc

    Back in the day, I would use antiword to extract the text from a Word .DOC file. But it only understands DOC. Over the years, more and more Word files have been using the “open” (ha ha) DOCX format, which antiword doesn’t read. So I found Pandoc, which does much, much more.

    $ pandoc -i…