Tag: word

  • Extracting Text from Word DOCX files using Pandoc

    Back in the day, I would use antiword to extract the text from a Word .DOC file. But it only understands DOC. Over the years, more and more Word files have been using the “open” (ha ha) DOCX format, which antiword doesn’t read. So I found Pandoc, which does much, much more.

        $ pandoc -i […]

  • Pandoc for Word Document conversion

    I just discovered pandoc. Well, I first bookmarked it in 2008, and again in 2016, so I guess I rediscovered it. But what I mean is that I finally discovered what to use it for: converting Word files to Markdown. It’s dead easy:

        $ pandoc -f docx -t markdown sample.docx > sample.md

    I’ve been using […]
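
For a whole folder of Word files, the one-liner above could be wrapped in a small shell function. This is only a sketch, assuming pandoc is installed and on the PATH. The names out_name and convert_docx are made up for illustration, and `-t plain` is just one way to get plain text out of a DOCX file, not necessarily the command used in the post.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name by swapping the .docx suffix.
out_name() {
  # $1 = input path, $2 = new extension (e.g. md or txt)
  printf '%s\n' "${1%.docx}.$2"
}

# Hypothetical wrapper: convert each DOCX file given as an argument
# to both Markdown and plain text, next to the original file.
convert_docx() {
  for f in "$@"; do
    pandoc -f docx -t markdown "$f" -o "$(out_name "$f" md)"
    pandoc -f docx -t plain "$f" -o "$(out_name "$f" txt)"
  done
}

# Usage (after sourcing this file): convert_docx *.docx
```

Writing with `-o` instead of redirecting with `>` gives the same file for these text formats; the redirect form from the post works just as well.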