
LaTeX. Word. Not.

by David M. Doolin, PhD on December 20, 2006

One of the most difficult tasks to perform on a computer: getting a reasonable facsimile of a LaTeX document into an MS Word file. I am two hours into this. So far:

  • LaTeX2HTML won’t install: configure can’t find DBM.
  • Used cpan to update DBM.
  • Updated cpan.
  • Exited cpan to install zlib by hand.
  • Re-updated cpan.
  • LaTeX2HTML still can’t find DBM; it’s using the wrong version of Perl. At this point, I am looking at an undetermined amount of time spent investigating a problem I have no interest in at all, i.e., dealing with Perl.
  • Googled for converters: lots of PDF-to-Word converters, all of them pretty shitty. “Free” demos allow conversion of only 3-5 pages. The best one of the bunch I tried required me to MANUALLY ENTER THE PAGE RANGE! (Or select the text using the mouse… either way.) Makes me want to scream. The best one of these extracts the text from the page images and embeds it in the Word file as overlays on the underlying image. These people must be idiots. Idiot savants. They program and program with no conception of the underlying meaning of what they are doing.
  • Tried TeX4ht. Home page is 404. Google returns a long list of identical posts.
  • HyperLatex: sounds good… whoops, it uses its own subset of TeX. The install documentation is several pages long. Forget it.
  • Tried TtH. Not too bad, but it won’t convert images. Tried a ps2png script. Really, really crappy results. TtH is distributed as an executable, which means figuring out how to install it on my system such that it can find all its little pieces. Need to look for the source and see if it will install in /usr/local.
  • convert doesn’t work on my system. Thanks, Cygwin; thanks for breaking one of the more important tools in my kit. I don’t need it very often, but when I do, it’s really critical.
  • Direct export from Adobe Acrobat to either HTML or MS Word produces unbelievably crappy output. Crappier than any of the crappy 3rd-party tools. This has to be deliberate on Adobe’s part. Crappier than simply cutting and pasting a single page at a time.

Current solution

  1. Use TtH to convert text to HTML.
  2. Import HTML into MS Word.
  3. Copy figures from pdf document to word by hand.
  • Check this out: I can’t pull the embedded EPS images out of the PDF file. With a JPEG, this would be a simple cut-n-paste.
  • Not only does convert not work, but fig2dev is crashing on a Ghostscript problem. I have used gs for *years* on both Unix and Windows without any problems. Since fig2dev won’t run, I can’t export from xfig either. Seriously, I have been using this toolchain since 1994; why is it failing now? It runs from the command line just fine.
  • On to the next bad idea. Let’s install the Windows versions of Ghostscript and Ghostview directly. Never mind that this is going to bite me in the rear later with path problems (which gs, exactly?); it’s Christmas and I just want to finish this stuff. But wait! The gs website is 404! Hrm… ok, they moved it to SourceForge and neglected to update the web page since 2002. Nice one!
  • Ok, gs and “ghostgum” are installed. Load up the .eps file… export to PNG. Nice, really nice: the PNG export doesn’t respect the bounding box of the EPS file. It converts the EPS file into a full-page PNG image. Another dead end.
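One possible way around the full-page PNG problem, offered as a sketch rather than anything tried in this post: reasonably recent Ghostscript releases accept `-dEPSCrop`, which crops the rendered page to the EPS file's `%%BoundingBox` comment. The Python below assumes a command-line `gs` on the PATH (on Windows the console binary is usually `gswin32c` or `gswin64c` instead) and a hypothetical input file `figure.eps`; the helper names `bounding_box`, `png_size` and `gs_command` are made up for illustration. It also reads the bounding box itself, so you can check the expected pixel size of the output (1 PostScript point is 1/72 inch).

```python
import os
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

# Matches a DSC comment such as "%%BoundingBox: 0 0 288 144".
# A header "%%BoundingBox: (atend)" does not match, so the numeric
# box repeated in the trailer is found instead.
_BBOX_RE = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE
)


def bounding_box(eps_text: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (llx, lly, urx, ury) in PostScript points, or None if absent."""
    match = _BBOX_RE.search(eps_text)
    if match is None:
        return None
    llx, lly, urx, ury = (int(g) for g in match.groups())
    return llx, lly, urx, ury


def png_size(bbox: Tuple[int, int, int, int], dpi: int) -> Tuple[int, int]:
    """Pixel size of the cropped image at the given resolution."""
    llx, lly, urx, ury = bbox
    return round((urx - llx) * dpi / 72), round((ury - lly) * dpi / 72)


def gs_command(eps_path: str, png_path: str, dpi: int = 300, gs: str = "gs") -> List[str]:
    """Ghostscript argv that renders an EPS cropped to its bounding box."""
    return [
        gs,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-dEPSCrop",               # crop to %%BoundingBox instead of a full page
        "-sDEVICE=png16m",         # 24-bit colour PNG output
        f"-r{dpi}",                # output resolution in dots per inch
        f"-sOutputFile={png_path}",
        eps_path,
    ]


if __name__ == "__main__":
    sample = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 0 0 288 144\n"
    box = bounding_box(sample)
    print(box, png_size(box, 300))  # (0, 0, 288, 144) (1200, 600)

    # "figure.eps" is a hypothetical input; only run gs if both exist.
    if shutil.which("gs") and os.path.exists("figure.eps"):
        subprocess.run(gs_command("figure.eps", "figure.png"), check=True)
```

For the sample header the box is 288 by 144 points, so at 300 dpi the cropped PNG should come out at 1200 by 600 pixels rather than a full page.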

2 comments

Jean-Philippe Daigle February 22, 2008 at 4:16 pm

Nice rant, I empathize fully. The real question is: what does getting the content into MS Word accomplish?


doolin February 22, 2008 at 5:05 pm

Fulfills contract so I can invoice!

That client is really superb, and allows me a fair bit of latitude, so I don’t (usually) mind. It’s not like I have to maintain ALL my documents in Word.

My current solution is to use 37Signals.com Writeboards: export to HTML, then import that into Word. Very convenient. I leave all the LaTeX down in the source or other internal documents.

You might like this:
http://cims.nyu.edu/~dbindel/code/dsbweb.c

