December 15, 2005

Listed below are links to weblogs that reference Copying Text from Encrypted PDF Files:

Comments

happykaka

Convert the PDF to another editable format, such as a Word document.

Om Perkash

Thanks for saving my time :)

pipitas

I've tried this "Text printer driver" trick in the past with quite a few PDF files that didn't allow you to copy and paste text from their pages. The "didn't allow you" part is not always caused by the author having *forbidden* it; it can also happen because the file contains an embedded font that uses a custom "encoding vector".

Note that 'encoding' in the context of PostScript or PDF fonts has a different meaning from encrypting. Encoding vectors for fonts are basically lists saying "glyph for 'a' is on position 1, glyph for 'adieresis' is on position 2,...".

How encoding vectors work is described in the public PostScript and PDF specifications. Adobe defined a few standard encoding vectors, and also how to create and use "custom" encoding vectors. Custom encoding vectors are in common use in many PDF files, and they have nothing to do with "encryption". They are a necessary evil: because of computing's 8-bit legacy, a non-Unicode font by default only has room for 256 glyphs (character shapes).
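To make the difference between rendering and extracting concrete, here is a minimal Python sketch. Everything in it is hypothetical: the tables `CUSTOM` and `GLYPH_TO_UNICODE` and the functions `decode` and `naive_extract` are made up for illustration and are not how any particular PDF tool is implemented. It only shows why a program that ignores the font's encoding vector ends up with garbage, assuming the page stores position numbers rather than standard character codes.

```python
# Hypothetical illustration only; these tables are invented for the example.

# A custom encoding vector: position in the font -> glyph name.
# Glyphs are numbered arbitrarily, not by any standard character code.
CUSTOM = {1: "a", 2: "adieresis", 3: "b"}

# Glyph name -> Unicode text, which is what a text extractor ultimately needs.
GLYPH_TO_UNICODE = {"a": "a", "b": "b", "adieresis": "\u00e4"}


def decode(codes, encoding):
    """Map stored codes to text by going through the encoding vector."""
    return "".join(GLYPH_TO_UNICODE[encoding[code]] for code in codes)


def naive_extract(codes):
    """Ignore the encoding vector and treat the codes as Latin-1 characters."""
    return bytes(codes).decode("latin-1")


# The codes as they might be stored in the page content for the glyphs b, adieresis, a.
page_codes = [3, 2, 1]

correct = decode(page_codes, CUSTOM)   # text recovered via the encoding vector
garbled = naive_extract(page_codes)    # control characters, not readable text

print(repr(correct))
print(repr(garbled))
```

A renderer only needs the first half of this chain (position to glyph shape), which would explain why the page can look fine on screen and on paper while copied text comes out as random symbols.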

You can check the details of your PDF's fonts in Acrobat's document properties dialog, on the "Fonts" tab.

Or use the "pdffonts.exe" command-line utility from the XPDF suite of utilities...

However, the "Text printer driver" trick does not work in these cases.

And it is pretty annoying that the Acrobat Reader "Save as text..." menu item doesn't work either if fonts use a custom encoding vector. Acrobat seems to have no problem rendering the job to the screen or to a printer, but it fails utterly when trying to extract text...

Matt

I tried this, and I was able to print from the PDF, but the file I got had a lot of random symbols and split-up words, and was basically unusable. I tried another driver, the "Microsoft Office Document Image Writer" driver, and it worked. I got a file that looked like a screenshot, but one that I could copy text from into Word.
