PDF encryption is sort of silly. If you really want to grab the text, you can screen capture each page and then OCR it. So clearly the true intention of the encryption is to deter the 99% of users who wouldn't go to such lengths to copy text.
Recently, I received a PDF of a text document that I needed as plain text (it contained the database DDL commands for a system I was analyzing). For some reason the author had encryption turned on, so I couldn't copy and paste the text, and I was a bit impatient and didn't want to wait for the author to send me the original text file. However, this PDF still allowed printing.
Since I had Acrobat, I tried to print the file into a PDF to remove the encryption. Nope. Acrobat is smart enough to keep you from doing that. But then I installed the Windows "Generic Text" printer driver and set it to print to a file. By printing my "encrypted" PDF to the Generic Text printer, the text of the whole document was nicely saved for me. After removing the page breaks and the margin I had my original database text document.
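For anyone who wants to script that last cleanup step, here is a minimal Python sketch. It is not what I actually ran. It assumes the printed file marks page breaks with form-feed characters and that the margin is a run of leading spaces common to every non-blank line; the function name `clean_printed_text` is made up for this example, and your driver's output may differ.

```python
from typing import Optional


def clean_printed_text(raw: str, margin: Optional[int] = None) -> str:
    """Remove page breaks and the left margin from text-printer output.

    Assumptions (not verified against any particular driver):
    page breaks are form feeds, and the margin is made of spaces.
    """
    # Normalize Windows line endings.
    text = raw.replace("\r\n", "\n")
    # Drop form feeds; if one sits mid-line, turn it into a line break.
    text = text.replace("\n\f", "\n").replace("\f", "\n")
    lines = [line.rstrip() for line in text.split("\n")]

    if margin is None:
        # Guess the margin as the smallest indent of any non-blank line.
        indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
        margin = min(indents) if indents else 0

    cleaned = []
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        # Never strip more than the line's own leading spaces.
        cleaned.append(line[min(indent, margin):])
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "    CREATE TABLE t (\r\n      id INT\r\n    );\r\n\f    DROP TABLE u;\r\n"
    print(clean_printed_text(sample))
```

Guessing the margin from the smallest indent keeps the relative indentation inside the DDL intact; pass `margin` explicitly if some lines start in column 0.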
The punchline is: If you don't want users to easily copy text from your encrypted PDF files, you not only need to turn off the text copy capability, but also the print capability. Why? Because it can easily be foiled using the Windows Generic Text printer driver.
Convert PDF to other editable formats, such as Word.
Posted by: happykaka | April 21, 2009 at 04:26
Thanks for saving my time :)
Posted by: Om Perkash | June 19, 2010 at 23:27
I've tried this "Text printer driver" trick in the past with quite a few different PDF files that didn't allow you to copy and paste text from their pages. The "didn't allow you" part is not always caused by the author having *forbidden* it; sometimes the file contains an embedded font that uses a custom "encoding vector".
Note, 'encoding' in the context of PostScript or PDF fonts has a different meaning from encrypting. Encoding vectors for fonts basically are lists saying "glyph for 'a' is on position 1, glyph for 'adieresis' is on position 2,...".
How encoding vectors work is described in the public PostScript and PDF specifications. Adobe defined a few standard encoding vectors, and also how to create and use "custom" encoding vectors. Custom encoding vectors are in common use in many PDF files, and they have nothing to do with "encryption". They are a necessary evil: because of computing's 8-bit legacy, non-Unicode fonts by default only have room for 256 glyphs (character shapes).
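To make the idea concrete, here is a toy Python model of an encoding vector. It is only an illustration, not how any PDF library represents fonts: the table `custom_encoding`, the glyph-name map, and both function names are invented for this sketch. It shows why text stored with a custom vector looks fine when rendered but turns into garbage if a tool reads the raw codes with a standard encoding instead.

```python
# Hypothetical custom encoding vector: code position -> glyph name.
custom_encoding = {1: "a", 2: "adieresis", 3: "b"}

# Glyph names mapped to the characters they represent (tiny subset).
glyph_to_unicode = {"a": "a", "adieresis": "\u00e4", "b": "b"}


def decode_with_vector(codes: bytes, encoding: dict) -> str:
    """Translate stored codes to text via the font's own encoding vector."""
    chars = []
    for code in codes:
        glyph = encoding.get(code, "")
        # Unknown codes or glyph names become the replacement character.
        chars.append(glyph_to_unicode.get(glyph, "\ufffd"))
    return "".join(chars)


def decode_naively(codes: bytes) -> str:
    """Pretend the codes are plain Latin-1 text, ignoring the vector."""
    return codes.decode("latin-1")


if __name__ == "__main__":
    stored = bytes([3, 1, 2])  # what the page content would hold
    print(repr(decode_with_vector(stored, custom_encoding)))
    print(repr(decode_naively(stored)))
```

The first call recovers readable text because it consults the vector; the second yields control characters, which is roughly the kind of junk a failed copy and paste produces.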
You can check the details about your PDF's fonts by looking at the document properties dialog of Acrobat on the "fonts" tab.
Or use the "pdffonts.exe" command-line utility from the Xpdf suite of utilities...
However, the "Text printer driver" trick does not work in these cases.
And it is pretty annoying that the Acrobat Reader "Save as text..." menu item doesn't work either if fonts use a custom encoding vector. Acrobat seems to have no problem rendering the job to the screen or the printer, but it fails utterly when trying to extract text...
Posted by: pipitas | September 13, 2010 at 08:06
I tried this, and I was able to print from the PDF, but the file I got had a lot of random symbols and split-up words, and was basically unusable. I tried another driver, the "Microsoft Office Document Image Writer" driver, and it worked. I got a file that looked like a screenshot, but one that I could copy text from into Word.
Posted by: Matt | February 03, 2011 at 17:17