I’ve been drawing attention to the danger of hidden and orphaned content being left in PDF documents. This frequently catches people out when they release those documents, only for someone to discover embarrassing secrets left within the files. Two techniques are available to sanitize PDF documents. As with most document formats, using the Save As command usually forces the app to write all the data out afresh. More specifically for PDF, some apps (including both PDF Expert and PDFpenPro) offer commands to write ‘flattened’ versions of the document.
The greatest difficulty facing even the expert user is checking whether a PDF document has been sanitized, or still has sensitive information remaining in it. I showed how even Adobe Acrobat (Pro) DC’s structure browser is based on what is listed by the document Catalog, and can readily miss orphaned data left in a previously edited file.
I have been adding features to Podofyllin to help the user check for orphaned content. A couple of days ago I modified its source code to display the PDF file in ASCII format in the app’s Source window, and then discovered something so odd that I thought my file system had broken.
There are two ways of accessing the raw data in a PDF document. One is to take the data exactly as read from the file. The other is to use an instance method of the PDFDocument class, dataRepresentation(). After checking the minimalist documentation provided by Apple, I thought that using the latter might reduce memory usage. So I was calling that and converting the data into ASCII format to display in the view. To check that this worked properly, I opened some of my test files, including some containing orphaned objects, only for those objects to vanish from the PDF source. I couldn’t understand why opening those same PDF files in BBEdit showed that the orphaned objects were still present. In Podofyllin, they were not only gone, but the whole file appeared different. It was as if I was opening two quite different files.
The explanation is that PDFDocument.dataRepresentation() doesn’t show the raw data in the original PDF file at all. When macOS opens the file and turns it into a PDFDocument, it effectively re-images the whole PDF, and what you access in PDFDocument.dataRepresentation() is the PDF of that new document image. This strips out orphaned content very effectively: if an object doesn’t get drawn into the fresh image of the PDF, then it doesn’t survive into the data returned by PDFDocument.dataRepresentation(). I have checked Apple’s old documentation, now archived, and even the sole book about Quartz 2D and PDF in macOS, which is now out of print but available electronically in Apple’s Book Store.
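Podofyllin’s own source isn’t shown here. What follows is only a minimal Swift sketch, assuming macOS with PDFKit, of the two ways of getting at a PDF’s data described above. The first reads the file’s bytes directly with `Data(contentsOf:)`. The second calls `PDFDocument.dataRepresentation()`. The helper names `asciiSource`, `objectNumbers` and `compareSources` are hypothetical. The regular expression used to count indirect-object headers such as `12 0 obj` is also an assumption: it is a crude heuristic that can’t see objects packed inside compressed object streams. The PDFKit part is wrapped in `#if canImport(PDFKit)` so that the plain Foundation helpers also build elsewhere.

```swift
import Foundation
#if canImport(PDFKit)
import PDFKit
#endif

/// Hypothetical helper: render bytes as printable ASCII for a source view.
/// Printable ASCII, tab, CR and LF pass through; every other byte becomes ".".
func asciiSource(_ data: Data) -> String {
    var scalars = String.UnicodeScalarView()
    for byte in data {
        switch byte {
        case 0x09, 0x0A, 0x0D, 0x20...0x7E:
            scalars.append(Unicode.Scalar(byte))
        default:
            scalars.append(".")
        }
    }
    return String(scalars)
}

/// Hypothetical helper: collect object numbers from indirect-object headers
/// such as "12 0 obj". Heuristic only: objects inside compressed object
/// streams are invisible to it, and similar text inside streams could match.
func objectNumbers(in source: String) -> Set<Int> {
    let pattern = #"\b([0-9]+)\s+[0-9]+\s+obj\b"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let whole = NSRange(source.startIndex..., in: source)
    var numbers = Set<Int>()
    for match in regex.matches(in: source, options: [], range: whole) {
        if let r = Range(match.range(at: 1), in: source), let n = Int(source[r]) {
            numbers.insert(n)
        }
    }
    return numbers
}

#if canImport(PDFKit)
/// Compare the file's bytes as stored on disk with PDFKit's regenerated data.
/// Object numbers need not match between the two, so only counts are compared.
func compareSources(at url: URL) throws {
    // Way 1: the data exactly as read from the file.
    let raw = try Data(contentsOf: url)
    // Way 2: dataRepresentation(), the PDF of the re-imaged document.
    guard let document = PDFDocument(url: url),
          let regenerated = document.dataRepresentation() else {
        print("PDFKit could not open or re-encode \(url.path)")
        return
    }
    let rawCount = objectNumbers(in: asciiSource(raw)).count
    let regeneratedCount = objectNumbers(in: asciiSource(regenerated)).count
    print("Read from file: \(raw.count) bytes, \(rawCount) object headers")
    print("dataRepresentation(): \(regenerated.count) bytes, \(regeneratedCount) object headers")
}
#endif
```

On macOS, you could call something like `try compareSources(at: URL(fileURLWithPath: "orphans.pdf"))` (the file name is hypothetical) on a test file known to contain orphaned objects. This shows side by side whether the two routes to the data differ in size and object count. Only the first route is suitable for a source view meant to reveal what is really in the file.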