PDF analysis with peepdf

PDF analysis

In this post, I will show a basic use of peepdf to analyze a malicious PDF file.

peepdf is a great Python tool available at "http://eternal-todo.com/tools/peepdf-pdf-analysis-tool". It is well documented and extensible.
One example of an extension is ParanoiDF, available at "http://securityblackhawk.blogspot.fr/2014/08/paranoidf-analyse-for-pdf-files.html".

peepdf supports PDF parsing, stream encoding/decoding, JavaScript analysis (using PyV8 to run JavaScript and libemu for shellcode emulation) and many other features.
It can be driven by scripts or used through an intuitive interactive shell (the "-i" option).

The analyzed sample is identified by its MD5 hash: "369614d7c422201f2d1605f4befd452d".
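To make sure you are looking at the same sample, the hash can be recomputed with a few lines of standard-library Python. This is a minimal sketch; the function name is my own choice, not part of peepdf.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing file_md5("sample.pdf") (a hypothetical local file name) with the hash above confirms the sample.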

First, let's open the file with peepdf:

There are a lot of objects, but peepdf highlights those of interest (the "suspicious elements" section). Each of these suspicious elements plays a role in the attack scenario.
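peepdf's own detection is based on actual parsing, but the idea behind this first triage can be illustrated with a naive keyword count over the raw bytes, in the spirit of tools like pdfid. The keyword list below is my assumption, and such a scan misses names hidden in compressed object streams or obfuscated with "#xx" escapes, which a full parser is better placed to handle.

```python
import re
from collections import Counter

# Name tokens often associated with active content (illustrative list,
# not peepdf's own rule set).
SUSPICIOUS_NAMES = ("/OpenAction", "/AA", "/JS", "/JavaScript",
                    "/Launch", "/EmbeddedFile", "/EmbeddedFiles")


def count_suspicious_names(data: bytes) -> Counter:
    """Count occurrences of suspicious name tokens in raw PDF bytes."""
    counts = Counter()
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops "/EmbeddedFile" from matching "/EmbeddedFiles".
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name] = hits
    return counts
```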

ParanoiDF provides a nice feature that produces a text rendering of the PDF file's content. This file seems to have been used in some sort of phishing campaign and most likely presents some form to the victim.

OpenAction object

An OpenAction object is "a value specifying a destination to be displayed or an action to be performed when the document is opened" (source http://partners.adobe.com/public/developper/en/pdf/PDFReference.pdf).

So when the document is opened, the object referenced by this OpenAction entry will be called. peepdf lets you see which object that is: just use the "object" command with the OpenAction object id as its parameter.

The id of the referenced object is 53, and peepdf identifies it as containing JavaScript code.
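Following the OpenAction by hand amounts to reading an indirect reference ("id generation R") from the catalog dictionary. A minimal sketch using a regular expression on raw bytes; the function name is hypothetical, and an /OpenAction written as a direct dictionary or destination array is simply reported as None.

```python
import re

# Matches an indirect reference such as "/OpenAction 53 0 R".
OPENACTION_REF = re.compile(rb"/OpenAction\s+(\d+)\s+(\d+)\s+R\b")


def find_openaction_target(data: bytes):
    """Return the (object id, generation) referenced by /OpenAction, or None."""
    match = OPENACTION_REF.search(data)
    if match is None:
        return None  # absent, or given as a direct dictionary/array
    return int(match.group(1)), int(match.group(2))
```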

Javascript object

peepdf can provide information about the object with the "info" command. Among the interesting details are the object's offset within the file, its size and its MD5 hash.
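Similar facts can be recovered by locating the "id gen obj ... endobj" block in the raw file. This sketch takes the first match only, so it ignores incremental updates that redefine an object, and a literal "endobj" inside stream data would cut the match short; peepdf may also measure the size and hash differently.

```python
import hashlib
import re


def object_info(data: bytes, obj_id: int, gen: int = 0):
    """Return offset, size and MD5 of the first 'obj_id gen obj ... endobj' block."""
    # (?<!\d) stops "53 0 obj" from matching inside "153 0 obj".
    pattern = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b.*?endobj" % (obj_id, gen), re.DOTALL)
    match = pattern.search(data)
    if match is None:
        return None
    block = match.group(0)
    return {
        "offset": match.start(),
        "size": len(block),
        "md5": hashlib.md5(block).hexdigest(),
    }
```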

When viewing the content of the object, the JavaScript code is quite short and contains a single call to "exportDataObject", with the first argument (cName) set to "W2" and the second argument set to "0".

This function extracts the specified data object to an external file. "cName" is the name of the data object to extract.
"nLaunch" controls whether the file is launched, or opened, after it is saved. A value of "0" means that the extracted file will not be launched after having been saved.

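The Acrobat JavaScript API documents exportDataObject as taking an object literal with named properties such as cName and nLaunch. Assuming the code in object 53 uses that form, a static extraction of the two arguments could look like the sketch below (the function name is hypothetical). Any obfuscation would defeat these regular expressions, which is where peepdf's JavaScript analysis helps.

```python
import re

# exportDataObject({ ... }) with the arguments passed as an object literal.
CALL = re.compile(r"exportDataObject\s*\(\s*\{(?P<args>[^}]*)\}")
CNAME = re.compile(r"cName\s*:\s*([\"'])(?P<value>.*?)\1")
NLAUNCH = re.compile(r"nLaunch\s*:\s*(?P<value>\d+)")


def parse_export_call(js: str):
    """Return the cName and nLaunch arguments of the first exportDataObject call."""
    call = CALL.search(js)
    if call is None:
        return None
    args = call.group("args")
    cname = CNAME.search(args)
    nlaunch = NLAUNCH.search(args)
    return {
        "cName": cname.group("value") if cname else None,
        # None when nLaunch is not given explicitly.
        "nLaunch": int(nlaunch.group("value")) if nlaunch else None,
    }
```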
So we know that an embedded file will be extracted from the PDF and saved to disk. But the code only gives a name; there is no direct link to an object in the document.
However, looking back at the OpenAction object, we can see a reference to a "/Names" object whose id is 49.

Names object

This object contains one entry, an /EmbeddedFiles element (id 50). Such an element contains a name tree that maps name strings to embedded file streams.

Looking at object 50, we can find a reference to "W2" and a mapping to object 51.
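In a name tree leaf, the /Names array alternates a string key and the object it maps to. Assuming object 50 stores its entries directly in such an array (rather than through /Kids intermediate nodes), the mapping can be read with this sketch; the function name is hypothetical, it only handles literal-string keys, and a "]" inside a key would confuse it.

```python
import re

# One "(key) id gen R" pair; keys are literal strings, escapes kept verbatim.
PAIR = re.compile(rb"\(((?:\\.|[^\\)])*)\)\s*(\d+)\s+(\d+)\s+R")


def parse_names_array(data: bytes) -> dict:
    """Map each key of the first inline /Names [...] array to its (id, gen)."""
    array = re.search(rb"/Names\s*\[(.*?)\]", data, re.DOTALL)
    if array is None:
        return {}  # e.g. "/Names 49 0 R" in the catalog is a reference, not an array
    return {
        key.decode("latin-1"): (int(obj), int(gen))
        for key, obj, gen in PAIR.findall(array.group(1))
    }
```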

Object 51 is a file specification for a file embedded in object 52.
From the initial information provided by peepdf, object 52 is indeed an embedded file. This is how the file is reached when the document is opened. But what is this file?
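A file specification dictionary names the file with /F (or /UF) and points to the embedded stream through its /EF dictionary. A minimal sketch, assuming object 51 looks like the hypothetical dictionary `<< /Type /Filespec /F (W2.pdf) /EF << /F 52 0 R >> >>`; the helper name is my own.

```python
import re

# /EF << /F id gen R >> : the embedded file stream referenced by the filespec.
EF_STREAM = re.compile(rb"/EF\s*<<[^>]*?/F\s+(\d+)\s+(\d+)\s+R")
# /F (name) or /UF (name): the file name, as a literal string.
FILE_NAME = re.compile(rb"/U?F\s*\(((?:\\.|[^\\)])*)\)")


def resolve_filespec(spec: bytes) -> dict:
    """Return the file name and the (id, gen) of the embedded stream, if present."""
    name = FILE_NAME.search(spec)
    stream = EF_STREAM.search(spec)
    return {
        "name": name.group(1).decode("latin-1") if name else None,
        "stream": (int(stream.group(1)), int(stream.group(2))) if stream else None,
    }
```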

Embedded file

peepdf shows that the file is supposed to be another PDF file (Subtype: /application/pdf) and that the stream is compressed with the Flate (/FlateDecode) filter. Note that when an object is displayed with the "object" command, peepdf applies the necessary filters to display the decoded version of the object.
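FlateDecode data is in the zlib/deflate format, so Python's zlib module can decode such a stream directly. A minimal sketch, assuming you already have the bytes of object 52; it ignores the /Length entry and takes everything between "stream" and "endstream", and it uses a decompression object so that a truncated or padded stream still yields whatever could be decoded.

```python
import re
import zlib

# Data between the "stream" keyword's EOL and "endstream" (the EOL before
# "endstream" is not part of the data).
STREAM_BODY = re.compile(rb"stream\r?\n(.*?)(?:\r?\n)?endstream", re.DOTALL)


def extract_stream(obj: bytes) -> bytes:
    """Return the raw (still encoded) stream data of a PDF object."""
    match = STREAM_BODY.search(obj)
    if match is None:
        raise ValueError("no stream found")
    return match.group(1)


def flate_decode(raw: bytes) -> bytes:
    """Inflate FlateDecode data, tolerating trailing garbage or truncation."""
    inflater = zlib.decompressobj()
    return inflater.decompress(raw) + inflater.flush()
```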

The stream is too large to be displayed in the console, but peepdf allows for easy export using the ">" command.

Let's see what the embedded file actually is. Instead of a PDF file, it appears to be a PE file.
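The distinction comes down to the first bytes: a PDF starts with "%PDF-" (readers tolerate it somewhat after offset 0), whereas a Windows executable starts with the "MZ" DOS header. A deliberately coarse sketch; the function name and return labels are my own.

```python
def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading magic bytes (very coarse)."""
    # An executable's DOS header must sit at offset 0.
    if data.startswith(b"MZ"):
        return "pe"
    # Readers accept the %PDF- header slightly after offset 0, so look
    # within the first 1024 bytes rather than only at the start.
    if b"%PDF-" in data[:1024]:
        return "pdf"
    return "unknown"
```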

The W2.pdf file could be carved to extract the PE file, but there is another way to do it with peepdf, using raw stream manipulation and decoding.
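Carving here means scanning a buffer for an "MZ" header and keeping only candidates whose e_lfanew field (the little-endian 32-bit value at offset 0x3C) points to the "PE\0\0" signature. A minimal sketch with a hypothetical function name; it returns everything from the header to the end of the buffer instead of computing the image size from the section table, so trailing bytes may come along.

```python
import struct


def carve_pe(data: bytes):
    """Return data from the first valid MZ/PE header to the end, or None."""
    start = data.find(b"MZ")
    while start != -1:
        # The DOS header is 64 bytes; e_lfanew is a little-endian uint32 at 0x3C.
        if start + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
            pe_offset = start + e_lfanew
            if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
                return data[start:]
        start = data.find(b"MZ", start + 1)
    return None
```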

The file is recognized as malicious by ClamAV.

Again, peepdf has a handy feature for this identification task: the vtcheck command checks the PDF file against the VirusTotal database using the site's public API.

The vtcheck command can be used to check the analyzed PDF file itself (default behavior), part of it (either object, rawstream, range of bytes, ...) or a file.
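Outside peepdf, the same hash lookup can be sketched against VirusTotal's v2 file report endpoint, which was the public API at the time. VirusTotal now documents a v3 API, so treat the URL and the response fields (response_code, positives, total) as assumptions to check against the current documentation; this is not peepdf's code, and the API key is your own.

```python
import json
import urllib.parse
import urllib.request

# VirusTotal public API v2 file report endpoint (assumption: still reachable).
VT_REPORT_URL = "https://www.virustotal.com/vtapi/v2/file/report"


def build_report_url(api_key: str, file_hash: str) -> str:
    """Build the GET URL asking for the report of a file hash."""
    query = urllib.parse.urlencode({"apikey": api_key, "resource": file_hash})
    return VT_REPORT_URL + "?" + query


def fetch_report(api_key: str, file_hash: str) -> dict:
    """Query VirusTotal and return the decoded JSON report (network access needed)."""
    with urllib.request.urlopen(build_report_url(api_key, file_hash), timeout=30) as resp:
        return json.load(resp)


def summarize_report(report: dict) -> str:
    """Turn a v2 report into 'positives/total', or 'not found'."""
    if report.get("response_code") != 1:
        return "not found"
    return "%d/%d" % (report["positives"], report["total"])
```

For example, summarize_report(fetch_report("YOUR_API_KEY", "369614d7c422201f2d1605f4befd452d")) would print the detection ratio for this sample, with YOUR_API_KEY being a placeholder.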

vtcheck PDF file
vtcheck PE file

How is the PE file executed?

The PE file is saved to disk by the JavaScript exportDataObject function, under the name "W2.pdf" (remember object 51 above). But exportDataObject was invoked with the nLaunch parameter set to 0, meaning that the file is not opened after having been saved.
So how is it ever executed?

There is another object of interest in this PDF file: a "Launch" object (object 54).

This object's type is /Action/Launch. According to the PDF reference document (available on the Adobe website), a Launch action is used to "launch an application, usually to open a file".
Displaying the content of the object makes the execution method quite obvious: a Windows shell is launched with a script that searches for the W2.pdf file in common places ("Desktop" or "My Documents") and, if it is found, executes it.
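In a Launch action, the Windows launch parameters (/Win) include /F, the file or application to launch, and /P, its parameter string. Reading them requires parsing PDF literal strings, which may contain balanced, unescaped parentheses. A sketch under those assumptions, with hypothetical function names; it simply takes the first /F and /P literal strings in the action, and escape sequences are kept as-is rather than decoded.

```python
import re


def read_literal(data: bytes, start: int) -> bytes:
    """Return the contents of the PDF literal string opening at data[start].

    Balanced parentheses may appear unescaped inside a literal string, so a
    depth counter is used. Escape sequences are kept verbatim (not decoded).
    """
    if data[start:start + 1] != b"(":
        raise ValueError("not a literal string")
    depth, i, out = 0, start, bytearray()
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            out += data[i:i + 2]  # keep the backslash and the escaped byte
            i += 2
            continue
        if c == b"(":
            depth += 1
            if depth == 1:  # the opening delimiter itself is not content
                i += 1
                continue
        elif c == b")":
            depth -= 1
            if depth == 0:  # matching closing delimiter: done
                return bytes(out)
        out += c
        i += 1
    raise ValueError("unterminated literal string")


def launch_parameters(action: bytes) -> dict:
    """Collect the first /F (file) and /P (parameters) literal strings of an action."""
    found = {}
    for key in (b"F", b"P"):
        match = re.search(rb"/" + key + rb"\s*(?=\()", action)
        if match:
            found[key.decode()] = read_literal(action, match.end()).decode("latin-1")
    return found
```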

Again, how is this action launched?

The AA object

The Launch action is associated with a media box defined in object 3, whose "kid", object 2, defines an "AA" dictionary. An "AA" (additional-actions) dictionary defines the actions to be taken in response to trigger events; for a page, these include the page being opened or closed. Object 2 is one of the "suspicious elements" reported by peepdf after opening the file.
So the message "To view [...]" will certainly appear in a media box.

If I understood the PDF specification correctly, the "/O" entry in the /AA dictionary triggers the Launch action when the page is opened, which here simply means when the document is opened.
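For a page object, the /AA dictionary maps trigger keys to actions: /O fires when the page is opened and /C when it is closed. A sketch that lists a page's triggers, assuming the /AA dictionary is written inline and its entries are indirect references, as in the hypothetical `<< /Type /Page /AA << /O 54 0 R >> >>`; an indirect /AA or inline action dictionaries would need a real parser.

```python
import re

# Trigger keys of a page object's additional-actions dictionary.
PAGE_TRIGGERS = {"O": "page opened", "C": "page closed"}


def page_triggers(page: bytes) -> dict:
    """Map each inline /AA trigger of a page to the (id, gen) of its action."""
    aa = re.search(rb"/AA\s*<<(.*?)>>", page, re.DOTALL)
    if aa is None:
        return {}
    triggers = {}
    for key, obj, gen in re.findall(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R", aa.group(1)):
        name = key.decode("ascii")
        triggers[PAGE_TRIGGERS.get(name, name)] = (int(obj), int(gen))
    return triggers
```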


The attack relies on social engineering, because the user is expected to:
1/ agree to save an embedded file when the PDF file opens, and
2/ press "Open" to view the so-called "encrypted content" (from the saved embedded PDF file).

I tested the file under Windows using Sandboxie and Adobe Reader 9 to illustrate the attack from the user's point of view.

Open the PDF file

Ask to save W2.pdf
Ask for file execution
Note that since the Windows XP I used was a French version, there is no "Desktop" directory, so the execution failed.


This was a short introduction to peepdf and its features. The analyzed PDF file does not use any exploit and the attack strategy relies only on social engineering.

Late addition: this looks like a direct use of Metasploit's "Adobe PDF Embedded Exe Social Engineering" module. This source code excerpt explains why it did not work on my French Windows XP:

    # check for the pdf in these dirs, in this order..
    dirs = [ "Desktop", "My Documents", "Documents", "Escritorio", "Mis Documentos" ]
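The lookup can be mimicked in Python to see why it fails on a localized system: on a French Windows XP the profile folders are named "Bureau" and "Mes documents", neither of which is in the list. This sketch only reproduces the search order from the excerpt; the base directory argument and the function name are hypothetical, and it is not the command line actually generated by the module.

```python
import os

# Search order quoted from the Metasploit module excerpt.
SEARCH_DIRS = ["Desktop", "My Documents", "Documents", "Escritorio", "Mis Documentos"]


def find_dropped_file(base_dir: str, name: str = "W2.pdf"):
    """Return the first existing base_dir/<dir>/name in search order, or None."""
    for sub in SEARCH_DIRS:
        candidate = os.path.join(base_dir, sub, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # e.g. a French profile where the file sits in "Bureau"
```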

