PDF Hacks

How to Download Big Files Through the Google App Engine UrlFetch API

I offer a UrlFetch function in my PDF Password Remover Online application, but I did not want to limit it to PDFs of 1 MB or less. After some study I found a solution: let the UrlFetch API download no more than 1 MB of data per call, and repeat the call until all the data has been downloaded. Of course, there is still a limit: the 30-second request limit.
For details, please visit:

How to Use Google App Engine UrlFetch API to download the files over 1M
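
The chunked approach described above can be sketched roughly as follows. The post does not say which mechanism it uses to fetch each piece, so this sketch assumes HTTP Range requests and a server that honors them. The function names (byte_ranges, download_in_chunks, fetch_with_urllib) are made up for illustration, and the standard urllib module stands in for the App Engine UrlFetch API. The 30-second request limit is not handled here.

```python
import urllib.request

ONE_MB = 1024 * 1024


def byte_ranges(total_size, chunk_size=ONE_MB):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1


def download_in_chunks(fetch, url, total_size, chunk_size=ONE_MB):
    """Download url piece by piece; fetch(url, headers) must return bytes."""
    parts = []
    for start, end in byte_ranges(total_size, chunk_size):
        # Assumption: the server supports the HTTP Range header.
        headers = {"Range": "bytes=%d-%d" % (start, end)}
        parts.append(fetch(url, headers))
    return b"".join(parts)


def fetch_with_urllib(url, headers):
    """Stand-in fetcher built on urllib (not the App Engine UrlFetch API)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Passing the fetcher in as a parameter keeps the loop independent of the HTTP client, so the same loop could be pointed at another client. The total size has to be known up front, for example from a Content-Length header.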

December 25, 2009 | Posted in Tutorials

How to Optimize and Reduce PDF File Size with the Help of Adobe Acrobat

I noticed that the article "How to use Adobe Acrobat to Optimize and Reduce PDF File Size" lists two tutorials in PDF form.
Adobe Acrobat 6 tutorial on optimizing and reducing file size:
http://www.adobe.com/designcenter/acrobat/articles/acr6optimize/acr6optimize.pdf
Adobe Acrobat 7 tutorial on optimizing and reducing file size:
http://www.adobe.com/designcenter/acrobat/articles/acr7optimize/acr7optimize.pdf
I just wonder why they did not release tutorials for Adobe Acrobat 8 and Adobe Acrobat 9.

October 30, 2009 | Posted in Hacks, Tutorials

Using pdfsizeopt to Optimize & Reduce PDF File Size

pdfsizeopt is an open source project hosted on Google Code; its main feature is optimizing PDF file size. It is based on the following tools:

  • pdfsizeopt.py
  • Python
  • Ghostscript
  • Java
  • sam2p
  • jbig2
  • png22pnm
  • pngtopnm
  • Multivalent.jar
  • PNGOUT

pdfsizeopt is a collection of best practices and scripts for Unix to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.
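
Since pdfsizeopt relies on several external tools, it can help to check which of them are installed before running it. The following is a minimal sketch, not part of pdfsizeopt itself. The executable names (gs for Ghostscript, jbig2 for jbig2enc, and so on) are assumptions and may differ on your system; the required/optional split follows the description above.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ["python", "gs", "sam2p", "pngtopnm"]
OPTIONAL_TOOLS = ["jbig2", "pngout", "java"]  # java is needed for Multivalent.jar


def locate_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names from the list that could not be found on PATH."""
    return [name for name, path in locate_tools(names).items() if path is None]


if __name__ == "__main__":
    print("missing required tools:", missing_tools(REQUIRED_TOOLS))
    print("missing optional tools:", missing_tools(OPTIONAL_TOOLS))
```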

For details, please visit "pdfsizeopt - a Free and Open Source PDF Manipulation Tool to Reduce PDF File Size".

References:

pdfsizeopt home page
Convert JBIG2 to PDF with free and open source software agl’s jbig2enc
Windows version JBIG2 Encoder-Jbig2.exe

 

October 30, 2009 | Posted in Hacks, Linux, Open Source, Software, Tutorials, Windows

Some PDF Tools Developed in Python

While searching WordPress.com, I noticed an article, PDF Tools, which introduces some small PDF tools, all developed in Python. The author describes them as follows.

pdf-parser.py

This tool will parse a PDF document to identify the fundamental elements used in the analyzed file. It will not render a PDF document. The code of the parser is quick-and-dirty; I’m not recommending it as a textbook case for PDF parsers, but it gets the job done.

You can see the parser in action in this screencast.

The stats option displays statistics of the objects found in the PDF document. Use this to identify PDF documents with unusual/unexpected objects, or to classify PDF documents. For example, I generated statistics for 2 malicious PDF files, and although they were very different in content and size, the statistics were identical, proving that they used the same attack vector and shared the same origin.

The search option searches for a string in indirect objects (not inside the stream of indirect objects). The search is not case-sensitive, and is susceptible to the obfuscation techniques I documented (as I’ve yet to encounter these obfuscation techniques in the wild, I decided not to resort to canonicalization).

The filter option applies the filter(s) to the stream. For the moment, only FlateDecode is supported (i.e. zlib decompression).
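
To illustrate what applying FlateDecode amounts to, here is a minimal sketch using Python's standard zlib module. It is not taken from pdf-parser.py; the function name flate_decode and the sample content stream are made up for illustration, and PDF predictor parameters (/DecodeParms) are not handled.

```python
import zlib


def flate_decode(stream_bytes):
    """Decompress the raw bytes of a PDF stream that uses /FlateDecode."""
    return zlib.decompress(stream_bytes)


# A made-up page content stream, compressed the way a PDF writer would store it.
original = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
compressed = zlib.compress(original)

# Decoding the compressed stream gives back the original content.
decoded = flate_decode(compressed)
```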

The raw option makes pdf-parser output raw data (i.e. not the printable Python representation).

The objects option outputs the data of the indirect object whose ID was specified. This ID is not version dependent. If more than one object has the same ID (disregarding the version), all these objects will be output.

The reference option allows you to select all objects referencing the specified indirect object. This ID is not version dependent.

The type option allows you to select all objects of a given type. The type is a Name and as such is case-sensitive and must start with a slash character (/).

Download:

pdf-parser_V0_3_1.zip (https)

MD5: 07CDA54844CD6567473CBF2B0DFC601C

SHA256: 7614AEC453502EEF43F9EA04A82092C4ACDD32AB86D1C4D744B7B590C74152EC

make-pdf tools
make-pdf-javascript.py allows one to create a simple PDF document with embedded JavaScript that will execute upon opening of the PDF document. It’s essentially glue-code for the mPDF.py module which contains a class with methods to create headers, indirect objects, stream objects, trailers and XREFs.

If you execute it without options, it will generate a PDF document with JavaScript to display a message box (calling app.alert).

To provide your own JavaScript, use option --javascript for a script on the command line, or --javascriptfile for a script contained in a file.
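
To give an idea of what such a document looks like on the inside, here is a minimal, self-contained sketch that builds a one-page PDF whose /OpenAction runs a JavaScript snippet. It is not the code of make-pdf-javascript.py or mPDF.py; the function name build_pdf_with_javascript and the object layout are assumptions chosen for illustration, and the script must be representable in Latin-1.

```python
def build_pdf_with_javascript(script="app.alert('Hello from JavaScript');"):
    """Return the bytes of a one-page PDF whose /OpenAction runs `script`."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = script.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    objects = [
        "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        "<< /Type /Action /S /JavaScript /JS (%s) >>" % escaped,
    ]

    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this indirect object
        out += ("%d 0 obj\n%s\nendobj\n" % (number, body)).encode("latin-1")

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_offset = len(out)
    out += ("xref\n0 %d\n" % (len(objects) + 1)).encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += ("%010d 00000 n \n" % offset).encode("ascii")

    out += ("trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)).encode("ascii")
    out += ("startxref\n%d\n%%%%EOF\n" % xref_offset).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    with open("javascript-demo.pdf", "wb") as handle:
        handle.write(build_pdf_with_javascript())
```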

Download:

make-pdf_V0_1_1.zip (https)

MD5: 9AF2E343B78553021C989E8E22355531

SHA256: C604679ABEB0469C1463159E02E74F12487B2755A6096B416A8F4F638DEB8AA9

pdfid.py
This tool is not a PDF parser, but it will scan a file to look for certain PDF keywords, allowing you to identify PDF documents that contain (for example) JavaScript or execute an action when opened. PDFiD will also handle name obfuscation.

The idea is to use this tool first to triage PDF documents, and then analyze the suspicious ones with my pdf-parser.

An important design criterion for this program is simplicity. Parsing a PDF document completely requires a very complex program, and hence it is bound to contain many (security) bugs. To avoid the risk of getting exploited, I decided to keep this program very simple (it is even simpler than pdf-parser.py).

PDFiD will scan a PDF document for a given list of strings and count the occurrences (total and obfuscated) of each word:

  • obj
  • endobj
  • stream
  • endstream
  • xref
  • trailer
  • startxref
  • /Page
  • /Encrypt
  • /ObjStm
  • /JS
  • /JavaScript
  • /AA
  • /OpenAction
  • /JBIG2Decode

Almost every PDF document will contain the first 7 words (obj through startxref), though stream and endstream to a lesser extent. I’ve found a couple of PDF documents without xref or trailer, but these are rare (BTW, this is not an indication of a malicious PDF document).

/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.

/Encrypt indicates that the PDF document has DRM or needs a password to be read.

/ObjStm counts the number of object streams. An object stream is a stream object that can contain other objects, and can therefore be used to obfuscate objects (by using different filters).

/JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents that I’ve found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent.

/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. All malicious PDF documents with JavaScript I’ve seen in the wild had an automatic action to launch the JavaScript without user interaction.

The combination of automatic action and JavaScript makes a PDF document very suspicious.

/JBIG2Decode indicates if the PDF document uses JBIG2 compression. This is not necessarily an indication of a malicious PDF document, but requires further investigation.

A number that appears between parentheses after the counter represents the number of obfuscated occurrences. For example, /JBIG2Decode 1(1) tells you that the PDF document contains the name /JBIG2Decode and that it was obfuscated (using hexcodes, e.g. /JBIG#32Decode).

BTW, all the counters can be skewed if the PDF document is saved with incremental updates.

Because PDFiD is just a string scanner (supporting name obfuscation), it will also generate false positives. For example, a simple text file starting with %PDF-1.1 and containing words from the list will also be identified as a PDF document.
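
The counting and name de-obfuscation described above can be sketched in a few lines of Python. This is an illustrative approximation, not PDFiD's actual code: the tokenizing regular expression and the function names decode_name and count_keywords are assumptions, and like any plain string scanner it will also count matches that happen to occur inside binary stream data.

```python
import re

KEYWORDS = [
    b"obj", b"endobj", b"stream", b"endstream", b"xref", b"trailer",
    b"startxref", b"/Page", b"/Encrypt", b"/ObjStm", b"/JS", b"/JavaScript",
    b"/AA", b"/OpenAction", b"/JBIG2Decode",
]

# A token is an optional slash followed by letters, digits or '#' escapes.
TOKEN = re.compile(rb"/?[A-Za-z0-9#]+")
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(token):
    """Replace #xx hex escapes, e.g. /JBIG#32Decode becomes /JBIG2Decode."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token)


def count_keywords(data):
    """Return {keyword: [total, obfuscated]} for the given file contents."""
    counts = {keyword: [0, 0] for keyword in KEYWORDS}
    for match in TOKEN.finditer(data):
        raw = match.group(0)
        clean = decode_name(raw)
        if clean in counts:
            counts[clean][0] += 1
            if clean != raw:  # the name only matched after decoding
                counts[clean][1] += 1
    return counts
```

A document would then be triaged by reading it in binary mode and passing the bytes to count_keywords, looking in particular at the combination of /OpenAction or /AA with /JS or /JavaScript.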

Download:

pdfid_v0_0_9.zip (https)

MD5: 1C731D6204C09AAFF219876A8FB5E834

SHA256: 24A9B16E67A84E85488A16879CB611128B2E5921044E48EFB60D784BD785CBD0

October 19, 2009 | Posted in Linux, Open Source, Software, Tutorials, Windows

How to Convert PDF Online for Free

PDF (Portable Document Format) is a proprietary format for the transfer of designs across multiple computer platforms. PDF is a universal electronic file format, modeled after the PostScript language, and is device- and resolution-independent. Documents in the PDF format can be viewed, navigated, and printed from any computer regardless of the fonts or software used to create the original.

Now that almost all of us use PDF files, it is sometimes easier to use an online PDF converter, so we don’t need to install any program. Just connect to the net and search for a free PDF utility. Here are some of them.

  1. PDFTextOnline – Extracts text from a PDF and makes the text copyable.
  2. ShowPDF – A PDF-to-HTML converter.
  3. FreePDFConvert – Convert MS Office, image, web page and vector graphic files to PDF, or convert PDF to Word (doc) or Excel (xls) documents, and extract images from PDF.
  4. Document Converter eXPress – Convert files to PDF or image without the need to install special software.
  5. Web2PDF Online – A free HTML to PDF Conversion service for your websites that allows your visitors to quickly save useful information in your blogs and websites to PDF files.
  6. Lettos – DOC to ODT and PDF, ODT to PDF and DOC, PDF to TXT
  7. PDFIt – A Firefox extension that allows you to convert any page into a PDF through an online service provided by Touchpdf.com.
  8. htm2PDF – A service to convert your webpages to PDF
  9. Zamzar – A free online file conversion service that is able to convert PDF to many document formats.
  10. PDFOnline – Convert documents to PDF, PDF service for iPhone, web to PDF and PDF to word.
  11. KoolWire – Just send your documents to pdf@koolwire.com, then you will receive a converted file in PDF format.
  12. HTML2PDF.BIZ – Web Service & API that converts your Website into PDF.
  13. ExpressPDF – ExpressPDF is an online service that lets you convert your Microsoft Office documents to PDF.
  14. Online PDF Converter – You can convert your PDF file to text or image (JPEG, PNG, GIF, TIFF) absolutely free.
  15. PDFText – Converts your PDF (Acrobat) file to plain text.
  16. RSS2PDF – Free Online RSS, Atom or OPML to PDF Generator.
  17. PDF-o-matic – A simple PHP script that uses HTMLDOC to convert the web page of your choice.
  18. LOOP – A free web-based service that allows you to convert and combine files to PDF.
  19. FeedJournal – Convert RSS and Atom feeds into a PDF newspaper.
  20. BookletCreator – A free online tool that lets you create a booklet from a PDF document.

–from 20 PDF Online Tool Converter for FREE

October 19, 2009 | Posted in Hacks, Software, Tutorials

Some Adobe Acrobat Tutorials and Videos

The Acrobat User Community is the perfect way to learn more about the latest features, meet other users, and share ideas with other members and Acrobat experts. Our goal is to provide the type of educational resources and user-to-user support that appeal to Acrobat users of all levels and professional backgrounds.

Learn how to work within your PDF documents to implement simple changes—without having to edit the original source file—with our ‘how to’ tutorials and videos.

Tutorials

Videos

October 19, 2009 | Posted in Tutorials

An Easy and Free Way to Download Books from Google

Google Book Downloader is a small utility (developed in .NET) that allows you to save books from Google to your local filesystem as PDF. It has many features:

  • Download any book from Google Books marked as ‘Full view’
  • Partially download any book from Google Books marked as ‘Limited preview’
  • Access to any book available only for US citizens (instructions)
  • Searching for hidden pages (not indexed by Google Books)

The Google Book Downloader application allows users to enter a book’s ISBN number or Google link to pull up the desired book and begin a download, finishing off by exporting the file to a PDF.

October 19, 2009 | Posted in Books, Hacks, Microsoft, Open Source, Software, Tutorials, Windows

Easy Way to Extract RPM with P7zip Under Linux

RPM is the abbreviation of Red Hat Package Manager. An RPM package is essentially a cpio archive wrapped with package metadata.

P7ZIP is a port of 7za.exe for POSIX systems like Unix (Linux, Solaris, OpenBSD, FreeBSD, Cygwin, AIX, …), MacOS X and BeOS. It supports many formats:

  • Packing / unpacking: 7z, ZIP, GZIP, BZIP2 and TAR
  • Unpacking only: RAR, CAB, ISO, ARJ, LZH, CHM, MSI, WIM, Z, CPIO, RPM, DEB and NSIS

So if you want to extract an RPM file, such as myrpm.rpm, you can do it this way:

7z x myrpm.rpm
7z x myrpm.cpio

By the way, under Windows you can use 7-Zip to do the same job.

References:
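
If you need to do this from a script, the two steps can be wrapped with Python's subprocess module. This is a minimal sketch assuming the 7z executable is on PATH and that the first step produces a file named after the package with a .cpio extension, as in the example above; the intermediate file name may differ between p7zip versions. The function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def extraction_commands(rpm_path, out_dir="."):
    """Return the two 7z command lines: unpack the RPM, then the cpio payload."""
    rpm = Path(rpm_path)
    # Assumption: 7z names the intermediate archive <package>.cpio.
    cpio = Path(out_dir) / (rpm.stem + ".cpio")
    return [
        ["7z", "x", "-y", "-o" + str(out_dir), str(rpm)],
        ["7z", "x", "-y", "-o" + str(out_dir), str(cpio)],
    ]


def extract_rpm(rpm_path, out_dir="."):
    """Run both extraction steps, raising CalledProcessError if one fails."""
    for command in extraction_commands(rpm_path, out_dir):
        subprocess.run(command, check=True)
```

Here x extracts with full paths, -o sets the output directory and -y answers yes to any prompt.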

How to Unzip or Extract RPM under Linux

P7ZIP

7-ZIP

August 19, 2009 | Posted in Hacks, Linux, Open Source, Software, Tutorials

Using Ruby Java Bindings on Dreamhost and Filling PDF Forms with iText

I am familiar with iText but not with Ruby. I know Dreamhost supports Ruby on Rails (RoR), but I have never had a chance to run a real application there, even though I have a Dreamhost account.

Getting rjb, also known as “Ruby Java Bindings”, to work in a Dreamhost account can be somewhat problematic. Fortunately, it is also fairly straightforward. You just have to install all the dependencies in the user directory.

In my case, I was setting up a Rails Application that used the iText Java library to fill in PDF documents for user download. Of course, the server environment was not using Sun Java and the Java headers were not present, so gem install rjb failed. Joy.

The first course of action was to do a local install of Java.

Download jdk-6u7-linux-x64.bin and jre-6u7-linux-x64.bin from the Sun Java site. Then create an ~/opt directory and extract the JRE and JDK (you will need to chmod u+x both files then call them from the command line). Move the resulting folders to ~/opt . I renamed the folders to jdk and jre for simplicity.

Now ensure that user gems are enabled in cPanel. Then add the following 3 lines to ~/.bash_profile:

export JAVA_HOME=/home/username/opt/jdk
export GEM_PATH=/home/username/ruby/gems
export GEM_HOME=/home/username/ruby/gems

Run source ~/.bash_profile to load the paths.

Now you can run gem install rjb without any problems. You will likely have to re-install Rails and other gems because we will be telling our Rails app to load gems from the user directory. Just use the regular gem install syntax.

Add the following 2 lines at the top of your config/environment.rb:

ENV['GEM_PATH']='/home/username/ruby/gems'
ENV['JAVA_HOME']='/home/username/opt/jdk'

That’s pretty much it.

I will concede that these probably aren’t the best instructions, but this is the real meat of the solution. If you keep getting “no such file to load” errors, you will need to extract the gems to the vendor/plugins directory. cd RAILS_ROOT/vendor/plugins and gem unpack gem_name for each problematic gem. I believe this is an issue with Passenger.

Please correct me if I am wrong about any of this, it was a very long day!

source: http://blog.patrick-morgan.net/2008/10/using-ruby-java-bindings-on-dreamhost.html

August 18, 2009 | Posted in Hacks, Linux, Tutorials

Open and edit PDF files in OpenOffice

By the way, RubyPDF already introduced this feature on August 5, 2008; for details, please visit "OpenOffice 3.0 Enhances PDF Export and Releases PDF Import Extension".

I get this a lot: “How can I open and edit PDF files without having to purchase a costly application like QuarkXPress?” Before OpenOffice extensions came about, my answer to that would be “Not easily”. But now, thanks to a very useful tool created by Sun, editing a PDF file is as simple as adding an extension to OpenOffice and then opening that PDF for editing.

The Sun PDF Import Extension offers numerous features and, like all OpenOffice extensions, is simple to install. Let’s take a look at what it offers, how it is installed, and how it is used.

Supports

The Sun extension includes the following features:

  • Edit font attributes
  • Retain font appearance
  • Converts images and vector graphics
  • Import of password-protected PDF files
  • Import shapes with default styles
  • Colors and bitmap support
  • Levels remain true

Does not support

  • Native PDF forms
  • Proper paragraphs
  • LaTeX PDF
  • Complex vector graphics
  • Table conversion
  • EPS graphics
  • RTL fonts

What should be apparent from the above lists is that the Sun PDF import extension primarily supports simpler PDF documents. Once a document gets complex, this extension may or may not work.

For details, please visit "Open and edit PDF files in OpenOffice".

August 16, 2009 | Posted in Hacks, Tutorials