PDF Hacks

RubyPDF Releases pdf2htmlEX Windows Version

pdf2htmlEX is an open source tool that converts PDF to HTML without losing text or formatting. The source code has been available for a long time, but there was still no Windows port. Now rubypdf.com gives us a chance to use this tool under Windows: a win32 static build, with only one exe and some necessary resource files.

For details, please visit:

pdf2htmlEX Windows Version

pdf2htmlEX v0.9 Windows Version Release


By the way, rubypdf.com also released a Windows version of mktemp, a small tool for safe temporary file creation from shell scripts.





August 20, 2013 | PDF News, Software

Pdfgrep - freely search PDFs with grep-like software

pdfgrep searches PDF files for a regular expression; it works similarly to grep.

pdfgrep is an open source project developed by Hans-Peter Deifel. Before the RubyPDF blog released a Windows version of pdfgrep, only Linux and Mac versions were available.

pdfgrep works much like grep, with one distinction: It operates on pages and not on lines.  


-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.
-H, --with-filename
Print the file name for each match. This is the default setting when there is more than one file to search.
-h, --no-filename
Suppress the prefixing of file name on output. This is the default setting when there is only one file to search.
-n, --page-number
Prefix each match with the number of the page where it was found.
-c, --count
Suppress normal output. Instead print the number of matches for each input file. Note that unlike grep, multiple matches on the same page will be counted individually.
-C, --context NUM
Print at most NUM characters of context around each match. The exact number will vary, because pdfgrep tries to respect word boundaries. If NUM is "line", the whole line will be printed. If this option is not set, pdfgrep tries to print lines that are not longer than the terminal width.
--color WHEN
Surround file names, page numbers and matched text with escape sequences to display them in color on the terminal. (The default setting is auto).

WHEN can be:

always
Always use colors, even when stdout is not a terminal.
never
Do not use colors.
auto
Use colors only when stdout is a terminal.
-r, --recursive
Recursively search all files (restricted by --include and --exclude) under each directory.
--exclude=GLOB
Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You can use this option multiple times to exclude more patterns. It takes precedence over --include. Note that includes and excludes apply only to files found via --recursive and not to the argument list.
--include=GLOB
Only search files whose base name matches GLOB. See --exclude for details. The default is *.pdf.
-u, --unac
Remove accents and ligatures from both the search pattern and the PDF documents. This is useful if you want to search for a word containing ‘ae’, but the PDF uses the single character ‘æ’ instead. See unac(3) and unaccent(1) for details.

[This option is experimental and only available if pdfgrep is compiled with unac support.]

-q, --quiet
Suppress all normal output to stdout. Errors will be printed and the exit codes will be returned (see below).
--help
Print a short summary of the options.
-V, --version
Show version information.
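
To make the "pages, not lines" idea concrete, here is a rough Python sketch. It is not pdfgrep's code: the grep_pages function and the sample page texts are made up for illustration, and it assumes the text of each page has already been extracted into one string per page.

```python
import re


def grep_pages(pages, pattern, ignore_case=False):
    """Return (page_number, matched_text) for every match, page by page.

    `pages` is a list with one text string per page (hypothetical input;
    extracting text from a real PDF is outside this sketch).
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    hits = []
    for number, text in enumerate(pages, start=1):
        # Every match is recorded, so two matches on one page give two hits.
        for match in regex.finditer(text):
            hits.append((number, match.group(0)))
    return hits


# Made-up page texts, one string per page.
pages = ["PDF tools and more pdf tools", "nothing here", "a Pdf reader"]
hits = grep_pages(pages, r"pdf", ignore_case=True)
count = len(hits)  # counted per match, not per page
```

The hits carry the page number, which is roughly what the page-number prefix gives you, and count follows the rule that several matches on the same page are counted individually.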






September 4, 2012 | Uncategorized

DiffPDF - Free Cross-Platform Software to Compare PDFs

DiffPDF can compare two PDF files. It offers two comparison modes: Text and Appearance.

By default the comparison is of the text on each pair of pages, but comparing the appearance of pages is also supported (for example, if a diagram is changed or if a paragraph is reformatted). It is also possible to compare particular pages or page ranges. For example, if there are two versions of a PDF file, one with pages 1-12 and the other with pages 1-13 because of an extra page having been added as page 4, they can be compared by specifying two page ranges, 1-12 for the first and 1-3, 5-13 for the second. This will make DiffPDF compare pages in the pairs (1, 1), (2, 2), (3, 3), (4, 5), (5, 6), and so on, to (12, 13).
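
The page pairing in that example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DiffPDF's own code; the function names parse_ranges and page_pairs are made up here.

```python
def parse_ranges(spec):
    """Expand a page range string such as "1-3, 5-13" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            pages.extend(range(int(first), int(last) + 1))
        else:
            pages.append(int(part))
    return pages


def page_pairs(spec_a, spec_b):
    """Pair the pages of two documents in order; both ranges must have the same length."""
    a, b = parse_ranges(spec_a), parse_ranges(spec_b)
    if len(a) != len(b):
        raise ValueError("page ranges select a different number of pages")
    return list(zip(a, b))


pairs = page_pairs("1-12", "1-3, 5-13")
```

Here pairs starts with (1, 1), (2, 2), (3, 3), (4, 5) and ends with (12, 13), so the added page 4 of the second file is simply skipped.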



Free software to compare the appearance of two PDFs

DiffPDF Windows 32-bit version download

November 19, 2010 | Open Source, Software, Windows

Google Docs supports OCR for PDF and Images

This feature only works for the following languages: English, French, Italian, German and Spanish. The Google Docs Blog says: “For the technically curious: we’re using Optical Character Recognition (OCR) that our friends from Google Books helped us set up. OCR works best with high-resolution images, and not all formatting may be preserved.”

For details, please visit Google Docs add OCR support to PDF and Images.

July 16, 2010 | PDF News

Freely Rotate PDF Page Online-Google App Engine Application

Rotate PDF Page Online(PdfRotate)

RubyPDF has released its third Google App Engine application, PDFRotate Online. With it, you can freely rotate PDF pages online; the supported rotation angles are 90, 180 and 270 degrees.
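
As a small aside, a PDF page stores its rotation as a multiple of 90 degrees, which fits the three angles offered. The Python sketch below only shows the arithmetic; the rotated function is a made-up name, and this is not the code of PDFRotate Online.

```python
ALLOWED_ANGLES = (90, 180, 270)


def rotated(current, angle):
    """Return the new page rotation after turning the page clockwise by `angle` degrees."""
    if angle not in ALLOWED_ANGLES:
        raise ValueError("angle must be 90, 180 or 270")
    # Keep the result in the range 0..270, as a multiple of 90.
    return (current + angle) % 360


new_rotation = rotated(270, 180)
```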

If you want an offline version, please check pdfrotate.

January 12, 2010 | Uncategorized

Free Divide PDF Page Online-Another Google App Engine Application

Today, RubyPDF released another Google App Engine application, Freely Divide PDF Page Online, which is also based on iText.

Its main feature is splitting a PDF page into two half-size pages, for example, splitting an A3 page into two A4 pages.

By the way, RubyPDF also released a desktop version earlier.
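
The geometry behind such a split is simple, and a short Python sketch may help. It is only an illustration and not the application's iText code; the split_page function is a made-up name, and the A3 size is given in rounded PDF points.

```python
def split_page(width, height):
    """Return two (left, bottom, right, top) boxes that halve a page across its longer side."""
    if height >= width:
        # Portrait page: cut horizontally into a top half and a bottom half.
        mid = height / 2
        return [(0, mid, width, height), (0, 0, width, mid)]
    # Landscape page: cut vertically into a left half and a right half.
    mid = width / 2
    return [(0, 0, mid, height), (mid, 0, width, height)]


A3_PORTRAIT = (842, 1191)  # 297 mm x 420 mm in points, rounded
halves = split_page(*A3_PORTRAIT)
```

Each half comes out as 842 x 595.5 points, which is an A4 page in landscape orientation.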

January 6, 2010 | Uncategorized

How to download big files through the Google App Engine UrlFetch API

I offer a UrlFetch function in my PDF Password Remover Online application, but I did not want it limited to PDFs of no more than 1M. After some study, I found a solution: let the UrlFetch API download no more than 1M of data per call, and repeat the call until all the data has been downloaded. Of course, there is still a limit: the 30-second request limit.
For details, please visit

How to Use Google App Engine UrlFetch API to download the files over 1M
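
The idea of repeated small downloads can be sketched in Python as follows. This is not the code from the linked article. It assumes the remote server accepts HTTP Range requests and that the total file size is known beforehand; the names byte_ranges, download and fetch_range are made up for this sketch.

```python
CHUNK = 1000 * 1000  # assumed chunk size; the post only says "no more than 1M" per call


def byte_ranges(total_size, chunk=CHUNK):
    """Yield inclusive (start, end) byte offsets that cover total_size bytes."""
    start = 0
    while start < total_size:
        end = min(start + chunk, total_size) - 1
        yield start, end
        start = end + 1


def download(total_size, fetch_range, chunk=CHUNK):
    """Fetch the file piece by piece and join the pieces.

    fetch_range(start, end) is a hypothetical callable that performs one
    request with the header "Range: bytes=start-end" and returns the bytes.
    """
    return b"".join(fetch_range(s, e) for s, e in byte_ranges(total_size, chunk))


# Stand-in for the network: slice an in-memory "file" instead of calling a server.
data = bytes(range(256)) * 10
result = download(len(data), lambda s, e: data[s:e + 1], chunk=1000)
```

All the pieces are still fetched inside one request, so the 30-second limit decides how large a file can be handled this way.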

December 25, 2009 | Tutorials

First PDF Password Remover application hosted on Google App Engine

RubyPDF Software released the first PDF Password Remover application hosted on Google App Engine, based on iText (version 2.1.7, but with many modifications). With it, you can easily remove the user password or owner password online, and it is free.

  • Remove restrictions on any secured PDF document (you should have the right to do it, for example, if you forgot the password). Any Acrobat version up to 9 is supported, even with 128-bit AES or 128-bit RC4 encryption. PDF restriction removal is an instant process. The unlocked file can be opened in any PDF viewer without restrictions, so you may edit, copy, or print it.
  • Remove the PDF open password. Decryption of files protected with an open password is guaranteed for PDF files from any Acrobat version up to 9, even with 128-bit AES or 128-bit RC4 encryption, but you must know the password first.

For details, please visit RubyPDF PDF Password Remover Online.

December 23, 2009 | PDF News, Software

How to Optimize and Reduce PDF File Size with the Help of Adobe Acrobat

I noticed that How to use Adobe Acrobat to Optimize and Reduce PDF File Size lists two tutorials in PDF format:
PDF version tutorial of the Adobe Acrobat 6 solution to optimize and reduce file size,
PDF version tutorial of the Adobe Acrobat 7 solution to optimize and reduce file size,
and I just wonder why they have not released tutorials for Adobe Acrobat 8 and Adobe Acrobat 9.

October 30, 2009 | Hacks, Tutorials

Using pdfsizeopt to Optimize & Reduce PDF File Size

pdfsizeopt is an open source project hosted on Google Code; its main feature is optimizing PDF file size. It is based on the following tools:

  • pdfsizeopt.py
  • Python
  • Ghostscript
  • Java
  • sam2p
  • jbig2
  • png22pnm
  • pngtopnm
  • Multivalent.jar

pdfsizeopt is a collection of best practices and scripts for Unix to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.

For details, please visit pdfsizeopt-a Free and Open Source PDF Manipulation Tool to Reduce PDF File Size


pdfsizeopt home page
Convert JBIG2 to PDF with free and open source software agl’s jbig2enc
Windows version JBIG2 Encoder-Jbig2.exe


October 30, 2009 | Hacks, Linux, Open Source, Software, Tutorials, Windows