pdf2htmlEX is an open source tool that converts PDF to HTML without losing text or formatting. The source code was released a long time ago, but there has been no Windows port. Now rubypdf.com offers a way to use the tool on Windows: a win32 static build consisting of a single exe and a few necessary resource files.
for details, please visit,
pdfgrep searches PDF files for a regular expression; it works much like grep.
- -i, --ignore-case
- Ignore case distinctions in both the PATTERN and the input files.
- -H, --with-filename
- Print the file name for each match. This is the default setting when there is more than one file to search.
- -h, --no-filename
- Suppress the prefixing of file name on output. This is the default setting when there is only one file to search.
- -n, --page-number
- Prefix each match with the number of the page where it was found.
- -c, --count
- Suppress normal output. Instead print the number of matches for each input file. Note that unlike grep, multiple matches on the same page will be counted individually.
- -C, --context NUM
- Print at most NUM characters of context around each match. The exact number will vary, because pdfgrep tries to respect word boundaries. If NUM is "line", the whole line will be printed. If this option is not set, pdfgrep tries to print lines that are not longer than the terminal width.
- --color WHEN
- Surround file names, page numbers and matched text with escape sequences to display them in color on the terminal. The default setting is auto.
WHEN can be:
- always: Always use colors, even when stdout is not a terminal.
- never: Do not use colors.
- auto: Use colors only when stdout is a terminal.
- -R, -r, --recursive
- Recursively search all files (restricted by --include and --exclude) under each directory.
- --exclude GLOB
- Skip files whose base name matches GLOB. See glob(7) for wildcards you can use. You can use this option multiple times to exclude more patterns. It takes precedence over --include. Note that includes and excludes apply only to files found via --recursive and not to the argument list.
- --include GLOB
- Only search files whose base name matches GLOB. See --exclude for details. The default is *.pdf.
- --unac
- Remove accents and ligatures from both the search pattern and the PDF documents. This is useful if you want to search for a word containing 'ae', but the PDF uses the single character 'æ' instead. See unac(3) and unaccent(1) for details.
[This option is experimental and only available if pdfgrep is compiled with unac support.]
- -q, --quiet
- Suppress all normal output to stdout. Errors will be printed and the exit codes will be returned (see below).
- --help
- Print a short summary of the options.
- -V, --version
- Show version information.
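As a rough illustration of how these options combine, here is a small Python sketch that assembles a pdfgrep command line. The helper name build_pdfgrep_command and the file name report.pdf are made up for this example; only the flags themselves come from the list above, and actually running the command assumes pdfgrep is installed.

```python
import shlex


def build_pdfgrep_command(pattern, paths, ignore_case=False, page_numbers=False,
                          count=False, recursive=False, context=None):
    """Assemble an argv list for pdfgrep (hypothetical helper, not part of pdfgrep)."""
    argv = ["pdfgrep"]
    if ignore_case:
        argv.append("--ignore-case")
    if page_numbers:
        argv.append("--page-number")
    if count:
        argv.append("--count")
    if recursive:
        argv.append("--recursive")
    if context is not None:
        # NUM characters of context, or the word "line"
        argv += ["--context", str(context)]
    argv.append(pattern)
    argv.extend(paths)
    return argv


# Case-insensitive search with page numbers in a single (made-up) file.
command = build_pdfgrep_command("invoice", ["report.pdf"],
                                ignore_case=True, page_numbers=True)
print(shlex.join(command))
```

The resulting list could be handed to subprocess.run(command) on a machine where pdfgrep is available; building a list rather than a string avoids shell quoting problems with patterns that contain spaces.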
DiffPDF can compare two PDF files. It offers two comparison modes: Text and Appearance.
By default the comparison is of the text on each pair of pages, but comparing the appearance of pages is also supported (for example, if a diagram is changed or if a paragraph is reformatted). It is also possible to compare particular pages or page ranges. For example, if there are two versions of a PDF file, one with pages 1-12 and the other with pages 1-13 because of an extra page having been added as page 4, they can be compared by specifying two page ranges, 1-12 for the first and 1-3, 5-13 for the second. This will make DiffPDF compare pages in the pairs (1, 1), (2, 2), (3, 3), (4, 5), (5, 6), and so on, to (12, 13).
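The page pairing in that example can be reproduced with a short Python sketch. This is not DiffPDF's own code; the function names are invented here, and the choice to reject ranges of unequal length is an assumption of the sketch rather than documented DiffPDF behavior.

```python
def parse_page_ranges(spec):
    """Expand a range string such as "1-3, 5-13" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages


def pair_pages(spec_a, spec_b):
    """Pair the pages of two range strings position by position."""
    pages_a = parse_page_ranges(spec_a)
    pages_b = parse_page_ranges(spec_b)
    if len(pages_a) != len(pages_b):
        # Assumption of this sketch: both ranges must select the same number of pages.
        raise ValueError("page ranges select different numbers of pages")
    return list(zip(pages_a, pages_b))


pairs = pair_pages("1-12", "1-3, 5-13")
print(pairs[:5])   # [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6)]
print(pairs[-1])   # (12, 13)
```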
This feature only works for the following languages: English, French, Italian, German and Spanish. As the Google Docs Blog says: "For the technically curious: we're using Optical Character Recognition (OCR) that our friends from Google Books helped us set up. OCR works best with high-resolution images, and not all formatting may be preserved."
For details, please visit Google Docs add OCR support to PDF and Images.
RubyPDF has released its third Google App Engine application, PDFRotate Online. With it, you can rotate PDF pages online for free; the supported rotation angles are 90, 180 and 270 degrees.
If you want an offline version, please check pdfrotate.
The main feature is splitting a PDF page into two half-size pages, for example, splitting an A3 page into two A4 pages.
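The geometry behind such a split is simple: cut the page rectangle in half across its longer side. The Python sketch below only computes the two half-page boxes and is not the tool's actual implementation; the function name split_page_box is invented, and the A3 dimensions (420 x 297 mm in landscape) are the standard ISO 216 values, used here in millimetres to keep the arithmetic exact.

```python
def split_page_box(box):
    """Split a page box (x0, y0, x1, y1) into two halves across its longer side.

    Coordinates follow the PDF convention: the origin is at the bottom left.
    """
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    if width >= height:
        # Landscape page: left half first, then right half.
        mid = x0 + width / 2
        return (x0, y0, mid, y1), (mid, y0, x1, y1)
    # Portrait page: upper half first, then lower half.
    mid = y0 + height / 2
    return (x0, mid, x1, y1), (x0, y0, x1, mid)


a3_landscape = (0, 0, 420, 297)          # millimetres
left, right = split_page_box(a3_landscape)
print(left)    # (0, 0, 210.0, 297)
print(right)   # (210.0, 0, 420, 297)
```

Each half measures 210 x 297 mm, which is an A4 page in portrait orientation. A real splitter would apply boxes like these as crop boxes on two copies of the original page.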
I offer a UrlFetch function in my PDF Password Remover Online application, but I did not want it limited to PDFs of no more than 1 MB. After some study I found a solution: have the UrlFetch API download no more than 1 MB of data per call, and repeat the calls until all the data has been downloaded. Of course, there is still a limit: the 30-second request limit.
For details, please visit
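The chunked download idea described above can be sketched in Python as follows. This is not the application's code: fetch_range is a hypothetical callable standing in for one UrlFetch call, and the sketch assumes it returns the bytes for an inclusive byte range (for instance by sending an HTTP Range header to a server that supports it) and returns empty bytes past the end of the file.

```python
import time

CHUNK_SIZE = 1024 * 1024       # stay within the 1 MB limit per fetch
TIME_BUDGET_SECONDS = 30.0     # the overall request limit mentioned above


def download_in_chunks(fetch_range, chunk_size=CHUNK_SIZE,
                       time_budget=TIME_BUDGET_SECONDS):
    """Download a file piece by piece using a hypothetical fetch_range(start, end)."""
    started = time.monotonic()
    parts = []
    offset = 0
    while True:
        if time.monotonic() - started > time_budget:
            raise TimeoutError("ran out of time before the download finished")
        chunk = fetch_range(offset, offset + chunk_size - 1)
        parts.append(chunk)
        offset += len(chunk)
        if len(chunk) < chunk_size:
            # A short (or empty) chunk means the end of the file was reached.
            break
    return b"".join(parts)


# Usage with an in-memory stand-in for the remote file.
remote_file = bytes(range(256)) * 10     # 2560 bytes


def fake_fetch_range(start, end):
    return remote_file[start:end + 1]


data = download_in_chunks(fake_fetch_range, chunk_size=1000)
print(len(data) == len(remote_file))
```

The loop stops on the first short chunk, so the total size does not need to be known in advance. The time check only runs between fetches, so a single slow fetch can still overrun the budget.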
RubyPDF Software has released the first PDF Password Remover application hosted on Google App Engine, based on iText (version 2.1.7, but with many modifications). With it, you can easily remove the user password or owner password online, and it is free.
- Remove restrictions on any secured PDF document (you should have the right to do so, for example, if you forgot the password). Any Acrobat version up to 9 is supported, even with 128-bit AES or 128-bit RC4 encryption. Removing PDF restrictions is an instant process. The unlocked file can be opened in any PDF viewer without restrictions, so you can edit, copy or print it.
- Remove the PDF open password. Decryption of a file that requires a password to open is guaranteed for any Acrobat version up to 9, even with 128-bit AES or 128-bit RC4 encryption, but you must know the password first.
For details, please visit RubyPDF PDF Password Remover Online.
I noticed that How to use Adobe Acrobat to Optimize and Reduce PDF File Size lists two PDF version tutorials:
PDF version tutorial of the Adobe Acrobat 6 solution to optimize and reduce file size,
PDF version tutorial of the Adobe Acrobat 7 solution to optimize and reduce file size,
and I wonder why they have not released tutorials for Adobe Acrobat 8 and Adobe Acrobat 9.
pdfsizeopt is an open source project hosted on Google Code; its main feature is optimizing PDF file size. It is based on the following tools:
pdfsizeopt is a collection of best practices and scripts for Unix to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.
For details, please visit pdfsizeopt-a Free and Open Source PDF Manipulation Tool to Reduce PDF File Size.