Monday, February 17, 2014

wkhtmltopdf converts HTML to PDF

Most - if not all - new electronic devices don't come with printed manuals. For the Sony SVF15A1 laptop I bought, I had to go to the Sony web site to access the user guide. The user guide is composed of many individual HTML pages. This posed a problem for me because I wanted to convert some HTML pages to PDF documents for easier off-line access.

Modern web browsers, such as Chrome and Firefox, have a built-in print-to-PDF feature.

For Chrome,

  1. Navigate to the HTML page, right-click, and select Print.
  2. Select Save as PDF as the Destination.
  3. Click Save, and select the output directory to save in.

Note that the output PDF file is automatically named User Guide _ Using the Touch Pad.pdf. I did not have to make up the file name manually because the default name was extracted from the HTML page content. This saves users so much time that it is a major advantage of using the browser print function to do the conversion.

If you prefer a command-line solution, wkhtmltopdf is a simple yet powerful tool to convert HTML to PDF.

$ wkhtmltopdf -s Letter http://docs.esupport.sony.com/pc/SVF14A1_15A1_series/EN/contents/TP0000053442.html UserGuideTouchPad.pdf
Loading page (1/2)
Printing pages (2/2)
Done               

wkhtmltopdf can convert HTML files stored on the local hard drive or served over the Internet. The above example specifies a URL from the Sony support web site.

By default, the pages in the output PDF file are A4 size. North American users may specify the Letter page size using the -s parameter.

The last parameter to wkhtmltopdf is the mandatory output file name. Despite its rich set of parameters, wkhtmltopdf cannot name the output file automatically. To harness its full power, please refer to the man page.
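
Since wkhtmltopdf does not pick an output name for you, a small shell helper can approximate what the browser does. The sketch below is only an illustration: pdf_name_from_title is a hypothetical function, not part of wkhtmltopdf, and it assumes the page has been saved locally and has a lowercase <title> element on a single line.

```shell
# Hypothetical helper, not part of wkhtmltopdf.
# Print a PDF file name derived from the <title> of a local HTML file.
# Usage: pdf_name_from_title FILE.html
pdf_name_from_title() {
    # Assumption: <title>...</title> is lowercase and sits on one line.
    title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$1" | head -n 1)
    # Fall back to a fixed name when no title is found.
    [ -n "$title" ] || title="output"
    # Replace characters that are awkward in file names with underscores.
    printf '%s.pdf\n' "$(printf '%s' "$title" | tr '/\\:*?"<>|' '_')"
}
```

With a local copy saved as page.html (an assumed file name), it could be used like this:

$ wkhtmltopdf -s Letter page.html "$(pdf_name_from_title page.html)"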

If you want to merge individual PDF files, please see my posts on pdftk and ImageMagick.

Saturday, February 8, 2014

Find out when a package was last installed or updated

If you administer a Linux computer, you may occasionally ask when a software package was last installed or updated on your system.

For a Red Hat-based operating system - CentOS, Fedora, RHEL, etc. - getting the answer is a simple task of querying the RPM database. The RPM database stores, among other things, the last install date of rpm packages.

To query information about the curl package:

$ rpm -qi curl
Name        : curl
Version     : 7.29.0
Release     : 7.fc19
Architecture: i686
Install Date: Sun 11 Aug 2013 03:55:52 PM PDT
Group       : Applications/Internet
Size        : 529869
License     : MIT
Signature   : RSA/SHA256, Sun 23 Jun 2013 09:26:17 AM PDT, Key ID 07477e65fb4b18e6
Source RPM  : curl-7.29.0-7.fc19.src.rpm
Build Date  : Sat 22 Jun 2013 02:46:47 PM PDT
Build Host  : buildvm-11.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://curl.haxx.se/
Summary     : A utility for getting files from remote servers (FTP, HTTP, and others)
Description :
...

Note that the Install Date for curl is Sun 11 Aug 2013 03:55:52 PM PDT. If curl was installed and subsequently updated, the stored Install Date is the update date, not the first install date.
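
If you only want the date, you can filter the rpm -qi output. The following is a minimal sketch: install_date_from_rpm_info is a hypothetical helper name, and it assumes the single-column layout shown above, where Install Date sits on a line of its own.

```shell
# Hypothetical helper, not part of rpm.
# Read `rpm -qi` output on stdin and print only the Install Date value.
install_date_from_rpm_info() {
    # Strip the "Install Date:" label and print the rest of that line.
    sed -n 's/^Install Date *: *//p'
}
```

It would be used as a filter:

$ rpm -qi curl | install_date_from_rpm_info

As an alternative, rpm -q --last curl prints the package name together with its install time.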

If you run a Debian-based OS - Debian, Ubuntu, Mint, etc. - you have to work harder to get the answer. The Debian package manager (dpkg) does not actually store the install date of packages in its database. However, you can still find it out using either of the following procedures.

  • Search the dpkg log files

    The following example reveals the last update time for the curl package (2014-02-04 11:40:55).

    $ grep 'status installed curl:' /var/log/dpkg.log* 
    /var/log/dpkg.log:2014-02-04 11:40:55 status installed curl:amd64 7.26.0-1+wheezy8
    /var/log/dpkg.log.1:2014-01-08 10:06:50 status installed curl:amd64 7.26.0-1+wheezy7
    

    There is a catch with this approach. You cannot search only the current dpkg log. Depending on when the package was installed or last updated, the relevant log may already have been rotated out. Hence the asterisk in dpkg.log*, which matches dpkg.log, dpkg.log.1, dpkg.log.2, etc. Older rotated logs are often gzip-compressed (for example, dpkg.log.2.gz), in which case zgrep can search them where grep cannot. However, if the package was installed a long time ago, the log file you are looking for may already have been deleted from the system.

  • Look for the last modified timestamp of the package's file list

    When a package is installed or updated, its corresponding file list in /var/lib/dpkg/info/ is overwritten with the latest information. For instance, /var/lib/dpkg/info/curl.list contains a list of file names that are installed by the curl package.

    The last modified timestamp of curl.list gives you a fairly accurate time of when curl was last installed/updated.

    $ ls -l /var/lib/dpkg/info/curl.list
    -rw-r--r-- 1 root root    584 Feb  4 11:40 /var/lib/dpkg/info/curl.list
    

    Again, if curl was updated after the initial install, the timestamp reflects the last update time.
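
Both procedures can be wrapped in small shell functions. This is a sketch under stated assumptions, not a dpkg feature: last_installed_log and list_mtime are hypothetical names, the log lines are assumed to follow the "status installed curl:amd64" format shown above, and date -r assumes GNU coreutils.

```shell
# Hypothetical helpers; neither function is part of dpkg.

# Print the newest "status installed" log line for a package.
# Usage: last_installed_log PACKAGE LOGFILE...
last_installed_log() {
    pkg=$1
    shift
    # gzip -cdf decompresses .gz files and passes plain files through
    # unchanged, so rotated logs in either form can be searched together.
    # Each log line starts with a timestamp, so a plain sort is chronological.
    gzip -cdf -- "$@" | grep -F "status installed $pkg:" | sort | tail -n 1
}

# Print the last-modified time of a package's file list.
# Usage: list_mtime PACKAGE [INFO_DIR]
# Caveat: some list files carry an architecture suffix (e.g. libc6:amd64.list).
list_mtime() {
    # date -r FILE (GNU coreutils) prints the file's modification time.
    date -r "${2:-/var/lib/dpkg/info}/$1.list" '+%Y-%m-%d %H:%M'
}
```

For the curl example, the calls would look like this:

$ last_installed_log curl /var/log/dpkg.log*
$ list_mtime curl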

Monday, February 3, 2014

Merging pdf files

This post is about combining pdf files. It complements two earlier posts on splitting pdf files - part 1 and part 2.

I am using a tool named SimpleScan to scan in multiple pages of a document. Each page is scanned into a separate pdf file. I must find a way to stitch the pdf files together into one single pdf.

You can use either the pdftk command or the gs command to merge pdf files. You may recall that they are the same commands you would use to split pdf files. In the examples below, the input files are T4a.pdf, T4b.pdf, and T4c.pdf; the merged output file is combined.pdf.

  • pdftk
    $ pdftk T4a.pdf T4b.pdf T4c.pdf output combined.pdf
    
  • gs (Ghostscript)
    $ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -sOutputFile=combined.pdf T4a.pdf T4b.pdf T4c.pdf 
    GPL Ghostscript 9.05 (2012-02-08)
    Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
    This software comes with NO WARRANTY: see the file PUBLIC for details.
    Processing pages 1 through 4.
    Page 1
    Loading NimbusSanL-Regu font from /usr/share/fonts/type1/gsfonts/n019003l.pfb... 4247256 2644865 2257128 952885 3 done.
    Loading NimbusSanL-Bold font from /usr/share/fonts/type1/gsfonts/n019004l.pfb... 4288248 2754288 2277312 983470 3 done.
    Loading NimbusMonL-Regu font from /usr/share/fonts/type1/gsfonts/n022003l.pfb... 4331192 2899511 2438784 1137910 3 done.
    Page 2
    Loading NimbusSanL-BoldItal font from /usr/share/fonts/type1/gsfonts/n019024l.pfb... 4370920 2866410 2479152 934054 3 done.
    Loading NimbusSanL-ReguItal font from /usr/share/fonts/type1/gsfonts/n019023l.pfb... 4410584 2982313 2499336 1004810 3 done.
    Page 3
    Page 4
    Processing pages 1 through 4.
    Page 1
    Page 2
    Page 3
    Page 4
    Processing pages 1 through 4.
    Page 1
    Page 2
    Page 3
    Page 4
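
The Ghostscript run above is quite chatty. A small wrapper can add the -q option, which should suppress most of that routine output, and offer a dry-run mode for checking the command before running it. merge_pdfs and the DRY_RUN variable are hypothetical names chosen for this sketch; the remaining options are the ones used above.

```shell
# Hypothetical wrapper around gs; not part of Ghostscript.
# Usage: merge_pdfs OUTPUT.pdf INPUT.pdf...
# Set DRY_RUN=1 to print the command instead of running it.
merge_pdfs() {
    out=$1
    shift
    # Rebuild the argument list as the full gs command line.
    set -- gs -q -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER "-sOutputFile=$out" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example:

$ merge_pdfs combined.pdf T4a.pdf T4b.pdf T4c.pdf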
    

P.S. You can also merge pdf files using ImageMagick. Refer to my post.