
Scanning & OCRring to PDF: Simple Scan, gimagereader and gscan2pdf v NAPS2 for Windows

The project is to build a Linux Mint machine to have the identical functionality and ergonomics as the existing Windows 10 machine.

This stage relates to scanning paper documents to PDF and digitising the scanned text via optical character recognition.

Environment & required functionality

The scan-and-OCR function needs to run on the following machines:
  • The Linux Mint Xfce 18.3 laptop "Gandalf";
  • A Linux Mint Xfce 18.3 virtual machine "Gimli";
  • The Windows 10 laptop "Legolas".
In any modern office - whether at home or at work - some transactional documents and documents from public authorities still arrive by snail-mail.

This requires the ability to scan all documents, optionally with the digitisation of scanned text (typically via optical character recognition).

The hardware is an old HP OfficeJet Pro 276dw, connected to the LAN instead of directly to a workstation.

Alternatives

There are two strategies:
  1. To use the software provided by the hardware manufacturer, in this case HP.
  2. To use third party software.
The third party software:
  • for Windows 10, the benchmark is NAPS2 for Windows.
  • for Linux, the candidates are Simple Scan, gimagereader and gscan2pdf.

Software selection

Summarising the benchmark's functionality

NAPS2 is superior to HP's own software (for use with the HP OfficeJet Pro 276dw).  Aside from being clumsy, inefficient, wasteful bloatware, HP's software also managed to cram numerous design flaws into its user interface, undermining ergonomic workflow.

The salient functionality of NAPS2 includes:

  • a batch-orientated document scanning process (workflow), capable of meeting the volume needs of a home office or business office (for small businesses);
  • a one-stop-shop approach to the whole process, from scan to image post-processing to optical character recognition to save-to-disk (with some improvements possible);
  • the maximum range of scan options available from the scanner, i.e. flatbed, document feeder, duplex, monochrome, greyscale, colour;
  • stability: in months of use, NAPS2 has yet to crash.

The main issues with NAPS2 include:

  • some functionality is accessible only by mouse, yet there is no objective need for these functions to be mouse-only.  The main example is the select-images-to-save-as-a-PDF-file process.  This impedes ergonomics and the speed of document processing, making it unnecessarily manual;
  • the dialogue box for the saved list of scanning profiles is too small to accommodate the icons and descriptions of scanning profiles and, to make this worse, the user cannot re-size the box.  So the box impedes visibility of its own content!

NAPS2 is said to work on Linux via the Wine translation layer, accompanied by the Mono translation layer (in Windows, NAPS2 runs on the .NET framework).  At this time in the project, Wine and Mono are outside of scope.

Linux equivalents

To meet the above benchmark, as at April 2019, there were only three options for Linux: Simple Scan, gimagereader and gscan2pdf.  Both gscan2pdf and gimagereader were installed on Gimli for a comparative test.

Installation experience

Simple Scan was pre-installed on Linux Mint.  gscan2pdf and gimagereader were installed via Software Manager, without incident.

User experience

Simple Scan

Simple Scan works as designed and can do a credible job for the home-office if the required functionality is only to scan.

Simple Scan deviates from the benchmark in four major ways:

  • the reverse-side of each scanned page is upside-down.  This requires two manual tasks to correct for each page (Simple Scan can rotate only by 90° at a time).
  • there is no presumption of a batch-orientated document scanning workflow.  Each document is a separate scan to save to a single file.  The user cannot batch-scan a pile of documents, then use Simple Scan to save selected pages (i.e. a document) into a single PDF file.  Instead, the user needs to jump up-and-down, to-and-from the scanner, for each and every multi-page document.  This makes the job so much longer, although, to be fair, it does burn calories and might help the user eliminate his/her personal obesity crisis.  But it's no good for an office environment (of whatever size of business).
  • Simple Scan doesn't always get the page size correct.  When set to determine the page size automatically, sheets of A4 are scanned as pages that are far bigger.  So this requires manual intervention to set the page size correctly.
  • Simple Scan's default dots-per-inch setting for text documents is 150dpi, a tad too low for reliable OCRring (which ideally needs 200dpi).  But Simple Scan offers only 150dpi or 300dpi; nothing in between is available.
Although Simple Scan presents itself as simple, it makes some pretty crass choices about simplicity, resulting in the user needing to do more manual tasks to achieve the required result.
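To put numbers on the page-size and dpi points above, here is a minimal sketch of the arithmetic.  The function names and the 3% tolerance are illustrative assumptions, not anything Simple Scan exposes; the only hard facts used are the A4 dimensions (210mm x 297mm) and 25.4mm to the inch.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # ISO 216 A4, width x height in millimetres


def expected_pixels(dpi, size_mm=A4_MM):
    """Return (width, height) in pixels for a sheet scanned at the given dpi."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in size_mm)


def looks_like_a4(width_px, height_px, dpi, tolerance=0.03):
    """True if a scanned image is within `tolerance` of A4 at this dpi.

    The 3% default tolerance is an arbitrary assumption for illustration.
    """
    exp_w, exp_h = expected_pixels(dpi)
    return (abs(width_px - exp_w) <= exp_w * tolerance
            and abs(height_px - exp_h) <= exp_h * tolerance)


if __name__ == "__main__":
    # Compare the two resolutions Simple Scan offers with the 200dpi target.
    for dpi in (150, 200, 300):
        print(dpi, expected_pixels(dpi))
```

A check like this would at least flag the "far bigger than A4" pages automatically, instead of leaving the user to spot them by eye.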

Regarding the upside-down results of every reverse-side, this is found in gscan2pdf too, so it is more likely to be an issue with the underlying device driver than with Simple Scan.  Unfortunately, no-body appears to have asked about this issue on the internet, which suggests that the desired functionality is so little demanded that no-body has bothered to fix it.  At least the benchmark gets it right straight off the bat.
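The correction itself is mechanical, which is what makes doing it by hand so irritating.  As a rough sketch, assuming a duplex scan that alternates front, back, front, back and starts with a front side, the per-page rotation and the number of 90° clicks it costs could be worked out like this (the function names are made up for illustration):

```python
def duplex_rotations(page_count, reverse_angle=180):
    """Rotation in degrees needed for each page of a duplex scan, in scan order.

    Assumes pages alternate front, back, front, back ... starting with a
    front side, and that only the reverse sides come out upside-down.
    """
    return [reverse_angle if index % 2 == 1 else 0 for index in range(page_count)]


def quarter_turn_clicks(angle):
    """Number of 90-degree rotate actions needed to apply `angle`."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90 degrees")
    return (angle % 360) // 90


if __name__ == "__main__":
    rotations = duplex_rotations(10)
    print(rotations)
    # Total manual clicks for a ten-page (five-sheet) duplex document.
    print(sum(quarter_turn_clicks(angle) for angle in rotations))
```

The list of angles could then be handed to whatever PDF or image tool is to hand; the point is only that software could apply the rule without the user touching each page.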

Gimagereader

Gimagereader is primarily an OCR app for existing documents.  In this test, its primary use case is to complement Simple Scan.  It is a front-end for the tesseract engine.

Gimagereader is accessible only by mouse.  It is absolutely horrendous!

When processing an OCR, its use of tesseract appears to be far slower than either gscan2pdf or NAPS2 (by minutes).

When gimagereader has processed the OCR, the user can save the output to a text file, but apparently cannot embed it into the PDF.  This defeats the whole purpose of OCR: a presumption of the functionality - that the benchmark gets right - is that the selectable text is available directly from within the PDF, so that the PDF viewer permits select and copy.
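For comparison, the tesseract command-line tool can itself write a PDF with a selectable text layer when given its "pdf" output config.  The sketch below only builds that command; it was not part of this test, the file name is hypothetical, and it assumes the installed tesseract build ships the "pdf" config and the requested language data.

```python
from pathlib import Path
import shlex


def searchable_pdf_command(image_path, lang="eng"):
    """Build a tesseract command that writes a searchable PDF next to the image.

    tesseract takes an output *base* name and appends ".pdf" itself when the
    "pdf" config is requested.
    """
    image = Path(image_path)
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


if __name__ == "__main__":
    # "scan_0001.tif" is a hypothetical file name used for illustration.
    cmd = searchable_pdf_command("scan_0001.tif")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To run it for real (tesseract must be on the PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```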

Gimagereader is programmed to use tesseract as its OCR engine.  Tesseract does a fair job of reading even 150dpi scanned text, although the number of errors is too great to be reliable in any office environment.  Gimagereader presents the output as a preview pane with a spell-checking facility.  So when tesseract misreads "mortgage" as "mortgge", gimagereader underlines it as a potential error.  This is a useful manual task to have available, if only to show how unreliable the OCR engine is with text scanned at too low a resolution.  By contrast, the benchmark just gets it right, straight off the bat.
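The spell-checking idea is simple enough to sketch.  This is not gimagereader's actual mechanism, just an illustration using Python's standard difflib module: words missing from a (here, tiny and made-up) vocabulary are flagged, together with the closest known word if there is one.

```python
import difflib
import re


def flag_suspect_words(text, vocabulary, cutoff=0.8):
    """Map each word not found in `vocabulary` to its closest known word.

    The value is None when nothing in the vocabulary is similar enough.
    The 0.8 similarity cutoff is an arbitrary assumption.
    """
    known = sorted({word.lower() for word in vocabulary})
    suspects = {}
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        if lower in known:
            continue
        matches = difflib.get_close_matches(lower, known, n=1, cutoff=cutoff)
        suspects[word] = matches[0] if matches else None
    return suspects


if __name__ == "__main__":
    # A toy vocabulary; a real check would use a proper dictionary.
    vocabulary = ["the", "mortgage", "is", "due", "in", "may"]
    print(flag_suspect_words("The mortgge is due in May", vocabulary))
```

A check of this kind catches non-words, but it cannot catch a misread that happens to produce another valid word, so it is no substitute for accurate OCR in the first place.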

Gimagereader can scan documents, too, but only as greyscale or colour.  For office documents, only monochrome is objectively necessary.

Gscan2pdf

Gscan2pdf is by far the best of the Linux applications, but the results fall short of the benchmark.

In principle, gscan2pdf meets all of the functionality of the benchmark.  In practice, the results are weaker than the benchmark.  In summary:
  • like Simple Scan, it returns the back page upside-down, although, unlike Simple Scan, the user can correct this upon issuing the order to scan.
  • although gscan2pdf permits the user to queue a range of post-scanning processes, the underlying processes that it calls for de-skewing and OCR appear to fail with errors, such that gscan2pdf sometimes crashes instead of processing the document, or interrupts the processing with some error message that the user needs to clear (i.e. "babysitting").  The solution is to have gscan2pdf only scan the document first, then to run each post-process as a separate, manually-triggered stage.  The benchmark does processing automatically, and gets it right straight off the bat.
  • gscan2pdf permits use of a choice of OCR engines - tesseract, cuneiform and gocr - yet in spite of gscan2pdf scanning at the requested 200dpi monochrome (lineart, as gscan2pdf calls it), none of the three OCR engines return accurate data.  Again, the benchmark is more reliable than gscan2pdf's partner OCR engines.  Of course, that's not gscan2pdf's fault, but the app is only as good as the results it can produce.
  • insufficient keyboard accessibility.  Some dialogue boxes allow only partial navigation and selection with the keyboard, although the menu structure at the top of the main pane is comprehensively keyboard-accessible.  In addition, gscan2pdf doesn't invoke any of the visual cues available within Linux Mint to "highlight" the keyboard-selected control.  So, for those dialogue boxes which are keyboard-accessible, the user cannot "see" where they are and has to guess.
Nevertheless, gscan2pdf meets the same requirements as the benchmark regarding the batch-orientated scan process.  All gscan2pdf needed to do to meet the benchmark was to implement its offering more carefully, and then it would be a straight swap for the benchmark.
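For clarity, the batch-orientated process referred to here boils down to scanning one pile of paper, then carving the resulting pages into separate documents.  A minimal sketch of that carving step, with hypothetical page labels and a made-up function name, assuming the user supplies the index of the first page of each document:

```python
def split_batch(pages, first_pages):
    """Group a batch of scanned pages into documents.

    pages:       page labels (or image paths) in scan order
    first_pages: 0-based indices at which each document starts; must begin
                 with 0 and be strictly ascending
    """
    first_pages = list(first_pages)
    if not first_pages or first_pages[0] != 0:
        raise ValueError("the first document must start at page index 0")
    if sorted(set(first_pages)) != first_pages:
        raise ValueError("start indices must be strictly ascending")
    if first_pages[-1] >= len(pages):
        raise ValueError("a document starts beyond the last scanned page")
    ends = first_pages[1:] + [len(pages)]
    return [pages[start:end] for start, end in zip(first_pages, ends)]


if __name__ == "__main__":
    # Six scanned pages forming three documents of 2, 3 and 1 pages.
    batch = ["p1", "p2", "p3", "p4", "p5", "p6"]
    for document in split_batch(batch, [0, 2, 5]):
        print(document)
```

Each resulting group would then be saved as its own PDF file, which is the one trip to the scanner per pile, rather than per document, that Simple Scan cannot offer.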

Conclusion

Gscan2pdf is by far the most functional of the three Linux apps and would, in principle, meet all of the requirements of the benchmark (NAPS2 for Windows).

However, gscan2pdf implements its functionality in a partially inaccessible way, is prone to crashing or holding the processing hostage with error messages, and uses OCR engines whose performance is below that of the benchmark.

In spite of such poor performance relative to the benchmark, gscan2pdf just about achieves the required functionality to permit the project to proceed.  However, the cost of migration appears to be lower reliability, less slick workflows (an inefficient use of the user's time) and less reliable OCR than the benchmark.

