
Scanning & OCRring to PDF: Simple Scan, gimagereader and gscan2pdf v NAPS2 for Windows

The project is to build a Linux Mint machine with the same functionality and ergonomics as the existing Windows 10 machine.

This stage relates to scanning paper documents to PDF and digitising the scanned text via optical character recognition.

Environment & required functionality

The scan-and-OCR function needs to run on the following machines:
  • The Linux Mint Xfce 18.3 laptop "Gandalf";
  • A Linux Mint Xfce 18.3 virtual machine "Gimli";
  • The Windows 10 laptop "Legolas".
In any modern office - whether at home or at work - some transactional documents and documents from public authorities still arrive by snail-mail.

This requires the ability to scan all documents, optionally with the digitisation of scanned text (typically via optical character recognition).

The hardware is an old HP OfficeJet Pro 276dw, connected to the LAN instead of directly to a workstation.

Alternatives

There are two strategies:
  1. To use the software provided by the hardware manufacturer, in this case HP.
  2. To use third party software.
The third-party software:
  • for Windows 10, the benchmark is NAPS2 for Windows;
  • for Linux, the candidates are Simple Scan, gimagereader and gscan2pdf.

Software selection

Summarising the benchmark's functionality

NAPS2 is superior to HP's own software (for use with the HP OfficeJet Pro 276dw).  Aside from being clumsy, inefficient, wasteful bloatware, HP's software also managed to cram numerous design flaws into its user interface, undermining ergonomic workflow.

The salient functionality of NAPS2 includes:

  • a batch-orientated document scanning process (workflow), capable of meeting the volume needs of a home office or business office (for small businesses);
  • a one-stop-shop approach to the whole process, from scan to image post-processing to optical character recognition to save-to-disk (with some improvements possible);
  • the maximum range of scan options available from the scanner, i.e. flatbed, document feeder, duplex, monochrome, greyscale, colour;
  • reliability: in months of use, NAPS2 has yet to crash.

The main issues with NAPS2 include:

  • some functionality is accessible only by mouse, yet there is no objective need for these functions to be mouse-only.  The main example is the select-images-to-save-as-a-PDF-file process.  This impedes ergonomics and the speed of document processing, making it unnecessarily manual;
  • the dialogue box for the saved list of scanning profiles is too small to accommodate the icons and descriptions of scanning profiles and, to make this worse, the user cannot re-size the box.  So the box impedes visibility of its own content!

NAPS2 is said to work on Linux via the Wine translation layer, accompanied by Mono, an open-source implementation of the .NET Framework (in Windows, NAPS2 runs on the .NET Framework).  At this stage of the project, Wine and Mono are out of scope.

Linux equivalents

To meet the above benchmark, as at April 2019, only three options were left for Linux: Simple Scan, gimagereader and gscan2pdf.  Both gscan2pdf and gimagereader were installed on Gimli for a comparative test.

Installation experience

Simple Scan was pre-installed on Linux Mint.  gscan2pdf and gimagereader were installed via Software Manager, without incident.

User experience

Simple Scan

Simple Scan works as designed and can do a credible job for the home-office if the required functionality is only to scan.

Simple Scan deviates from the benchmark in four major ways:

  • the reverse side of each scanned page is upside-down.  This requires two manual tasks to correct for each page (Simple Scan can rotate only by 90° at a time).
  • there is no presumption of a batch-orientated document scanning workflow.  Each document is a separate scan to save to a single file.  The user cannot batch-scan a pile of documents, then use Simple Scan to save selected pages (i.e. a document) into a single PDF file.  Instead, the user needs to jump up-and-down, to-and-from the scanner, for each and every multi-page document.  This makes the job so much longer, although, to be fair, it does burn calories and might help the user eliminate his/her personal obesity crisis.  But it's no good for an office environment (of whatever size of business).
  • Simple Scan doesn't always get the page size correct.  When set to determine the page size automatically, sheets of A4 are scanned in pages that are far bigger.  So this requires manual intervention to set the settings correctly.
  • Simple Scan's default dots-per-inch setting for text documents is 150dpi, a tad too low for reliable OCRring (which ideally needs 200dpi).  But Simple Scan offers only 150dpi or 300dpi; nothing in between is available.
Although Simple Scan presents itself as simple, it makes some pretty crass choices about simplicity, resulting in the user needing to do more manual tasks to achieve the required result.
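
As a rough check on the page-size and resolution points above, the expected pixel dimensions of an A4 sheet can be worked out from the paper size and the dpi setting.  The following is a minimal sketch; the function name is illustrative only and is not part of Simple Scan or any scanning library.

```python
# A4 paper size in millimetres (ISO 216).
A4_MM = (210.0, 297.0)
MM_PER_INCH = 25.4


def pixels_at_dpi(size_mm, dpi):
    """Return (width, height) in pixels for a page of size_mm scanned at dpi."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in size_mm)


# Compare the two resolutions Simple Scan offers with the 200dpi target.
for dpi in (150, 200, 300):
    width, height = pixels_at_dpi(A4_MM, dpi)
    print(f"{dpi} dpi: {width} x {height} px")
```

A scanned page whose pixel dimensions are far larger than these figures at the chosen resolution is a sign that the automatic page-size detection has not settled on A4.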

Regarding the upside-down results of every reverse side, this is found in gscan2pdf too, so it is more likely to be an issue with the underlying device driver than with Simple Scan.  Unfortunately, nobody appears to have asked about this issue on the internet, which suggests that the desired functionality is so little demanded that nobody has bothered to fix it.  At least the benchmark gets it right straight off the bat.
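
The correction itself is mechanical, which is what makes doing it by hand so tedious.  As a sketch only, and assuming the pages arrive in scan order (front, back, front, back) with every reverse side upside-down, the rotation needed per page could be planned like this.  The function is hypothetical and applies nothing by itself; its output would need to be fed to whatever tool rotates the pages.

```python
def duplex_rotations(page_count, flip_degrees=180):
    """Return a rotation in degrees for each page of a duplex scan.

    Assumes pages arrive in scan order (front, back, front, back, ...)
    and that every back side (odd zero-based index) is upside-down.
    """
    return [flip_degrees if index % 2 == 1 else 0 for index in range(page_count)]


# A two-sheet duplex document: only the reverse sides need turning.
print(duplex_rotations(4))  # [0, 180, 0, 180]
```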

Gimagereader

Gimagereader is primarily an OCR app for existing documents.  In this test, its primary use case is to complement Simple Scan.  It is a front-end for the tesseract engine.

Gimagereader is accessible only by mouse.  It is absolutely horrendous!

When processing an OCR, its use of tesseract appears to be far slower than either gscan2pdf or NAPS2 (by minutes).

When gimagereader has processed the OCR, the user can save the output to a text file, but apparently cannot embed it into the PDF.  This defeats the whole purpose of OCR: a presumption of the functionality - that the benchmark gets right - is that the selectable text is available directly from within the PDF, so that the PDF viewer permits select and copy.
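
For comparison, the tesseract command-line tool can write a PDF with an embedded text layer when its "pdf" output config is named after the output base.  This was not part of the test above.  The sketch below only builds the command; it assumes a tesseract build that includes the pdf config, and the file names are hypothetical.

```python
import shlex


def tesseract_pdf_command(image_path, output_base, language="eng"):
    """Build a tesseract command that writes <output_base>.pdf with a text layer.

    Only the command is built here; nothing is executed.
    """
    return ["tesseract", image_path, output_base, "-l", language, "pdf"]


# Hypothetical file names, for illustration only.
command = tesseract_pdf_command("scan-page-1.png", "scan-page-1")
print(" ".join(shlex.quote(part) for part in command))
```

On a machine with tesseract installed, the list could be passed to subprocess.run(command, check=True).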

Gimagereader is programmed to use tesseract as its OCR engine.  Tesseract does a fair job of reading even 150dpi scanned text, although the number of errors is too great to be reliable in any office environment.  Gimagereader presents the output as a preview pane with a spell-checking facility.  So when tesseract misreads "mortgage" as "mortgge", gimagereader underlines it as a potential error.  This is a useful manual task to have available, if only to show how unreliable the OCR engine is with text scanned at too low a resolution.  By contrast, the benchmark just gets it right, straight off the bat.
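
The idea of flagging a misread such as "mortgge" can be illustrated with Python's standard difflib module.  This is a sketch of the general technique only, not of how gimagereader implements its spell-checking; the word list is a made-up stand-in for a real dictionary.

```python
import difflib

# Hypothetical mini-dictionary; a real spell-checker uses a full word list.
KNOWN_WORDS = ["mortgage", "payment", "statement", "interest", "account"]


def flag_suspect_words(text, known_words=KNOWN_WORDS):
    """Return {word: suggestion} for each word that is not in known_words.

    The suggestion is the closest known word, or None if nothing is close.
    """
    suspects = {}
    for word in text.lower().split():
        if word in known_words:
            continue
        matches = difflib.get_close_matches(word, known_words, n=1)
        suspects[word] = matches[0] if matches else None
    return suspects


print(flag_suspect_words("mortgge payment statement"))  # {'mortgge': 'mortgage'}
```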

Gimagereader can scan documents, too, but only as greyscale or colour.  For office documents, only monochrome is objectively necessary.

Gscan2pdf

Gscan2pdf is by far the best of the Linux applications, but the results fall short of the benchmark.

In principle, gscan2pdf meets all of the functionality of the benchmark.  In practice, the results are weaker than the benchmark.  In summary:
  • like Simple Scan, it returns the back page upside-down, although, unlike Simple Scan, the user can correct this upon issuing the order to scan.
  • although gscan2pdf permits the user to choose a range of post-scanning processes, gscan2pdf appears to call the underlying processes for de-skewing and OCR with errors, such that gscan2pdf sometimes crashes instead of processing the document, or interrupts the processing with some error message that the user needs to clear (i.e. "babysitting").  The solution is to have gscan2pdf only scan the document first, then run each post-process as a separate, manually-triggered stage.  The benchmark does processing automatically, and gets it right straight off the bat.
  • gscan2pdf permits use of a choice of OCR engines - tesseract, cuneiform and gocr - yet in spite of gscan2pdf scanning at the requested 200dpi monochrome (lineart, as gscan2pdf calls it), none of the three OCR engines return accurate data.  Again, the benchmark is more reliable than gscan2pdf's partner OCR engines.  Of course, that's not gscan2pdf's fault, but the app is only as good as the results it can produce.
  • insufficient keyboard accessibility.  Some dialogue boxes allow only partial navigation and selection with the keyboard, although the menu structure at the top of the main pane is comprehensively keyboard-accessible.  In addition, gscan2pdf doesn't invoke any of the visual cues available within Linux Mint to "highlight" the keyboard-selected control.  So, for those dialogue boxes which are keyboard-accessible, the user cannot "see" where they are and has to guess.
Nevertheless, gscan2pdf meets the same requirements as the benchmark regarding the batch-orientated scan process.  All gscan2pdf needed to do to meet the benchmark was to implement its offering more carefully, and then it would be a straight swap for the benchmark.
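
The workaround described above (scan first, then run each post-process as a separate stage) amounts to isolating each stage so that one failure does not take the whole batch down.  The following is a minimal sketch of that idea.  It does not reflect how gscan2pdf works internally, and the deskew and ocr functions are hypothetical stand-ins.

```python
def run_stages(pages, stages):
    """Run each post-processing stage separately, recording failures.

    Returns (pages, failures), where failures lists (stage_name, message).
    A stage that raises leaves the pages as they were before that stage.
    """
    failures = []
    for name, stage in stages:
        try:
            pages = stage(pages)
        except Exception as error:
            failures.append((name, str(error)))
    return pages, failures


# Hypothetical stand-ins for real de-skew and OCR steps.
def deskew(pages):
    return [page + "+deskewed" for page in pages]


def ocr(pages):
    raise RuntimeError("OCR engine returned an error")


result, failures = run_stages(["page1", "page2"], [("deskew", deskew), ("ocr", ocr)])
print(result)    # ['page1+deskewed', 'page2+deskewed']
print(failures)  # [('ocr', 'OCR engine returned an error')]
```

The scanned pages survive the failed OCR stage, so only that stage needs repeating.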

Conclusion

Gscan2pdf is by far the most functional of the three Linux apps and would, in principle, meet all of the requirements of the benchmark (NAPS2 for Windows).

However, gscan2pdf implements its functionality in a partially inaccessible way, is prone to crashing or holding the processing hostage with error messages, and uses OCR engines whose performance is below that of the benchmark.

In spite of such poor performance relative to the benchmark, gscan2pdf just about achieves the required functionality to permit the project to proceed.  However, the cost of migration appears to be lower reliability, less slick workflows (inefficient use of the user's time) and less reliable OCR than the benchmark.

