The project is to build a Linux Mint machine with the same functionality and ergonomics as the existing Windows 10 machine.
This stage relates to scanning paper documents to PDF and digitising the scanned text via optical character recognition.
Environment & required functionality
The scan-and-OCR function needs to run on the following machines:
- The Linux Mint Xfce 18.3 laptop "Gandalf";
- A Linux Mint Xfce 18.3 virtual machine "Gimli";
- The Windows 10 laptop "Legolas".
This requires the ability to scan all documents, optionally with the digitisation of scanned text (typically via optical character recognition).
The hardware is an old HP OfficeJet Pro 276dw, connected to the LAN instead of directly to a workstation.
Alternatives
There are two strategies:
- To use the software provided by the hardware manufacturer, in this case HP.
- To use third-party software:
  - for Windows 10, the benchmark is NAPS2 for Windows;
  - for Linux:
    - Simple Scan, which is bundled with Linux Mint 18.3 Xfce;
    - gscan2pdf;
    - gimagereader.
Software selection
Summarising the benchmark's functionality
NAPS2 is superior to HP's own software (for use with the HP OfficeJet Pro 276dw). Aside from being clumsy, inefficient, wasteful bloatware, HP's software also managed to cram numerous design flaws into its user interface, undermining ergonomic workflow.
The salient functionality of NAPS2 includes:
- a batch-orientated document scanning process (workflow), capable of meeting the volume needs of a home office or business office (for small businesses);
- a one-stop approach to the whole process, from scan to image post-processing to optical character recognition to save-to-disk (with some improvements possible);
- the maximum range of scan options available from the scanner, i.e. flatbed, document feeder, duplex, monochrome, greyscale, colour;
- in months of use, NAPS2 has yet to crash.
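The batch-orientated workflow described above amounts to scanning a whole pile once, then carving the resulting sequence of pages into separate documents. The sketch below illustrates that idea in plain Python; the `split_batch` function and its 1-based page-range notation are hypothetical illustrations, not NAPS2's actual internals.

```python
from typing import List, Sequence, Tuple


def split_batch(pages: Sequence[str], ranges: List[Tuple[int, int]]) -> List[List[str]]:
    """Split one scanned batch into documents.

    `pages` is the batch in scan order; each (first, last) pair in `ranges`
    is a 1-based, inclusive page range that becomes one document.
    """
    documents = []
    for first, last in ranges:
        if not 1 <= first <= last <= len(pages):
            raise ValueError(f"invalid page range {first}-{last}")
        documents.append(list(pages[first - 1:last]))
    return documents


# Hypothetical batch of five scanned pages holding three documents.
batch = ["p1", "p2", "p3", "p4", "p5"]
print(split_batch(batch, [(1, 2), (3, 3), (4, 5)]))
# prints [['p1', 'p2'], ['p3'], ['p4', 'p5']]
```

The point of the design is that the scanner is visited once per pile, not once per document; the grouping happens afterwards at the desk.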
The main issues with NAPS2 include:
- some functionality is accessible only by mouse, yet there is no objective need for these functions to be mouse-only. The main example is the select-images-to-save-as-a-PDF-file process. This impedes ergonomics and the speed of document processing, making it unnecessarily manual;
- the dialogue box for the saved list of scanning profiles is too small to accommodate the icons and descriptions of scanning profiles and, to make this worse, the user cannot re-size the box. So the box impedes visibility of its own content!
NAPS2 is said to work on Linux via the Wine translation layer, together with Mono, an open-source implementation of the .NET Framework (on Windows, NAPS2 runs on the .NET Framework). At this stage of the project, Wine and Mono are outside its scope.
Linux equivalents
To meet the above benchmark, as at April 2019, there were only three initial options left for Linux. gscan2pdf and gimagereader were both installed on Gimli for a comparative test.
Installation experience
Simple Scan was pre-installed on Linux Mint. gscan2pdf and gimagereader were installed via Software Manager without incident.
User experience
Simple Scan
Simple Scan works as designed and can do a credible job for the home office if the only required functionality is scanning.
Simple Scan deviates from the benchmark in four major ways:
- the reverse side of each scanned page is upside-down. This requires two manual tasks to correct for each page (Simple Scan can rotate only by 90° at a time).
- there is no presumption of a batch-orientated document scanning workflow. Each document is a separate scan saved to a single file. The user cannot batch-scan a pile of documents, then use Simple Scan to save selected pages (i.e. a document) into a single PDF file. Instead, the user needs to jump up and down, to and from the scanner, for each and every multi-page document. This makes the job much longer although, to be fair, it does burn calories and might help the user eliminate his/her personal obesity crisis. But it's no good for an office environment (whatever the size of business).
- Simple Scan doesn't always get the page size correct. When set to determine the page size automatically, it scans sheets of A4 as pages that are far bigger, so the user has to intervene manually to set the page size correctly.
- Simple Scan's default resolution for text documents is 150dpi, a tad too low for reliable OCR (which ideally needs 200dpi). But Simple Scan offers only 150dpi or 300dpi; no intermediate setting is available.
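The page-size and resolution points above reduce to simple arithmetic: A4 is 210 x 297 mm and there are 25.4 mm to the inch, so the expected pixel dimensions at each resolution follow directly. The sketch below computes them and flags a scan whose dimensions stray from A4; the `looks_like_a4` helper and its 5% tolerance are arbitrary assumptions for illustration, not settings of Simple Scan.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 width x height in millimetres


def a4_pixels(dpi: int) -> tuple:
    """Pixel dimensions of an A4 page scanned at `dpi`, rounded to the nearest pixel."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def looks_like_a4(width_px: int, height_px: int, dpi: int, tolerance: float = 0.05) -> bool:
    """True if a scanned image is within `tolerance` (a fraction) of A4 at `dpi`."""
    exp_w, exp_h = a4_pixels(dpi)
    return (abs(width_px - exp_w) <= tolerance * exp_w
            and abs(height_px - exp_h) <= tolerance * exp_h)


for dpi in (150, 200, 300):
    print(dpi, a4_pixels(dpi))
# 150 (1240, 1754)
# 200 (1654, 2339)
# 300 (2480, 3508)

# An 8.5 x 14 inch image at 150 dpi (1275 x 2100 pixels) is far too tall for A4:
print(looks_like_a4(1275, 2100, 150))
# prints False
```

Moving from 150dpi to 200dpi increases the pixel count per page by roughly (200/150)^2, i.e. about 1.8 times, which is the price of the extra OCR headroom.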
Regarding the upside-down reverse sides, the same happens in gscan2pdf, so it is more likely to be an issue with the underlying device driver than with Simple Scan. Unfortunately, nobody appears to have asked about this issue on the internet, which suggests that the desired functionality is in such little demand that nobody has bothered to fix it. At least the benchmark gets it right straight off the bat.
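Correcting the reverse sides amounts to rotating every second page of a duplex batch by 180°, which is the same as two successive 90° rotations (hence the two manual steps per page in Simple Scan). The sketch below models each page as a small grid of pixel values purely for illustration; it assumes the scanner delivers pages in front, back, front, back order with every back upside-down, as observed above.

```python
from typing import List

Page = List[List[int]]  # rows of pixel values, top row first


def rotate_90_clockwise(page: Page) -> Page:
    """Rotate a page a quarter-turn clockwise."""
    return [list(row) for row in zip(*page[::-1])]


def rotate_180(page: Page) -> Page:
    """Rotate a page a half-turn: reverse the row order and each row."""
    return [row[::-1] for row in page[::-1]]


def fix_duplex(pages: List[Page]) -> List[Page]:
    """Turn every reverse side (the 2nd, 4th, ... page) the right way up."""
    return [rotate_180(p) if i % 2 == 1 else p for i, p in enumerate(pages)]


front = [[1, 2], [3, 4]]
back_upside_down = [[8, 7], [6, 5]]
print(fix_duplex([front, back_upside_down]))
# prints [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```

Applying the correction once to the whole batch, rather than page by page, is what turns two manual tasks per page into none.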
Gimagereader
Gimagereader is primarily an OCR app for existing documents. In this test, its primary use case is to complement Simple Scan. It is a front-end for the tesseract engine.
Gimagereader is accessible only by mouse. It is absolutely horrendous!
When processing OCR, its use of tesseract appears to be far slower (by minutes) than either gscan2pdf or NAPS2.
When gimagereader has finished the OCR, the user can save the output to a text file but apparently cannot embed it in the PDF. This defeats the whole purpose of OCR: the functionality presumes (and the benchmark delivers) that the text is embedded in the PDF itself, so that any PDF viewer can select and copy it.
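For comparison, the tesseract command-line tool can itself write a PDF that carries an invisible text layer over the scanned image, which is the behaviour the benchmark provides. The sketch below only builds and (optionally) runs that command from Python; it assumes tesseract and its English language data are installed, and the helper names and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def tesseract_pdf_command(image: Path, language: str = "eng") -> List[str]:
    """Build a tesseract command that writes <image stem>.pdf with a text layer.

    tesseract takes an input image and an output base name; the trailing
    "pdf" config selects searchable-PDF output.
    """
    output_base = image.with_suffix("")  # tesseract appends ".pdf" itself
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def make_searchable(image: Path) -> Path:
    """Run tesseract (must be installed) and return the path of the new PDF."""
    subprocess.run(tesseract_pdf_command(image), check=True)
    return image.with_suffix(".pdf")


print(tesseract_pdf_command(Path("scan-001.tif")))
# prints ['tesseract', 'scan-001.tif', 'scan-001', '-l', 'eng', 'pdf']
```

This produces one PDF per page image; combining pages into a single document is a separate step not shown here.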
Gimagereader is programmed to use tesseract as its OCR engine. Tesseract does a fair job of reading even 150dpi scanned text, although the number of errors is too great to be reliable in any office environment. Gimagereader presents the output in a preview pane with a spell-checking facility, so when tesseract misreads "mortgage" as "mortgge", gimagereader underlines it as a potential error. This is a useful aid for manual correction, if only to show how unreliable the OCR engine is with text scanned at too low a resolution. By contrast, the benchmark just gets it right, straight off the bat.
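The spell-check safety net described above can be approximated in a few lines: any word not found in a word list is flagged, with the closest known word offered as a suggestion. This is a minimal sketch using Python's standard `difflib`, not how gimagereader implements it; the tiny word list and the `flag_ocr_errors` helper are hypothetical.

```python
import re
from difflib import get_close_matches
from typing import Iterable, List, Optional, Tuple


def flag_ocr_errors(text: str, dictionary: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (suspect word, suggested correction or None) for each unknown word."""
    known = sorted({w.lower() for w in dictionary})  # sorted for deterministic suggestions
    known_set = set(known)
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() not in known_set:
            matches = get_close_matches(word.lower(), known, n=1, cutoff=0.8)
            flagged.append((word, matches[0] if matches else None))
    return flagged


words = ["the", "mortgage", "payment", "is", "due"]
print(flag_ocr_errors("The mortgge payment is due", words))
# prints [('mortgge', 'mortgage')]
```

Like gimagereader's underlining, this only points the user at suspect words; it cannot tell whether a correctly spelt word was misread as another real word.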
Gimagereader can scan documents, too, but only as greyscale or colour. For office documents, only monochrome is objectively necessary.
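Monochrome ("lineart") is greyscale reduced to one bit per pixel by a threshold. The sketch below shows that reduction with a fixed threshold of 128, which is an assumption for illustration (scanners and tools may use other or adaptive thresholds), together with the uncompressed size of an A4 page at 200dpi (1654 x 2339 pixels) in each mode.

```python
from typing import List


def to_lineart(grey: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Convert 8-bit greyscale (0 = black, 255 = white) to 1-bit lineart.

    Pixels darker than `threshold` become 0 (black); the rest become 1 (white).
    """
    return [[0 if px < threshold else 1 for px in row] for row in grey]


def raw_size_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed image size, padding each row to a whole number of bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


print(to_lineart([[0, 100, 200, 255]]))
# prints [[0, 0, 1, 1]]
print(raw_size_bytes(1654, 2339, 8), raw_size_bytes(1654, 2339, 1))
# prints 3868706 484173
```

Uncompressed, the lineart page is about one-eighth the size of the greyscale one.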
Gscan2pdf
Gscan2pdf is by far the best of the three Linux applications, but it still falls short of the benchmark.
In principle, gscan2pdf meets all of the functionality of the benchmark; in practice, the results are weaker. In summary:
- like Simple Scan, it returns the reverse side of each page upside-down although, unlike Simple Scan, it lets the user correct this when issuing the scan command.
- although gscan2pdf lets the user choose a range of post-scanning processes, its calls to the underlying de-skewing and OCR processes appear to be error-prone, so that gscan2pdf sometimes crashes instead of processing the document, or interrupts the processing with an error message that the user must clear (i.e. "babysitting"). The workaround is to have gscan2pdf only scan the document first, then run each post-process as a separate, manually triggered stage. The benchmark does the processing automatically and gets it right straight off the bat.
- gscan2pdf offers a choice of OCR engines (tesseract, cuneiform and gocr) yet, even though gscan2pdf scans at the requested 200dpi monochrome ("lineart", as gscan2pdf calls it), none of the three OCR engines returns accurate text. Again, the benchmark is more reliable than gscan2pdf's partner OCR engines. Of course, that's not gscan2pdf's fault, but the app is only as good as the results it can produce.
- insufficient keyboard accessibility. Some dialogue boxes allow only partial navigation and selection with the keyboard, although the menu structure at the top of the main pane is comprehensively keyboard-accessible. In addition, gscan2pdf doesn't use any of the visual cues available within Linux Mint to highlight the keyboard-selected control. So, in those dialogue boxes which are keyboard-accessible, the user cannot see where they are and has to guess.
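The workaround described above (scan first, then trigger each post-process separately) can be sketched as a pipeline in which each stage runs on its own, and a failing stage is reported, with later stages skipped, rather than bringing down the whole batch. This is a minimal illustration, not how gscan2pdf works internally. `scanimage` is the SANE command-line scanning tool and `--batch`, `--format` and `-d` are its generic options, but `--mode`, `--resolution` and `--source` and their values are backend-specific assumptions to check with `scanimage --help -d <device>`; "DEVICE" and the stage names are placeholders.

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def scan_command(device: str, dpi: int = 200, mode: str = "Lineart",
                 source: str = "Duplex") -> List[str]:
    """Build a SANE scanimage batch command (backend option values are assumed)."""
    return ["scanimage", "-d", device, "--batch=page%d.tif", "--format=tiff",
            "--mode", mode, "--resolution", str(dpi), "--source", source]


def run_stages(stages: List[Tuple[str, Callable[[], None]]]) -> Dict[str, str]:
    """Run stages in order; on the first failure, record it and skip the rest.

    Skipping avoids feeding a half-processed batch into later stages, so the
    user can fix the problem and re-trigger from the failed stage.
    """
    results: Dict[str, str] = {}
    failed = False
    for name, stage in stages:
        if failed:
            results[name] = "skipped"
            continue
        try:
            stage()
            results[name] = "ok"
        except (subprocess.CalledProcessError, OSError) as err:
            results[name] = f"failed: {err}"
            failed = True
    return results


# Example wiring (not executed here):
# stages = [("scan", lambda: subprocess.run(scan_command("DEVICE"), check=True)), ...]
print(" ".join(scan_command("DEVICE")))
# prints scanimage -d DEVICE --batch=page%d.tif --format=tiff --mode Lineart --resolution 200 --source Duplex
```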
Nevertheless, gscan2pdf meets the same requirements as the benchmark regarding the batch-orientated scan process. Had gscan2pdf implemented its offering more carefully, it would have been a straight swap for the benchmark.
Conclusion
Gscan2pdf is by far the most functional of the three Linux apps and would, in principle, meet all of the requirements of the benchmark (NAPS2 for Windows).
However, gscan2pdf implements its functionality in a partially inaccessible way, is prone to crashing or holding the processing hostage with error messages, and uses OCR engines whose performance is below that of the benchmark.
In spite of such poor performance relative to the benchmark, gscan2pdf just about achieves the required functionality to permit the project to proceed. However, the cost of migration appears to be less reliability, less slick workflows (an inefficient use of the user's time) and less reliable OCR than the benchmark.