VER1.0 win - rev. c
by Dan Evans
This program, by Caere Corp., is designed to convert large paper documents
to HTML pages while maintaining the logical structure of that document,
including a means of hyperlink navigation within the document to be
used in web page development. This is done by combining OmniPage
Pro's OCR (Optical Character Recognition) engine with Caere's
recently developed Logical Structure Recognition (LCR) technology. Remember,
OCR programs convert scanned paper-based information into text that
can be edited in word processors or other text-based programs. OmniPage
Web takes this a step further, introducing LCR to automatically recognize
the outline, hierarchy and structure of scanned documents, and then
generate full sets of HTML pages. The program automatically generates
a table of contents with hyperlinks to the appropriate section of the
document as well as a navigation panel (top and/or bottom of page)
using text links, your own graphics or icons supplied by OmniPage Web.
The levels of HTML that are supported by this version range from plain
text to Dynamic HTML as well as Cascading Style Sheets (CSS).
I installed this program on a 500MHz Pentium III computer with 96 MB
SDRAM memory, 13 GB hard drive and Windows 98 Second Edition. My scanner
is an antique HP ScanJet 4P connected via a Jaz SCSI PCI Card. Typical
installation consumed about 13 MB of hard drive space and about 15 minutes
of setup time. OmniPage minimum requirements are a Pentium PC, 45 MB HD
space, Win 95/98/NT 4.0, SVGA or VGA 256 colors, CD-ROM and 16 MB RAM.
The program interface before outlining provides you with four toolbars
(Standard, Zone, Table, and Auto Web) and three view panes (Thumbnail
on the left for each page scanned, Original Image and Text).
The first four steps in converting a printed document to an HTML document
are similar to other OCR packages that I've used. These steps are:
1. Scanning the document or loading a series of image files.
The Scanning tab can be set to accommodate an Automatic Document Feeder
(ADF), double-sided pages as well as color, grayscale, and black and
white scans; useful features when scanning large documents.
Steps two and three are combined:
2. Automatic zoning, which identifies page elements such as text,
graphics and tables and establishes a reading order for them; and
3. Automatic OCR to convert printed text into editable text.
At this point you are given the opportunity to manually re-zone each
page using the Zone & Table Toolbars to correct text, graphics and
table zones as well as the order in which you want them to appear. After
adjusting page zones, you will want to save the document as an OmniPage
Web (.wmt) file. This saves you having to scan and re-zone if you need
to re-OCR or re-outline your document. You can just reload the .wmt
file and continue from there.
4. Now you re-OCR and proofread your document to correct OCR errors.
OmniPage Web does things a little differently here. Using Logical Structure
Recognition (LCR), the program completes steps five and six.
5. Once recognized, LCR will automatically create an outline
of your document indicating various levels of the document structure.
These include headlines, headings (up to 6 levels), body text, graphics,
tables, headers, footers, captions, URLs, e-mail addresses and
cross references. This outline becomes the basis for the table of contents
on the Web site. A toolbar at the top of the outline view pane allows
you to make manual changes to the outline structure.
6. The final step saves the outline to HTML, complete with a
Table of Contents (which we didn't have in the beginning) linking
to elements within the Web site, live links to Web addresses and e-mail
addresses, as well as cross references to other sections of the site.
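
To make steps five and six concrete, here is a minimal sketch of how an
outline of headings can be turned into a linked table of contents. This is
my own illustration, not Caere's code or OmniPage Web's actual output: the
outline data and the slugify, build_toc and build_body helpers are all
hypothetical names made up for the example.

```python
from html import escape

# Hypothetical outline: (heading level, heading text) pairs in reading order.
# Levels run from 1 to 6, matching the HTML heading tags h1 through h6.
outline = [
    (1, "Introduction"),
    (2, "System Requirements"),
    (1, "Scanning & Zoning"),
]


def slugify(text, index):
    """Build a unique anchor id from the heading text and its position."""
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return f"sec{index}-" + "-".join(words)


def build_toc(entries):
    """Return HTML lines for a linked table of contents, indented by level."""
    lines = ['<ul class="toc">']
    for i, (level, title) in enumerate(entries, start=1):
        indent = (level - 1) * 2  # left margin in em units per heading level
        lines.append(
            f'<li style="margin-left:{indent}em">'
            f'<a href="#{slugify(title, i)}">{escape(title)}</a></li>'
        )
    lines.append("</ul>")
    return lines


def build_body(entries):
    """Return heading tags carrying the anchor ids the TOC links point to."""
    return [
        f'<h{level} id="{slugify(title, i)}">{escape(title)}</h{level}>'
        for i, (level, title) in enumerate(entries, start=1)
    ]


toc_html = "\n".join(build_toc(outline))
body_html = "\n".join(build_body(outline))
print(toc_html)
print(body_html)
```

Each table of contents entry links to an id on the matching heading, which
is the same basic idea as the in-document hyperlink navigation described
above, only done by hand on a toy outline.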
The program offers a large variety of options to control HTML output.
General settings affect the whole document and include generating Plain
HTML for universal browser support, document Title, page breaks and
the ability to include the original page image on the web site.
The Components section controls the look and formatting of the Web pages.
These include the navigational panel, image map or banner, signature
or copyright page, the order of the components on the page, headers/footers,
table of contents, how graphics are presented and horizontal rulings.
The Component Styles section allows text, border and background preferences
to be modified. A broad range of style options becomes available with
Cascading Style Sheets enabled. The program has 20 predefined themes
with unique styles that can be modified, or you can develop your own
theme and save it for future use.
Priced at $499, this package is a particularly useful tool for web designers
needing to convert very large documents (handbooks, manuals, etc.) for
display on web sites. Caere Corporation can be found at www.caere.com.