A common high-end tape format. Current capacities range up to 25 Gigabytes.

Acrobat
Adobe's electronic document format. Documents can be created from within a word processor, from PostScript, or from scanned pages. The documents are highly portable, yet maintain the look of the original. Acrobat is especially useful in this area because Adobe makes the reader available for free. Version 3.0 also integrates well with web browsers.

Anti-aliasing
A technique used to smooth curves and diagonal lines by adding pixels of intermediate shades or colors around the line.

Aperture Card
A card that holds microfilm, intended to protect the film and to facilitate loading into a scanner or viewer.

Boolean Searching
Searching a text database using logical operators such as AND, OR, and NOT to focus the search. Another operator available in some search databases is NEAR, which matches terms that occur close to one another.
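
The operators described above map naturally onto set operations. The following is a minimal sketch, assuming a tiny in-memory collection; the `DOCS` data and the function names are hypothetical and are not taken from any particular search product.

```python
# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "scanning microfilm with an aperture card",
    2: "tape backup for scanned images",
    3: "microfilm storage and tape archives",
}

def docs_with(word):
    """Return the set of document ids whose text contains the word."""
    return {doc_id for doc_id, text in DOCS.items()
            if word in text.lower().split()}

def search_and(a, b):
    """Documents containing both words."""
    return docs_with(a) & docs_with(b)

def search_or(a, b):
    """Documents containing either word."""
    return docs_with(a) | docs_with(b)

def search_not(a, b):
    """Documents containing the first word but not the second."""
    return docs_with(a) - docs_with(b)

def search_near(a, b, distance=3):
    """Documents where the two words occur within `distance` words of each other."""
    hits = set()
    for doc_id, text in DOCS.items():
        words = text.lower().split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits
```

Real search databases use an inverted index rather than scanning every document, but the logic of the operators is the same.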

CSS (Cascading Style Sheet)
A style sheet standard developed by the W3C for HTML. It allows the actual data to be separated from the instructions for its presentation.

Compression
The re-encoding of data to make it smaller. Most image file formats use compression because image files tend to be large and consume large amounts of disk space and transmission time over networks.
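
As a rough illustration of re-encoding data to make it smaller, the sketch below uses Python's standard `zlib` module on a made-up row of pixel values. It is only an example of lossless compression in general, not of the specific schemes used by any image format in this glossary.

```python
import zlib

# A highly repetitive byte string, standing in for a row of mostly white pixels
# (the values are invented for illustration).
raw = bytes([255] * 1000 + [0] * 24)

packed = zlib.compress(raw)        # re-encode into a smaller form
restored = zlib.decompress(packed) # lossless: the original bytes come back exactly

print(len(raw), "bytes before,", len(packed), "bytes after")
```

Repetitive data such as blank page areas compresses very well, which is one reason scanned black and white pages shrink so much.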

DAT (Digital Audio Tape)
A magnetic tape originally designed for use in audio applications, but now popular for storing data. Capacities range up to 12 Gigabytes.

DLT (Digital Linear Tape)
A fairly new high-end tape format. Capacities range up to 35 Gigabytes.

Synonymous with scanning, it is the conversion from printed paper, film, or some other medium to an electronic form in which the page is represented as either black and white dots or color or grayscale pixels.

A technique that is used to add more colors or shades of gray to an existing image, the goal being to improve the appearance of the image. Can be thought of as the inverse to quantization.

DjVu
An electronic document format primarily useful for scanned documents. Key features are support for different resolutions and compression types for photo areas of an image versus text. Uses a variant of JBIG2 compression for binary image data and wavelets for continuous areas, such as photos.

Download
To transmit a file from one computer to another. Usually implies retrieving a file from a remote computer to a local one, or from a large computer to a smaller one. FTP is a commonly used command for this.

DTD (Document Type Definition)
A collection of XML markup definitions that specifies the rules, structure, and language to be used in documents conforming to the DTD. Written as a plain text file with the file extension .dtd. DTDs may be custom made, but many standard DTDs are also available, including the JAIDTD suite.

DVD (Digital Video Disk)
An optical storage medium that can store up to 4.7 Gigabytes (single layer), 8.5 GB (double layer), 9.4 GB (double sided, single layer), or 17 GB (double sided, double layer). Transfer rates and seek times are similar to those of CD-ROM for currently available drives. The DVD spec also includes higher-level specs for audio and video capabilities.

Fielded Searching
Searching in a text search database which has the records organized into fields. Common fields are title, author, keywords, abstract, and date. Fields give the searcher a means to focus the search.
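
A fielded search restricts the match to one named field of each record. The sketch below assumes records stored as simple dictionaries; the `RECORDS` data and the `search_field` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical records, each organized into fields.
RECORDS = [
    {"title": "Scanning Microfilm", "author": "Smith",
     "keywords": ["microfilm", "scanning"], "date": "1998"},
    {"title": "Tape Storage Basics", "author": "Jones",
     "keywords": ["tape", "backup"], "date": "1999"},
    {"title": "Microfilm and Smith Cards", "author": "Lee",
     "keywords": ["aperture card"], "date": "2000"},
]

def search_field(records, field, term):
    """Return the titles of records whose given field contains the term."""
    term = term.lower()
    hits = []
    for record in records:
        value = record.get(field, "")
        # Keyword lists are joined so they can be searched like plain text.
        text = " ".join(value) if isinstance(value, list) else str(value)
        if term in text.lower():
            hits.append(record["title"])
    return hits
```

Searching the author field for "smith" finds only the first record, while the same word in the title field finds only the third, which is how fields help focus a search.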

Electronic Document
A document that has been scanned, or was originally created on a computer. Documents become more useful when stored electronically because they can be widely distributed instantly, and allow searching. HTML and PDF are well known electronic document formats.

G4 Compression
A compression technique used in Fax Group 4. It produces very good results for black and white, and is frequently used as an option in TIFF files for black and white images. It is also used in Adobe Acrobat (PDF) files.

GIF
An image file format that is commonly used on the web. It uses LZW compression, which makes it good for color and grayscale images, but it does not compress as well as G4 for black and white. LZW is "lossless", which means it will not compress as well as JPEG but will retain all of the image's quality.

Grayscale
An image type that uses black, white, and a range of shades of gray. The number of shades of gray depends on the number of bits per pixel. The larger the number of shades of gray, the better the image will look, and the larger the file will be.
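
The relationship between bits per pixel and the number of shades can be worked out directly. This small sketch assumes each pixel is stored as an unsigned integer, so the count is 2 raised to the number of bits; the function name is made up for illustration.

```python
def gray_levels(bits_per_pixel):
    """Number of distinct values (black, white, and grays) a pixel can take."""
    return 2 ** bits_per_pixel

for bits in (1, 4, 8):
    print(bits, "bits per pixel ->", gray_levels(bits), "levels")
```

One bit per pixel gives plain black and white, while 8 bits per pixel gives 256 levels.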

ICR (Intelligent Character Recognition)
The process of recognizing handwritten characters. Similar to OCR, but more difficult, since OCR works from printed text.

Image File Format
When a page is scanned, the page can be stored in a number of file types. The type should be chosen based on the desired use of the image, and the software that will be used. Different file formats commonly use different methods of compression as well, and some types of images compress better using some formats rather than others.

ISO 9660
A file system format standard developed for CD-ROMs using the CD-XA encoding standard. It is supported by Microsoft operating systems, UNIX, and Macintosh.

JBIG
A "lossless" image compression format for binary (black and white) images. Compresses better than G4 by up to 25 percent. Also supports progressive encoding. Licensing issues have slowed its adoption.

JBIG2
A "lossy" image compression format for binary (black and white) images. A JBIG2 compressor identifies common objects (usually characters) in the image and creates a dictionary with references to those objects. Lossiness is introduced by allowing similar objects to be represented by a single dictionary entry. This format is supported in PDF 1.4 and greater.

JPEG
An image file format that is best suited for photographs. It is "lossy", which means that it will throw away some detail in order to achieve better compression. It does not work well for text.

JPEG 2000
An image file format best suited for photographs, using wavelets for compression. It is "lossy", which means that it will throw away some detail in order to achieve better compression. It does not work well for text.

Journal Archiving and Interchange DTD (JAIDTD)
A public domain DTD developed by the National Library of Medicine (NLM) and the National Center for Biotechnology Information (NCBI) in order to establish a standard format for journal content.

PDF
Adobe's Portable Document Format. The term Adobe uses to describe Acrobat files. (See Acrobat)

Portable
To be functional across differing types of computers and operating systems. This can be used to describe programs or electronic documents.

Progressive Encoding
A method by which multiple resolutions of the same image are stored in the same image file. Imaging systems can efficiently serve lower-than-maximum resolutions with images encoded this way. Total file size is increased, but smaller amounts of data can be transmitted to clients.

Proofing
In OCR, the results are never perfect. The same is true for conversion to PDF. Proofing (in this context) is a service in which the resulting OCR text or PDF file is corrected for errors introduced by the OCR process.

Quantization
To reduce the number of colors or shades of gray in an image, with the goal being to reduce file size while maintaining image quality. Also used to display images with more colors than are available on the display device.
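
A minimal sketch of reducing the number of shades of gray, assuming 8-bit gray values (0 to 255) and evenly spaced output levels. This uniform scheme is only one simple approach; the `quantize` function is illustrative and is not the method of any particular imaging product.

```python
def quantize(pixels, levels):
    """Map 8-bit gray values (0-255) onto `levels` evenly spaced gray values."""
    if levels < 2:
        raise ValueError("need at least two levels")
    step = 255 / (levels - 1)  # distance between adjacent output shades
    # Snap each pixel to the nearest allowed shade.
    return [round(round(p / step) * step) for p in pixels]

print(quantize([0, 60, 130, 200, 255], 4))
```

With four levels, every pixel ends up as one of 0, 85, 170, or 255, so each pixel can be stored in 2 bits instead of 8.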

Resolution
The number of dots per inch (dpi) that were stored during scanning. The greater the number, the greater the amount of detail that is visible. It is recommended that you use between 72 and 100 dpi for images that will be displayed on the screen, and 300 dpi for images that will print on common inexpensive printers. Higher resolution images take up more space as well.
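
To see how resolution drives storage requirements, the sketch below computes the pixel dimensions and the uncompressed size of a scanned page. The page size and the `scan_size` helper are assumptions chosen for illustration.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width in pixels, height in pixels, uncompressed size in bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes

# An 8.5 x 11 inch page scanned in black and white (1 bit per pixel) at 300 dpi.
print(scan_size(8.5, 11, 300, 1))
# The same page in 8-bit grayscale at 100 dpi for screen display.
print(scan_size(8.5, 11, 100, 8))
```

Because both dimensions scale with the dpi, tripling the resolution multiplies the uncompressed size by nine.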

Schema
Similar to a DTD, this is another way of specifying the rules, structure, and language of an XML document. Unlike DTDs, Schemas are written in XML. For this reason, and because Schemas address many of the limitations of DTDs, Schemas are intended to replace DTDs.

Searching (aka Full Text Searching)
One big advantage of electronic documentation is the ability to use the computer to search for words in a document, or search for a document containing the word (or words) you are looking for.

SGML
Standard Generalized Markup Language. A language for marking up text (i.e. tagging) in an electronic document. Common tagged elements include document title, author, and abstract. SGML is the underlying language behind HTML and XML.

During printing or scanning, the contents of a page are almost never exactly vertical, which is referred to as being skewed. De-skewing is a process in which the computer detects and corrects the skew in an image file.

Style Sheet
Unlike HTML, XML has no tags to define presentation such as color, font, or placement. Instructions on how to display an XML document are found in a separate file called the Style Sheet. One style sheet can be used for many documents and can be customized for each user. The style sheet is itself an XML document. XSL is used to produce the Style Sheet.

Threshold
When converting a pixel from grayscale to black and white, the threshold is the cutoff gray value: pixels with a value above the threshold are considered white, and pixels with a value below or equal to it are considered black.
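
The rule above can be written in a few lines. This sketch assumes 8-bit gray values where 0 is black and 255 is white; the default threshold of 127 is an arbitrary choice for illustration.

```python
def to_black_and_white(pixels, threshold=127):
    """Convert gray values to black (0) or white (255) using a fixed threshold."""
    # Above the threshold becomes white; below or equal to it becomes black.
    return [255 if p > threshold else 0 for p in pixels]

print(to_black_and_white([0, 127, 128, 255]))
```

Raising the threshold turns more of the mid-gray pixels black, which darkens faint text but can also fill in background noise.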

TIFF (Tag Image File Format)
An industry standard image file format. It is unique in that it incorporates multiple compression techniques, allowing the user to specify the best format for a type of image, and that one file can contain multiple images.

Unicode
A character set that can support a wide range of international characters. Unicode was originally designed around 16 bits per character (encodings such as UTF-8 and UTF-16 now use a variable number of bytes), unlike ASCII, which defines only 7-bit codes, usually stored in 8 bits, and supports little beyond the basic Latin alphabet. Many applications still only support ASCII.
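
As a rough illustration of the difference in storage, the sketch below uses Python's built-in string encoding to count the bytes needed for a character. The `encoded_length` helper is made up for this example, and UTF-16 and UTF-8 are used here as two common Unicode encodings.

```python
def encoded_length(ch, encoding):
    """Bytes needed to store `ch` in `encoding`, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

for ch in ("A", "\u00e9"):  # "\u00e9" is e with an acute accent
    print(repr(ch),
          "ascii:", encoded_length(ch, "ascii"),
          "utf-16:", encoded_length(ch, "utf-16-le"),
          "utf-8:", encoded_length(ch, "utf-8"))
```

The accented character has no ASCII representation at all, which is why ASCII-only applications cannot handle most international text.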

Vector Graphic
A graphic file in which an image is represented by continuous functions, as opposed to an image file, which is represented by dots (pixels). Vector graphics require less space and can generate much higher quality output, but almost always come from an electronically produced original (such as the output of a CAD or drawing program).

W3C
World Wide Web Consortium. An international organization founded in 1994 by Tim Berners-Lee whose purpose is to develop and promote open internet standards.

XML
eXtensible Markup Language. Used to apply structure to electronic documents. It is more narrowly focused than SGML (the underlying language), thereby making it easier to define standard document types.
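
To show what applying structure to a document looks like in practice, the sketch below parses a small, made-up XML document with Python's standard `xml.etree.ElementTree` module. The element names are invented for illustration and do not follow any particular DTD.

```python
import xml.etree.ElementTree as ET

# A hypothetical document with tagged title, author, and abstract.
SAMPLE = """<article>
  <title>Scanning Microfilm</title>
  <author>Smith</author>
  <abstract>Notes on aperture cards.</abstract>
</article>"""

root = ET.fromstring(SAMPLE)

# Because the text is tagged, each part can be picked out by name.
fields = {child.tag: child.text for child in root}
print(fields["title"], "by", fields["author"])
```

The same tagging is what makes fielded searching and style sheets possible, since software can tell a title from an abstract.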

XSL
Extensible Stylesheet Language. A language for creating a style sheet for XML documents. XSL consists of two parts: XSL Transformations (XSLT) and an XML vocabulary for specifying formatting (XSL-FO).

XSL-FO
Extensible Stylesheet Language Formatting Objects. Part of XSL, it is a set of formatting rules specifying how content should be presented.

XSLT
Extensible Stylesheet Language Transformations. Part of XSL, it is a language for transforming XML documents into other formats, most commonly HTML. An XSLT style sheet is itself an XML document, and XSLT processing software is required to interpret it. Popular XSLT processors are Michael Kay's Saxon, Apache's Xalan, and Microsoft's MSXML.

Zoom
To make an image appear larger (zoom in) or smaller (zoom out) by re-displaying the image at different resolutions. Higher resolutions will make the image appear larger and easier to read.

