html2tex Version 2.0
This page describes version 2.0 of html2tex,
a program that converts a collection of related HTML files into a
single LaTeX file. (The current version is 2.6.) The resulting LaTeX
file can be processed into a PostScript file.
To generate a single LaTeX file from a number of HTML files, the user
needs to give a skeleton LaTeX file and indicate where translated
versions of the HTML files should be included. The user also has to
specify, for each HTML file, the level (chapter, section,
subsection, ...) at which it should be included. Links between the
different HTML files are mapped to references in the LaTeX file.
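As an illustration of how a link between two included files could become a LaTeX reference, here is a minimal C sketch. The June 8, 1995 change-log entry notes that numbers are used in the generated \label and \ref commands; this sketch assumes the files are numbered in the order of their %html lines and labelled file1, file2, .... The table contents, the label format, and the functions file_number and link_to_ref are hypothetical, not taken from html2tex.c, and the link target is assumed to be a bare file name without a #fragment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table of included HTML files, in the order in which
   their %html lines appear in the skeleton LaTeX file. */
static const char *included[] = { "intro.html", "usage.html", "bugs.html" };
static const int n_included = sizeof included / sizeof included[0];

/* Return the 1-based number of the included file named `target`,
   or -1 if the link points outside the set of included files. */
int file_number(const char *target)
{
    for (int i = 0; i < n_included; i++)
        if (strcmp(included[i], target) == 0)
            return i + 1;
    return -1;
}

/* Write a \ref command for a link to an included file into `out`
   and return 1. For any other target return 0, leaving it to the
   footnote or bibliography handling. */
int link_to_ref(const char *target, char *out, size_t outsize)
{
    int n = file_number(target);
    if (n < 0)
        return 0;
    snprintf(out, outsize, "\\ref{file%d}", n);
    return 1;
}
```

With this table, a link to usage.html becomes \ref{file2}, matching a \label{file2} that would be emitted where usage.html is included.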
The program does extensive checking of the links between the different
files. For this reason it can also be used as a link checker: give it
a single HTML file and specify (with the -s option) that it should scan
all referenced pages in the local directory (and its sub-directories).
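A link checker first has to decide which URLs point into the local directory tree. The following minimal C sketch assumes that the base URL is the one given with -r (ending in '/') and that a URL without a scheme is relative to the current file's directory. The function local_path is hypothetical; it does not resolve ../ segments or host-relative paths such as /index.html.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether `url` refers to a file in the
   local directory tree whose URL is `base`. Returns the path relative
   to that directory, or NULL for URLs that are treated as external. */
const char *local_path(const char *url, const char *base)
{
    size_t n = strlen(base);

    /* A full URL is local only if it starts with the base URL. */
    if (strncmp(url, base, n) == 0)
        return url + n;

    /* Host-relative paths are not resolved in this sketch. */
    if (url[0] == '/')
        return NULL;

    /* No ':' before the first '/' means there is no scheme,
       so the URL is relative (e.g. "intro.html" or "sub/a.html"). */
    const char *colon = strchr(url, ':');
    const char *slash = strchr(url, '/');
    if (colon == NULL || (slash != NULL && slash < colon))
        return url;

    /* Anything else (other hosts, mailto:, ...) is external. */
    return NULL;
}
```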
Links to excluded HTML files (and other URLs) can be reported either
as footnotes or as a sorted bibliography in the LaTeX file.
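For the bibliography case, the external URLs have to be collected, sorted, and written out as LaTeX bibitems. Below is a minimal C sketch of that step using the standard qsort. The functions sort_references and write_bibitems, the bibitem keys url1, url2, ..., and the use of \verb|...| (which assumes no URL contains a '|') are all assumptions for illustration, not the actual html2tex output format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort the collected external URLs and drop duplicates.
   Returns the number of distinct URLs left at the front of `urls`. */
size_t sort_references(const char **urls, size_t n)
{
    if (n == 0)
        return 0;
    qsort(urls, n, sizeof urls[0], cmp_str);
    size_t m = 1;
    for (size_t i = 1; i < n; i++)
        if (strcmp(urls[i], urls[m - 1]) != 0)
            urls[m++] = urls[i];
    return m;
}

/* Emit one \bibitem per URL, in sorted order. */
void write_bibitems(FILE *out, const char **urls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "\\bibitem{url%zu} \\verb|%s|\n", i + 1, urls[i]);
}
```

Sorting before numbering keeps the keys stable with respect to the order of the list; a real converter would also need to know each URL's key when it emits the \cite in the running text.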
Error messages are written to standard output and to a
cross-reference file that is generated alongside.
The HTML to LaTeX conversion program is implemented by the C
program html2tex.c, which needs to be compiled first. (The
program was developed with the popular gcc compiler, which
is freely available under the GNU General Public License.)
The program takes a single file as input. This should be a
skeleton LaTeX file without any extension (or, if the program is only
used for link checking, an HTML file with the extension .html).
It will generate a LaTeX file with the same name as the input file,
but with the extension .tex.
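The naming rule above can be sketched in a few lines of C. This is a minimal illustration, not code from html2tex.c: the functions is_html_input and make_output_name are hypothetical, and applying the same rule to the .ref cross-reference file is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `name` ends in ".html": the program then only
   analyses the file and writes no LaTeX output. */
int is_html_input(const char *name)
{
    size_t n = strlen(name);
    return n >= 5 && strcmp(name + n - 5, ".html") == 0;
}

/* Build an output file name by appending `ext` (e.g. ".tex" or ".ref")
   to the extension-less input name. Returns 0 if `out` is too small. */
int make_output_name(const char *input, const char *ext,
                     char *out, size_t outsize)
{
    int len = snprintf(out, outsize, "%s%s", input, ext);
    return len >= 0 && (size_t)len < outsize;
}
```

For example, an input file named manual gives manual.tex, while index.html is recognized as a link-checking run.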
The input file
The input file should contain valid LaTeX commands. All lines in the
file starting with %html are interpreted as
special lines by the conversion program. These are used to indicate
which HTML files should be included, and to set the various options.
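Recognizing these special lines only requires a prefix test. A minimal C sketch follows; the function html_directive is hypothetical, and the assumption that all other lines are copied to the output unchanged is ours, not stated by the program's documentation.

```c
#include <ctype.h>
#include <string.h>

/* Return a pointer to the arguments of a special line (a line that
   starts with "%html"), or NULL for an ordinary LaTeX line, which
   this sketch assumes is copied to the output unchanged. */
const char *html_directive(const char *line)
{
    if (strncmp(line, "%html", 5) != 0)
        return NULL;

    /* Skip the blanks between "%html" and its arguments. */
    const char *p = line + 5;
    while (isspace((unsigned char)*p))
        p++;
    return p;
}
```

For the line "%html intro.html 1" this returns "intro.html 1"; for "\section{Intro}" it returns NULL.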
The following special commands are recognized by html2tex:
- %html fn.html level
Causes the file fn.html to be included as
LaTeX at this line of the input file. The level should be
an integer that specifies the nesting depth of the headings. A value
of 1 maps the H1 tag to \section (or to
\chapter for the book document style).
- %html -r URL
Specifies the URL of the directory of the input file. This is
needed to detect whether any URLs in the HTML files refer to local
HTML files. This command should be given before any HTML file is
included as LaTeX.
- %html -b
Causes LaTeX bibitems to be generated at this point in the input
file for all excluded HTML files (and other URLs). If this command
is not given anywhere in the input file (and the -b command line
option is not given either), all external URLs are given as footnotes.
- %html -m rel-URL comp-URL
Maps relative URLs to complete URLs. This is normally not needed, but
can be used to remap UNIX symbolic links.
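The level argument of %html fn.html level shifts every heading down the LaTeX sectioning hierarchy. The documented case is that level 1 maps H1 to \section, or to \chapter for the book document style. The minimal C sketch below extends that rule to the other headings and levels; the function heading_command, the one-step-per-heading mapping, and the clamping at \subparagraph are assumptions, not documented behaviour of html2tex.

```c
/* LaTeX sectioning commands, from the top of the hierarchy down. */
static const char *const sectioning[] = {
    "chapter", "section", "subsection", "subsubsection",
    "paragraph", "subparagraph"
};

/* Map <Hn> in a file included with "%html fn.html level" to a LaTeX
   sectioning command name. With level 1, H1 becomes "section", or
   "chapter" when `book` is nonzero. Clamping to the ends of the
   table is an assumption of this sketch. */
const char *heading_command(int n, int level, int book)
{
    int index = n + level - 1;   /* 1 = section for H1 at level 1 */
    int first = 1;               /* non-book styles have no \chapter */
    int last = (int)(sizeof sectioning / sizeof sectioning[0]) - 1;

    if (book) {
        index -= 1;              /* book style: H1 at level 1 is a chapter */
        first = 0;
    }
    if (index < first)
        index = first;
    if (index > last)
        index = last;
    return sectioning[index];
}
```

Under these assumptions, including a file at level 2 in an article turns its H1 headings into \subsection.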
Besides the LaTeX file, the program also generates a cross-reference
file with the .ref extension, which contains a lot of useful
information, including the error messages.
If the program is given an input file with the extension .html,
it does not generate a LaTeX output file; it only analyses that file
and, if the -s option is given, the files it references.
The program recognizes the following command line options:
- -i : print info.
- -w : print warnings (and info).
- -p : pedantic: does not report omissions of HTML
open and close tags.
- -s : scan non-included HTML files. The program will
scan all HTML files that can be reached from the included files, and
that are found in the directory (and its sub-directories) of the
- -r URL : the URL of the directory in which
the program is run. This is needed to find out whether any full URL
points to a local HTML file.
- -b : make bibliography. If this option is not given,
references to external URLs appear as footnotes. The input file
should contain a line with %html -b, which marks where the bibitems
are generated.
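The options listed above could be collected with a small hand-written loop over argv. This is a minimal C sketch; the struct options record and the function parse_options are hypothetical and not taken from html2tex.c. It treats the single non-option argument as the input file (never as an output name) and lets -w also switch on info output, as the option list says.

```c
#include <string.h>

/* Hypothetical record of the command line options. */
struct options {
    int info;              /* -i */
    int warnings;          /* -w, which also turns on info */
    int pedantic;          /* -p */
    int scan;              /* -s */
    int bibliography;      /* -b */
    const char *root_url;  /* -r URL */
    const char *input;     /* the single input file */
};

/* Parse argv into *opt. Returns 0 on success, or -1 on an unknown
   option, a missing URL after -r, or a missing or extra input file. */
int parse_options(int argc, char **argv, struct options *opt)
{
    *opt = (struct options){0};
    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "-i") == 0) {
            opt->info = 1;
        } else if (strcmp(a, "-w") == 0) {
            opt->warnings = 1;
            opt->info = 1;
        } else if (strcmp(a, "-p") == 0) {
            opt->pedantic = 1;
        } else if (strcmp(a, "-s") == 0) {
            opt->scan = 1;
        } else if (strcmp(a, "-b") == 0) {
            opt->bibliography = 1;
        } else if (strcmp(a, "-r") == 0) {
            if (++i >= argc)
                return -1;        /* -r needs a URL argument */
            opt->root_url = argv[i];
        } else if (a[0] == '-') {
            return -1;            /* unknown option */
        } else if (opt->input == NULL) {
            opt->input = a;       /* the input file, not an output name */
        } else {
            return -1;            /* only one input file is accepted */
        }
    }
    return opt->input != NULL ? 0 : -1;
}
```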
- Output in .ref produces incorrect error messages for
- The use of <H6> as a means of getting small
bold font can produce strange results.
- No support for IMG, PRE, BLINK,
and other tags.
- . . . more . . .
November 11, 1995: Beta of Version 2.0
- Included support for more tags. Improved checking of HTML
July 6, 1995: Version 1.0
- Support for references between <dt> and <dd>.
If footnotes are used, a single reference between <dt>
and <dd> is processed correctly. If a bibliography
is used, there is no limit to the number of references used.
May 2, 1995:
- Fixed bug: the program took the first argument as the output file name.
- references in <ADDRESS> are omitted during
March 3, 1995:
- solved bug in -s option. It now does a complete
- Some extra parsing added. A lot is still missing; the parser does
not comply with any standard.
- The program can now also accept a single HTML file as input. In that
case it does not generate any LaTeX output.
June 8, 1995:
- Numbers are now used for the files in the generated \label and
\ref commands.
- Fixed a bug in the URLs in the footnotes: a # was printed where it
was not needed, and omitted where it was needed.
Last update: November 22, 1995