RTF 2 HTML Converter Readme
Author
The RTF 2 HTML Converter, version 0.1 Alpha.
Copyright© Sergey A. Galin, 2001-2002.
Homepage: http://sergey-galin.chat.ru
End-User License
This is free software; you can redistribute and use it in compiled
form at no charge. Modification of the program executable and read-me files
is not allowed. This program is provided "AS IS" without warranty
of any kind, whether express or implied. Use it at your own risk.
When RTF2HTML is distributed as a part of another application, the application
should contain a note in its About dialog, help printout (or provide the information
in another sensible way), notifying users that RTF2HTML was used as a part of the
application, and containing the Author's name, copyright and home page URL,
as provided in this document above.
Note: if you want to buy program sources, please contact me.
Supported Platforms
The program is written in standard C++ and can be easily ported to almost
any modern operating system. Current sources may be compiled on most U**X systems
including Linux, under MS-DOS (in Borland C++) and
Windows (MS Visual C++ or MinGW, console application or DLL).
What Is It?
RTF 2 HTML Converter is an application to convert RTF documents to HTML :)
Its key features are:
- Very efficient and compact standards-compliant
Strict HTML 4 + CSS2 code (can be easily modified to output
XML/XHTML).
- The resulting HTML code is several times more compact than the output of MS Word's HTML
filter for the same RTF file.
- High-quality conversion: the output HTML looks very close to the original
RTF (see below for the list of unimplemented
features). The program even supports some features that are unavailable in most
other filters, such as invisible font, capitalization, long spaces and hard page
breaks (for printing).
- Information tags, unsupported by HTML, are converted into HTML comments
(<!-- ... -->).
- High speed and reliability (e.g. buffer overflow protection).
Note: the current version is Alpha and has not been thoroughly tested,
so problems are possible when parsing non-MS Word files.
- Portable to virtually any OS.
- Independence (no third-party code).
- Very low memory usage (compact code and efficient parser).
- Well-written and compact source code means that the application can be
easily supported and improved.
- Can be used as a part of another application (e.g. when built as Windows DLL).
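As an illustration of the "information tags become HTML comments" feature listed above, here is a minimal C++11 sketch. It is not taken from the RTF2HTML sources: the function name to_html_comment is hypothetical, and the way "--" is neutralized is an assumption (HTML 4 comments must not contain "--", so some form of escaping is needed).

```cpp
#include <string>

// Hypothetical helper, not part of RTF2HTML: wrap `text` in an HTML comment.
// A '-' that directly follows another '-' gets a space in front of it,
// so the comment body never contains "--".
std::string to_html_comment(const std::string& text) {
    std::string body;
    for (char c : text) {
        if (c == '-' && !body.empty() && body.back() == '-') {
            body += ' ';
        }
        body += c;
    }
    // The surrounding spaces keep a trailing '-' apart from the closing "-->".
    return "<!-- " + body + " -->";
}
```

For example, an information field containing "A--B" would be written as <!-- A- -B -->, which stays invisible in the browser but remains readable in the page source.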
Forms Of Distribution & Installing
- Linux command-line executable (in ELF format).
- Unpack the archive and, in the 'linux' subdirectory, run:
$ chmod 755 rtf2html
$ cp rtf2html /usr/bin
- Your system must have the following shared libraries installed:
libc.so.6
ld-linux.so.2
- Windows DLL + headers and sample programs in Visual Basic and C++ (MinGW/Win32).
- You should know what to do with a DLL; otherwise you just don't need it :)
Copy r2h.dll to your Windows system directory, e.g.
C:\Windows\System (Windows 9x) or C:\WINNT\System32
(Windows NT/2000/XP etc.), and see the sample programs.
- After installing the DLL, you can use the GUI program in Visual Basic (located in
the win32dll\vb-demo directory of the distribution package) to convert
files.
- Windows command-line application.
- Copy rtf2html.exe to any directory listed in the search
path, e.g. C:\Windows.
- DOS command-line executable.
- Copy rtf2html.exe to any directory listed in the search
path, e.g. C:\DOS.
You may wonder why I compile it in so many versions. The answer is:
1) for self-training purposes; 2) just for fun; and 3) you never know what
you will need some day :)
Version 0.X Command-Line Arguments
Usage:
rtf2html [<input RTF file> <output HTML file> [<image output directory>]]
It's pretty self-explanatory. With no arguments, the program prints build,
copyright and usage information. Without the third argument, it outputs
images to
<output HTML file's directory>/<output HTML file name>.files/
(similar to what Internet Explorer and Mozilla do when saving
an HTML document with images).
Note: the DOS version always requires the image output
directory parameter, since filename.htm.files will never fit
in the DOS filename format (8.3).
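The default image directory rule and the DOS restriction above can be sketched in C++ as follows. This is only an illustration under stated assumptions, not code from the RTF2HTML sources: the function names default_image_dir and fits_dos_8_3 are hypothetical, and the 8.3 check only looks at name lengths, not at which characters DOS allows.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the image directory used when the third argument
// is omitted, i.e. "<output HTML file>.files/" next to the output file.
std::string default_image_dir(const std::string& output_html) {
    assert(!output_html.empty());  // the usage line requires an output file
    return output_html + ".files/";
}

// Hypothetical helper: true if `name` (a bare file name without a directory
// part) fits the DOS 8.3 format: 1-8 base characters, optionally followed
// by a single dot and 1-3 extension characters.
bool fits_dos_8_3(const std::string& name) {
    const std::string::size_type dot = name.find('.');
    if (dot == std::string::npos) {
        return !name.empty() && name.size() <= 8;
    }
    const std::string base = name.substr(0, dot);
    const std::string ext = name.substr(dot + 1);
    return !base.empty() && base.size() <= 8 &&
           !ext.empty() && ext.size() <= 3 &&
           ext.find('.') == std::string::npos;
}
```

With these helpers, "out/report.htm" maps to "out/report.htm.files/", and a directory name such as "report.htm.files" fails the 8.3 check, which is why the DOS build cannot fall back to the default.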
Partially Supported RTF Features
List support is limited. Lists are not converted to
real HTML lists (OL, UL, DL). In most cases, however, they look exactly as
they should, since most RTF editors (e.g. MS Word) add plain formatting
tags for each list element into RTF.
Some list markers (bullets) are not converted.
Bullet symbols which can be represented as ASCII characters are handled well.
Markers based on the Wingdings font (often used in MS Word) work fine in
Internet Explorer but may or may not work in other browsers.
Unimplemented RTF Features (To Do)
- Better support for RTF from OpenOffice Writer and Linux word processors,
including buggy ones.
- Table closing bug with OpenOffice.
- Prevent code from stupid RTF writers from generating something like f3 c4 f3 c4 f3 c4.
- Converting BMP images to PNG. (Can be done for GCC versions via GD.)
- Converting vector images (WMF) to raster. Requires portable WMF rendering library.
- Always add HTML <TITLE> and META tags into header.
- Paragraph borders.
- Table cell size control.
- Table borders.
- Page Headers and Footers (not sure if needed at all).
- Hyper-References (Microsoft-specific tags and/or automatic URL detection).
- HREF/SRC URL-encoding (usually not needed).
- Header (<H1>, <H2>...) tags (header formatting works OK, but
HTML header tags are not used).
History
- Version 0.2:
- OpenOffice's tag for background color added.
- Text indent tag added.
- Fixed crashes with bad color indexing in OpenOffice RTF.
- UNICODE RTF support. UNICODE characters are converted to their HTML UNICODE representation,
not recoded. If there is also an ANSI representation of the symbol, the ANSI one is used and
the UNICODE one is ignored. There is also a new flag (r2hPreferUnicode) that tells the converter
to use UNICODE even if an ANSI version is present.
- Version 0.1 Alpha.
- This was the first version released.
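The UNICODE handling described for version 0.2 above can be pictured with the following C++11 sketch. It is an illustration only, not the converter's actual code: the function emit_char and its parameters are hypothetical, the bool prefer_unicode merely stands in for the r2hPreferUnicode flag (whose real type and usage are not documented here), and "HTML UNICODE representation" is assumed to mean a numeric character reference such as &#1046;.

```cpp
#include <string>

// Hypothetical helper: choose the HTML text for one character that the RTF
// supplies as a Unicode value plus an optional ANSI fallback character.
std::string emit_char(int rtf_unicode, bool has_ansi, char ansi,
                      bool prefer_unicode) {
    if (has_ansi && !prefer_unicode) {
        return std::string(1, ansi);  // ANSI is used, Unicode is ignored
    }
    // RTF writes Unicode values as signed 16-bit numbers, so values above
    // 32767 arrive negative; map them back into the range 0..65535.
    const int code_point = rtf_unicode < 0 ? rtf_unicode + 65536 : rtf_unicode;
    // Emit the character as a numeric reference instead of recoding it.
    return "&#" + std::to_string(code_point) + ";";
}
```

Under these assumptions, a character with both representations comes out as its ANSI byte by default and as a numeric reference when the flag is set. A character with no ANSI fallback always comes out as a numeric reference.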
Unimplementable RTF Features
The RTF tags listed below cannot be converted to HTML/CSS because the corresponding
features are inapplicable to continuous media, cannot be handled upon conversion
or are just not supported by HTML 4. The contents of these tags are not output, even into
comments.
vern000 edmins000 paperw paperh paperw000 paperh000 cols facingp gutter000 deftab000
*\nextfile *\template makeback defformat revision margmiror titlepg outl shad expnd000
ulw uld pgnx pgny pgndec pgnucrm pgnlcrm pgnucltr pgnlcltr pgnstart
Section columns are also not supported by HTML (use tables instead).