OptiVec Version 3 for C/C++ and for Pascal/Delphi
OptiCode, Dr. Martin Sander Software Development, Steinachstr. 9A, D-69198 Schriesheim, Germany
http://www.optivec.com
e-mail: support@optivec.com or sales@optivec.com
Part I. A: Handbook
This HANDBOOK describes the basic principles of the OptiVec libraries and gives an overview of VectorLib, the first part of OptiVec. The new object-oriented interface, VecObj, is described in chapter 3. The other parts have their own descriptions in separate files, see MATRIX.HTM and CMATH.HTM.
Chapter 1.2 of this Handbook contains the licence terms for the Shareware version, Chapter 1.3 for the Registered version.
 
OptiCode™ and OptiVec™ are trademarks of Dr. Martin Sander Software Dev. Other brand and product names mentioned in this handbook for identification purposes are trademarks or registered trademarks of their respective holders.
 
German-speaking users: To keep the cost of downloading the Shareware version over the Internet as low as possible for everyone, it contains only the English documentation. You will find the German description separately at http://www.optivec.com/download/OVDOCD.ZIP.
1.1 Why Vectorized Programming Pays Off on the PC
1.2 License Terms for the Shareware Version
1.3 Registered Versions
1.3.1 Registered Versions: Ordering
1.3.2 License Terms for the Registered Versions
1.4 Getting Started
4.7 Analysis
4.8 Signal Processing: Fourier Transforms and Related Topics
4.9 Statistical Functions and Building Blocks
4.10 Data Fitting
4.11 Input and Output
4.12 Graphics
5.1 General Remarks
5.2 Integer Errors
5.3 Floating-Point Errors
5.3.1 C/C++ specific
5.3.2 Pascal/Delphi specific
5.3.3 Error Types (Both C/C++ and Pascal/Delphi)
5.3.4 Differences between Borland C++ 4.0 and earlier BC++ versions
5.4 The Treatment of Denormal Numbers
5.5 Advanced Error Handling: Writing Messages into a File
5.6 OptiVec Error Messages
6.1 General Problems
6.2 Problems with Windows 3.x?
6.3 Problems with 32-bit Windows?
6.4 Problems with Borland's 16-bit Linker?
OptiVec offers a powerful set of routines for numerically demanding applications, making the philosophy of vectorized programming available to the C/C++ and Pascal/Delphi languages. It serves to overcome the limitations of loop management in conventional compilers, which proved to be one of the largest obstacles in the programmer's way towards efficient coding for scientific and data analysis applications.
In comparison to the old vector language APL, OptiVec has the advantage of being incorporated into the modern and versatile languages C/C++ and Pascal/Delphi. Recent versions of C++ and Fortran already offer some sort of vector processing, by virtue of iterator classes using templates (C++) and field functions (Fortran90). Both of these, however, are basically a convenient means of letting the compiler write the loop for you and then compile it to the usual inefficient code. The same is true for most implementations of the popular BLAS (Basic Linear Algebra Subprograms) libraries. In comparison to these approaches, OptiVec is superior mainly with respect to execution speed: on average by a factor of 2-3, in some cases even up to 8. The performance is no longer limited by the quality of your compiler, but rather by the real speed of the processor!
There is a certain overlap in the range of functions offered by OptiVec and by BLAS, LINPACK, and other libraries and source-code collections. However, the latter must be compiled, and, consequently, their performance is determined mainly by the quality of the compiler chosen. To the best of our knowledge, OptiVec was the first product on the market offering a comprehensive vectorized-functions library realized in a true Assembler implementation.
The wide range of routines and functions covered by OptiVec, the high numerical efficiency and increased ease of programming make this package a powerful programming tool for scientific and data analysis applications, competing with (and often beating) many high-priced integrated systems, but embedded into your favourite programming language.
This documentation describes the OptiVec implementations for
Vectorization has always been the magic formula for supercomputers with their multi-processor parallel architectures. On these architectures, one tries to spread the computational effort equally over the available processors, thus maximizing execution speed. The so-called "divide and conquer" algorithms break down more complicated numerical tasks into small loops over array elements. Sophisticated compilers then work out the most efficient way to distribute the array elements among the processors. Many supercomputer compilers also come with a large set of pre-defined proprietary vector and matrix functions for many basic tasks. These vectorized functions offer the best way to achieve maximum throughput.
Obviously, the massive parallel processing of, say, a Cray is not possible on most PCs with their one and only CPU. (Even high-end workstations still feature modest 2- or 4-CPU configurations.) Consequently, at first sight, it might seem useless to apply the principle of vectorized programming to the PC. Actually, however, many vector-specific optimizations are possible even for computers with only one CPU. Most of these optimizations are not available to present compilers. Rather, one has to go down to the machine-code level. Hand-optimized, Assembler-written vector functions outperform compiled loops by a factor of two to three, on average. This means that vectorization, properly done, is indeed worth the effort for PC programs as well.
Here are the most important optimization strategies employed in OptiVec to boost performance:
Prefetch of chunks of vector elements
Beginning with the Pentium III processor, Intel introduced the very useful feature of explicit memory prefetch. With these commands, it is possible to "tell" the processor to fetch data from memory sufficiently in advance, so that no time is wasted waiting for them when they are actually needed.
Cache control
The Pentium III processor offers the possibility to mark data as "temporal" (will be used again) or "non-temporal" (used only once) while they are fetched or stored. In OptiVec functions, it is assumed that input vectors (and matrices) will not be used again, whereas the output vectors are likely to become the input for some ensuing procedure. Consequently, the cache is bypassed while loading input data, but the output data are written into the cache. Of course, this approach breaks down if the vectors or matrices become too large to fit into the cache. For these cases, a large-vector version of the OptiVec libraries is available which also bypasses the cache while writing the output vectors. For simple arithmetic functions, a speed gain of up to 20% is achieved compared to the small-and-medium-size version. On the other hand, as this large-vector version effectively switches the cache off, a drastic performance penalty (up to a factor of three or four!) will result if it is used for smaller systems. For the same reason, you should carefully check whether your problem could be split up into smaller vectors before resorting to the large-vector version. This would let you achieve the much higher performance resulting from efficient data caching.
Use of SIMD commands
You might wonder why this strategy is not listed first. The SSE or "Streaming Single-Instruction-Multiple-Data Extensions" of Pentium III and Pentium 4 provide explicit support for vectorized programming with floating-point data in float / single or double precision (the latter only for Pentium 4). At first sight, therefore, they should revolutionize vector programming. Given today's processor and data bus speeds, however, many of the simple arithmetic operations have become data-transfer limited, and the use of SIMD commands does not make the large difference (with respect to well-written FPU code) which it could make otherwise. In most cases, the advantage of treating four floats in a single command melts down to a 20-30% increase in speed (which is not that bad, anyway!). For more complicated operations, on the other hand, SIMD commands often cannot be employed, either because conditional branches have to be taken for each vector element individually, or because the "extra" accuracy and range available with traditional FPU commands (with their internal extended accuracy) allow algorithms to be simplified so much that the FPU code is still faster. As a consequence, we use SIMD commands only where a real speed gain is possible without affecting the accuracy.
Preload of floating-point constants
Floating-point constants, employed in the evaluation of mathematical functions, are loaded onto the floating-point number stack outside of the actual loop and stay as long as they are needed. This saves a large amount of loading/unloading operations which are necessary if a mathematical function is called for each element of a vector separately.
Full FPU stack usage
Where necessary, all eight coprocessor registers are employed. (For present compilers, it is already an excellent achievement to master the bookkeeping for only four coprocessor registers.)
Superscalar scheduling
By careful "pairing" of commands whose results do not depend upon each other, the two integer pipes and the two fadd/fmul units of the Pentium/PentiumXX are used as efficiently as possible.
In most instances, computers equipped with 386/387 or 486DX CPUs just will not care about these optimizations which they cannot profit from. In those cases, however, where the performance on these older CPUs suffers significantly from the Pentium-optimized scheduling, it is applied only in the "4" version of OptiVec (back-compatible to 486DX), but not in the "3" version (back-compatible to 386/387).
Loop-unrolling
Where optimum pairing of commands cannot be achieved for single elements, vectors are often processed in chunks of two, four, or even more elements. This makes it possible to fully exploit the parallel-processing capabilities of the Pentium and its successors. Moreover, the relative amount of time spent on loop management is significantly reduced. In connection with data prefetching, described above, the depth of the unrolled loops is most often adapted to the cache line size of 32 bytes.
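As a rough illustration of the chunk-wise processing described above, the following C++ sketch sums a float vector four elements at a time. It is not OptiVec code (the library itself is written in Assembler), and the function name sum_unrolled4 is made up for this example. The four independent partial sums stand for operations whose results do not depend on each other; a separate short loop handles the elements left over when the size is not a multiple of four.

```cpp
#include <cstddef>

// Sum of a float vector, processed in chunks of four elements.
// The four accumulators are independent of each other, so their
// additions can overlap; loop management is paid once per four elements.
float sum_unrolled4(const float* x, std::size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i) {   // remaining 0 to 3 elements
        s += x[i];
    }
    return s;
}
```

Because the elements are added in a different order than in a simple loop, the rounding of the result may differ slightly from that of a straightforward summation.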
Simplified addressing
The addressing of vector elements is still a major source of inefficiency with present compilers. Switching back and forth between input and output vectors, a large number of redundant addressing operations is performed. The strict (and easy!) definitions of all OptiVec functions make it possible to reduce these operations to a minimum.
Replacement of floating-point by integer commands
For any operations with floating-point numbers that can also be performed using integer commands (like copying, swapping, or comparing to preset values), the faster method is consistently employed.
Strict precision control
C compilers convert a float into a double (and Borland Pascal/Delphi even into an extended) before passing it to a mathematical function. This approach was useful at times when disk space was too scarce to include separate functions for each data type in the .LIB files, but it is simply inefficient on modern PCs. Consequently, no such implicit conversions are present in OptiVec routines. Here, a function of a float is calculated to float (i.e. single) precision, wasting no time on the calculation of more digits than necessary, which would be discarded anyway. Additionally, you can call V_setFPAccuracy( 1 ); to actively switch the FPU to single precision, if that is enough for a given application. Thereby, execution can be significantly sped up on Pentium and later CPUs. For details and precautions, see V_setFPAccuracy.
All-inline coding
All external function calls are eliminated from the inner loops of the vector processing. This saves the execution time necessary for the "call / ret" pairs and for loading the parameters onto the stack.
Cache-line matching of local variables
The Level-1 cache of the Pentium and its presently available successors is organized in lines of 32 bytes each. Many OptiVec functions need double-precision or extended-precision real local variables on the stack (mainly for integer/floating-point conversions or for range checking). Present compilers align the stack on 4-byte boundaries, which means there is a chance of 1 in 8 that the 8 bytes of a double, and of 1 in 4 that the 10 bytes of an extended, stored on the stack, will cross a 32-byte boundary. This, in turn, would lead to a cache line-break penalty, deteriorating the performance. Consequently, those OptiVec functions where this is an issue use special procedures to align their local variables on 8-byte (for doubles) or 16-byte boundaries (for extendeds).
Unprotected and reduced-range functions
OptiVec offers alternative forms of some mathematical functions, where you have the choice between the fully protected variant with error handling and another, unprotected variant without. In the case of the integer power functions, for example, the absence of error checking allows the unprotected versions to be vectorized much more efficiently. Similarly, the sine and cosine functions can be coded more efficiently for arguments that the user can guarantee to lie in the range between -2π and +2π. In these special cases, the execution time may be reduced by up to 40%, depending on the hardware environment. This increased speed always has to be balanced against the increased risk, though: If any input element outside the valid range is encountered, the unprotected and reduced-range functions will crash without warning.
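Since the reduced-range functions rely on a guarantee given by the caller, one cautious approach is to verify the input range once, for example during testing, before switching to the faster variant. The following C++ sketch shows such a check; the helper all_within_two_pi is hypothetical and not part of OptiVec.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not an OptiVec function): returns true only if
// every element lies in the closed range [-2*pi, +2*pi].
// The comparison is written so that a NaN element also yields false.
bool all_within_two_pi(const float* x, std::size_t n)
{
    const double two_pi = 6.283185307179586;
    for (std::size_t i = 0; i < n; ++i) {
        if (!(std::fabs(static_cast<double>(x[i])) <= two_pi)) {
            return false;
        }
    }
    return true;
}
```

A program could call this check in a debug build and fall back to the fully protected function whenever it returns false. Note that the check itself costs one pass over the vector, so it only pays off if it is not repeated for every call.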
Multithread support
All the above being said about single-CPU PCs, there are high-end workstations and servers on the market equipped with 2 or 4 PentiumXX chips. While multi-tasking and multi-threading are also possible on single-CPU PCs, multi-processor configurations allow the operating system to distribute threads among the available processors, doubling or quadrupling the overall performance. For that, any functions running in parallel must be prevented from interfering with each other through read/write operations on global variables. With very few exceptions (namely the plotting functions, which have to use global variables to store the current window and coordinate system settings), all other OptiVec functions may run in parallel. OptiVec functions do not initiate threads themselves, though, as the overhead involved in multi-threading would significantly affect the performance on single-CPU machines. If you have a multi-CPU computer, you have to explicitly launch the threads you wish to run in parallel. For example, one thread might take the lower half of the vector(s) you wish to process, while a second thread takes the upper half, until a point is reached where both must be combined.
Be extremely careful with multi-threading, if you are using the P6 version of OptiVec: The earlier releases of 32-bit Windows do not save the XMM registers (employed in the SIMD commands) during task switches. 
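The splitting of a vector into a lower and an upper half, as described above, could look roughly like the following C++ sketch. It uses std::thread from the C++11 standard library, which is an assumption of this example and not something the handbook prescribes. The function mul_const is a made-up stand-in for any vector function that uses no global variables.

```cpp
#include <cstddef>
#include <thread>

// Stand-in for a vector function without global state: y[i] = x[i] * c.
void mul_const(const float* x, float* y, std::size_t n, float c)
{
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = x[i] * c;
    }
}

// The lower half is processed in a second thread, the upper half in the
// calling thread. join() marks the point where both halves are combined.
void mul_const_two_threads(const float* x, float* y, std::size_t n, float c)
{
    const std::size_t half = n / 2;
    std::thread lower(mul_const, x, y, half, c);
    mul_const(x + half, y + half, n - half, c);
    lower.join();
}
```

The two threads write to disjoint parts of the output vector, so no locking is needed. For short vectors, the cost of starting a thread will usually outweigh any gain.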
This is the Shareware version of OptiVec ("SOFTWARE").
It may be used under the following licence terms:
Purchasing the full (registered) version gives you the right to use it on as many computers at a time as the number of units you bought.
The right to distribute applications employing functions of OptiVec is included in the commercial-version licence. No run-time licences are needed for your customers! Corporate site and world-wide licences are available upon request.
The full versions (both the commercial and the educational editions) of OptiVec
SWREG:
When ordering online through SWREG, please use the product-specific links below:
 OptiVec for C/C++
 OptiVec for Pascal/Delphi
Please choose the exact version and delivery options in the simple pulldown menu on the respective page.
ShareIt:
When ordering online through ShareIt, please use the product-specific links below:
 OptiVec for Borland C/C++ (English)
OptiVec für Borland C/C++ (Deutsch)
CMATH for Borland C/C++
 OptiVec for Microsoft Visual C++
CMATH for Microsoft Visual C++
 OptiVec for Borland Delphi
CMATH for Borland Delphi
 OptiVec for Borland (Turbo) Pascal
CMATH for Borland (Turbo) Pascal
You may also order by e-mail to register@shareit.com.
US customers can also contact ShareIt! by telephone 1-724-850-8186 or FAX 1-724-850-8189 (only for orders, please).
Note the program No.: 
| Product | commercial | educational |
| OptiVec for Borland C/C++ (English) | 101557 | 102654 |
| OptiVec für Borland C/C++ (Deutsch) | 101556 | 149813 |
| CMATH for Borland C/C++ | 101353 | 102655 |
| OptiVec for Microsoft Visual C++ | 103421 | 149811 |
| CMATH for Microsoft Visual C++ | 103422 | 103441 |
| OptiVec for Borland (Turbo) Pascal | 103423 | |
| CMATH for Borland (Turbo) Pascal | 103424 | |
| OptiVec for Borland Delphi | 103443 | 103859 |
| CMATH for Borland Delphi | 103844 | 103860 |
If you have a European VAT ID, or if you order from outside the European Union, you are exempt from German VAT, and it will be deducted from your bill, but you may have to pay your local VAT and/or import duties according to local laws.
Please send a print-out of this order form to
OptiCode  Dr. Martin Sander Software Dev.
Steinachstr. 9A
D-69198 Schriesheim
Germany
FAX +49 - 6203 - 601 733
For any other questions related to ordering OptiVec, please contact us at: sales@optivec.com
This is a single copy license for OptiVec ("SOFTWARE"), granted by OptiCode  Dr. Martin Sander Software Development ("OptiCode").
The SOFTWARE in this package is licensed to you as the user. It is not sold. The term "user" means a programmer who links binary code of this SOFTWARE into his own applications. Those people using, in turn, his applications without the need of installing this SOFTWARE themselves, do not need any runtime license for the SOFTWARE. The right to distribute applications containing code of this SOFTWARE is included in the license fee for the commercial version.
Once you have paid the required license fee, you may use the SOFTWARE for as long as you like, provided you do not violate the copyright and if you observe the following rules:
| Platform | Memory Model | Required Processor |
| DOS | TINY | |
| DOS | SMALL | |
| DOS | MEDIUM | |
| DOS | COMPACT | |
| DOS | LARGE | |
| DOS | HUGE | |
| Windows | SMALL | |
| Windows | MEDIUM | |
| Windows | COMPACT | |
| Windows | LARGE | |
| 32-bit Windows, static run-time library / dynamic run-time library | (FLAT) GUI or Console | |
| 32-bit Windows, static run-time library / dynamic run-time library | (FLAT) Large-Vector-Version | |
| Target | Configuration | 486/Pentium library | Pentium III | Pentium III, Large-vector version | 
| single-thread | debug | OVVCSD.LIB | OVVCSD6.LIB | OVVCLSD6.LIB | 
| single-thread | release | OVVCSR.LIB | OVVCSR6.LIB | OVVCLSR6.LIB | 
| multi-thread | debug | OVVCMTD.LIB | OVVCMTD6.LIB | OVVCLTD6.LIB | 
| multi-thread | release | OVVCMTR.LIB | OVVCMTR6.LIB | OVVCLTR6.LIB | 
| multi-thread DLL | debug | OVVCMDD.LIB | OVVCMDD6.LIB | OVVCLDD6.LIB | 
| multi-thread DLL | release | OVVCMDR.LIB | OVVCMDR6.LIB | OVVCLDR6.LIB | 
Ever since the 8086/8087 processor pair, Intel processors have been able to process integer numbers of up to 64 bits (8 bytes). We call the 64-bit type "quad" (for "quadword integer"). It is not fully supported by Borland C++. Therefore, floating-point numbers (preferably long doubles with their 64-bit mantissa) have to be used as intermediates. The necessary interface functions are setquad, quadtod and _quadtold.
The type quad is always signed. There is no "unsigned quad".
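The preference for long doubles as intermediates can be made concrete with a small C++ sketch. It does not use the OptiVec interface functions; the names survives_round_trip and mantissa_bits are invented for this example, and std::int64_t merely stands in for the quad type. A double has a 53-bit mantissa and therefore cannot hold every 64-bit integer exactly, whereas an 80-bit long double with its 64-bit mantissa can.

```cpp
#include <cstdint>
#include <limits>

// Number of mantissa bits of a floating-point type
// (53 for an IEEE double, 64 for the 80-bit x87 long double).
template <typename Real>
int mantissa_bits()
{
    return std::numeric_limits<Real>::digits;
}

// Returns true if the 64-bit integer v is unchanged after being stored
// in a variable of type Real and converted back.
// Only meant for values well inside the int64 range, since converting an
// out-of-range floating-point value to an integer is undefined.
template <typename Real>
bool survives_round_trip(std::int64_t v)
{
    // volatile forces the value to be stored in the format of Real
    volatile Real r = static_cast<Real>(v);
    return static_cast<std::int64_t>(r) == v;
}
```

For example, 9007199254740993 (2^53 + 1) does not survive the round trip through a double, while it does on compilers whose long double has a 64-bit mantissa. On compilers that treat long double as double, mantissa_bits<long double>() reports 53, and the advantage disappears.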
The data type extended, which is familiar to Pascal/Delphi programmers, is defined as a synonym for "long double" in OptiVec for C/C++. As Visual C++ does not support 80-bit reals, we define "extended" as "double" in the OptiVec versions for that compiler.
The reason for the choice of the name "extended" is that all OptiVec routines shall have identical names in C/C++ and Pascal/Delphi languages. Since the function prefixes are derived from the data types of the processed vectors (see below), this necessitates the definition of alias names for some data types denoted differently in the various languages. While the letter "L" (which could possibly stand for "long double") is already overcrowded by the data types long int and unsigned long, the letter "E" is unique to the data type extended and therefore used in the prefixes for vectors and functions of long double precision. This way, the letters defining the real-number data types are in alphabetical proximity: "D" for double, "E" for extended, and "F" for float. Maybe the future will bring high-precision 128-bit and 256-bit real numbers which could find their place in this series as "G" for "great" and "H" for "hyper".
For historical reasons, the various integer data types have a somewhat confusing nomenclature in Turbo Pascal. The WinProcs unit of Delphi already offers a more systematic nomenclature. In order to make the derived function prefixes compatible with the C/C++ versions of OptiVec, we define those synonyms present in Delphi (and a few more) also for 16-bit Pascal, as described in the following table:
| type | Pascal name | synonym | derived prefix |
| 8 bit signed | ShortInt | ByteInt | VBI_ |
| 8 bit unsigned | Byte | UByte | VUB_ |
| 16 bit signed | SmallInt | | VSI_ |
| 16 bit unsigned | Word | USmall | VUS_ |
| 32 bit signed | LongInt | | VLI_ |
| 32 bit unsigned | ULong | | VUL_ |
| 64 bit signed | Comp | QuadInt | VQI_ |
| 16/32 bit signed | Integer | | VI_ |
| 16/32 bit unsigned | Cardinal | UInt | VU_ |
Turbo Pascal only:
The unsigned 32-bit integer type ULong is treated as unsigned only inside OptiVec routines. Elsewhere, all ULong variables are treated as LongInt. This means that the most significant bit is then interpreted as the sign bit, which may lead to errors unless proper care is taken. Delphi 4+ always treats ULong as unsigned.
QuadInts are always signed. As yet, there is nothing like a "UQuad".
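The effect of reading an unsigned 32-bit value as a signed one, mentioned above for ULong, can be demonstrated with a few lines of C++. The sketch is language-neutral in spirit: the same bit pattern is simply interpreted in two ways. The function name as_signed_32 is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets the 32 bits of an unsigned value as a signed value,
// which is what happens when an unsigned 32-bit variable is handled
// as a signed 32-bit integer.
std::int32_t as_signed_32(std::uint32_t u)
{
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);   // copy the bit pattern unchanged
    return s;
}
```

Values below 2147483648 are unaffected, so as_signed_32(100) is still 100. Once the most significant bit is set, the result turns negative: 3000000000 becomes 3000000000 - 4294967296 = -1294967296.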
Delphi:
As Delphi supports 64-bit integers, QuadInt is not defined as Comp, but is used as a synonym for the data type Int64.
To have a Boolean data type available which is of the same size as Integer, we define the type IntBool. It is equivalent to WordBool in Pascal, but LongBool in Delphi. You will see the IntBool type as the return value of many mathematical VectorLib functions.
Most compilers and available libraries implement complex functions very inefficiently and inaccurately. (Just writing down the textbook formula for a complex function, like it is usually done, works fine only for a very limited range of arguments!)
Our aims are
VectorLib itself contains the necessary initialization functions of complex numbers and all vectorized forms of complex math functions. If you are using only these, you need not explicitly include CMATH. In this case, the following complex data types are defined in <VecLib.h> for C/C++:
 typedef struct { float Re, Im; } fComplex;
typedef struct { double Re, Im; } dComplex;
typedef struct { extended Re, Im; } eComplex;
typedef struct { float Mag, Arg; } fPolar;
typedef struct { double Mag, Arg; } dPolar;
typedef struct { extended Mag, Arg; } ePolar;
(the data type extended is used as a synonym for long double, see above.)
The corresponding definitions for Pascal/Delphi are contained in the unit VecLib:
type fComplex = record Re, Im: Float; end;
type dComplex = record Re, Im: Double; end;
type eComplex = record Re, Im: Extended; end;
type fPolar = record Mag, Arg: Float; end;
type dPolar = record Mag, Arg: Double; end;
type ePolar = record Mag, Arg: Extended; end;
If, for example, a complex number z is declared as "fComplex z;", the real and imaginary parts of z are available as z.Re and z.Im, respectively. Complex numbers (here z, and a polar number p declared as "fPolar p;") are initialized either by setting the constituent parts separately to the desired value, e.g.,
z.Re = 3.0; z.Im = 5.7;
p.Mag = 4.0; p.Arg = 0.7;
(of course, the assignment operator is := in Pascal/Delphi).
Alternatively, the same initialization can be accomplished by the functions fcplx or fpolr:
C/C++:
z = fcplx( 3.0, 5.7 );
p = fpolr( 4.0, 0.7 );
Pascal/Delphi:
fcplx( z, 3.0, 5.7 );
fpolr( p, 4.0, 0.7 );
For double-precision complex numbers, use dcplx and dpolr, for extended-precision complex numbers, use ecplx and epolr.
Pointers to arrays or vectors of complex numbers are declared using the data types cfVector, cdVector, and ceVector (for cartesian complex) and pfVector, pdVector, and peVector (for polar complex) described below.
The basis of all VectorLib routines is formed by the various vector data types given below and declared in <VecLib.h> or the unit VecLib.  In contrast to the fixed-size static arrays, the VectorLib types use dynamic memory allocation and allow for varying sizes. Because of this increased flexibility, we recommend that you predominantly use the latter. Here they are:
 
| C/C++ | Pascal/Delphi |
| Note: in connection with Windows programs, often the letter "l" or "L" is used to denote "long int" variables. In order to prevent confusion, however, the data type "long int" is signalled by "li" or "LI", and the data type "unsigned long" is signalled by "ul" or "UL". Conflicts with prefixes for "long double" vectors are avoided by deriving these from the alias name "extended" and using "e", "ce", "E", and "CE", as described above and in the following. | 
Pascal/Delphi specific:
The elements of OptiVec vectors cannot be accessed with the [] operator here. Instead, the type-specific functions VF_element (returns the value of the desired vector element, but cannot be used to overwrite the element) and VF_Pelement (returns the pointer to a vector element) have to be used. The latter function makes it possible to set vector values, e.g.
VF_Pelement( X, 3 )^ := 5.7;
As in C/C++, you may mix the OptiVec vector types with the static arrays of classic Pascal style. Static arrays have to be passed to OptiVec functions with the "address of" operator. Here, the above example reads:
a: array[0..99] of Single; (* classic static array *)
b: fVector; (* VectorLib vector *)
b := VF_vector(100); 
VF_equ1( @a, 100 ); (* set first 100 elements of a = 1.0 *)
VF_equC( b, 100, 3.7 ); (* set first 100 elements of b = 3.7 *)
Since Version 4, Delphi has also offered dynamically-allocated arrays, which may also be used as arguments for OptiVec functions. The following table compares the pointer-based vectors of VectorLib with the array types of Pascal/Delphi:
 
| | OptiVec vectors | Pascal/Delphi static/dynamic arrays |
| alignment of first element | on 32-byte boundary for optimum cache-line matching | 2 or 4-byte boundary (may cause line-break penalty for double, QuadInt) |
| alignment of following elements | packed (i.e., no dummy bytes between elements, even for 10- and 20-byte types) | arrays must be declared as "packed" for Delphi to be compatible with OptiVec |
| index range checking | none | automatic with built-in size information | 
| dynamic allocation | function VF_vector, VF_vector0 | procedure SetLength (Delphi only) | 
| initialization with 0 | optional by calling VF_vector0 | always (Delphi only) | 
| de-allocation | function V_free, V_freeAll | procedure Finalize (Delphi only) | 
| reading single elements | function VF_element: a := VF_element(X,5); Delphi only: typecast into array also possible: a := fArray(X)[5]; | index in brackets: a := X[5]; | 
| setting single elements | function VF_Pelement: VF_Pelement(X,5)^ := a; Delphi only: typecast into array also possible: fArray(X)[5] := a; | index in brackets: X[5] := a; | 
| passing to OptiVec function | directly: VF_equ1( X, sz ); | address-of operator: VF_equ1( @X, sz ); | 
| passing sub-vector to OptiVec function | function VF_Pelement: VF_equC( VF_Pelement(X,10), sz-10, 3.7); | address-of operator: VF_equC( @X[10], sz-10, 3.7 ); | 
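For comparison, passing a sub-vector in C/C++ comes down to plain pointer arithmetic. The following C++ sketch uses a made-up function fill_const with the same argument order as the VF_equC calls in the table (vector, size, constant); it is a stand-in for illustration, not the library function.

```cpp
#include <cstddef>

// Stand-in with the argument order (vector, size, constant):
// sets the first `size` elements of x to c.
void fill_const(float* x, std::size_t size, float c)
{
    for (std::size_t i = 0; i < size; ++i) {
        x[i] = c;
    }
}

// Sets all elements from index `start` to the end of a vector with
// `sz` elements, leaving the elements before `start` untouched.
// The caller has to make sure that start <= sz, as there is no
// index range checking.
void fill_tail(float* x, std::size_t sz, std::size_t start, float c)
{
    fill_const(x + start, sz - start, c);
}
```

A call such as fill_tail( X, sz, 10, 3.7f ) corresponds to the last table row: the address of element 10 is passed together with the reduced size sz-10.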
Any of the algebraic and mathematical functions included in this library exists in one variant for each floating-point format. The data type of all floating-point vector elements, parameters, and of the return value is always the same within one function. The data type is signalled by the second letter of the prefix: VF_ denotes the variant of a function that uses exclusively the data type float (Pascal: Single), VD_ stands for the data type double, and VE_ for the data type extended, i.e., long double. (The first letter, "V", stands for "Vector function", of course.) VF_ functions thus work on arrays declared as fVector, use parameters of the type float, and, if there is any floating-point return value, this will also be of the type float. Except for a very few cases, there are no mixed-type functions (that would, e.g., work on vectors of type fVector, use parameters of type double and return a value of type long double).
One partial exception to this rule comes from the fact that floating-point return values of OptiVec functions are returned in extended precision on the number stack. Therefore, you may assign the return value of a function to a variable of another data type. For example, the product of all elements of a vector may easily overflow, and it is a good idea to define eProd as an extended (i.e., as a long double), before writing the line
eProd = VF_prod( X, size );
Borland C++ only:
To use this possibility, you must switch the option "Fast floating point" on (in the IDE in the menu "Options/Compiler/Advanced Code Generation", or the command-line compiler option "-ff").
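Why a wider data type for the product is a good idea can be seen from the following C++ sketch. It does not call VF_prod; the three functions are written out only for this example, and double is used as the wider type because not every compiler offers an 80-bit long double. Ten elements of value 1.0e5 have the product 1.0e50, which exceeds the float range of about 3.4e38 but fits easily into a double.

```cpp
#include <cmath>
#include <cstddef>

// Product accumulated in float: overflows to infinity once the running
// product exceeds the float range.
float prod_as_float(const float* x, std::size_t n)
{
    float p = 1.0f;
    for (std::size_t i = 0; i < n; ++i) {
        p *= x[i];
    }
    return p;
}

// Same product of the same float elements, accumulated in double.
double prod_as_double(const float* x, std::size_t n)
{
    double p = 1.0;
    for (std::size_t i = 0; i < n; ++i) {
        p *= x[i];
    }
    return p;
}

// True if the float-accumulated product is infinite.
bool float_product_overflows(const float* x, std::size_t n)
{
    return std::isinf(prod_as_float(x, n));
}
```

The sketch assumes that float arithmetic is actually carried out in float precision, as is usual with SSE code generation. On compilers that keep intermediate results in x87 registers, the float version may overflow later than shown here.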
For the description of the functions in the Alphabetical Reference, generally only the VF_ version is described and its syntax explicitly given. The versions for the data types double and long double are exactly analogous to the VF_ variant. You have only to replace the prefix VF_ by VD_ (or VE_) and to use "dVector" and "double" (or "eVector" and "extended", resp.) wherever you find "fVector" and "float" in the VF_ version.
Return values of the complex data types are not possible in Pascal/Delphi. Therefore, the syntax of those functions returning a complex number is different in C/C++ and Pascal/Delphi.
In contrast to the carelessness with which complex mathematical functions are often treated (see above), the complex functions of OptiVec are designed in such a way as to achieve full accuracy over the complete range of input/output values possible with the respective data type.
In order to perform non-vectorized complex operations with the same level of speed and reliability as the vectorized ones, use CMATH. See CMATH.HTM for details.
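The limited argument range of textbook formulas, mentioned above, can be illustrated with the absolute value of a complex number. The C++ sketch below compares the formula sqrt(Re² + Im²) with a rescaled variant. The rescaling shown is a common textbook remedy and is not claimed to be the method used inside OptiVec or CMATH; both function names are invented for this example.

```cpp
#include <algorithm>
#include <cmath>

// Textbook formula: the squares overflow long before the result does.
double abs_textbook(double re, double im)
{
    const double sum_sq = re * re + im * im;
    return std::sqrt(sum_sq);
}

// Rescaled variant: divide by the larger component first, so that the
// value under the square root always lies between 1 and 2.
double abs_scaled(double re, double im)
{
    const double a = std::fabs(re);
    const double b = std::fabs(im);
    const double big = std::max(a, b);
    const double small = std::min(a, b);
    if (big == 0.0) {
        return 0.0;
    }
    const double r = small / big;
    return big * std::sqrt(1.0 + r * r);
}
```

For the argument 3 + 4i, both functions return 5. For 3e200 + 4e200i, the correct result 5e200 is well within the double range, but the textbook formula yields infinity because 9e400 cannot be represented, while the rescaled variant returns the correct value.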
Don't be afraid of so many data types. It is one of the advantages of modern computer languages to have them, and it is one of the disadvantages, at the same time, that a programming style is supported which mixes all the data types until it is no longer clear "who is who". In all normal cases, the VI_, VLI_, and VU_ functions should be sufficient; but keep in mind that there are more available in case you need them.
If present, the vectorized integer functions are always described together with their floating-point analogues. To obtain, for example, the VI_ version, vectors of type iVector have to be substituted for those of type fVector which are demanded by the VF_ version. In the same way, the other versions are obtained by changing "float" and "fVector" into the desired data type.
MS Visual C++ and Borland C++ Builder (but not previous Borland C++ versions): Programmers should put the directive
"using namespace OptiVec;"
either in the body of any function that uses tVecObj, or in the global declaration part of the program. Placing the directive in the function body is safer, avoiding potential namespace conflicts in other functions.
The vector objects are defined as classes vector<T>, encapsulating the vector address (pointer) and size.
For easier use, these classes have the alias names fVecObj, dVecObj, and so on, with the data type signalled by the first one or two letters of the class name, in the same way as the vector types described above.
All functions defined in VectorLib for a specific vector data-type are contained as member functions in the respective tVecObj class.
The constructors are available in four forms:
vector(); // no memory allocated, size set to 0
vector( ui size ); // vector of size elements allocated
vector( ui size, T fill ); // as before, but initialized with value "fill"
vector( vector<T> init ); // creates a copy of the vector "init"
For all vector classes, the arithmetic operators
+    -    *    /    +=    -=    *=    /=
are defined, with the exception of the polar-complex vector classes, where only multiplications and divisions, but no additions or subtractions are supported. These operators are the only cases in which you can directly assign the result of a calculation to a vector object, like
fVecObj Z = X + Y; or
fVecObj Z = X * 3.5;
Note, however, that the C++ class syntax rules do not allow a very efficient implementation of these operators. The arithmetic member functions are much faster. If speed is an issue, use
Z.addV( X, Y ); or
Z.mulC( X, 3.5 );
 instead of the operator syntax. 
The operator * refers to element-wise multiplication, not to the scalar product of two vectors.
All other arithmetic and math functions can only be called as member functions of the respective output vector as, for example, Y.exp(X). Although it would certainly be more logical to have these functions defined in such a way that you could write "Y = exp(X)" instead, the member-function syntax was chosen for efficiency considerations: The only way to implement the second variant is to store the result of the exponential function of X first in a temporary vector, which is then copied into Y, thus considerably increasing the work-load and memory demands.
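The cost difference described above can be made visible with a small sketch. `DemoVec` and its `allocations` counter are hypothetical stand-ins written only for this illustration; they are not the OptiVec classes, and the real implementation may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vector object; this is NOT the OptiVec class.
struct DemoVec {
    std::vector<float> data;
    static int allocations;   // counts the buffers created, for illustration only

    explicit DemoVec(std::size_t n) : data(n) { ++allocations; }

    // Member-function style: the result is written straight into *this.
    void addV(const DemoVec& x, const DemoVec& y) {
        assert(x.data.size() == data.size() && y.data.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = x.data[i] + y.data[i];
    }
};
int DemoVec::allocations = 0;

// Operator style: a new result vector has to be created before it can be returned.
DemoVec operator+(const DemoVec& x, const DemoVec& y) {
    DemoVec tmp(x.data.size());   // the extra buffer the member function avoids
    tmp.addV(x, y);
    return tmp;
}
```

In this sketch, `Z.addV( X, Y )` reuses the memory of Z, whereas `Z = X + Y` first fills a freshly created buffer and only then hands it over to Z.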
While most VecObj functions are member functions of the output vector, there are a number of functions which do not have an output vector. In these cases, the functions are member functions of an input vector.
Example: s = X.mean();.
If you ever need to process a VecObj vector in a "classic" plain-C VectorLib function (for example, to process only some part of it), you may use the member functions 
getSize() to retrieve its size, 
getVector() for the pointer (of data type tVector, where "t" stands for the usual type prefix), and
Pelement( n ) for a pointer to the n'th element.
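As a rough illustration of how these three accessors combine, the following sketch processes only the tail of a vector with a plain pointer-based routine. `DemoVecObj`, `scaleInPlace` and `scaleTail` are hypothetical names invented for this example; only the accessor names getSize, getVector and Pelement are taken from the text above, and their exact signatures in VecObj are assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in that only mimics the three accessors named above.
class DemoVecObj {
    std::vector<float> buf;
public:
    explicit DemoVecObj(std::size_t n, float fill = 0.0f) : buf(n, fill) {}
    std::size_t getSize() const { return buf.size(); }
    float* getVector() { return buf.data(); }
    float* Pelement(std::size_t n) {
        assert(n < buf.size());
        return buf.data() + n;
    }
};

// A "classic" plain-C style routine: it works on a raw pointer plus a length.
void scaleInPlace(float* x, std::size_t size, float c) {
    for (std::size_t i = 0; i < size; ++i) x[i] *= c;
}

// Scale only the elements from index `first` up to the end of the vector.
void scaleTail(DemoVecObj& v, std::size_t first, float c) {
    assert(first < v.getSize());
    scaleInPlace(v.Pelement(first), v.getSize() - first, c);
}
```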
The syntax of all VecObj functions is described in FUNCREF.HTM together with the basic VectorLib functions for which tVecObj serves as a wrapper.
The following functions manage dynamically allocated vectors:
| VF_vector | memory allocation for one vector | 
| VF_vector0 | memory allocation and initialization of all elements with 0 | 
| V_free | free one vector | 
| V_nfree | free n vectors (only for C, not for Pascal) | 
| V_freeAll | free all existing vectors | 
| C/C++: X = VF_vector( 3*size); Z = (Y = X+size) + size; | Pascal/Delphi: X := VF_vector( 3*size ); Y := VF_Pelement( X, size ); Z := VF_Pelement( Y, size ); | 
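The pointer arithmetic in the C/C++ line above carves three vectors out of a single allocation. The sketch below shows the same idea with the standard `calloc` in place of VF_vector, so that it can be compiled without the library; `Triple`, `allocTriple` and `freeTriple` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Three equally sized vectors that share one allocation.
struct Triple {
    float* X;   // start of the block; in this sketch the only pointer passed to free()
    float* Y;
    float* Z;
};

// Stand-in for the allocation call: plain calloc is used here instead of VF_vector.
Triple allocTriple(std::size_t size) {
    Triple t;
    t.X = static_cast<float*>(std::calloc(3 * size, sizeof(float)));
    assert(t.X != nullptr);
    t.Z = (t.Y = t.X + size) + size;   // same pointer arithmetic as in the line above
    return t;
}

void freeTriple(Triple& t) {
    std::free(t.X);                    // Y and Z point into the same block
    t.X = t.Y = t.Z = nullptr;
}
```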
The following functions are used to initialize or re-initialize vectors that have already been created:
| VF_equ0 | set all elements of a vector equal to 0 | 
| VF_equ1 | set all elements equal to 1 | 
| VF_equm1 | set all elements equal to -1 | 
| VF_equC | set all elements equal to a constant C | 
| VF_equV | make one vector a copy of another | 
| VFx_equV | "expanded" version of the equality operation: Yi = a * Xi + b | 
| VF_ramp | "ramp": Xi = a * i + b. | 
| VF_random | high-quality random numbers | 
| VF_noise | white noise | 
| VF_comb | "comb": equals a constant C at equidistant points, elsewhere 0 | 
| VF_Hanning | Hanning window | 
| VF_Parzen | Parzen window | 
| VF_Welch | Welch window | 
| VF_ReImtoC | merge two vectors, Re and Im, into one cartesian complex vector | 
| VF_RetoC | overwrite the real part of a cartesian complex vector | 
| VF_ImtoC | overwrite the imaginary part of a cartesian complex vector | 
| VF_PolartoC | construct a cartesian complex vector from polar coordinates, entered as separate vectors Mag and Arg | 
| VF_MagArgtoP | merge two vectors, Mag and Arg into one polar complex vector | 
| VF_MagArgtoPrincipal | merge two vectors, Mag and Arg into one polar complex vector, reducing the Arg range to the principal value, -π < Arg <= +π | 
| VF_MagtoP | overwrite the Mag part of a polar complex vector | 
| VF_ArgtoP | overwrite the Arg part of a polar complex vector | 
| VF_ReImtoP | construct a polar complex vector from cartesian coordinates, entered as separate vectors Re and Im | 
| VF_rev | reverse the element ordering | 
| VF_reflect | set the upper half of a vector equal to the reversed lower half | 
| VF_rotate | rotate the ordering of the elements | 
| VF_insert | insert one element into a vector | 
| VF_delete | delete one element from a vector | 
| VF_sort | fast sorting of the elements (ascending or descending order) | 
| VF_sortind | sorting of an index array associated with a vector | 
| VF_subvector | extract a subvector from a (normally larger) vector, using a constant sampling interval. | 
| VF_indpick | fills a vector with elements "picked" from another vector according to their indices. | 
| VF_indput | distribute the elements of one vector to the sites of another vector specified by their indices. | 
| VF_searchC | search for the element of a vector that is closest to a pre-set value C (closest, closest larger-or-equal, or closest smaller-or-equal value, depending on a parameter "mode") | 
| VF_searchV | the same, but for a whole array of pre-set values | 
| VF_polyinterpol | polynomial interpolation | 
| VF_ratinterpol | rational interpolation | 
| VF_splineinterpol | cubic spline interpolation | 
| V_FtoD | float to double | 
| V_CDtoCF | complex<double> to complex<float> (with overflow protection) | 
| V_PFtoPE | polar<float> to polar<extended> | 
| VF_PtoC | polar<float> to complex<float> | 
| V_ItoLI | int to long int | 
| V_ULtoUS | unsigned long to unsigned short | 
| V_ItoU | signed int to unsigned int. Interconversions between signed and unsigned types can only be performed on the same level of accuracy. Functions like "V_UStoLI" do not exist. | 
| V_ItoF | int to float | 
| VF_roundtoI | round to the closest integer | 
| VF_choptoI | round by neglecting ("chopping off") the fractional part | 
| VF_trunctoI | the same as VF_choptoI | 
| VF_ceiltoI | round to the next greater-or-equal integer | 
| VF_floortoI | round to the next smaller-or-equal integer | 
| VF_ReImtoC | form a cartesian complex vector out of its real and imaginary parts | 
| VF_RetoC | overwrite the real part | 
| VF_ImtoC | overwrite the imaginary part | 
| VF_CtoReIm | extract the real and imaginary parts | 
| VF_CtoRe | extract the real part | 
| VF_CtoIm | extract the imaginary part | 
| VF_PolartoC | form a cartesian complex vector out of polar coordinates, entered as separate vectors Mag and Arg | 
| VF_CtoPolar | transform cartesian complex into polar coordinates, returned in the separate vectors Mag and Arg | 
| VF_CtoAbs | absolute value (magnitude of the pointer in the complex plane) | 
| VF_CtoArg | argument (angle of the pointer in the complex plane) | 
| VF_CtoNorm | norm (here defined as the square of the absolute value) | 
| VCF_normtoC | norm, stored as a cartesian complex vector (with all imaginary parts equal to 0) | 
| VF_MagArgtoP | merge two vectors, Mag and Arg into one polar complex vector | 
| VF_MagArgtoPrincipal | merge two vectors, Mag and Arg into one polar complex vector, reducing the Arg range to the principal value, -π < Arg <= +π | 
| VF_MagtoP | overwrite the Mag part of a polar complex vector | 
| VF_ArgtoP | overwrite the Arg part of a polar complex vector | 
| VF_PtoMagArg | extract the Mag and Arg parts | 
| VF_PtoMag | extract the Mag part | 
| VF_PtoArg | extract the Arg part | 
| VF_PtoNorm | norm (here defined as the square of the magnitude) | 
| VF_ReImtoP | construct a polar complex vector from cartesian coordinates, entered as separate vectors Re and Im | 
| VF_PtoReIm | transform a polar complex vector into two real vectors, representing the corresponding cartesian coordinates Re and Im | 
| VF_PtoRe | calculate the real part of the polar complex input numbers | 
| VF_PtoIm | calculate the imaginary part of the polar complex input numbers | 
| VPF_principal | calculate the principal value. You might recall that each complex number has an infinite number of representations in polar coordinates, with the angles differing by an integer multiple of 2π. The representation with -π < Arg <= +π is called the principal value. | 
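The reduction of an angle to its principal value can be sketched for a single number as follows. `principalArg` is a hypothetical helper written for this illustration (angles in radians); it shows the arithmetic only and is not how the vectorized function is implemented.

```cpp
#include <cassert>
#include <cmath>

// Reduce an angle (in radians) to the principal range: greater than -pi, at most +pi.
double principalArg(double arg) {
    assert(std::isfinite(arg));
    const double pi = std::acos(-1.0);
    double r = std::fmod(arg, 2.0 * pi);   // magnitude below 2*pi, same sign as arg
    if (r > pi)        r -= 2.0 * pi;      // fold the upper overshoot back
    else if (r <= -pi) r += 2.0 * pi;      // fold the lower overshoot back
    return r;
}
```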
In addition to this error handling "by element", the return values of the VectorLib math functions show if all elements have been processed successfully. In C/C++, the return value is of the data-type int, in Pascal/Delphi, it is IntBool. (We do not yet use the newly introduced data type bool for this return value in C/C++, in order to make VectorLib compatible also with older versions of C compilers.) If a math function worked error-free, the return value is FALSE (0), otherwise it is TRUE (any non-zero number).
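The return-value convention (FALSE for success, non-zero for at least one failed element) can be sketched like this. `demoSqrt` is a hypothetical routine, not an OptiVec function, and the value stored for an invalid element is chosen arbitrarily for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical routine (not an OptiVec function): element-wise square root.
// Returns 0 (FALSE) if every element was processed without error,
// and a non-zero value (TRUE) if at least one element was invalid.
int demoSqrt(float* y, const float* x, std::size_t size) {
    assert(y != nullptr && x != nullptr);
    int errorOccurred = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (x[i] < 0.0f) {
            y[i] = 0.0f;          // placeholder result, chosen only for this sketch
            errorOccurred = 1;    // remember the failure, but keep processing
        } else {
            y[i] = std::sqrt(x[i]);
        }
    }
    return errorOccurred;
}
```

A caller would typically test the result in the form `if( demoSqrt( Y, X, size ) ) { /* at least one element failed */ }`.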
| VF_round | round to the closest integer | 
| VF_chop | round by neglecting ("chopping off") the fractional part | 
| VF_trunc | the same as VF_chop | 
| VF_ceil | round to the next greater-or-equal integer | 
| VF_floor | round to the next smaller-or-equal integer | 
| VF_roundtoI | round to the closest integer | 
| VF_choptoI | round by neglecting ("chopping off") the fractional part | 
| VF_trunctoI | the same as VF_choptoI | 
| VF_ceiltoI | round to the next greater-or-equal integer | 
| VF_floortoI | round to the next smaller-or-equal integer | 
| VF_choptoSI | neglect the fractional part and store as short int / SmallInt | 
| VF_ceiltoLI | round up and store as long int / LongInt | 
| VF_floortoQI | round downwards and store as quadruple integer, quad / QuadInt | 
| VF_roundtoU | round and store as unsigned / UInt | 
| VF_ceiltoUS | round up and store as unsigned short / USmall | 
| VD_choptoUL | neglect the fractional part and store as unsigned long / ULong | 
| VF_cmp0 | compare to 0 | 
| VD_cmpC | compare to a constant C | 
| VE_cmpV | compare corresponding vector elements | 
| VF_cmp_eq0 | check if equal to 0 | 
| VD_cmp_gtC | check if greater than a constant C | 
| VE_cmp_leV | check if less than or equal to corresponding vector element | 
| VF_cmp_neCind | store indices of elements not equal to a constant C | 
| VD_cmp_lt0ind | store indices of elements less than 0 | 
| VE_cmp_geVind | store indices of elements greater than or equal to corresponding vector elements | 
| VF_cmp_inclrange0C | check if 0 <= x <= C  (C positive) or 0 >= x >= C (C negative) | 
| VF_cmp_exclrange0C | check if 0 < x < C  (C positive) or 0 > x > C (C negative) | 
| VF_cmp_inclrangeCC | check if CLo <= x <= CHi | 
| VF_cmp_exclrangeCC | check if CLo < x < CHi | 
| VF_cmp_inclrange0Cind | store indices of elements 0 <= x <= C  (C positive) or 0 >= x >= C (C negative) | 
| VF_cmp_exclrange0Cind | store indices of elements 0 < x < C  (C positive) or 0 > x > C (C negative) | 
| VF_cmp_inclrangeCCind | store indices of elements CLo <= x <= CHi | 
| VF_cmp_exclrangeCCind | store indices of elements CLo < x < CHi | 
| VF_iselementC | returns TRUE, if C is an element of a vector | 
| VF_iselementV | checks for each element of a vector if it is contained in a table | 
| VI_shl | shift the bits to the left | 
| VI_shr | shift the bits to the right | 
| VI_or | apply a bit mask in an OR operation | 
| VI_xor | apply a bit mask in an XOR operation | 
| VI_not | invert all bits | 
| VF_neg | Yi = - Xi | 
| VF_abs | Yi = | Xi | | 
| VCF_conj | Yi.Re = Xi.Re; Yi.Im = -(Xi.Im) | 
| VF_inv | Yi = 1.0 / Xi | 
All functions in the right column of the above two sections also exist in an expanded form (with the prefix VFx_...) in which the function is not evaluated for Xi itself, but for the expression 
(a * Xi + b), e.g. 
| VFx_addV | Zi = (a * Xi + b) + Yi | 
| VFx_divrV | Zi = Yi / (a * Xi + b) | 
| VFs_addV | Zi = C * (Xi + Yi) | 
| VFs_subV | Zi = C * (Xi - Yi) | 
| VFs_mulV | Zi = C * (Xi * Yi) | 
| VFs_divV | Zi = C * (Xi / Yi) | 
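The formulas in the rows above translate directly into element-wise loops. The two functions below are hypothetical plain-C++ equivalents of the "expanded" and the "scaled" addition, written only to spell out the arithmetic; their names and parameter order are assumptions and do not reproduce the library's signatures.

```cpp
#include <cassert>
#include <cstddef>

// Sketch of the "expanded" addition: Zi = (a * Xi + b) + Yi
void demoAddVExpanded(float* z, const float* x, const float* y,
                      std::size_t size, float a, float b) {
    assert(z != nullptr && x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < size; ++i)
        z[i] = (a * x[i] + b) + y[i];
}

// Sketch of the "scaled" addition: Zi = C * (Xi + Yi)
void demoAddVScaled(float* z, const float* x, const float* y,
                    std::size_t size, float c) {
    assert(z != nullptr && x != nullptr && y != nullptr);
    for (std::size_t i = 0; i < size; ++i)
        z[i] = c * (x[i] + y[i]);
}
```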
| VF_maxC | set Yi equal to Xi or C, whichever is greater | 
| VF_minC | choose the smaller of Xi and C | 
| VF_maxV | set Zi equal to Xi or Yi, whichever is greater | 
| VF_minV | set Zi equal to Xi or Yi, whichever is smaller | 
| VF_limit | limit the range of values | 
| VF_flush0 | set all values to zero which are below a preset threshold | 
| VF_intfrac | split into integer and fractional parts | 
| VF_mantexp | split into mantissa and exponent | 
| VF_accV | fVector Y += fVector X | 
| VD_accVF | dVector Y += fVector X | 
| VF_accVI | fVector Y += iVector X | 
| VQI_accVLI | qiVector Y += liVector X | 
| VF_acc2V | fVector Y += fVector X1 + fVector X2 | 
| VD_acc2VF | dVector Y += fVector X1 + fVector X2 | 
| VF_scalprod | scalar product of two vectors | 
| VF_xprod | cross-product (or vector product) of two vectors | 
| VF_Euclid | Euclidean norm | 
If, on the other hand, two real input vectors X and Y, or one complex input vector XY, define the coordinates of several points in a planar coordinate system, there is a function to rotate these coordinates:
| VF_rotateCoordinates | counter-clockwise rotation of the input coordinates specified by the vectors X and Y; the result is returned in the vectors Xrot and Yrot. | 
| VCF_rotateCoordinates | counter-clockwise rotation of the input coordinates specified by the cartesian complex vector XY; the result is returned in the vector XYrot. | 
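A counter-clockwise rotation of planar coordinates can be sketched as follows. `demoRotateCoordinates` is a hypothetical helper; it assumes that the angle is given in radians and that the rotation is about the origin, neither of which is stated above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Rotate the points (x[i], y[i]) counter-clockwise by `angle` (radians, assumed)
// about the origin (assumed) and store the results in xrot and yrot.
void demoRotateCoordinates(double* xrot, double* yrot,
                           const double* x, const double* y,
                           std::size_t size, double angle) {
    assert(xrot != nullptr && yrot != nullptr && x != nullptr && y != nullptr);
    const double c = std::cos(angle);
    const double s = std::sin(angle);
    for (std::size_t i = 0; i < size; ++i) {
        const double xi = x[i];   // read both inputs first, so that the
        const double yi = y[i];   // outputs may overwrite the inputs
        xrot[i] = c * xi - s * yi;
        yrot[i] = s * xi + c * yi;
    }
}
```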
| normal version | unprotected version | operation | 
| VF_square | VFu_square | square | 
| VF_cubic | VFu_cubic | cubic | 
| VF_quartic | VFu_quartic | quartic (fourth power) | 
| VF_ipow | VFu_ipow | arbitrary integer powers | 
| VF_pow | n.a. | fractional powers | 
| VF_powexp | n.a. | fractional powers, multiplied by exponential function: x^r * exp(x) | 
| VF_poly | VFu_poly | polynomial | 
| VF_pow10 | fractional powers of 10 | 
| VF_ipow10 | integer powers of 10 (stored as floating-point numbers) | 
| VF_pow2 | fractional powers of 2 | 
| VF_ipow2 | integer powers of 2 (stored as floating-point numbers) | 
| VF_exp | exponential function | 
| VF_exp10 | exponential function to the basis 10 (identical to VF_pow10) | 
| VF_exp2 | exponential function to the basis 2 (identical to VF_pow2) | 
| VF_expArbBase | exponential function of an arbitrary base | 
| VF_sqrt | square-root (which corresponds to a power of 0.5) | 
The complex-number equivalents are available as well, both for cartesian and polar coordinates. Additionally, two special cases are covered:
 
| VCF_powReExpo | real, fractional powers of complex numbers | 
| VCF_exptoP | takes a cartesian input vector, returning its exponential function in polar coordinates. | 
| VF_exp | exponential function | |
| VF_expc | complementary exponential function Yi = 1 - exp[Xi] | |
| VF_expmx2 | exponential function of the negative square of the argument, Yi = exp( -Xi² ). This is a bell-shaped function. | |
| VF_Gauss | Gaussian distribution function | |
| VF_erf | Error function (Integral over the Gaussian distribution) | |
| VF_erfc | complementary error function, 1 - erf( Xi ) | |
| VF_powexp | n.a. | fractional powers, multiplied by exponential function, Xi^r * exp(Xi) | 
| VF_sinh | hyperbolic sine | 
| VF_cosh | hyperbolic cosine | 
| VF_tanh | hyperbolic tangent | 
| VF_coth | hyperbolic cotangent | 
| VF_sech | hyperbolic secant | 
| VF_cosech | hyperbolic cosecant | 
| VF_sech2 | square of the hyperbolic secant | 
| VF_log10 | decadic logarithm (to the basis 10) | 
| VF_log | natural logarithm (to the basis e) | 
| VF_ln | synonym for VF_log | 
| VF_log2 | binary logarithm (to the basis 2) | 
| VPF_log10toC | decadic logarithm (to the basis 10) | 
| VPF_logtoC | natural logarithm (to the basis e) | 
| VPF_lntoC | synonym for VPF_logtoC | 
| VPF_log2toC | binary logarithm (to the basis 2) | 
| VF_OD | OD = log10( X0/X ) for fVector as input and as output | 
| VF_ODwDark | OD = log10( (X0-X0Dark) / (X-XDark) ) for fVector as input and as output | 
| VUS_ODtoF | OD, calculated in float precision for usVector input | 
| VUL_ODtoD | OD, calculated in double precision for ulVector input | 
| VQI_ODtoEwDark | OD with dark-current correction, calculated in extended precision for qiVector input | 
| VF_sin | sine | 
| VFr_sin | extra-fast "reduced-range" sine function for -2π <= Xi <= +2π | 
| VF_cos | cosine | 
| VFr_cos | cosine for -2π <= Xi <= +2π | 
| VF_sincos | sine and cosine at once | 
| VFr_sincos | sine and cosine for -2π <= Xi <= +2π | 
| VF_tan | tangent | 
| VF_cot | cotangent | 
| VF_sec | secant | 
| VF_cosec | cosecant | 
| VF_sin2 | sine² | 
| VFr_sin2 | sine² for -2π <= Xi <= +2π | 
| VF_cos2 | cosine² | 
| VFr_cos2 | cosine² for -2π <= Xi <= +2π | 
| VF_sincos2 | sine² and cosine² at once | 
| VFr_sincos2 | sine² and cosine² for -2π <= Xi <= +2π | 
| VF_tan2 | tangent² | 
| VF_cot2 | cotangent² | 
| VF_sec2 | secant² | 
| VF_cosec2 | cosecant² | 
| VF_sinrpi | sine of p/q * π | 
| VF_cosrpi | cosine of p/q * π | 
| VF_sincosrpi | sine and cosine of p/q * π at once | 
| VF_tanrpi | tangent of p/q * π | 
| VF_cotrpi | cotangent of p/q * π | 
| VF_secrpi | secant of p/q * π | 
| VF_cosecrpi | cosecant of p/q * π | 
| VF_sinrpi2 | sine of p / 2^n * π | 
| VF_tanrpi3 | tangent of p / (3*n) * π | 
| VF_sinc | sinc function, Yi = sin( Xi ) / Xi | 
| VF_Kepler | Kepler function, calculating the time-dependent angular position of a planet or comet | 
| VF_asin | arc sin | 
| VF_acos | arc cos | 
| VF_atan | arc tan | 
| VF_atan2 | arc tan of ratios, Zi = atan( Yi / Xi ) | 
| VF_derivV | derivative of a Y-array with respect to an X-array | 
| VF_derivC | the same for constant intervals between the X-values | 
| VF_integralV | value of the integral of a Y-array over an X-array | 
| VF_runintegralV | point-by-point ("running") integral | 
| VF_integralC | integral over an equally spaced X-axis | 
| VF_runintegralC | point-by-point integral over an equally spaced X-axis | 
| VF_ismonoton | test if an array is monotonically rising or falling | 
| VF_iselementC | test, if a given value occurs within a vector | 
| VF_searchC | search an ordered table for the entry whose value comes closest to a preset value C | 
| VF_localmaxima | detect local maxima (points whose right and left neighbours are smaller) | 
| VF_localminima | detect local minima (points whose right and left neighbours are larger) | 
| VF_max | detect global maximum | 
| VF_min | detect global minimum | 
| VF_maxind | global maximum and its index | 
| VF_minind | global minimum and its index | 
| VF_absmax | global maximum absolute value | 
| VF_absmin | global minimum absolute value | 
| VF_absmaxind | global maximum absolute value and its index | 
| VF_absminind | global minimum absolute value and its index | 
| VF_maxexp | global maximum exponent | 
| VF_minexp | global minimum exponent | 
| VF_runmax | "running" maximum | 
| VF_runmin | "running" minimum | 
The complex equivalents of the last group of functions are:
 
| VCF_maxReIm | maximum real and imaginary parts separately | 
| VCF_minReIm | minimum real and imaginary parts separately | 
| VCF_absmaxReIm | maximum absolute real and imaginary values separately | 
| VCF_absminReIm | minimum absolute real and imaginary values separately | 
| VCF_absmax | largest magnitude (absolute value; this is a real number) | 
| VCF_absmin | smallest magnitude | 
| VCF_cabsmax | complex number of largest magnitude | 
| VCF_cabsmin | complex number of smallest magnitude | 
| VCF_sabsmax | complex number for which the sum |Re| + |Im| is largest | 
| VCF_sabsmin | smallest complex number in terms of the sum |Re| + |Im| | 
| VCF_absmaxind | largest magnitude (absolute value) and its index | 
| VCF_absminind | smallest magnitude and its index | 
| VF_FFTtoC | forward Fast Fourier Transform (FFT) of a real vector; the result is a cartesian complex vector | 
| VF_FFT | forward and backward FFT of a real vector; the result of the forward FFT is packed into a real vector of the same size as the input vector | 
| VCF_FFT | forward and backward FFT of a complex vector | 
| VF_convolve | convolution with a given response function | 
| VF_deconvolve | deconvolution, assuming a given response function | 
| VF_filter | spectral filtering | 
| VF_spectrum | spectral analysis | 
| VF_autocorr | autocorrelation function of a data array | 
| VF_xcorr | cross-correlation function of two arrays | 
| VF_setRspEdit | set editing threshold for the filter in convolutions and deconvolutions (decides over the treatment of "lost" frequencies) | 
| VF_getRspEdit | retrieve the current editing threshold | 
The FFT algorithm chosen for this PC implementation is a radix-2 Cooley-Tukey routine. Only with this radix-2 algorithm do the eight available coprocessor registers suffice to hold all intermediate results of the inner transform loop. Radix-4 and radix-8 routines need fewer multiplications, but they have to store intermediate results in memory, which makes them less efficient than the routine chosen.
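For readers unfamiliar with the radix-2 Cooley-Tukey scheme, a textbook version is sketched below. It is a generic illustration only: `demoFFT` is a hypothetical name, the forward-transform sign convention (negative exponent) is an assumption, and none of the register-level optimization of the library routine is reflected here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook in-place radix-2 decimation-in-time FFT.
// The size must be a power of 2.
void demoFFT(std::vector<std::complex<double>>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Step 1: reorder the input by bit-reversed index.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    // Step 2: butterfly passes; each pass merges pairs of half-length transforms.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::complex<double> wlen =
            std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t start = 0; start < n; start += len) {
            std::complex<double> w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<double> u = a[start + k];
                const std::complex<double> v = a[start + k + len / 2] * w;
                a[start + k] = u + v;             // upper output of the butterfly
                a[start + k + len / 2] = u - v;   // lower output of the butterfly
                w *= wlen;
            }
        }
    }
}
```

Each butterfly needs only the two inputs, the twiddle factor and the two outputs at any one time, which is what keeps the working set of a radix-2 inner loop small.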
There are three different versions of all FFT-based functions. Depending on the memory model, one of them is chosen automatically. You may, however, explicitly specify the one you wish to employ. 
As all FFT-based matrix functions internally rely on VF_FFT, all of them exist in three versions as well. Here, the prefixes are MFp_, MFl_ and MFs_ in the real-number case, or MCFp_, MCFl_ and MCFs_ in the complex case. Similarly to the one-dimensional case, the functions with the "normal" prefix (MF_, MCF_) will automatically be redirected to the MFp_, MFl_ or MFs_ variant, as determined by the memory model.
Although it does not use Fourier transform methods, VF_smooth should be remembered here as a crude form of frequency filtering which removes high-frequency noise.
| VF_sum | sum of all elements | 
| VI_fsum | sum of all elements of an integer vector, accumulated as a floating point number in double or extended precision | 
| VF_prod | product of all elements | 
| VF_ssq | sum-of-squares of all elements | 
| VF_sumabs | sum of absolute values of all elements | 
| VF_rms | root-of-the-mean-square of all elements | 
| VF_runsum | running sum | 
| VF_runprod | running product | 
| VF_sumdevC | sum over the deviations from a preset constant, sum( |Xi-C| ) | 
| VF_sumdevV | sum over the deviations from another vector, sum( |Xi-Yi| ) | 
| VF_avdevC | average deviation from a preset constant, 1/N * sum( |Xi-C| ) | 
| VF_avdevV | average deviation from another vector, 1 / N * sum( |Xi-Yi| ) | 
| VF_ssqdevC | sum-of-squares of the deviations from a preset constant, sum( (Xi - C)² ) | 
| VF_ssqdevV | sum-of-squares of the deviations from another vector, sum( (Xi - Yi)² ) | 
| VF_chi2 | chi-square merit function | 
| VF_chiabs | "robust" merit function, similar to VF_chi2, but based on absolute instead of squared deviations | 
| VF_mean | equally-weighted mean (or average) of all elements | 
| VF_meanwW | "mean with weights" of all elements | 
| VF_meanabs | equally-weighted mean (or average) of the absolute values of all elements | 
| VF_selected_mean | averages only those vector elements which fall into a specified range, so that outlier points can be excluded from the calculation of the mean | 
| VF_varianceC | variance of a distribution with respect to a preset constant value | 
| VF_varianceCwW | the same with non-equal weighting | 
| VF_varianceV | variance of one distribution with respect to another | 
| VF_varianceVwW | the same with non-equal weighting | 
| VF_meanvar | mean and variance of a distribution simultaneously | 
| VF_meanvarwW | the same with non-equal weighting | 
| VF_median | median of a distribution | 
| VF_corrcoeff | linear correlation coefficient of two distributions | 
| VF_distribution | bins data into a discrete one-dimensional distribution function | 
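The deviation formulas quoted in the table above, sum( |Xi-C| ), 1/N * sum( |Xi-C| ) and sum( (Xi - C)² ), can be written out as simple loops. The three functions below are hypothetical scalar sketches of those formulas, not the library routines, and their names and signatures are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sum over the absolute deviations from a constant: sum( |Xi - C| )
double demoSumDevC(const double* x, std::size_t size, double c) {
    double sum = 0.0;
    for (std::size_t i = 0; i < size; ++i) sum += std::fabs(x[i] - c);
    return sum;
}

// Average deviation from a constant: 1/N * sum( |Xi - C| )
double demoAvDevC(const double* x, std::size_t size, double c) {
    assert(size > 0);
    return demoSumDevC(x, size, c) / static_cast<double>(size);
}

// Sum-of-squares of the deviations from a constant: sum( (Xi - C)^2 )
double demoSsqDevC(const double* x, std::size_t size, double c) {
    double sum = 0.0;
    for (std::size_t i = 0; i < size; ++i) {
        const double d = x[i] - c;
        sum += d * d;
    }
    return sum;
}
```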
A detailed description of the various data-fitting concepts is given in chapter 13 of MATRIX.HTM. Therefore, the available X-Y fitting functions are only summarized here in the following table:
 
| VF_linregress | equally-weighted linear regression on X-Y data | 
| VF_linregresswW | the same with non-equal weighting | 
| VF_polyfit | fitting of one X-Y data set to a polynomial | 
| VF_polyfitwW | the same for non-equal data-point weighting | 
| VF_linfit | fitting of one X-Y data set to an arbitrary function linear in its parameters | 
| VF_linfitwW | the same for non-equal data-point weighting | 
| VF_setLinfitNeglect | set threshold to neglect (i.e. set equal to zero) a fitting parameter A[i], if its significance is smaller than the threshold | 
| VF_getLinfitNeglect | retrieve current significance threshold | 
| VF_nonlinfit | fitting of one X-Y data set to an arbitrary, possibly non-linear function | 
| VF_nonlinfitwW | the same for non-equal data-point weighting | 
| VF_multiLinfit | fitting of multiple X-Y data sets to one common linear function | 
| VF_multiLinfitwW | the same for non-equal data-point weighting | 
| VF_multiNonlinfit | fitting of multiple X-Y data sets to one common nonlinear function | 
| VF_multiNonlinfitwW | the same for non-equal data-point weighting | 
| VF_cprint | print the elements of a vector to the screen (or "console", hence the "c" in the name) into the current text window, automatically detecting its height and width. After printing one page, the user is prompted to continue. (Only for DOS) | 
| VF_print | is similar to VF_cprint in that the output is directed to the screen, but there is no automatic detection of the screen data; a default linewidth of 80 characters is assumed, and no division into pages is made. (Only for DOS and EasyWin) | 
| VF_fprint | print a vector to a stream. | 
| VF_write | write data in ASCII format in a stream | 
| VF_read | read a vector from an ASCII file | 
| VF_nwrite | write n vectors of the same data type as the columns of a table into a stream | 
| VF_nread | read the columns of a table into n vectors of the same type | 
| VF_store | store data in binary format | 
| VF_recall | retrieve data in binary format | 
The following functions allow you to modify the standard settings of VF_write, VF_nwrite and VI_read:
| VF_setWriteFormat | define a certain number format | 
| VF_setWriteSeparate | define a separation string between successive elements, written by VF_write | 
| VF_setNWriteSeparate | define a separation string between the columns written by VF_nwrite | 
| V_setRadix | define a radix different from the standard of 10 for the whole-number variants of the V.._read functions | 
| V_initPlot | initialize VectorLib graphics functions (both Windows and DOS). For Windows, no shut-down is needed at the end, since the Windows graphics functions always remain accessible. V_initPlot automatically reserves a part of the screen for plotting operations. This part comprises about 2/3 of the screen on the right side. Above, one line is left for a heading. Below, a few lines are left empty. To change this default plotting region, call V_setPlotRegion after V_initPlot. | 
| V_initGraph | simultaneously initialize Borland's graphics interface and VectorLib plotting functions (DOS only). Do not call initgraph after V_initGraph. If, on the other hand, you have already called initgraph, do not use V_initGraph, but V_initPlot instead. At the end of the graphics session, the BGI function closegraph has to be used to leave the graphics mode and to release graphics buffer memory. | 
| V_initPrint | initialize VectorLib graphics functions and direct them to a printer (Windows only). By default, one whole page is reserved for plotting. In order to change this, call V_setPlotRegion after V_initPrint. | 
| V_setPlotRegion | set a plotting region different from the default | 
VectorLib distinguishes between two sorts of plotting functions, AutoPlot and DataPlot. All AutoPlot functions (e.g., VF_xyAutoPlot) execute the following steps:
| VF_xyAutoPlot | display an automatically-scaled plot of an X-Y vector pair | 
| VF_yAutoPlot | plot a single Y-vector, using the index as X-axis | 
| VF_xy2AutoPlot | plot two X-Y pairs at once, scaling the axes in such a way that both vectors fit into the same coordinate system | 
| VF_y2AutoPlot | the same for two Y-vectors, plotted against their indices | 
| VF_xyDataPlot | plot one additional set of X-Y data | 
| VF_yDataPlot | plot one additional Y vector over its index | 
Cartesian complex arrays are plotted in the complex plane (the imaginary parts versus the real parts), using
 
| VCF_autoPlot | plot one cartesian complex vector | 
| VCF_2AutoPlot | plot two cartesian complex vectors simultaneously | 
| VCF_dataPlot | plot one additional cartesian complex vector | 
At present, there are no plotting functions for polar complex vectors included.
It is possible to draw more than one coordinate system into a given window on the screen. The position of each coordinate system must be specified by the above-mentioned function V_setPlotRegion. "Hopping" between the different coordinate systems and adding new DataPlots after defining new viewports (e.g., for text output) is made possible by the following functions:
 
| V_continuePlot | go back to the viewport of the last plot and restore its scalings | 
| V_getCoordSystem | get a copy of the scalings and position of the current coordinate system | 
| V_setCoordSystem | restore the scalings and position of a coordinate system; these must have been stored previously, using V_getCoordSystem | 
DOS only:
When using multiple coordinate systems on the same screen, the default font used for axis labeling might be too large, so that neighbouring labels overlap each other. In these cases, use the BGI function settextstyle to switch to another font before calling an AutoPlot function.
In the "expanded" versions of all functions with extended accuracy (those with the prefixes VEx_ and VCEx_; for example VEx_exp), there is generally no overflow protection for the calculation of A*Xi+B, but only for the core of the function itself and for the final multiplication by C.
A series of identical errors occurring within one and the same OptiVec function leads to one error message only. Subsequent identical messages are suppressed.
There is a fundamental difference between floating-point and integer numbers with respect to OVERFLOW and DOMAIN errors: for floating-point numbers, these are always serious errors, whereas for integer numbers, by virtue of the implicit modulo-2^n arithmetic, this is not necessarily the case. In the following two paragraphs, details are given on the error handling of integer and floating-point numbers, respectively.
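The implicit modulo arithmetic of integers can be illustrated with 16-bit unsigned numbers, where every result is reduced modulo 2^16. `demoMulWrap` and `demoMulOverflows` are hypothetical helpers written for this sketch; they are unrelated to the library's own overflow detection.

```cpp
#include <cassert>
#include <cstdint>

// Multiply two 16-bit unsigned values; the result is reduced modulo 2^16.
std::uint16_t demoMulWrap(std::uint16_t a, std::uint16_t b) {
    // Widen first, so that the product itself is computed without overflow,
    // then keep only the low 16 bits.
    const std::uint32_t full = static_cast<std::uint32_t>(a) * b;
    return static_cast<std::uint16_t>(full & 0xFFFFu);
}

// Report whether the exact product would not fit into 16 bits.
bool demoMulOverflows(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint32_t>(a) * b > 0xFFFFu;
}
```

Whether a wrapped result such as 1000 * 100 = 34464 (instead of 100000) is an error depends on the application: in hash or checksum calculations it is intended, in a physical measurement it is not.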
| ierrNote | print an error message | 
| ierrAbort | print an error message and exit the program | 
| ierrIgnore | ignore the problem. With this last option, the error handling can be switched off temporarily. | 
Although you may use a call to
V_setIntErrorHandling( ierrIgnore );
to switch the error handling off, it is always better simply to use the "normal" VI_ version rather than the VIo_ version with the error-handling short-cut, as the normal version is always much faster.
C/C++ only:
To choose the overflow-detecting version not only for single function calls, but everywhere, the easiest way is to define the symbolic constant V_trapIntError in the program header before(!) <VecLib.h> is included:
Example:
#define V_trapIntError 1
#include <VIstd.h>
#include <VImath.h>
.....
main() /* or WinMain(), or OwlMain() */
{
  iVector I1, I2;
  I1 = VI_vector( 1000 ); I2 = VI_vector( 1000 );
  V_setIntErrorHandling( ierrNote );
  VI_ramp( I1, 1000, 0, 50 ); /* an overflow will occur here! */
  V_setIntErrorHandling( ierrIgnore );
  VI_mulC( I2, I1, 1000, 5 );
    /* here, even a whole series of overflows will occur; they are all ignored. */
  ....
}
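To see where the overflow in the ramp above comes from, the calculation can be reproduced in plain C. This is only a sketch under the assumption that the integer type has 16 bits (with a 32-bit int, this particular ramp would not overflow). The helper ramp_i16 is hypothetical and does not show how OptiVec detects overflows internally.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper, not part of OptiVec: fill X[i] = start + i*step.
   Each value is first computed exactly in 64 bits, so that an overflow of
   the 16-bit range can be detected before the value is wrapped.
   Returns the index of the first overflowing element, or size if none. */
size_t ramp_i16( int16_t *X, size_t size, int16_t start, int16_t step )
{
    size_t first_overflow = size;
    for( size_t i = 0; i < size; i++ )
    {
        int64_t exact = (int64_t)start + (int64_t)i * (int64_t)step;
        if( first_overflow == size && (exact > INT16_MAX || exact < INT16_MIN) )
            first_overflow = i;
        /* Reduce modulo 2^16. The final conversion to int16_t is
           implementation-defined in C, but wraps on common compilers. */
        X[i] = (int16_t)(uint16_t)exact;
    }
    return first_overflow;
}
```

With size 1000, start 0 and step 50, element 655 still holds 32750, while element 656 would need the value 32800, which exceeds 32767 and wraps to a negative number.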
If the function in which an error occurs has one real-valued argument, only the parameter e->x is defined in calling _matherr and e->y is left undefined. Only if there are two arguments (like in VF_atan2 or in VF_cotrpi), both e->x and e->y are needed to hold these arguments. For complex arguments, the real part (or the Mag part for polar coordinates) is stored in e->x and the imaginary part (or the Arg part) in e->y.
In the following description of all floating-point error types, we denote by "HUGE_VAL" the largest number possible in the respective data type. Similarly, "TINY_VAL" is the smallest denormal number representable in the respective data type; this is not the same as "MIN_VAL", which is the smallest full-accuracy number of the respective data type.
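In standard C, the three quantities named above correspond to constants from <float.h>. The following sketch shows them for the float type; the function names are hypothetical, and FLT_TRUE_MIN requires a C11 compiler. Note that the handbook's "HUGE_VAL" is a notation for the largest finite number and should not be confused with the HUGE_VAL macro of <math.h>, which is infinity on most IEEE 754 platforms.

```c
#include <float.h>

/* "HUGE_VAL" in the handbook's notation: the largest finite float */
float huge_val_f( void ) { return FLT_MAX; }

/* "MIN_VAL": the smallest positive float with full accuracy (normalized) */
float min_val_f( void )  { return FLT_MIN; }

/* "TINY_VAL": the smallest positive denormal float (C11) */
float tiny_val_f( void ) { return FLT_TRUE_MIN; }
```

For double and long double, the analogous constants carry the prefixes DBL_ and LDBL_.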
In general, they may be treated just as ordinary numbers. In some instances, however, like taking the inverse, overflow errors may occur. In these cases, the somewhat academic distinction between SING and OVERFLOW errors is dropped and a SING error signalled (as if it was a division by exactly 0).
On the other hand, for functions like the logarithms, very small input numbers may give perfectly reasonable results, although the exact number 0.0 is an illegal argument, leading to a SING error. Here, the possible loss of precision is neglected and denormals are considered valid arguments. (This treatment is quite different from that chosen for the math functions of most compilers, where denormal arguments lead to SING errors also in these cases, which seems less appropriate to us.)
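The case of the inverse can be checked in plain C. The sketch below uses the smallest denormal double, DBL_TRUE_MIN (C11). The function names are hypothetical, and the code shows only the underlying IEEE 754 behaviour, not OptiVec's error handling.

```c
#include <float.h>

/* Ordinary arithmetic works on denormals: x + x is exactly 2*x. */
int denormal_adds_normally( void )
{
    volatile double x = DBL_TRUE_MIN;
    volatile double sum = x + x;
    return sum == 2.0 * DBL_TRUE_MIN && sum > 0.0;
}

/* The inverse of a very small denormal exceeds DBL_MAX:
   under default rounding, the result is +infinity. */
int inverse_of_denormal_overflows( void )
{
    volatile double x = DBL_TRUE_MIN;
    volatile double y = 1.0 / x;
    return y > DBL_MAX;
}
```

By contrast, the natural logarithm of the same number is an unremarkable finite value of about -744.44.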
You might wish to circumvent this. To this end, OptiVec provides the function V_setErrorEventFile. This function needs as arguments the desired name of your event file and a switch named ScreenAndFile, which decides whether the error message is printed only into the file or to the screen as well.
Note that this redirection of error messages is valid only for errors occurring in OptiVec routines. If you wish to do so, however, there is a way in C/C++ to extend the redirection also to the "non-OptiVec" functions: you may modify _matherr and _matherrl such that the statement 
 return 0;
(which signals an unresolved error) is replaced by the sequence
V_noteError( e->name, e->type ); return 1;
Thereby the task of printing the error message for unresolved errors is passed to the OptiVec function V_noteError. Keep in mind that it is the return value of _matherr which decides if an error message is printed by the default error handler of your compiler. Thus, after the call to V_noteError, the printing of the default error messages is by-passed by returning "1". (Also, do not forget that OptiVec uses your _matherr routine to determine which errors you accept and which not!)
For example, your _matherr function (matherr, without the leading underbar, for Borland C++ 3.0 and 3.1) might look like the following one:
#include <math.h>
#include <stdlib.h>  /* exit */
#include <VecLib.h>  /* V_noteError */
int _matherr( struct exception *e) /* "_exception" for MSVC */
{
  if( (e->type == UNDERFLOW) || (e->type == TLOSS) ) ; /* ignore */
  else /* all other errors deserve at least notice */
  {
    V_noteError( e->name, e->type );
    if (e->type == DOMAIN) exit(1); /* really fatal */
  }
  return 1;
}
(Of course, if you decide to change _matherr, do not forget to change _matherrl in the same way, if you are using Borland C++!).
Both C/C++ and Pascal/Delphi: The default printing of error messages on the screen alone is restored by V_closeErrorEventFile.
A way to keep track also of those errors which do not lead to messages is opened by the return values of mathematical VectorLib functions. Any of the "silent" TLOSS along with the more serious DOMAIN, SING and OVERFLOW errors will lead to a TRUE (non-zero) return value. You may wish to check for a clean result after a group of functions, like in the following example:
unsigned ErrFlag;
...
/* part Trig1 */
ErrFlag=0; /* reset the flag */
ErrFlag |= VF_sin( Y1, X1, sz );
ErrFlag |= VF_cos( Y2, X1, sz );
ErrFlag |= VF_atan2( Z1, Y1, Y2, sz );
if( ErrFlag ) V_printErrorMsg( "Errors occurred in part Trig1 ! ");
...
As indicated in the example, it is better to use the |= operator (Pascal/Delphi: "ErrFlag := ErrFlag or") instead of += (since, in rare cases, all return values might add up to 65536, which is stored as 0 due to an overflow of the integer variable). Even if you chose addition of the individual return values, the number of occurred errors would not be obtainable from the result; in case of an error, any non-specified non-zero number is returned.
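The difference between the two ways of accumulating return values can be reproduced with a 16-bit unsigned accumulator, which is the size implied by the remark on 65536. This is a minimal sketch; the function names are hypothetical, and the return codes used in the usage note are invented for the illustration, since the actual non-zero values are unspecified.

```c
#include <stdint.h>
#include <stddef.h>

/* Accumulate return codes by addition; the sum wraps modulo 65536. */
uint16_t combine_by_add( const uint16_t *codes, size_t n )
{
    uint16_t flag = 0;
    for( size_t i = 0; i < n; i++ )
        flag = (uint16_t)( flag + codes[i] );
    return flag;
}

/* Accumulate return codes by bitwise OR; a set bit is never lost. */
uint16_t combine_by_or( const uint16_t *codes, size_t n )
{
    uint16_t flag = 0;
    for( size_t i = 0; i < n; i++ )
        flag |= codes[i];
    return flag;
}
```

With two codes of 0x8000 each, addition yields 65536, which is stored as 0 and hides both errors, whereas the OR result remains 0x8000.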
If you are still working with Borland (Turbo) C++ 3.x, you cannot simultaneously include VC??.LIB and MC??.LIB, as the linker is not able to handle these large libraries. As very few people are still using BC 3, we have discontinued full support for that compiler and no longer offer the special BC 3 version of OptiVec, which previously solved that problem by splitting the libraries up into many smaller ones. You can still extract the needed modules from the MC??.LIB libraries yourself, however, and include them as .OBJ nodes in your project.
C/C++ only: Declare the use of OptiVec functions with #include statements. If you are using MFC (Microsoft Foundation Classes) or Borland's OWL (ObjectWindows Library), the MFC or OWL include-files have to be #included before (!) the OptiVec include-files.
Pascal/Delphi only: Declare the use of OptiVec functions with the uses clause.
 
| Include-file (add suffix .H) or unit | Contents | 
| VecLib | Basic definitions of the data types along with the functions common to all data types (prefix V_) except for the graphics initialization functions. | 
| VFstd, VDstd, VEstd | Floating-point "standard operations:" generation and initialization of vectors, index-oriented manipulations, data-type interconversions, statistics, analysis, geometrical vector arithmetics, Fourier-Transform related functions, I/O operations. | 
| VCFstd, VCDstd, VCEstd, VPFstd, VPDstd, VPEstd | Standard operations for cartesian and polar complex vectors | 
| VIstd, VBIstd, VSIstd, VLIstd, VQIstd | Standard operations for signed integer vectors | 
| VUstd, VUBstd, VUSstd, VULstd, VUIstd | Standard operations for unsigned integer vectors (VUIstd only for C/C++) | 
| VFmath, VDmath, VEmath | Algebraic, arithmetical and mathematical functions for floating-point vectors | 
| VCFmath, VCDmath, VCEmath, VPFmath, VPDmath, VPEmath | Arithmetical and mathematical functions for complex vectors | 
| VImath, VBImath, VSImath, VLImath, VQImath | Arithmetical and mathematical functions for signed integer vectors | 
| VUmath, VUBmath, VUSmath, VULmath, VUImath | Arithmetical and mathematical functions for unsigned integer vectors (VUImath only for C/C++) | 
| Vgraph | Graphics functions for all data types | 
| VFNLFIT, VDNLFIT, VENLFIT | Non-linear fitting functions (Pascal/Delphi only; in C/C++, they are in M?std) | 
| VFMNLFIT, VDMNLFIT, VEMNLFIT | Non-linear fitting functions for multiple data sets (Pascal/Delphi only; in C/C++, they are in M?std) | 
| MFstd, MDstd, MEstd | Matrix operations for real-valued matrices | 
| MCFstd, MCDstd, MCEstd | Matrix operations for cartesian complex matrices | 
| Mgraph | Matrix graphics functions for all data types | 
| MFNLFIT, MDNLFIT, MENLFIT | Non-linear fitting functions for Z = f(X, Y) data (Pascal/Delphi only; in C/C++, they are in M?std) | 
| MFMNLFIT, MDMNLFIT, MEMNLFIT | Non-linear fitting functions for multiple Z = f(X, Y) data sets (Pascal/Delphi only; in C/C++, they are in M?std) | 
| NEWCPLX | complex class library CMATH; C++ only | 
| CMATH | complex library CMATH for Pascal/Delphi and plain C | 
| CFMATH, CDMATH, CEMATH | C/C++ only: type-specific parts of CMATH. | 
| XMATH | A few non-vectorized math functions needed internally by other OptiVec functions; they are publicly accessible (see chapter 9). C/C++: also declares the sine, cosecant, and tangent tables for VF_sinrpi2 etc. | 
| FSINTAB2, DSINTAB2, ESINTAB2, FSINTAB3, DSINTAB3, ESINTAB3 | sine tables (Pascal/Delphi only; for C/C++, they are in XMATH) | 
| FCSCTAB2, DCSCTAB2, ECSCTAB2, FCSCTAB3, DCSCTAB3, ECSCTAB3 | cosecant tables (Pascal/Delphi only; for C/C++, they are in XMATH) | 
| FTANTAB2, DTANTAB2, ETANTAB2, FTANTAB3, DTANTAB3, ETANTAB3 | tangent tables (Pascal/Delphi only; for C/C++, they are in XMATH) | 
| VecObj | basic definitions for VecObj, the object-oriented interface for C++ | 
| fVecObj, dVecObj, eVecObj | VecObj member functions for real-valued vector objects (C++ only) | 
| cfVecObj, cdVecObj, ceVecObj, pfVecObj, pdVecObj, peVecObj | VecObj member functions for complex vector objects (C++ only) | 
| iVecObj, biVecObj, siVecObj, liVecObj, qiVecObj | VecObj member functions for signed-integer vector objects (C++ only) | 
| uVecObj, ubVecObj, usVecObj, ulVecObj, uiVecObj | VecObj member functions for unsigned-integer vector objects (C++ only) | 
| OptiVec | includes the whole OptiVec package (C++ only) | 
| VecAll | includes all VectorLib and CMATH functions (C or C++ only) | 
| MatAll | includes all MatrixLib functions (C or C++ only) |