TArray

tarray - Typed Array extension for Tcl

Project is hosted at SourceForge [L1 ]. Latest download is V1.0.0.

Web site and documentation are at https://tarray.magicsplat.com

Overview

The tarray extension implements a new Tcl collection data type - the typed array - and the associated commands column and table. A typed array stores elements of a specific data type in native format. The primary motivation for the extension is efficient memory utilization and speed of certain operations in applications dealing with very large numbers of elements. This is achieved through native storage formats and parallelization of operations on multi-core CPUs. See the benchmarks below.
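As a quick illustration, here is a minimal sketch of creating a typed column and searching it. The column create syntax shown (a type name followed by an optional list of initial values) is assumed; see the documentation for the exact forms.

    package require tarray

    # Create an integer column from a list of values (assumed syntax).
    set col [tarray::column create int {10 20 30 40 50}]

    # Find all indices whose elements are greater than 25. The comparison
    # operates on native integers, not on Tcl_Obj string representations.
    puts [tarray::column search -all -gt $col 25]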

The tarray extension was inspired in part by Speed Tables and to a lesser extent by TclRal.

The philosophy behind tarray is to provide efficient facilities on top of which more sophisticated data structures, possibly customized for specific applications, can be easily scripted and experimented with. Therefore, unlike Speed Tables, tarray does not require creation and recompilation of a new extension for each table definition. Moreover, tarray provides value-based semantics so that columns and tables can be used as basic building blocks. Additional facilities that Speed Tables provides, like remote access, are expected to be implemented at the script level.

An accompanying package, tarray_ui, containing related Tk widgets is also under development.

Also see Xtal, a language built on top of Tarray for convenient programming with typed arrays.

Benchmarks

Out of date

As expected, the benefits of native formats and parallelization can be significant (more than two orders of magnitude for searches) as shown in the benchmarks below. Tests were run on a 64-bit Windows 8 notebook with an i5 CPU @ 1.7GHz.

Sorting

The lsort column is the baseline showing the performance of lsort for each data type (using the -real, -integer etc. options). The numbers in parentheses show performance relative to the lsort baseline. The next two columns show tarray performance single-threaded and parallelized across two threads.
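For reference, a rough sketch of the commands being compared is shown below. The column sort subcommand and its exact options are assumed here; consult the documentation.

    # Baseline: Tcl's built-in lsort on a plain list of integers.
    set sorted [lsort -integer $intlist]

    # tarray: sort an integer column in native format (assumed syntax;
    # threading behavior is handled by the extension, not shown here).
    set sortedcol [tarray::column sort $intcol]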

Version:  0.6
Run date: Thu Aug 14 10:20:06 IST 2014
     Size       Type      lsort     singlethread      multithread  

    10000    strings       4159       3776 (1.10)       2367 (1.76)
    10000        any       4104       3787 (1.08)       2675 (1.53)
    10000    doubles       3087       1377 (2.24)        895 (3.45)
    10000       ints       2888       1100 (2.62)        733 (3.94)

   100000    strings      85702      55741 (1.54)      40372 (2.12)
   100000        any      84130      74371 (1.13)      63067 (1.33)
   100000    doubles      47588      17283 (2.75)      10354 (4.60)
   100000       ints      44223      13729 (3.22)       7770 (5.69)

  1000000    strings    1407869     816009 (1.73)     635882 (2.21)
  1000000        any    1379399    1204986 (1.14)    1015875 (1.36)
  1000000    doubles     781096     214341 (3.64)     132427 (5.90)
  1000000       ints     743424     148959 (4.99)      93574 (7.94)

Searching

Similar to the above, with the lsearch baseline using -all together with -exact/-integer/-real as appropriate for the data type.
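Again for reference, a rough sketch of the commands being compared. The -eq option to column search is assumed here; the -gt form appears in the real world example below.

    # Baseline: lsearch returning the indices of all matching elements.
    set hits [lsearch -all -integer $intlist 42]

    # tarray: search an integer column for all elements equal to 42
    # (the -eq option is assumed).
    set hits [tarray::column search -all -eq $intcol 42]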

     Size       Type    lsearch        singlethread      multithread
    10000    strings        200         48 (4.17)         82 (2.44)
    10000        any        238        270 (0.88)        253 (0.94)
    10000    doubles      10414         30 (347.13)      126 (82.65)
    10000       ints       1460         12 (121.67)       53 (27.55)

   100000    strings       6088        631 (9.65)        521 (11.69)
   100000        any       5756       8453 (0.68)       4926 (1.17)
   100000    doubles      92572        253 (365.90)      197 (469.91)
   100000       ints      22501        123 (182.93)      132 (170.46)

  1000000    strings      71039       5345 (13.29)       4315 (16.46)
  1000000        any      74739     107890 (0.69)       70007 (1.07)
  1000000    doubles     951024       2115 (449.66)      1164 (817.03)
  1000000       ints     280801        986 (284.79)       558 (503.23)

Real world example

The following example compares storing geographical data for cities from http://geonames.org as a list, as an in-memory sqlite database, and as a tarray table. The database has just over 142000 records. The corresponding cities table definition is shown below.

tarray::table create {
    geonameid int 
    name string 
    country string 
    latitude double 
    longitude double 
    population wide 
    elevation int
}

The table is queried for all cities with a population of more than a million. This is a simplistic query that shows tarray in its best light speed-wise, since the comparisons are all numeric. Better benchmarks are still to be done.
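For clarity, the tarray query timed below can be broken into steps:

    # Extract the population column from the table.
    set popcol [tarray::table column $tab population]

    # Find the indices of all rows where population exceeds one million.
    set indices [tarray::column search -all -gt $popcol 1000000]

    # Retrieve just the name column for those rows.
    set names [tarray::table get -columns name $tab $indices]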

Memory usage and timing are shown below. Even from the memory usage point of view, lists are not suitable for more than a few hundred thousand records. Note that tarray uses twice as much memory as sqlite but is two orders of magnitude faster on the search. This is not to say sqlite and tarray are comparable! sqlite is a database, tarray is not! Nevertheless, for use as an in-memory data structure, tarray can replace some uses of sqlite. APN is somewhat surprised by the difference in speed; any hints on optimizing the sqlite query would be appreciated. It would have been nice to also compare Speed Tables, but that does not build on Windows and I do not have a Linux benchmarking host.

list results:
Virtual Bytes: 112 MB
Working Set: 111 MB
Page file: 112 MB

sqlite results: {db eval {select name from geo where population > 1000000}}
Virtual Bytes: 17 MB
Working Set: 9 MB
Page file: 9 MB
23142.2 microseconds per iteration

tarray results: {table get -columns name $tab [column search -all -gt [table column $tab population] 1000000]}
Virtual Bytes: 40 MB
Working Set: 18 MB
Page file: 19 MB
187.7 microseconds per iteration

Same benchmark, this time using a cities database with 2.15 million records. Lists not included due to memory issues with such a large number of records.

sqlite results:
Virtual Bytes: 159 MB
Working Set: 154 MB
Page file: 157 MB
354914.5 microseconds per iteration

tarray results:
Virtual Bytes: 309 MB
Working Set: 277 MB
Page file: 293 MB
1353.3 microseconds per iteration

Discussion

SEH -- It would be useful to be able to do dumps of raw binary data once a TArray has been constructed. Then one could, for example, use it with a reflected channel to duplicate the function of memchan. Or to do device I/O. (Is this already possible?)

APN Direct I/O of files/databases to and from typed arrays is on the to-do list, but low down for a couple of reasons. First, there are still a number of basic operations and optimizations that have to be implemented to make the package more useful as a building block. Second, it is not clear what the output/input format should be. Even plain binary dumps raise questions of endianness etc.


AK - 2013-04-18 22:18:04

Tcllib already has a memchan emulation based on reflected channels. Documentation @ https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tcllib/files/modules/virtchannel_base/tcllib_memchan.md

SEH -- Since that package is pure Tcl, I was hoping a TArray-based solution would be faster.

AK -- What makes this then different from the original memchan ?

SEH -- Nothing at all, except that the author states above that TArray is intended to be a modular base for a range of specific solutions. If the function of memchan could be duplicated, that would be one less purpose-specific package to maintain, replaced by a flexible multi-purpose tool. I think it would be healthier for the Tcl ecosystem to have fewer of the former and more of the latter, for an equivalent range of applications.


EB - 2014-08-14 09:57:16

I use metakit in an Ethernet sniffer application to store known frames:

    mk::file open hist
    mk::view layout hist.data   {time:L decoder:I header:S data:B}

If I have time, I'll try switching to tarray. But to see a real benefit, I'd like to have binary search, to search by time. It is also missing in metakit, so I have done it in Tcl, which is faster than searching with a metakit call.

APN There is no "byte array" tarray type, so if you are storing frames you have to use type any for the column, which corresponds to a Tcl_Obj*. So you will not really see any savings in memory. Regarding binary search, you might want to check whether you really need it, as integer searches are pretty fast. I have thought about optimizing column search for the sorted column case but have not gotten to it yet.
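For illustration, a table definition along the lines of the metakit layout above might look like the sketch below. The mapping of metakit types to tarray types is only an assumption; in particular, the frame data has to be stored as type any since there is no byte array type.

    # Hypothetical tarray equivalent of the metakit layout above
    # (time:L -> wide, decoder:I -> int, header:S -> string, data:B -> any).
    set frames [tarray::table create {
        time wide
        decoder int
        header string
        data any
    }]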