Main Page   Class Hierarchy   Compound List   File List   Compound Members   File Members   Related Pages   Examples  

Portable Lists and Hashtables of Arbitrary Objects

User Guide to the type_allocator.h and hashtable.h. This document explains the basic usage of template classes to maintain arbitrary lists and hashtables of arbitrary objects.

Introduction

The Standard Template Library (STL) is part of the draft standard for C++. The STL has a large number of extremely useful classes for storing and manipulating lists of objects. However, not all platforms yet support the draft standard due to a number of portability concerns.

The classes type_allocator and hashtable attempt to fill the gap until the code that is written against the STL is portable. When the STL is finally available and in use with our applications, a simple adaptor class between the appropriate STL class and the template classes detailed in this document can be written.

The goal of the two classes is to capture the advantages of the STL library vector and hash table classes without one, the development of a portable STL headers and associated library, and two, that can be compiled across all platforms that we support. These two classes meet these goals. For instance the old HP compilers could not even compile the STL available from the HP ftp site. While the new HP compilers are far better than the compilers available under HP-UX 9 and 10, there still are some portability problems. Both the type_allocator and hash table classes do compile on all platforms we currently support.

Type Allocator

The type_allocator class acts both as a STL vector and as a memory allocator rolled into one. The STL vector class allows users to put objects in a list. The STL vector class terms the append operation to be in constant time. The type_allocator append operation is also constant time. Inserting objects into the middle of the list can be expensive because all objects above the insertion point are copied. A A template link list class could be written to solve this, but as of yet there has been no compelling application where there wasn't a workaround using qsort.

List of type_allocator objects are first put on the stack. One of the specifications of a type_allocator list is the number of objects to be put on the stack. Lists that are smaller than this size, will be completely on the stack hence resulting in a performance gain for short lists. Once the initial stack size is exceeded, the list object automatically reallocates the list to hold the larger number of objects.

Reallocation of the list does entail a cost. All objects must be properly copied from the old list to the new list. This means that the destructor of objects must be called and hence memory is not simply reallocated with the standard C library routine realloc.

Objects placed into type_allocator lists must have the following methods:

Plain old data objects such as int, double, char, etc and pointers can be placed into type_allocator lists without worrying about any of these methods. The compiler knows how to carry out all operations except destroying these objects and the file type_allocator.h has defined NOP methods for the destruction for all POD data objects and pointers.

If users put pointers to objects into a type_allocator list, the user is responsible for the memory. If users put objects into lists, the destructor of the type_allocator list itself will free all memory associated with the list including memory owned by each object added to the list. Obviously, the user is responsible for writing a valid destructor that frees all memory owned by the object.

Declaring a Type Allocator List

Say users wish to put objects of class A into a type_allocator list. Users need simply to declare a list

type_allocator<A, 10> list;

The object list will be an arbitrary list of objects of type A. The stack size will be 10 in this case. The number can be any number that will not exceed the system limits. When the user has added more than 10 items to the list, the list will automatically be resized to accomodate the objects.

Typically, one uses a typedef statement to make further declarations easy. For instance the coding pattern,

typedef type_allocator<A, 10> list_of_A;
typedef list_of_A::const_iterator const_iterator_loA;
typedef list_of_A::const_iterator const_iterator_loA;
list_of_A list;

is commonly seen in routines that use type_allocator lists.

Inserting an Object to a Type Allocator List

Say we wish to add a new object a to the list. We can append the object to the list by

list.insert(a);

The append operation is a constant time operation. If you wish to insert before a particular position in the list, one first must iterate to the position of the insert. Then the user can invoke

list.insert(p, a);

where p is the position in the list for the insert. This operation will be linear in the number of elements in the list.

Iterating through Type Allocator Lists

One of the popular coding patterns in C is the ability to iterate over statically or dynamically allocated memory using a pointer. For instance,

const char *string = "this is a string";
const char *p = string;
while (*p) {
char next = *p++;
... do something with the next character ...
}

can be used to scan the string. The variable p is the iterator for the string. We iterate over the string by being able to advance the pointer p by the statement p++.

The use of an iterator can be used for arrays of objects since the length of the object can be determined at compile time. Therefore iterators are used in type_allocator lists in order to loop over the contents of the list.

The method begin() is used for the start of the list. The method end() is used to point one past the end of the list. The above design pattern is the very same except we use begin() and end() to determine the start and end of the iteration. Going back to our list of objects of type class A

typedef type_allocator<A, 10> list_of_A;
typedef list_of_A::const_iterator const_iterator_loA;
typedef list_of_A::iterator iterator_loA;
list_of_A list;
... fill the list ...
const_iterator_loA p = list.begin();
while (p < list.end()) {
A next = *p++;
... do something with the next object ...
}

The const_iterator does not allow the list to be modified. One can use iterator type for iterators that can modify the list

iterator_loA p = list.begin();
while (p < list.end()) {
... modify object (*p), modifies the list entry ...
p++;
}

Indexing Type Allocator Lists as an Array

List of type_allocator objects can also be randomly accessed using an index. The method size() can be used the get the number of elements in the list. As long as the index that one uses to address the list is valid, the entry list[n] is a reference to the nth entry in the list.

One can also fill a list like an array. However, the user must insure that the index is a member of the array. The insert method adds one object to the list. This method updates the counter of elements by one. It may be that the user wishes to add a new element at an index beyond the end of the array.

The method list.reserveArray(n) is used to create all indices up to and including n-1. Typically, if a user wishes to fill a list using the random access method, one should use the following coding pattern. Basically, there is a check if the index requested is beyond the end of the array.

if (index >= list.size()) {
list.reserveArray(index+1);
}
list[index] = ... object to add to index ...

If one tries to access a type_allocator list past the end of storage, an assert is triggered.

Finding an Object in a Type Allocator List

The user now wishes to see if an object matching the equality criteria is in the list. If this is a list of POD or pointers, then equality is strict matching of the number or pointer. If it is a list of objects, then the equality operator is used to determine equality. Let us call the object of class A to be match b.

Users can judge if the object is in the list by

iterator_loA p = list.find(b);

where p is iterator to the object in the list matching b. If no matching object is in the list, then p will point to one past the end of list. Users can check if this is the case by comparing p with end() as in

if (p < end()) ...

If this test succeeds, then an object matching b was found in the list and the user can proceed from there.

One does not need to explicitly construct objects of the type in the list for this to work. As long as the class has a constructor that can convert the object specified in the find method to the object of the class stored in the list, the compiler automatically will create the proper object for the comparison.

For example, the class StorageBuffer can be constructed from constant strings. Hence one can specify a constant string in find to search a list of StorageBuffer objects.

typedef type_allocator<StorageBuffer, 10> string_list;
typedef string_list::const_iterator const_string_iterator;
string_list list;
... fill list with strings ...
const_string_iterator p = list.find("this string");

This works because a StorageBuffer object can be constructed from "this string".

Removing an Object from a Type Allocator List

The user can remove an object or a range of objects from a list. Users can remove an object of class A b from the list by first finding the object in the list by invoking find followed by the erase method:

iterator_loA p = list.find(b);
erase(p);

The find method uses the object class's equality operator to judge if a match has been made. If no match is made, then the find operator returns the end of list which is actually the pointer to one past the end of memory for the list. The erase method then is responsible for compressing memory.

If the user wishes to release a segment of memory, the user can invoke

erase(s,e);

where s is the iterator to the start of and e is the iterator to one past the end of memory to erase.

The method list.deleteElements(); uses the second form of erase to release all memory associated with the list. This method also releases all dynamic memory that was allocated by the list to hold the objects in addition to releasing all memory owned by each object itself. The destructor for type_allocator lists simply invokes this method to destroy itself.

Memory Allocation Options

The biggest performance problem in using type_allocator lists is the number of times memory needs to be dynamically allocated. Dynamically allocating memory is itself a cost. The standard C memory allocator malloc is notoriously slow especially for allocations of small amounts of memory. The second cost in performance is the need to copy each object seperately into the new dynamically allocated memory and to invoke the destructor of each object allocated in the old memory that is about to be released.

Type allocator lists allow users to mitigate the performance costs several ways. First, users can specify a reasonable size for the list on the stack in the declaration of the list. If lists are below that size, then no dynamic allocations will be required at all thereby completely mitigating the costs of having to dynamically reallocate the list.

Second, if the length of the list can approximately be determined at run time, users can reserve(n) the space required for the list. This is useful for very long lists where the stack allocation size would not be reasonable for all cases. Reserving memory improves performance of type_allocator lists by reducing the number of reallocations and copies of objects the list will require.

Finally, one can choose a memory reallocation style that best matches your application. Type allocator lists support three different memory allocation models. When new memory is requested, the list can allocate a new chunk of memory by

The exponential allocator, will at least double the size of the memory currently allocated in case more memory to hold the list is required. This has the advantage of reducing the number of times memory needs to be copied, at the cost of potentially dynamically allocating too much memory. This is the default memory allocation style.

Somewhat better in terms of memory performance, is the block style allocation of memory. Memory in multiples of the default stack allocation size will be allocated when new memory is required. This is not as wasteful in memory as the exponential allocator, at the cost of more reallocations if the allocated space is exhausted quickly by more inserts.

The best model for memory performance is allocating just what the list needs. This is a good model in the case of small lists where the number of list members will always be close to the number of stack allocation members. However, this will give the worst memory performance in the case where many more inserts are required above and beyond the declared stack size because every insert past the stack size implies a repeated reallocation and copying of memory.

Finally, the method reserveArray(n) method differs from the reserve method in that the number of elements in the list is changed to n-1 if the number of memory slots in the request exceeds the current number of objects in the list. This is to allow random access of the list via array indexing. All new memory in the list, either by reserve or by reserveArray is initialized using the default constructor of the object.

Stacks of Objects

Stacks are type_allocator lists that implement two additional methods:

Users declare a stack by

stack<A, 10> slist;

Stacks of objects are in every way equivalent to lists of objects with the addition of the methods push, and pop.

Sorting a List

Users can also sort type_allocator lists by invoking the method list.sortBy(comparison_routine) where comparison_routine compares two objects for the ansi C standard routine qsort. The method sortBy invokes qsort in order to sort the list.

The only drawback in using qsort is that it restricts the objects that can be sorted by the method. The object must not own any memory because neither the constructor or destructor will be invoked when qsort sorts the list. Hence the list can be POD, pointers, or objects that are composed of POD and pointers to memory that the object does not manage.

If this proves to be a problem, a proper quick sort method can be substituted for qsort that obeys the object nature of the list.

Hash Tables

Arbitrary objects can also be placed into hash tables. The template implementation of hash tables uses the basic type_allocator template lists to manage storage by maintaining lists of objects that have the same hash value. More precisely, a hash table is a subclass of a list of void * objects where the non-zero entries in the array are pointers to link lists of objects with the same hash value.

The hash value of the object to be inserted into the hash table can either be computed by the object itself (the object must implement the method size_t hash() const;), or the hash value may be computed externally. The table computes a starting index in the standard way by taking the modulus of the hash value with the amount of storage of the list. The storage length is a prime number. It then searchs for the link list that matches the same hash value or the first empty slot using the calculated index as a starting point.

The occupancy of the hash table is maintained by the hash table object. When the occupancy of the table exceeds a user specified threshold, the table is resized to the next prime number. The default threshold is set to 85%. Hash tables have a good search performance up until a threshold of 90%, but there is some performance gain as the threshold is lowered at the expense of more memory usage.

This hash table implementation uses link lists of the same hash value in order to improve the performance as the table is resized. The pointer to the link list is moved when the table is resized as opposed to copying and destroying all hashed objects in the table.

Objects placed in hash tables must have the very same methods as objects placed in type_allocator lists. Additionally, objects should declare a hash method though this will not be required on most platforms if all objects inserted into the hash table use an externally computed hash value.

Declaring a Hash Table

Returning to our original example, users wish to put objects of class A into a hashtable. The declaration of the hashtable is

hashtable<A> table;

Users will, from time to time, will need to be able to query the link list itself. The link list is declared

hash_link<A> link_list;

Again typically one uses a typedef statement to make further declarations easy:

typedef hashtable<A> hash_of_A;
typedef hash_of_A::hash_object_link linklist_of_A;
typedef hash_of_A::const_hash_iterator const_hash_of_A_iterator;
typedef hash_of_A::hash_iterator hash_of_A_iterator;
hash_of_A table;

Inserting an Object to a Hash Table

Users can now insert objects into a table by

table.insert(a);

inserts object a into the hash table. The hash value for the object, in this case, will be a.hash().

Users can calculate a hashvalue externally and use that to insert the object into a hash table:

table.insert(a, hv);

where hv is the externally calculated hash value for object a. Both insert methods return the iterator to the inserted object.

Finding an Object in a Hash Table

The user now wishes to see if an object matching the equality criteria is in the hash table. Let us call the object of class A to be match b:

hash_of_A_iterator p = list.find(b);

The return of find in a hash table is slightly different than that from a type_allocator list. Recall that the end of the list is returned in the case of a type_allocator list. Here 0 is returned if no match is found. This is because the end of the table has no meaning because a hash table will have a number of empty slots and hashed objects are in a link list.

If the user wishes to manipulate the link list of objects matching a particular hash value directly, one can get the start of the link list by

hash_of_A::iterator p = table.find_hash_link(hv);

where hv is the hash value to match. If no link list matches the specified hash value, a 0 will be returned. If a match is found, users can iterate over the list using the same techniques as type_allocator lists. What is returned here is a pointer to al link list so in our example of a hash table of objects of class A

linklist_of_A *list = (linklist_of_A *) (void *)(*p);
hash_of_A_iterator s = list->begin();
while (s < list->end()) {
... do something with each entry in the link list ...
s++;
}

Note the explicit cast of *p to void *. This is required because the wrap_ptr object is used in the list of hash links. The wrap_ptr object is required because the default 'constructor' for void * does not zero memory.

Removing an Object from a Type Allocator List

The user can remove an object from a hash table. If one erases a range of objects in a hash table, one is erasing a number of link lists as opposed to individual objects. Users should therefore not use the erase method provided in a type_allocator list but instead use an erase interface specifically provided for hashtable instead. The method

table.erase(b);

takes the specified object, attempts to match it in a link list and erases the entry from that link list alone. When the link list is empty, the storage for the link list is deleted and the link list is removed from the table. This method relies on the object's method hash to determine a hash value. Users can specify an external hash value by

table.erase(b, hv);

where hv is the externally computed hash value.

Memory Allocation

The same memory considerations for type allocator lists still somewhat apply to hash tables. However, object memory is not copied in hash tables due to the fact there can be frequent reallocations of memory for a hash table. Instead the objects are stored on link lists and the pointer to the link list is moved when the hash table is resized.

This means that initially sizing the hash table properly is not as important with hash tables as it is with type_allocator lists. Moreover, hash tables must be an array of prime length. Hence, users no longer have the ability to choose a default stack size for instance. Users still can size the hash table array at run time by invoking

table.reservePrimeSpace(n);

or

table.reservePrimeArray(n);

where n is the minimum length of the table. The difference between the two methods is that the reservePrimeSpace method reserves enough space so that the occupancy of the specified n elements will be below the threshold for reallocating the hash table. The hash table will compute the next prime number after the requested space if the requested storage is greater than the current length of the table. While users should not use the method reserveArray with hashtables, users still can use method reserve though there is less reason to do so.

The threshold of occupancy before the hash table is resized can also be set by the user. The threshold can be set as an optional argument to the constructor or can be set dynamically by the method table.setThreshold(v). The thresholds are not allowed to be set below 30% or greater than 95%. Hash tables have good performance up to 90% occupancy. After that point, the number of collisions with hashed entries in a search becomes so large that a search of the table starts to behave more like the number of entries in the table as opposed to constant time. Similarly, once users go below 50% in occupancy, they are just wasting memory and paying a large cost for inserting new elements into the table due to the frequent reallocations.

The memory allocation options are somewhat less relevant for the hash table. Given that the prime allocation will overallocate memory on resize, the memory option for hash tables is set to the exact size requested.

However, if the user anticipates many reallocations because the list is of unknown length, setting block or exponential allocation can be of use. The next prime length for the hash table will start at least at twice the current length in the case of the exponential allocation option and at least at the current size plus the initial hashtable blocksize (101) in the case of the block reallocator.

Use of either of the above two options will reduce the number of times the hashtable needs to be resized at the cost of overallocating memory. It is suggested that users initially size the hashtable correctly if they know the approximate number of entries and use the default allocation scheme for hashtables (the 'just size' allocation scheme).

Conclusion

This document has presented two template classes that can be used to automatically maintain lists of arbirary objects. Type allocators maintain vectors of objects and the hash table allows hashing of objects.

Objects inserted into either template class list must have

Additionally, objects to be hashed should have the method hash unless the hash value is computed externally.


Generated at Wed Mar 2 11:19:08 2005 for Result Library by doxygen1.2.0 written by Dimitri van Heesch, © 1997-2000