Next: Address Hashing, Previous: Basic Hash Table Operations, Up: Hash Tables
Normally, hash tables automatically resize themselves according to need. Because of this, the programmer need not be concerned with management of the table's size. However, some limited control over the table's size is provided, which will be discussed below. This discussion involves two concepts, usable size and physical size, which we will now define.
The usable size of a hash table is the number of associations that the table can hold at a given time. If the number of associations in the table exceeds the usable size, the table will automatically grow, increasing the usable size to a new value that is sufficient to hold the associations.
The physical size is an abstract measure of a hash table that specifies how much space is allocated to hold the associations of the table. The physical size is always greater than or equal to the usable size. The physical size is not interesting in itself; it is interesting only for its effect on the performance of the hash table. While the average performance of a hash-table lookup is bounded by a constant, the worst-case performance is not. For a table containing a given number of associations, increasing the physical size of the table decreases the probability that worse-than-average performance will occur.
The physical size of a hash table is statistically related to the number of associations. However, it is possible to place bounds on the physical size, and from this to estimate the amount of space used by the table:
(define (hash-table-space-bounds count rehash-size rehash-threshold) (let ((tf (/ 1 rehash-threshold))) (values (if (exact-integer? rehash-size) (- (* count (+ 4 tf)) (* tf (+ rehash-size rehash-size))) (* count (+ 4 (/ tf (* rehash-size rehash-size))))) (* count (+ 4 tf)))))
What this formula shows is that, for a “normal” rehash size (that is, not an exact integer), the amount of space used by the hash table is proportional to the number of associations in the table. The constant of proportionality varies statistically, with the low bound being
(+ 4 (/ (/ 1 rehash-threshold) (* rehash-size rehash-size)))
and the high bound being
(+ 4 (/ 1 rehash-threshold))
which, for the default values of these parameters, are 4.25
and
5
, respectively. Reducing the rehash size will tighten these
bounds, but increases the amount of time spent resizing, so you can see
that the rehash size gives some control over the time-space tradeoff of
the table.
The programmer can control the size of a hash table by means of three parameters:
If the programmer knows that the table will initially contain a specific
number of items, initial-size can be given when the table is
created. If initial-size is an exact non-negative integer, it
specifies the initial usable size of the hash table; the table will not
change size until the number of items in the table exceeds
initial-size, after which automatic resizing is enabled and
initial-size no longer has any effect. Otherwise, if
initial-size is not given or is #f
, the table is
initialized to an unspecified size and automatic resizing is immediately
enabled.
The rehash size specifies how much to increase the usable size of
the hash table when it becomes full. It is either an exact positive
integer, or a real number greater than one. If it is an integer, the
new size is the sum of the old size and the rehash size. Otherwise, it
is a real number, and the new size is the product of the old size and
the rehash size. Increasing the rehash size decreases the average cost
of an insertion, but increases the average amount of space used by the
table. The rehash size of a table may be altered dynamically by the
application in order to optimize the resizing of the table; for example,
if the table will grow quickly for a known period and afterwards will
not change size, performance might be improved by using a large rehash
size during the growth phase and a small one during the static phase.
The default rehash size of a newly constructed hash table is 2.0
.
Warning: The use of an exact positive integer for a rehash size is almost always undesirable; this option is provided solely for compatibility with the Common Lisp hash-table mechanism. The reason for this has to do with the time penalty for resizing the hash table. The time needed to resize a hash table is proportional to the number of associations in the table. This resizing cost is amortized across the insertions required to fill the table to the point where it needs to grow again. If the table grows by an amount proportional to the number of associations, then the cost of resizing and the increase in size are both proportional to the number of associations, so the amortized cost of an insertion operation is still bounded by a constant. However, if the table grows by a constant amount, this is not true: the amortized cost of an insertion is not bounded by a constant. Thus, using a constant rehash size means that the average cost of an insertion increases proportionally to the number of associations in the hash table.
The rehash threshold is a real number, between zero exclusive and
one inclusive, that specifies the ratio between a hash table's usable
size and its physical size. Decreasing the rehash threshold decreases
the probability of worse-than-average insertion, deletion, and lookup
times, but increases the physical size of the table for a given usable
size. The default rehash threshold of a newly constructed hash table is
1
.
Returns the usable size of hash-table as an exact positive integer. This is the number of associations that hash-table can hold before it will grow.
X must be either an exact positive integer, or a real number that is greater than one. Sets the rehash size of hash-table to x and returns an unspecified result. This operation adjusts the “shrink threshold” of the table; the table might shrink if the number of associations is less than the new threshold.
X must be a real number between zero exclusive and one inclusive. Sets the rehash threshold of hash-table to x and returns an unspecified result. This operation does not change the usable size of the table, but it usually changes the physical size of the table, which causes the table to be rehashed.