summaryrefslogtreecommitdiff
path: root/doc/maps.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/maps.rst')
-rw-r--r--doc/maps.rst525
1 files changed, 525 insertions, 0 deletions
diff --git a/doc/maps.rst b/doc/maps.rst
new file mode 100644
index 0000000..68eca60
--- /dev/null
+++ b/doc/maps.rst
@@ -0,0 +1,525 @@
+====
+Maps
+====
+
+A map is a collection of key-value pairs. (Some languages, such as Perl, also
+call them “associative array” or just “hash”, however, the underlying storage
+mechanism may not be an array or a hash, however.) Each key is unique and has
+an associated value. Keys can be any data desired; HXmap allows to specify your
+own key and data handling functions so they can be strings, raw pointers, or
+complex structures.
+
+To access any map-related functions, ``#include <libHX/map.h>``.
+
+
+Structural definition
+=====================
+
+The HXmap structure is a near-opaque type. Unlike the predecessor map
+implementation struct HXbtree from libHX 2.x, the 3.x API exposes much less
+fields.
+
+.. code-block:: c
+
+ struct HXmap {
+ unsigned int items, flags;
+ };
+
+``items``
+ The number of items in the tree. This field tracks the number of items
+ in the map and is used to report the number of elements to the user,
+ and is updated whenever an element is inserted or removed from the map.
+ The field must not be changed by the user.
+
+``flags``
+ The current behavior flags for the map. While implementation-private
+ bits are exposed, only ``HXMAP_NOREPLACE`` is currently allowed to be
+ (un)set by the user while a map exists.
+
+For retrieving elements from a tree, some functions work with ``struct
+HXmap_node``, which is defined as follows:
+
+.. code-block:: c
+
+ struct HXmap_node {
+ union {
+ void *key;
+ const char *const skey;
+ };
+ union {
+ void *data;
+ char *sdata;
+ };
+ };
+
+``key``
+ The so-called primary key, which uniquely identifies an element (a
+ key-value pair) in the map. The memory portions that make up the key
+ must not be modified. (If the key changes, so does its hash value
+ and/or position index, and without taking that into account, writing to
+ the key directly is going to end up with an inconsistent state. To
+ change the key, you will need to delete the element and reinsert it
+ with its new key.)
+
+``skey``
+ A convenience type field for when the map's keys are C strings. It is
+ useful for use with e.g. ``printf`` or other varargs function, which
+ would otherwise require explicit and noisy casting of the ``void *key``
+ member to ``const char *`` first.
+
+``data``
+ The data associated with the key.
+
+``sdata``
+ Convenience type field.
+
+
+Map initialization
+==================
+
+During initialization, you specify the underlying storage type by selecting a
+given constructor function. All further operations are done through the unified
+HXmap API which uses a form of virtual calls internally.
+
+Currently, there are two distinct map types in libHX. There are a handful of
+selectable symbols, though. Abstract types are:
+
+``HXMAPT_DEFAULT``
+ No further preferences or guarantees; selects any data structure that the
+ libHX maintainer deemed fast.
+
+``HXMAPT_ORDERED``
+ The map shall use a data structure that provides ordered traversal.
+
+Specific types include:
+
+``HXMAPT_HASH``
+ Hash-based map – Amortized O(1) insertion, lookup and deletion;
+ unordered.
+
+``HXMAPT_RBTREE``
+ Red-black binary tree – O(log(n)) insertion, lookup and deletion;
+ ordered.
+
+These can then be used with the initialization functions:
+
+.. code-block:: c
+
+ struct HXmap *HXmap_init(unsigned int type, unsigned int flags);
+ struct HXmap *HXmap_init5(unsigned int type, unsigned int flags, const struct HXmap_ops *ops, size_t key_size, size_t data_size);
+
+Both the *init* and *init5* variant creates a new map; the latter function
+allows to specify the operations in detail as well as key and data size, which
+may become necessary when using data sets which have their own way of being
+managed. The flags parameter can contain any of the following:
+
+``HXMAP_NONE``
+ This is just a mnemonic for the value 0, indicating no flags.
+
+``HXMAP_NOREPLACE``
+ If a key already exists and another add operation is attempted, the
+ key's associated value will be replaced by the new value. If this flag
+ is absent, ``-EEXIST`` is returned. This flag is allowed to be
+ subsequently changed by the user if so desired, using bit logic such as
+ ``map->flags &= ~HXMAP_NOREPLACE;``.
+
+``HXMAP_SKEY``
+ Notifies the constructor that keys will be C-style strings. The flag
+ presets the ``k_compare`` operation to use ``strcmp``. In the flag's
+ absence, direct value comparison will be used if the key size is
+ specified as zero (e.g. with the ``HXhashmap_init4`` function call), or
+ ``memcmp`` if the key size is non-zero.
+
+``HXMAP_CKEY``
+ Instructs the map to make copies of keys when they are added to the
+ map. This is required when the buffer holding the key changes or goes
+ out of scope. The flag presets the ``k_clone`` and ``k_free``
+ operations to ``HX_memdup`` and ``free``, and as such, the ``key_size``
+ parameter must not be zero. If however, ``HXMAP_SKEY`` is also
+ specified, ``HX_strdup`` and ``free`` will be used and ``key_size``
+ must be ``zero``.
+
+``HXMAP_SDATA``
+ Notifies the constructor that data will be C-style strings. This sets
+ up the ``d_clone`` and ``d_free`` operations.
+
+``HXMAP_CDATA``
+ Instructs the map to make copies of the data when new entries are added
+ to the map. This is required when the buffer holding the data either
+ goes out of scope, or you want to keep the original contents instead of
+ just a pointer.
+
+``HXMAP_SCKEY``
+ Mnemonic for the combination of ``HXMAP_SKEY | HXMAP_CKEY``.
+
+``HXMAP_SCDATA``
+ Mnemonic for the combination of ``HXMAP_SDATA | HXMAP_SDATA``.
+
+``HXMAP_SINGULAR``
+ Specifies that the “map” is only used as a set, i.e. it does not store
+ any values, only keys. Henceforth, the value argument to ``HXmap_add``
+ must always be ``NULL``.
+
+
+Flag combinations
+=================
+
+This subsection highlights the way ``HXMAP_SKEY`` interacts with ``HXMAP_CKEY``
+and the key size. The copy semantics are the same for ``HXMAP_SDATA`` and
+``HXMAP_CDATA``.
+
+HXMAP_SKEY unset, HXMAP_CKEY unset
+----------------------------------
+
+The ``key_size`` parameter at the time of map construction is ignored. The
+pointer value of the key parameter for the ``HXmap_add`` call is directly
+stored in the tree, and this is the key that uniquely identifies the map entry
+and which is used for comparisons. This may be used if you intend to directly
+map pointer values.
+
+.. code-block:: c
+
+ static struct something *x = ..., *y = ...;
+ HXmap_add(map, &x[0], "foo");
+ HXmap_add(map, &x[1], "bar");
+
+HXMAP_SKEY set, HXMAP_CKEY unset
+--------------------------------
+
+The ``key_size`` parameter at the time of map construction is ignored. The
+pointer value of the key parameter for the HXmap_add call is directly stored in
+the tree, but it is the C string pointed to by the key parameter that serves as
+the key.
+
+HXMAP_SKEY set, HXMAP_CKEY set
+------------------------------
+
+The ``key_size`` parameter at the time of map construction is ignored. The
+string pointed to by the key parameter will be duplicated, and the resulting
+pointer will be stored in the tree. Again, it is the pointed-to string that is
+the key.
+
+HXMAP_SKEY unset, HXMAP_CKEY set
+--------------------------------
+
+The memory block pointed to by the key parameter will be duplicated. The
+``key_size`` parameter must be non-zero for this to successfully work.
+
+With separate ops
+-----------------
+
+However, when a custom ``struct HXmap_ops`` is provided in the call to
+``HXmap_init5``, any of these semantics can be overridden. Particularly, since
+your own ops can practically ignore ``key_size``, it could be set to any value.
+
+
+Key-data operations
+===================
+
+The ``HXMAP_SKEY/CKEY/SDATA/CDATA`` flags are generally sufficient to set up
+common maps where keys and/or data are C strings or simple binary data where
+``memdup``/``memcmp`` is enough. Where the provided mechanisms are not cutting
+it, an extra ``HXmap_ops`` structure with functions specialized in handling the
+keys and/or data has to be used as an argument to the initialization function
+call.
+
+.. code-block:: c
+
+ struct HXmap_ops {
+ int (*k_compare)(const void *, const void *, size_t);
+ void *(*k_clone)(const void *, size_t);
+ void (*k_free)(void *);
+ void *(*d_clone)(const void *, size_t);
+ void (*d_free)(void *);
+ unsigned long (*k_hash)(const void *, size_t);
+ };
+
+``k_compare``
+ Function to compare two keys. The return value is the same as that of
+ ``memcmp`` or ``strcmp``: negative values indicate that the first key
+ is “less than” the second, zero indicates that both keys are equal, and
+ positive values indicate that the first key is “greater than” the
+ second. The ``size`` argument in third position is provided so that
+ ``memcmp``, which wants a size parameter, can directly be used without
+ having to write an own function. (This also means strcmp can't be
+ directly plugged in due to a function signature mismatch.)
+
+``k_clone``
+ Function that will clone (duplicate) a key. This is used for keys that
+ will be added to the tree, and potentially also for state-keeping
+ during traversal of the map. It is valid that this clone function
+ simply returns the value of the pointer it was actually passed; this is
+ used by default for maps without ``HXMAP_CKEY`` for example.
+
+``k_free``
+ Function to free a key. In most cases it defaults to ``free``(3), but
+ in case you are using complex structs, more cleanup may be needed.
+
+``d_clone``
+ Same idea as ``k_clone``, but for data.
+
+``d_free``
+ Same idea as ``k_free``, but for data.
+
+``k_hash``
+ Specifies an alternate hash function. Only to be used with hash-based
+ maps. Hashmaps default to using the DJB2 string hash function when
+ ``HXMAP_SKEY`` is given, or otherwise the Jenkins' lookup3 hash
+ function.
+
+libHX exports two hash functions that you can select for ``struct HXmap_ops``'s
+``k_hash`` if the default for a given flag combination is not to your liking.
+
+``HXhash_jlookup3``
+ Bob Jenkins's lookup3 hash.
+
+``HXhash_djb2``
+ DJB2 string hash.
+
+
+Map operations
+==============
+
+.. code-block:: c
+
+ int HXmap_add(struct HXmap *, const void *key, const void *value);
+ const struct HXmap_node *HXmap_find(const struct HXmap *, const void *key);
+ void *HXmap_get(const struct HXmap *, const void *key);
+ void *HXmap_del(struct HXmap *, const void *key);
+ void HXmap_free(struct HXmap *);
+ struct HXmap_node *HXmap_keysvalues(const struct HXmap *);
+
+``HXmap_add``
+ Adds a new node to the tree using the given key and data. When an
+ element is in the map, the key may not be modified, as doing so could
+ possibly invalidate the internal location of the element, or its
+ ordering with respect to other elements. If you need to change the key,
+ you will have to delete the element from the tree and re-insert it. On
+ error, -errno will be returned.
+
+ When ``HXMAP_SINGULAR`` is in effect, value must be ``NULL``, else
+ ``-EINVAL`` is returned.
+
+``HXmap_find``
+ Finds the node for the given key. The key can be read from the node
+ using ``node->key`` or ``node->skey`` (convenience alias for key, but
+ with a type of ``const char *``), and the data by using ``node->data``
+ or ``node->sdata``.
+
+``HXmap_get``
+ Get is a find operation directly returning ``node->data`` instead of
+ the node itself. Since ``HXmap_get`` may legitimately return ``NULL``
+ if ``NULL`` was stored in the tree as the data for a given key, only
+ ``errno`` will really tell whether the node was found or not; in the
+ latter case, ``errno`` is set to ``ENOENT``.
+
+``HXmap_del``
+ Removes an element from the map and returns the data value that was
+ associated with it. When an error occurred, or the element was not
+ found, ``NULL`` is returned. Because ``NULL`` can be a valid data
+ value, ``errno`` can be checked for non-zero. ``errno`` will be
+ ``-ENOENT`` if the element was not found, or zero when everything was
+ ok.
+
+``HXmap_free``
+ The function will delete all elements in the map and free memory it
+ holds.
+
+``HXmap_keysvalues``
+ Returns all key-value-pairs in an array of the size as many items were
+ in the map (map->items) at the time it was called. The memory must be
+ freed using ``free``(3) when it is no longer needed. The order elements
+ in the array follows the traverser notes (see below), unless otherwise
+ specified.
+
+
+Map traversal
+=============
+
+.. code-block:: c
+
+ struct HXmap_trav *HXmap_travinit(const struct HXmap *);
+ const struct HXmap_node *HXmap_traverse(struct HXmap_trav *iterator);
+ void HXmap_travfree(struct HXmap_trav *iterator);
+ void HXmap_qfe(const struct HXmap *, bool (*fn)(const struct HXmap_node *, void *arg), void *arg);
+
+``HXmap_travinit``
+ Initializes a traverser (a.k.a. iterator) for the map, and returns a
+ pointer to it. ``NULL`` will be returned in case of an error, such as
+ memory allocation failure. Traversers are returned even if the map has
+ zero elements.
+
+``HXmap_traverse``
+ Returns a pointer to a ``struct HXmap_node`` for the next element /
+ key-value pair from the map, or ``NULL`` if there are no more entries.
+
+``HXmap_travfree``
+ Releases the memory associated with a traverser.
+
+``HXmap_qfe``
+ The “quick foreach”. Iterates over all map elements in the fastest
+ possible manner, but has the restriction that no modifications to the
+ map are allowed. Furthermore, a separate function to handle each
+ visited node, is required. (Hence this is also called “closed
+ traversal”, because one cannot access the stack frame of the original
+ function which called ``HXmap_qfe``.) The user-defined function returns
+ a bool which indicates whether traversal shall continue or not.
+
+Flags for ``HXmap_travinit``:
+
+``HXMAP_NOFLAGS``
+ A mnemonic for no flags, and is defined to 0.
+
+``HXMAP_DTRAV``
+ Enable support for deletion during traversal. As it can make traversal
+ slower, it needs to be explicitly specified for cases where it is
+ needed, to not penalize cases where it is not.
+
+WARNING: Modifying the map while a traverser is active is
+implementation-specific behavior! libHX generally ensures that there will be no
+undefined behavior (e.g. crashes), but there is no guarantee that elements
+will be returned exactly once. There are fundamental cases that one should be
+aware of:
+
+* An element is inserted before where the traverser is currently positioned at.
+ The element may not be returned in subsequent calls to ``HXmap_traverse`` on
+ an already-active traverser.
+
+* Insertion or deletion may cause internal data structure to re-layout.
+
+ * Traversers of ordered data structures may choose to rebuild
+ their state.
+
+ * Traversers of unordered data structures would run risk to
+ return more than once, or not at all.
+
+Descriptions for different map types follow.
+
+:Hashmaps:
+ On ```HXmap_add`, an element may be inserted in a position that is
+ before where the traverser is currently positioned. Such elements will
+ not be returned in the remaining calls to ``HXmap_traverse``. The
+ insertion or deletion of an element may cause the internal data
+ structure to re-layout itself. When this happens, the traverser will
+ stop, so as to not return entries twice.
+
+:Binary trees:
+ Elements may be added before the traverser's position. These elements
+ will not be returned in subsequent traversion calls. If the data
+ structure changes as a result of an addition or deletion, the traverser
+ will rebuild its state and continue traversal transparently. Because
+ elements in a binary tree are ordered, that is, element positions may
+ not change with respect to another when the tree is rebalanced, there
+ is no risk of returning entries more than once. Nor will elements that
+ are sorted after the current traverser's position not be returned
+ (= they will be returned, because they cannot get reordered to before
+ the traverser like in a hash map). The HX rbtree implementation also
+ has proper handling for when the node which is currently visiting is
+ deleted.
+
+
+RB-tree Limitations
+===================
+
+The implementation has a theoretical minimum on the maximum number of nodes,
+2^{24}=16{,}777{,}216. A worst-case tree with this many elements already has a
+height of 48 (RBT_MAXDEP), which is the maximum height currently supported. The
+larger the height is that HXrbtree is supposed to handle, the more memory
+(linear increase) it needs. All functions that build or keep a path reserve
+memory for RBT_MAXDEP nodes; on x86_64, this is 9 bytes per <node, direction>
+pair, amounting to 432 bytes for path tracking alone. It may not sound like a
+lot to many, but given that kernel developers try to limit their stack usage to
+some 4096 bytes is impressive alone.
+
+
+Examples
+========
+
+Case-insensitive ordering
+-------------------------
+
+The correct way:
+
+.. code-block:: c
+
+ static int my_strcasecmp(const void *a, const void *b, size_t z) {
+ return strcasecmp(a, b);
+ }
+ static const struct HXmap_ops icase = {
+ .k_compare = my_strcasecmp,
+ };
+ HXmap_init5(HXMAPT_RBTREE, HXMAP_SKEY, &icase, 0, dsize);
+
+A hackish way (which wholly depends on the C implementation and use of extra
+safeguards is a must):
+
+.. code-block:: c
+
+ static const struct HXmap_ops icase = {
+ .k_compare = (void *)strcasecmp,
+ };
+ BUILD_BUG_ON(sizeof(DEMOTE_TO_PTR(strcasecmp)) > sizeof(void *));
+ BUILD_BUG_ON(sizeof(DEMOTE_TO_PTR(strcasecmp)) > sizeof(icase.k_compare));
+ HXmap_init5(HXMAPT_RBTREE, HXMAP_SKEY, &icase, 0, dsize);
+
+
+Reverse sorting order
+---------------------
+
+Any function that behaves like strcmp can be used. It merely has to return
+negative when ``a<b``, zero on ``a==b``, and positive non-zero when ``a>b``.
+
+.. code-block:: c
+
+ static int strcmp_rev(const void *a, const void *b, size_t z)
+ {
+ /* z is provided for cases when things are raw memory blocks. */
+ return strcmp(b, a);
+ }
+
+ static const struct HXmap_ops rev = {
+ .k_compare = strcmp_rev,
+ };
+ HXmap_init5(HXMAPT_RBTREE, HXMAP_SKEY, &rev, 0, dsize);
+
+
+Keys with non-unique data
+-------------------------
+
+Keys can actually store non-unique data, as long as this extra fields does not
+actually contribute to the logical key — the parts that do uniquely identify
+it. In the following example, the notes member may be part of struct package,
+which is the key as far as HXmap is concerned, but still, only the name and
+versions are used to identify it.
+
+.. code-block:: c
+
+ struct package {
+ char *name;
+ unsigned int major_version;
+ unsigned int minor_version;
+ char notes[64];
+ };
+
+ static int package_cmp(const void *a, const void *b)
+ {
+ const struct package *p = a, *q = b;
+ int ret;
+ ret = strcmp(p->name, q->name);
+ if (ret != 0)
+ return ret;
+ ret = p->major_version - q->major_version;
+ if (ret != 0)
+ return ret;
+ ret = p->minor_version - q->minor_version;
+ if (ret != 0)
+ return ret;
+ return 0;
+ }
+
+ static const struct HXmap_ops package_ops = {
+ .k_compare = package_cmp,
+ };
+
+ HXmap_init5(HXMAPT_RBTREE, flags, &package_ops,
+ sizeof(struct package), dsize);