Basic Concepts

Indexing mechanisms used to speed up access to desired data.

Search Key - attribute to set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form.

Index Evaluation Metrics

  • Access types supported efficiently
    • Point query: records with a specified value in the attribute.
    • Range query: records with an attribute value falling in a specified range of values.
  • Access time
  • Insertion time
  • Deletion time
  • Space overhead

Ordered Indices

  • Primary index(主索引): in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
    • Also called clustering index(聚集索引)
    • The search key of a primary index is usually but not necessarily the primary key.
  • Secondary index(辅助索引): an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.


如果 key 不是一个主键,那可能会对应多个记录。

Primary index 是很宝贵的,只能有一个,其他都是辅助索引。

  • Dense index(稠密索引) — Index record appears for every search-key value in the file.
    所有 search-key 都要出现在索引文件里。
  • Sparse Index(稀疏索引): contains index records for only some search-key values.

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index

If primary index does not fit in memory, access becomes expensive.

B+-Tree Index

  • All paths from root to leaf are of the same length
  • Inner node(not a root or a leaf): between \(\lceil n/2\rceil\) and \(n\) children.
  • Leaf node: between \(\lceil (n–1)/2\rceil\) and \(n–1\) values
  • Special cases:
    • If the root is not a leaf:
      at least 2 children.
    • If the root is a leaf :
      between 0 and (n–1)values.

一般一个节点就是一个块的大小, 4K.
B+ 树的叉是非常大的。

Observations about B+-trees

Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close. 如果有很多文件一次性建立 B+ 树,我们可以从叶子节点开始建立。

如果有 K 个索引项,则树高度不会超过 \(\lceil \log_{\lceil n/2 \rceil}(K/2)\rceil + 1\).
高度最小为 \(\log_n(K)\)

Examples of Insert on B+-Tree

注意内点的 split 和叶子的不一样。要把中间的节点 move 上去。

Examples of Delete on B+-Tree



B+- tree : height and size estimation

浅层节点个数少,我们可以把第一层和第二层的节点都放到内存中 pin 住。

B+-Tree File Organization

B+-Tree File Organization:

  • Leaf nodes in a B+-tree file organization store records, instead of pointers
  • Helps keep data records clustered even when there are insertions/deletions/updates


Other Issues in Indexing

  • Record relocation and secondary indices
    If a record moves, all secondary indices that store record pointers have to be updated
    Node splits in B+-tree file organizations become very expensive
    Solution: use primary-index search key instead of record pointer in secondary index
  • Variable length strings as keys
    Variable fanout
  • Prefix compression
    Key values at internal nodes can be prefixes of full key
    Keep enough characters to distinguish entries in the subtrees separated by the key value

Multiple-Key Access

Composite search keys are search keys containing more than one attribute
e.g. (dept_name, salary)

Lexicographic ordering: \((a_1, a_2) < (b_1, b_2)\) if either \(a_1 < b_1\), or \(a_1=b_1\) and \(a_2 < b_2\).

单个 key, 不同 key 之间组合都可以建立 B+ 树。这样会有很多组合,可以在频繁出现的查询属性上建立 B+ 树。

Non-unique Search Keys

我们可以在 B+ 树叶子节点不直接指向磁盘里的数据,而是指向一个块。

Bulk Loading and Bottom-Up Build

Inserting entries one-at-a-time into a B+-tree requires \(\geq 1\) IO per entry


  • Efficient alternative 1: Insert in sorted order
    局部性较好,减少 I/O.
  • Efficient alternative 2: Bottom-up B+-tree construction
    • First sort index entries
    • Then create B+-tree layer-by-layer, starting with leaf level
    • The built B+-tree is written to disk using sequential I/O operations


fanout 可以计算出来。
可以用 level-order 写到磁盘里,便于顺序访问所有索引,此时块是连续的。(便于顺序访问所有数据项)
这里的代价就是建好后,一次 seek 后全部写出去 (9 blocks)

Bulk insert index entries


把刚刚那棵 B+ 树叶子节点(即遍历所有数据)需要 1seek+6blocks. 随后和上面的数据合并后,写回磁盘时需要 1seek+13blocks.

Merge two existing two B+-trees , to create a new B+-tree using the Bottom-UP Build algorithm, as in LSM-tree Index
假设有两棵这样生成的 B+ 树,将他们合并在一起。首先把叶子节点拿出来排序。

Indexing in Main Memory

cache 按 cache line 传输, 只有 64B.

  • Random access in memory
    • Much cheaper than on disk/flash, but still expensive compared to cache read
    • Binary search for a key value within a large B+-tree node results in many cache misses
      二分查找可能带来很多 cache miss.
    • Data structures that make best use of cache preferable – cache conscious

B+- trees with small nodes that fit in cache line are preferable to reduce cache misses
search key 和 pointer 可以分开放。

Key idea:

  • use large node size to optimize disk access,
  • but structure data within a node using a tree with small node size, instead of using an array, to optimize cache access.

Indexing on Flash

Flash 里不是即时修改,而是先擦掉再写。同时擦的次数是有限制的。因此最好的方法是从底构建,然后顺序写入。

  • Random I/O cost much lower on flash
    20 to 100 microseconds for read/write
  • Writes are not in-place, and (eventually) require a more expensive erase Optimum page size therefore much smaller
  • Bulk-loading still useful since it minimizes page erases
  • Write-optimized tree structures (i.e., LSM-tree) have been adapted to minimize page writes for flash-optimized search trees

Log Structured Merge (LSM) Tree

  • Records inserted first into in-memory tree (\(L_0\) tree)
  • When in-memory tree is full, records moved to disk (\(L_1\) tree)
    B+-tree constructed using bottom-up build by merging existing \(L_1\) tree with records from \(L_0\) tree
    内存里的 B+ 树如果满了,就马上写到磁盘里去(可以连续写)
  • When \(L_1\) tree exceeds some threshold, merge into $L_2¥ tree
    And so on for more levels
    Size threshold for \(L_{i+1}\) tree is \(k\) times size threshold for \(L_i\) tree

这样我们把随机写变为了顺序写。但此时查找一个索引,就要遍历所有 B+ 树。

  • Benefits of LSM approach
    • Inserts are done using only sequential I/O operations
    • Leaves are full, avoiding space wastage
    • Reduced number of I/O operations per record inserted as compared to normal B+-tree (up to some size)
  • Drawback of LSM approach
    • Queries have to search multiple trees
    • Entire content of each level copied multiple times

Stepped Merge Index
原先的结构中,Merge 操作太多,我们可以一次性合并。 磁盘上每层有 k 棵树,当 k 个索引处于同一层时,合并它们并得到一棵层数 \(+1\) 的树,写回。
\(L_1\) 有大小限制,达到后会生成另一个 B+ 树,依次为 k 倍。

最后更新: 2023年5月16日 08:02:44
创建日期: 2023年5月16日 08:02:44