Indexing¶

Basic Concepts¶

Indexing mechanisms used to speed up access to desired data.
索引用来加速查找。

Search Key - attribute to set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form.
索引文件通常是有顺序的

Index Evaluation Metrics¶

Access types supported efficiently
- Point query: records with a specified value in the attribute.
  点查询
- Range query: records with an attribute value falling in a specified range of values.
  范围查询
Access time
Insertion time
Deletion time
Space overhead

Ordered Indices¶

Primary index（主索引）: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
- Also called clustering index（聚集索引）
- The search key of a primary index is usually but not necessarily the primary key.
Secondary index（辅助索引）: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.

Example

主索引和数据内的顺序是一样的。点查和范围查都是比较高效的。

如果 key 不是一个主键，那可能会对应多个记录。

Primary index 是很宝贵的，只能有一个，其他都是辅助索引。

Dense index(稠密索引) — Index record appears for every search-key value in the file.
所有 search-key 都要出现在索引文件里。
Sparse Index（稀疏索引）: contains index records for only some search-key values.

Example

Good tradeoff: sparse index with an index entry for every block in file, corresponding to least search-key value in the block.

Multilevel Index¶

If primary index does not fit in memory, access becomes expensive.
可以对索引文件本身再建立一次索引。

B+-Tree Index¶

All paths from root to leaf are of the same length
Inner node(not a root or a leaf): between $\lceil n/2\rceil$ and $n$ children.
Leaf node: between $\lceil (n–1)/2\rceil$ and $n–1$ values
Special cases:
- If the root is not a leaf:
  at least 2 children.
- If the root is a leaf :
  between 0 and (n–1)values.

一般一个节点就是一个块的大小, 4K.
B+ 树的叉是非常大的。

Observations about B+-trees¶

Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically” close. 如果有很多文件一次性建立 B+ 树，我们可以从叶子节点开始建立。

如果有 K 个索引项，则树高度不会超过 $\lceil \log_{\lceil n/2 \rceil}(K/2)\rceil + 1$.
高度最小为 $\log_n(K)$

Examples of Insert on B+-Tree

注意内点的 split 和叶子的不一样。要把中间的节点 move 上去。

Examples of Delete on B+-Tree

中间点如果不够，从另外一边借一个过来。但是不能直接借，需要把它顶上去。

注意考虑和兄弟合并。

B+- tree : height and size estimation¶

全满的时候节点最少
注意这里的向上/向下取整问题
浅层节点个数少，我们可以把第一层和第二层的节点都放到内存中 pin 住。

B+-Tree File Organization¶

文件组织
B+-Tree File Organization:

Leaf nodes in a B+-tree file organization store records, instead of pointers
叶子节点不再放索引项，放记录本身。
Helps keep data records clustered even when there are insertions/deletions/updates

我们可以改变半满的要求以提高空间利用率。

Other Issues in Indexing¶

Record relocation and secondary indices
If a record moves, all secondary indices that store record pointers have to be updated
Node splits in B+-tree file organizations become very expensive
Solution: use primary-index search key instead of record pointer in secondary index
Variable length strings as keys
Variable fanout
Prefix compression
Key values at internal nodes can be prefixes of full key
Keep enough characters to distinguish entries in the subtrees separated by the key value

Multiple-Key Access¶

Composite search keys are search keys containing more than one attribute
e.g. (dept_name, salary)

Lexicographic ordering: $(a_1, a_2) < (b_1, b_2)$ if either $a_1 < b_1$, or $a_1=b_1$ and $a_2 < b_2$.

单个 key, 不同 key 之间组合都可以建立 B+ 树。这样会有很多组合，可以在频繁出现的查询属性上建立 B+ 树。

Non-unique Search Keys¶

我们可以在 B+ 树叶子节点不直接指向磁盘里的数据，而是指向一个块。
也可以在索引上加上去一个索引，使它对应的记录唯一。
可以通过范围查找

Bulk Loading and Bottom-Up Build¶

Inserting entries one-at-a-time into a B+-tree requires $\geq 1$ IO per entry

如果我们一次性插入很多索引项

Efficient alternative 1: Insert in sorted order
局部性较好，减少 I/O.
Efficient alternative 2: Bottom-up B+-tree construction
- First sort index entries
- Then create B+-tree layer-by-layer, starting with leaf level
- The built B+-tree is written to disk using sequential I/O operations

Example

如果要排序的内容较大，无法放下内存，可以使用外部排序。
fanout 可以计算出来。
可以用 level-order 写到磁盘里，便于顺序访问所有索引，此时块是连续的。（便于顺序访问所有数据项）
这里的代价就是建好后，一次 seek 后全部写出去 (9 blocks)

Bulk insert index entries

Example

把刚刚那棵 B+ 树叶子节点（即遍历所有数据）需要 1seek+6blocks. 随后和上面的数据合并后，写回磁盘时需要 1seek+13blocks.

Merge two existing two B+-trees , to create a new B+-tree using the Bottom-UP Build algorithm, as in LSM-tree Index
假设有两棵这样生成的 B+ 树，将他们合并在一起。首先把叶子节点拿出来排序。

Indexing in Main Memory¶

cache 按 cache line 传输, 只有 64B.

Random access in memory
- Much cheaper than on disk/flash, but still expensive compared to cache read
- Binary search for a key value within a large B+-tree node results in many cache misses
  二分查找可能带来很多 cache miss.
- Data structures that make best use of cache preferable – cache conscious

B+- trees with small nodes that fit in cache line are preferable to reduce cache misses
search key 和 pointer 可以分开放。

Key idea:

use large node size to optimize disk access,
but structure data within a node using a tree with small node size, instead of using an array, to optimize cache access.

Indexing on Flash¶

Flash 里不是即时修改，而是先擦掉再写。同时擦的次数是有限制的。因此最好的方法是从底构建，然后顺序写入。

Random I/O cost much lower on flash
20 to 100 microseconds for read/write
Writes are not in-place, and (eventually) require a more expensive erase Optimum page size therefore much smaller
Bulk-loading still useful since it minimizes page erases
Write-optimized tree structures (i.e., LSM-tree) have been adapted to minimize page writes for flash-optimized search trees
写优化索引结构

Log Structured Merge (LSM) Tree¶

Records inserted first into in-memory tree ($L_0$ tree)
When in-memory tree is full, records moved to disk ($L_1$ tree)
B+-tree constructed using bottom-up build by merging existing $L_1$ tree with records from $L_0$ tree
内存里的 B+ 树如果满了，就马上写到磁盘里去（可以连续写）
When $L_1$ tree exceeds some threshold, merge into $L_2￥ tree
And so on for more levels
Size threshold for $L_{i+1}$ tree is $k$ times size threshold for $L_i$ tree

这样我们把随机写变为了顺序写。但此时查找一个索引，就要遍历所有 B+ 树。

Benefits of LSM approach
- Inserts are done using only sequential I/O operations
- Leaves are full, avoiding space wastage
- Reduced number of I/O operations per record inserted as compared to normal B+-tree (up to some size)
Drawback of LSM approach
- Queries have to search multiple trees
- Entire content of each level copied multiple times

Stepped Merge Index
原先的结构中，Merge 操作太多，我们可以一次性合并。磁盘上每层有 k 棵树，当 k 个索引处于同一层时，合并它们并得到一棵层数 $+1$ 的树，写回。
$L_1$ 有大小限制，达到后会生成另一个 B+ 树，依次为 k 倍。
布隆过滤器

Indexing¶

Basic Concepts¶

Index Evaluation Metrics¶

Ordered Indices¶

Multilevel Index¶

B+-Tree Index¶

Observations about B+-trees¶

B+- tree : height and size estimation¶

B+-Tree File Organization¶

Other Issues in Indexing¶

Multiple-Key Access¶

Non-unique Search Keys¶

Bulk Loading and Bottom-Up Build¶

Indexing in Main Memory¶

Indexing on Flash¶

Log Structured Merge (LSM) Tree¶

评论