Data system partition design - partition and secondary index

2022-07-25 15:47:00 【JavaEdge】

The partitioning schemes discussed so far rely on a key-value (KV) data model. The KV model is simple: every record is accessed through its key (K), so the key naturally determines the partition, and read and write requests can be routed to the partition responsible for that key.
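As a minimal sketch of key-based routing, assuming integer keys and range partitioning (the boundary value and the function name are illustrative, not taken from any particular database):

```python
from bisect import bisect_right

# Hypothetical partition boundary: keys below 500 go to partition 0,
# keys from 500 upward go to partition 1.
BOUNDARIES = [500]

def partition_for_key(key: int) -> int:
    """Return the index of the partition responsible for this key."""
    # bisect_right counts how many boundaries are <= key,
    # which is exactly the partition number under range partitioning.
    return bisect_right(BOUNDARIES, key)
```

Every read or write for a given key can be sent straight to partition_for_key(key); no other partition has to be contacted.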

Things get more complicated once secondary indexes are involved. A secondary index usually does not identify a record uniquely; it is a way to speed up queries for a specific value, such as finding all operations by the user JavaEdge, or all blog posts containing the word java.

Many KV stores (such as HBase) have given up on secondary indexes to reduce implementation complexity, but some (such as Riak) have started to support them. Secondary indexes are also the foundation of search servers such as Solr and ES.

The main challenge with secondary indexes is that they do not map neatly onto partitions. There are two approaches to partitioning a database that has secondary indexes:

Document-based partitioning
Term-based partitioning

3.1 Partitioning secondary indexes by document

Consider a website for selling used cars (Figure 4). Each listing has a unique document ID, and the database is partitioned by that ID: for example, IDs 0 to 499 in partition 0 and IDs 500 to 999 in partition 1.

Users searching for a car can filter by color and manufacturer, so secondary indexes are needed on color and manufacturer (these are fields in a document database and columns in a relational database). Whenever a red car is added to the database, the partition automatically adds its document ID to the list for the index entry color:red.

In this indexing approach each partition is completely independent: it maintains its own secondary index, covers only the documents in that partition, and does not care about data stored in other partitions. Whenever you write to the database (adding, deleting, or updating a document), you only need to deal with the partition that contains the ID of the document you are writing. For this reason a document-partitioned index is also called a local index, as opposed to a global index.
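The write path of a local index can be sketched as follows. This is a simplified in-memory model, not the implementation of any specific database; the Partition class, the partition_for helper, and the sample documents are all hypothetical.

```python
from collections import defaultdict

class Partition:
    """One partition holding its documents and its own local secondary index."""

    def __init__(self):
        self.docs = {}                  # doc_id -> document
        self.index = defaultdict(set)   # "field:value" -> doc IDs in THIS partition only

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for field, value in old.items():
                self.index[f"{field}:{value}"].discard(doc_id)

    def put(self, doc_id, doc):
        self.delete(doc_id)             # drop stale index entries on update
        self.docs[doc_id] = doc
        for field, value in doc.items():
            self.index[f"{field}:{value}"].add(doc_id)

# Two partitions, split by document ID as in the example above (assumed layout).
partitions = [Partition(), Partition()]

def partition_for(doc_id):
    return partitions[0] if doc_id < 500 else partitions[1]

def write(doc_id, doc):
    # A write touches exactly one partition: the one that owns the document ID.
    partition_for(doc_id).put(doc_id, doc)
```

For example, write(191, {"color": "red"}) updates only the index of partition 0, and write(768, {"color": "red"}) updates only the index of partition 1, so the entry color:red ends up split across both partitions.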

Reading requires more care. Unless the document IDs have been assigned in some special way, there is no reason for all cars of a particular color or make to end up in the same partition. In Figure 4, red cars appear in both partition 0 and partition 1. So if you search for red cars, you need to send the query to all partitions and then merge the results that come back.

This way of querying a partitioned database is sometimes called scatter/gather, and it clearly makes secondary index reads expensive. Even if the partitions are queried in parallel, scatter/gather is prone to significant tail latency amplification. It is nevertheless widely used: MongoDB, Cassandra, and ES all support document-partitioned secondary indexes. Most database vendors recommend that users design their partitioning scheme so that secondary index queries can be served from a single partition, but this is not always feasible, especially when a query uses multiple secondary indexes (for example, filtering by color and by manufacturer at the same time).
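A scatter/gather read can be sketched like this. The local indexes are plain dictionaries with made-up contents, and a thread pool stands in for parallel network requests; both are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local indexes, one per partition: term -> document IDs.
LOCAL_INDEXES = [
    {"color:red": [191, 306], "color:black": [214]},    # partition 0
    {"color:red": [768], "color:silver": [515, 893]},   # partition 1
]

def query_partition(local_index, term):
    """Look up one term in a single partition's local index."""
    return local_index.get(term, [])

def scatter_gather(term):
    """Send the query to every partition in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(LOCAL_INDEXES)) as pool:
        results = pool.map(lambda idx: query_partition(idx, term), LOCAL_INDEXES)
        # The merge cannot finish until the slowest partition has answered,
        # which is why tail latency is amplified.
        return sorted(doc_id for part in results for doc_id in part)
```

Calling scatter_gather("color:red") queries both partitions even though each holds only part of the answer, and a term that exists nowhere still costs one request per partition.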

3.2 Partitioning secondary indexes by term

Instead of each partition maintaining its own secondary index (a local index), we can build a global index that covers the data in all partitions. The global index cannot be stored on a single node, though, because that node would become a bottleneck and defeat the purpose of partitioning. So the global index must also be partitioned, but it can use a different partitioning strategy from the primary key.

As shown in Figure 5, red cars from all data partitions appear under color:red in the index, and the index itself is partitioned: colors starting with the letters a to r are in index partition 0, and colors starting with s to z are in index partition 1. The index on car manufacturer is partitioned in a similar way (with the partition boundary between f and h).
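A minimal sketch of this layout for the color index, assuming the a-to-r / s-to-z split described above. The function names and the in-memory GLOBAL_INDEX structure are hypothetical.

```python
from collections import defaultdict

# Two index partitions, each mapping a term such as "color:red" to document IDs
# drawn from ALL data partitions.
GLOBAL_INDEX = [defaultdict(set), defaultdict(set)]

def color_index_partition(color: str) -> int:
    """Colors starting with a..r live in index partition 0, s..z in partition 1."""
    return 0 if color[0].lower() <= "r" else 1

def index_car(doc_id: int, color: str) -> None:
    """Record the car under its color term, in the partition that owns the term."""
    GLOBAL_INDEX[color_index_partition(color)][f"color:{color}"].add(doc_id)

def find_by_color(color: str) -> list:
    """A read contacts only the single index partition that owns the term."""
    partition = GLOBAL_INDEX[color_index_partition(color)]
    return sorted(partition.get(f"color:{color}", ()))
```

Note that the index partition is chosen from the term, not from the document ID, so cars stored in different data partitions can share one index entry.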

This kind of index is called term-partitioned, because the term being searched for determines which partition of the index is used; for the color index, a term looks like color:red. The name "term" comes from full-text indexes (a particular kind of secondary index), where the terms are all the words that occur in a document.

The global index can be partitioned by the term itself or by a hash of the term. Partitioning by the term itself is useful for range scans (for example on a numeric property such as a car's asking price), whereas partitioning by a hash of the term spreads the load more evenly.
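The trade-off can be illustrated with a small sketch. The price boundaries, the partition count, and the use of MD5 as the hash are all assumptions chosen for the example.

```python
import hashlib
from bisect import bisect_right

# Hypothetical price boundaries giving three range partitions:
# below 10000, 10000 to 19999, and 20000 or more.
PRICE_BOUNDARIES = [10_000, 20_000]

def range_partition(price: int) -> int:
    """Partition by the term itself: nearby prices land in the same partition."""
    return bisect_right(PRICE_BOUNDARIES, price)

def partitions_for_price_range(low: int, high: int) -> list:
    """A range scan only needs the contiguous partitions covering [low, high]."""
    return list(range(range_partition(low), range_partition(high) + 1))

def hash_partition(term: str, num_partitions: int = 3) -> int:
    """Partition by a hash of the term: more even spread, but no locality."""
    # A stable hash is used here because Python's built-in hash() of a string
    # varies between processes.
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

With range partitioning, a query for prices between 12000 and 15000 touches one partition. With hash partitioning, adjacent terms such as price:12000 and price:12001 may land anywhere, so the same range query would have to ask every partition.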

Global term-partitioned index vs. document-partitioned index

Advantage of a global index: reads are more efficient. There is no need to scatter/gather a query across all partitions; the client only has to send the read request to the partition containing the term.
Disadvantage of a global index: writes are slower and more complicated, because updating a single document may affect several secondary indexes, and the index partitions involved may differ from one another and live on different nodes.

Ideally the index would always be up to date, with every write immediately reflected in it. With a term-partitioned index, however, this requires a distributed transaction across all the partitions affected by the write, which would slow writes down considerably, so existing databases generally do not support synchronous updates of such secondary indexes.

In practice, updates to a global secondary index are usually asynchronous (that is, if you read the index immediately after a write, the update may not yet be reflected in the index).
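The effect of asynchronous index maintenance can be sketched with a queue of pending index updates. This is a deliberately simplified, single-process model: the names primary, global_index, pending, and apply_pending are hypothetical, and the sketch only adds index entries (it does not remove stale ones on update).

```python
from collections import defaultdict, deque

primary = {}                      # doc_id -> document (stands in for the data partitions)
global_index = defaultdict(set)   # term -> doc IDs (stands in for the index partitions)
pending = deque()                 # index updates that have not been applied yet

def write(doc_id, doc):
    """The write returns as soon as the document is stored; the index lags behind."""
    primary[doc_id] = doc
    for field, value in doc.items():
        pending.append((f"{field}:{value}", doc_id))

def apply_pending():
    """Background step that propagates queued updates to the global index."""
    while pending:
        term, doc_id = pending.popleft()
        global_index[term].add(doc_id)

def search(term):
    return sorted(global_index.get(term, ()))
```

After write(306, {"color": "red"}), the document is readable by its ID right away, but search("color:red") does not return it until apply_pending() has run.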

Copyright notice
This article was written by [JavaEdge]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/206/202207251521384504.html
