Data System Partition Design: Partitioning and Secondary Indexes
2022-07-25 15:47:00 【JavaEdge】
The partitioning schemes discussed so far rely on a key-value (KV) data model. The KV model is simple: every record is accessed by its key, so the partition can be determined from the key, and read and write requests can be routed to the partition responsible for that key.
Secondary indexes complicate things. A secondary index usually does not identify a record uniquely; instead, it speeds up searches for occurrences of a particular value, such as finding all actions by the user JavaEdge, or all blog posts containing the word java.
Many KV stores (such as HBase) avoid secondary indexes to reduce implementation complexity, but some (such as Riak) have started to support them. Secondary indexes are also the very foundation of search servers such as Solr and Elasticsearch (ES).
The main challenge with secondary indexes is that they do not map neatly to partitions. There are two main approaches to partitioning a database that has secondary indexes:
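To make key-based routing concrete, here is a minimal Python sketch assuming key-range partitioning with hypothetical boundaries (IDs 0 to 499 on partition 0, IDs 500 to 999 on partition 1, the same split used in the used-car example below). `partition_for_key` and `UPPER_BOUNDS` are illustrative names, not any database's API.

```python
import bisect

# Hypothetical key-range partitioning: each entry is the exclusive upper
# bound of one partition's key range.
# Partition 0 holds IDs 0..499, partition 1 holds IDs 500..999.
UPPER_BOUNDS = [500, 1000]


def partition_for_key(key: int) -> int:
    """Return the index of the partition responsible for `key`."""
    if key < 0 or key >= UPPER_BOUNDS[-1]:
        raise KeyError(f"key {key} is outside all partition ranges")
    # bisect_right finds the first bound strictly greater than key,
    # which is exactly the partition whose range contains key.
    return bisect.bisect_right(UPPER_BOUNDS, key)


# Every read or write for a given key is routed to exactly one partition.
print(partition_for_key(191))  # 0
print(partition_for_key(768))  # 1
```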
- Document-based partitioning
- Term-based partitioning
3.1 Partitioning Secondary Indexes by Document
Consider a used-car sales website (Figure 4). Each listing has a unique ID, called the document ID, and the database is partitioned by document ID: for example, IDs 0 to 499 in partition 0 and IDs 500 to 999 in partition 1.
Users can search for cars, filtering by color and by make, so you need a secondary index on color and make (in a document database these are fields; in a relational database they are columns). Whenever a red car is added to the database, the database partition automatically adds its document ID to the list of IDs for the index entry color:red.
In this indexing approach, each partition is completely separate: it maintains its own secondary indexes, covering only the documents in that partition, and does not care what data is stored in other partitions. Whenever you write to the database (to add, remove, or update a document), you only need to deal with the partition that contains the document ID you are writing. For this reason, a document-partitioned index is also known as a local index, as opposed to a global index.
Reading from such an index requires care, however: unless you have done something special with the document IDs, there is no reason why all cars of a particular color or make would be in the same partition. In Figure 4, red cars appear in both partition 0 and partition 1. So if you want to search for red cars, you need to send the query to all partitions and merge all the results you get back.
This approach to querying a partitioned database is sometimes called scatter/gather, and it can make read queries on secondary indexes quite expensive. Even if you query the partitions in parallel, scatter/gather is prone to significant tail latency amplification. Nevertheless, it is widely used: MongoDB, Cassandra, and ES all use document-partitioned secondary indexes. Most database vendors recommend that users structure their partitioning scheme so that secondary index queries can be served from a single partition, but that is not always possible, especially when a single query uses multiple secondary indexes (for example, filtering by both color and make).
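The write path of a local index can be sketched as a toy in-memory model. This is not how any particular database implements it; the `Partition` class, the `write` helper, the `field:value` term format, and the car data are all hypothetical. The point it illustrates is that a write only updates the index of the partition that owns the document ID.

```python
from collections import defaultdict


class Partition:
    """One partition holding its own documents and its own (local) secondary index."""

    def __init__(self) -> None:
        self.docs = {}  # document ID -> {field: value}
        # term (e.g. "color:red") -> IDs of documents stored in THIS partition only
        self.index = defaultdict(set)

    def put(self, doc_id: int, doc: dict) -> None:
        # On update, first remove the old version's index entries.
        old = self.docs.get(doc_id)
        if old is not None:
            for field, value in old.items():
                self.index[f"{field}:{value}"].discard(doc_id)
        self.docs[doc_id] = doc
        for field, value in doc.items():
            self.index[f"{field}:{value}"].add(doc_id)


# Hypothetical layout: IDs 0..499 -> partition 0, IDs 500..999 -> partition 1.
partitions = [Partition(), Partition()]


def write(doc_id: int, doc: dict) -> None:
    # A write touches only the partition that owns doc_id;
    # no other partition's index is modified.
    partitions[doc_id // 500].put(doc_id, doc)


write(191, {"color": "red", "make": "Honda"})
write(306, {"color": "red", "make": "Ford"})
write(768, {"color": "red", "make": "Volvo"})
print(sorted(partitions[0].index["color:red"]))  # [191, 306]
print(sorted(partitions[1].index["color:red"]))  # [768]
```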
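A minimal scatter/gather sketch, assuming each partition exposes its local index as an in-memory dict from term to document IDs (the data and function names are hypothetical). The query is sent to every partition in parallel and the partial results are merged; the caller must wait for the slowest partition, which is where the tail latency amplification comes from.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative local indexes: each partition only knows about its own documents.
LOCAL_INDEXES = [
    {"color:red": {191, 306}, "color:black": {214},
     "make:Honda": {191}, "make:Ford": {306}},
    {"color:red": {768}, "color:silver": {515, 893}, "make:Volvo": {768}},
]


def query_partition(local_index: dict, terms: list) -> set:
    """Evaluate an AND query (all terms must match) against one local index."""
    result = None
    for term in terms:
        ids = set(local_index.get(term, set()))
        result = ids if result is None else result & ids
    return result or set()


def scatter_gather(terms: list) -> list:
    # Scatter: the query goes to EVERY partition, because matching
    # documents may live anywhere. Partitions are queried concurrently.
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda idx: query_partition(idx, terms), LOCAL_INDEXES))
    # Gather: merge the per-partition answers. The whole query is only
    # as fast as the slowest partition's response.
    merged = set()
    for ids in partial:
        merged |= ids
    return sorted(merged)


print(scatter_gather(["color:red"]))               # [191, 306, 768]
print(scatter_gather(["color:red", "make:Ford"]))  # [306]
```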
3.2 Partitioning Secondary Indexes by Term
Rather than each partition maintaining its own secondary index (a local index), we can build a global index that covers the data in all partitions. However, that index cannot be stored on a single node, because it would likely become a bottleneck and defeat the purpose of partitioning. Therefore, the global index must itself be partitioned, but it can be partitioned differently from the primary key.
As shown in Figure 5, red cars from all data partitions appear under color:red in the index, and the index itself is partitioned: colors starting with the letters a to r go in partition 0, and colors starting with s to z go in partition 1. Similarly, the index on car make is partitioned (with the partition boundary between f and h).
This kind of index is called term-partitioned, because the term being searched for determines which index partition holds it. Here, a term would be, for example, color:red. The name term comes from full-text indexes (a particular kind of secondary index), where the terms are all the words that occur in a document.
The index can be partitioned by the term itself or by a hash of the term. Partitioning by the term itself is useful for range scans (for example, on a numeric property such as the car's asking price), whereas partitioning by a hash of the term distributes the load more evenly.
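The layout of a term-partitioned index can be sketched as below, assuming the boundaries from the example: color terms starting with a to r in index partition 0 and s to z in partition 1. For makes, the text only says the boundary lies between f and h, so this sketch assumes a to f go to partition 0 and later letters to partition 1. All names and data are hypothetical.

```python
from collections import defaultdict


def index_partition_for(term: str) -> int:
    """Map a term such as "color:red" to the index partition that stores it."""
    field, value = term.split(":", 1)
    first = value[0].lower()
    if field == "color":
        return 0 if first <= "r" else 1  # a..r -> 0, s..z -> 1
    if field == "make":
        return 0 if first <= "f" else 1  # assumed: a..f -> 0, later letters -> 1
    raise ValueError(f"no secondary index on field {field!r}")


# The global index, itself split into two partitions: term -> set of document IDs.
GLOBAL_INDEX = [defaultdict(set), defaultdict(set)]


def index_document(doc_id: int, doc: dict) -> None:
    # Each term of the document is routed independently to its own index partition.
    for field, value in doc.items():
        term = f"{field}:{value}"
        GLOBAL_INDEX[index_partition_for(term)][term].add(doc_id)


# Illustrative cars; in the primary data they may sit on different data partitions.
index_document(191, {"color": "red", "make": "Honda"})
index_document(306, {"color": "red", "make": "Ford"})
index_document(768, {"color": "red", "make": "Volvo"})

# Every red car is listed in a single index partition.
print(sorted(GLOBAL_INDEX[0]["color:red"]))  # [191, 306, 768]
```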
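The trade-off between the two choices can be sketched as follows. The price boundaries, the `price` term, and the use of MD5 as the hash function are assumptions for illustration only; real systems choose their own boundaries and hash functions.

```python
import bisect
import hashlib

NUM_PARTITIONS = 4
# Hypothetical boundaries for range-partitioning a numeric price term:
# partition 0: < 5000, 1: 5000..9999, 2: 10000..19999, 3: >= 20000
PRICE_BOUNDS = [5_000, 10_000, 20_000]


def range_partition(price: int) -> int:
    """Partition by the term value itself: neighboring prices stay together."""
    return bisect.bisect_right(PRICE_BOUNDS, price)


def hash_partition(term: str) -> int:
    """Partition by a hash of the term: spreads terms evenly but loses ordering.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and therefore unsuitable for routing.
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def partitions_for_price_range(lo: int, hi: int, by_hash: bool) -> set:
    """Index partitions that must be consulted to find prices in [lo, hi]."""
    if by_hash:
        # Adjacent prices hash to unrelated partitions, so any partition
        # may hold a match and the range scan must ask all of them.
        return set(range(NUM_PARTITIONS))
    # Range partitioning: only the contiguous run of partitions covering [lo, hi].
    return set(range(range_partition(lo), range_partition(hi) + 1))


print(partitions_for_price_range(6_000, 9_000, by_hash=False))  # {1}
print(partitions_for_price_range(6_000, 9_000, by_hash=True))   # {0, 1, 2, 3}
```

A point lookup for an exact term such as `price:7500` would call `hash_partition` and contact a single partition either way; the difference only shows up for range queries.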
Global term-partitioned index vs. document-partitioned index
- Advantage of a global index: reads are more efficient. Instead of running a scatter/gather query over all partitions, the client only needs to send a request to the partition containing the term it wants.
- Disadvantage of a global index: writes are slower and more complicated, because a write to a single document may now affect multiple partitions of the index (each term in the document may live on a different partition, possibly on a different node).
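The asymmetry between reads and writes can be sketched by listing which index partitions an operation must contact, assuming the same hypothetical boundaries as in the Figure 5 example (colors a to r and makes a to f in partition 0, the rest in partition 1). Function names are illustrative.

```python
def index_partition_for(term: str) -> int:
    """Hypothetical range boundaries: colors a..r and makes a..f -> 0, the rest -> 1."""
    field, value = term.split(":", 1)
    limit = {"color": "r", "make": "f"}[field]
    return 0 if value[0].lower() <= limit else 1


def terms(doc: dict) -> set:
    return {f"{field}:{value}" for field, value in doc.items()}


def partitions_touched_by_read(term: str) -> set:
    # A lookup for one term goes to exactly one index partition.
    return {index_partition_for(term)}


def partitions_touched_by_write(old_doc, new_doc: dict) -> set:
    # Terms that disappear must be removed and new terms added;
    # unchanged terms need no index update.
    changed = terms(old_doc or {}) ^ terms(new_doc)
    return {index_partition_for(t) for t in changed}


print(partitions_touched_by_read("color:red"))  # {0}
# Inserting a red Honda: color:red is on partition 0, make:Honda on partition 1.
print(partitions_touched_by_write(None, {"color": "red", "make": "Honda"}))  # {0, 1}
# Repainting it silver: remove color:red (partition 0), add color:silver (partition 1).
print(partitions_touched_by_write({"color": "red", "make": "Honda"},
                                  {"color": "silver", "make": "Honda"}))  # {0, 1}
```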
Ideally, the index would always be up to date, so that every document written to the database is immediately reflected in the index. With a term-partitioned index, however, that would require a distributed transaction across all partitions affected by the write, which would severely hurt write performance, so some databases do not support synchronously updated global secondary indexes.
In practice, updates to global secondary indexes are often asynchronous (that is, if you read the index shortly after a write, the change you just made may not yet be reflected in the index).
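A toy model of asynchronous index maintenance, assuming the primary write is acknowledged before the index is updated and a background step later applies the queued index changes. `AsyncGlobalIndex` and its methods are hypothetical; in a real system the queued changes would have to travel to other nodes, whereas here everything runs in one process.

```python
from collections import defaultdict, deque


class AsyncGlobalIndex:
    """Primary data is updated synchronously; the global index is updated lazily."""

    def __init__(self):
        self.docs = {}                 # primary data: document ID -> document
        self.index = defaultdict(set)  # global index: term -> document IDs
        self.pending = deque()         # index changes not yet applied

    def write(self, doc_id, doc):
        old = self.docs.get(doc_id, {})
        self.docs[doc_id] = doc  # the write is acknowledged at this point
        # Index maintenance is deferred instead of using a cross-partition transaction.
        for field, value in old.items():
            self.pending.append(("remove", f"{field}:{value}", doc_id))
        for field, value in doc.items():
            self.pending.append(("add", f"{field}:{value}", doc_id))

    def apply_pending(self):
        """Background step that eventually brings the index up to date."""
        while self.pending:
            op, term, doc_id = self.pending.popleft()
            if op == "add":
                self.index[term].add(doc_id)
            else:
                self.index[term].discard(doc_id)

    def lookup(self, term):
        return sorted(self.index.get(term, set()))


db = AsyncGlobalIndex()
db.write(768, {"color": "red", "make": "Volvo"})
print(db.lookup("color:red"))  # [] : the write is not yet visible in the index
db.apply_pending()
print(db.lookup("color:red"))  # [768]
```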