Introduce
P2PDB From KK Research and development of the group's offline cash register project , Early adoption of Raft agreement +Sqlite Implement a lightweight distributed database , As for Raft In depth study of the protocol , Find out Raft The protocol has great defects for offline scenarios ( The number of failed nodes is greater than 50%, The whole cluster doesn't work ), Then it opened the way to further study the distributed database protocol , A year later, it was found that based on Merkel's directed acyclic graph +CRDT Implemented logical clock
Decentralization of final consistency can be achieved 、 Distributed 、 Point to point database , It also provides full support for offline scenarios .
P2PDB Is a point-to-point database , This means that each peer has its own specific database instance . Databases are automatically replicated between peers , This generates the latest view of the database when any peer updates . in other words , The database is pulled to the client .
This means that each application contains the full database they are using . With client - Server model , This in turn changes data modeling , client - The server model typically uses a large database for all data : stay P2PDB in , It should be stored according to the access rights of the data 、“ Partition ” or “ Fragmentation ”. for example , In applications like Weibo , Tweets will not be saved in global pages written by millions of users at the same time “ Tweets ” In the database , Instead, each user has its own microblog database . To subscribe to other users' tweets , Just focus on the topic (topic)
1、 Offline priority application
Offline first web Browser running applications , The data generated by the client can be stored in p2pdb in , Let's take a real scenario : In the physical industry , Offline physical stores need multiple cashier systems , At present, the common solution is to use the online cash register system , Data exchange through centralized cloud servers , When the network is poor 、 Even when the network is disconnected , It will directly affect the income of offline physical stores , As a result, offline physical stores cannot operate normally , A better solution would be to use offline storage , Report the data after the network is restored , How to ensure the consistency of data when offline , Use raft The protocol can be used to build an offline storage network through the LAN in the case of offline , But it's similar Raft Paxos Strong consistency protocols have natural defects , When more than half of the nodes are unavailable , The cluster will not work , The cashier system of offline physical stores is often turned on and off 、 In the low peak period of business, the excess cash registers will be shut down , Therefore, a more effective solution is needed .
2、 Distributed edge caching
In the age of Globalization , We often visit various countries 、 Website data of provinces , Because the server nodes are distributed in different parts of the world , Lead to slow access , Such as accessing the U.S. server from the mainland , With high latency . Even though CDNS Vendors provide distributed file cloud storage ,cdns High cost , Not every website can afford to pay ,IPFS The problem of optimization arises , however IPFS Medium filecoin It just solves the problem of file storage , There is still no feasible scheme for the database storage which requires fast data exchange .
3、dapp Decentralized application scenarios
dapp The application of both de Sinochem , Also known as distributed applications , It's supposed to turn on web3.0 Time , DApp Decentralized operation through network nodes . It can run on the user's personal device , such as : mobile phone 、 Personal computers and other terminal equipment ,DApp Running on Peer-to-Peer Networks . Independent of the central server , There is no need for a dedicated communication server to deliver messages , There is no need for a central database to record data . The data is stored in the user's personal space , It could be a cell phone , It could be a personal cloud disk ,p2pdb The design naturally conforms to the decentralized standard , It's going to be dapp Great choice for application data storage .
4、 Multi person online collaboration
When the client application is used for multi person online collaboration . It is necessary to solve the problem of concurrency conflicts , be based on merkle-CRDT The tamper proof log of , Automatically solve the problem of final data consistency .
5、 The Internet of things
Physical network application access p2pdb Realize edge storage , Deploy a subscription node in the cloud , It can automatically collect IOT data of all edge nodes , Data peer-to-peer synchronization realized at the database level , It saves a lot of trouble at the application level .
Two 、 Meet the conditions
To solve the above industry problems , The following goals need to be achieved
3、 ... and 、 Summary of technical principles
CRDT Conflict free replication data types
Because direct transmission can be used for online and offline communication , Therefore, there must be a way to maintain the consistency and order of all messages , Especially in conversations with multiple participants . for example , If Alice and Bob In a chat group with several others , And they all lost their Internet connection by taking the subway , Then they can still use BLE Communicate with each other in the same conversation , To create a parallel version of this conversation . When they go back online ,BLE Version and Internet Versions will have to be merged . therefore , It is necessary to use an algorithm to ensure that all peers have exactly the same sort message list after synchronization .
The solution to this problem is to replicate data types without conflicts (CRDT), It's a data structure , Allow consistent ordering of messages on distributed systems ,CRDT Provides optimistic replication and strong final consistency , Make sure that once you synchronize , Each peer will have the same version of the message list .
Each message is linked to its parent , This is the last message sent in the conversation by one of the peers connected at this time . There will be problems when the online and offline versions of the conversation are synchronized : Some messages are linked to the same parent , The linked list becomes a directed acyclic graph .
.

This leads to the creation of several parallel branches that eventually need to be merged . P2PDB By using Lamport Clock To achieve this : Each message will contain a Lamport Clock, The merged list will be sorted according to its value .
Lamport The clock
Lamport A clock is a structure with two fields : An identity public key and a counter , This counter is for the relevant user / Each message published by identity is incremented
.
// golang
type lamportClock struct {
time int
id crypto.PublicKey
} The comparison function is simple , It first checks the difference between the counter values , without , It checks the dictionary gap between the identity public keys , Knowing that a given identity cannot publish two message values with the same counter .
// golangfunc
compareClock(a, b lamportClock) int {
dist := a.time - b.time
if dist == 0 {
dist = comparePubKey(a.id, b.id) // Returns lexicographic distance
}
return dist
}
merkle-clock
Merkle-clock Of M It's a Merkle-DAG, Each node represents an event , Put it another way , An event in a given system , We can do it here DAG Find a node that represents it in , And allows us to sort it by comparing it with other events .DAG Is to merge other nodes through some simple rules DAG To build , The new event is added as the new root node ( The parent node of an existing node ), It should be noted that ,Merkle-clock There may be multiple roots at a given time .
for example , Given Mα and Mβ(α and β It's these DAG A single root in ):
Otherwise, if α ∈ Mβ, We reserve the Mβ As our new clock , because Mα The history of is already a part of it . under these circumstances , We said Mα < Mβ.
Otherwise, if β ∈ Mα, We reserve for the same reason Mα. Here Under different circumstances , We said Mβ< Mα.
otherwise , By keeping two DAG Merge the two clocks as is , therefore There are two root nodes , These nodes are made up of α and β quote . Please note that ,Mα and Mβ May not intersect at all , May not intersect , It depends on whether they share Enjoy some deeper nodes . If we want to record a new event , We can To create a new root γ, It has two children ,α and β.
We can already see , By identifying a MerkleClock Whether included in another MerkleClock in , We are Merkle-Clocks The concept of order is introduced between . In the same way , We have a concept of order between nodes in each clock , Because an earlier event will always be a descendant of a later event . Besides , We also introduced a method of merging according to their comparison Merkle-Clocks Methods . Generated Merkle-Clock Always include from two Merkle-Clock Cause and effect information . This ultimately means that each copy is stored in Merkle-Clock The causality information in will converge to the same Merkle-Clock.Merkle-Clocks The causal sequence provided is used to construct the Merkle-DAG Embedded when , And is often overlooked as something very intuitive . However , It is important to formalize how we define Merkle-Clocks The order between , And prove that causal information is maintained when they are synchronized and merged .
Peer-to-peer networks
Peer to peer network is an idea of network structure . It is the same as the dominant client in the current network / The server (Client/Server) structure ( That is to say WWW The structure adopted ) An essential difference between , There is no central node in the whole network structure ( Or central server ). stay P2P In structure , Every node (peer) Most of them have information consumers at the same time 、 Information provider and information communication . In terms of calculation mode ,P2P Breaking the traditional Client/Server (C/S) Pattern , Every node in the network has the same status . Each node acts as a server , Provide services for other nodes , At the same time, they also enjoy the services provided by other nodes .
To put it simply ,P2P It's about connecting people directly , Let people interact directly through the Internet .P2P Make it easy to communicate on the Internet 、 More direct sharing and interaction , Truly eliminate middlemen . P2P Another important feature is to change the status of the Internet Centered on the Ethernet website 、 Return “ Decentralization ”, And give the power back to the user . Peer to peer network is a successful extension of the concept of distribution , It allocates the traditional server burden to every node in the network , Each node will undertake limited storage and computing tasks , The more nodes join the network , The more resources a node contributes , The higher the quality of service .
Peer to peer networks can be applied to Internet Relatively powerful computers on the edge ( personal computer ), Perform more advanced tasks than client based computing tasks . modern PC With a very fast processor 、 Massive memory and huge hard disk , While performing routine computing tasks ( such as : Browse email and Web) when , The potential of these devices cannot be fully realized . new PC It's easy to act as both a client and a server for many types of applications ( Counterpart ).

P2P The characteristics of network technology are reflected in the following aspects :
The resources and services in the network are distributed on all nodes , Information transmission and service implementation are directly carried out between nodes , There is no need for the intervention of intermediate links and servers , Avoid possible bottlenecks .P2P The basic characteristics of decentralization , Brings its scalability 、 Robustness and other advantages .
stay P2P In the network , With the addition of users , Not only has the demand for services increased , The overall resources and service capabilities of the system are also expanding synchronously , It can always easily meet the needs of users . Theoretically, its scalability is almost infinite . for example : In the traditional way FTP In the file download mode of , When the number of download users increases , The download speed will get slower and slower , However P2P The Internet is the opposite , The more users you join ,P2P The more resources available on the Internet , The faster you download .
P2P Architecture is inherently attack resistant 、 Advantages of high fault tolerance . Because services are distributed among nodes , The destruction of some nodes or networks has little impact on other parts .P2P Generally, the network can automatically adjust the overall topology when some nodes fail , Keep other nodes connected .P2P Networks are usually self-organized , And allow nodes to join and leave freely .
The performance advantage is P2P An important reason for widespread concern . With the development of hardware technology , The computing, storage capacity and network bandwidth of personal computers are growing rapidly according to Moore's theorem . use P2P The architecture can make effective use of a large number of common nodes scattered in the Internet , Distribute computing tasks or storage data to all nodes . Use the idle computing power or storage space , Achieve the purpose of high-performance computing and mass storage . at present ,P2P The application in this field is mostly in academic research , Once the technology is mature , It can be popularized in the industrial field , Can save many enterprises the cost of purchasing large servers .
stay P2P In the network , Because the transmission of information is distributed among the nodes without going through a centralized link , The possibility of users' privacy information being eavesdropped and leaked is greatly reduced . Besides , At the moment Internet The privacy problem mainly adopts relay forwarding technology , Thus, the participants of communication are hidden in many network entities . In some traditional anonymous communication systems , Implementing this mechanism depends on some relay server nodes . And in the P2P in , All participants can provide relay forwarding function , Therefore, the flexibility and reliability of anonymous communication are greatly improved , Can provide users with better privacy protection .
Proof of version VOS
Proof of vension , similar pos、 An algorithm for storing proofs , The node with the most complete storage version can pack blocks , according to IPFS Protocol storage to different server nodes worldwide , To achieve the purpose of permanent data storage
The log cannot be tampered with
Tamper proof means , Change log generated each time , Broadcast to all nodes , Every ten minutes, it will be packaged into a block log and uploaded to the blockchain , At this time, the log content cannot be tampered with , Refer to Ethereum 、 Bitcoin protocol
Technical terminology
1、 Node consensus
Node consensus refers to , Historical data changes cannot be tampered with 、 The nodes will reach a consensus every ten minutes , When the consensus is completed, the change log will be permanently saved , Each node will save a complete change log , At the same time, a data snapshot will be generated , For fast data recovery .
2、 Proof of version
Proof of version refers to the occurrence of consensus , The node with the most complete version certificate is preferred as the node for data consistency verification , The remaining nodes will perform data proofreading according to the node with the most complete data version , Finally, data snapshots and historical log blocks are generated
3、 Light node
Follow Merkel's directed acyclic graph structure , For edge nodes , Just download the latest 10 Minutes ago .
4、 Data storage snapshots and logs
Store change logs ( similar mysql binlog journal , Every ten minutes ), Data snapshot ( Full data , For fast recovery and backup )
5、 The private network
P2PDB Not just for the public network , At the same time, it also allows enterprises to P2PDB Set up a private network , The private network has the same high partition fault tolerance as the public network 、 Usability 、 But security 、 It will be more controllable .
6、 Public private key
The public-private key provides the security of data during transmission , The node encrypts the data according to the issued public key before data transmission , Only the user who knows the private key can solve the secret text .
The reference paper of the white paper
Lamport time clock
https://lamport.azurewebsites.net/pubs/time-clocks.pdf
CRDT tech report:
https://hal.inria.fr/file/index/docid/555588/filename/techreport.pdf
CRDT :
https://hal.inria.fr/inria-00555588/document
Peer-to-peer networks
https://baike.baidu.com/item/%E5%AF%B9%E7%AD%89%E7%BD%91%E7%BB%9C/5482934?fr=aladdin
merker-CRDT
https://research.protocol.ai/blog/2019/a-new-lab-for-resilient-networks-research/PL-TechRep-merkleCRDT-v0.1-Dec30.pdf
原网站版权声明
本文为[InfoQ]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/175/202206241237173650.html