The Web3 decentralized storage ecosystem landscape
2022-06-26 16:47:00 | Blockchain Technology Researcher
If we want to go further in decentralizing the Internet, three pillars will eventually be needed: consensus, storage, and compute. If humanity succeeds in decentralizing these three areas, we will embark on the next stage of the Internet's journey: Web3.

Storage, the second pillar, is maturing rapidly, and a variety of storage solutions are already being applied to real-world use cases.
The need for decentralized storage
From the blockchain perspective
From the blockchain perspective, we need decentralized storage because blockchains themselves are not designed to store large amounts of data. The mechanism for reaching consensus on blocks relies on small pieces of data (transactions) that are placed into blocks (batches of transactions) and quickly shared across the network for nodes to verify.
First, storing data in blocks is very expensive. At the time of writing, storing the complete BAYC #3368 image on layer 1 would cost more than $18,000.
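To see where a figure of this magnitude comes from, here is a rough back-of-the-envelope sketch. The 20,000 gas per 32-byte storage slot is Ethereum's base SSTORE cost for writing a new slot; the image size, gas price, and ETH price below are illustrative assumptions, not figures from the original article.

```python
# Rough estimate of what storing an image fully on Ethereum L1 would cost.
# 20,000 gas per new 32-byte storage slot is Ethereum's SSTORE cost; the image size,
# gas price, and ETH price are illustrative assumptions only.

GAS_PER_32_BYTE_SLOT = 20_000      # gas to write one new 32-byte storage slot
IMAGE_SIZE_BYTES = 500 * 1024      # assume a ~500 KB image
GAS_PRICE_GWEI = 30                # assumed gas price
ETH_PRICE_USD = 2_000              # assumed ETH price

slots = -(-IMAGE_SIZE_BYTES // 32)              # ceiling division: number of storage slots
total_gas = slots * GAS_PER_32_BYTE_SLOT
cost_eth = total_gas * GAS_PRICE_GWEI * 1e-9    # 1 gwei = 1e-9 ETH
cost_usd = cost_eth * ETH_PRICE_USD

print(f"{slots} slots, {total_gas:,} gas, ~{cost_eth:.2f} ETH (~${cost_usd:,.0f})")
```

With these assumed numbers the result lands at roughly $19,000, the same order of magnitude as the figure quoted above.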

Second, if we tried to store large amounts of arbitrary data in these blocks, network congestion would become severe, and gas wars would drive up the price of using the network. This is a consequence of the implicit time value of block space: if a user needs a transaction included at a particular time, they have to pay extra gas to get it prioritized.
This is why it is recommended to store NFT metadata and image data, as well as dApp front ends, off-chain.
From the perspective of centralized networks
If storing data on-chain is so expensive, why not simply store the data off-chain on a centralized network?
Centralized networks are subject to censorship and mutability. They require users to trust the provider to keep the data safe, and no one can guarantee that the operator of a centralized network will actually live up to that trust: data can be erased intentionally or accidentally, for example because the provider changes its policy, suffers a hardware failure, or is attacked by a third party.
NFTs
With NFT collections whose floor prices exceed $100,000, and some NFTs whose image data is worth as much as $70,000 per kilobyte, a promise alone is not enough to keep that data available at all times. Stronger guarantees are needed to ensure the immutability and persistence of the underlying NFT data.

An NFT does not actually contain any image data. Instead, it only holds pointers to metadata and image data stored off-chain. Yet it is precisely this metadata and image data that need to be protected: if they disappear, the NFT becomes nothing more than an empty container.
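As a concrete illustration of what "only a pointer" means, the sketch below mimics the typical shape of ERC-721 metadata: the contract stores a token URI, the URI resolves to a JSON-like document, and that document in turn points to the image. The URIs and field values are hypothetical, not data from any real collection.

```python
# Hypothetical example of what an NFT stores on-chain vs. off-chain.
# The URIs below are made up; real collections use IPFS, Arweave, or plain HTTPS links.

# On-chain: the contract only maps a token ID to a metadata URI (the "pointer").
token_uri = "ar://hYpOtHeTiCaL_tx_id/3368.json"   # could also be ipfs://... or https://...

# Off-chain: the metadata document the URI resolves to, which itself points to the image.
metadata = {
    "name": "Example Ape #3368",
    "description": "Illustrative record shaped like typical ERC-721 metadata.",
    "image": "ar://hYpOtHeTiCaL_tx_id/3368.png",   # the actual image lives here
    "attributes": [{"trait_type": "Fur", "value": "Golden"}],
}

# If the documents behind these URIs disappear, the on-chain token still exists,
# but it points at nothing: the "empty container" described above.
print(token_uri, "->", metadata["image"])
```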

One could argue that the value of an NFT is not primarily driven by the metadata and image data it points to, but by the momentum and the community that forms around a collection. That may be true, but without the underlying data the NFT would be meaningless, and no community can form around something meaningless.
Beyond profile pictures and art collections, NFTs can also represent ownership of real-world assets such as real estate or financial instruments. Such data has real-world value outside the chain, and because the NFT represents that value, preserving every byte of the underlying data is worth no less than the on-chain NFT itself.
dApps
If NFTs are goods that exist on the blockchain, dApps can be thought of as services that exist on, and facilitate interaction with, the blockchain. A dApp combines an off-chain front-end user interface with smart contracts that live on the network and interact with the blockchain. Sometimes dApps also have a simple back end so that some computation can be moved off-chain, reducing gas and therefore the cost of certain transactions for end users.

Although the value of a dApp depends on its context (DeFi, GameFi, social, metaverse, name services, and so on), the numbers are striking: in the 30 days before writing, the top 10 dApps on DappRadar together facilitated more than $150 billion in transfers.

Although the core mechanics of a dApp are implemented in smart contracts, end users access the service through its front end. In a sense, then, ensuring the accessibility of a dApp's front end means ensuring the availability of the underlying service.

Decentralized storage reduces the risk that server failures, DNS hijacking, or takedowns by centralized entities cut off access to a dApp's front end. Even if development of a dApp stops, its smart contracts remain accessible through the front end.
The decentralized storage landscape
Blockchains such as Bitcoin and Ethereum exist primarily to facilitate the transfer of value. Some decentralized storage networks take the same approach: they use a native blockchain to record and track storage orders, each of which represents a transfer of value in exchange for storage services. This, however, is only one of many possible approaches; the storage space is broad, and over the years different solutions have emerged with different trade-offs and use cases.

Despite their many differences, all of these projects have one thing in common: none of them replicates all data on all nodes, as the Bitcoin and Ethereum blockchains do. In a decentralized storage network, the immutability and availability of stored data are not achieved by having most of the network store and verify a chain of sequentially linked data, as Bitcoin and Ethereum do, although, as mentioned above, many networks do choose to use a blockchain to track storage orders.
Having every node in a decentralized storage network store all data is not sustainable: the overhead of running the network would quickly drive up storage costs for users and ultimately push the network toward centralization, into the hands of the few node operators who can afford the hardware.
Decentralized storage networks therefore have to overcome a distinct set of challenges.
The challenges of decentralized storage
Recalling the limitations of on-chain data storage described above, it is clear that a decentralized storage network must store data in a way that does not interfere with the network's value-transfer mechanism, while keeping the data persistent, immutable, and accessible. In essence, a decentralized storage network must be able to store, retrieve, and maintain data, ensure that all participants are incentivized for their storage and retrieval work, and do all of this while keeping the system trustless and decentralized.
These challenges can be summarized as the following questions:
- Data storage format: should the network store complete files or file fragments?
- Data replication: across how many nodes should the data (complete files or fragments) be stored?
- Storage tracking: how does the network know where to retrieve a file?
- Proof of storage: are nodes actually storing the data they are supposed to store?
- Data availability over time: is the data still being stored as time passes?
- Storage price discovery: how is the cost of storage determined?
- Persistent data redundancy: if nodes leave the network, how does the network ensure the data remains available?
- Data transmission: network bandwidth comes at a cost, so how can nodes be made to serve data when asked?
- Network token economics: beyond keeping data available, how does the network ensure its own long-term existence?
The networks explored as part of this research employ a wide range of mechanisms and make different trade-offs to achieve decentralization.

An in-depth comparison of these networks across each of the challenges, along with a detailed profile of each network, can be found in the complete research article, available on Arweave or Crust Network.
Data storage format

Across these networks there are two main ways of storing data: storing complete files, and using erasure coding. Arweave and Crust Network store complete files, while Filecoin, Sia, Storj, and Swarm all use erasure coding. With erasure coding, data is broken into fixed-size fragments, and each fragment is expanded and encoded with redundant data. The redundancy stored in each fragment means that only a subset of the fragments is needed to reconstruct the original file.
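Below is a minimal sketch of the erasure-coding idea, using a single XOR parity fragment so that any two of three fragments can rebuild the original. The networks above use Reed-Solomon-style codes with much larger parameters, so this is purely illustrative.

```python
# Toy illustration of erasure coding: split data into k=2 fragments and add one
# XOR parity fragment, so any 2 of the 3 fragments can reconstruct the original.
# Real networks use Reed-Solomon-style codes with much larger k and n.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes):
    half = (len(data) + 1) // 2
    frag_a = data[:half]
    frag_b = data[half:].ljust(half, b"\x00")   # pad so both halves have equal length
    parity = xor_bytes(frag_a, frag_b)
    return {"A": frag_a, "B": frag_b, "P": parity}

def decode(frags: dict, original_len: int) -> bytes:
    # Any two fragments suffice: a missing data fragment is recovered from the parity.
    if "A" not in frags:
        frags["A"] = xor_bytes(frags["B"], frags["P"])
    if "B" not in frags:
        frags["B"] = xor_bytes(frags["A"], frags["P"])
    return (frags["A"] + frags["B"])[:original_len]

data = b"decentralized storage demo"
fragments = encode(data)
fragments.pop("B")                              # pretend the node holding fragment B vanished
assert decode(fragments, len(data)) == data     # the file still comes back intact
```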
Data replication
In Filecoin, Sia, Storj, and Swarm, the network determines the number of erasure-coded fragments and how much redundant data each fragment carries. Filecoin, however, also lets the user choose a replication factor, which determines how many separate physical devices the erasure-coded fragments are copied to as part of a storage deal with a single storage miner. If the user wants to store the file with a different storage miner, they must make a separate storage deal. Crust and Arweave let the network decide on replication, although on Crust the replication factor can also be set manually. On Arweave, the proof-of-storage mechanism incentivizes nodes to store as much data as possible, so the upper bound on replication is the total number of storage nodes in the network.
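To get a feel for the trade-off between full-file replication and erasure coding, here is a small illustrative calculation. The replication factor and the (k, n) parameters are made-up examples, not the actual defaults of any of the networks above.

```python
# Illustrative comparison of redundancy overhead: full-file replication vs. erasure coding.
# All parameters below are made-up examples, not the defaults of any real network.

file_size_gb = 1.0

# Full replication: r complete copies spread across r nodes.
replication_factor = 5
replication_overhead = replication_factor * file_size_gb         # 5.0 GB stored in total

# Erasure coding: split into k data fragments, expand to n total fragments;
# any k of the n fragments can rebuild the file.
k, n = 10, 30
erasure_overhead = (n / k) * file_size_gb                         # 3.0 GB stored in total
tolerated_losses = n - k                                          # survives 20 lost fragments

print(f"replication:  {replication_overhead:.1f} GB stored, survives {replication_factor - 1} node losses")
print(f"erasure code: {erasure_overhead:.1f} GB stored, survives {tolerated_losses} fragment losses")
```

With these illustrative numbers, erasure coding stores less total data yet tolerates far more losses, which is why most of the networks above use it.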

How data is stored and replicated in turn affects how the network retrieves it.
Storage tracking
Once data is stored on the network and distributed in some form among its nodes, the network needs to be able to keep track of it. Filecoin, Crust, and Sia all use a native blockchain to track storage orders, while storage nodes also maintain a local list of network locations. Arweave uses a blockchain-like structure. Unlike blockchains such as Bitcoin and Ethereum, nodes on Arweave can decide whether or not to store the data of a given block. As a result, if you compared the chains held by several Arweave nodes, they would not be identical: blocks missing on one node can be found on others.

Finally, Storj and Swarm take two entirely different approaches. In Storj, a second node type called a satellite acts as a coordinator for a set of storage nodes, managing and tracking where data is stored. In Swarm, the address of the data is embedded directly in the data chunks themselves, so when retrieving data the network knows where to look based on the data itself.
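The sketch below shows the content-addressing idea in miniature: a chunk's address is derived by hashing its content, so the data itself tells the network where it belongs. The fixed chunk size and the use of plain SHA-256 are simplifications for illustration, not Swarm's exact chunking and hashing scheme.

```python
import hashlib

# Content addressing in miniature: a chunk's address is the hash of its content,
# so the data itself tells the network where to store and find it.
# Chunk size and hash function here are simplifications, not Swarm's exact scheme.

CHUNK_SIZE = 4096

def chunk_addresses(data: bytes):
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return {hashlib.sha256(c).hexdigest(): c for c in chunks}

store = chunk_addresses(b"some dApp front-end bundle" * 1000)
for address in list(store)[:3]:
    # Retrieval by address; integrity is verified simply by re-hashing the chunk.
    assert hashlib.sha256(store[address]).hexdigest() == address
    print(address[:16], "->", len(store[address]), "bytes")
```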
Proof of storage
When it comes to proving that data is stored, each network has its own approach. Filecoin uses Proof-of-Replication, its own proof-of-storage mechanism, which first stores the data on a storage node and then seals it into a sector. The sealing process makes two replicas of the same data provably unique from one another, which ensures that the correct number of copies is stored on the network (hence "Proof-of-Replication").
Crust breaks a piece of data into many small chunks, which are hashed into a Merkle tree. By comparing the hashes of the individual chunks stored on a physical storage device against the expected Merkle tree hashes, Crust can verify that a file is stored correctly. This is similar to Sia's approach, except that Crust stores the entire file on each node while Sia stores erasure-coded fragments. Crust can keep a whole file on a single node and still preserve privacy by using the node's trusted execution environment (TEE), a sealed hardware component that even the hardware's owner cannot access. Crust calls its proof-of-storage algorithm Meaningful Proof of Work: "meaningful" because new hashes are only computed when the stored data changes, which avoids wasted work. Both Crust and Sia store the Merkle root hash on their blockchains as the source of truth for verifying data integrity.
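The sketch below shows the general Merkle-tree technique: chunk hashes roll up into a root that can be kept on a blockchain as the source of truth, and a node can later prove it still holds a given chunk with a short proof path. It illustrates the principle only, not the exact construction used by Crust or Sia.

```python
import hashlib

# Minimal Merkle-tree sketch: the root hash can live on a blockchain as the source
# of truth, and a single chunk plus its proof path is enough to verify storage.
# Generic technique only, not the exact construction used by Crust or Sia.

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        proof.append((level[sibling], index % 2 == 0))   # (sibling hash, sibling is on the right?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

chunks = [f"chunk-{i}".encode() for i in range(8)]
root = merkle_root(chunks)                       # stored on-chain as the source of truth
proof = merkle_proof(chunks, 5)
assert verify(chunks[5], proof, root)            # a node proves it still holds chunk 5
```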
Storj checks that data has been stored correctly through data audits. The audits resemble the way Crust and Sia use Merkle trees to validate data fragments, but on Storj, once enough nodes have returned their audit results, the network determines which nodes are at fault from the majority of responses rather than by comparing against a source of truth on a blockchain. This is a deliberate choice: Storj's developers believe that reducing network-wide coordination through a blockchain improves performance, both in speed (no waiting for consensus) and in bandwidth (the entire network does not need to interact with a blockchain on a regular basis).
Arweave uses a cryptographic proof-of-work challenge to determine whether files are being stored. Under this mechanism, to be allowed to mine the next block, a node must prove that it can access the previous block as well as another random block from the network's block history. Because data uploaded to Arweave is stored directly in blocks, proving access to past blocks proves that the storage provider has indeed kept the files.
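A heavily simplified sketch of the recall-block idea follows: the previous block's hash pseudo-randomly selects a block from history, and only miners who actually store that block can respond, so keeping more (and rarer) history pays off. This is a conceptual illustration, not Arweave's actual mining algorithm.

```python
import hashlib

# Simplified sketch of a proof-of-access style challenge: the previous block's hash
# pseudo-randomly selects a "recall block" from history, and only miners who actually
# store that block can answer. Conceptual only, not Arweave's real algorithm.

def recall_block_index(prev_block_hash: bytes, chain_height: int) -> int:
    return int.from_bytes(hashlib.sha256(prev_block_hash).digest(), "big") % chain_height

def can_mine(stored_blocks: dict, prev_block_hash: bytes, chain_height: int) -> bool:
    idx = recall_block_index(prev_block_hash, chain_height)
    return idx in stored_blocks          # a miner missing the recall block sits this round out

# A miner that only keeps half of the history answers roughly half of the challenges,
# so storing rare or old blocks directly increases expected mining rewards.
history = {i: f"block-{i}".encode() for i in range(0, 1000, 2)}   # holds even-numbered blocks only
wins = sum(can_mine(history, f"tip-{r}".encode(), 1000) for r in range(10_000))
print(f"answered {wins / 10_000:.0%} of challenges")
```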
Finally, Swarm also uses Merkle trees, with the difference that the tree is not used to locate a file; instead, the data chunks are stored directly in a Merkle tree. When data is stored on Swarm, the root of the tree, which is also the address at which the data is stored, serves as proof that the file has been properly chunked and stored.
Data availability over time
Here, too, each network has its own approach to establishing that data remains stored over a given period of time. In Filecoin, to reduce network bandwidth, storage miners must run the Proof-of-Replication algorithm continuously over the storage period. The resulting hash for each period proves that the storage space was occupied by the correct data throughout that period, hence "Proof-of-Spacetime".
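The toy below sketches only the shape of a per-period storage proof: each period, a challenge is answered with a hash over the stored data, and the responses are chained. In a real system such as Filecoin's Proof-of-Spacetime, the challenge comes from chain randomness published only at that time and the proof must be submitted before a deadline, which is what actually forces continuous storage; this sketch is conceptual only.

```python
import hashlib

# Conceptual sketch of per-period storage proofs. Each period, a challenge (here
# simply derived from the period number) is answered with a hash over the stored
# data and the previous proof, so the proofs form a chain. A real system derives
# each challenge from randomness available only at that time and enforces submission
# deadlines; that timing constraint, not shown here, is what forces continuous storage.

def period_proof(sector_data: bytes, period: int, prev_proof: bytes) -> bytes:
    challenge = hashlib.sha256(f"period-{period}".encode()).digest()
    return hashlib.sha256(prev_proof + challenge + sector_data).digest()

sector = b"sealed sector contents"
proof = b"\x00" * 32
for period in range(5):                      # one proof per period, e.g. per proving window
    proof = period_proof(sector, period, proof)
    print(period, proof.hex()[:16])
```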
Crust, Sia, and Storj verify random data segments at regular intervals and report the results to their respective coordination mechanisms: the blockchains of Crust and Sia, and Storj's satellites. Arweave ensures the continued availability of data through its Proof-of-Access mechanism, which requires miners to prove not only that they can access the latest block but also that they can access a random block from history. Storing older and rarer blocks is incentivized because it increases a miner's chance of winning the proof-of-work challenge, for which access to that particular block is a prerequisite.
Swarm, on the other hand, runs regular lotteries that reward nodes for holding less popular data over time, and it also runs a proof-of-ownership algorithm over data that nodes have committed to store for longer periods.
Filecoin, Sia, and Crust require nodes to deposit collateral in order to become storage nodes, while Swarm only requires collateral for long-term storage requests. Storj requires no upfront collateral but instead withholds part of a storage node's earnings as a deposit. Finally, all of these networks pay nodes periodically for the time periods over which they can prove they are storing data.
Storage price discovery
To determine the price of storage, Filecoin and Sia use a storage marketplace: storage providers set their asking prices, storage users set the price they are willing to pay along with other parameters, and the marketplace then matches users with storage providers that meet their requirements. Storj works in a similar way, the main difference being that there is no single network-wide marketplace connecting all nodes; instead, each satellite has its own set of storage nodes that it interacts with.
Finally, Crust, Arweave, and Swarm let the protocol determine the storage price. Crust and Swarm allow some settings to be adjusted according to the user's storage requirements, while on Arweave files are simply stored permanently.
Persistent data redundancy
As time goes on, nodes will leave these open, public networks, and when a node disappears, the data it stored disappears with it. The network must therefore actively maintain a certain level of redundancy in the system. Sia and Storj replenish lost erasure-coded fragments by collecting a subset of the remaining fragments, rebuilding the underlying data, and re-encoding the file to recreate the missing fragments. In Sia, users must log in to the Sia client regularly to replenish fragments, because only the client can tell which fragments belong to which data and which user. On Storj, the satellites are always online and regularly run data audits to replenish fragments.
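A minimal sketch of a redundancy-maintenance loop follows: when holders drop off and the copy count falls below a target, the data is pushed to new nodes. Real networks replenish erasure-coded fragments or reissue storage deals rather than naively re-copying whole files; the target and node names here are made up.

```python
import random

# Minimal sketch of a redundancy-maintenance loop: when nodes drop off and the number
# of surviving copies falls below a target, the data is copied to new nodes.
# Real networks replenish erasure-coded fragments or reissue storage deals instead;
# the threshold and node names below are made up.

TARGET_COPIES = 5

def repair(holders: set, all_nodes: set, target: int = TARGET_COPIES) -> set:
    missing = target - len(holders)
    if missing <= 0:
        return holders                                 # redundancy still healthy
    candidates = list(all_nodes - holders)
    return holders | set(random.sample(candidates, min(missing, len(candidates))))

nodes = {f"node-{i}" for i in range(20)}
holders = {"node-0", "node-1", "node-2", "node-3", "node-4"}

holders -= {"node-1", "node-3"}                        # two holders leave the network
holders = repair(holders, nodes)
print(sorted(holders))                                 # back to 5 copies
```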
Arweave's Proof-of-Access algorithm ensures that data keeps being replicated across the network over time, and on Swarm data is replicated to nodes that are close to each other. On Filecoin, if data disappears over time and the number of remaining file fragments falls below a certain threshold, the storage order is reintroduced into the storage market so that another storage miner can take it over. Crust's replenishment mechanism is currently under development.
Incentivizing data transmission
Once data is stored safely, users will sooner or later want to retrieve it. Because bandwidth comes at a cost, storage nodes must be incentivized to serve data when required. Crust and Swarm use a debt-and-credit mechanism in which each node tracks the inbound and outbound traffic of the nodes it interacts with. A node that only accepts inbound traffic but does not provide outbound traffic is deprioritized in future communication, which can affect its ability to accept new storage orders. Crust uses IPFS's Bitswap mechanism, while Swarm uses its own protocol called SWAP. Under SWAP, the network also allows nodes to settle their debt (inbound traffic without sufficient outbound traffic) with payments that can be exchanged for Swarm's utility token.
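The sketch below shows this debt-and-credit bookkeeping in miniature: each node keeps a per-peer ledger of bytes served and received and deprioritizes peers that only take. It captures the general idea behind mechanisms like Bitswap and SWAP, not the actual protocol logic of either; the thresholds and priority rule are illustrative.

```python
from collections import defaultdict

# Toy per-peer bandwidth ledger in the spirit of debt/credit schemes like IPFS Bitswap
# or Swarm's SWAP: peers that only consume bandwidth without reciprocating are served
# last (or asked to settle). The priority rule is illustrative.

class BandwidthLedger:
    def __init__(self):
        self.sent = defaultdict(int)       # bytes we served to each peer
        self.received = defaultdict(int)   # bytes each peer served to us

    def record_sent(self, peer: str, n_bytes: int) -> None:
        self.sent[peer] += n_bytes

    def record_received(self, peer: str, n_bytes: int) -> None:
        self.received[peer] += n_bytes

    def balance(self, peer: str) -> int:
        # Positive: the peer has been generous to us; negative: the peer is in our debt.
        return self.received[peer] - self.sent[peer]

    def serve_order(self, peers):
        # Requests from generous peers are handled first; freeloaders are deprioritized.
        return sorted(peers, key=self.balance, reverse=True)

ledger = BandwidthLedger()
ledger.record_received("peer-a", 50_000)   # peer-a has served us data before
ledger.record_sent("peer-b", 80_000)       # peer-b only ever downloads from us
print(ledger.serve_order(["peer-a", "peer-b"]))   # ['peer-a', 'peer-b']
```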

This kind of tracking of node generosity is also how Arweave ensures that data is served on request. In Arweave the mechanism is called Wildfire: nodes give priority to peers with better rankings and ration their bandwidth accordingly. Finally, on Filecoin, Storj, and Sia, users ultimately pay for bandwidth, which incentivizes nodes to deliver data when it is requested.
Token economics
Token economics design ensures the stability of the network and its long-term existence; in the end, data is only as permanent as the network that stores it. The table below summarizes the token economics design decisions of each network, along with the inflationary and deflationary mechanisms embedded in each design.

Which network is best?
No network can be said to be objectively better than another. Designing a decentralized storage network involves countless trade-offs. Although Arweave is ideal for storing data permanently, it is not necessarily the right fit for Web 2.0 players migrating to Web3, since not all data needs to be kept forever. One strong data subdomain, however, really does need permanence: NFTs and dApps.
Ultimately, design decisions follow from the purpose of the network.
Below is a summary of the various storage networks, compared against each other on the set of scales defined below. The scales reflect dimensions along which the networks can be compared, but it should be noted that in many cases there is no better or worse way of overcoming the challenges of decentralized storage; the scores simply reflect design decisions.
- Storage parameter flexibility: the degree to which users control file storage parameters
- Storage persistence: the degree to which file storage is theoretically persistent on the network (i.e., requires no intervention)
- Redundancy persistence: the network's ability to maintain data redundancy through replenishment or repair
- Data transmission incentives: the degree to which the network ensures that nodes serve data generously
- Universality of storage tracking: the degree of consensus among nodes about where data is stored
- Guaranteed data accessibility: the network's ability to ensure that no single participant in the storage process can remove access to a file on the network
The higher the score, the stronger the network is on that dimension.
Filecoin's token economics support growing the total storage space of the network, which suits storing large amounts of data in an immutable way. In addition, its storage algorithm is better suited to data that is unlikely to change much over time (cold storage).

Crust's token economics ensure very high redundancy and fast retrieval, making it suitable for high-traffic dApps and for quickly serving popular NFT data.
Crust scores low on storage persistence because, without persistent redundancy, its ability to offer permanent storage is severely limited. That said, persistence can still be approximated by manually setting an extremely high replication factor.

Sia is all about privacy. The reason users must restore fragment health manually is that nodes do not know which data fragments they store or which data those fragments belong to; only the data owner can reconstruct the original data from the fragments on the network.

Arweave, by contrast, is all about permanence. This is reflected in its endowment design, which makes storage more expensive but also makes Arweave an attractive choice for NFT storage.

Storj's business model appears to have strongly shaped its billing and payment approach: monthly billing is what Amazon AWS S3 users are most familiar with. By removing the complex payment and incentive systems common in blockchain-based systems, Storj Labs sacrifices some decentralization but significantly lowers the barrier to entry for AWS users, one of its key target groups.

Swarm's bonding-curve model ensures that storage costs remain relatively low as more data is stored on the network, and its proximity to the Ethereum blockchain makes it a key storage contender for more complex Ethereum-based dApps.

There is no single best way to address the challenges of decentralized storage. Depending on the purpose of the network and the problems it tries to solve, each network must balance the technical side of its design against its token economics.

In the end, the purpose of a network and the specific use cases it tries to optimize for determine its various design decisions.