The Web3 decentralized storage landscape

2022-06-26 16:47:00 【Blockchain Technology Researcher】

If we want to go further in decentralizing the Internet, three pillars will ultimately be needed: consensus, storage, and computation. If humanity succeeds in decentralizing all three, we will embark on the next stage of the Internet's journey: Web3.

Storage, the second pillar, is maturing rapidly, and a variety of storage solutions are already being applied to real use cases.

The need for decentralized storage

From the perspective of blockchain

From a blockchain's perspective, we need decentralized storage because blockchains themselves are not designed to store large amounts of data. The mechanism for reaching block consensus relies on small pieces of data (transactions) that are collected into blocks and quickly shared across the network for nodes to verify.

First, storing data in blocks is very expensive. At the time of writing, storing the complete BAYC #3368 on layer 1 would cost more than 18,000 dollars.

Second, if we wanted to store large amounts of arbitrary data in these blocks, network congestion would become severe, and the resulting gas wars among users would drive up prices. This is a consequence of the implicit time value of blocks: users who need to submit a transaction to the network at a certain time have to pay extra gas fees to have their transaction prioritized.

It is therefore recommended that NFT metadata and image data, as well as dApp front ends, be stored off-chain.

From the perspective of centralized networks

If storing data on-chain is so expensive, why not simply store it off-chain on a centralized network?

Centralized networks are subject to censorship and mutability. Users have to trust the data provider to keep their data safe, and nothing guarantees that the operator of a centralized network will live up to that trust: data may be erased intentionally or accidentally, for example because the provider changes its policy, hardware fails, or a third party attacks.

NFTs

With the floor prices of some NFT collections exceeding 100,000 dollars, and some NFTs valued as high as 70,000 dollars per KB of image data, a promise alone is not enough to ensure that data remains available at all times. Stronger guarantees are needed to ensure the immutability and persistence of the underlying NFT data.

NFTs do not actually contain any image data. Instead, they only hold pointers to metadata and image data stored off-chain. Yet it is this metadata and image data that needs protecting: if it disappears, the NFT becomes nothing more than an empty container.

One could argue that the value of an NFT is not primarily driven by the metadata and image data it points to, but by the community that builds a movement and an ecosystem around the collection. While this may be true, without the underlying data the NFT would be meaningless, and no meaningful community could form around it.

Beyond profile pictures and art collections, NFTs can also represent ownership of real-world assets such as real estate or financial instruments. Such data carries external, real-world value, and because of what it represents, every byte of data behind the NFT is worth at least as much as the on-chain NFT itself.

dApps

If NFTs are goods that exist on the blockchain, then dApps can be thought of as services that exist on the blockchain and facilitate interaction with it. A dApp combines an off-chain front-end user interface with smart contracts that live on the network and interact with the blockchain. Sometimes a dApp also has a simple back end that moves some computation off-chain to reduce the gas required, and with it the cost that end users incur for certain transactions.

Although the value of a dApp should be judged in the context of its purpose (for example DeFi, GameFi, social, metaverse, or name services), the value that dApps facilitate is staggering. In the 30 days before the time of writing, the top 10 dApps on DappRadar together facilitated more than 150 billion dollars in transfers.

Although the core mechanisms of a dApp are implemented in smart contracts, end users reach them through the front end. In a sense, therefore, keeping the dApp front end accessible is what keeps the underlying service available.

Decentralized storage reduces the risk that a server failure, a DNS hack, or a centralized entity removing access takes the dApp front end offline. Even if development of a dApp stops, its smart contracts can still be accessed through the front end.

The decentralized storage landscape

Blockchains such as Bitcoin and Ethereum exist primarily to facilitate the transfer of value. Some decentralized storage networks take the same approach: they use a native blockchain to record and track storage orders, which represent a transfer of value in exchange for storage services. However, this is only one of many possible approaches. The storage landscape is broad, and over the years different solutions have emerged with different trade-offs and use cases.

Despite their many differences, all of these projects have one thing in common: none of them replicates all data across all nodes, as the Bitcoin and Ethereum blockchains do. In a decentralized storage network, the immutability and availability of stored data are not achieved by having most of the network store and validate all data in successively linked blocks, although, as mentioned earlier, many networks do use a blockchain to track storage orders.

Having every node on a decentralized storage network store all data is not sustainable, because the overhead of running the network would quickly drive up storage costs for users and ultimately push the network toward centralization around the few node operators who can afford the hardware.

Decentralized storage networks therefore have unusual challenges to overcome.

The challenge of decentralized storage

Recalling the limitations of on-chain data storage described earlier, it is clear that a decentralized storage network must store data in a way that does not interfere with the network's value transfer mechanism, while ensuring that the data remains persistent, immutable, and accessible. In essence, a decentralized storage network must be able to store, retrieve, and maintain data, while ensuring that all participants in the network are incentivized for the storage and retrieval work they do, and while preserving the trustlessness of a decentralized system.

These challenges can be summarized as the following questions:

Data storage format: are complete files stored, or file fragments?
Data replication: across how many nodes is the data (complete files or fragments) stored?
Storage tracking: how does the network know where to retrieve files from?
Proof of stored data: are nodes actually storing the data they were asked to store?
Data availability over time: is the data still stored as time passes?
Storage price discovery: how is the cost of storage determined?
Persistent data redundancy: if nodes leave the network, how does the network ensure that data remains available?
Data transfer: network bandwidth comes at a cost, so how is it ensured that nodes serve data when asked?
Network token economics: beyond ensuring that data is available on the network, how does the network ensure its own long-term existence?

The networks explored as part of this research employ a wide range of mechanisms and make different trade-offs to achieve decentralization.

An in-depth comparison of these networks on each challenge, along with a detailed profile of each network, can be found in the complete research article, which is available on Arweave or Crust Network.

Data storage format

There are two main approaches to storing data on these networks: storing complete files and using erasure coding. Arweave and Crust Network store complete files, while Filecoin, Sia, Storj, and Swarm all use erasure coding. In erasure coding, the data is broken into fixed-size fragments, and each fragment is expanded and encoded with redundant data. Thanks to this redundancy, only a subset of the fragments is needed to reconstruct the original file.
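
To make the idea of rebuilding a file from a subset of fragments concrete, here is a minimal Python sketch. It is an assumption-laden toy, not what any of the networks above actually run: it uses a single XOR parity shard, which tolerates the loss of only one shard, whereas production systems typically use stronger codes such as Reed-Solomon. The names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> Tuple[List[bytes], int]:
    """Split data into k equal-size shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity], len(data)


def decode(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one shard (data or parity) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only recover a single lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        # XOR of all surviving shards equals the missing one.
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and the padding added during encoding.
    return b"".join(shards[:-1])[:original_len]


shards, length = encode(b"hello decentralized storage", k=4)
shards[2] = None  # simulate a node that left the network
restored = decode(shards, length)
```

Here any four of the five shards are enough to recover the file, which is the property that lets a network spread fragments across nodes without every node holding a full copy.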

Data replication

In Filecoin, Sia, Storj, and Swarm, the network determines the number of erasure-coded fragments and the amount of redundant data stored in each fragment. Filecoin, however, also lets the user set a replication factor, which determines on how many separate physical devices the erasure-coded fragments are replicated as part of a storage deal with a single storage miner. If the user wants to store the file with a different storage miner, the user must make a separate storage deal. Crust and Arweave let the network decide on replication, although on Crust the replication factor can also be set manually. On Arweave, the storage proof mechanism incentivizes nodes to store as much data as possible, so the upper limit of replication on Arweave is the total number of storage nodes on the network.

The methods used to store and replicate data affect how the network retrieves it.

Storage tracking

Once data has been stored and distributed among the nodes of the network in whatever form, the network needs to be able to track it. Filecoin, Crust, and Sia all use a native blockchain to track storage orders, and storage nodes additionally maintain a local list of network locations. Arweave uses a blockchain-like structure. Unlike blockchains such as Bitcoin and Ethereum, Arweave lets nodes decide for themselves whether to store the data from a block. If you compared the chains held by multiple Arweave nodes, therefore, they would not be identical: some blocks would be missing on some nodes and present on others.

Finally, Storj and Swarm take two entirely different approaches. In Storj, a second node type called a satellite acts as coordinator for a group of storage nodes, managing and tracking where data is stored. In Swarm, the address of the data is embedded directly in the data chunk, so when retrieving data the network knows where to look based on the data itself.
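
The idea that the data itself tells the network where to look can be sketched as a content-addressed store, in which a chunk's address is simply the hash of its contents. This is only an illustration of the principle: the class `ContentStore` is hypothetical, and the choice of SHA-256 is an assumption made for the sketch, not a description of Swarm's actual hashing scheme.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Toy content-addressed store: address = hash of the chunk."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        # The address is derived from the data, so no separate index
        # of "which file lives where" has to be agreed upon.
        address = hashlib.sha256(chunk).hexdigest()
        self._chunks[address] = chunk
        return address

    def get(self, address: str) -> bytes:
        chunk = self._chunks[address]
        # The same hash doubles as an integrity check on retrieval.
        if hashlib.sha256(chunk).hexdigest() != address:
            raise ValueError("chunk does not match its address")
        return chunk


store = ContentStore()
addr = store.put(b"NFT image bytes")
```

Because the address is recomputable from the content, a client can verify what it receives without trusting the node that served it.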

Proof of stored data

Each network takes its own approach to proving that data is stored. Filecoin uses Proof of Replication, a proprietary storage proof mechanism that first stores the data on a storage node and then seals it in a sector. The sealing process makes two replicas of the same data provably distinct from each other, which ensures that the correct number of copies is stored on the network (hence "Proof of Replication").

Crust breaks a piece of data into many small chunks, which are hashed into a Merkle tree. By comparing the hash of the data stored on a physical storage device against the expected Merkle tree hash, Crust can verify that the file is stored correctly. This is similar to Sia's approach, the difference being that Crust stores the entire file on each node, while Sia stores erasure-coded fragments. Crust can store the entire file on a single node and still achieve privacy by using the node's Trusted Execution Environment (TEE), a sealed hardware component that even the hardware owner cannot access. Crust calls this storage proof algorithm "Meaningful Proof of Work": meaningful in the sense that a new hash is computed only when the stored data changes, which cuts down on pointless operations. Both Crust and Sia store the Merkle root hash on the blockchain as the source of truth for verifying data integrity.
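
The Merkle tree comparison described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Crust's or Sia's implementation: the helper names `split_chunks` and `merkle_root`, the SHA-256 hash, the 32-byte chunk size, and the rule of duplicating the last hash on odd levels are all choices made for the example.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes, size: int) -> List[bytes]:
    """Break data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash chunks pairwise, level by level, until one root hash remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [sha256(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# The root recorded at upload time acts as the source of truth.
original = b"abcdefghij" * 10
expected_root = merkle_root(split_chunks(original, 32))

# Later, the verifier recomputes the root from what the node holds.
stored_ok = merkle_root(split_chunks(original, 32)) == expected_root
```

Changing even a single byte of any chunk changes the root, which is why storing only the root hash on-chain is enough to detect corrupted or missing data.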

Storj checks whether data has been stored correctly through data audits. Data audits work similarly to the way Crust and Sia use Merkle trees to validate data fragments. On Storj, once enough nodes have returned their audit results, the network determines which nodes are faulty based on the majority response rather than by comparing against a source of truth on a blockchain. This design is deliberate: the developers believe that reducing network-wide coordination through a blockchain improves performance in terms of both speed (no need to wait for consensus) and bandwidth usage (no need for the entire network to interact regularly with the blockchain).

Arweave uses a cryptographic proof-of-work puzzle to determine whether files have been stored. Under this mechanism, in order to mine the next block, a node must prove that it has access to the previous block and to another random block from the network's block history. Because data uploaded to Arweave is stored directly in blocks, proving access to these blocks proves that the storage provider has indeed stored the files correctly.
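
A rough sketch of the access requirement described above: the random historical block is derived from the previous block's hash, and a node that does not hold that block cannot take part in mining the next one. The functions `recall_index` and `can_mine` and the exact derivation (SHA-256 of the previous hash, modulo the chain height) are assumptions for illustration and should not be read as Arweave's actual protocol rules.

```python
import hashlib
from typing import Dict


def recall_index(prev_block_hash: bytes, height: int) -> int:
    """Derive a pseudo-random earlier block index from the previous hash."""
    if height < 1:
        raise ValueError("chain must contain at least one block")
    digest = hashlib.sha256(prev_block_hash).digest()
    return int.from_bytes(digest, "big") % height


def can_mine(local_blocks: Dict[int, bytes],
             prev_block_hash: bytes, height: int) -> bool:
    """A node may attempt the next block only if it stores the recall block."""
    return recall_index(prev_block_hash, height) in local_blocks


prev_hash = b"previous-block-hash"
full_node = {i: b"block data" for i in range(100)}
eligible = can_mine(full_node, prev_hash, height=100)
```

Since no node can predict which historical block the next challenge will require, the more blocks a node stores, the more often it is eligible to mine.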

Finally, Swarm also uses Merkle trees, with the difference that the Merkle tree is not used to determine file locations. Instead, the data chunks are stored directly in the Merkle tree. When data is stored on Swarm, the root hash of the tree (which is also the address of the stored data) proves that the file has been properly chunked and stored.

Data availability over time

Again, each network takes its own approach to establishing that data has been stored over a given period. In Filecoin, to reduce network bandwidth, storage miners must run the Proof of Replication algorithm continuously over the period for which the data is to be stored. The resulting hash for each time period proves that the storage space was occupied by the correct data during that period, hence "Proof of Spacetime".

Crust, Sia, and Storj periodically verify random data fragments and report the results to their coordination mechanisms: the blockchain for Crust and Sia, and satellite nodes for Storj. Arweave ensures consistent availability of data through its Proof of Access mechanism, which requires miners to prove not only that they have access to the last block, but also that they have access to a random historical block. Storing older and rarer blocks is thereby incentivized, because it increases the likelihood that a miner wins the proof-of-work puzzle, for which access to that specific block is a prerequisite.
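
The periodic spot check on random fragments can be illustrated with a small audit routine. This is a simplified sketch, not the audit protocol of any network named above: the functions `chunk_hashes` and `audit` are hypothetical, and a real system would use compact proofs rather than shipping whole chunks to the verifier.

```python
import hashlib
import random
from typing import Dict, List


def chunk_hashes(chunks: List[bytes]) -> List[str]:
    """Hashes recorded when the data was first stored."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def audit(node_chunks: Dict[int, bytes], expected: List[str],
          rng: random.Random) -> bool:
    """Challenge the node on one randomly chosen chunk index."""
    index = rng.randrange(len(expected))
    chunk = node_chunks.get(index)
    if chunk is None:
        return False  # the node no longer holds the requested chunk
    return hashlib.sha256(chunk).hexdigest() == expected[index]


chunks = [b"fragment-0", b"fragment-1", b"fragment-2"]
expected = chunk_hashes(chunks)
rng = random.Random(0)  # seeded only to make the example repeatable

honest_node = dict(enumerate(chunks))
passed = audit(honest_node, expected, rng)
```

Because the node cannot know in advance which index will be requested, repeated audits over time make it increasingly unlikely that a node which dropped part of the data goes unnoticed.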

Swarm, on the other hand, regularly runs raffles that reward nodes for holding less popular data over time, and also runs a proof-of-ownership algorithm for data that nodes have promised to store for longer periods.

Filecoin, Sia, and Crust require nodes to deposit collateral to become storage nodes, while Swarm requires it only for long-term storage requests. Storj requires no upfront collateral, but withholds part of a storage node's storage income. Finally, all networks pay nodes periodically for the time during which they can prove that they stored the data.

Storage price discovery

To determine storage prices, Filecoin and Sia use a storage marketplace in which storage providers set their asking price and storage users set the price they are willing to pay, along with other settings. The marketplace then connects users with storage providers that meet their requirements. Storj takes a similar approach, the main difference being that there is no single network-wide marketplace connecting all nodes on the network. Instead, each satellite has its own set of storage nodes that it interacts with.
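
As a rough illustration of how such a marketplace might pair a user with a provider, the sketch below picks the cheapest provider whose asking price and free capacity satisfy the user's request. The `Ask` fields, the function `match_order`, and the cheapest-first rule are assumptions for the example. Real storage markets take additional settings into account.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ask:
    provider: str
    price_per_gb: float  # asking price set by the storage provider
    free_gb: int         # capacity the provider still has available


def match_order(asks: List[Ask], max_price: float,
                size_gb: int) -> Optional[Ask]:
    """Return the cheapest ask that meets the user's price and size limits."""
    eligible = [a for a in asks
                if a.price_per_gb <= max_price and a.free_gb >= size_gb]
    return min(eligible, key=lambda a: a.price_per_gb, default=None)


asks = [
    Ask("provider-a", price_per_gb=5.0, free_gb=100),
    Ask("provider-b", price_per_gb=3.0, free_gb=10),
    Ask("provider-c", price_per_gb=4.0, free_gb=100),
]
chosen = match_order(asks, max_price=4.5, size_gb=50)
```

In this example provider-b is cheapest but lacks capacity and provider-a is too expensive, so the order goes to provider-c.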

Finally, Crust, Arweave, and Swarm let the protocol determine the storage price. Crust and Swarm allow some settings to be adjusted according to the user's file storage requirements, while files on Arweave are always stored permanently.

Persistent data redundancy

Over time, nodes will leave these open, public networks, and when a node disappears, so does the data it stores. The network must therefore actively maintain a certain level of redundancy in the system. Sia and Storj do so by replenishing lost erasure-coded fragments: they collect a subset of the fragments, rebuild the underlying data, and then re-encode the file to recreate the missing fragments. In Sia, users must log in to the Sia client regularly for fragments to be replenished, because only the client can tell which fragments belong to which data and which user. On Storj, the satellite is always online and regularly runs data audits to replenish fragments.

Arweave's Proof of Access algorithm ensures that data is continually replicated across the network, while on Swarm data is replicated to nodes that are close to each other. On Filecoin, if data disappears over time and the remaining file fragments fall below a certain threshold, the storage order is reintroduced to the storage market so that another storage miner can take it over. Crust's replenishment mechanism is currently under development.

Incentivizing data transfer

Once data has been stored safely, users will eventually want to retrieve it. Because bandwidth comes at a cost, storage nodes must be incentivized to serve data when it is requested. Crust and Swarm use a debt and credit mechanism in which each node tracks the inbound and outbound traffic it exchanges with the nodes it interacts with. If a node only accepts inbound traffic but does not serve outbound traffic, it is deprioritized for future communication, which may affect its ability to accept new storage orders. Crust uses the IPFS Bitswap mechanism, while Swarm uses a proprietary protocol called SWAP. Under Swarm's SWAP protocol, the network allows nodes to settle their debt (having accepted inbound traffic without serving enough outbound traffic) with stamps, which can be exchanged for the network's utility token.
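
The debt and credit bookkeeping described above can be sketched as a per-peer ledger. This is a minimal illustration of the general idea, not the Bitswap or SWAP protocol: the class `PeerLedger`, the byte-based accounting, and the fixed `debt_limit` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from typing import DefaultDict


class PeerLedger:
    """Tracks, per peer, how much more data we sent than we received."""

    def __init__(self, debt_limit: int) -> None:
        self.debt_limit = debt_limit
        # Positive balance: the peer owes us. Negative: we owe the peer.
        self.balance: DefaultDict[str, int] = defaultdict(int)

    def record_sent(self, peer: str, nbytes: int) -> None:
        self.balance[peer] += nbytes

    def record_received(self, peer: str, nbytes: int) -> None:
        self.balance[peer] -= nbytes

    def should_serve(self, peer: str) -> bool:
        """Deprioritize peers whose debt exceeds the limit."""
        return self.balance[peer] <= self.debt_limit


ledger = PeerLedger(debt_limit=100)
ledger.record_sent("peer-1", 150)        # peer-1 only takes data
blocked = not ledger.should_serve("peer-1")
ledger.record_received("peer-1", 60)     # peer-1 starts giving back
served_again = ledger.should_serve("peer-1")
```

A settlement step, such as the token payments the text mentions, would simply reduce a peer's balance in the same way that `record_received` does.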

This tracking of node generosity is also how Arweave ensures that data is transferred when requested. In Arweave, the mechanism is called Wildfire: nodes prioritize peers with a better ranking and ration their bandwidth usage accordingly. Finally, on Filecoin, Storj, and Sia, users ultimately pay for bandwidth, which incentivizes nodes to deliver data when requested.

Token economy

Token economic design ensures the stability of the network and its long-term existence, since data is ultimately only as permanent as the network that holds it. The table below gives a brief summary of the token economic design decisions and the inflationary and deflationary mechanisms embedded in each design.

Which is the best network ？

No network can be said to be objectively better than another. Designing a decentralized storage network involves countless trade-offs. While Arweave is ideal for storing data permanently, it is not necessarily suitable for migrating Web 2.0 industry players to Web 3.0, since not all data needs to be kept forever. However, one strong sub-sector of data does require permanence: NFTs and dApps.

Ultimately, design decisions are driven by the purpose of the network.

Below is a summary of the various storage networks, compared with each other on the set of scales defined next. The scales reflect the dimensions along which these networks can be compared, but note that in many cases there is no good or bad way of overcoming a challenge of decentralized storage. The approach taken simply reflects a design decision.

Storage parameter flexibility: the extent to which users control file storage parameters
Storage persistence: the extent to which file storage can achieve theoretical permanence on the network (that is, without intervention)
Redundancy persistence: the ability of the network to maintain data redundancy through replenishment or repair
Data transfer incentives: the extent to which the network ensures that nodes serve data generously
Universality of storage tracking: the degree of consensus among nodes on where data is stored
Guaranteed data accessibility: the ability of the network to ensure that no single participant in the storage process can remove access to files on the network

The higher the score, the stronger the network is on that dimension.

Filecoin's token economics supports growing the storage space of the entire network, which is used to store large amounts of data in an immutable manner. In addition, its storage algorithm is better suited to data that is unlikely to change much over time (cold storage).

Crust's token economics ensures high redundancy and fast retrieval, making it suitable for high-traffic dApps and for fast retrieval of popular NFT data.

Crust scores low on storage persistence because, without persistent redundancy, its ability to provide permanent storage is severely impaired. Nevertheless, persistence can still be achieved by manually setting an extremely high replication factor.

Sia is all about privacy. Users need to restore file health manually because nodes do not know which data fragments they are storing or which data those fragments belong to. Only the data owner can reconstruct the original data from the fragments on the network.

By contrast, Arweave is all about permanence. This is also reflected in its endowment design, which makes storage more expensive but also makes Arweave an attractive choice for NFT storage.

Storj's business model appears to shape its billing and payment approach to a large extent: Amazon AWS S3 users are more familiar with monthly billing. By removing the complex payment and incentive systems common in blockchain-based systems, Storj Labs sacrifices some decentralization but significantly lowers the barrier to entry for its key target group, AWS users.

Swarm's bonding curve model ensures that storage costs remain relatively low as more data is stored on the network, and its proximity to the Ethereum blockchain makes it a strong storage contender for more complex Ethereum-based dApps.

There is no single best approach to the challenges facing decentralized storage networks. Depending on its purpose and the problems it tries to solve, each network must balance the technical side of network design against its token economics.