Web3 decentralized storage ecological landscape
2022-06-26 16:47:00 【Blockchain Technology Researcher】
If we want to go further in decentralizing the Internet, three pillars will eventually be needed: consensus, storage, and computation. If humanity succeeds in decentralizing all three, we will embark on the next stage of the Internet's journey: Web3.

Storage, the second pillar, is maturing rapidly, and a variety of storage solutions are already being applied to real use cases.
The need for decentralized storage
From the perspective of blockchain
From the perspective of a blockchain, we need decentralized storage because the blockchain itself is not designed to store large amounts of data. The mechanism for reaching block consensus relies on small amounts of data (transactions) that are placed in blocks (collections of transactions) and quickly shared with the network for nodes to verify.
First, storing data in blocks is very expensive. At the time of writing, storing the complete BAYC #3368 on layer 1 would cost more than 18,000 dollars.

Second, if we stored large amounts of arbitrary data in these blocks, network congestion would become severe, and the resulting gas wars would drive up prices for everyone using the network. This is a consequence of the implicit time value of a block: if users need to submit a transaction to the network at a certain time, they have to pay extra gas fees to have their transaction prioritized.
It is therefore recommended that NFT metadata and image data, as well as dApp front ends, be stored off-chain.
From the perspective of centralized networks
If storing data on-chain is so expensive, why not simply store it off-chain on a centralized network?
Centralized networks are subject to censorship and are mutable. Users have to trust the data provider to keep the data safe, and nobody can guarantee that the operator of a centralized network will actually live up to that trust: data may be erased intentionally or accidentally, for example because the data provider changes its policies, because hardware fails, or because of an attack by a third party.
NFTs
With the floor prices of some NFT collections exceeding 100,000 dollars and some NFTs being valued at as much as 70,000 dollars per KB of image data, a promise alone is not enough to ensure that the data is available at all times. Stronger guarantees are needed to ensure the immutability and persistence of the underlying NFT data.

NFTs do not actually contain any image data themselves. Instead, they only hold pointers to metadata and image data stored off-chain. It is this metadata and image data that needs to be protected, because if it disappears, the NFT is nothing more than an empty container.

It can be argued that the value of an NFT is not driven primarily by the metadata and image data it refers to but by the community and ecosystem that build up around the collection. While that may be true, without the underlying data the NFT would be meaningless, and no meaningful community could form around it in the first place.
Beyond profile pictures and art collections, NFTs can also represent ownership of real-world assets such as real estate or financial instruments. Such data has external real-world value, and because the NFT represents that value, every byte of the data behind the NFT is worth at least as much as the on-chain NFT itself.
dApps
If NFTs are the goods that exist on a blockchain, dApps can be thought of as the services that exist on a blockchain and facilitate interaction with it. A dApp is the combination of an off-chain front-end user interface and smart contracts that live on the network and interact with the blockchain. Sometimes dApps also have a simple back end to which certain computations can be moved off-chain to reduce the gas required, and thereby the cost incurred by end users for certain transactions.

Although the value of a dApp should be judged in the context of its purpose (for example DeFi, GameFi, social, metaverse, or name services), the value that dApps facilitate is staggering. In the 30 days before the time of writing, the top 10 dApps on DappRadar together facilitated more than 150 billion dollars in transfers.

Although the core mechanics of a dApp are implemented in smart contracts, it is the front end that makes them accessible to end users. In a sense, therefore, ensuring the accessibility of a dApp's front end means ensuring the availability of the underlying service.

Decentralized storage reduces the risk that server failures, DNS hacks, or a centralized entity removing the front end cut off access to a dApp. Even if development of a dApp stops, its smart contracts can still be accessed through the front end.
The decentralized storage landscape
Blockchains such as Bitcoin and Ethereum exist primarily to facilitate transfers of value. Some decentralized storage networks take the same approach: they use a native blockchain to record and track storage orders, each of which represents a transfer of value in exchange for storage services. This is, however, only one of many possible approaches. The storage space is broad, and over the years different solutions have emerged with different trade-offs and use cases.

Despite their many differences, all of these projects have one thing in common: none of them replicates all data on all nodes, as the Bitcoin and Ethereum blockchains do. In a decentralized storage network, the immutability and availability of stored data are not achieved by having the majority of the network store and verify successively linked data, as is the case with Bitcoin and Ethereum, although, as mentioned earlier, many networks do choose to use a blockchain to track storage orders.
Having all nodes in a decentralized storage network store all data is not sustainable, because the overhead of running the network would quickly drive up storage costs for users and would ultimately push the network toward centralization among the few node operators who can afford the hardware.
Decentralized storage networks therefore have considerable challenges to overcome.
The challenge of decentralized storage
Recalling the limitations of on-chain data storage discussed above, it is clear that a decentralized storage network must store data in a way that does not affect the network's value transfer mechanism while ensuring that data remains persistent, immutable, and accessible. In essence, a decentralized storage network must be able to store data, retrieve data, and maintain data, while ensuring that all participants in the network are incentivized for the storage and retrieval work they do and while preserving the trustless nature of a decentralized system.
These challenges can be summarized as the following questions :
- Data storage format: store complete files or file fragments?
- Data replication: across how many nodes should the data (complete files or fragments) be stored?
- Storage tracking: how does the network know where to retrieve files from?
- Proof of stored data: are nodes storing the data they were asked to store?
- Data availability over time: is the data still stored as time passes?
- Storage price discovery: how is the cost of storage determined?
- Persistent data redundancy: if nodes leave the network, how does the network ensure that data remains available?
- Data transfer: network bandwidth comes at a cost, so how is it ensured that nodes serve data when asked?
- Network token economics: beyond ensuring that data is available on the network, how does the network ensure its own long-term existence?
The various networks explored as part of this study employ a wide range of mechanisms and achieve decentralization through a number of trade-offs.

An in-depth comparison of these networks on each challenge, together with a detailed profile of each network, can be found in the complete research article published by Arweave or Crust Network.
Data storage format

Across these networks there are two main ways of storing data: storing complete files and using erasure coding. Arweave and Crust Network store complete files, while Filecoin, Sia, Storj, and Swarm all use erasure coding. In erasure coding, data is broken down into fixed-size fragments, and each fragment is expanded and encoded with redundant data. The redundant data stored in each fragment means that only a subset of the fragments is needed to reconstruct the original file.
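The idea that a subset of fragments is enough to rebuild a file can be illustrated with a deliberately simple sketch. It assumes a single XOR parity fragment, which tolerates the loss of only one fragment; the networks above use stronger codes, and the function names `encode` and `reconstruct` are hypothetical, not taken from any of them.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # pad so all fragments have equal size
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)     # redundant data
    return fragments + [parity]


def reconstruct(fragments: list, length: int) -> bytes:
    """Rebuild the original data when at most one fragment (marked None) is lost."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the lost one
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:-1])[:length]  # drop parity and padding
```

Calling `encode(data, 4)` yields five fragments that could be placed on five different nodes; any four of them are sufficient for `reconstruct` to return the original bytes.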
Data replication
In Filecoin, Sia, Storj, and Swarm, the network determines the number of erasure-coded fragments and the amount of redundant data stored in each fragment. Filecoin additionally lets the user set a replication factor, which determines on how many separate physical devices the erasure-coded fragments are replicated as part of a storage deal with a single storage miner. To store a file with a different storage miner, the user must make a separate storage deal. Crust and Arweave let the network decide on replication, although on Crust the replication factor can also be set manually. On Arweave, the proof-of-storage mechanism incentivizes nodes to store as much data as possible, so the upper limit of replication on Arweave is the total number of storage nodes in the network.

The methods used to store and copy data will affect how the network retrieves data .
Storage tracking
Once data has been stored on the network and distributed among its nodes in whatever form, the network needs to be able to keep track of it. Filecoin, Crust, and Sia all use a native blockchain to track storage orders, and storage nodes additionally maintain a local list of network locations. Arweave uses a blockchain-like structure. Unlike on blockchains such as Bitcoin and Ethereum, a node on Arweave can decide whether or not to store the data from a block. If you compared the chains held by multiple Arweave nodes, they would therefore not be exactly identical: some blocks are missing on some nodes and can be found on others.

Finally, Storj and Swarm take two completely different approaches. In Storj, a second node type called a satellite node acts as coordinator for a set of storage nodes, managing and tracking where data is stored. In Swarm, the address of the data is embedded directly in the data chunk, so when retrieving data the network knows where to look based on the data itself.
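The Swarm-style idea of deriving the address from the data itself can be sketched as a content-addressed store. This is only an illustration under stated assumptions: SHA-256 and an in-memory dictionary stand in for Swarm's actual hash function and node routing, which are not reproduced here, and `ChunkStore` is a hypothetical name.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: a chunk's address is the hash of its content."""

    def __init__(self):
        self._chunks = {}  # address -> chunk bytes (stand-in for storage nodes)

    def put(self, chunk: bytes) -> str:
        address = hashlib.sha256(chunk).hexdigest()
        self._chunks[address] = chunk
        return address

    def get(self, address: str) -> bytes:
        chunk = self._chunks[address]
        # Because the address is derived from the content, retrieval can
        # re-hash the chunk and detect tampering without any extra metadata.
        if hashlib.sha256(chunk).hexdigest() != address:
            raise ValueError("chunk does not match its address")
        return chunk
```

Since the address is a function of the content, no separate index of "which file lives where" is needed: whoever knows the address knows both where to look and how to check what comes back.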
Proof of stored data
Each network has its own unique approach to proving that data is stored. Filecoin uses Proof of Replication, a proprietary proof-of-storage mechanism that first stores the data on a storage node and then seals the data in a sector. The sealing process makes two replicas of the same data provably unique from each other, which ensures that the correct number of copies is stored on the network (hence "Proof of Replication").
Crust breaks a piece of data into many small pieces, which are hashed into a Merkle tree. By comparing the hash of the data stored on the physical storage device with the expected Merkle tree hash, Crust can verify that the file is stored correctly. This is similar to Sia's approach, with the difference that Crust stores the entire file on each node while Sia stores erasure-coded fragments. Crust can store the entire file on a single node and still achieve privacy by using the node's Trusted Execution Environment (TEE), a sealed hardware component that even the hardware owner cannot access. Crust calls this proof-of-storage algorithm "Meaningful Proof of Work"; meaningful refers to the fact that new hashes are computed only when the stored data changes, which reduces meaningless operations. Crust and Sia both store the Merkle root hash on their blockchain as the source of truth for verifying data integrity.
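The Merkle-root comparison described for Crust and Sia can be sketched as follows. This is a minimal illustration, assuming SHA-256, a fixed piece size, and duplication of the last hash on odd levels; the real networks' tree layouts and hash functions may differ, and the function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split(data: bytes, piece_size: int) -> list:
    """Break data into small fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)] or [b""]


def merkle_root(pieces: list) -> bytes:
    """Hash the pieces pairwise, level by level, until a single root remains."""
    level = [_h(p) for p in pieces]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify(stored: bytes, expected_root: bytes, piece_size: int = 4) -> bool:
    """Compare the root computed from stored data with the root kept as source of truth."""
    return merkle_root(split(stored, piece_size)) == expected_root
```

Only the 32-byte root needs to be recorded on-chain; any change to any piece of the stored file produces a different root and makes `verify` return `False`.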
Storj checks whether data has been stored correctly through data audits. Data audits are similar to the way Crust and Sia use Merkle trees to validate pieces of data. On Storj, once enough nodes have returned their audit results, the network determines which nodes are faulty based on the majority response rather than by comparing against a source of truth on a blockchain. This mechanism is a deliberate choice: the developers argue that reducing network-wide coordination through a blockchain improves performance in terms of speed (no need to wait for consensus) and bandwidth usage (no need for the entire network to interact with the blockchain regularly).
Arweave uses a cryptographic proof-of-work puzzle to determine whether files have been stored. In this mechanism, for nodes to be able to mine the next block, they must prove that they have access to the previous block and to another random block from the network's block history. Because data uploaded to Arweave is stored directly in blocks, proving access to these earlier blocks proves that the storage provider really did store the files correctly.
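The dependency between mining and stored history can be sketched roughly as below. This is a simplified illustration, not Arweave's actual algorithm: the way the random earlier block is selected (here, the previous block's hash modulo the chain height) is an assumption, the proof-of-work search itself is omitted, and `recall_index` and `mine` are hypothetical names.

```python
import hashlib


def recall_index(prev_hash: bytes, height: int) -> int:
    """Pick a pseudo-random earlier block, determined by the previous block's hash."""
    return int.from_bytes(prev_hash, "big") % height


def mine(prev_block: bytes, height: int, local_store: dict, new_data: bytes):
    """Return a candidate block hash, or None if this node lacks the required old block.

    local_store maps block index -> block data for the blocks this node chose to keep.
    """
    prev_hash = hashlib.sha256(prev_block).digest()
    recall = local_store.get(recall_index(prev_hash, height))
    if recall is None:
        return None  # cannot take part in this round without the recall block
    # The candidate commits to the previous block, the recalled block, and the new data.
    return hashlib.sha256(prev_hash + recall + new_data).hexdigest()
```

A node that keeps more of the history can take part in more rounds, which is the incentive to store as much data as possible.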
Finally, Swarm also uses Merkle trees, with the difference that the Merkle tree is not used to determine file locations; instead, data chunks are stored directly in the Merkle tree. When data is stored on Swarm, the root hash of the tree (which is also the address of the stored data) is proof that the file was properly chunked and stored.
Data availability over time
Again, each network has a unique approach to establishing that data has been stored over a specific period of time. In Filecoin, to reduce network bandwidth, storage miners must run the Proof of Replication algorithm consecutively over the period for which the data is to be stored. The resulting hash for each period proves that the storage space was occupied by the correct data during that period, hence "Proof of Spacetime".
Crust, Sia, and Storj verify random pieces of data at regular intervals and report the results to their coordination mechanism: the blockchain for Crust and Sia, and the satellite nodes for Storj. Arweave ensures the consistent availability of data through its Proof of Access mechanism, which requires miners to prove not only that they have access to the latest block but also that they have access to a random historical block. Storing older and rarer blocks is incentivized because it increases the likelihood that a miner wins the proof-of-work puzzle, for which access to that particular block is a prerequisite.
Swarm, on the other hand, runs regular raffles that reward nodes for holding less popular data over time, and it also runs a proof-of-ownership algorithm for data that nodes have promised to store over a longer period.
Filecoin, Sia, and Crust require nodes to deposit collateral in order to become storage nodes, while Swarm requires it only for long-term storage requests. Storj requires no upfront collateral but withholds part of a node's storage earnings. Finally, all networks pay nodes periodically for the periods in which the nodes can prove that they stored the data.
Store price discovery
To determine the price of storage, Filecoin and Sia use storage marketplaces in which storage providers set their asking price and storage users set the price they are willing to pay, along with some other settings. The marketplace then connects users with storage providers that meet their requirements. Storj takes a similar approach, with the main difference that there is no single network-wide marketplace connecting all nodes on the network. Instead, each satellite has its own set of storage nodes that it interacts with.
Finally, Crust, Arweave, and Swarm let the protocol determine the price of storage. On Crust and Swarm some settings can be adjusted to the user's file storage requirements, whereas files on Arweave are always stored permanently.
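The marketplace-style price discovery described for Filecoin and Sia can be sketched as a simple matching step. This is an assumption-laden illustration: the `Ask` and `Bid` fields and the cheapest-first rule are hypothetical and do not reflect the actual order formats or matching logic of either network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ask:
    provider: str
    price_per_gb: float      # price the storage provider asks for
    free_gb: float           # capacity the provider still has available


@dataclass
class Bid:
    user: str
    max_price_per_gb: float  # highest price the user is willing to pay
    size_gb: float           # amount of data to store


def match(bid: Bid, asks: list) -> Optional[Ask]:
    """Return the cheapest provider that meets the user's price and size requirements."""
    candidates = [
        a for a in asks
        if a.price_per_gb <= bid.max_price_per_gb and a.free_gb >= bid.size_gb
    ]
    return min(candidates, key=lambda a: a.price_per_gb, default=None)
```

In a satellite-based design such as Storj's, the same kind of matching would run per satellite over its own set of nodes rather than over one network-wide list of asks.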
Persistent data redundancy
Over time, nodes will leave these open, public networks, and when a node disappears, so does the data it stored. The network must therefore actively maintain a certain level of redundancy in the system. Sia and Storj achieve this by replenishing lost erasure-coded fragments: they collect a subset of the fragments, rebuild the underlying data, and then re-encode the file to recreate the missing fragments. In Sia, users must log into the Sia client regularly for fragments to be replenished, because only the client can tell which fragments belong to which data and user. On Storj, the satellite is always online and regularly runs data audits to replenish fragments.
Arweave's Proof of Access algorithm ensures that data is regularly replicated across the network, while on Swarm data is replicated to nodes that are close to each other. On Filecoin, if data disappears over time and the remaining file fragments fall below a certain threshold, the storage order is reintroduced into the storage market so that another storage miner can take it over. Crust's replenishment mechanism is currently under development.
Incentivizing data transfer
Once data has been stored securely, users will eventually want to retrieve it. Because bandwidth comes at a cost, storage nodes must be incentivized to serve data when it is requested. Crust and Swarm use a debt-and-credit mechanism in which each node tracks the inbound and outbound traffic it exchanges with the nodes it interacts with. If a node only accepts inbound traffic but does not provide outbound traffic, it is deprioritized in future communication, which may affect its ability to accept new storage orders. Crust uses the IPFS Bitswap mechanism, while Swarm uses a proprietary protocol named SWAP. With Swarm's SWAP protocol, the network lets nodes pay off their debt (from accepting inbound traffic without providing enough outbound traffic) with stamps, which can be redeemed for the network's utility tokens.

This tracking of node generosity is also how Arweave ensures that data is transferred on request. In Arweave this mechanism is called Wildfire: nodes prioritize better-ranked peers and ration their bandwidth accordingly. Finally, on Filecoin, Storj, and Sia, users ultimately pay for bandwidth, which incentivizes nodes to deliver data when it is requested.
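The debt-and-credit accounting used by Crust and Swarm can be sketched as a per-peer ledger. This is a minimal illustration under assumptions: the byte-based balance, the fixed `debt_limit`, and the `PeerLedger` class are hypothetical and are not the Bitswap or SWAP implementations.

```python
from collections import defaultdict


class PeerLedger:
    """Tracks, per peer, how much more data we have served than received."""

    def __init__(self, debt_limit: int):
        self.debt_limit = debt_limit
        self.balance = defaultdict(int)  # positive value: the peer owes us bytes

    def served(self, peer: str, n_bytes: int) -> None:
        """We sent n_bytes to the peer, so its debt to us grows."""
        self.balance[peer] += n_bytes

    def received(self, peer: str, n_bytes: int) -> None:
        """The peer sent us n_bytes, which reduces its debt."""
        self.balance[peer] -= n_bytes

    def settle(self, peer: str, n_bytes: int) -> None:
        """The peer paid off n_bytes of debt by other means (for example a payment)."""
        self.balance[peer] -= n_bytes

    def will_serve(self, peer: str) -> bool:
        """Deprioritize peers whose debt exceeds the limit."""
        return self.balance[peer] <= self.debt_limit
```

A peer that only takes data eventually crosses the limit and stops being served until it either sends data back or settles its debt.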
Token economy
Token economics design ensures the stability of the network and its long-term existence, because ultimately data is only as permanent as the network it is stored on. The table below gives a brief summary of the token economics design decisions, together with the inflationary and deflationary mechanisms embedded in each design.

Which is the best network ?
No network can be said to be objectively better than another. Designing a decentralized storage network involves countless trade-offs. While Arweave is ideal for storing data permanently, it is not necessarily suitable for Web2 industry players migrating to Web3, since not all data needs to be kept forever. There is, however, a strong subset of data that does need permanence: NFTs and dApps.
Ultimately, design decisions follow from the purpose of the network.
Below is a summary of the various storage networks, compared with each other on the set of scales defined below. The scales reflect the dimensions along which these networks can be compared, but note that in many cases there is no good or bad way to overcome a challenge of decentralized storage; the scores merely reflect design decisions.
- Storage parameter flexibility: the extent to which users control the parameters of file storage
- Storage persistence: the extent to which file storage can achieve theoretical permanence on the network (that is, without intervention)
- Redundancy persistence: the network's ability to maintain data redundancy through replenishment or repair
- Data transfer incentives: the extent to which the network ensures that nodes transfer data generously
- Universality of storage tracking: the degree of consensus among nodes on where data is stored
- Guaranteed data accessibility: the network's ability to ensure that no single participant in the storage process can remove access to files on the network
The higher the score, the stronger the ability of the above items .
Filecoin's token economics support growing the storage space of the network as a whole, for storing large amounts of data in an immutable way. In addition, its storage algorithm is better suited to data that is unlikely to change much over time (cold storage).

Crust's token economics ensure high redundancy and fast retrieval, which makes it suitable for high-traffic dApps and for fast retrieval of the data behind popular NFTs.
Crust scores low on storage persistence because, without persistent redundancy, its ability to offer permanent storage is severely limited. Persistence can nevertheless still be achieved by manually setting an extremely high replication factor.

Sia is all about privacy. The reason users have to restore file health manually is that nodes do not know which data fragments they are storing or which data those fragments belong to. Only the data owner can reconstruct the original data from the fragments in the network.

Arweave, by contrast, is all about permanence. This is also reflected in its endowment design, which makes storage more expensive but also makes it an attractive choice for NFT storage.

Storj's business model seems to shape its billing and payment approach to a large extent: Amazon AWS S3 users are more familiar with monthly billing. By removing the complex payment and incentive systems common in blockchain-based systems, Storj Labs sacrifices some decentralization but significantly lowers the barrier to entry for its key target group, AWS users.

Swarm's bonding curve model ensures that storage costs stay relatively low as more data is stored on the network, and its proximity to the Ethereum blockchain makes it a strong contender for the primary storage layer of more complex Ethereum-based dApps.

There is no single best approach to the challenges facing decentralized storage networks. Depending on its purpose and the problems it tries to solve, each network must balance the technical side of its design against its token economics.

Ultimately, the purpose of the network and the specific use cases it tries to optimize for determine its design decisions.