Dialogue ACE, Session 4: Challenges and opportunities for the future development of distributed databases

2022-07-24 10:57:00 [Official blog of the OceanBase database]

With the development of cloud computing and big data technology, traditional information technology and its applications have been greatly disrupted, and the database, as basic software, faces new challenges and opportunities. At the same time, data usage scenarios are diversifying and data volumes have exploded. The explosive growth of massive heterogeneous data places higher demands on the storage and computing power of databases.

In the future, traditional databases running on a single physical server node will be replaced by distributed databases in more and more scenarios, and the application prospects of distributed databases will keep broadening.

The fourth session of "Dialogue ACE" focuses on the theme "Challenges and opportunities for the future development of distributed databases". Zhou Yanwei, founder and CEO of Jishu Yunzhou and Oracle ACE Director, and Wang Nan, senior director of the products and solutions department at OceanBase, were invited to explore the future development of distributed databases together, in order to promote the better development of distributed database technology.

Here is the dialogue:

Wang Nan, senior director of products and solutions, OceanBase

Q: OceanBase is a fully self-developed domestic distributed database. What difficulties did it face at the beginning of its design?

OceanBase was not designed from the outset to solve the problems and difficulties of later, longer-term stages. There was a process of decision-making and project approval: what problems could be solved, what value that would produce, how much investment was needed, and how long it would take. In the early days, problems were solved step by step, scenario by scenario, and the overall understanding was built up gradually in stages.

The first stage came from the Taobao scenario. The problem to solve was the scale and volume of core data, so that throughput, including data access, could keep up with continuously growing business. The first step was therefore to solve distribution, with a distributed NoSQL model that had no relational model. Once the Taobao problem was solved, more business scenarios also wanted scalability. Many of them were built on relational databases. Some businesses could be handled through application-layer changes, but for a large number of businesses moving toward general-purpose scenarios, the relational model was a necessity.

The second stage, after the distribution problem was solved, was to gradually fill in the relational model and support SQL. As OceanBase moved from inside Ant Group and Alipay to the general market, it became clear that many business scenarios were built on the open-source MySQL ecosystem or on the commercial Oracle ecosystem. So how can applications migrate to a new distributed database? That is in fact a big problem.

The third stage was to achieve good compatibility. For users, the question is what the application-layer costs are, including the migration process. After addressing MySQL compatibility, OceanBase went on to full Oracle compatibility, including syntax, semantics, stored procedures, and so on.

After that, the key problem and challenge is that once the data volume reaches a certain scale, the demand for AP capability becomes fairly strong. HTAP is a key difficulty and technical challenge at this stage. With the development of cloud computing, there will also be many customer demands for supporting cloud infrastructure, and even cross-cloud and heterogeneous-cloud deployments.

So the difficulties at the beginning of the design, including the technical difficulties during development, were in fact discovered gradually in the course of supporting and serving customers.

Q: In financial scenarios such as banking and insurance, many enterprises have made a distributed database their first choice for core systems. In which scenarios will similar situations appear in the future?

Previously, OceanBase was applied mostly in financial scenarios. In the past two years, beyond finance, it has been adopted by many enterprises in the general market. Many factors influence this question; today I will discuss it from the perspective of products and technology.

In the core scenarios where distributed databases are already widely used, users chose distributed databases for several kinds of reasons, based on several categories of demands and considerations.

First, users who need to migrate core systems. These users are often not driven by business demands. In this scenario, which architecture the user chooses, centralized or distributed, is not a particularly strong constraint. They care more about Oracle compatibility and whether they can migrate smoothly and cost-effectively during switching and replacement.

Second, users solving continuity problems. We face more and more challenges around the supply and purchase of databases, including the availability of database services, so continuity may also be one of the important reasons.

Third, users with real business demands. In the end, the choice comes back to the business, or to the market: whether a product is economical, cost-effective, and the right fit. From this perspective, either the customer's business is growing rapidly, which drives the need to scale the database or data management layer, or the data volume is so large that a centralized architecture cannot carry future business demands and data volume. Such demands are more rigid and more urgent, and they are a core factor that truly moves users to commit to a technology upgrade.

In this process, the key factors users care about are, first, the capabilities of the database itself, including whether its features, performance, specifications, and security meet requirements, and second, the cost of application migration. A database, as basic software, differs greatly from an application; for complex business systems, the impact and cost are relatively large.

So which scenarios call for a distributed database and which call for a new database are actually two different questions. When selecting a database, we should pay more attention to whether its data security and specifications meet requirements. Choosing a distributed one requires some core demands: either the original standalone financial system can no longer meet demand, or the user wants to upgrade and build a technical reserve for the future. As for which scenarios may choose distributed databases in the future, we should return to what the real market chooses.

Q: Any good suggestions on operating and maintaining distributed database products, or on large-scale intelligent operations?

For any product, the larger its deployment scale, the higher the demands and challenges. When a product is not deployed at scale, operations can be handled well whatever approach is taken, even purely by manual effort. But once the volume grows, operations become a great challenge. This is especially true of databases, which differ considerably from ordinary applications and are affected by many factors at runtime, such as software and hardware failures. That is why a professional group, DBAs, specializes in operating databases. After decades of development, traditional databases have built up a relatively mature DBA community. Compared with centralized databases, distributed databases face different challenges.

The first is the technical architecture, which brings higher complexity in itself. Compared with a centralized database, a distributed one places higher demands on skills for high-availability failure recovery and for tuning execution plans. For a new product and architecture, learning and understanding it, then forming a shared understanding and becoming proficient with it, takes a long time. This is not specific to any one database; it follows from the distributed architecture and its technical characteristics. On operations and maintenance, we have practiced and explored at several levels.

The first level is product capability. Beyond the database kernel, really using a product requires the supporting tools: cluster management, operations monitoring, and full-link diagnosis, as well as an automatic optimizer, function tuning, and self-service operations. Solving the problem with tools is therefore critical. DBAs and others certainly need to develop these skills, but better and more complete tooling, and better technology for support and querying, is a great help. So in terms of product capability, we need to build this kind of support and foundation.

The second level is awareness across the whole user base, including a training system for operations staff. OceanBase is now cultivating users through open source, and we encourage everyone to use it and learn what characteristics the database has and what problems it has. The more people use it, the more people will understand how to operate and use it. Users can also take part in training and certification, such as OceanBase's current OBCA/OBCP/OBCE, a complete certification system designed for different groups with different characteristics, so that more people can quickly learn, use, and understand this distributed database. This is not only for OceanBase; the aim is to build a talent base for operating and optimizing distributed databases as a whole.

The third level is gradually coming to understand database operations through commercialization and delivery. Relying on ourselves alone is not enough. Even for a product and company as mature as Oracle, operations depend on a large number of third-party professional service companies and the DBA community working together. In distributed databases, OceanBase is a start-up and is still small. Once it is deployed across more industries and user scenarios, it will be hard to rely on the vendor alone for support. So for database operations, I think we should make comprehensive use of third-party professional service companies, the ecosystem, and DBAs.

In addition, we are doing some exploration of our own. Areas such as qualification and capability, including AI, are also a framework for our future exploration. At this stage, though, I think the three levels just mentioned are the key points we should focus on and invest in.

Q: Many emerging domestic database products referenced Google's Spanner paper in their R&D. What do you think of distributed database technology at home and abroad, and what are the gaps?

We previously faced business challenges under traditional database architectures, such as the rapid explosion of data scale and analytical demands. Such scenarios prompted a lot of innovative technical thinking, and that kind of breakthrough and inspiration is of great significance. Take Google as an example. Its work originated from its own business: Google is global and cross-region, with a very large business scale. Based on its own scenarios, it accumulated technical solutions and capabilities while solving its own problems, and then published some of them.

Take the Spanner architecture, for instance. It is one direction for solving such problems, but the effect is hard to judge. At the very least it seems far from perfect, and it is hard to say that Google's own database has been commercially successful. Whether the core technologies it describes, such as the TrueTime API and global distribution, apply to all scenarios is not certain.

The demands and scenarios of different enterprises differ greatly. Whether domestically or globally, the promotion of distributed databases is driven mainly by a few large cloud vendors. Large cloud vendors share certain characteristics with Google's own business, namely sufficiently large scale and heavy concentration of data. Such scenarios create core demands for distributed databases, but the resulting products carry a strong vendor flavor. How to serve diversified scenarios is therefore of great practical significance.

So should distributed databases be built around the data-centralization demands of cloud or Internet vendors, or go all in on cloud? Not necessarily. China, for example, offers better soil for promoting and developing support for diversified scenarios, rather than everyone crowding onto the same road of globalization or public-cloud-native solutions.

Overall, our technology gap with other countries is not that big. Because the scenarios to be solved are different, we have our advantages, and they may have had some advantages in the early days. At this stage, in terms of technical sophistication, strength, and capability, we have enough confidence to face any situation, including large-scale domestic challenges and globalization demands.

Q: Under the current cloud-native trend, what breakthroughs are there in distributed database technology?

Databases are currently in a state where a hundred flowers bloom. Our position is that we should build a distributed relational database that is transparent to user applications. That sounds simple, but there are in fact several key elements.

First, application transparency. Problems should not have to be perceived and solved in the application layer, nor with a middleware solution; they should be solved at the database layer. In other words, leave the complexity to the database and the simplicity to users.

Second, strict ACID guarantees. The database must ensure the correctness of the data. In other words, within HTAP, OceanBase always starts from transaction processing and keeps extending its AP analytical capability, rather than starting from AP and then trying to support strict OLTP. This is one of the core principles we adhere to in the distributed direction.

Beyond these two technical challenges, I would like to raise another topic. In bringing distributed databases to market, the biggest challenge is finding out what problems customers have encountered and what needs to be solved. Beyond technical competitiveness, vendors need to come down to earth and get everyone using the product. There are a few key points here as well.

First, large-scale distributed transactions must be supported with transactional consistency guaranteed. Under normal conditions, I believe many people can do this. But recovering from failures or abnormal software and hardware conditions, without the recovery process affecting the business, is a huge challenge for a production system. Users care a great deal about whether a distributed system can solve this problem.
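To make the consistency requirement concrete, here is a minimal sketch of two-phase commit, a well-known textbook protocol for atomic commits across nodes. The interview does not say which protocol OceanBase uses, so this is an illustrative stand-in; the `Participant` class, `two_phase_commit` function, and the `fail_on_prepare` flag are all hypothetical names invented for this example.

```python
class Participant:
    """One database node in a distributed transaction (hypothetical model)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates a node failure
        self.pending = None    # writes staged by prepare(), not yet visible
        self.committed = {}    # data visible after commit

    def prepare(self, writes):
        # Vote "no" if the node cannot guarantee the commit.
        if self.fail_on_prepare:
            return False
        self.pending = dict(writes)
        return True

    def commit(self):
        self.committed.update(self.pending or {})
        self.pending = None

    def abort(self):
        self.pending = None


def two_phase_commit(participants, writes_by_node):
    # Phase 1: every node stages its writes and votes.
    for node in participants:
        if not node.prepare(writes_by_node.get(node.name, {})):
            # Any "no" vote aborts the transaction on all nodes.
            for other in participants:
                other.abort()
            return False
    # Phase 2: all nodes voted yes, so all of them commit.
    for node in participants:
        node.commit()
    return True


a, b = Participant("a"), Participant("b")
ok = two_phase_commit([a, b], {"a": {"balance": 90}, "b": {"balance": 110}})

c, d = Participant("c"), Participant("d", fail_on_prepare=True)
failed = two_phase_commit([c, d], {"c": {"balance": 0}, "d": {"balance": 200}})
```

In the second run, node `d` refuses to prepare, so neither node applies its write. The sketch deliberately leaves out the hard part the speaker points to: recovering when the coordinator or a node fails between the two phases, without disturbing the business.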

Second, smooth migration. When a large number of applications migrate from their original systems to a distributed one, is there a general approach? In reality, we have to handle migration of massive numbers of applications across many industries and scenarios, which is a concern for many large users.

Third, if we broaden the scope, the scenarios of different users, or even of the same user, are very complex. They may run on different infrastructures (private cloud, public cloud, hybrid cloud), and a large number of systems cannot be switched all at once, since the risk and cost would be too high; the switch must be gradual. Such scenarios create demands for supporting different infrastructures and heterogeneous deployment, which requires the database layer to provide a consistent experience and consistent capabilities.

Finally, I want to talk about HTAP, that is, whether TP and AP capabilities can be integrated. In the beginning there was no separation between the two. They were split later because, as data volumes and analytical demands grew, the original database capacity could not keep up. The current split architecture brings many problems, including that users have to build separate business systems, one for daily production transactions and one for analysis.

In terms of application complexity and customer investment, or, returning to the essence of IT, the consumption of computing and storage resources, this is uneconomical. A solution that handles TP and AP together, with good isolation of workload resources so that they do not affect each other, is therefore of great practical and economic significance. This is also where OceanBase will focus its investment and effort for the next key breakthroughs.

Zhou Yanwei, founder of Jishu Yunzhou

Q: How can one quickly understand what a truly distributed database is?

To answer concisely, the question can be split into two parts: what is distributed, and what is a database.

First, a database solves the problem of storing and computing on data. Second, distribution means using multiple resources to handle the storage and computation of large-scale data, spreading it from the resources and computation of a single server across multiple servers. That is the literal meaning of a distributed database. But is such a product truly distributed? I don't think so. It can only be regarded as an academic, theoretical product that does not start from real demand scenarios.

In addition to solving the allocation of computing power and storage capacity, a distributed database also depends on what capabilities and demands the database must satisfy in practice. Moreover, distribution can be understood as the allocation of many kinds of resources.

Take asymmetric distribution as the first example. Most of the time, we are concerned with a distributed cluster whose nodes are peers and which allocates data storage and computation among them. But in some product requirements, such as edge computing, results need to be computed directly at the remote end and then correlated at the central end. In such a scenario the allocation of resources is clearly unequal, yet we still need distributed database technology to combine the data from the remote ends to the center, perform some operation, and reach a conclusion. Is this a distributed database? If it is well designed, it is certainly a kind of distributed database.

The second example is the distribution of unequal computation. As we often see with HTAP, computation is diverse: data is stored in different storage systems, different structures are used in different environments, and there are different computing requirements and algorithms. A distributed system must then solve the allocation of computing power and of different operations and algorithms. If the system is well designed, this should also fall within the scope of distributed databases.
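The edge scenario described above, where each remote site computes a result locally and the center only combines those results, can be sketched roughly as follows. This is an illustration under assumptions, not anything described in the interview: the `Partial` type, the function names, and the site names `edge-a` and `edge-b` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Small summary shipped from an edge site instead of raw data."""
    count: int
    total: float


def edge_aggregate(readings):
    # Runs at the remote (edge) site: reduce raw readings to a partial result.
    return Partial(count=len(readings), total=sum(readings))


def central_merge(partials):
    # Runs at the central site: combine partial results into a global average.
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return total / count if count else None


# Sites hold very different amounts of data, so the work is unequal.
sites = {"edge-a": [1.0, 2.0, 3.0], "edge-b": [10.0]}
partials = [edge_aggregate(readings) for readings in sites.values()]
global_avg = central_merge(partials)
```

The design point is that only the `(count, total)` pairs cross the network, so the center can produce a correct global average even though the sites are unequal in size and capacity.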

Q: Distributed databases have remained popular in recent years, but some mainstream overseas database vendors (Oracle, IBM) have not promoted them heavily. Some people also say that because Asia has a large population and therefore large data volumes, distributed databases are better suited to China. What do you think?

In the past, the core products of vendors such as IBM, Oracle, and SQL Server were mainly standalone versions, and distributed databases were not promoted or developed as their mainstream. On the other hand, regarding the amount of data, I don't think it comes mainly from people; it comes from machines and devices.

There is no essential relationship between distributed databases and data volume. So why don't mainstream vendors do distributed? I have thought about this. Those focusing on distributed databases fall roughly into two categories. One is the idealistic, academic type of team, which starts from theory and then builds a distributed database in the hope of producing certain results or better database products in the future. The other does not know much about databases, or, as the saying goes, is a newborn calf that is not afraid of tigers: they draw up a long-term blueprint first and assume they know the outcome, and meanwhile it is someone else's money.

Why do I say that? A database must first of all guarantee that data is absolutely correct, and only then consider support for various kinds of performance and capability. Distributed means using resources across a network while still ensuring the consistency of computation and data, which are basic capabilities of a database. Mainstream database vendors, given their business needs and social responsibility, must first fully guarantee the correctness of data storage, and also consider performance and the input-output ratio; only after that do they consider whether to go distributed. To put it another way, if a single-machine deployment or a cluster can, to a certain extent, solve the allocation of data computation and storage capacity, or if the problem can be solved without a distributed computing architecture inside the database kernel, while better securing data safety and consistency, that may be an important reason why mainstream databases don't go distributed.

As hardware, networks, and distributed storage capacity continue to develop, the allocation of resource capacity can itself be distributed in several ways. In terms of computing power, work can be shared across CPUs and GPUs, but because of the limits of Moore's law, gains in single-machine computing power will gradually diminish, while the distribution of physical hardware across the network will gradually strengthen. Networks now offer high-speed access such as RDMA. I don't know whether cross-host integration of CPU/GPU hardware will appear, but we have already seen some of this in storage. We used to combine multiple disks into one volume with RAID and deploy the database directly on a single machine on top of it; that is one form of storage expansion. Storage can also be expanded at the hardware level, using smart network cards to form a unified storage pool across networks and machines.

In other words, distributed storage may be solved at the hardware level, in which case whether your database needs to solve this problem deserves careful consideration, because the essence of a database is to guarantee the storage and computation of data, and especially their correctness. The apparent reluctance of Oracle, IBM, and others to implement distribution at the software level may reflect this consideration.

If CPU and memory can in the future also be unified across hosts, based on smart networks and smart switches, that becomes hardware-based distribution. With such resources, one could focus on the storage and computation problem itself without having to consider distribution. Although this is still immature, I have seen scientific research under way, and such products may appear in the future. So I think it will happen, but, as I said, unequal and heterogeneous computing requirements may be the new challenge for databases. This is what our Data Fabric implementation sets out to solve: a generalized distributed system across environment dimensions and across data dimensions.

This debate may continue for some time. Software and hardware iterate and drive each other forward. Once hardware distribution is good enough, software distribution becomes a different kind of discussion. I believe that in the near future, generalized distribution will dominate.

Q: Distributed databases come in several forms (middleware, NewSQL, distributed architecture + enterprise-class kernel, compute-storage separation architecture). Can we analyze the advantages and disadvantages of these forms?

This is a very basic topic. From my point of view, we can analyze the advantages and disadvantages by looking at where distribution happens. Personally, I think there are three levels: user interaction, data computation, and data storage. We can look at distribution arising at each of these three levels:

First, the routing layer. When a SQL statement comes in and needs to be written to the database, with middleware this is what the middleware's routing layer has to handle. Long ago, when we did database and table sharding, middleware, or even data platforms, there was usually such a middleware component to handle SQL distribution, load sharing, caching, computation, and reassembly of data. This is distribution at the routing layer.
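As a rough illustration of what such a routing layer does, the sketch below maps a sharding key to one of several back-end databases and merges sorted results returned by multiple shards. It is a simplified model under assumptions, not the design of any particular middleware: the shard names and the functions `shard_for`, `route`, and `merge_sorted` are hypothetical.

```python
import hashlib
import heapq
from itertools import islice

# Hypothetical back-end databases sitting behind the middleware.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key):
    # Hash the sharding key so the same key always maps to the same shard.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(shard_key=None):
    # A statement carrying the sharding key goes to exactly one shard;
    # a statement without it must be sent to every shard (scatter).
    return [shard_for(shard_key)] if shard_key is not None else list(SHARDS)


def merge_sorted(per_shard_rows, limit):
    # Gather step: each shard returns rows already sorted, and the
    # middleware merges them and applies the global LIMIT.
    return list(islice(heapq.merge(*per_shard_rows), limit))


targets = route("user-42")
top_rows = merge_sorted([[1, 4, 9], [2, 3], []], limit=4)
```

Even this toy version shows why the speaker considers the approach limited: anything beyond single-shard lookups, such as joins, global ordering, or cross-shard consistency, has to be re-implemented in the middleware.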

Second, the computing layer. If the routing layer only scratches the surface and is the earliest and lowest-end form, then distribution at the computing layer truly involves the database kernel. It demands more technical capability, and the software takes longer to mature. This is a system in its own right.

Third, the storage layer. This covers the shared storage just mentioned, or the earliest cloud-native ideas, such as the theoretical system based on AWS. I think its difficulty lies between that of the routing layer and the computing layer, but it is also the result of balancing security and performance. Let's analyze the characteristics of these three forms.

The routing layer came first. For example, more than ten years ago Renren, Taobao, and others mainly used the routing-layer form. The earliest open-source middleware was also open-sourced by Alibaba, and it is one of the earliest forms. Even now there is some commercial middleware of this kind. But I think this form is a product of early technological development. It solved some problems of its time, when there was no better way to allocate resources: databases and tables were sharded at the back end, and middleware coordinated them to solve resource allocation. Once technology breaks through the barriers we faced then, this approach will soon be eliminated. Middleware has to handle merged computation over all kinds of data, cache all kinds of data, be compatible with all kinds of syntax, and ensure data consistency. These are its problems, and some of them cannot be solved at all. So we see fewer and fewer distributed systems based on this architecture, and it will slowly be washed away by the waves of technology. There is no future for middleware-based distribution.

The other two should be the main directions of development in the distributed field: distribution at the computing layer and at the storage layer. With distributed computing at the computing layer, the underlying layer may store data in shards. This can be called an ideal form of distribution, since it ideally solves the problem of resource allocation. I think OceanBase is a form of this kind of distribution. That is its advantage, but it also has some disadvantages. Where are they? We can examine how data is computed in the database. As I said, the core requirements are the correctness of computation, the security of data, and the efficiency of computation; how to allocate resources comes second.

If resources are allocated better while data safety and efficiency must still be guaranteed, then something has to be sacrificed. A thoroughly distributed transaction means that, at the extremes of performance, it should not be compared with a single machine, and it will not beat one. Under a simple computation model, something like MySQL will be faster than a distributed system: because it is simple, it is efficient. So from this point of view, distribution based on computing hardware and forms of computation can reach an ideal state, but implementation complexity rises, greatly increasing the complexity of the database itself and the difficulty of making it mature. In addition, data security and computing performance may be affected.

To connect with the previous topic: why doesn't Oracle develop such products? Why is IBM's DB2 mainly a single-database product? I think people who really build databases think about absolute data security and performance improvement, and choose the best trade-off in terms of cost-effectiveness.

Now let's talk about distribution at the storage layer. It looks ideal at present; although it is not the ultimate form, it is a very clever approach. If we want to divide up database capacity, it should be split in two directions, compute and storage, which gives us the concept of compute-storage separation, because the two really can be taken apart. And to support truly large-scale data, we must first separate compute and storage; only then can we consider what to do in the storage layer and the computing layer respectively. Only when these conditions are met can we treat distributed storage in the storage layer the same way as on a single machine, without having to worry about data security or timeliness.

Of course, this does not rule out building more idealized, more extreme products. But technology R&D should be combined with actual needs: we build something when real demand calls for it. A product that is very good but has little real demand may well continue to exist as a research product awaiting future applications. From a commercial point of view, though, the goal is perhaps not technical perfection alone, but whether the technology can cover demand efficiently and securely. I think this may also be how many established database vendors think about product design.

In addition, looking at predictions for future hardware trends: if the problem of resource allocation is solved at the hardware level, there is every reason not to make trouble for ourselves, and instead to do the database's own job well on top of what the hardware provides.

As for the forms of distribution, I think most of them do not go beyond these three levels. Distinguished this way, each level has fairly distinct characteristics, and its advantages and disadvantages are clear at a glance. So which type to deploy and which product to use should be decided according to actual needs and the stage of social development.

Q： What value does the separation of storage and compute in distributed databases bring to the business？

My view on this may be extreme: a so-called distributed database must separate storage and compute. The separation is a means, and its goal is to solve the distribution problem. Only when storage and compute are separated can we, on the storage side, consider distributed scale-out, capacity expansion, and higher-end capabilities such as distribution and replication, and, on the compute side, consider how to achieve high elasticity, better resource utilization, on-demand allocation, parallel computing, and so on.

Users do not have to think about what is stored where, because after storage and compute are separated this part is transparent to them. The whole system also gains transparency and high availability. In a traditional database architecture, every node bundles storage and compute together, which makes a business switchover hard. After the separation you can build something close to a serverless architecture, which makes switchover logically very easy: the business layer does not need to care that a node has switched. This is very valuable for business development, operations, and scaling. So my view is extreme: only when the compute layer and the storage layer are separated, and each is distributed in its own way, does it count as a truly valuable distributed database.
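The switchover argument can be illustrated with a minimal sketch. It assumes a toy in-memory `SharedStorage` standing in for the distributed storage layer, and hypothetical `ComputeNode` and `Router` classes; none of these names come from OceanBase or DTArk. Because the compute nodes hold no data of their own, a request can simply be retried on another node when one fails.

```python
class SharedStorage:
    """Stands in for the distributed storage layer; holds all durable state."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class ComputeNode:
    """Stateless compute node (hypothetical): keeps no data of its own."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.alive = True

    def execute(self, op, key, value=None):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        if op == "put":
            self.storage.put(key, value)
            return value
        return self.storage.get(key)


class Router:
    """Sends each request to the first healthy compute node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def execute(self, op, key, value=None):
        for node in self.nodes:
            try:
                return node.execute(op, key, value)
            except ConnectionError:
                continue  # try the next compute node
        raise RuntimeError("no compute node available")


storage = SharedStorage()
router = Router([ComputeNode("c1", storage), ComputeNode("c2", storage)])
router.execute("put", "order:1", "paid")
router.nodes[0].alive = False  # simulate a compute node failure
result_after_failover = router.execute("get", "order:1")
```

The caller only talks to the router, so the failed node is invisible to the business layer. A real system would also need durable, replicated storage and connection handling, which this sketch leaves out.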

Q： When learning and practicing distributed databases, what should we pay attention to day to day？

I think there are two points. First, know not only what it is but also why it is so. Second, true knowledge comes from practice.

Knowing both the what and the why matters because, when we use a complex system, the more optimized and the simpler it appears, the more is hidden inside. All the specifics have been packaged away, so if you only ever use it at the surface, you may not be able to fix it when something goes wrong.

Let me give you an example. When I was at Qunar, I enthusiastically promoted a MySQL component called Galera. For a period of time it solved the problem of multi-node synchronous writes in a small and elegant way, and I thought it was valuable. Yet I have also seen many people complain that this architecture is hard to use, that this open-source component is hard to use, that it has one problem or another. I ran hundreds of clusters of it in production without any problem, and later wrote it up in a book, 《MySQL Operations Inside》.

Why is this so？ Besides the capability of the product itself, it depends on the user's understanding of it and ability to master it. If you only work at the surface without knowing the why, it is hard to use the product well. So when we use distributed databases, which are extremely complex and heavily packaged, we still need to understand the essence.

Second, true knowledge comes from practice. The emphasis is on giving yourself the chance to make mistakes. Talking on paper is useless; you should actually do it for a year or two, and if you get the chance, do it in production. There are several kinds of production environments, though, and you need to choose one that will not affect your live business. Use it to understand how database products differ. A database is a dynamic system, and running it is a continuous process: once in operation, it will run into problems of one kind or another. Only by working through those problems do you come to understand it more deeply, learn when to read the source code and where to read, and gain a deeper understanding of the product or system itself.

Q&A

Q： Do distributed databases provide particularly good guarantees of availability and integrity？ Can they be widely used in the finance and telecom carrier industries, as Oracle is？

Wang Nan

This question goes to our philosophy and positioning; solving this problem is our goal. Put simply, OceanBase aims to enter the core scenarios of the enterprise, and to build the demands those applications bring, including features, performance, security, data consistency, integrity, and guarantees of correctness, into the database layer.

Many middleware-based solutions depend on the application and impose constraints on it. For example, with two-phase transactions across nodes, if a crash occurs, transaction consistency, including the residual state left by a rollback, is not fully guaranteed, and it has to be constrained or handled at the application layer to keep the data correct. OceanBase never pushes this problem to users; it handles it at the database layer. The key technical demands in the core scenarios of finance and telecom carriers are data reliability, security, and correctness, and their tolerance and maturity requirements differ from those of internet scenarios. OceanBase addresses this across the database lifecycle, from installation and deployment through development, operations, monitoring, and troubleshooting.
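The residual state mentioned here can be shown with a textbook two-phase commit sketch. This is a generic illustration, not OceanBase's implementation or any specific middleware; the `Participant` class, the `decision_log` dictionary, and the `crash_after` parameter are all hypothetical. If the coordinator crashes midway through the second phase, some participants are left prepared but undecided, and something has to resolve them from a durable decision record.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"    # voted yes, waiting for the decision (in doubt)
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """One node taking part in a cross-node transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = State.INIT

    def prepare(self):
        self.state = State.PREPARED if self.can_commit else State.ABORTED
        return self.can_commit

    def finish(self, decision):
        # Only a prepared participant still has something to resolve.
        if self.state is State.PREPARED:
            self.state = State.COMMITTED if decision == "commit" else State.ABORTED


def two_phase_commit(participants, decision_log, txid, crash_after=None):
    """Run 2PC; crash_after=N simulates a coordinator crash after N phase-2 messages."""
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    decision_log[txid] = decision  # stands in for a durable decision record
    # Phase 2: deliver the decision.
    for sent, p in enumerate(participants):
        if crash_after is not None and sent >= crash_after:
            return None  # coordinator crashed; the rest stay in doubt
        p.finish(decision)
    return decision


def recover(participants, decision_log, txid):
    """Resolve in-doubt participants; with no logged decision, presume abort."""
    decision = decision_log.get(txid, "abort")
    for p in participants:
        p.finish(decision)
    return decision


a, b = Participant("a"), Participant("b")
log = {}
two_phase_commit([a, b], log, "tx1", crash_after=1)
states_after_crash = (a.state, b.state)
recover([a, b], log, "tx1")
states_after_recovery = (a.state, b.state)
```

After the simulated crash, participant `a` has committed while `b` is still prepared. Whether the database or the application is responsible for running the recovery step is exactly the difference described above.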

Q：Does Arkcontrol plan to support OceanBase？

Zhou Yanwei

Arkcontrol is a component of our product family. It exists for two purposes.

The first is to give users a management layer for our whole product family. It is responsible for operations, monitoring, and data backup management of the Yunzhou Data Longitude and Latitude Platform (DTArk).

The second is to meet user needs: built around what users require, it solves the problem of managing multiple databases in one place at the user level. Besides our own products, a user may also run MySQL, MongoDB, Oracle, and Redis at the database layer, and I can offer a single platform to manage them all. It is a convenience tool whose core purpose is management.

As for whether we will support OceanBase, I think that comes back to the market: we look at what products users have deployed before deciding. If many users' IDCs have deployed OceanBase products, we would also want to support OceanBase systems for the convenience of users. I have always held the view that we should go deep inside to do things, rather than float on the surface and the periphery. Of course, in the course of supporting it, I hope OceanBase can open more interfaces to us so that the management can be done well.

Q： Distributed databases have very high hardware performance requirements. Is there a plan for ordinary PCs, or a lower-specification option that can still run？

Wang Nan

To be honest, we have noticed this and have been working on it for some time. First, on the public cloud, a large number of small and medium-sized customers have this hard requirement. Small customers in particular do not need an especially powerful server to solve their current problem. There are also many developers and individual users who want to try a small scenario, or even test or use it on their own machines.

We are working on this in several directions. The first is resource consumption: OceanBase will next release versions with lower resource consumption. The current requirement is around 8 cores and 64 GB of memory (8C 64G). This year we will bring it down to 4C, and in the future to 2C or even 1C. At the same time, so that more customers can use and try it more conveniently, we are also exploring single-node specifications.

In addition, on the public cloud we can consider multi-tenancy to address users' costs, because having a large number of small users each occupy dedicated resources is costly. For non-core businesses, customers can lower their resource demands in exchange for better economics, and multi-tenancy suits this well. We now have multi-tenancy in the kernel, together with resource isolation for storage, compute, and memory. In the future we will also offer this as a paid cloud service so that users can try it quickly. To sum up: the demand for small specifications and lower specifications is already being addressed, you will see major progress this year, and OceanBase will keep working on it.
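The cost argument for multi-tenancy can be sketched as follows. This is only an illustration of packing fixed resource slices onto one server; the `Node` class and its `admit` method are hypothetical and say nothing about how OceanBase actually isolates tenants. The node size reuses the 8C 64G figure mentioned above, and the tenant size is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A physical server with fixed CPU and memory capacity (hypothetical model)."""

    cpu_cores: float
    memory_gb: float
    tenants: dict = field(default_factory=dict)

    def free(self):
        """Return the (cpu, memory) capacity not yet reserved by any tenant."""
        used_cpu = sum(cpu for cpu, _ in self.tenants.values())
        used_mem = sum(mem for _, mem in self.tenants.values())
        return self.cpu_cores - used_cpu, self.memory_gb - used_mem

    def admit(self, name, cpu, mem):
        """Reserve a fixed slice for a tenant; reject it if it would overcommit."""
        free_cpu, free_mem = self.free()
        if cpu > free_cpu or mem > free_mem:
            return False
        self.tenants[name] = (cpu, mem)
        return True


node = Node(cpu_cores=8, memory_gb=64)
# Assumed tenant size: 2 cores and 16 GB each.
admitted = [node.admit(f"tenant{i}", cpu=2, mem=16) for i in range(5)]
remaining = node.free()
```

Four small tenants share one server instead of each occupying a whole machine, and the fifth request is rejected rather than allowed to eat into the others' reservations.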

Q：How should we understand HTAP as an intermediate state？

Zhou Yanwei

There may be many ways to understand it. As product designers, we should think from a higher goal. What is that higher goal？ HTAP is the fusion of TP and AP, the combination of two basic database capabilities. If you design such a product, is that the final state？ Obviously not, because beyond TP and AP, data computation also includes edge computing, retrieval, graph computation, and so on. It is because these computing needs exist that we have all kinds of databases on the market. HTAP perhaps connects the original TP type with the later AP type dominated by big data; it is a 1+1 pattern. Can we combine three into one, or four into one？ Product design determines product implementation. If you only consider the two-dimensional intermediate state of HTAP, extending the implemented product later will be very difficult.

So in terms of product design, if we can solve the product form for multi-dimensional computing, then looking back at HTAP, it is an intermediate state: a very important step from one dimension to two. But this step is a starting point, not an end, and there is still a lot to do. From this angle, when we design products it is best to design a framework that can support two, three, or even more dimensions. The technologist in turn becomes a product manager, considering product requirements and product implementation together. If you focus only on this one dimension, then when higher product demands arrive, a lot of the code you write now may be thrown away and redone. We are not willing to do that. We have to consider it more comprehensively and build a bigger framework that can realize these two dimensions while remaining compatible with future multi-dimensional needs. That is the ideal scenario.

Our DTArk is built on this thinking. It already implements fused TP/AP/FP computing, and graph computing is expected to be supported soon. This multi-dimensional, extensible architecture is completely different from setting out directly to implement HTAP. In other words, for current HTAP products to integrate further dimensions, they may have to start over and rewrite their code.

Q： Are there any recommended distributed databases that offer the same service worldwide, covering Asia, the Americas, and Europe？

Wang Nan

I believe the person asking is probably not asking whether a database can adapt to these regions. "Global service" is probably asking whether there is a database on the public cloud that can provide consistent capabilities across regions worldwide, to support the deployment of global applications and data. That is genuinely challenging at present.

There is also an implied demand: can the same cloud provide such capabilities in all regions of the world？ Many customers still do not want to be locked in to one cloud. Put simply, as to whether there are distributed databases or cloud services that can serve customers around the world, there are actually quite a few, with some differences in capability. The major cloud vendors offer RDS, shared-storage services, and AP products on their clouds in all regions. A cross-cloud demand, however, means not being bound to one cloud, and this matters especially to large customers.

If you can only build application infrastructure on one cloud, there are many risks, including technical, business-security, and cost risks. Many people have raised cross-cloud demands, but not everyone can meet them, because what cloud vendors can solve is how to provide global, cross-region services on their own infrastructure. For cross-cloud, it probably still falls to independent database products and vendors to solve the problem. OceanBase is now considering and working on this, and you will soon see some products and services from us.

Zhou Yanwei

From a purely technical perspective, the answer is certainly yes. For example, relying on a distributed network, and setting aside performance tolerance and time cost, rapid distributed global deployment is not a problem. In other words, I think the bottlenecks of global deployment lie in the network and in permissions; you only need to get past these two.

In practice, when we really consider cross-datacenter, cross-cloud, and global deployment together with performance, it looks difficult at present, because networks have latency and the speed of light is hard to beat. So the fallback is either shared storage on a distributed base, or both distribution and shared storage, depending mainly on the timing or performance tolerance of the business. While building DTArk as a Data Fabric product, we also proposed an innovative technique: the data shuttle. The idea is simple. You may not need to synchronize the full data set on a large scale; perhaps only a certain batch of data is needed, and that batch is passed along. There is no need to keep the whole world synchronized and waste a lot of time waiting on the network. Rather, in certain data, business, or computing scenarios, data of different dimensions crosses domains or regions. In that case we should aim not for instant synchronization but for eventual consistency, and not for instant computation but for computation over time windows according to a configured schedule. I think the key point is still the network: network latency is the essential problem that affects all requirements.
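The speed-of-light point can be made concrete with a little arithmetic, and the time-window idea with a small grouping function. The sketch below is illustrative only: the 10,000 km distance and the fiber factor of two thirds are assumptions, and `batch_by_window` is a hypothetical helper, not part of DTArk or its data shuttle.

```python
from collections import defaultdict

SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumed propagation speed in fiber relative to c


def min_round_trip_ms(distance_km, medium_factor=FIBER_FACTOR):
    """Lower bound on round-trip time from propagation delay alone."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * medium_factor)
    return 2 * one_way_s * 1000


def batch_by_window(changes, window_s):
    """Group (timestamp_s, change) pairs into fixed time windows for batched shipping."""
    batches = defaultdict(list)
    for ts, change in changes:
        batches[int(ts // window_s)].append(change)
    return dict(batches)


# Assumed inter-continental distance of 10,000 km.
rtt_ms = min_round_trip_ms(10_000)
# If every commit waits for one synchronous round trip, a single serial
# chain of commits cannot exceed this rate.
max_serial_commits_per_s = 1000 / rtt_ms

# Shipping changes once per window instead of once per change.
changes = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = batch_by_window(changes, window_s=1.0)
```

Under these assumptions the round trip is about 100 ms, so a strictly serial, synchronously replicated sequence of commits is limited to roughly ten per second, whatever the hardware. Batching by time window trades that waiting for eventual consistency, which is the tradeoff described above.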

Copyright notice
This article was created by [Official blog of oceanbase database]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/205/202207241048139349.html
