Dialogue ACE, Episode 4: Challenges and opportunities for the future development of distributed databases
2022-07-24 10:57:00 【Official blog of the OceanBase database】
With the development of cloud computing and big data technology, traditional information technology and its applications have been greatly disrupted, and databases, as foundational software, face new challenges and opportunities. Data usage scenarios are diversifying, and data volumes have exploded. The explosive growth of massive heterogeneous data places higher demands on the storage and computing capacity of databases.
In the future, traditional databases built on a single physical server node will be replaced by distributed databases in more and more scenarios, and distributed databases will find ever wider application.
The fourth episode of Dialogue ACE takes "challenges and opportunities for the future development of distributed databases" as its theme. Zhou Yanwei, founder and CEO of Jishu Yunzhou and Oracle ACE Director, and Wang Nan, senior director of the products and solutions department at OceanBase, were invited to explore the future development of distributed databases together, with the aim of helping distributed database technology develop further.
Here is the dialogue :
Wang Nan, senior director of products and solutions, OceanBase
Q: OceanBase is a fully self-developed domestic distributed database. What difficulties did it face at the start of its design?
For OceanBase, the design did not set out from the beginning to solve the problems and difficulties of a longer-term stage. There was a process of decision-making and project approval: what problems could be solved, what value that would produce, how much investment was needed, and how long it would take and what it would cost to solve such a problem. In the early days, problems were solved step by step, driven by scenarios, and the overall understanding was built up in stages along the way.
The first stage came from the Taobao scenario. The problem to solve was the scale and volume of core data, so that throughput, including data access, could keep up with continuously growing business. The first step was therefore to solve distribution, with a distributed NoSQL model that had no relational model. Once the Taobao problem was solved, more business scenarios also asked for scalability. Many of them were built on relational databases. Some businesses could be handled by reworking the application layer, but if a large number of businesses were to be served as general-purpose scenarios, the relational model was also a necessity.
The second stage, after distribution was solved, was to gradually complete the relational model and support SQL. As OceanBase moved from inside Ant and Alipay into the general market, it became clear that many business scenarios are built on the open-source MySQL ecosystem or on the commercial Oracle ecosystem. How can such applications migrate to a new distributed database? That is actually a big problem.
The third stage was to achieve good compatibility. For users, the question is what cost the application layer and the migration process incur. After solving MySQL compatibility, OceanBase completed full Oracle compatibility, including syntax, semantics, stored procedures, and so on.
After that, the key challenge is that once data reaches a certain scale, the demand for AP (analytical processing) capability is still fairly strong. At this stage HTAP is a key difficulty and technical challenge. And as cloud computing develops, customers will also increasingly ask for support for cloud infrastructure and even for cross-cloud and heterogeneous-cloud deployments.
So the technical difficulties, both at the start of the design and during development, were in fact discovered gradually in the course of supporting and serving customers.
Q: In financial scenarios such as banking and insurance, many enterprises have made a distributed database their first choice for core systems. In which scenarios will something similar happen in the future?
In the past, OceanBase was applied mostly in financial scenarios. In the past two years, beyond finance, it has been deployed at many enterprises in the general market. Many factors bear on this question; today I will discuss it from the perspective of products and technology.
Look at the core scenarios where distributed databases are already widely used. The users there chose a distributed database for several kinds of reasons, based on different demands and considerations.
First, users who need to migrate core systems. These users are often not driven by business demands. In this scenario, which architecture the user chooses, centralized or distributed, is not a particularly strong constraint. They care more about Oracle compatibility and about whether switching and replacement can proceed smoothly and cost-effectively.
Second, users solving continuity problems. The supply and purchase of databases, including database service capability, face more and more challenges of this kind, so continuity may also be an important reason.
Third, users with real business demands. The choice ultimately comes back to the business, or to the market: is the product economical, cost-effective, and the right one? From this perspective, either the customer's business is growing fast, which creates a need to scale out the database or data management layer, or the data volume is so large that a centralized architecture cannot carry future business demands. Such demands are more rigid and more urgent, and they are the core factor that actually pushes users to commit to a technology upgrade.
In this process, one key factor users care about is the capability of the database itself: whether its features, performance, specifications, and security are sufficient. The other big factor is the cost of application migration. Basic software such as a database differs greatly from an application; for complex business systems, the impact and cost of changing it are relatively large.
So which scenarios call for a distributed database and which call for a new database are really two separate questions. When choosing a database, attention should go to whether data security and specifications are met. Choosing distributed requires some core demand: either the original standalone system can no longer meet the demand, or the user wants to upgrade and build up a technical reserve for the future. As for which scenarios may choose distributed databases in the future, that comes back to what the real market chooses.
Q: Do you have any good suggestions for operating and maintaining distributed database products, or for large-scale intelligent operations?
The larger a product's deployment, the greater the demands and challenges. When a product is not yet deployed at scale, operations can be handled well enough whatever the approach, even by sheer manual effort. Once the volume grows, operations become a great challenge. This is especially true of databases, which differ from ordinary applications in that many factors, such as software and hardware failures, affect them at runtime. That is why a professional group, DBAs, specializes in operating databases. After decades of development, traditional databases have a relatively mature DBA community. Compared with centralized databases, distributed databases face different challenges.
The first is the technical architecture, which is inherently more complex. Compared with a centralized database, a distributed one demands more skill for high availability, fault recovery, and execution-plan tuning. For a new product and architecture, learning it, forming a common understanding, and becoming proficient takes a long time. This is not specific to any one database; it follows from the distributed architecture and its technical characteristics. On operations, we have been practising and exploring at several levels.
The first level is product capability. Beyond the database kernel, actually using a product requires a matching set of tools: cluster management, operations monitoring, full-link diagnosis, an automatic optimizer, tuning, and self-service operations. Using tools to solve the problem is critical. DBAs and others certainly need to build up these skills, but better and more complete tooling, and better technology behind that support, is a great help. So at the product level we need to build this supporting capability as a foundation.
The second level is awareness across the whole user base, including a training system for operations staff. OceanBase is now cultivating users through open source, and we encourage everyone to use it and learn what characteristics the database has and what problems it has. The more people use it, the more people will understand how to operate it. There is also training and certification, such as OceanBase's current OBCA/OBCP/OBCE, a full certification system designed for different audiences and needs so that more people can quickly learn, use, and understand this distributed database. This is not just for OceanBase; the aim is to cultivate a talent base for operating and optimizing distributed databases in general.
The third level is to gradually deepen the understanding of database operations in the course of commercialization and delivery. Relying on ourselves alone is not enough. Even for a product and company as mature as Oracle, operations depend on a large number of third-party professional service companies and the DBA community working together. In distributed databases, OceanBase is a start-up and still small. Once it is deployed in more industries and user scenarios, it will be hard for the original vendor alone to support the service. So for database operations, I think we should make full use of third-party professional service companies, the ecosystem, and DBAs.
We will also do some further exploration. Qualifications and skills, and AI as well, are areas we will explore in the future. At this stage, I think the three levels just mentioned are where we should focus and invest.
Q: Many emerging domestic database products were developed with reference to the Google Spanner paper. How do you see distributed database technology at home and abroad, and what gaps are there?
Under traditional database architectures we faced business challenges such as rapidly exploding data volumes and analytical demands. Those scenarios prompted a lot of new thinking in technology, and that kind of breakthrough and inspiration is of great significance. Take Google: its work originated in its own business. Google is global and cross-region, with a very large business scale. Working from its own scenarios, it accumulated technical solutions and capabilities while solving its own problems, and then published some of them.
Take the Spanner architecture as another example. It is one direction for solving such problems, but its effect is hard to judge. At the least it appears far from perfect, and it is hard to say that Google's commercialization of its own database has been particularly successful. Some of the core technologies it describes, such as the TrueTime API and global-scale capability, are not necessarily applicable to every scenario.
Enterprises differ greatly in demands and scenarios. Whether in China or globally, the promotion of distributed databases has been driven mainly by a few large cloud vendors. Large cloud vendors share characteristics with Google's own business: sufficiently large scale and heavy concentration of data. Such scenarios produce certain core demands on distributed databases, but those demands carry a strong vendor flavor. How to serve diversified scenarios is therefore of great practical significance.
So should distributed databases be built on the data-centralization demands of cloud or Internet vendors, or go all-in on cloud? Not necessarily. China, for example, offers better soil for promoting and developing support for diversified scenarios, rather than everyone crowding onto the same road of globalization or public-cloud-native solutions.
Overall, the technology gap between us and other countries is not that large. Different scenarios need solving; we have our advantages, and they may have had some advantages early on. What can be said is that at this stage, in terms of technical sophistication, strength, and capability, we are confident enough to face any situation, including large-scale domestic challenges and global demands.
Q: Under the current cloud-native trend, where are the breakthroughs in distributed database technology?
Databases today are in a state where a hundred flowers bloom. Our position is that we should build a distributed relational database that is transparent to user applications. That sounds simple, but there are several key elements.
First, application transparency. The application layer does not need to be aware of or solve distribution problems, and this is not a middleware solution; the problems are solved at the database layer. In other words, leave the complexity to the database and the simplicity to users.
Second, strict ACID guarantees. The database must ensure the correctness of data. In other words, OceanBase's stance on HTAP is to keep supporting and expanding AP analytical capability on top of strict transaction processing, rather than starting from AP and then trying to support strict OLTP. This is one of the core principles we hold to in the distributed direction.
Beyond these two technical challenges, I would like to raise another topic. As distributed databases are applied in the market, the biggest challenge is to find out what problems customers have run into and what needs solving. Besides the technical competitiveness of distributed databases, vendors really have to come back down to earth and get everyone using them. Here too there are a few key points.
First, large-scale distributed transactions must be supported with transactional consistency guaranteed. Under normal conditions I believe many can do this. But recovering from failures and assorted software and hardware faults, without the recovery process affecting the business, is a huge challenge for production systems. Users care a great deal about whether a distributed system can solve this.
Second, whether migration can be smooth. When a large number of applications migrate from the original system to a distributed one, is there a general approach? The reality is that we have to handle migration of massive numbers of applications across many industries and scenarios, and this is something many large users care about.
Third, more broadly, application scenarios are very complex, across different users and even within the same user. There may be different infrastructures (private cloud, public cloud, hybrid cloud), and a large number of systems cannot be switched all at once because the risk and cost are too high, so the switch must be gradual. Such a situation creates demands for supporting different infrastructures and heterogeneous deployment, which requires the database layer to provide a consistent experience and consistent capabilities.
Finally, HTAP, that is, whether TP and AP capabilities can be merged. In the beginning the two were not separate concepts. Later, as data volumes and analytical demands grew, the original database capacity could not keep up, and that is why they were separated. The current split architecture brings many problems: users have to build separate business systems, one for daily production transactions and one for analysis.
In terms of application complexity and customer cost, or going back to the essence of IT, the consumption of computing and storage resources, this is uneconomical. A solution that handles TP and AP together, with good isolation of workload resources so that they do not affect each other, has great practical and economic significance. This is also where OceanBase will focus its investment and effort for the next breakthroughs.
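To make the earlier point about distributed transactions concrete (keeping a transaction consistent when one participant fails), here is a minimal sketch of a two-phase commit in Python. It is an illustration only and not OceanBase's actual protocol: the `Participant` class, the `fail_on_prepare` flag, and the `two_phase_commit` function are hypothetical names, and the sketch ignores coordinator failure, durable logging, and timeouts, which are exactly the hard parts a production system has to solve.

```python
class Participant:
    """Hypothetical data node taking part in a distributed transaction."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates a node fault
        self.committed = {}   # data visible to readers
        self.staged = None    # writes prepared but not yet committed

    def prepare(self, writes):
        # Phase 1: vote yes only if the writes can be staged.
        if self.fail_on_prepare:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (success): make the staged writes visible.
        if self.staged is not None:
            self.committed.update(self.staged)
            self.staged = None

    def rollback(self):
        # Phase 2 (failure): discard the staged writes.
        self.staged = None


def two_phase_commit(writes_by_participant):
    """Apply all writes or none. Takes a list of (Participant, dict) pairs."""
    prepared = []
    for participant, writes in writes_by_participant:
        if participant.prepare(writes):
            prepared.append(participant)
        else:
            # One "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    for p in prepared:
        p.commit()
    return True
```

With two healthy participants the call returns `True` and both see their writes. If any participant votes no during prepare, the already-prepared nodes roll back and no partial result becomes visible.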
Zhou Yanwei, founder of Jishu Yunzhou
Q: How can one quickly understand what a truly distributed database is?
To answer concisely, split the question in two: what is distributed, and what is a database.
First, a database solves the problem of storing and computing on data. Second, distribution means using multiple resources to handle storage and computation of large-scale data, spreading it from the resources and compute of a single server across multiple servers. That is the literal meaning of a distributed database. But is such a product truly distributed? I don't think so. It can only be regarded as a product of academic theory, not one designed from actual demand scenarios.
Besides allocating computing and storage resources, a distributed database also depends on what the database is actually required to do, and distribution can be understood as the allocation of all kinds of resources. One example is asymmetric distribution. Most of the time we think of a distributed cluster allocating storage and computation across equivalent nodes. But some products require otherwise. In edge computing, for example, results need to be computed directly at the remote end and then related back to the center. Here the allocation of resources is clearly unequal, yet distributed database techniques are still needed to combine data from the remote ends to the center, perform some computation, and reach a conclusion. Is this a distributed database? If it is well designed, of course it is a kind of distribution. The second example is the distribution of unequal computation, as in the diverse computation types we often see in HTAP. Data is kept in different stores, with different structures in different environments and different computing requirements or algorithms. Building such a system means solving the allocation of computing power and of different operations and algorithms. If the system is well designed, this too belongs within the scope of distributed databases.
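The asymmetric, edge-style distribution described above can be sketched as follows: each remote node reduces its own raw data to a small summary, and the center only merges summaries. This is a minimal illustration under assumed names (`edge_summarize` and `center_merge` are hypothetical), not a description of any particular product.

```python
def edge_summarize(readings):
    """Runs at the remote end: reduce raw readings to a small partial result."""
    return {"count": len(readings), "total": sum(readings)}


def center_merge(summaries):
    """Runs at the center: combine partial results without seeing raw data."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    mean = total / count if count else None
    return {"count": count, "total": total, "mean": mean}
```

The point of the design is that the remote ends and the center do different work on different amounts of data, yet the final answer is the same as if all raw readings had been shipped to one place.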
Q: Distributed databases have been popular in recent years, but some mainstream overseas database vendors (Oracle, IBM) have not promoted them heavily. Some also say that because Asia has a large population and large data volumes, distributed databases suit China better. What do you think?
In the past, the core products of IBM, Oracle, and SQL Server were mainly standalone, and these vendors did not promote or develop distributed databases as their mainstream. As for the size of data, I don't think it comes mainly from people; it comes from machines and devices.
There is no essential relationship between distributed databases and data volume. So why don't mainstream vendors do distributed? I have thought about this too. Those focusing on distributed databases fall roughly into two groups. One is the idealistic, academic type of team, which starts from theory and builds a distributed database hoping to produce some result or a better product for the future of databases. The other does not know much about databases, newborn calves unafraid of tigers, so to speak: they draw up the long-term blueprint first and presume the outcome, while spending someone else's money.
Why do I say that? A database must first guarantee that data is absolutely correct, and only then consider performance and other capabilities. Distribution means using resources across a network while ensuring the consistency of computations and data, which are basic database capabilities. Mainstream database vendors, given their business needs and social responsibility, must first fully ensure correct data storage, then consider performance and the input-output ratio, and only after that consider whether to go distributed. Put another way, if a single-machine deployment or a cluster can handle data computation and storage capacity to a sufficient degree, or if the problem can be solved without a distributed computing architecture in the database kernel while better protecting data security and consistency, that may be an important reason why mainstream databases have not gone distributed.
With the continuous development of hardware, networks, and distributed storage, resource capacity can be distributed in several ways. Computing can be shared across CPUs and GPUs, but because of the limits of Moore's law, gains in single-machine computing power will gradually diminish, while distribution of physical hardware across the network will grow. Networks now offer high-speed access such as RDMA. I don't know whether hardware integration across CPUs and GPUs will appear, but we already see something like it in storage. We used to use RAID to join multiple disks into one volume and deploy the database on a single machine on top of RAID, which is one form of storage expansion. Storage can also be expanded at the hardware level, using smart NICs to form a unified pool of storage across networks and machines.
In other words, distributed storage may be solved at the hardware level, and whether the database itself needs to solve that problem deserves careful thought, because the essence of a database is to guarantee the storage and computation of data, above all their correctness. The apparent reluctance of Oracle, IBM, and others to implement distribution at the software level may reflect this consideration.
If in the future CPU and memory can also be unified across hosts, based on smart networks and smart switches, that becomes hardware-based distribution. With such resources, the database can focus on storage and computation itself and need not consider distribution. The technology is still immature, but I have seen research under way, and such products may appear in the future. So I think it will be done. But as I said, unequal and heterogeneous computing requirements may be a new challenge for databases. This is what we aim to solve with Data Fabric: a generalized distributed system across environments and across data dimensions.
This debate may continue for some time. Software and hardware iterate on and promote each other. Once hardware distribution is good enough, software distribution becomes a different category of discussion. I believe that in the near future, generalized distribution will dominate.
Q: Distributed databases come in several forms (middleware, NewSQL, distributed architecture plus an enterprise-class kernel, compute-storage separation). Can you analyze the advantages and disadvantages of these forms?
This is a very basic topic. From my point of view, we can analyze the trade-offs by where distribution happens. Personally I think there are three levels: user interaction, data computation, and data storage. Let's look at distribution arising at each of these three levels:
First, the routing layer. When SQL comes in and needs to be written to the database, with middleware that is what the middleware's routing layer has to solve. Long ago, when we did sub-database and sub-table sharding, middleware, or even data platforms, there would be such a middleware component to handle SQL distribution, load sharing, caching, and the recombination of data. This is distribution at the routing layer.
Second, the computing layer. If the routing layer only scratches the surface and is the earliest and lowest-end form, distribution at the computing layer truly involves the database kernel. It demands more technical capability, and the software takes longer to mature. It is a system in its own right.
Third, the storage layer. This covers the shared storage just mentioned, or the earliest cloud-native thinking, for example the body of theory built around AWS. I think its difficulty lies between the routing layer and the computing layer, and it is the result of balancing security and performance. Let's analyze the characteristics of these three forms.
The routing layer came first. Renren more than ten years ago, Taobao, and others mainly used the routing-layer form. The earliest open-source middleware was also open-sourced by Alibaba. This is one of the earliest forms, and some commercial middleware still takes it today. But I think it is a product of early technological development: it solved the problems of its time, when there was no better way to allocate resources, so databases and tables were split at the back end and middleware coordinated them. Once technology breaks through the barriers of that time, this approach will soon be eliminated. Middleware has to handle merged computation over data, cache all kinds of data, and stay compatible with all kinds of syntax while keeping data consistent. These are its problems, and some cannot be solved at all. So we see fewer and fewer distributed systems built on this architecture, and it will slowly be swept away by the waves of technology. Middleware-based distribution has no future.
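The routing-layer form described above (sub-database and sub-table sharding coordinated by middleware) can be sketched in a few lines of Python. This is a toy under stated assumptions: `ShardRouter` and its methods are hypothetical names, each shard is an in-memory dict standing in for a separate database, and the hash-modulo placement is just one common choice, not the scheme of any specific middleware.

```python
import zlib


class ShardRouter:
    """Hypothetical routing middleware in front of N backend shards."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        # Each dict stands in for an independent backend database.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # crc32 is stable across processes, unlike hash() on str.
        return zlib.crc32(str(key).encode("utf-8")) % self.shard_count

    def put(self, key, row):
        # A write carrying the sharding key goes to exactly one shard.
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

    def scatter_gather(self, predicate):
        # A query without the sharding key must visit every shard,
        # and the middleware itself has to merge the results.
        results = []
        for shard in self.shards:
            results.extend(row for row in shard.values() if predicate(row))
        return results
```

Point lookups stay cheap, but anything that spans shards (the `scatter_gather` path here, and in real systems joins, ordering, and cross-shard transactions) lands on the middleware. That is the burden the paragraph above refers to.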
The other two should be the main directions for the distributed field: distribution at the computing layer and at the storage layer. Distributed computation at the computing layer, typically over sharded data storage underneath, can be called an ideal form of distribution, since it ideally solves resource allocation. I think OceanBase is a form of this kind of distribution. That is its advantage, but it also has disadvantages. Where? Look at how a database computes on data: as just discussed, the core requirements are correctness of computation, security of data, and efficiency of computation, and only then resource allocation.
If better resource allocation comes first, something must be sacrificed in data safety or efficiency. A thoroughly distributed transaction means that at the performance extremes it should not be compared with a single machine, and it will not beat one. For a simple computation model, MySQL for example will be faster than a distributed system; because it is simple, it is efficient and can do better. From this point of view, distribution across hardware and computation can reach an ideal state, but implementation complexity rises, greatly increasing the complexity of the database itself and the difficulty of maturing it. In addition, data security and computing performance may be affected.
Back to the earlier question: why doesn't Oracle develop such products? Why is IBM's DB2 mainly a single-database product? I think people who really build databases think about absolute data security and performance first, and then choose the best cost-performance trade-off.
Now the storage-layer form of distribution. It looks ideal at present; although not the ultimate form, it is a very clever approach. If we want to share out database capacity, it should be split along two directions, compute and storage, which gives the concept of compute-storage separation, because the two really can be taken apart. To support truly large-scale data, compute and storage must first be separated; only then can we consider what to do in the storage layer and the computing layer respectively. Only when these conditions are met can we treat distributed storage in the storage layer the same as the original single machine, without having to worry about data security or timeliness.
Of course, it does not rule out that we should do more idealized , More extreme products . But technology research and development should be combined with actual needs , We'll do it when the actual needs need it , On the contrary, the product is very good , The actual demand is not big , It may not rule out its continued existence as a scientific research product , Waiting for future applications , But from a commercial point of view , Maybe it's not just the pursuit of technical perfection , It's about whether technology can cover demand , To be efficient 、 Security . I think this may also be the idea of many established database manufacturers to design products .
In addition, looking at predictions of future hardware trends: if the problem of resource allocation is solved at the hardware level, there is every reason not to ask for trouble, and simply to do the database's own job well on top of what the hardware provides.
As for the forms of distribution, I think most do not go beyond these three levels. If the levels are distinguished in this way, the characteristics of each should be fairly distinct and the advantages and disadvantages clear at a glance. So which type to deploy and which product to use should be decided according to actual needs and the stage of social development.
Q: What value does the separation of storage and compute in distributed databases bring to the business?
My view on this may be extreme: a so-called distributed database must separate storage and compute. Separation is a means; the goal is to solve the distributed problem. Only when storage and compute are separated can we, on the storage side, consider distributed scale-out, capacity expansion, and more advanced distribution and replication, and on the compute side consider how to achieve high elasticity, improve resource utilization, allocate on demand, run computations in parallel, and so on.
I don't have to think about what is stored, because after storage and compute are separated this part is transparent to users. The system as a whole also becomes transparent and highly available. In a traditional database architecture, every node bundles storage and compute together, which makes it hard to switch business traffic between nodes. After separation, you can build something close to a serverless architecture, which makes switching logically very easy; the business layer does not need to care about switching a node. This is very valuable for business development, operations and scaling. So my view is extreme: only when the compute layer and the storage layer are separated, and each is distributed in its own way, can a system be regarded as a truly valuable distributed one.
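The switching argument above can be illustrated with a small sketch. This is a toy model, not the architecture of any product discussed here; `SharedStorage`, `ComputeNode` and `Router` are hypothetical names introduced only for illustration. It assumes that all durable data lives in a shared storage layer, so a compute node can be lost without the business layer noticing.

```python
class SharedStorage:
    """Stands in for a distributed storage layer; it owns all durable data."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class ComputeNode:
    """A stateless compute node: it holds no data of its own."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.healthy = True

    def execute(self, op, key, value=None):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        if op == "put":
            self.storage.put(key, value)
            return value
        return self.storage.get(key)


class Router:
    """Sends each request to the first healthy compute node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def execute(self, op, key, value=None):
        for node in self.nodes:
            if node.healthy:
                return node.execute(op, key, value)
        raise RuntimeError("no healthy compute node")


storage = SharedStorage()
router = Router([ComputeNode("compute-1", storage),
                 ComputeNode("compute-2", storage)])

router.execute("put", "order:1", "paid")
router.nodes[0].healthy = False  # simulate losing a compute node
value_after_failover = router.execute("get", "order:1")
```

Because the compute nodes keep no state, the failover is only a routing decision; in a node that bundles storage and compute, the same switch would also require moving or re-replicating data.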
Q: What should we pay attention to when learning and practicing distributed databases?
I think there are two points. First, know not only the what but also the why. Second, real knowledge comes from practice.
Knowing both the what and the why matters because, when we use a complex system, the more optimized and the simpler it appears, the more is hidden inside. All the particulars are encapsulated. If you only ever use it at the surface, you may not be able to fix it when something goes wrong.
Let me give you an example. When I was at Qunar, I enthusiastically promoted a MySQL component called Galera. For a period of time it solved the problem of synchronous multi-node writes in a small and elegant way, and I think it is valuable. Yet I have also seen many people complain that this architecture is hard to use, that this open source component is hard to use, or that it has this or that problem. I ran hundreds of clusters of it in production without any problems, and later wrote it up in the book 《MySQL Operations Insider》 (MySQL运维内参).
Why is that? Besides the capability of the product itself, it depends on how well users understand it and how well they can handle it. If you only work at the surface, it is hard to understand the product well enough to use it well. So when we use distributed databases, which are extremely complex and heavily encapsulated, we still need to understand the essence.
Second, real knowledge comes from practice, and the emphasis is on giving yourself the chance to make mistakes. Talking on paper is useless. You should actually work with it hands-on for a year or two, and if given the chance, do it in production. There are several kinds of production environments, though, so choose one where you will not affect online business. Use it to understand how database products differ. A database is a dynamic system and running it is a continuous process; once in operation it will show all kinds of problems. Only by working through those problems can you understand it more deeply, know which part of the source code to read, and gain a deeper understanding of the product or system itself.
Q&A
Q: Do distributed databases provide particularly strong guarantees of availability and integrity? Can they be used as widely as Oracle in the finance and telecom carrier industries?
Wang Nan
This question goes to the heart of our thinking and positioning: our goal is to solve exactly this problem. Simply put, OceanBase aims to enter the core scenarios of the enterprise and to build the guarantees those applications demand, including features, performance, security, data consistency, integrity and correctness, into the database layer.
Many middleware-based solutions depend on the application and on constraints. Take cross-node two-phase transactions, for example: if a node goes down, transaction consistency, including cleanup of residual state such as rollbacks, is not fully guaranteed, and it has to be constrained or handled at the application layer to keep the data correct. OceanBase never pushes this problem onto users; it handles it at the database layer. The key requirement in the core scenarios of finance and telecom operators is data reliability, security and correctness, and the tolerance and maturity expected there are different from those in Internet scenarios. OceanBase addresses this across the whole database lifecycle: installation and deployment, development, operations, monitoring and troubleshooting.
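To make the "residual state" problem concrete, here is a minimal sketch of a two-phase commit in which the coordinator crashes halfway through the second phase. It is a toy model under stated assumptions, not OceanBase's implementation or any middleware's API: `Participant`, `Coordinator`, `crash_after` and the in-memory `decision_log` are hypothetical names, and the log is assumed to survive the crash.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A toy storage node that takes part in a two-phase commit."""
    name: str
    state: str = "idle"  # idle -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


@dataclass
class Coordinator:
    """A toy coordinator with a decision log that is assumed to be durable."""
    decision_log: dict = field(default_factory=dict)

    def run(self, txn_id, participants, crash_after=None):
        # Phase 1: every participant must vote yes.
        if not all(p.prepare() for p in participants):
            self.decision_log[txn_id] = "rollback"
            for p in participants:
                p.rollback()
            return
        # The decision is logged before any participant is told to commit.
        self.decision_log[txn_id] = "commit"
        # Phase 2: deliver the decision; a crash may interrupt delivery.
        for i, p in enumerate(participants):
            if crash_after is not None and i >= crash_after:
                return  # simulated coordinator crash
            p.commit()

    def recover(self, txn_id, participants):
        """Finish an interrupted transaction from the logged decision."""
        decision = self.decision_log.get(txn_id, "rollback")
        for p in participants:
            if p.state == "prepared":
                if decision == "commit":
                    p.commit()
                else:
                    p.rollback()


nodes = [Participant("node_a"), Participant("node_b")]
coordinator = Coordinator()
coordinator.run("txn-1", nodes, crash_after=1)
states_after_crash = [p.state for p in nodes]  # node_b is left in doubt
coordinator.recover("txn-1", nodes)
states_after_recovery = [p.state for p in nodes]
```

After the simulated crash, one node has committed while the other is still only prepared. Something has to resolve that in-doubt participant; the point made above is about whether that something is the application or the database layer.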
Q: Does Arkcontrol plan to support OceanBase?
Zhou Yanwei
Arkcontrol is a component of our product system. It exists for two purposes.
First, it gives users a management layer for our entire product system. It is responsible for operations, monitoring and data backup management of the Yunzhou Data Fabric platform (DTArk).
Second, it is built around user needs and solves the problem of managing multiple databases in one place on the user's side. Besides our own products, a user may also run MySQL, MongoDB, Oracle and Redis at the database layer, and I can give them a single platform to manage all of them. It is a convenience tool whose core purpose is management.
As for whether we will support OceanBase, I think that comes back to the market: we will look at which products users have deployed before making a decision. If many users have deployed OceanBase in their IDCs, we would want to support OceanBase for their convenience. I have always held one view, which is to go deep inside to do things rather than float on the surface and the periphery. Of course, in the process of adding support, I hope OceanBase can open more interfaces to us so that the management can be done well.
Q: Distributed databases have very high hardware requirements. Is there a plan for a version that runs on an ordinary PC, or one that trades performance for the ability to run on smaller machines?
Wang Nan
To be honest, we have noticed this and have been working on it for some time. First, on the public cloud, a large number of small and medium-sized customers have this as a hard requirement. Small customers in particular do not need a very powerful server to solve their current problems. There are also many developers and individual users who want to try a small scenario, or even test or use it on their own machines.
We are working on this in several directions. The first is resource consumption: OceanBase will next release products with lower resource consumption. At present the requirement is around 8C 64G; this year we will bring it down to 4C, and in the future to 2C or even 1C. Meanwhile, to make it easier for more customers to use and try the product, we are also exploring a single-node specification.
In addition, on the public cloud we can use multi-tenancy to address users' cost problem, because having a large number of small users each take exclusive resources is costly. For non-core business, customers can lower their resource demands in exchange for a better economic return, and multi-tenancy suits this well. We already have multi-tenancy in the kernel, with resource isolation for storage, compute and memory, and in the future we will also offer it as a paid cloud service so that users can try it quickly. To summarize: the demand for small specifications, for lowering the specification, is already being addressed. You will see significant progress this year, and OceanBase will keep working on it.
Q: How should we understand HTAP as an intermediate state?
Zhou Yanwei
There may be many ways to understand it. As product designers, we should start from a higher goal. What is that higher goal? HTAP is the fusion of TP and AP, a combination of two basic database capabilities. If you design such a product, is that the final state? Obviously not, because besides TP and AP, data computation also includes edge computing, retrieval computing, graph computing and so on, and it is because of these computing demands that we have so many kinds of databases on the market. HTAP connects the original TP type with the AP type that later came to be dominated by big data. It is a 1+1 pattern. Can we combine three into one, or four into one? Product design determines product implementation. If you only consider the two-dimensional intermediate state of HTAP, it will be very difficult to extend the implemented product in the future.
So in terms of product design, if we can arrive at a product form that handles multidimensional computing, then looking back, HTAP is an intermediate state: a very important step from one dimension to two. But that step is a starting point, not an end, and there is still a lot to do. From this angle, when we design a product, it is best to design a framework that can support two, three or even more dimensions. The technologist then effectively becomes a product manager as well, considering product requirements and product implementation together. If you focus only on this one dimension, then when higher product requirements appear in the future, much of the code you write now may have to be thrown away and redone. We are not willing to do that, so we have to think more comprehensively and build a bigger framework that implements these two dimensions while remaining compatible with future multidimensional needs. That is the ideal scenario.
Our DTArk is based on this thinking. It has already implemented fused TP/AP/FP computing, and graph computing is expected to be supported soon. This multidimensional, extensible architecture is completely different from one designed directly to implement HTAP. In other words, if current HTAP products want to integrate more dimensions, they may have to start over and rewrite their code.
Q: Can you recommend a distributed database that offers the same service worldwide, the kind that covers Asia, America and Europe?
Wang Nan
I believe the person asking is probably not asking whether there is a database that can adapt to these regions. "Global service" probably means: is there a database on the public cloud that can provide consistency across different regions of the world, to support globally deployed applications and data? That is really challenging today.
There is also an implied request: can the same cloud provide such capabilities in all regions of the world? And many customers do not want to be locked in to one cloud. Put simply, are there distributed databases or cloud services that can serve the whole world? There are actually quite a few, with some differences in capability. The major cloud vendors offer RDS, shared-storage services and AP products in their regions around the world. The cross-cloud request, not being locked in to one cloud, comes especially from large customers.
If you can only build your application infrastructure on one cloud, there are many risks, including technical, business-security and cost risks. Many people have raised the cross-cloud request, but not everyone can solve it, because what cloud vendors can solve is how to provide global, cross-region services on their own infrastructure. Cross-cloud may still depend on independent database products and vendors. OceanBase is now considering and working on this, and you will soon see some products and services from us.
Zhou Yanwei
From a purely technical perspective, the answer is certainly yes. For example, relying on a distributed network, and setting aside performance tolerance and time cost, rapid global distributed deployment is not a problem. In other words, I think the bottlenecks of global deployment are the network and permissions. Get past those two and it works.
In practice, if we really want cross-data-center, cross-cloud, global deployment and also want performance, it looks difficult today, because the network has latency and the speed of light is hard to beat. So we settle for second best: do shared storage on a distributed basis, or do both distribution and shared storage, depending mainly on how much latency or performance loss the business can tolerate. When building DTArk as a Data Fabric product, we also proposed an innovative technique, the data shuttle. The idea is simple: it may not be necessary to synchronize the full data set at large scale. Perhaps only a certain batch of data is needed, and that is what gets passed around. There is no need to keep the whole world synchronized and waste a lot of time waiting on the network; only data of particular dimensions, in certain data, business or computing scenarios, crosses domains or regions. In that case we should aim not for instant synchronization but for eventual consistency, and not for instant computation but for computation in time windows according to configuration. I think the key is still the network: network latency is the fundamental problem that affects all of these requirements.
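The speed-of-light limit mentioned above can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only. The 10,000 km distance is a hypothetical example, and the 2/3 factor for signal speed in optical fiber is an approximation; real networks add routing and queuing delay on top of this lower bound.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBER_FACTOR = 2 / 3               # assumption: light in fiber travels at about 2/3 c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the network round-trip time over fiber, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def max_serial_commits_per_s(distance_km: float) -> float:
    """Upper bound on back-to-back commits per second for one session,
    if every commit has to wait for one synchronous round trip."""
    return 1000 / min_round_trip_ms(distance_km)


# Hypothetical distance between two regions on different continents.
rtt_ms = min_round_trip_ms(10_000)
commits_per_s = max_serial_commits_per_s(10_000)
```

With these assumptions the round trip alone is about 100 ms, so a single session that waits for a synchronous cross-region acknowledgement on each commit cannot exceed roughly ten commits per second. That is the arithmetic behind preferring eventual consistency and windowed computation across regions.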