Let me give you a brief introduction to Presto, because many people may be encountering these big data platform technologies for the first time. So what exactly is Presto?
In essence, it is a SQL engine that can query massive amounts of data. You have seen plenty of SQL databases: MySQL, Oracle, and so on. Many of you have also experienced that SQL is a very convenient language for querying data. So why do we need Presto? First, if you use MySQL or Oracle, you will find that they are fast for small-scale lookups, especially when an index is hit. But when you query very large data sets, say hundreds of millions or billions of rows, performance keeps dropping, and if you need a full table scan, that is a disaster for them; it can become very slow. That is where Presto comes in: a distributed SQL query engine for big data. It is the kind of compute engine with separated compute and storage that Bin (Bin Fan, founding member and VP of Open Source at Alluxio) mentioned earlier. Unlike our traditional databases, Presto does not manage storage itself; it hands storage off to a third-party platform, which can be HDFS, GCS, S3, anything. Presto only does the computation. With this design, compute and storage can scale up and down independently, and that is how it handles massive data.
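As a sketch of what this looks like in practice, a typical Presto query runs over tables whose files live in external storage (the catalog, schema, table, and column names below are hypothetical):

```sql
-- Query a hypothetical "events" table whose files live in S3 or HDFS.
-- Presto workers read and scan the files; the storage system only serves bytes.
SELECT user_id, count(*) AS event_count
FROM hive.web.events
WHERE event_date = DATE '2021-06-01'
GROUP BY user_id
ORDER BY event_count DESC
LIMIT 10;
```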
What does this look like in practice? Don't just take my word for it; you can find public information about this online. The Internet giants have massive data, serving billions of users, and these companies use Presto extensively to query that data. How big are we talking? A basic production Presto cluster might be 200 to 400 servers, and it can process tens of billions of rows of data per second. If you look at Presto in these production environments, you can see it processing tens of billions of rows in a second. It really is well suited to running SQL queries over huge amounts of data.
As I just mentioned, Presto separates compute from storage. We all know that although the Internet seems to reach everywhere very quickly, distance still matters. Take our live broadcast today: the video between us may not be perfectly smooth. Why? Because we are far apart; the transmission path from Silicon Valley to China is long, it crosses many gateways and routers, and the latency adds up. Presto does not own its data: to answer any query, it has to load in the whole data set to be scanned and go through it row by row, so its speed depends on network transfer speed. This is actually one of Presto's bottlenecks. We can deploy Alluxio together with Presto, or with other compute engines, as an intermediate layer: the compute engine only fetches data from Alluxio, and Alluxio is responsible for fetching data from the actual storage. Take the example on the far left: this is the most common architecture for many young Internet companies and startups. From the first day of the company, the data is entirely in the cloud; you never build your own data center. This is a very common pattern now. The data is in the cloud, so how do you compute on it? Pulling data from the cloud is expensive on one hand and potentially slow on the other. So what do we do? We add a caching layer with Alluxio. If the data is already inside the Alluxio cluster, we get it from Alluxio; if not, the compute engine does not have to care: Alluxio goes to the cloud, brings the data back, and plays the role of an accelerator.
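The read-through behavior described here can be sketched in a few lines of Python. This is a toy model of the idea, not Alluxio's actual implementation; `fetch_from_cloud` is a made-up stand-in for a slow remote read:

```python
# Toy model of a read-through cache layer: the compute engine always asks
# the cache; on a miss, the cache fetches from remote storage and keeps a copy.
class ReadThroughCache:
    def __init__(self, remote_fetch):
        self.remote_fetch = remote_fetch  # function: path -> data (slow path)
        self.store = {}                   # local copies (fast path)
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self.store:
            self.hits += 1                # served locally, no network transfer
            return self.store[path]
        self.misses += 1
        data = self.remote_fetch(path)    # the cache goes to the cloud for us
        self.store[path] = data
        return data

# Hypothetical remote storage standing in for S3/GCS/HDFS.
def fetch_from_cloud(path):
    return f"contents of {path}"

cache = ReadThroughCache(fetch_from_cloud)
cache.read("s3://bucket/table/part-0000")  # miss: fetched from remote storage
cache.read("s3://bucket/table/part-0000")  # hit: served from the cache layer
print(cache.hits, cache.misses)            # -> 1 1
```

The compute engine only ever talks to the cache layer; whether the bytes came from local disk or from the cloud is invisible to it.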
Some of you will say: our company hasn't moved to the cloud, we have our own data centers. But many companies cannot keep everything in a single place. A company may have a data center in Beijing, another in Xi'an, another in Hangzhou. Suppose a colleague in Xi'an wants to run a computation, a SQL query, but the data happens to be in Hangzhou. That is a different workflow, but Alluxio can be used here too: deploy Alluxio alongside the compute. Say the Presto cluster, 500 servers, is in Xi'an; we deploy Alluxio right next to it. The compute engine fetches data from Alluxio, and Alluxio is responsible for fetching it from the remote data center. This architecture can support multiple remote data centers: in every case, the engine simply asks Alluxio for the data, and Alluxio brings it back for you. That is the case in the middle of the slide. The case on the far right is one I experienced personally. The company's data was originally in Hadoop in its own data center, but the machines were running short, and the company wanted to move to the cloud, because cloud machines can be cheaper and it is much easier to get new ones. Many companies then wonder: do we copy the data to the cloud? If compute moves to the cloud, do I copy the data up too, even though that costs money and takes a lot of time? Instead, the cloud compute cluster can deploy Alluxio, and in the same way get the acceleration. This diagram solves the long-distance problem: the distance grows, or the data lives somewhere else. We will show some concrete examples and numbers later; you will see that Alluxio both accelerates access and makes it more convenient.
Once we move to the cloud, or use third-party storage services, there is also a challenge for the compute engine. Presto and Spark are actually not bad here, because they support basically every kind of object store. But if your company uses many different storage technologies, things get troublesome. Alluxio can serve as an abstraction layer that helps you manage and use the various storage systems in one place. Many companies also have pricing strategies today: you may not want to be locked into a single cloud provider, right? Or the data may be partly in S3 and partly on GCP. That is exactly the data-silo problem Bin just mentioned: the data is everywhere, so what do you do? The actual data users are not so concerned about where the data is; they only care about querying and accessing data at the logical level, not what form the underlying data takes. So with Alluxio we build a layer of abstraction, a layer of isolation, and that also takes a burden off the compute engine.
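As a sketch of how Alluxio unifies multiple stores under one namespace (the bucket names, paths, and host names below are made up, and the exact CLI syntax may vary between Alluxio versions):

```shell
# Mount two different storage systems under a single Alluxio namespace,
# so the compute engine sees one logical file system.
./bin/alluxio fs mount /data/s3   s3://my-bucket/warehouse
./bin/alluxio fs mount /data/hdfs hdfs://namenode:8020/warehouse

# Queries then address one logical path, regardless of where the bytes live:
./bin/alluxio fs ls /data/s3
```

The compute engine only sees `alluxio://` paths under `/data`, while Alluxio routes each read to the right underlying store.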
So if we build such an abstraction layer, a middle layer, what are the benefits? The first benefit is speed. Why is it fast? Because, as I mentioned, the network has distance. Alluxio, as a cache layer, sits closer to the compute engine; it can even be deployed on the same machine, so the latency of accessing and reading data becomes very low. And speed quietly solves many problems: if any of you maintain a big data compute engine, you will know that as speed goes up, the throughput of the whole system goes up, stability goes up, everything improves. In fact it is not just raw speed; it is also stability of speed. I don't know if you have run into this: with Presto, the same query takes 5 minutes today and two hours tomorrow. It is unstable. Why? Partly network latency, partly Presto's scheduling: there are many other users, or your priority was lowered, so it is not stable. This instability causes a lot of trouble; we talk about it in terms of SLAs. If your user is a person, say a data engineer or data analyst who was supposed to run, for example, 10 SQL queries that day because he has a report to build, and one query does not come back for two hours, maybe he retries, and he cannot finish the task that day. If the client is a service, unstable latency is even more serious: any client has a timeout setting, and if you exceed the limit, the request simply fails. And as just mentioned, network I/O is a bottleneck; with Alluxio in place, it absorbs some of the network transfers and reduces the load on the network. The freed-up bandwidth can do a lot of things: Presto's various SQL operations, all the joins, become faster.
And we don't just offer this cache-like service. We also have a metadata service, called the catalog service, and some data transformation services. Sometimes a data format is not very query-friendly, so you can convert it, say from CSV to Parquet, which makes it much more convenient for the compute engine to do queries and analysis.
This picture shows how Presto and Alluxio work together. First, let's look at how Presto worked before Alluxio. Any Presto query really happens in two steps. Step one: go to the Hive Metastore and fetch the metadata. You gave me a table name, right? So Presto asks where that table lives, and the metastore returns a list of locations: a long list, maybe hundreds, thousands, even tens or hundreds of thousands of entries, which are really files or paths. Step two: after Presto gets this list of paths, there is no black magic; Presto reads the data out and computes over it to see which rows satisfy your query.
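The two-step flow can be modeled in a few lines of Python. This is a toy sketch: the real metastore returns partition and file locations, and Presto workers do the scanning in parallel; the table name, paths, and rows below are made up:

```python
# Toy model of Presto's two-step query flow:
# step 1: ask the metastore where the table lives (a list of file paths);
# step 2: read those files and scan row by row, keeping matching rows.

# Hypothetical metastore: table name -> list of file locations.
METASTORE = {
    "web.events": ["s3://bucket/events/part-0000", "s3://bucket/events/part-0001"],
}

# Hypothetical storage: file location -> rows.
STORAGE = {
    "s3://bucket/events/part-0000": [{"user": "a", "clicks": 3}],
    "s3://bucket/events/part-0001": [{"user": "b", "clicks": 7}, {"user": "c", "clicks": 1}],
}

def query(table, predicate):
    paths = METASTORE[table]          # step 1: metadata lookup
    result = []
    for path in paths:                # step 2: read and scan every file
        for row in STORAGE[path]:
            if predicate(row):
                result.append(row)
    return result

rows = query("web.events", lambda r: r["clicks"] > 2)
print(rows)  # -> [{'user': 'a', 'clicks': 3}, {'user': 'b', 'clicks': 7}]
```

Note that every byte of every listed file has to be read before filtering; that is exactly why the speed of step two is bounded by storage and network throughput.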
Once we have Alluxio, we let the metastore return an Alluxio address. The location that used to be S3 or HDFS is now Alluxio (there is some real dark magic behind this, which I will get to later). The metastore gives us an Alluxio path for the table, and Presto takes this path to Alluxio. Whatever you had in S3 or HDFS, you mount it into Alluxio, and because we have a metadata service, Alluxio knows which file you want. Alluxio then checks: if I have it, I serve it to you; if I don't, I go to the underlying store and fetch the data you want. So there are really two changes here: one is that the URL in the metastore changes, and the other is that we have to mount the real storage. Maybe some of you now feel this sounds a bit troublesome; that is where the black magic comes in next.
In the latest version or two, we added something called Transparent URI, together with auto-mount. Now we no longer bother with updating the metadata: Hive Metastore does not need to hand us a rewritten address. Whether you are on S3 or HDFS, you just keep giving me that, so from Hive Metastore's point of view, Alluxio is completely invisible. In other words, Presto still receives the S3 address. We don't have to change any code; we just modify Presto's configuration, replacing the default file system implementation with Alluxio's own. When Presto sees the s3:// path, it goes to Alluxio to look for it. There is also an auto-mount mechanism: with one or two extra lines of configuration set up in advance, Alluxio will itself go to the corresponding S3 bucket and mount that data. So for Hive Metastore, and equally for our Presto users, everything is transparent: the table still looks like it lives on S3 or on HDFS, but in fact the read takes a detour through Alluxio, and if our cache hits, Alluxio has the data, and you get a very good acceleration effect.
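As a rough sketch of what this looks like in configuration: the idea is to swap the Hadoop file system implementation that Presto uses for s3:// paths to Alluxio's shim, so the s3:// address Presto receives is transparently routed through Alluxio. The property names below are from memory and may differ between versions; treat them as an assumption and check the Alluxio documentation for your release:

```xml
<!-- core-site.xml picked up by Presto: route s3:// paths through Alluxio's
     shim file system so neither Hive Metastore nor the user sees any change.
     Property names are approximate; consult the Alluxio docs for your version. -->
<configuration>
  <property>
    <name>fs.s3.impl</name>
    <value>alluxio.hadoop.ShimFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>alluxio.hadoop.ShimFileSystem</value>
  </property>
</configuration>
```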
What does this look like from the user's point of view?
As I said earlier, Presto separates storage and compute; it does not manage storage space. When you create a table in Presto, an external table, what are you actually doing? You are just specifying a path, telling Presto, or telling Hive Metastore: let's create a table here, and this table lives at this location. If you have Alluxio, then when you created the table you told Presto that the table lives at a path starting with alluxio://, and that is all. All the later work is done by Presto and Alluxio; the changes the user has to make are already very small. With Transparent URI, there are even fewer changes: we don't have to touch the original table-creation scripts at all. If you already had a table built on S3, you don't have to move it; and if you haven't created it yet, you can use the same command as before, with an ordinary s3:// or hdfs:// path, exactly as always. Then you just query it: you write any SQL query without caring where the path points. Once the table is created, it is enough for ordinary data engineers or data analysts to know what the table contains; they can get the data from the table.
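As a sketch of that table-creation step (the table name, columns, and path are hypothetical), without Transparent URI you point the external location at Alluxio explicitly, and with Transparent URI you would keep the original s3:// location unchanged:

```sql
-- Without Transparent URI: record an alluxio:// location when creating the
-- external table; everything after this is handled by Presto and Alluxio.
CREATE TABLE hive.web.events (
  user_id varchar,
  clicks  bigint
)
WITH (
  external_location = 'alluxio://alluxio-master:19998/data/events',
  format = 'PARQUET'
);
```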
Now let me walk through some of our real cases, to give you an intuitive feel for the acceleration. I personally operated and maintained Presto for years, and I feel the biggest complaint I ever received (complaints are a good thing) was: why are you so slow? My query hasn't finished all day; you've held me up all day. So what do you do? If you are also maintaining Presto, or your company runs Presto, try Alluxio caching: after some simple configuration and tuning, you will see a several-fold speed-up. The slide specifically says I/O-intensive queries, but if you really operate Presto, you will find that most queries are I/O-intensive. We did some statistics before: 90% of the time is spent scanning tables. So Alluxio gives these queries a very handsome speed-up.
You can see that small files are specifically called out here. There will actually be another live session on small files later, where I will cover the metadata side. Why emphasize small files? Because with small files, the bottleneck partly shifts from table scanning to metadata retrieval. As mentioned earlier, the metastore gives you a file list; if the files are very small, the same amount of data means a huge number of files, and a lot of time is spent right there. In this case, without the metadata cache enabled we did not see the 5x acceleration, only around 1x or so; after we turned on the metadata cache, the speed-up reached 5.9x. If you are interested, follow our upcoming live sessions; there will be one dedicated to the catalog service.
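The small-file effect can be sketched the same way: the expensive step becomes fetching the file list itself, so caching the listing removes it from the query path. This is a toy model, not Alluxio's catalog service; `slow_listing` is a made-up stand-in for a metastore that has to enumerate tens of thousands of small files:

```python
# Toy model of metadata caching: with many small files, the costly step is
# fetching the file list, so we cache the listing per table.
class ListingCache:
    def __init__(self, remote_list):
        self.remote_list = remote_list  # function: table -> list of paths (slow)
        self.cache = {}
        self.remote_calls = 0

    def list_files(self, table):
        if table not in self.cache:
            self.remote_calls += 1      # only the first query pays for listing
            self.cache[table] = self.remote_list(table)
        return self.cache[table]

# Hypothetical remote metastore listing tens of thousands of small files.
def slow_listing(table):
    return [f"s3://bucket/{table}/part-{i:05d}" for i in range(10000)]

meta = ListingCache(slow_listing)
first = meta.list_files("events")     # pays the listing cost once
second = meta.list_files("events")    # served from the metadata cache
print(len(first), meta.remote_calls)  # -> 10000 1
```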
So how is this achieved? Look at the diagram on the right: the data is still sitting in S3, nothing changed there, right? This diagram reveals a lot of information. You can see that Alluxio and Presto are in one-to-one correspondence. That is really how we deploy it: each Presto worker is co-located with an Alluxio worker. Why deploy them together? Here I want to point out that these two really are a match made in heaven, because their resource usage is complementary. Presto leans heavily on CPU: it needs a lot of CPU and a lot of memory. Alluxio needs little CPU and little memory, but it needs a lot of disk. So if you have a physical machine, just put them together. Someone said: both of these seem to need the network, so what about that? If they are together and the cache hits, you don't need to fetch anything from outside; if it misses, only one fetch has to go out, so the total traffic does not increase. And if the cache hit rate is high, the pressure on the network drops significantly. So working together, they achieve the query acceleration.
This project is a collaboration between us and Facebook, and it is also a project Bin has been working on. The numbers here are actually a bit conservative: in their last talk with us, they reported more than a 10x improvement. This technology is slightly different from the previous one: the earlier design is a distributed cache, while what we call here the Presto worker's local cache is a bit different; you can think of them as a level-1 and a level-2 cache. The logic is the same: if we hit, we serve it locally; if we miss, we go to the remote side to fetch it. This project is widely deployed at Facebook: tens of thousands of servers are using our local cache to accelerate queries (it may not be convenient for me to disclose the exact figures). And there is a lot of follow-up work planned; there is still great potential here.
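The level-1/level-2 idea can be sketched as a lookup chain. This is a toy model: `local` plays the role of the per-worker cache inside the Presto JVM, and `distributed` stands in for the shared cache tier, with remote storage as the last resort:

```python
# Toy model of a two-level cache: check the worker-local cache first,
# then the distributed tier, and only then fall back to remote storage.
def make_two_level_reader(remote_fetch):
    local = {}        # level 1: inside the worker (fastest)
    distributed = {}  # level 2: shared cache tier
    stats = {"local": 0, "distributed": 0, "remote": 0}

    def read(path):
        if path in local:
            stats["local"] += 1
            return local[path]
        if path in distributed:
            stats["distributed"] += 1
            local[path] = distributed[path]  # promote into level 1
            return local[path]
        stats["remote"] += 1
        data = remote_fetch(path)            # slowest path: remote storage
        distributed[path] = data
        local[path] = data
        return data

    return read, stats

read, stats = make_two_level_reader(lambda p: f"bytes of {p}")
read("s3://bucket/x")  # first read: goes all the way to remote storage
read("s3://bucket/x")  # second read: level-1 hit inside the worker
print(stats)           # -> {'local': 1, 'distributed': 0, 'remote': 1}
```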
Its defining feature, as you can see in the picture, is that the local cache runs inside the same JVM as the Presto worker. When Presto reads a file, whether from HDFS or from S3, it reads it through our caching file system, and that works very well. From the data you can also see the improvement in query-time stability: at p50, the median, we improved by 33%; at p95, meaning the 95th-percentile query time, by 48%. Slow queries may benefit even more. We also took a lot of load off the network, which is really a mark of success: the stability and speed of their service improved substantially.
We have more examples here. Many companies are working with us, and they have all achieved satisfying results; this is only a part of them, and many more companies are still collaborating with us. You can check the two links below for more information. Thank you.