Installation and Deployment of Alluxio

2022-06-26 05:18:00 Air transport Alliance

Quick Start Guide - Alluxio v2.6.2 (stable) Documentation

Development Guide - Alluxio v2.6.2 (stable) Documentation

Deploying Alluxio - Running Alluxio Standalone on a Cluster - Alluxio Community Edition v2.0 Official Documentation - BookStack

First, decide on the deployment environment: local, cluster, AWS, and so on. This amounts to choosing Alluxio's under storage: the local file system, HDFS, S3, etc.

Tip: Released versions are available from the Alluxio download page. Each Alluxio release ships precompiled binaries compatible with different Hadoop versions. The "Building Alluxio from Master Branch" page explains how to compile Alluxio from source.

1. Basic requirements

The following are the basic requirements for running Alluxio in local or cluster mode:

  • Cluster nodes must run one of the following operating systems:
    • MacOS 10.10 or later
    • CentOS 6.8 or 7
    • RHEL 7.x
    • Ubuntu 16.04
  • Alluxio requires JDK 8. Higher versions are not supported:
    • Java JDK 8 (Oracle or OpenJDK distributions are supported)
  • Alluxio supports only the IPv4 network protocol
  • Open the following ports and protocols:
    • Inbound TCP 22 - so that a user can ssh into the nodes to install Alluxio components.
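
The JDK requirement above can be checked before installing. The following is a minimal sketch and not part of the Alluxio distribution: java_major is a hypothetical helper name, and the parsing assumes the usual `java -version` output format (for example 1.8.0_301 for JDK 8).

```shell
# Hypothetical helper: extract the major Java version from a version string.
# Legacy scheme "1.8.0_301" yields 8; modern scheme "11.0.2" yields 11.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # drop the legacy "1." prefix
  esac
  # Cut everything from the first '.', '_' or '-' onward.
  echo "${v%%[._-]*}"
}

# Check the local JVM, if one is on the PATH.
if command -v java >/dev/null 2>&1; then
  ver="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
  if [ "$(java_major "$ver")" = "8" ]; then
    echo "Java $ver: matches the JDK 8 requirement"
  else
    echo "Java $ver: this guide expects JDK 8"
  fi
fi
```

If no java binary is found, the script prints nothing, which is itself a sign that JAVA_HOME or the PATH still needs to be set.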

Master requirements

The following are the requirements for cluster nodes running the Alluxio Master process.

Note that these are the minimum requirements. Running Alluxio at scale under high load increases the system requirements accordingly.

  • At least 4 GB of disk space
  • At least 4 GB of memory
  • At least 4 CPU cores
  • Open the following ports and protocols:
    • Inbound TCP 19998 - Alluxio master default RPC port
    • Inbound TCP 19999 - Alluxio master default web UI port: http://<master-hostname>:19999
    • Inbound TCP 20001 - Alluxio job master default RPC port
    • Inbound TCP 20002 - Alluxio job master default web UI port
    • Embedded Journal requirements
      • Inbound TCP 19200 - Alluxio master default port for internal leader election
      • Inbound TCP 20003 - Alluxio job master default port for internal leader election
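
Once the processes are running, the port list above can be probed from another node. This is a rough sketch under stated assumptions: port_open and check_ports are hypothetical helper names, and the probe relies on bash's /dev/tcp redirection and the coreutils timeout command being available.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Uses bash's /dev/tcp feature; gives up after 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: report "open" or "closed" for each port on a host.
#   $1 = host name, remaining arguments = TCP ports
check_ports() {
  host="$1"
  shift
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

For example, `check_ports <master-hostname> 19998 19999 20001 20002` probes the master ports. A "closed" result can mean either a firewall rule or simply that the process has not been started yet, so it is only meaningful after Alluxio is up.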

Worker requirements

The following are the requirements for cluster nodes running the Alluxio Worker process.

  • At least 1 GB of disk space
  • At least 1 GB of memory
  • At least 2 CPU cores
  • Open the following ports and protocols:
    • Inbound TCP 29999 - Alluxio worker default RPC port
    • Inbound TCP 30000 - Alluxio worker default web UI port: http://<worker-hostname>:30000
    • Inbound TCP 30001 - Alluxio job worker default RPC port
    • Inbound TCP 30002 - Alluxio job worker default data port
    • Inbound TCP 30003 - Alluxio job worker default web UI port: http://<worker-hostname>:30003

Worker Cache

Alluxio workers need storage space configured as a cache. By default Alluxio provisions a RAMFS for each worker, but this can be changed to use other storage volumes. By listing other directories in alluxio.worker.tieredstore.level%d.dirs.path, users can point Alluxio at storage media and directories different from the default configuration. Users who want to start with the default setup can run ./bin/alluxio-mount.sh SudoMount workers from any account with sudo privileges. Run this command only after setting alluxio.worker.ramdisk.size in alluxio-site.properties and adding all workers to the conf/workers file.

$ ./bin/alluxio-mount.sh SudoMount workers
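
The two properties named above could be appended to the site configuration with a small helper like the one below. This is only an illustrative sketch: set_worker_cache is a hypothetical function name, level0 is assumed to be the tier being configured, and the directory and size passed in are placeholders to adapt.

```shell
# Hypothetical helper: append worker cache settings to a properties file.
#   $1 = properties file, $2 = cache directory, $3 = cache size (e.g. 1GB)
set_worker_cache() {
  file="$1"
  dir="$2"
  size="$3"
  cat >> "$file" <<EOF
alluxio.worker.ramdisk.size=${size}
alluxio.worker.tieredstore.level0.dirs.path=${dir}
EOF
}
```

A call such as `set_worker_cache conf/alluxio-site.properties /mnt/ramdisk 1GB` would be made before running the SudoMount command, since the mount script reads the configured size.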

Proxy requirements

The Proxy process provides a REST-based client and requires:

  • At least 1 GB of memory
  • Open the following ports and protocols:
    • Inbound TCP 39999 - used by clients to access the Proxy node.

Fuse requirements

The following are the requirements for nodes running the Alluxio FUSE process.

Note that these are the minimum requirements for running the software. Running Alluxio FUSE under heavy load increases the system requirements.

  • At least 1 CPU core
  • At least 1 GB of memory
  • FUSE installed
    • libfuse 2.9.3 or later (for Linux)
    • osxfuse 3.7.1 or later (for MacOS)

Other requirements

Alluxio can also aggregate logs on a remote server for centralized viewing. The port and resource requirements for the Logging Server are listed below.

Remote Logging Server requirements

The following are the requirements for running the Alluxio Remote Logging Server:

  • At least 1 GB of disk space
  • At least 1 GB of memory
  • At least 2 CPU cores
  • Open the following ports and protocols:
    • Inbound TCP 45600 - so that loggers can write logs to the server.

2. Local Alluxio installation and configuration

Use the local file system as the under storage.

Download the installation package: Try Alluxio in the cloud or download/install where you want it

1) Configure Alluxio

$ tar -xzf alluxio-bin.tar.gz
$ cd alluxio-2.6.2

# Create the configuration file conf/alluxio-site.properties from the template.
$ cp conf/alluxio-site.properties.template conf/alluxio-site.properties
# In conf/alluxio-site.properties, set:
alluxio.master.hostname=localhost
# The ramdisk size must not exceed the memory actually available on the system.
alluxio.worker.ramdisk.size=1GB

$ cp conf/alluxio-env.sh.template conf/alluxio-env.sh
# In conf/alluxio-env.sh, set the Java path:
JAVA_HOME=/usr/java/jdk1.8.0_301

2) Mount the RAMFS file system

$ sudo ./bin/alluxio-mount.sh SudoMount

3) Format the Alluxio file system

Note: This step is only needed the first time Alluxio is run. If the format command is run on an already deployed Alluxio cluster, all data and metadata previously stored in the Alluxio file system on that server are erased. Data in the under storage is not changed.

$ sudo ./bin/alluxio format

# Check the operating environment
$ ./bin/alluxio validateEnv local

4) Start the Alluxio file system locally

Run one of the following commands to start the Alluxio file system.

# If the ramdisk has not been mounted yet, or needs to be remounted (for example, to change its size)
$ sudo ./bin/alluxio-start.sh local SudoMount

# Or, if the ramdisk is already mounted
$ sudo ./bin/alluxio-start.sh local

5) Verify that Alluxio is running

$ jps
5059 AlluxioProxy
5688 Jps
4377 AlluxioJobMaster
4268 AlluxioMaster
5053 AlluxioWorker
5055 AlluxioJobWorker

To confirm that Alluxio is running, visit http://localhost:19999 to see the status of the Alluxio master, and http://localhost:30000 to see the status of the Alluxio worker.

Tip: If access by IP address from another machine fails, the firewall may be blocking the ports.
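
If the firewall turns out to be the cause, the two web UI ports can be opened. The sketch below assumes the host uses firewalld (the default on CentOS 7 and RHEL 7); firewall_cmds is a hypothetical helper that only prints the commands so they can be reviewed before being run as root.

```shell
# Hypothetical helper: print (not run) the firewalld commands that would
# permanently open the given TCP ports and then reload the rules.
firewall_cmds() {
  for port in "$@"; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  echo "firewall-cmd --reload"
}

# Master and worker web UI ports from this guide.
firewall_cmds 19999 30000
```

Printing first is a deliberate choice: the output can be inspected and then piped to `sudo sh` once it looks right. Hosts using another firewall (ufw, plain iptables, cloud security groups) need the equivalent rule in that tool instead.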

Run a more comprehensive system integrity check :

$ ./bin/alluxio runTests

The following command stops Alluxio and can be run at any time:

$ ./bin/alluxio-stop.sh local

6) Use the Alluxio shell

The Alluxio shell provides a variety of command-line operations for interacting with Alluxio. To list the file system commands, run:

$ ./bin/alluxio fs

The ls command lists files in Alluxio. For example, to list all files in the root directory:

$ ./bin/alluxio fs ls /

Alluxio currently contains no files. The copyFromLocal command copies local files into Alluxio.

$ ./bin/alluxio fs copyFromLocal LICENSE /LICENSE
Copied LICENSE to /LICENSE

List the files in Alluxio again, and the LICENSE file that was just copied appears:

$ ./bin/alluxio fs ls /
-rw-r--r-- staff  staff     26847 NOT_PERSISTED 01-09-2018 15:24:37:088 100% /LICENSE

The output shows that the LICENSE file is in Alluxio, along with other useful information such as the file's size, creation date, owner and group, and the percentage of the file cached in Alluxio.

The cat command prints the contents of a file.

$ ./bin/alluxio fs cat /LICENSE
                                Apache License
                          Version 2.0, January 2004
                       http://www.apache.org/licenses/

  TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
...

By default, Alluxio uses the local file system as the under file system (UFS). The default UFS path is ./underFSStorage. We can look at the contents of the UFS:

$ ls ./underFSStorage/

However, this directory does not exist! This is because, by default, Alluxio writes data only to Alluxio storage space and not to the UFS.

We can, however, tell Alluxio to persist the file from Alluxio space to the UFS. The shell command persist does this.

$ ./bin/alluxio fs persist /LICENSE
persisted file /LICENSE with size 26847

If we check the UFS again now, the file appears.

$ ls ./underFSStorage
LICENSE

7) [Bonus] Mounting in Alluxio

To configure Alluxio to work with Amazon S3, add your AWS access information to the Alluxio configuration in conf/alluxio-site.properties. The following commands update the configuration.

$ echo "aws.accessKeyId=<AWS_ACCESS_KEY_ID>" >> conf/alluxio-site.properties
$ echo "aws.secretKey=<AWS_SECRET_ACCESS_KEY>" >> conf/alluxio-site.properties

Replace <AWS_ACCESS_KEY_ID> with your AWS access key ID and <AWS_SECRET_ACCESS_KEY> with your AWS secret access key. After changing the configuration, sync it to every node and restart Alluxio.

Alluxio provides unified access to storage systems through its unified namespace. See the Unified Namespace blog post and the Unified Namespace documentation for a more detailed explanation.

This feature lets users mount different storage systems into the Alluxio namespace and seamlessly access files across them through that namespace.

First, create a directory in Alluxio to serve as a mount point.

$ ./bin/alluxio fs mkdir /mnt
Successfully created directory /mnt

Next, mount an existing S3 bucket into Alluxio. This guide uses the alluxio-quick-start S3 bucket.

$ ./bin/alluxio fs mount --readonly alluxio://localhost:19998/mnt/s3 s3://alluxio-quick-start/data
Mounted s3://alluxio-quick-start/data at alluxio://localhost:19998/mnt/s3

If the AWS access keys were not configured beforehand, they can be specified at mount time:

$ ./bin/alluxio fs mount --option aws.accessKeyId=<accessKeyId> --option aws.secretKey=<secretKey> /mnt/s3 s3://data-bucket/

Example:

$ ./bin/alluxio fs mount --option aws.accessKeyId=******* --option aws.secretKey=*************** /mnt/s3 s3://alluxio-quick-start/data
Mounted s3://alluxio-quick-start/data at /mnt/s3
$ ./bin/alluxio fs ls /mnt/s3
-r-x------  song_jie0109   song_jie0109          10077271       PERSISTED 06-21-2016 02:03:30:000   0% /mnt/s3/sample_tweets_10m.csv
-r-x------  song_jie0109   song_jie0109            955610       PERSISTED 06-21-2016 02:03:22:000   0% /mnt/s3/sample_tweets_1m.csv
-r-x------  song_jie0109   song_jie0109             89964       PERSISTED 06-21-2016 02:03:45:000   0% /mnt/s3/sample_tweets_100k.csv
-r-x------  song_jie0109   song_jie0109         157046046       PERSISTED 06-21-2016 02:03:45:000   0% /mnt/s3/sample_tweets_150m.csv

We can list the files in S3 through the Alluxio namespace. Use the familiar ls command to list the files under the S3 mount directory.

$ ./bin/alluxio fs ls /mnt/s3
-r-x------ staff  staff    955610 PERSISTED 01-09-2018 16:35:00:882   0% /mnt/s3/sample_tweets_1m.csv
-r-x------ staff  staff  10077271 PERSISTED 01-09-2018 16:35:00:910   0% /mnt/s3/sample_tweets_10m.csv
-r-x------ staff  staff     89964 PERSISTED 01-09-2018 16:35:00:972   0% /mnt/s3/sample_tweets_100k.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

The newly mounted files and directories can also be seen in the Alluxio web UI.

Through Alluxio's unified namespace, you can seamlessly work with data from different storage systems. For example, the ls -R command recursively lists all files in a directory.

$ ./bin/alluxio fs ls -R /
-rw-r--r-- staff  staff     26847 PERSISTED 01-09-2018 15:24:37:088 100% /LICENSE
drwxr-xr-x staff  staff         1 PERSISTED 01-09-2018 16:05:59:547  DIR /mnt
dr-x------ staff  staff         4 PERSISTED 01-09-2018 16:34:55:362  DIR /mnt/s3
-r-x------ staff  staff    955610 PERSISTED 01-09-2018 16:35:00:882   0% /mnt/s3/sample_tweets_1m.csv
-r-x------ staff  staff  10077271 PERSISTED 01-09-2018 16:35:00:910   0% /mnt/s3/sample_tweets_10m.csv
-r-x------ staff  staff     89964 PERSISTED 01-09-2018 16:35:00:972   0% /mnt/s3/sample_tweets_100k.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

The output shows all files under the Alluxio file system root, coming from every mounted storage system. The /LICENSE file is in the local file system, and the /mnt/s3/ directory is in S3.

8) [Bonus] Accelerating data access with Alluxio

Because Alluxio uses memory to store data, it can speed up data access. First, look at the status of one of the files mounted from S3 earlier:

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002   0% /mnt/s3/sample_tweets_150m.csv

The output shows that the file is not in memory (0% cached). This file is a sample of tweets. Let's count how many tweets mention the word "kitten" and time the operation.

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c kitten
889

real	0m22.857s
user	0m7.557s
sys	0m1.181s

Depending on your network connection, this operation may take more than 20 seconds. If reading the file takes too long, you can choose a smaller data set; the other files in this directory are smaller subsets of this file. By keeping data in memory, Alluxio can speed up access to it.

After fetching the file with the cat command, use ls to check its status:

$ ./bin/alluxio fs ls /mnt/s3/sample_tweets_150m.csv
-r-x------ staff  staff 157046046 PERSISTED 01-09-2018 16:35:01:002 100% /mnt/s3/sample_tweets_150m.csv

The output shows that the file is now 100% loaded into Alluxio, so accessing it again should be much faster.

Now let's count the tweets that contain the word "puppy".

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c puppy
1553

real	0m1.917s
user	0m2.306s
sys	0m0.243s

As you can see, because the data is already stored in Alluxio memory, subsequent reads of the same file are very fast.

Now let's count how many tweets contain the word "bunny".

$ time ./bin/alluxio fs cat /mnt/s3/sample_tweets_150m.csv | grep -c bunny
907

real	0m1.983s
user	0m2.362s
sys	0m0.240s

Congratulations! You have installed Alluxio locally and used it to accelerate data access!

Stop Alluxio

Use the following command to stop Alluxio:

$ ./bin/alluxio-stop.sh local

3. Deploying Alluxio on a cluster

Deploy an Alluxio Cluster with a Single Master - Alluxio v2.6.0 Documentation

3.1 Single-master cluster

This is the simplest way to deploy Alluxio on a cluster, but the single master is a single point of failure.

Prerequisites:

(1) All nodes can reach each other over passwordless SSH.

(2) The RPC port is open on all nodes (default: 19998).

(3) The operating user has sudo privileges, which are needed to mount RAMFS.
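
Prerequisite (1) is commonly met with ssh-keygen and ssh-copy-id. The sketch below is an assumption about how one might do it, not a step from the Alluxio documentation: list_hosts is a hypothetical helper, and hosts.txt is a hypothetical file with one node name per line (the same layout later used by conf/masters and conf/workers).

```shell
# Hypothetical helper: print host names from a file with one host per line,
# skipping blank lines and lines that start with '#'.
list_hosts() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage on the master node (uncomment to execute):
# [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# for host in $(list_hosts hosts.txt); do
#   ssh-copy-id "$host"      # prompts once for that node's password
# done
```

Afterwards, `ssh <node> hostname` should return without asking for a password on every node listed.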

1) Download and extract
$ tar -xvf alluxio-2.6.0-bin.tar.gz
2) Modify the configuration file
$ cd alluxio-2.6.0/
$ cd conf/
$ cp alluxio-site.properties.template alluxio-site.properties
$ vim alluxio-site.properties

For example, the modified content is as follows:

alluxio.master.hostname=clu00
alluxio.master.mount.table.root.ufs=hdfs://clu00:9090/alluxio

i) alluxio.master.hostname is the cluster's master node. An IP address or a domain name can be used, as long as every worker node can reach it.

ii) alluxio.master.mount.table.root.ufs specifies the URI of the under storage mounted at the Alluxio root.

For example, when HDFS is used as the under storage system, the value of this property can be set to alluxio.master.mount.table.root.ufs=hdfs://1.2.3.4:9000/alluxio/root/

When Amazon S3 is used as the under storage system, the value can be set to alluxio.master.mount.table.root.ufs=s3://bucket/dir/

Tip: The master property alluxio.master.mount.table.root.ufs mounts the specified directory at the root of the Alluxio namespace. This directory is Alluxio's "primary storage". On top of it, users can add and remove further under storage systems through the mount API.

3) Set the Java environment
# Still in the conf/ directory
$ cp alluxio-env.sh.template alluxio-env.sh
$ vim alluxio-env.sh


JAVA_HOME=/usr/java/jdk1.8.0_301

4) Set the master and worker nodes
$ vim masters
clu00
$ vim workers
clu01
clu02

These are the minimum settings required for startup. For other settings, see the configuration properties reference.

  • You may need to set additional properties to enable Alluxio to access the configured under storage (e.g., AWS S3 configuration)

5) Sync the configuration to all nodes
# Run from the bin/ directory
$ ./alluxio copyDir /root/alluxio-2.6.0
RSYNC'ing /root/alluxio-2.6.0 to masters...
clu00
RSYNC'ing /root/alluxio-2.6.0 to workers...
clu01
clu02
6) Format

Before the first start, format Alluxio on the master node. This deletes all metadata but does not affect data in the under storage.

# Run from the Alluxio root directory
$ ./bin/alluxio formatMasters

Tip: If an error occurs, check the logs and fix the configuration accordingly. Java environment problems are resolved by setting JAVA_HOME in alluxio-env.sh.

7) Start
$ ./bin/alluxio-start.sh all SudoMount
# all: starts the master node and all worker nodes
# SudoMount: mounts the RamFS on the worker nodes; only the first start needs the mount

......
All tasks finished
-----------------------------------------
Starting to monitor all remote services.
-----------------------------------------
--- [ OK ] The master service @ clu00 is in a healthy state.
--- [ OK ] The job_master service @ clu00 is in a healthy state.
--- [ OK ] The worker service @ clu00 is in a healthy state.
--- [ OK ] The worker service @ clu02 is in a healthy state.
--- [ OK ] The worker service @ clu01 is in a healthy state.
--- [ OK ] The job_worker service @ clu02 is in a healthy state.
--- [ OK ] The job_worker service @ clu01 is in a healthy state.
--- [ OK ] The job_worker service @ clu00 is in a healthy state.
--- [ OK ] The proxy service @ clu01 is in a healthy state.
--- [ OK ] The proxy service @ clu02 is in a healthy state.
--- [ OK ] The proxy service @ clu00 is in a healthy state.


# Start
$ ./bin/alluxio-start.sh all
# Stop
$ ./bin/alluxio-stop.sh all

$ ./bin/alluxio-start.sh masters # starts all masters in conf/masters
$ ./bin/alluxio-start.sh workers # starts all workers in conf/workers

$ ./bin/alluxio-start.sh master # starts the local master
$ ./bin/alluxio-start.sh worker # starts the local worker

8) Verify the Alluxio cluster

Open http://<alluxio_master_hostname>:19999 in a browser to view the master node.

Open http://<alluxio_worker_hostname>:30000 in a browser to view a worker node.

# On the master you should see AlluxioMaster, AlluxioJobMaster, and AlluxioProxy
$ jps
26578 AlluxioProxy
27190 Jps
15670 NameNode
25515 AlluxioMaster
26014 AlluxioJobMaster


# On a worker you should see AlluxioWorker, AlluxioJobWorker, and AlluxioProxy
$ jps
22657 DataNode
25250 AlluxioWorker
25477 AlluxioJobWorker
26151 Jps
25759 AlluxioProxy
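
The manual jps check above can be scripted. A minimal sketch, assuming jps is on the PATH of each node; missing_procs is a hypothetical helper name, and the process names are the ones shown in the jps listings in this guide.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# process name (given as arguments) that does not appear in it.
missing_procs() {
  out="$(cat)"
  for name in "$@"; do
    if ! printf '%s\n' "$out" | grep -qw "$name"; then
      echo "$name"
    fi
  done
}

# Usage on the master:
#   jps | missing_procs AlluxioMaster AlluxioJobMaster AlluxioProxy
# Usage on a worker:
#   jps | missing_procs AlluxioWorker AlluxioJobWorker AlluxioProxy
```

Empty output means every expected process was found; any name printed points to a process whose log under the Alluxio logs directory is worth checking.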

Copyright notice
This article was written by [Air transport Alliance]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202180507300053.html