Task Orchestration in Monorepo Tools: A Look Through Turborepo

2022-06-26 17:03:00 Haiqiu

Most of the images in this article come from the Internet.

[Image: monorepo]

Preface

On December 9, 2021, Vercel's official blog published a post titled "Vercel acquires Turborepo to accelerate build speed and improve developer experience". As the title says, Vercel acquired Turborepo to speed up builds and improve the developer experience.

[Image: Vercel + Turborepo]

Turborepo is a high-performance build system for JavaScript and TypeScript codebases. Through incremental builds, intelligent remote caching, and optimized task scheduling, Turborepo can speed up builds by 85% or more, enabling teams of all sizes to maintain a fast and effective build system that scales as the codebase and the team grow.

The blog post already sums up Turborepo's advantages concisely. This article starts from real-world scenarios, discusses the problems a large code repository (Monorepo) may run into, and, in light of existing solutions in the industry, looks at what innovations and breakthroughs Turborepo has made in task orchestration.

The Self-Cultivation of a Qualified Monorepo

As the business grows and teams change, the number of projects in a business-type Monorepo gradually increases. An extreme example is Google, which keeps the entire company's code in a single repository whose size has reached 80TB.

Business-type Monorepo: unlike a lib-type Monorepo (packages such as React, Vue3, Next.js, and Babel), a business-type Monorepo organizes multiple business applications (Apps) and the shared component libraries or utility libraries they depend on into one repository. (From "Eden Monorepo Series: A Brief Analysis of Eden Monorepo Engineering Construction")

More projects mean that, while you enjoy the advantages of a Monorepo, you also face significant challenges. A first-class Monorepo tool lets developers enjoy those advantages without burden, whereas a poor one can make developers miserable and even make them question whether Monorepos should exist at all.

Some real scenarios the author has encountered:

  1. Dependency version conflicts

    1. A newly created project cannot start because of dependency problems
    2. A newly created project causes other projects to fail to start because of dependency problems
  2. Slow dependency installation

    1. Initial dependency installation takes 20min+
    2. Adding a single dependency takes 3min+
  3. Tasks such as build/test/lint run slowly

The author has previous experience rolling out Rush. In practice, I found that beyond the most basic code-sharing capability, a Monorepo tool should have at least three abilities:

  1. Dependency management. As the number of dependencies grows, it still maintains the correctness and stability of the dependency structure, as well as installation efficiency.
  2. Task orchestration. It executes the tasks of the projects in the Monorepo (narrowly understood as npm scripts such as build, test, and lint) with maximum efficiency and in the correct order, and the complexity does not grow as the number of projects in the Monorepo grows.
  3. Version publishing. Based on the changed projects and the project dependency relationships, it correctly bumps version numbers, generates CHANGELOGs, and publishes projects.

The capabilities supported by some popular tools are shown in the following table:

Tool | Dependency management | Task orchestration | Version management
Pnpm Workspace
Rush(by Pnpm)
Lage
Turborepo
Lerna
  1. Pnpm: Pnpm has some task orchestration capability (the --filter parameter), so it is listed here as well. As a package manager, it is also an indispensable part of large Monorepos.
  2. Rush: an extensible Monorepo management solution open-sourced by Microsoft, with built-in PNPM support and a Changesets-like publishing scheme. Its plugin mechanism is a highlight: it makes it very convenient to implement custom features on top of Rush's built-in capabilities, and it marks the first step toward a Rush plugin ecosystem.
  3. Lage: also open-sourced by Microsoft. In my opinion it is the forerunner of Turborepo, and Turborepo is the Go version of Lage. Lage calls itself a "Monorepo Task Runner", which is far more modest than Turborepo's "High-Performance Build System". Their star counts also differ by an order of magnitude (Lage 300+, Turborepo 5k+). See this PR for more information. In the rest of this article, Lage is treated as equivalent to Turborepo.
  4. Lerna: maintenance has stopped, so it is not included in the discussion below.

Dependency management is too low-level, and version management is simple and mature, so it is hard to make breakthroughs in either. In practice, tools basically combine Pnpm and Changesets to round out their overall capability, or even specialize in a single point, task orchestration, which is the track Lage and Turborepo have chosen.

[Image: Changesets]

How do you choose your own Monorepo toolchain?

  1. Pnpm Workspace + Changesets: low cost, and enough for most scenarios
  2. Pnpm Workspace + Changesets + Turborepo/Lage: adds stronger task orchestration on top of option 1
  3. Rush: comprehensive and highly extensible

Task orchestration can be divided into three steps. Each tool supports them as follows:

Tool | Scoping | Parallel execution | Cloud cache
Pnpm
Rush
Turborepo/Lage

Scoping: run a subset of tasks on demand

Filtering/Scoping/Selecting subsets of projects

This capability has many uses in day-to-day development.

For example, after cloning the repository for the first time, starting project app1 requires building app1's prerequisite dependencies inside the Monorepo, package1 and package2.

When packaging project app1 on SCM, you need to build app1 itself as well as its prerequisite dependencies inside the Monorepo, package1 and package2.

In both cases you should select only the projects that need to be built, and avoid building projects unrelated to your current goal.

Different Monorepo tools give this behavior different names:

  1. Rush calls it Selecting subsets of projects. For this example, use the following commands:
// Start app1 locally in development mode. app1 is the top of the dependency graph, but app1 itself does not need to be built
$ rush build --to-except @monorepo/app1

// Package app1 on SCM. app1 is the top of the dependency graph, and @monorepo/app1 itself must be built
$ rush build --to @monorepo/app1
  2. Pnpm calls it Filtering: restricting a command to a specific subset of packages. For this example, use the following commands:
// Start app1 locally in development mode. app1 is the top of the dependency graph, but app1 itself does not need to be built
$ pnpm --filter "@monorepo/app1^..." build

// Package app1 on SCM. app1 is the top of the dependency graph, and @monorepo/app1 itself must be built
$ pnpm --filter "@monorepo/app1..." build
  3. Turborepo/Lage calls it Scoped Tasks, but at the time of writing (2022/02/13) this capability is quite limited. The Vercel team is designing a filter syntax that is basically consistent with Pnpm's. For details, see RFC: New Task Filtering Syntax.

Scoping ensures that the number of tasks executed does not grow as unrelated projects are added to the Monorepo. A rich set of parameters helps with selecting/filtering/scoping in various scenarios (publishing packages, building apps, and CI tasks).

For example, suppose package5 has been modified. In the Merge Request CI environment, you need to make sure that package5 and the projects that depend on package5 will not fail to build because of this change. You can use the following commands:

// Using Rush
$ rush build --to @monorepo/package5 --from @monorepo/package5

// Using Pnpm
$ pnpm --filter "...@monorepo/package5..." build

This example ends up selecting package5 and app3 for building, which meets the minimum bar for merging code in CI: the change does not break the builds of other projects.

From the package.json files of all projects in the workspace, it is easy to derive the dependency relationships between projects. Each project knows its upstream projects (Dependents) and its downstream dependencies (Dependencies). Combined with the parameters passed in by the developer, this makes selecting a subset of projects straightforward.
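The selection described here can be sketched as a small graph walk. The TypeScript below is only an illustration under stated assumptions, not the code of Rush or Pnpm: the `Workspace` type, the helper names `selectTo` and `selectDependents`, and the sample projects are all hypothetical, and the sketch assumes each project's workspace dependencies have already been read from its package.json.

```typescript
// Hypothetical workspace model: project name -> names of the workspace
// projects it depends on (as read from each package.json).
type Workspace = Record<string, string[]>;

// Invert the dependency map so that each project also knows its Dependents.
function buildDependents(ws: Workspace): Record<string, string[]> {
  const dependents: Record<string, string[]> = {};
  for (const project of Object.keys(ws)) dependents[project] = [];
  for (const project of Object.keys(ws)) {
    for (const dep of ws[project]) {
      // Ignore names that are not workspace projects.
      if (dependents[dep] !== undefined) dependents[dep].push(project);
    }
  }
  return dependents;
}

// Collect `start` plus everything reachable from it by following `edges`.
function reach(start: string, edges: Record<string, string[]>): string[] {
  const seen = new Set<string>([start]);
  const stack = [start];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const next of edges[current] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return Array.from(seen).sort();
}

// A project and all of its Dependencies (the idea behind selecting "to" a project).
function selectTo(ws: Workspace, project: string): string[] {
  return reach(project, ws);
}

// A project and everything that depends on it (its Dependents).
function selectDependents(ws: Workspace, project: string): string[] {
  return reach(project, buildDependents(ws));
}

const workspace: Workspace = {
  app1: ["package1", "package2"],
  app3: ["package5"],
  package1: [],
  package2: [],
  package5: [],
};

console.log(selectTo(workspace, "app1")); // app1 together with package1 and package2
console.log(selectDependents(workspace, "package5")); // package5 together with app3
```

Real tools layer more selectors on top of these two walks (for example excluding the starting project, or selecting by git diff), but both directions reduce to traversing the same graph.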

Parallel execution: make full use of machine performance

Local task orchestration

Suppose 20 tasks have been selected. How should these 20 tasks be executed to guarantee both correctness and efficiency?

Projects depend on each other, so their tasks depend on each other as well. Taking the build task as an example, the current project can be built only after its prerequisite dependencies have been built.

There is a popular interview question about limiting maximum concurrency. Roughly: given m URLs and at most n parallel requests at a time, implement code that keeps the number of in-flight requests at that maximum.

[Image: max-request-count]

The idea behind that question is similar to parallel task execution in task orchestration. The only difference is that the URLs in the interview question have no dependencies between them, whereas tasks have a topological order.

With that, the approach to task execution becomes clear:

  1. The initially executable tasks must be tasks without any prerequisite task

    • Their number of Dependencies is 0
  2. After a task completes, find the next executable task in the task queue and execute it immediately

    • After a task completes, update the Dependencies count of each of its Dependents by removing the current task from them (Dependencies count - 1)
    • Whether a task is executable depends on whether all of its Dependencies have finished (Dependencies count is 0)

This article does not go into the code level. For a concrete implementation, see the article "Task Scheduling Mechanism in Monorepo", which implements topologically ordered parallel task execution in code.
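For readers who want a feel for the two rules without leaving the page, here is a rough TypeScript sketch. It is not the implementation from the article referenced above, nor the scheduler of any specific tool: `TaskGraph`, `Runner`, and `runInTopologicalOrder` are hypothetical names, and the sketch assumes every dependency listed in the graph is itself a key of the graph and that `concurrency` is at least 1.

```typescript
// Hypothetical task model: task id -> ids of the tasks it depends on.
type TaskGraph = Record<string, string[]>;
type Runner = (task: string) => Promise<void>;

// Run every task once, never before its Dependencies, with at most
// `concurrency` tasks in flight. Resolves with the order in which tasks finished.
function runInTopologicalOrder(
  graph: TaskGraph,
  run: Runner,
  concurrency: number
): Promise<string[]> {
  const tasks = Object.keys(graph);
  const remaining: Record<string, number> = {}; // unfinished Dependencies per task
  const dependents: Record<string, string[]> = {};
  for (const task of tasks) {
    remaining[task] = graph[task].length;
    dependents[task] = [];
  }
  for (const task of tasks) {
    for (const dep of graph[task]) dependents[dep].push(task);
  }

  // Rule 1: tasks whose Dependencies count is 0 are executable right away.
  const ready = tasks.filter((task) => remaining[task] === 0);
  const finished: string[] = [];
  let running = 0;

  return new Promise<string[]>((resolve, reject) => {
    const schedule = (): void => {
      if (finished.length === tasks.length) return resolve(finished);
      if (running === 0 && ready.length === 0) {
        return reject(new Error("dependency cycle detected"));
      }
      // Start as many ready tasks as the concurrency limit allows.
      while (running < concurrency && ready.length > 0) {
        const task = ready.shift()!;
        running++;
        run(task).then(() => {
          running--;
          finished.push(task);
          // Rule 2: a finished task lowers the count of each of its Dependents.
          for (const dependent of dependents[task]) {
            if (--remaining[dependent] === 0) ready.push(dependent);
          }
          schedule();
        }, reject);
      }
    };
    schedule();
  });
}

const graph: TaskGraph = {
  "package1#build": [],
  "package2#build": [],
  "app1#build": ["package1#build", "package2#build"],
};
const started: string[] = [];
const done = runInTopologicalOrder(graph, async (task) => {
  started.push(task); // a real runner would spawn the npm script here
}, 2);
done.then((order) => console.log(order)); // app1#build always finishes last
```

The concurrency limit is the part shared with the interview question; the `remaining` counters are what add the topological ordering on top of it.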

Breaking task boundaries

[Image: turborepo-lerna]

This image is from Turborepo: Pipelining Package Tasks

So far, task execution has been discussed within a single kind of task, such as build, lint, or test: when build tasks run in parallel, lint and test tasks are not considered. As the Lerna area of the image above shows, four kinds of tasks run one after another, each blocked by the previous one. Even though execution within each kind is parallel, resources are still wasted between different kinds of tasks.

Lage/Turborepo gives developers a way to declare the relationships between tasks (see the turbo.json below). Based on these relationships, Lage/Turborepo can schedule and optimize across different kinds of tasks.

Compared with running only one kind of task at a time, the overlapping, waterfall-style execution is certainly much more efficient.

turbo.json

{
  "$schema": "https://turborepo.org/schema.json",
  "pipeline": {
    "build": {
      // Build after the build commands of its dependencies have completed
      "dependsOn": ["^build"]
    },
    "test": {
      // Test after the package's own build command has completed (so the image above contains an error)
      "dependsOn": ["build"]
    },
    "deploy": {
      // Deploy after the package's own lint, build, and test commands have completed
      "dependsOn": ["build", "test", "lint"]
    },
    // lint can start at any time
    "lint": {}
  }
}

The corrected ordering

[Image: fix-turbo-pipeline]
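To make the `dependsOn` semantics concrete, the following TypeScript sketch expands a pipeline into per-package task dependencies. It is an illustration only, not Turborepo's or Lage's actual implementation: `Pipeline`, `PackageDeps`, and `expandPipeline` are hypothetical names, and the sketch relies solely on the rule stated in the turbo.json comments, namely that `^build` refers to the build task of a package's dependencies while a plain `build` refers to the package's own task.

```typescript
// Hypothetical shapes for the pipeline config and the workspace.
type Pipeline = Record<string, { dependsOn?: string[] }>;
type PackageDeps = Record<string, string[]>; // package -> workspace dependencies

// Expand a pipeline into a task graph keyed by "package#task".
function expandPipeline(
  pipeline: Pipeline,
  packages: PackageDeps
): Record<string, string[]> {
  const taskGraph: Record<string, string[]> = {};
  for (const pkg of Object.keys(packages)) {
    for (const task of Object.keys(pipeline)) {
      const deps: string[] = [];
      for (const entry of pipeline[task].dependsOn ?? []) {
        if (entry.startsWith("^")) {
          // "^build": the named task in every dependency of this package.
          for (const dep of packages[pkg]) deps.push(`${dep}#${entry.slice(1)}`);
        } else {
          // "build": another task of this same package.
          deps.push(`${pkg}#${entry}`);
        }
      }
      taskGraph[`${pkg}#${task}`] = deps;
    }
  }
  return taskGraph;
}

const pipeline: Pipeline = {
  build: { dependsOn: ["^build"] },
  test: { dependsOn: ["build"] },
  lint: {},
};
const packages: PackageDeps = { app1: ["package1"], package1: [] };
const taskGraph = expandPipeline(pipeline, packages);

console.log(taskGraph["app1#build"]); // waits for package1#build
console.log(taskGraph["app1#test"]); // waits for app1#build
console.log(taskGraph["app1#lint"]); // no dependencies, can start at any time
```

Because lint tasks end up with no dependencies, a scheduler that works on this combined graph can start them while builds are still running, which is where the overlap in the image comes from.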

Rush also had a discussion of a related design on March 10, 2020, and supported a similar feature at the end of 2021. For the specific PR, see [[rush] Add support for phased commands. #3113](https://github.com/microsoft/...)

Cloud cache: reuse the cache across multiple environments

Distributed computation caching

Rush has an incremental build feature that lets rush build skip projects whose input files have not changed since the last build. Combined with a third-party storage service, this makes it possible to reuse the cache across multiple environments.

Rush introduced a plugin mechanism in version 5.57.0, which further enabled third-party remote cache support (previously only Azure and Amazon were supported) and gave developers the ability to build caching solutions on top of internal enterprise services.

In day-to-day development, local development, CI, and SCM can all benefit from it.

As mentioned above, building the modified project together with its upstream and downstream projects in CI helps, to some extent, guarantee the quality of a Merge Request.

[Image: Build changed projects 1]

As shown in the image above, suppose the code of package0 has been modified. To make sure its upstream and downstream builds are not affected, the Build Changed Projects stage in CI runs the following command:

$ rush build --to package0 --from package0

Based on git diff, the projects whose source files changed are selected. Here that is package0.

By definition, package0 and its upstream project app1 are included in the build. Because app1 needs to be built, its prerequisite dependencies package1 through package5 also need to be built. Yet these 5 packages have no dependency relationship with package0 and have not changed. They are built only to prepare for building app1.

If the dependency relationships get more complex, for example when a base package is referenced by multiple applications, there will be many more preparatory builds like package1-package5, which makes this CI stage very slow.

Actual number of projects built = number of downstream projects of the changed project + number of upstream projects of the changed project + number of prerequisite dependencies of the changed project's downstream projects + number of prerequisite dependencies of the changed project's upstream projects

[Image: Build changed projects 2]

Because the 5 projects package1-package5 have no direct or indirect dependency on package0, and their input files have not changed, they can hit the cache (if one exists) and skip the build.

This reduces the build scope from 7 projects to 2.

Actual number of projects built = number of downstream projects of the changed project + number of upstream projects of the changed project

How is a cache hit determined?

Detecting affected projects/packages

In the cloud, the compressed cache archive of each project's build output is mapped to a cacheId computed from the project's input files. If the input files have not changed, the computed cacheId does not change either (it is a content hash), so the corresponding cloud cache is hit.

The input files include the following:

  1. The project's source files
  2. The project's NPM dependencies
  3. The cacheIds of the other internal Monorepo projects that the project depends on

If you are interested in the implementation, see @rushstack/package-deps-hash.
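As a rough illustration of how the three kinds of input could be folded into one cacheId, consider the TypeScript sketch below. It does not reflect how @rushstack/package-deps-hash is implemented: `ProjectInputs` and `computeCacheId` are hypothetical, and the small FNV-1a hash (applied to UTF-16 code units) is only a stand-in that keeps the example dependency-free, where a production tool would typically use a cryptographic content hash.

```typescript
// Hypothetical description of one project's cache inputs.
interface ProjectInputs {
  sourceFiles: Record<string, string>; // file path -> file content
  npmDependencies: Record<string, string>; // package name -> resolved version
  internalDependencies: string[]; // workspace projects it depends on
}

// FNV-1a (32-bit): a stand-in for a real content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Serialize a record with sorted keys so the result ignores key order.
function stable(record: Record<string, string>): string {
  return Object.keys(record)
    .sort()
    .map((key) => `${key}=${record[key]}`)
    .join("\n");
}

// cacheId = hash(source files + NPM dependencies + cacheIds of internal dependencies)
function computeCacheId(
  project: string,
  all: Record<string, ProjectInputs>,
  memo: Record<string, string> = {}
): string {
  if (memo[project] !== undefined) return memo[project];
  const inputs = all[project];
  const upstreamIds = inputs.internalDependencies
    .slice()
    .sort()
    .map((dep) => `${dep}:${computeCacheId(dep, all, memo)}`);
  const id = fnv1a(
    [stable(inputs.sourceFiles), stable(inputs.npmDependencies), upstreamIds.join("\n")].join("\n--\n")
  );
  memo[project] = id;
  return id;
}

const projects: Record<string, ProjectInputs> = {
  package0: { sourceFiles: { "index.ts": "export const a = 1;" }, npmDependencies: {}, internalDependencies: [] },
  package1: { sourceFiles: { "index.ts": "export const b = 2;" }, npmDependencies: {}, internalDependencies: [] },
  app1: { sourceFiles: { "main.ts": "start();" }, npmDependencies: { react: "17.0.2" }, internalDependencies: ["package0", "package1"] },
};

const package0Before = computeCacheId("package0", projects);
const package1Before = computeCacheId("package1", projects);
projects.package0.sourceFiles["index.ts"] = "export const a = 2;"; // modify package0
const package0After = computeCacheId("package0", projects);
const package1After = computeCacheId("package1", projects);

console.log(package1Before === package1After); // package1 is untouched, so its cacheId is stable
console.log(package0Before === package0After); // package0 changed, so its cacheId changes
console.log(computeCacheId("app1", projects)); // derived from package0's new cacheId as well
```

Feeding the cacheIds of internal dependencies into the hash is what lets a change in package0 invalidate app1 while leaving package1 through package5 cacheable.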

Conclusion

While writing this article, the author was also reminded of the three levers for faster builds that @sorrycc described in the GMTC talk "Systematic Ideas for Accelerating Front-End Builds":

  1. Deferred processing: request-based on-demand compilation, deferred sourcemap compilation
  2. Caching: Vite Optimize, Webpack 5 persistent cache, Babel cache
  3. Native code: SWC, ESBuild

For a task orchestration tool, the advantage of native code is not obvious (although Turborepo is written in Go, the author of Lage believes that at the current scale the efficiency bottleneck of task orchestration is not the orchestration tool itself), but deferred processing and caching apply in much the same way.

Finally, the concise and pragmatic subtitle of the Lage website makes a fitting close to this article's theme of task orchestration:

Run all your npm scripts in topological order incrementally with cloud cache - @microsoft/lage

In other words: with a cloud cache, incrementally run all your npm scripts in topological order.


Copyright notice
This article was written by [Haiqiu]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202162335349353.html