Implementing Multi-GPU Distributed Training with Horovod in Amazon SageMaker Pipe Mode
2020-11-07 20:15:00 [InfoQ]
Today, a variety of techniques let us train deep learning models with only a small amount of data. These include transfer learning for image classification, few-shot learning and even one-shot learning, and fine-tuning language models from pretrained BERT or GPT-2 models. However, some use cases still require a large amount of training data. For example, if your images are completely different from those in the ImageNet dataset, or if your language corpus is domain-specific rather than general-purpose, transfer learning will struggle to deliver the model performance you need. As a deep learning researcher, you may also need to try out new ideas or approaches from scratch. In these situations we have to train large deep learning models on large datasets, and without an efficient training approach the whole process can take days, weeks, or even months.
In this article, we look at how to run multi-GPU training on a single Amazon SageMaker instance. We then discuss how to implement more efficient multi-GPU and multi-node distributed training on Amazon SageMaker.
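The following minimal pure-Python sketch makes multi-GPU and multi-node data parallelism more concrete. It does not call Horovod or the SageMaker SDK. It only simulates the bookkeeping that a Horovod-style data-parallel job performs:

- one worker process per GPU;
- a disjoint data shard per worker;
- a learning rate scaled by the number of workers, which is the linear-scaling heuristic recommended in Horovod's documentation;
- gradients averaged across workers, which is the result an averaging allreduce produces.

The names `ClusterLayout`, `shard_for_rank`, `scaled_learning_rate`, and `allreduce_average` are hypothetical. The layout of two instances with eight GPUs each is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ClusterLayout:
    """Hypothetical description of a training cluster: N instances with G GPUs each."""

    instance_count: int     # number of training instances (nodes)
    gpus_per_instance: int  # GPUs per instance, e.g. 8 on ml.p3.16xlarge

    @property
    def world_size(self) -> int:
        # Assumption: one worker process per GPU across all instances.
        return self.instance_count * self.gpus_per_instance


def shard_for_rank(samples: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Return the strided slice of the dataset that worker `rank` trains on.

    Shards are disjoint and together cover every sample exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(samples[rank::world_size])


def scaled_learning_rate(base_lr: float, world_size: int) -> float:
    """Linear scaling: multiply the single-GPU learning rate by the worker count."""
    return base_lr * world_size


def allreduce_average(per_worker_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of all workers' gradients, i.e. what an averaging allreduce yields."""
    n = len(per_worker_grads)
    return [sum(values) / n for values in zip(*per_worker_grads)]


if __name__ == "__main__":
    # Hypothetical layout: 2 instances x 8 GPUs.
    layout = ClusterLayout(instance_count=2, gpus_per_instance=8)
    per_gpu_batch = 64
    print("workers:", layout.world_size)                       # 16
    print("global batch:", per_gpu_batch * layout.world_size)  # 1024
    print("learning rate:", scaled_learning_rate(0.001, layout.world_size))
    print("rank 1 shard:", shard_for_rank(range(40), 1, layout.world_size))
    print("averaged grads:", allreduce_average([[1.0, 2.0], [3.0, 4.0]]))
```

In an actual Horovod training script, the worker count and rank come from `hvd.size()` and `hvd.rank()` after `hvd.init()`. Wrapping the optimizer in `hvd.DistributedOptimizer` performs the gradient averaging, so you would not compute these by hand.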
Original article: https://www.infoq.cn/article/0867pYEmzviBfvZxW37k. Reproduction without the author's permission is prohibited.
Copyright notice
This article was created by [InfoQ]. Please include a link to the original when reposting. Thank you.