2018 03 21
PR:
- C++ Multi GPU Executor: https://github.com/PaddlePaddle/Paddle/pull/9035
Review
- [Speed] Refine parallel_do_grad: https://github.com/PaddlePaddle/Paddle/pull/9299
- Shrink batch_norm_grad's inputs: https://github.com/PaddlePaddle/Paddle/pull/9072
-
API doc problems (nn and ops)
-
PR review
- Reduce memory copies during communication between trainer and pserver
-
EDL
- DeepSpeech2 on Sys Kubernetes Cluster
- Distribute with CPU and EDL, done
- Distribute with GPU, debugging
-
GPUDirect RDMA
- Nvidia/gdrcopy research
- module install and deploy
- test sample structuring
-
PR [BUG FIX]:
-
PR [DOC]
-
Review
-
ISSUE [USER]
- Simple baseline parallel executor and debugging with YangYang
- Reviews:
- Distributed lookup table
- Distributed training optimizations
- EDL
- 2018 milestone: https://github.com/PaddlePaddle/Paddle/issues/9108#issuecomment-373481203
- CI
- [Speed] scope.Findvar may slow down all the operators: https://github.com/PaddlePaddle/Paddle/issues/9232#event-1530105632
- FP16
- Help AMD developers on integrating AMD GPU build and test onto CI (WIP)
- fix AttributeError: 'module' object has no attribute 'framework_pb2': https://github.com/PaddlePaddle/Paddle/pull/9128
- [WIP]fuse batch norm
- code review:
- MKLDNN:
- Implementation of MKLDNN LRN: https://github.com/PaddlePaddle/Paddle/pull/9123
- MKLDNN Relu Tanh Sqrt Abs activations added: https://github.com/PaddlePaddle/Paddle/pull/9081
- [Merge] Softmax MKLDNN FLUID operator: https://github.com/PaddlePaddle/Paddle/pull/9214
- [AMD] Demonstration of cmake refine for HIP support: https://github.com/PaddlePaddle/Paddle/pull/9165
- doc:
- add math_function to selected_rows_functor and softmax dependency list:
- MKLDNN:
Memory optimization:
| Model | no optimize | forward memory | reuse memory | release memory | recomputation and release memory |
|---|---|---|---|---|---|
| Resnet | 169791488 | 77500416 | 92602368 | 78020608 | 59027456 |
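
To make the Resnet row easier to compare, here is a minimal Python sketch that expresses each column as a fraction of the "no optimize" figure. The numbers are copied from the table; the notes do not state the unit, and the helper name `relative_to_baseline` is hypothetical, introduced only for illustration.

```python
# Memory figures copied from the Resnet row of the table
# (the unit is not stated in the notes, so none is assumed here).
resnet = {
    "no optimize": 169791488,
    "forward memory": 77500416,
    "reuse memory": 92602368,
    "release memory": 78020608,
    "recomputation and release memory": 59027456,
}


def relative_to_baseline(figures, baseline="no optimize"):
    """Return each figure as a fraction of the baseline figure."""
    base = figures[baseline]
    return {name: value / base for name, value in figures.items()}


ratios = relative_to_baseline(resnet)
for name, ratio in ratios.items():
    # Print each column as a percentage of the unoptimized figure.
    print(f"{name}: {ratio:.1%} of baseline")
```

On these figures, "recomputation and release memory" is the smallest column, at roughly a third of the unoptimized figure.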
- [Memory][WIP] Recompute policy in memory optimization
Code Review:
- [WIP] Turn on cmake flag WITH_DISTRIBUTE on CI
- [WIP] Parallel send gradients and execution of backward ops
- PR Review:
- [WIP] C++ ParallelExecutor
- Several enhancements
- FP16 inference
- Can successfully run vgg16 inference on CIFAR10 in float16 mode
- Add float16 support for cudnn conv2d: https://github.com/PaddlePaddle/Paddle/pull/9143
- Add float16 support for cast op: https://github.com/PaddlePaddle/Paddle/pull/9148
- Add float16 support for pool 2d operator: https://github.com/PaddlePaddle/Paddle/pull/9167
- Add float16 support to batch norm operator: https://github.com/PaddlePaddle/Paddle/pull/9176
- Add float16 support to dropout operator: https://github.com/PaddlePaddle/Paddle/pull/9223
- Add float16 support to Elementwise Add op: https://github.com/PaddlePaddle/Paddle/pull/9231
- Add float16 support to relu op: https://github.com/PaddlePaddle/Paddle/pull/9267
- Add float16 support to cudnn softmax kernel: https://github.com/PaddlePaddle/Paddle/pull/9269
- Review
- Update the RNN Search Model for multi-gpu.
- Refine the input of sequence_softmax in machine_translation.py.
- fluid dist train perf enhancements:
- pserver runs in parallel: https://github.com/PaddlePaddle/Paddle/pull/9154
- prepare before run https://github.com/PaddlePaddle/Paddle/pull/9287
- review: https://github.com/PaddlePaddle/Paddle/pull/9271
- Bug: https://github.com/PaddlePaddle/Paddle/issues/9284
- remote lookup table discussions: https://github.com/PaddlePaddle/Paddle/issues/9211
- [Merged] Enhance LoDResetOp: https://github.com/PaddlePaddle/Paddle/pull/9204
- [Merged] Enhance SequenceExpandOp: https://github.com/PaddlePaddle/Paddle/pull/9100
- [WIP] Evaluate BLEU for RNN Search model
- Survey for ONNX
https://github.com/PaddlePaddle/Paddle/pull/9296
- Multi-threaded C++ Reader:
- Multi-pass C++ Reader:
- Bug fixes:
- C++ Readers: https://github.com/PaddlePaddle/Paddle/pull/9178
- RecordIO: https://github.com/PaddlePaddle/Paddle/pull/9240
- [WIP] C++ Reader profile
- SSD on Fluid:
- Fix a critical bug in softmax_with_cross_entropy_op.
- MobileNet-SSD on single GPU:
- mAP: Fluid 71.78% with partial data augmentation vs. Caffe 72.7% with full data augmentation.
- Expose RMSProp optimizer.
- Parallel training for MobileNet-SSD.
- Temporarily fix a bug in the backward transpiler when using the parallel_do operator.
- Delete the detection_output_op, which had been split into several operators.
- Other:
- Fix bug in LRN operator.
- Always synchronize when copying GPU data from C++ to a NumPy array.
- Problem of BatchNorm in Fluid.
- [WIP] Profiling parallel_do op.
-
Merge model average optimizer
- https://github.com/PaddlePaddle/Paddle/pull/9082
- validation on ctc model: https://github.com/PaddlePaddle/Paddle/issues/9172
-
Add python wrapper for Adadelta optimizer
-
Add model average option for OCR CTC model
-
Add parallel option for OCR CTC training.
-
Review:
- NMT:
- Refine and fix inference program for Transformer (Merged).
- Refine the ReshapeOp enhancement.
- Evaluate Transformer on WMT16.
- (Dict Size 3000) BLEU vs. PyTorch: 28.57 vs. 28.10
- (Dict Size 10000) BLEU: 30.8
- Review:
-
PR
-
Issues
- Inference Framework
- Add relu cudnn kernel
- Add an argument in Executor.Run to allow users to choose whether to create and destroy variables every time
- Enable the test of not creating variables every time
- Add multi-thread inference example which shares the inference_program and parameters
- Review
- add MKL for fluid static and shared library: https://github.com/PaddlePaddle/Paddle/pull/8887
-
Fluid support for Abacus (discussed with @wuyi @yanxu @helin @wangyi @lidong)
-
Overall Plan:
- two ways: https://github.com/PaddlePaddle/Paddle/pull/9075
- Abacus kv-store migration by @LiDong, we provide technical support.
- Fluid will provide an open source implementation.
-
Fluid implementation:
- Add design doc for lookup remote table https://github.com/PaddlePaddle/Paddle/pull/9068
- Add table design doc https://github.com/PaddlePaddle/Paddle/pull/9210
- model asq_and_ubmq check: we only need to add a distributed lookup table.
- TODO: https://github.com/PaddlePaddle/Paddle/issues/9211
-
Others:
- Discuss the way to speed up the download of boost.
-
Review
- Fix a program copy regression. https://github.com/PaddlePaddle/Paddle/pull/9141
- Add float16 support to batch norm operator https://github.com/PaddlePaddle/Paddle/pull/9176
- Shrink batch_norm_grad's inputs https://github.com/PaddlePaddle/Paddle/pull/9299
- Refactor and wrap decoder for DeepAsr model
- Train model on aishell dataset (public, Mandarin, 178h)
- [WIP] Add design doc for ONNX converter
Code Review:
- https://github.com/PaddlePaddle/Paddle/pull/9008
- https://github.com/PaddlePaddle/models/pull/749
- https://github.com/PaddlePaddle/DeepSpeech/pull/185
- [Speed] accelerate 6x sequence expand/grad op, time cost per minibatch 0.9 -> 0.17
- [Speed] accelerate sequence pooling op, speed up the sequence average, sum, chip
- [NLP/lego] support sequence tagging model. fix bug in dropout
- [NLP/lego] support seq2seq model, fix bug in gpu mode
- [NLP/lego]
- Fix a regression, restore 10% on resnext
- Enable P2P memory copy, reduce 4.0 to 3.8 on K40
- Add memory caching to transformer, reduce 25% overhead
- Add multi-gpu to Transformer, 1.33x speedup on 4 devices
- Review some NLP codes, such as beam search, expand sequence
- Follow up with QA team
- Follow up with performance test machines
- Collect information for the risk of 5.1 Paddle Cloud Launch
-
Model CE
-
Conv Sequence to sequence
-
VisualDL
- PR
- [WIP] Add pinned memory
- Add more times close test
- Fix v2_pooling doc
- Review
Bug fix:
- Set WITH_DOC=${WITH_DOC:-OFF} in paddle/scripts/docker/build.sh and remove paddle/scripts/tools/build_doc.sh
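
The `${WITH_DOC:-OFF}` form in the item above is shell default-value expansion: the variable keeps its value when it is set and non-empty, and falls back to `OFF` otherwise. As a hedged illustration only (this is not code from build.sh, and the helper name `with_default` is hypothetical), the same rule can be sketched in Python:

```python
import os


def with_default(name, default, env=None):
    """Mimic shell ${NAME:-default}: use default when NAME is unset or empty."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    value = env.get(name)
    # An empty string counts as "unset" for the ':-' form, so test truthiness.
    return value if value else default


# Unset and empty both fall back to the default; an explicit value wins.
unset_case = with_default("WITH_DOC", "OFF", env={})
empty_case = with_default("WITH_DOC", "OFF", env={"WITH_DOC": ""})
explicit_case = with_default("WITH_DOC", "OFF", env={"WITH_DOC": "ON"})
```

Here documentation builds stay off unless the caller explicitly exports `WITH_DOC=ON` before running the script.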
Doc:
-
Build basic sphinx doctree for doc/fluid
-
Adjust some contents in write_docs_en.rst for Contribute Documentation
-
Add contents for manually building documentation (cn version)
- PR
- Fluid channels should match the semantics of Go Channels https://github.com/PaddlePaddle/Paddle/pull/9265
- Add default value of keyword argument to DocString in channel_send https://github.com/PaddlePaddle/Paddle/pull/9262
- Support copy in Fluid channels https://github.com/PaddlePaddle/Paddle/pull/9138
- Fix unused variable build error https://github.com/PaddlePaddle/Paddle/pull/9140
- Add modification to Channel to support Select OP https://github.com/PaddlePaddle/Paddle/pull/9084
- Issues
- Move all Concurrency operators to paddle/fluid/operators/concurrency https://github.com/PaddlePaddle/Paddle/issues/9086
- Channel Destroy should inform Select callback of destruction. https://github.com/PaddlePaddle/Paddle/issues/9087
- Reviews
- https://github.com/PaddlePaddle/Paddle/pull/9130#pullrequestreview-104339031
- https://github.com/PaddlePaddle/Paddle/pull/9132#pullrequestreview-104399634
- https://github.com/PaddlePaddle/Paddle/pull/9136#pullrequestreview-104399519
- https://github.com/PaddlePaddle/Paddle/pull/9147#pullrequestreview-105054759
- https://github.com/PaddlePaddle/Paddle/pull/9157#pullrequestreview-105427071
- https://github.com/PaddlePaddle/Paddle/pull/9215#pullrequestreview-105054058
- https://github.com/PaddlePaddle/Paddle/pull/9280#pullrequestreview-105991130
- https://github.com/PaddlePaddle/Paddle/pull/9286#pullrequestreview-105988048
-
PR
- Provide manual Save feature: https://github.com/PaddlePaddle/VisualDL/pull/332
- Adjust sync cycle: https://github.com/PaddlePaddle/VisualDL/pull/328
- Fix the issue where Scalar always misses the last record: https://github.com/PaddlePaddle/VisualDL/pull/331
-
Issue
Plan 0.0.3 release goals https://github.com/PaddlePaddle/VisualDL/projects/5
- In progress: Interactive ONNX Graphs
- Back from Beijing to Sunnyvale and rest for 1 day
- PR
- Fix issue of Paddle API documentation not updating on website (https://github.com/PaddlePaddle/PaddlePaddle.org/pull/443)
- Create select_op design document (https://github.com/PaddlePaddle/Paddle/pull/9139)
- Reviews
- Other
- Svail AWS account maintenance
- Concurrency reorganizing and subsequent build failure on thread debugging with Thuan - https://github.com/PaddlePaddle/Paddle/pull/9136
- New concurrency test - https://github.com/PaddlePaddle/Paddle/pull/9132
- Document planning and review: https://github.com/PaddlePaddle/Paddle/pull/9139
- Assist with VDL user study for audio component
(PTO for most of the week)
- Following up with TensorRT errors with engineers from Nvidia
- PR review: https://github.com/PaddlePaddle/Paddle/pull/9280