NVidia Meeting 2015-3-4
Revision as of 15:58, 4 March 2015
Agenda
- Weds, March 4th, 1-5pm: VTK-m design review (Ken Moreland)
- Thurs, March 5th, 8am-noon: updates from NVIDIA
Design review
Issues raised in design review
How to handle multiple devices with one host?
We discussed the VTK-m strategy for supporting multiple devices from one host (e.g., a single node of Summit). Options presented were:
- one MPI task for each device (i.e., multiple MPI tasks per node)
  - minus: may lead to a large number of MPI tasks
  - minus: may be incongruent with the simulation code's usage of MPI
  - minus: hard boundaries between devices
  - plus: easy to implement
- one MPI task per node, with (for example) threading to manage access to multiple devices
  - plus: fewer MPI tasks
  - plus: more likely to be congruent with the simulation code's usage of MPI
  - minus: hard boundaries between devices (??)
  - plus/minus: implementation easier? (depends on details)
- one MPI task per node, where devices are aware of one another and can coordinate among themselves
  - plus: fewer MPI tasks
  - plus: more likely to be congruent with the simulation code's usage of MPI
  - plus: no boundaries between devices
  - minus: big implementation effort, right?
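The first two options can be contrasted with a small sketch. This is only an illustration under stated assumptions, not VTK-m code and not anything decided at the meeting: the function names (`device_for_rank`, `split_blocks`, `run_on_devices`) are hypothetical, ranks are assumed to be placed on nodes in contiguous blocks, and the `work` callback stands in for whatever would actually launch work on a device.

```python
from concurrent.futures import ThreadPoolExecutor


def device_for_rank(rank, devices_per_node):
    """Option 1: one MPI task per device.

    Assumes ranks are placed on nodes in contiguous blocks of
    `devices_per_node` (a common layout, but not guaranteed).
    Returns (node index, device index on that node).
    """
    return divmod(rank, devices_per_node)


def split_blocks(blocks, num_devices):
    """Deal data blocks round-robin across the devices of one node."""
    return [blocks[d::num_devices] for d in range(num_devices)]


def run_on_devices(blocks, num_devices, work):
    """Option 2: one MPI task per node, one host thread per device.

    `work(device, block)` is a hypothetical stand-in for launching
    work on a device; each thread drives exactly one device.
    """
    shares = split_blocks(blocks, num_devices)

    def drive(device):
        # Each device only sees its own share: this is the
        # "hard boundary between devices" noted above.
        return [work(device, block) for block in shares[device]]

    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # pool.map returns results in device order.
        return list(pool.map(drive, range(num_devices)))
```

In both cases each device works on a fixed share of the data, which is why hard boundaries between devices show up as a concern for the first two options. The third option would additionally require devices to exchange data directly, which this sketch does not attempt.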