NVidia Meeting 2015-3-4
Agenda
- Weds, March 4th, 1-5pm: VTK-m design review (Ken Moreland)
- Thurs, March 5th, 8am-noon: updates from NVIDIA
Design review
Issues raised in design review
How to handle multiple devices with one host?
We discussed the VTK-m strategy for supporting multiple devices from one host (i.e., a single node of Summit). Options presented were:
- one MPI task for each device (i.e., multiple MPI tasks per node); see the rank-to-device binding sketch after this list
  - minus: may be lots of MPI tasks
  - minus: may be incongruent with the sim code's usage of MPI
  - minus: hard boundaries between devices
  - plus: easy to implement
- one MPI task per node, with (for example) threading to manage access to multiple devices; see the per-device threading sketch after this list
  - plus: fewer MPI tasks
  - plus: more likely to be congruent with the sim code's usage of MPI
  - minus: hard boundaries between devices (??)
  - plus/minus: implementation may be easier (depends on details)
- one MPI task per node, devices are treated as one giant device
  - plus: fewer MPI tasks
  - minus: could lend itself to inefficient patterns (reaching across device memories)
- one MPI task per node, devices are knowledgeable of other devices and can coordinate with each other
  - plus: fewer MPI tasks
  - plus: more likely to be congruent with the sim code's usage of MPI
  - plus: no boundaries between devices
  - minus: likely a large implementation effort
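
For the first option (one MPI task per device), the usual pattern is to bind each rank to one GPU based on its node-local rank. The following is a minimal sketch of that binding only, not anything from VTK-m itself; it assumes an MPI-3 library and the CUDA runtime, and that the job launcher starts one rank per GPU on each node.

```cpp
// Minimal sketch: bind each MPI rank to one GPU using its node-local rank.
// Assumes MPI-3 (for MPI_Comm_split_type) and the CUDA runtime.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  // Determine the node-local rank via a shared-memory communicator split.
  MPI_Comm nodeComm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodeComm);
  int localRank;
  MPI_Comm_rank(nodeComm, &localRank);

  // Bind this rank to one device. The "hard boundary between devices"
  // listed as a minus above comes from each rank only ever touching
  // the device it selects here.
  int deviceCount;
  cudaGetDeviceCount(&deviceCount);
  cudaSetDevice(localRank % deviceCount);

  int device;
  cudaGetDevice(&device);
  std::printf("local rank %d bound to device %d of %d\n",
              localRank, device, deviceCount);

  MPI_Comm_free(&nodeComm);
  MPI_Finalize();
  return 0;
}
```

Launching this with one rank per GPU gives every device exactly one owning rank, which is what makes this option easy to implement but also what creates the hard device boundaries noted above.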
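For the second option (one MPI task per node with threading), one plausible shape is a single rank that spawns one host thread per GPU, with each thread selecting its own device. This is a sketch under that assumption; the workOnDevice helper is purely illustrative and stands in for whatever per-device work VTK-m would schedule.

```cpp
// Minimal sketch: one process (one MPI rank per node), one host thread per GPU.
// Assumes the CUDA runtime; cudaSetDevice is per-thread, so each thread
// owns its own device context.
#include <cuda_runtime.h>
#include <thread>
#include <vector>
#include <cstdio>

// Illustrative stand-in for the per-device share of the work.
void workOnDevice(int device)
{
  cudaSetDevice(device);
  // ... launch kernels / per-device work for this portion of the data ...
  std::printf("thread driving device %d\n", device);
}

int main()
{
  int deviceCount;
  cudaGetDeviceCount(&deviceCount);

  // One worker thread per visible device.
  std::vector<std::thread> workers;
  for (int d = 0; d < deviceCount; ++d)
  {
    workers.emplace_back(workOnDevice, d);
  }
  for (auto& t : workers)
  {
    t.join();
  }
  return 0;
}
```

Whether this ends up easier than the rank-per-device approach depends on how the data is partitioned across the threads, which matches the "depends on details" caveat in the list above.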