NVidia Meeting 2015-3-4

Revision as of 15:58, 4 March 2015

Agenda

  • Weds, March 4th, 1-5pm: VTK-m design review (Ken Moreland)
  • Thurs, March 5th, 8am-noon: updates from NVIDIA

Design review

Issues raised in design review

How to handle multiple devices with one host?

We discussed the VTK-m strategy for supporting multiple devices from one host (i.e., a single node of Summit). Options presented were:

  • one MPI task for each device (i.e., multiple MPI tasks per node)
    • minus: may result in many MPI tasks
    • minus: may be incongruent with sim code's usage of MPI
    • minus: hard boundaries between devices
    • plus: easy to implement
  • one MPI task per node, with (for example) threading to manage access to multiple devices (see the sketch after this list)
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • minus: hard boundaries between devices (??)
    • plus/minus: implementation easier? (depends on details)
  • one MPI task per node, where devices are knowledgeable of other devices and can coordinate with each other
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • plus: no boundaries between devices
    • minus: likely a significant implementation effort
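
As a point of reference for the second option, the sketch below shows one way a single MPI rank could drive every GPU on its node by giving each device its own host thread. This is an illustrative sketch, not VTK-m code: it assumes the CUDA runtime API and MPI, and the function name processOnDevice is hypothetical.

  // One MPI rank per node; one host thread per visible GPU.
  // Illustrative sketch only -- assumes the CUDA runtime API and MPI.
  #include <mpi.h>
  #include <cuda_runtime.h>
  #include <thread>
  #include <vector>
  #include <cstdio>

  // Hypothetical per-device work function; real code would run VTK-m
  // algorithms (or other device work) here.
  void processOnDevice(int device, int rank)
  {
      cudaSetDevice(device);  // bind this host thread to one GPU
      // ... allocate device memory, launch kernels, etc. ...
      std::printf("rank %d finished work on device %d\n", rank, device);
  }

  int main(int argc, char* argv[])
  {
      MPI_Init(&argc, &argv);
      int rank = 0;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int deviceCount = 0;
      cudaGetDeviceCount(&deviceCount);  // all GPUs visible to this rank's node

      // One host thread per device; each thread owns exactly one GPU.
      std::vector<std::thread> workers;
      for (int device = 0; device < deviceCount; ++device)
      {
          workers.emplace_back(processOnDevice, device, rank);
      }
      for (auto& worker : workers)
      {
          worker.join();
      }

      MPI_Finalize();
      return 0;
  }

Note that this sketch still keeps hard boundaries between devices (each thread owns exactly one GPU), which is why the third option, where devices coordinate with each other directly, would require substantially more implementation work.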

What use case are we optimizing for?