NVidia Meeting 2015-3-4

From XVis

Revision as of 16:10, 4 March 2015

Agenda

  • Weds, March 4th, 1-5pm: VTK-m design review (Ken Moreland)
  • Thurs, March 5th, 8am-noon: updates from NVIDIA

Design review

Issues raised in design review

How to handle multiple devices with one host?

We discussed the VTK-m strategy for supporting multiple devices from one host (i.e., a single node of Summit). Options presented were:

  • one MPI task for each device (i.e., multiple MPI tasks per node)
    • minus: may be lots of MPI tasks
    • minus: may be incongruent with sim code's usage of MPI
    • minus: hard boundaries between devices
    • plus: easy to implement
  • one MPI task per node, with (for example) threading to manage access to multiple devices
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • minus: hard boundaries between devices (??)
    • plus/minus: implementation easier? (depends on details)
  • one MPI task per node, devices are treated as one giant device
    • plus: fewer MPI tasks
    • minus: could lend itself to inefficient patterns (reaching across device memories)
  • one MPI task per node, devices are aware of each other and can coordinate among themselves
    • plus: fewer MPI tasks
    • plus: more likely to be congruent with sim code's usage of MPI
    • plus: no boundaries between devices
    • minus: likely a large implementation effort (not confirmed)
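
To make the first two options concrete, here is a minimal sketch in plain Python rather than VTK-m code. The function names, the node-by-node rank layout, and the round-robin assignment of blocks to devices are all assumptions made for illustration; the notes do not specify any of them. In a real code the node-local rank would typically come from the MPI runtime (for example via `MPI_Comm_split_type`) and the device would be selected with something like `cudaSetDevice`.

```python
from concurrent.futures import ThreadPoolExecutor


def device_for_rank(global_rank, ranks_per_node, devices_per_node):
    """Option 1: one MPI task per device.

    Hypothetical helper. Assumes ranks are laid out node by node, so the
    node-local rank is the global rank modulo the ranks per node.
    """
    local_rank = global_rank % ranks_per_node
    return local_rank % devices_per_node


def run_on_devices(blocks, devices_per_node, work):
    """Option 2: one MPI task per node, one host thread per device.

    Hypothetical helper. Blocks are dealt round-robin to devices, and
    `work(device_id, block)` stands in for launching work on that device.
    """
    def drive(device_id):
        # Each thread handles only the blocks assigned to its own device,
        # which is the "hard boundaries between devices" case.
        return [work(device_id, b) for b in blocks[device_id::devices_per_node]]

    with ThreadPoolExecutor(max_workers=devices_per_node) as pool:
        return list(pool.map(drive, range(devices_per_node)))
```

In both helpers each device only ever sees its own blocks. The last two options in the list differ precisely in removing that boundary, which this sketch does not attempt to show.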

What use case are we optimizing for?

We discussed which use case to optimize for. Two candidates came up, and it was observed that optimizing for one may be in tension with optimizing for the other:

  • All data is located on the device, and data is never (or only rarely) transferred from device to host.
    • This would be consistent with in situ usage.
  • Data is located on the host (which has large memory) and is streamed to the device (which has small memory).
    • This is arguably consistent with post-processing usage.
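
The streaming use case can be sketched as follows, again in plain Python rather than VTK-m code. The names `chunks` and `process_streamed`, the `device_capacity` parameter, and the per-chunk `kernel` callback are hypothetical; the sketch only illustrates that streaming costs one host-to-device transfer per chunk, whereas the in situ case assumes the data is already resident on the device.

```python
def chunks(host_data, device_capacity):
    """Yield slices of host_data that each fit in the (small) device memory."""
    if device_capacity <= 0:
        raise ValueError("device_capacity must be positive")
    for start in range(0, len(host_data), device_capacity):
        yield host_data[start:start + device_capacity]


def process_streamed(host_data, device_capacity, kernel):
    """Process host-resident data one device-sized chunk at a time.

    Returns the combined results and the number of host-to-device
    transfers, which is the cost the in situ use case tries to avoid.
    """
    results = []
    transfers = 0
    for chunk in chunks(host_data, device_capacity):
        transfers += 1  # one host-to-device copy per chunk
        results.extend(kernel(chunk))
    return results, transfers
```

For example, streaming 10 elements through a device that holds 4 at a time takes 3 transfers. A design tuned to make those transfers cheap may look quite different from one that assumes they almost never happen.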