Title | Authors | Venue | Cited by | Year |
Why It’s Worth the Hassle: The Value of In-Situ Studies When Designing Ubicomp (Nominated for the Best Paper Award) | Y Rogers, K Connelly, L Tedesco, W Hazlewood, A Kurtz, RE Hall, ... | UbiComp 2007: Ubiquitous Computing: 9th International Conference, UbiComp …, 2007 | 295 | 2007 |
The design and implementation of checkpoint/restart process fault tolerance for Open MPI | J Hursey, JM Squyres, TI Mattox, A Lumsdaine | 2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007 | 268 | 2007 |
An evaluation of user-level failure mitigation support in MPI | W Bland, A Bouteiller, T Herault, J Hursey, G Bosilca, JJ Dongarra | Recent Advances in the Message Passing Interface: 19th European MPI Users …, 2012 | 143 | 2012 |
PMIx: Process management for exascale environments | RH Castain, J Hursey, A Bouteiller, D Solt | Parallel Computing 79, 9-29, 2018 | 102 | 2018 |
Interconnect agnostic checkpoint/restart in Open MPI | J Hursey, TI Mattox, A Lumsdaine | Proceedings of the 18th ACM International Symposium on High Performance …, 2009 | 86 | 2009 |
Run-through stabilization: An MPI proposal for process fault tolerance | J Hursey, RL Graham, G Bronevetsky, D Buntinas, H Pritchard, DG Solt | Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011 | 71 | 2011 |
An evaluation of user-level failure mitigation support in MPI | W Bland, A Bouteiller, T Herault, J Hursey, G Bosilca, JJ Dongarra | Computing 95, 1171-1184, 2013 | 56 | 2013 |
Coordinated checkpoint/restart process fault tolerance for MPI applications on HPC systems | J Hursey | Indiana University, 2010 | 45 | 2010 |
A log-scaling fault tolerant agreement algorithm for a fault tolerant MPI | J Hursey, T Naughton, G Vallee, RL Graham | Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011 | 43 | 2011 |
Locality-aware parallel process mapping for multi-core HPC systems | J Hursey, JM Squyres, T Dontje | 2011 IEEE International Conference on Cluster Computing, 527-531, 2011 | 38 | 2011 |
A checkpoint and restart service specification for Open MPI | J Hursey, JM Squyres, A Lumsdaine | Indiana University, Computer Science Department, Technical Report, 2006 | 33 | 2006 |
Netloc: Towards a comprehensive view of the HPC system topology | B Goglin, J Hursey, JM Squyres | 2014 43rd International Conference on Parallel Processing Workshops, 216-225, 2014 | 31 | 2014 |
Building a fault tolerant MPI application: A ring communication example | J Hursey, RL Graham | 2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011 | 29 | 2011 |
An extensible framework for distributed testing of MPI implementations | J Hursey, E Mallove, JM Squyres, A Lumsdaine | European Parallel Virtual Machine/Message Passing Interface Users’ Group …, 2007 | 21 | 2007 |
A performance analysis and optimization of PMIx-based HPC software stacks | AY Polyakov, BI Karasev, J Hursey, J Ladd, M Brinskii, E Shipunova | Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019 | 19 | 2019 |
A composable runtime recovery policy framework supporting resilient HPC applications | J Hursey, A Lumsdaine | Indiana University, Bloomington, Indiana, USA, Tech. Rep. TR686, 2010 | 18 | 2010 |
Checkpoint/restart-enabled parallel debugging | J Hursey, C January, M O’Connor, PH Hargrove, D Lecomber, ... | Recent Advances in the Message Passing Interface: 17th European MPI Users …, 2010 | 18 | 2010 |
Design considerations for building and running containerized MPI applications | J Hursey | 2020 2nd International Workshop on Containers and New Orchestration …, 2020 | 17 | 2020 |
Preserving collective performance across process failure for a fault tolerant MPI | J Hursey, RL Graham | 2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011 | 17 | 2011 |
Advancing application process affinity experimentation: Open MPI's LAMA-based affinity interface | J Hursey, JM Squyres | Proceedings of the 20th European MPI Users' Group Meeting, 163-168, 2013 | 16 | 2013 |