Systematically inferring I/O performance variability by examining repetitive job behavior E Costa, T Patel, B Schwaller, JM Brandt, D Tiwari Proceedings of the International Conference for High Performance Computing …, 2021 | 15 | 2021 |
Proctor: A semi-supervised performance anomaly diagnosis framework for production HPC systems B Aksar, Y Zhang, E Ates, B Schwaller, O Aaziz, VJ Leung, J Brandt, ... High Performance Computing: 36th International Conference, ISC High …, 2021 | 15 | 2021 |
Evaluating the Marvell ThunderX2 server processor for HPC workloads SD Hammond, C Hughes, MJ Levenhagen, CT Vaughan, AJ Younge, ... 2019 International Conference on High Performance Computing & Simulation …, 2019 | 14 | 2019 |
Investigating TI KeyStone II and quad-core ARM Cortex-A53 architectures for on-board space processing B Schwaller, B Ramesh, AD George 2017 IEEE High Performance Extreme Computing Conference (HPEC), 1-7, 2017 | 13 | 2017 |
E2EWatch: an end-to-end anomaly diagnosis framework for production HPC systems B Aksar, B Schwaller, O Aaziz, VJ Leung, J Brandt, M Egele, AK Coskun Euro-Par 2021: Parallel Processing: 27th International Conference on …, 2021 | 12 | 2021 |
HPC system data pipeline to enable meaningful insights through analysis-driven visualizations B Schwaller, N Tucker, T Tucker, B Allan, J Brandt 2020 IEEE International Conference on Cluster Computing (CLUSTER), 433-441, 2020 | 11 | 2020 |
Emulation-based performance studies on the HPSC space processor B Schwaller, S Holtzman, AD George 2019 IEEE Aerospace Conference, 1-11, 2019 | 7 | 2019 |
Albadross: Active learning based anomaly diagnosis for production HPC systems B Aksar, E Sencan, B Schwaller, O Aaziz, VJ Leung, J Brandt, B Kulis, ... 2022 IEEE International Conference on Cluster Computing (CLUSTER), 369-380, 2022 | 5 | 2022 |
Integrated system and application continuous performance monitoring and analysis capability O Aaziz, B Allan, J Brandt, J Cook, K Devine, J Elliott, A Gentile, ... Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2021 | 4 | 2021 |
Prodigy: Towards unsupervised anomaly detection in production HPC systems B Aksar, E Sencan, B Schwaller, O Aaziz, VJ Leung, J Brandt, B Kulis, ... Proceedings of the International Conference for High Performance Computing …, 2023 | 2 | 2023 |
Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation Y Zhang, B Aksar, O Aaziz, B Schwaller, J Brandt, V Leung, M Egele, ... 2021 IEEE High Performance Extreme Computing Conference (HPEC), 1-7, 2021 | 2 | 2021 |
Lessons From Examining Repetitive Job Behavior and I/O Performance Variability on a Production HPC System E Costa, T Patel, B Schwaller, J Brandt, D Tiwari Sandia National Lab.(SNL-NM), Albuquerque, NM (United States); Sandia …, 2021 | 2 | 2021 |
AD for Machine Learning Approach to Understanding HPC Application Performance Variation Poster. B Aksar, B Schwaller, OR Aaziz, E Ates, JM Brandt, A Coskun, M Egele, ... Sandia National Lab.(SNL-NM), Albuquerque, NM (United States); Sandia …, 2019 | 2 | 2019 |
Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning B Aksar, E Sencan, B Schwaller, O Aaziz, VJ Leung, J Brandt, B Kulis, ... IEEE Transactions on Parallel and Distributed Systems, 2024 | 1 | 2024 |
Towards Practical Machine Learning Frameworks for Performance Diagnostics in Supercomputers B Aksar, E Sencan, B Schwaller, VJ Leung, J Brandt, B Kulis, M Egele, ... Proceedings of the First Workshop on AI for Systems, 1-6, 2023 | 1 | 2023 |
Integrated system and application continuous performance monitoring and analysis capability (final) B Schwaller Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2021 | 1 | 2021 |
A Machine Learning Approach to Understanding HPC Application Performance Variation. B Schwaller, B Aksar, OR Aaziz, E Ates, JM Brandt, A Coskun, M Egele, ... Sandia National Lab.(SNL-NM), Albuquerque, NM (United States); Sandia …, 2019 | 1 | 2019 |
Standardized Environment for Monitoring Heterogeneous Architectures C Brown, B Schwaller, N Gauntt, B Allan, K Davis 2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-5, 2019 | 1 | 2019 |
Investigating, optimizing, and emulating candidate architectures for on-board space processing B Schwaller University of Pittsburgh, 2018 | 1 | 2018 |
LDMS Darshan Connector: For Run Time Diagnosis of HPC Application I/O Performance S Walton, O Aaziz, ALV Solórzano, B Schwaller 2022 IEEE International Conference on Cluster Computing (CLUSTER), 626-634, 2022 | | 2022 |