OMP task tracing via the OMPT interface
Adam Tuft (graduate MISCADA 2020/21) supervised by Tobias Weinzierl
I chose MISCADA for its fascinating combination of theoretical, practical, professional, and entrepreneurial topics. The flexibility offered by the various specialisations has allowed me to tailor my studies to my personal interests. I was motivated to gain exposure to cutting-edge research in scientific computing and data analysis.
I chose a dissertation topic in high-performance computing (HPC) because I wanted to develop novel software that would be directly applicable in many other domains of scientific computing research. A dissertation in HPC would also allow me to learn more about the myriad programming models and hardware architectures in development today. In this project I have already found that my work is of interest to multiple partners in industry and research, such as the Met Office.
The mission for my project (Otter) is to accelerate the development of efficient task-based parallel programs and empower HPC developers to exploit the high levels of parallelism available in modern hardware. My tool allows HPC developers to observe the structure and performance of their programs and informs how they may best deploy task-based parallelism in their code.
I had the exciting opportunity to present the results of my project in a poster competition at the CIUK2021 conference in Manchester, alongside competing in the 2021 Student Cluster Challenge. This conference was a great way to meet a wide variety of vendors and see the diversity of novel technologies coming to market.
Adam continued with a PhD in Computer Science after graduating from MISCADA. As a result of his dissertation project, he also became an Intel oneAPI Student Ambassador.
Machine Learning for a Silencer
The dissertation project with Lontra has shown how the different aspects of the course can be utilised in one project. The aim of this project was to optimise a silencer design using evolutionary algorithms. To achieve this, the existing simulation was first parallelised and a benchmark for the hyperparameters of the evolutionary algorithms was created. Because the benchmark used synthetic data, i.e. data with known results, it could be heavily optimised using techniques taught in the HPC module. Because each additional parameter increases the benchmarking time exponentially, machine learning could be employed to perform feature selection, thereby reducing the hyperparameters to the relevant ones. Working on a real-world project has shown how relevant all aspects of the course can be to solving real-world problems.
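The benchmarking idea can be illustrated with a minimal sketch. The project's actual algorithm, objective and hyperparameters are not given here, so everything below is an assumption for illustration: a simple elitist evolutionary search, a synthetic sphere function whose minimum of 0 is known, and two hypothetical hyperparameters (`pop_size` and `mutation_scale`). With k candidate values for each of n hyperparameters the grid holds k^n configurations, which is the exponential growth that motivates pruning irrelevant hyperparameters.

```python
import itertools
import random


def sphere(x):
    """Synthetic objective with a known minimum of 0 at the origin."""
    return sum(v * v for v in x)


def evolve(pop_size, mutation_scale, generations=30, dim=3, seed=0):
    """Run a simple elitist evolutionary search and return the best fitness found."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent produces one child by Gaussian mutation.
        children = [[v + rng.gauss(0.0, mutation_scale) for v in parent] for parent in population]
        # Plus-selection: keep the best pop_size individuals among parents and children.
        population = sorted(population + children, key=sphere)[:pop_size]
    return sphere(population[0])


def benchmark(grid):
    """Evaluate every hyperparameter combination in the grid (hypothetical helper)."""
    names = sorted(grid)
    results = {}
    for combo in itertools.product(*(grid[name] for name in names)):
        results[combo] = evolve(**dict(zip(names, combo)))
    return results


# Two hyperparameters with three values each give 3**2 = 9 configurations.
grid = {"pop_size": [4, 8, 16], "mutation_scale": [0.05, 0.2, 0.8]}
results = benchmark(grid)
best = min(results, key=results.get)
print(len(results), "configurations; best (mutation_scale, pop_size):", best)
```

Because the optimum of the synthetic objective is known, each configuration can be scored by how close it gets to 0, without running the expensive real simulation.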
Molecular Dynamics and Fluid Dynamics Simulations
Allen Drews (graduate MISCADA 2019/20)
The scientific computing aspects of MISCADA helped me decide to pursue a PhD in computational biophysics. I will start the research project “Multiscale Simulations of Droplet-Membrane Mutual Remodelling” with Halim Kusumaatmaja in Durham as part of the Molecular Sciences in Medicine (MoSMed) PhD cohort in October 2020. The project will revolve around the interactions of intracellular droplets with membranes and the consequent reshaping of their respective morphologies. These droplet-remodelling mechanisms have only been observed recently, and the ultimate goal of the project is to gain a much-needed theoretical understanding of them.
Amongst other things, the research will involve molecular dynamics simulations, parallel computing and fluid dynamics in the programming languages we worked with in MISCADA, namely Python, C and C++. I am sure that the finite-difference fluid dynamics computations underlying my dissertation project (A 2D Navier-Stokes Solver in Python, supervised by Anthony Yeates), together with the basics of MPI programming we learned in the High Performance Computing submodule, will ensure a smooth start to my PhD research.
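To give a flavour of the finite-difference approach mentioned above, here is a minimal sketch in plain Python. It is not the dissertation's solver: it only shows one ingredient a Navier-Stokes code typically contains, the viscous (diffusion) term, discretised with the standard five-point stencil and advanced with an explicit Euler step. The function names, grid size and parameter values are assumptions chosen for illustration.

```python
def laplacian(u, h):
    """Five-point finite-difference Laplacian; boundary entries are left at 0."""
    ny, nx = len(u), len(u[0])
    lap = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap[j][i] = (
                u[j][i - 1] + u[j][i + 1] + u[j - 1][i] + u[j + 1][i] - 4.0 * u[j][i]
            ) / (h * h)
    return lap


def diffuse(u, nu, h, dt):
    """One explicit Euler step of du/dt = nu * laplacian(u); boundary values stay fixed."""
    lap = laplacian(u, h)
    return [[u[j][i] + dt * nu * lap[j][i] for i in range(len(u[0]))] for j in range(len(u))]


# Hypothetical setup: a 9x9 grid with a single unit spike in the centre.
n, h, nu = 9, 1.0 / 8, 0.1
dt = 0.2 * h * h / nu  # within the explicit stability limit dt <= h^2 / (4 * nu)
u = [[0.0] * n for _ in range(n)]
u[n // 2][n // 2] = 1.0
for _ in range(10):
    u = diffuse(u, nu, h, dt)
print("peak after 10 steps:", max(max(row) for row in u))
```

The explicit step is only stable for sufficiently small time steps, which is why `dt` is tied to `h` and `nu` here. A full solver would add advection and a pressure solve on top of this.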
The Design and Implementation of Machine Learning Techniques for Fault Prediction in High-Performance Computing Systems
Chia-Hao Li (graduate MISCADA 2019/20), in collaboration with Durham’s DiRAC supercomputing centre, supervised by Alastair Basden
Scientists rely on High-Performance Computing (HPC) systems to conduct large-scale simulations. However, the reliability and fault tolerance of these powerful platforms have not improved to the same degree as their computing performance. Moreover, a non-negligible number of unsuccessful jobs occupy nodes and consume resources. How to best utilise these platforms is one of the challenges for HPC system support teams.
This project aims to investigate the application of machine learning techniques to abnormal event detection and to aid the diagnosis procedure for these systems. The sketch above shows the data path of the proposed architecture. The first step is to build an online framework that collects logs, such as sensor data or system messages, and extracts meaningful information. Then, the project uses InfluxDB, an open-source database, to store the information and visualises it with an open-source Grafana server. See the gallery below (left) for the home page of the proposed dashboard. Finally, an always-active model for detecting anomalous power, temperature, and jobs is deployed. The right figure in the gallery illustrates the integration of alert messages for the system power and the corresponding raw data.
The reliability of this method is verified on the COSMA HPC system at Durham University. The proposed dashboards provide line charts and histogram plots for visualising raw data, as well as some statistical measurements to help the support team handle the platform better. The machine learning model can immediately catch unusual power and temperature figures when they occur and identifies more than 70% of failed jobs.
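The idea of an always-active detector for unusual power readings can be sketched as follows. This is not the project's machine learning model, which is not described in detail here. It is a simple statistical stand-in (a rolling z-score), and the class name, window size, threshold and sample readings are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flags readings that deviate strongly from a sliding window of recent values."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, value):
        """Return True if value is anomalous relative to the current window."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            # Skip a flat window, where the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        if not anomalous:
            # Only normal readings update the baseline, so a fault does not mask itself.
            self.history.append(value)
        return anomalous


# Hypothetical node power readings in watts; the spike of 450 is synthetic.
readings = [200, 202, 199, 201, 198, 203, 200, 199, 202, 201, 200, 198, 450, 201, 199]
detector = RollingAnomalyDetector()
alerts = [i for i, watts in enumerate(readings) if detector.update(watts)]
print("anomalous sample indices:", alerts)
```

In a deployment like the one described, each new sample would come from the log collection framework rather than a fixed list, and an alert would be forwarded to the dashboard.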