arXiv:2407.20731 [pdf, other]

doi 10.1109/e-science58273.2023.10254865

In-Situ Techniques on GPU-Accelerated Data-Intensive Applications

Authors: Yi Ju, Mingshuai Li, Adalberto Perez, Laura Bellentani, Niclas Jansson, Stefano Markidis, Philipp Schlatter, Erwin Laure

Abstract: The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges for applications such as Molecular Dynamics (MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of data for further visualization or analysis. At the same time, checkpointing is crucial for long runs on HPC clusters, due to limited walltimes and/or failures of system components, and typically requires the storage of large amounts of data. Thus, restricted IO performance and storage capacity can lead to bottlenecks for the performance of full application workflows (as compared to computational kernels without IO). In-situ techniques, where data is further processed while still in memory rather than written out over the IO subsystem, can help to tackle these problems. In contrast to traditional post-processing methods, in-situ techniques can reduce or avoid the need to write or read data via the IO subsystem. They offer a promising approach for applications aiming to leverage the full power of large scale HPC systems. In-situ techniques can also be applied to hybrid computational nodes on HPC systems consisting of graphics processing units (GPUs) and central processing units (CPUs). On one node, the GPUs would have significant performance advantages over the CPUs. Therefore, current approaches for GPU-accelerated applications often focus on maximizing GPU usage, leaving CPUs underutilized. In-situ tasks using CPUs to perform data analysis or preprocess data concurrently to the running simulation offer a possibility to improve this underutilization.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2405.05640 [pdf, other]

Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures

Authors: Martin Karp, Estela Suarez, Jan H. Meinke, Måns I. Andersson, Philipp Schlatter, Stefano Markidis, Niclas Jansson

Abstract: The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the modular supercomputing architecture (MSA). We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures, 3 tables, preprint

ACM Class: J.2; C.1.4; G.4

arXiv:2405.05639 [pdf, other]

Supercomputers as a Continous Medium

Authors: Martin Karp, Niclas Jansson, Philipp Schlatter, Stefano Markidis

Abstract: As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred, making a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with homogeneously distributed computational power, memory, and data movement transfer capabilities, an intriguing and powerful abstraction. By applying a homogeneous computer model to algorithms with a given I/O complexity, we recover, from first principles, other discrete computer models, such as the roofline model, parallel computing laws, such as Amdahl's and Gustafson's laws, and phenomenological observations, such as super-linear speedup. One of the homogeneous computer model's distinctive advantages is the capability of directly linking the performance limits of an application to the physical properties of a classical computer system. Applying the homogeneous computer model to supercomputers, such as Frontier, Fugaku, and the Nvidia DGX GH200, shows that applications, such as Conjugate Gradient (CG) and Fast Fourier Transforms (FFT), are rapidly approaching the fundamental classical computational limits, where the performance of even denser systems in terms of compute and memory is fundamentally limited by the speed of light.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures, 3 tables

ACM Class: F.1; F.2; I.6

arXiv:2401.14576 [pdf]

iFast: Host-Side Logging for Scientific Applications

Authors: Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, Stefano Markidis, Michio Honda

Abstract: We have seen an increase in the heterogeneity of storage technologies potentially available to scientific applications, such as burst buffers, managed cloud parallel file systems (PFS), and object stores. However, those applications cannot easily utilize those technologies, because they are designed for traditional HPC systems that offer very high remote storage and network bandwidth. We present iFast, a new distributed host-side logging approach to transparently accelerating scientific applications. iFast has a strong emphasis on deployability, supporting unmodified MPI applications with unmodified MPI implementations while preserving the crash consistency semantics. We evaluate iFast on traditional HPC, cloud HPC, local cluster, and a hybrid of both, using three scientific applications. iFast reduces end-to-end execution time by 13-26% for popular scientific applications on the cloud. We show, for the first time, how an application on a recent production HPC system can write data to S3 storage through fully fledged MPI-IO, in a readily shareable format.

Submitted 2 August, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Submitted to VLDB 2025

arXiv:2307.03799 [pdf, other]

Uncertainty quantification for the squeeze flow of generalized Newtonian fluids

Authors: Aricia Rinkens, Clemens V. Verhoosel, Nick O. Jaensson

Abstract: The calibration of rheological parameters in the modeling of complex flows of non-Newtonian fluids can be a daunting task. In this paper we demonstrate how the framework of Uncertainty Quantification (UQ) can be used to improve the predictive capabilities of rheological models in such flow scenarios. For this demonstration, we consider the squeeze flow of generalized Newtonian fluids. To systematically study uncertainties, we have developed a tailored squeeze flow setup, which we have used to perform experiments with glycerol and PVP solution. To mimic these experiments, we have developed a three-region truncated power law model, which can be evaluated semi-analytically. This fast-to-evaluate model enables us to consider uncertainty propagation and Bayesian inference using (Markov chain) Monte Carlo techniques. We demonstrate that with prior information obtained from dedicated experiments - most importantly rheological measurements - the truncated power law model can adequately predict the experimental results. We observe that when the squeeze flow experiments are incorporated in the analysis in the case of Bayesian inference, this leads to an update of the prior information on the rheological parameters, giving evidence of the need for recalibration in the considered complex flow scenario. In the process of Bayesian inference we also obtain information on quantities of interest that are not directly observable in the experimental data, such as the spatial distribution of the three flow regimes. In this way, besides improving the predictive capabilities of the model, the uncertainty quantification framework enhances the insight into complex flow scenarios.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2305.01338 [pdf, other]

Physics-Informed Learning Using Hamiltonian Neural Networks with Output Error Noise Models

Authors: Sarvin Moradi, Nick Jaensson, Roland Tóth, Maarten Schoukens

Abstract: In order to make data-driven models of physical systems interpretable and reliable, it is essential to include prior physical knowledge in the modeling framework. Hamiltonian Neural Networks (HNNs) implement Hamiltonian theory in deep learning and form a comprehensive framework for modeling autonomous energy-conservative systems. Despite being suitable to estimate a wide range of physical system behavior from data, classical HNNs are restricted to systems without inputs and require noiseless state measurements and information on the derivative of the state to be available. To address these challenges, this paper introduces an Output Error Hamiltonian Neural Network (OE-HNN) modeling approach for physical systems with inputs and noisy state measurements. Furthermore, it does not require the state derivatives to be known. Instead, the OE-HNN utilizes an ODE-solver embedded in the training process, which enables the OE-HNN to learn the dynamics from noisy state measurements. In addition, extending HNNs based on the generalized Hamiltonian theory makes it possible to include external inputs in the framework, which are important for engineering applications. We demonstrate via simulation examples that the proposed OE-HNNs result in superior modeling performance compared to classical HNNs.

Submitted 2 May, 2023; originally announced May 2023.

Comments: Preprint submitted to IFAC 2023

arXiv:2207.07098 [pdf, other]

Large-Scale Direct Numerical Simulations of Turbulence Using GPUs and Modern Fortran

Authors: Martin Karp, Daniele Massaro, Niclas Jansson, Alistair Hart, Jacob Wahlgren, Philipp Schlatter, Stefano Markidis

Abstract: We present our approach to making direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world's first direct numerical simulation of the flow around a Flettner rotor at Re=30'000 and its interaction with a turbulent boundary layer. We present one of the first performance comparisons between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency.

Submitted 23 June, 2022; originally announced July 2022.

Comments: 13 pages, 7 figures

ACM Class: G.4; J.2

arXiv:2109.03592 [pdf, ps, other]

Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

Authors: Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson, Dirk Pleiter, Erwin Laure, Philipp Schlatter

Abstract: We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of between 3 and 5 times can be achieved using the GPU-accelerated version compared with the CPU version on these different systems. The run-time for 20 timesteps reduces from 43.5 to 13.2 seconds when increasing the number of GPUs from 64 to 512 for the $Re_τ=550$ case on the JUWELS Booster system. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about $2000-5000$ elements per rank, compared to about $50-100$ for a CPU rank.

Submitted 4 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 9 pages, 8 figures. Submitted to HPC-Asia 2022 conference, updated to address reviewers comments

ACM Class: G.4; J.2; C.1

arXiv:2108.12188 [pdf, ps, other]

doi 10.1145/3492805.3492808

A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Tobias Kenter, Niclas Jansson, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work -- which often focuses on accelerating small kernels -- we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and a reduction in runtime of upwards of 2x for the local evaluation of the Laplace operator. We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.

Submitted 2 November, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

Comments: 12 pages, 3 figures, 3 tables, Accepted to HPC Asia 2022

ACM Class: G.4; J.2; C.1

arXiv:2107.01243 [pdf]

Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics

Authors: Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, Philipp Schlatter

Abstract: Recent trends and advancements in including more diverse and heterogeneous hardware in High-Performance Computing are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim "software outlives hardware" may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. CFD is one of the many application domains affected. In this paper, we present Neko, a portable framework for high-order spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented approach, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors down to exotic vector processors and FPGAs. We show that Neko's performance and accuracy are comparable to NekRS, and thus on-par with Nek5000's successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2106.04979 [pdf]

Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers

Authors: Martin Svedin, Steven W. D. Chien, Gibson Chikafa, Niclas Jansson, Artur Podobas

Abstract: For many, Graphics Processing Units (GPUs) provide a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPUs, the Ampere 100, claiming significant performance improvements over previous generations, particularly for AI-workloads, as well as introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non-AI benchmarks, and can we expect the A100 to deliver the application improvements we have grown used to with previous GPU generations? In this paper, we benchmark the A100 GPU and compare it to four previous generations of GPUs, with particular focus on empirically quantifying our derived performance expectations, and -- should those expectations be undelivered -- investigate whether the introduced data-movement features can offset any eventual loss in performance. We find that the A100 delivers less performance increase than previous generations for the well-known Rodinia benchmark suite; we show that some of these performance anomalies can be remedied through clever use of the new data-movement features, which we microbenchmark and demonstrate where (and more importantly, how) they should be used.

Submitted 3 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 7 pages

arXiv:2103.09683 [pdf, other]

Accelerating Radiation Therapy Dose Calculation with Nvidia GPUs

Authors: Felix Liu, Niclas Jansson, Artur Podobas, Albin Fredriksson, Stefano Markidis

Abstract: Radiation Treatment Planning (RTP) is the process of planning the appropriate external beam radiotherapy to combat cancer in human patients. RTP is a complex and compute-intensive task, which often takes a long time (several hours) to compute. Reducing this time allows for higher productivity at clinics and more sophisticated treatment planning, which can materialize in better treatments. The state-of-the-art in medical facilities uses general-purpose processors (CPUs) to perform many steps in the RTP process. In this paper, we explore the use of accelerators to reduce RTP calculating time. We focus on the step that calculates the dose using the Graphics Processing Unit (GPU), which we believe is an excellent candidate for this computation type. Next, we create a highly optimized implementation for a custom Sparse Matrix-Vector Multiplication (SpMV) that operates on numerical formats unavailable in state-of-the-art SpMV libraries (e.g., Ginkgo and cuSPARSE). We show that our implementation is several times faster than the baseline (up to 4x) and has a higher operational intensity than similar (but different) versions such as Ginkgo and cuSPARSE.

Submitted 19 September, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

arXiv:2010.13463 [pdf]

High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

Authors: Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis

Abstract: Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study modern FPGAs' applicability in accelerating the Spectral Element Method (SEM) core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double-precision that we empirically evaluate on the latest Stratix 10 GX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM-accelerator, which we use to project future FPGAs' performance and role to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

Submitted 4 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 10 pages, IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

ACM Class: G.4; J.2; C.1

arXiv:2005.13425 [pdf]

Optimization of Tensor-product Operations in Nekbone on GPUs

Authors: Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, Stefano Markidis

Abstract: In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and optimize the main tensor-product operation in Nekbone further. Our optimization is done in CUDA and uses a different, 2D, thread structure to make the computations layer by layer. This enables us to use loop unrolling as well as utilize registers and shared memory efficiently. Our implementation is then compared on both the Pascal and Volta GPU architectures to previous GPU versions of Nekbone as well as a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77 - 92% of the peak performance for both Nvidia P100 and V100 GPUs for inputs with 1024 - 4096 elements and polynomial degree 9.

Submitted 27 May, 2020; originally announced May 2020.

Comments: 4 pages, 4 figures

ACM Class: G.4; J.2

arXiv:2005.06811 [pdf, other]

doi 10.1103/PhysRevLett.125.098001

Microscale Marangoni Surfers

Authors: Kilian Dietrich, Nick Jaensson, Ivo Buttinoni, Giorgio Volpe, Lucio Isa

Abstract: We apply laser light to induce the asymmetric heating of Janus colloids adsorbed at water-oil interfaces and realize active micrometric "Marangoni surfers". The coupling of temperature and surfactant concentration gradients generates Marangoni stresses leading to self-propulsion. Particle velocities span four orders of magnitude, from microns/s to cm/s, depending on laser power and surfactant concentration. Experiments are rationalized by finite element simulations, defining different propulsion regimes relative to the magnitude of the thermal and solutal Marangoni stress components.

Submitted 14 May, 2020; originally announced May 2020.

Comments: main: 6 pages, 4 figures; supplemental: 18 pages, 11 figures

Journal ref: Phys. Rev. Lett. 125, 098001 (2020)

arXiv:1808.04099 [pdf, other]

CUBE: A scalable framework for large-scale industrial simulations

Authors: Niclas Jansson, Rahul Bale, Keiji Onishi, Makoto Tsubokura

Abstract: Writing high performance solvers for engineering applications is a delicate task. These codes are often developed on an application to application basis, highly optimized to solve a certain problem. Here, we present our work on developing a general simulation framework for efficient computation of time resolved approximations of complex industrial flow problems - Complex Unified Building cubE method (Cube). To address the challenges of emerging, modern supercomputers, suitable data structures and communication patterns are developed and incorporated into Cube. We use a Cartesian grid together with various immersed boundary methods to accurately capture moving, complex geometries. The asymmetric workload of the immersed boundary is balanced by a predictive dynamic load balancer, and a multithreaded halo-exchange algorithm is employed to efficiently overlap communication with computations. Our work also concerns efficient methods for handling the large amount of data produced by large-scale flow simulations, such as scalable parallel I/O, data compression and in-situ processing.

Submitted 13 August, 2018; originally announced August 2018.

arXiv:physics/0511251 [pdf, ps, other]

doi 10.1103/PhysRevLett.96.174502

Polygons on a Rotating Fluid Surface

Authors: Thomas R. N. Jansson, Martin P. Haspang, Kaare H. Jensen, Pascal Hersen, Tomas Bohr

Abstract: We report a novel and spectacular instability of a fluid surface in a rotating system. In a flow driven by rotating the bottom plate of a partially filled, stationary cylindrical container, the shape of the free surface can spontaneously break the axial symmetry and assume the form of a polygon rotating rigidly with a speed different from that of the plate. With water we have observed polygons with up to 6 corners. It has been known for many years that such flows are prone to symmetry breaking, but apparently the polygonal surface shapes have never been observed. The creation of rotating internal waves in a similar setup was observed for much lower rotation rates, where the free surface remains essentially flat. We speculate that the instability is caused by the strong azimuthal shear due to the stationary walls and that it is triggered by minute wobbling of the rotating plate. The slight asymmetry induces a tendency for mode-locking between the plate and the polygon, where the polygon rotates by one corner for each complete rotation of the plate.

Submitted 30 November, 2005; originally announced November 2005.

Showing 1–17 of 17 results for author: Jansson, N