Search Results
-
LICOM3-CUDA: a GPU version of LASG/IAP climate system ocean model version 3 based on CUDA
The ocean general circulation model (OGCM) is an essential tool for researching oceanography and atmospheric science. The LASG/IAP climate system...
-
The Use of Functional Programming Library for Parallel Computing on CUDA
Modern graphics accelerators (GPUs) can significantly speed up the execution of numerical problems. However, porting programs to graphics...
-
Improving CUDA performance of an unstructured high-order CFD application under OP2 framework
OP2 is a domain-specific language-based programming framework for unstructured mesh applications. It supports automatic code generation targeting...
-
Many-BSP: an analytical performance model for CUDA kernels
The unknown behavior of GPUs and the differing characteristics among their generations present a serious challenge in the analysis and optimization...
-
Migrating CUDA Code
Chapter 21 describes terminology, concepts, techniques, and tools to keep in mind when migrating CUDA code to C++ with SYCL. It describes places...
-
CUDA-aware MPI implementation of Gibbs sampling for an IRT model
Item response theory (IRT) is a popular approach for addressing large-scale assessment problems in psychometrics and other areas of applied research....
-
swCUDA: Auto parallel code translation framework from CUDA to ATHREAD for new generation sunway supercomputer
Since specific hardware characteristics and low-level programming model are adapted to both NVIDIA GPU and new generation Sunway architecture,...
-
A CUDA-based parallel optimization method for SM3 hash algorithm
Hash algorithms are among the most crucial algorithms in cryptography. The SM3 algorithm is a hash cryptographic standard of China. Because of the...
-
A novel video compression model based on GPU virtualization with CUDA platform using bi-directional RNN
The exponential increase of superfluous video content across the web applications has provoked the evolution of proficient video compression...
-
StreamRec: A Recommendation Inference System with CUDA Stream Acceleration
Deep learning based recommendation models are widely used in various applications. There are often dozens of groups of sparse features in the input...
-
Fast CUDA Geomagnetic Map Builder
In this paper, we use kriging techniques and inverse distance weighting (IDW) to generate geomagnetic maps in Romania. Kriging is a method of spatial...
-
An Empirical Study of Memory Pool Based Allocation and Reuse in CUDA Graph
As the size of deep neural network models continues to increase, it places higher demands for memory capacity and allocation efficiency. NVIDIA GPUs...
-
Porting Numerical Integration Codes from CUDA to oneAPI: A Case Study
We present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the...
-
GPU-CUDA Implementation of the Third Order Gaussian Recursive Filter
Gaussian convolution operation is a fundamental procedure in several data analysis tasks and scientific fields. For example, Gaussian convolution is...
-
An efficient parallelization method of Dempster–Shafer evidence theory based on CUDA
The Dempster–Shafer (D–S) evidence theory is effective for uncertain reasoning; it does not require advanced information. The theory has been widely...
-
Teaching High–performance Computing Systems – A Case Study with Parallel Programming APIs: MPI, OpenMP and CUDA
High performance computing (HPC) education has become essential in recent years, especially that parallel computing on high performance computing...
-
Multidimensional adaptative and deterministic integration in CUDA and OpenMP
Parallelization schemes on many-core architectures, in this case, CUDA and OpenMP, are used to accelerate and improve the accuracy of adaptive...
-
Improving detection and classification of diabetic retinopathy using CUDA and Mask RCNN
Diabetic retinopathy (DR) is an eye disease caused by diabetes and can progress to certain degrees. Because the final stage of DR can cause blindness,...
-
cuRCD: Region covariance descriptor CUDA implementation
Region covariance is a robust feature descriptor that allows the use of even the simplest image features like intensity and gradient...
-
A hybrid CUDA, OpenMP, and MPI parallel TCA-based domain adaptation for classification of very high-resolution remote sensing images
Domain Adaptation (DA) is a technique that aims at extracting information from a labeled remote sensing image to allow classifying a different image...