GPU Parallel Ensemble Simulations with generic linear algebra #110

albertomercurio · 2021-07-13T01:06:04Z

Hello,

I'm trying to solve and ODE with my GPU. I tested the example code present in the DiffEqGPU.js repository, and it worked fine.

I have the following function to implement in the ODE Problem

function drho_dt(rho, p, t)
    A, w_l= p
    return ( L + A * sin(w_l * t) * L_t ) * rho
end

or equivalently

function drho_dt(drho, rho, p, t)
    A, w_l = p
    mul!(drho, L + A * sin(w_l * t) * L_t, rho)
    return nothing
end

where A and w_l are two numbers, while L and L_t are two global matrices. Following the documentation, I used the following code to run a Parallel Ensemble:

A = 0.05
w_l = 1.0

p = [A, w_l]

tspan = (0.0, 10.0 / gam_q)

w_l_l = LinRange{Float64}(0.1, 1.9, 10)

function prob_func(prob,i,repeat)
  remake(prob,p=[prob.p[1], w_l_l[i]])
end

prob = ODEProblem(drho_dt, rho0_vec, tspan, p)
ensemble_prob = EnsembleProblem(prob, prob_func=prob_func, safetycopy=false)
@time sim = solve(ensemble_prob, BS3(), METHOD!!, trajectories=length(w_l_l))

where METHOD!! can be one of the methods, e.g. EnsembleSerial(), EnsembleThreads(), EnsembleCPUArray() or EnsembleGPUArray().

The problem is divided in two cases:

When L and L_t are sparse matrices. In this case all the methods work, except for EnsembleGPUArray(). In this case the error is

InvalidIRError: compiling kernel gpu_gpu_kernel_oop(Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] overdub
   @ ./In[8]:62
 [2] macro expansion
   @ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:25
 [3] overdub
   @ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [4] overdub
   @ ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0
Reason: unsupported dynamic function invocation (call to overdub)
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:27
 [2] overdub
   @ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:80
 [3] overdub
   @ ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0

Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/validation.jl:111
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/wHjqZ/src/driver.jl:319 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/ZQ0rt/src/TimerOutput.jl:236 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/wHjqZ/src/driver.jl:317 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:317
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/wHjqZ/src/cache.jl:89
  [8] cufunction(f::typeof(Cassette.overdub), tt::Type{Tuple{Cassette.Context{nametype(CUDACtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(DiffEqGPU.gpu_gpu_kernel_oop), typeof(drho_dt), CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{ComplexF64, 1}, CuDeviceMatrix{Float64, 1}, Float64}}; name::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:288
  [9] macro expansion
    @ ~/.julia/packages/CUDA/02Kjq/src/compiler/execution.jl:102 [inlined]
 [10] (::KernelAbstractions.Kernel{CUDAKernels.CUDADevice, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.gpu_gpu_kernel_oop)})(::Function, ::Vararg{Any, N} where N; ndrange::Int64, dependencies::CUDAKernels.CudaEvent, workgroupsize::Int64, progress::Function)
    @ CUDAKernels ~/.julia/packages/CUDAKernels/8wtKq/src/CUDAKernels.jl:194
 [11] SciML/DifferentialEquations.jl#55
    @ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:414 [inlined]
 [12] ODEFunction
    @ ~/.julia/packages/SciMLBase/oTP8b/src/scimlfunctions.jl:334 [inlined]
 [13] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, true, CuArray{ComplexF64, 2}, Nothing, Float64, CuArray{Float64, 2}, Float64, Float64, Float64, Float64, Vector{CuArray{ComplexF64, 2}}, ODESolution{ComplexF64, 3, Vector{CuArray{ComplexF64, 2}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{CuArray{ComplexF64, 2}}}, ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{CuArray{ComplexF64, 2}}, Vector{Float64}, Vector{Vector{CuArray{ComplexF64, 2}}}, OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}, OrdinaryDiffEq.DEOptions{Float64, Float64, Float64, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryMinHeap{Float64}, DataStructures.BinaryMinHeap{Float64}, Nothing, Nothing, Int64, Tuple{}, Tuple{}, Tuple{}}, CuArray{ComplexF64, 2}, ComplexF64, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, CuArray{ComplexF64, 2}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/perform_step/low_order_rk_perform_step.jl:623
 [14] __init(prob::ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:456
 [15] #__solve#466
    @ ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:4 [inlined]
 [16] #solve_call#58
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:61 [inlined]
 [17] solve_up(prob::ODEProblem{CuArray{ComplexF64, 2}, Tuple{Float64, Float64}, true, CuArray{Float64, 2}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::CuArray{ComplexF64, 2}, p::CuArray{Float64, 2}, args::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:82
 [18] #solve#59
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:70 [inlined]
 [19] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}, u0::Matrix{ComplexF64}, p::Matrix{Float64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:319
 [20] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:284
 [21] macro expansion
    @ ./timing.jl:287 [inlined]
 [22] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleGPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:201
 [23] #solve#61
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:96 [inlined]
 [24] top-level scope
    @ ./timing.jl:210 [inlined]
 [25] top-level scope
    @ ./In[9]:0
 [26] eval
    @ ./boot.jl:360 [inlined]
 [27] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1094

When L and L_t are dense matrices. In this case not even EnsembleCPUArray() works, but the others yes. The error for the GPU method should be the same. While the error for the CPU one is

TaskFailedException

    nested task error: conversion to pointer not defined for Base.Experimental.Const{ComplexF64, 2}
    Stacktrace:
      [1] error(::String)
        @ ./error.jl:33 [inlined]
      [2] overdub
        @ ./error.jl:33 [inlined]
      [3] unsafe_convert(::Type{Ptr{ComplexF64}}, ::Base.Experimental.Const{ComplexF64, 2})
        @ ./pointer.jl:67 [inlined]
      [4] overdub
        @ ./pointer.jl:67 [inlined]
      [5] unsafe_convert(::Type{Ptr{ComplexF64}}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
        @ ./subarray.jl:427 [inlined]
      [6] overdub
        @ ./subarray.jl:427 [inlined]
      [7] overdub
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:704 [inlined]
      [8] overdub
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:544 [inlined]
      [9] mul!(::Vector{ComplexF64}, ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Bool, ::Bool)
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:66 [inlined]
     [10] overdub
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:66 [inlined]
     [11] mul!(::Vector{ComplexF64}, ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
     [12] overdub
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
     [13] overdub
        @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:51 [inlined]
     [14] overdub(::Cassette.Context{nametype(CPUCtx), KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{1}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Nothing, KernelAbstractions.var"##PassType#257", Nothing, Cassette.DisableHooks}, ::typeof(*), ::Matrix{ComplexF64}, ::SubArray{ComplexF64, 1, Base.Experimental.Const{ComplexF64, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true})
        @ Cassette ~/.julia/packages/Cassette/N5kbV/src/overdub.jl:0
     [15] overdub
        @ ./In[11]:62 [inlined]
     [16] macro expansion
        @ ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:25 [inlined]
     [17] overdub
        @ ~/.julia/packages/KernelAbstractions/Yy47c/src/macros.jl:265 [inlined]
     [18] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
        @ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:157
     [19] __run(obj::KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, ndrange::Tuple{Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, args::Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
        @ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:130
     [20] (::KernelAbstractions.var"#33#34"{Tuple{KernelAbstractions.NoneEvent}, Nothing, typeof(KernelAbstractions.__run), Tuple{KernelAbstractions.Kernel{KernelAbstractions.CPU, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(DiffEqGPU.cpu_gpu_kernel_oop)}, Tuple{Int64}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}, Tuple{typeof(drho_dt), Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{Float64}, Float64}, KernelAbstractions.NDIteration.NoDynamicCheck}})()
        @ KernelAbstractions ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:22

Stacktrace:
  [1] wait
    @ ./task.jl:322 [inlined]
  [2] wait
    @ ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:65 [inlined]
  [3] wait
    @ ~/.julia/packages/KernelAbstractions/Yy47c/src/cpu.jl:64 [inlined]
  [4] (::DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)})(du::Matrix{ComplexF64}, u::Matrix{ComplexF64}, p::Matrix{Float64}, t::Float64)
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:414
  [5] ODEFunction
    @ ~/.julia/packages/SciMLBase/oTP8b/src/scimlfunctions.jl:334 [inlined]
  [6] initialize!(integrator::OrdinaryDiffEq.ODEIntegrator{Tsit5, true, Matrix{ComplexF64}, Nothing, Float64, Matrix{Float64}, Float64, Float64, Float64, Float64, Vector{Matrix{ComplexF64}}, ODESolution{ComplexF64, 3, Vector{Matrix{ComplexF64}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{Matrix{ComplexF64}}}, ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, Tsit5, OrdinaryDiffEq.InterpolationData{ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Matrix{ComplexF64}}, Vector{Float64}, Vector{Vector{Matrix{ComplexF64}}}, OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}}, DiffEqBase.DEStats}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}, OrdinaryDiffEq.DEOptions{Float64, Float64, Float64, Float64, PIController{Rational{Int64}}, typeof(DiffEqGPU.diffeqgpunorm), typeof(opnorm), Nothing, CallbackSet{Tuple{}, Tuple{}}, typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), DiffEqGPU.var"#12#18", DataStructures.BinaryMinHeap{Float64}, DataStructures.BinaryMinHeap{Float64}, Nothing, Nothing, Int64, Tuple{}, Tuple{}, Tuple{}}, Matrix{ComplexF64}, ComplexF64, Nothing, OrdinaryDiffEq.DefaultInit}, cache::OrdinaryDiffEq.Tsit5Cache{Matrix{ComplexF64}, Matrix{ComplexF64}, Matrix{ComplexF64}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/perform_step/low_order_rk_perform_step.jl:623
  [7] __init(prob::ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqGPU.diffeqgpunorm), internalopnorm::typeof(opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::DiffEqGPU.var"#12#18", verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:456
  [8] #__solve#466
    @ ~/.julia/packages/OrdinaryDiffEq/a5qvy/src/solve.jl:4 [inlined]
  [9] #solve_call#58
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:61 [inlined]
 [10] solve_up(prob::ODEProblem{Matrix{ComplexF64}, Tuple{Float64, Float64}, true, Matrix{Float64}, ODEFunction{true, DiffEqGPU.var"#55#59"{typeof(drho_dt), typeof(DiffEqGPU.gpu_kernel_oop)}, UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, sensealg::Nothing, u0::Matrix{ComplexF64}, p::Matrix{Float64}, args::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:unstable_check, :callback, :merge_callbacks, :internalnorm), Tuple{DiffEqGPU.var"#12#18", Nothing, Bool, typeof(DiffEqGPU.diffeqgpunorm)}}})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:82
 [11] #solve#59
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:70 [inlined]
 [12] batch_solve_up(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, probs::Vector{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}}, alg::Tsit5, ensemblealg::EnsembleCPUArray, I::UnitRange{Int64}, u0::Matrix{ComplexF64}, p::Matrix{Float64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:319
 [13] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleCPUArray, I::UnitRange{Int64}; kwargs::Base.Iterators.Pairs{Symbol, DiffEqGPU.var"#12#18", Tuple{Symbol}, NamedTuple{(:unstable_check,), Tuple{DiffEqGPU.var"#12#18"}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:284
 [14] macro expansion
    @ ./timing.jl:287 [inlined]
 [15] __solve(ensembleprob::EnsembleProblem{ODEProblem{Vector{ComplexF64}, Tuple{Float64, Float64}, false, Vector{Float64}, ODEFunction{false, typeof(drho_dt), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, typeof(prob_func), typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::Tsit5, ensemblealg::EnsembleCPUArray; trajectories::Int64, batch_size::Int64, unstable_check::Function, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/gGgMY/src/DiffEqGPU.jl:201
 [16] #solve#61
    @ ~/.julia/packages/DiffEqBase/oe7VF/src/solve.jl:96 [inlined]
 [17] top-level scope
    @ ./timing.jl:210 [inlined]
 [18] top-level scope
    @ ./In[11]:0
 [19] eval
    @ ./boot.jl:360 [inlined]
 [20] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1094

The text was updated successfully, but these errors were encountered:

ChrisRackauckas · 2021-07-13T01:11:59Z

Yes, I don't think it can generate generic linear algebra calls.

ChrisRackauckas · 2021-07-13T01:12:32Z

Did you try using StaticArrays?

albertomercurio · 2021-07-13T11:45:38Z

Did you try using StaticArrays?

I'm using very large matrices (usually 1500x1500), this is the reason i use sparse matrices. However, also with a 100x100 matrix it tooks a lot of time to convert L to a Static Array

SMatrix{N, N}(Matrix(L))

ChrisRackauckas · 2021-07-13T11:47:42Z

That makes no sense to solve with EnsembleGPUArray. Read the DiffEqGPU documentation on the two forms of GPU problems. You're using the form for 100 ODEs or less, not the one for 1000 ODEs or more.

https://github.com/SciML/DiffEqGPU.jl#within-method-gpu-parallelism-with-direct-cuarray-usage

That's more applicable here.

albertomercurio · 2021-07-13T13:24:46Z

Do you mean that I need to convert L and L_t into CuArray and than evolve each ODE serially with EnsembleSerial()? If it is the case, can i use EnsembleThreads() to parallelize that CUDA ODE again?

ChrisRackauckas · 2021-07-13T18:18:47Z

That would evolve the ODE in parallel on the GPU. You want to use EnsembleSerial unless you have multiple GPUs which you would want to use at the same time.

albertomercurio · 2021-07-14T14:52:07Z

Ok perfect. Thank you for your help.

However I have the following question. I'm yet able to perform this ODE directly on the GPU, using the differential function

L = CUDA.CUSPARSE.CuSparseMatrixCSR( L )
L_t = CUDA.CUSPARSE.CuSparseMatrixCSR( L_t )

function f(rho, p, t)
    A, w_l = p
    rho_tmp = similar(rho0_vec)
    return L * rho + CUDA.CUSPARSE.mv!('N', A * sin(w_l * t), L_t, rho, 0.0, rho_tmp, '0')
end

where this time L and L_t are CuSparse matrices, and they don't take up a lot of memory as the dense matrix form.
This method is a little bit faster than the CPU sparse one, but of course slower if I parallelize the CPU method in the Ensemble Simulation.
Whit this CUDA sparse method the GPU is loaded only up to the 10%, and so the memory. If I could be able to parallelize all the ODEs in the ensemble I think it would get a lot faster than the EnsembleThreads() method.

Why there is no way to implement this method for an EnsembleGPUArray() simulation?

ChrisRackauckas · 2021-07-19T22:56:13Z

To GPU it like this, you'd want to expand it out to its scalar form with something like ModelingToolkit, in order to then do the GPU codegen. EnsembleGPUArray is all about kernel code generation, so it's not using CUDA kernels like those from CUSPARSE since it's really made to generate nonlinear kernels.

ChrisRackauckas changed the title ~~GPU Parallel Ensemble Simulations doesn't work~~ Jul 13, 2021

ChrisRackauckas transferred this issue from SciML/DifferentialEquations.jl Jul 13, 2021

ChrisRackauckas added the upstream label Dec 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GPU Parallel Ensemble Simulations with generic linear algebra #110

GPU Parallel Ensemble Simulations with generic linear algebra #110

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 14, 2021

ChrisRackauckas commented Jul 19, 2021

GPU Parallel Ensemble Simulations with generic linear algebra #110

GPU Parallel Ensemble Simulations with generic linear algebra #110

Comments

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 13, 2021

ChrisRackauckas commented Jul 13, 2021

albertomercurio commented Jul 14, 2021

ChrisRackauckas commented Jul 19, 2021