Hi, I have the following nested AD problem that I'd like to get to work with TaylorDiff. Starting with a vector-valued function f(params, x) for scalar x, take a high-order derivative with respect to x and evaluate for a specific value of x. Then apply some reduction function to the result to obtain a scalar-valued function g(params). Finally I want to evaluate the gradient d g / d params. Example:
using TaylorDiff

# Arbitrary function:
f(params, x) = [params[1] * x^3 + params[2], params[2] * sin(x - params[1]), sqrt(x + params[2])]

function g(params)
    closure(x) = f(params, x)
    some_x = 0.7
    d3f_dx3 = TaylorDiff.derivative(closure, some_x, Val(3))
    return sum(d3f_dx3)
end

some_params = [1.3, 2.1]
@show g(some_params) # Fine, gives 6.095380076578732
TaylorDiff.derivative(g, some_params, [1.0, 0.0], Val(1)) # First element of the gradient
Results, using Julia 1.11.5 and TaylorDiff v0.3.3:
ERROR: MethodError: *(::TaylorScalar{Float64, 1}, ::TaylorScalar{Float64, 3}) is ambiguous.
Candidates:
*(a::TaylorScalar, b::Number)
@ TaylorDiff ~/.julia/packages/TaylorDiff/qw5aY/src/primitive.jl:119
*(a::Number, b::TaylorScalar)
@ TaylorDiff ~/.julia/packages/TaylorDiff/qw5aY/src/primitive.jl:114
Possible fix, define
*(::TaylorScalar, ::TaylorScalar)
Stacktrace:
[1] f(params::Vector{TaylorScalar{Float64, 1}}, x::TaylorScalar{Float64, 3})
@ Main ./REPL[4]:1
[2] (::var"#closure#1"{Vector{TaylorScalar{Float64, 1}}})(x::TaylorScalar{Float64, 3})
@ Main ./REPL[5]:2
[3] derivatives
@ ~/.julia/packages/TaylorDiff/qw5aY/src/derivative.jl:41 [inlined]
[4] derivative
@ ~/.julia/packages/TaylorDiff/qw5aY/src/derivative.jl:16 [inlined]
[5] g(params_in::Vector{TaylorScalar{Float64, 1}})
@ Main ./REPL[5]:4
[6] derivatives
@ ~/.julia/packages/TaylorDiff/qw5aY/src/derivative.jl:41 [inlined]
[7] derivative(f::Function, x::Vector{Float64}, l::Vector{Float64}, p::Val{1})
@ TaylorDiff ~/.julia/packages/TaylorDiff/qw5aY/src/derivative.jl:17
[8] top-level scope
@ REPL[8]:1
Any idea how this could be made to work?
While Zygote-over-TaylorDiff does work for this problem, @btime shows it is much faster to use ForwardDiff-over-ForwardDiff (probably due to the overhead of reverse mode), so I imagine TaylorDiff-over-TaylorDiff (or ForwardDiff-over-TaylorDiff) might be even faster due to the high-order inner derivative. Thanks.
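
For reference, a ForwardDiff-over-ForwardDiff baseline for the same problem could look roughly like the sketch below. It is only an illustration of the nesting, not necessarily the exact code that was benchmarked: the names g_fd, d1 and d2 are made up here, and the third derivative is obtained by nesting three first-order ForwardDiff.derivative calls.

```julia
using ForwardDiff

# Same arbitrary function as in the example above.
f(params, x) = [params[1] * x^3 + params[2], params[2] * sin(x - params[1]), sqrt(x + params[2])]

# Hypothetical name: ForwardDiff-based counterpart of g.
function g_fd(params)
    closure(x) = f(params, x)
    some_x = 0.7
    # Third derivative in x via three nested first-order derivatives.
    d1(x) = ForwardDiff.derivative(closure, x)
    d2(x) = ForwardDiff.derivative(d1, x)
    d3f_dx3 = ForwardDiff.derivative(d2, some_x)
    return sum(d3f_dx3)
end

some_params = [1.3, 2.1]
@show g_fd(some_params)

# Outer gradient with respect to params (ForwardDiff over ForwardDiff).
grad = ForwardDiff.gradient(g_fd, some_params)
@show grad
```

The inner derivative here costs three nested dual-number passes, which is the part a single order-3 Taylor expansion would be expected to replace.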