
Commit 411d471

[174]: adding initial compute documentation
compute.rst documents compute functions. For now, it describes how to define and register a compute function; it is still a work in progress. The updates to compute_fn.cc reflect the description provided in compute.rst.
1 parent a0bde0a commit 411d471

2 files changed: +95 additions, -6 deletions


cpp/code/compute_fn.cc

Lines changed: 11 additions & 6 deletions
@@ -167,8 +167,11 @@ RegisterScalarFnKernels() {
 */
 void
 RegisterNamedScalarFn(FunctionRegistry *registry) {
+  StartRecipe("AddFunctionToRegistry");
+  // scalar_fn has type: shared_ptr<ScalarFunction>
   auto scalar_fn = RegisterScalarFnKernels();
   DCHECK_OK(registry->AddFunction(std::move(scalar_fn)));
+  EndRecipe("AddFunctionToRegistry");
 }
 
 
@@ -181,8 +184,12 @@ RegisterNamedScalarFn(FunctionRegistry *registry) {
 ARROW_EXPORT
 Result<Datum>
 NamedScalarFn(const Datum &input_arg, ExecContext *ctx) {
-  auto func_name = "named_scalar_fn";
-  return CallFunction(func_name, { input_arg }, ctx);
+  StartRecipe("InvokeByCallFunction");
+  auto func_name = "named_scalar_fn";
+  auto result_datum = CallFunction(func_name, { input_arg }, ctx);
+  EndRecipe("InvokeByCallFunction");
+
+  return result_datum;
 }
 
 
@@ -201,13 +208,11 @@ class ComputeFunctionTest : public ::testing::Test {};
 
 TEST(ComputeFunctionTest, TestRegisterAndCallFunction) {
   // >> Register the function first
-  StartRecipe("RegisterComputeFunction");
   auto fn_registry = arrow::compute::GetFunctionRegistry();
   RegisterNamedScalarFn(fn_registry);
-  EndRecipe("RegisterComputeFunction");
 
   // >> Then we can call the function
-  StartRecipe("InvokeComputeFunction");
+  StartRecipe("InvokeByConvenienceFunction");
   auto build_result = BuildIntArray();
   if (not build_result.ok()) {
     std::cerr << build_result.status().message() << std::endl;
@@ -224,7 +229,7 @@ TEST(ComputeFunctionTest, TestRegisterAndCallFunction) {
   auto result_data = fn_result->make_array();
   std::cout << "Success:" << std::endl;
   std::cout << "\t" << result_data->ToString() << std::endl;
-  EndRecipe("InvokeComputeFunction");
+  EndRecipe("InvokeByConvenienceFunction");
 
   // If we want to peek at the input data
   std::cout << col_data.make_array()->ToString() << std::endl;
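The StartRecipe()/EndRecipe() calls in the diff above delimit named snippets, and compute.rst refers to those snippets by name (for example `.. recipe:: ../code/compute_fn.cc InvokeByCallFunction` with `:dedent: 2`). The cookbook's actual Sphinx extension is not part of this commit, so the following is only a hypothetical, standalone C++ sketch of how marker-based extraction with a dedent could work; ExtractRecipe is an illustrative name, not a real cookbook function.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical: return the lines strictly between StartRecipe("<name>") and
// EndRecipe("<name>"), removing up to `dedent` leading spaces from each line
// (the role ":dedent: 2" appears to play in compute.rst).
std::string ExtractRecipe(const std::string& source, const std::string& name,
                          std::size_t dedent) {
  const std::string start_marker = "StartRecipe(\"" + name + "\");";
  const std::string end_marker = "EndRecipe(\"" + name + "\");";
  std::istringstream in(source);
  std::string line;
  std::string out;
  bool inside = false;
  while (std::getline(in, line)) {
    if (inside) {
      // Stop at the matching end marker; it is not part of the snippet.
      if (line.find(end_marker) != std::string::npos) break;
      // Strip at most `dedent` leading spaces, keeping deeper indentation.
      std::size_t n = 0;
      while (n < dedent && n < line.size() && line[n] == ' ') ++n;
      out += line.substr(n) + "\n";
    } else if (line.find(start_marker) != std::string::npos) {
      inside = true;  // the snippet starts on the line after the marker
    }
  }
  return out;
}
```

Because the markers are ordinary function calls, the snippet boundaries stay valid C++ and move with the code when it is edited, which is presumably why the commit renames them rather than maintaining line ranges.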

cpp/source/compute.rst

Lines changed: 84 additions & 0 deletions
.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

====================================
Defining and Using Compute Functions
====================================

This section contains (or will contain) a number of recipes illustrating how to
define new "compute functions" and how to use existing ones. Arrow provides a "Compute
API," which primarily consists of a "registry" of functions that can be invoked by name.
Currently, Arrow populates a default registry with a variety of useful functions. The
recipes in this section show some approaches to defining a compute function, as well
as how to invoke a compute function by name, given a registry.

.. contents::

Invoke a Compute Function
=========================

A compute function can only be invoked if it exists in a function registry. In this
recipe, we use ``CallFunction()`` to invoke the function named ``named_scalar_fn``.

.. recipe:: ../code/compute_fn.cc InvokeByCallFunction
   :caption: Use CallFunction() to invoke a compute function by name
   :dedent: 2

.. note::

   This method lets us pass the arguments as a vector and supply a custom
   ``ExecContext``.

   If ``CallFunction()`` is not given an ``ExecContext`` (the pointer is null), it
   looks up the function in the default built-in ``FunctionRegistry``.

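To make lookup by name concrete, here is a deliberately simplified, standalone model in C++. ``ToyRegistry``, ``ToyContext`` and ``ToyCallFunction`` are hypothetical stand-ins, not Arrow types: the function is found by name in a registry, a null context falls back to a process-wide default registry (as the note above describes for ``CallFunction()``), and an unknown name yields no result instead of an error ``Status``.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in: a "function" maps a vector of ints to an int.
using ToyFunction = std::function<int(const std::vector<int>&)>;

// Hypothetical registry: function name -> implementation.
struct ToyRegistry {
  std::map<std::string, ToyFunction> functions;
};

// Process-wide default registry, pre-populated with one built-in ("sum").
ToyRegistry& DefaultToyRegistry() {
  static ToyRegistry registry = [] {
    ToyRegistry r;
    r.functions["sum"] = [](const std::vector<int>& args) {
      int total = 0;
      for (int v : args) total += v;
      return total;
    };
    return r;
  }();
  return registry;
}

// Hypothetical context that may point at a custom registry.
struct ToyContext {
  ToyRegistry* registry = nullptr;
};

// Look up `name` and call it. A null context (or one without a registry)
// falls back to the default registry; an unknown name yields std::nullopt.
std::optional<int> ToyCallFunction(const std::string& name,
                                   const std::vector<int>& args,
                                   const ToyContext* ctx = nullptr) {
  ToyRegistry& registry = (ctx != nullptr && ctx->registry != nullptr)
                              ? *ctx->registry
                              : DefaultToyRegistry();
  auto it = registry.functions.find(name);
  if (it == registry.functions.end()) return std::nullopt;
  return it->second(args);
}
```

Passing a context with a custom registry hides the built-ins in this toy, because lookup happens in exactly one registry.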
If a convenience function that wraps ``CallFunction()`` has been defined, we can call
that function instead. Various compute functions provided by Arrow come with such
convenience functions, for example ``Add`` and ``Subtract``.

.. recipe:: ../code/compute_fn.cc InvokeByConvenienceFunction
   :caption: Use a convenience invocation function to call a compute function
   :dedent: 2

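A convenience function is a thin, typed wrapper around a name-based call. The sketch below keeps its own miniature function table so that it compiles on its own; ``ToyCallByName``, ``ToyAdd`` and ``ToySubtract`` are hypothetical and only imitate the relationship between ``CallFunction()`` and wrappers such as ``Add`` and ``Subtract``.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using ToyFunction = std::function<double(const std::vector<double>&)>;

// Hypothetical name -> function table standing in for a registry.
std::map<std::string, ToyFunction>& ToyFunctionTable() {
  static std::map<std::string, ToyFunction> table = {
      {"add", [](const std::vector<double>& a) { return a.at(0) + a.at(1); }},
      {"subtract", [](const std::vector<double>& a) { return a.at(0) - a.at(1); }},
  };
  return table;
}

// Generic entry point: the caller supplies the name and packs the arguments.
double ToyCallByName(const std::string& name, const std::vector<double>& args) {
  auto& table = ToyFunctionTable();
  auto it = table.find(name);
  if (it == table.end()) {
    throw std::invalid_argument("unknown function: " + name);
  }
  return it->second(args);
}

// Convenience wrappers: fixed name, typed parameters, same underlying call.
double ToyAdd(double left, double right) {
  return ToyCallByName("add", {left, right});
}

double ToySubtract(double left, double right) {
  return ToyCallByName("subtract", {left, right});
}
```

The wrapper fixes the function name and the argument count at compile time, which is the main convenience it offers over packing arguments into a vector by hand.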
Adding a Custom Compute Function
================================

To make a custom compute function available, there are three primary steps:

1. Define kernels for the function (these implement the actual logic).
2. Associate the kernels with a function object.
3. Add the function object to a function registry.

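The three steps can be modelled in a few lines of standalone C++. Everything here (``DoubleKernel``, ``ToyFunction``, ``ToyFunctionRegistry``) is hypothetical; in the diff of compute_fn.cc, the corresponding Arrow code obtains a ``ScalarFunction`` from ``RegisterScalarFnKernels()`` and adds it with ``registry->AddFunction()``. This toy registry refuses a second function with the same name, which is one reasonable design for a name-keyed registry.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Step 1: a kernel implements the actual logic (here: double every value).
using ToyKernel =
    std::function<std::vector<int64_t>(const std::vector<int64_t>&)>;

std::vector<int64_t> DoubleKernel(const std::vector<int64_t>& input) {
  std::vector<int64_t> out;
  out.reserve(input.size());
  for (int64_t v : input) out.push_back(2 * v);
  return out;
}

// Step 2: a named function object that holds the kernel.
struct ToyFunction {
  std::string name;
  ToyKernel kernel;
};

std::shared_ptr<ToyFunction> MakeDoubleFunction() {
  return std::make_shared<ToyFunction>(ToyFunction{"double_values", DoubleKernel});
}

// Step 3: a registry keyed by name that rejects duplicates.
class ToyFunctionRegistry {
 public:
  // Returns false (and leaves the registry unchanged) if the name is taken.
  bool AddFunction(std::shared_ptr<ToyFunction> fn) {
    const std::string name = fn->name;
    return functions_.emplace(name, std::move(fn)).second;
  }

  // Returns nullptr if no function with that name was added.
  std::shared_ptr<ToyFunction> GetFunction(const std::string& name) const {
    auto it = functions_.find(name);
    if (it == functions_.end()) return nullptr;
    return it->second;
  }

 private:
  std::map<std::string, std::shared_ptr<ToyFunction>> functions_;
};
```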
Define Function Kernels
-----------------------

A kernel function is a single function that implements the desired logic of a compute
function. The kernel body may call other functions, but the kernel itself is one
self-contained unit that gets associated with the compute function. (A compute function
can hold several kernels, for example one per supported input type.)

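Because a compute function may hold several kernels, something has to pick the right one for a given input. The following hypothetical sketch keys kernels by the input's type (here, the alternative held by a ``std::variant``) and dispatches on it; ``ToyDispatchingFunction`` and ``MakeMagnitudeFunction`` are illustrative names, not Arrow's dispatch machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical input value: either an int64_t (index 0) or a string (index 1).
using ToyValue = std::variant<int64_t, std::string>;

// A kernel handles one input type and returns an int64_t.
using ToyKernel = std::function<int64_t(const ToyValue&)>;

// Hypothetical compute function holding one kernel per input type.
class ToyDispatchingFunction {
 public:
  // Register `kernel` for inputs whose variant index is `type_index`.
  void AddKernel(std::size_t type_index, ToyKernel kernel) {
    kernels_[type_index] = std::move(kernel);
  }

  // Dispatch on the argument's type; std::nullopt if no kernel matches.
  std::optional<int64_t> Execute(const ToyValue& arg) const {
    auto it = kernels_.find(arg.index());
    if (it == kernels_.end()) return std::nullopt;
    return it->second(arg);
  }

 private:
  std::map<std::size_t, ToyKernel> kernels_;
};

// Example: a "magnitude" function with one kernel per supported input type.
ToyDispatchingFunction MakeMagnitudeFunction() {
  ToyDispatchingFunction fn;
  fn.AddKernel(0, [](const ToyValue& v) {  // int64_t input: absolute value
    const int64_t x = std::get<int64_t>(v);
    return x < 0 ? -x : x;
  });
  fn.AddKernel(1, [](const ToyValue& v) {  // string input: length
    return static_cast<int64_t>(std::get<std::string>(v).size());
  });
  return fn;
}
```

Each kernel stays a single, focused function; only the function object knows which kernel applies to which input type.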
The signature of a kernel function is fairly standardized: it returns a ``Status`` and
takes a context, the input arguments, and a pointer to an output result. The context (a
``KernelContext``) wraps an ``ExecContext`` along with other metadata about the
environment in which the kernel should execute. The input arguments are contained in an
``ExecSpan`` (newly added in place of ``ExecBatch``), which holds non-owning references
to the argument data. Finally, the kernel should set the ``ExecResult`` it is given to
an appropriate ``ArraySpan`` or ``ArrayData`` instance, depending on the ownership
semantics of the kernel's output.

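The shape of that signature can be illustrated without Arrow. In this hypothetical sketch, ``ToyStatus``, ``ToyKernelContext``, ``ToySpan`` and ``ToyResult`` stand in for ``Status``, ``KernelContext``, ``ExecSpan``/``ArraySpan`` and ``ExecResult``; they are simplified assumptions, not the real types. ``SquareKernel`` computes new values and therefore returns owned data, while ``IdentityKernel`` can hand back a non-owning view of its input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for Status: ok flag plus message.
struct ToyStatus {
  bool ok = true;
  std::string message;
  static ToyStatus OK() { return {}; }
  static ToyStatus Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical stand-in for the kernel context: one setting the kernel reads.
struct ToyKernelContext {
  int64_t max_length;
};

// Non-owning view of one input column (like a span of argument data).
struct ToySpan {
  const int64_t* data;
  std::size_t length;
};

// The output is either a non-owning view or freshly allocated, owned data.
struct ToyResult {
  std::variant<ToySpan, std::shared_ptr<std::vector<int64_t>>> value;
};

// Standardized shape: context, inputs, output pointer -> status.
ToyStatus SquareKernel(ToyKernelContext* ctx, const ToySpan& input, ToyResult* out) {
  if (static_cast<int64_t>(input.length) > ctx->max_length) {
    return ToyStatus::Invalid("input too long");
  }
  // New values are computed, so the kernel allocates and owns its output.
  auto owned = std::make_shared<std::vector<int64_t>>();
  owned->reserve(input.length);
  for (std::size_t i = 0; i < input.length; ++i) {
    owned->push_back(input.data[i] * input.data[i]);
  }
  out->value = std::move(owned);
  return ToyStatus::OK();
}

// A pass-through kernel can point at its input without copying anything.
ToyStatus IdentityKernel(ToyKernelContext*, const ToySpan& input, ToyResult* out) {
  out->value = input;
  return ToyStatus::OK();
}
```

Returning a view is cheap but only safe while the input data outlives the result, which is exactly the ownership question the choice between ``ArraySpan`` and ``ArrayData`` encodes.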
.. recipe:: ../code/compute_fn.cc DefineAComputeKernel
   :caption: Define an example compute kernel that uses ScalarHelper from hashing.h to hash input values
   :dedent: 2
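The kernel in that recipe is not shown in this diff. As a rough, hypothetical approximation of an element-wise hashing kernel, the sketch below computes one ``uint64_t`` hash per ``int64_t`` input. It swaps in ``std::hash`` for Arrow's internal ``ScalarHelper`` (the two do not produce the same hash values) and uses a plain ``bool`` in place of ``Status``; ``HashKernel`` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical hashing kernel: one uint64_t hash per int64_t input value.
// std::hash stands in for Arrow's ScalarHelper, so only the element-wise
// shape is illustrated, not Arrow's actual hash values.
bool HashKernel(const int64_t* input, std::size_t length, std::vector<uint64_t>* out) {
  if (input == nullptr && length > 0) return false;  // invalid input view
  out->clear();
  out->reserve(length);
  std::hash<int64_t> hasher;
  for (std::size_t i = 0; i < length; ++i) {
    out->push_back(static_cast<uint64_t>(hasher(input[i])));
  }
  return true;
}
```

Equal inputs always hash to equal outputs, which is the property hash-based kernels (for example, deduplication or grouping) rely on.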
