-
Notifications
You must be signed in to change notification settings - Fork 2
Open
Description
点开文章的朋友们可以先看看讲的是不是你要的效果
初始数据:假设我们的表如下所示,时间戳(ts)我简化成了1到5的数字。
| ts | amount |
|---|---|
| 1 | 10 |
| 2 | 5 |
| 3 | 3 |
| 4 | 10 |
| 5 | 5 |
处理结果:SQL执行完想变成这样
| ts | amount |
|---|---|
| 1 | 10 |
| 2 | 15 |
| 3 | 18 |
| 4 | 22 |
| 5 | 27 |
我先放最终的完整SQL,然后再解释里面让人迷茫的Clickhouse函数。
SELECT ts, arraySum(arraySlice(total, 1, i)) as total_num
FROM (
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)
)
ARRAY JOIN
t as ts,
arrayEnumerate(total) as iSTEP1:groupArray
里面除了SQL的基本语法之外,主要有4个看着就蛋疼的Clickhouse函数。我们从最内层往外看。
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)这个SQL中groupArray比较陌生。这函数简单来说就是把列转成行,执行后的效果就是
| t | total |
|---|---|
| [1,2,3,4,5] | [10,5,3,10,5] |
STEP2:ARRAY JOIN
接下来只看ARRAY JOIN和arrayEnumerate,我把SQL中只保留ARRAY JOIN。
SELECT t, total
FROM (
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)
) ARRAY JOIN tARRAY JOIN的主要作用是把数组展开
| t | total |
|---|---|
| 1 | [10,5,3,10,5] |
| 2 | [10,5,3,10,5] |
| 3 | [10,5,3,10,5] |
| 4 | [10,5,3,10,5] |
| 5 | [10,5,3,10,5] |
STEP3:带alias的ARRAY JOIN
原始SQL中还带了alias有啥用?当我们ARRAY JOIN t的时候,t在SQL中就代表了数组中的元素,而不是数组本身。那如果在查询结果中既包含元素、也包含数组本身怎么办?就得再加上alias了。s
SELECT t, total, ts
FROM (
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)
) ARRAY JOIN t as ts使用了alias的话,t就变成了数组的引用,ts就变成了数组元素的引用
| t | total | ts |
|---|---|---|
| [1,2,3,4,5] | [10,5,3,10,5] | 1 |
| [1,2,3,4,5] | [10,5,3,10,5] | 2 |
| [1,2,3,4,5] | [10,5,3,10,5] | 3 |
| [1,2,3,4,5] | [10,5,3,10,5] | 4 |
| [1,2,3,4,5] | [10,5,3,10,5] | 5 |
STEP4:arrayEnumerate
arrayEnumerate这个方法就和row_number()一样,简单理解就是返回行号。
SELECT t, total, ts, i
FROM (
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)
) ARRAY JOIN
t as ts,
arrayEnumerate(total) as i查询结果如下:
| t | total | ts | i |
|---|---|---|---|
| [1,2,3,4,5] | [10,5,3,10,5] | 1 | 1 |
| [1,2,3,4,5] | [10,5,3,10,5] | 2 | 2 |
| [1,2,3,4,5] | [10,5,3,10,5] | 3 | 3 |
| [1,2,3,4,5] | [10,5,3,10,5] | 4 | 4 |
| [1,2,3,4,5] | [10,5,3,10,5] | 5 | 5 |
STEP5:arraySlice和arraySum
那我们绕这么一圈对咱们的目标有啥帮助呢?憋着急,这时候就要用到arraySlice和arraySum。我们再看看我们现在已经查询出来的结果。
| t | total | ts | i |
|---|---|---|---|
| [1,2,3,4,5] | [10,5,3,10,5] | 1 | 1 |
| [1,2,3,4,5] | [10,5,3,10,5] | 2 | 2 |
| [1,2,3,4,5] | [10,5,3,10,5] | 3 | 3 |
| [1,2,3,4,5] | [10,5,3,10,5] | 4 | 4 |
| [1,2,3,4,5] | [10,5,3,10,5] | 5 | 5 |
大家可能已经发现了,根据我们现有的数据,已经能从单行的数据中计算目标数据:
- 第一行:10 = total[0]
- 第二行:15 = total[0] + total[1]
- 以此类推
咱们利用Clickhouse的函数就可以做这种处理:
arraySlice:截断数组,arraySlice(数组变量名称, 开始下标, 结束下标),注意开始下标从1开始,并且区间左闭右开[开始下标, 结束下标)。arraySum:将数组中所有元素做加和。
所以咱们再看看这个SQL,截断每个total数组,截取到行号i的前一位。然后把数组加起来。
SELECT ts, arraySum(arraySlice(total, 1, i)) as total_num
FROM (
SELECT
groupArray(ts) as t,
groupArray(amount) as total
FROM (
SELECT ts, amount
FROM [表名]
WHERE ts >= [开始时间]
AND ts <= [结束时间]
)
)
ARRAY JOIN
t as ts,
arrayEnumerate(total) as i最终获得结果就是
| ts | total_num |
|---|---|
| 1 | 10 |
| 2 | 15 |
| 3 | 18 |
| 4 | 22 |
| 5 | 27 |
结尾
做按时间累加这个SQL第一次写的话我觉得还是挺绕的,大家可以把Clickhouse函数和细节放一边,先记忆查询结果的处理过程。
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels
