2 changes: 1 addition & 1 deletion web/package.json
@@ -22,9 +22,9 @@
},
"dependencies": {
"@aws-sdk/client-s3": "^3.828.0",
"@aws-sdk/client-sts": "^3.907.0",
"@aws-sdk/lib-storage": "^3.828.0",
"@aws-sdk/s3-request-presigner": "^3.828.0",
"@aws-sdk/client-sts": "^3.907.0",
"@babel/runtime": "7.26.0",
"@codemirror/autocomplete": "6.18.3",
"@codemirror/lang-json": "6.0.1",
Binary file not shown.
Binary file not shown.
2 changes: 1 addition & 1 deletion web/scripts/update-duckdb-extensions.ts
@@ -5,7 +5,7 @@ import https from "https";

const DUCKDB_ENGINE_VERSION = "v1.4.3";

-const DEFAULT_EXTS = ["parquet", "json", "httpfs"];
+const DEFAULT_EXTS = ["parquet", "json", "httpfs", "iceberg", "avro"];

const OUTPUT_DIR = path.join(
process.cwd(),
174 changes: 174 additions & 0 deletions web/src/core/adapters/icebergApi/icebergApi.ts
@@ -0,0 +1,174 @@
import type { IcebergApi } from "core/ports/IcebergApi";
import type { SqlOlap } from "core/ports/SqlOlap";
import { id } from "tsafe/id";

export type IcebergCatalogConfig = {
name: string;
warehouse: string;
endpoint: string;
/**
* Returns a fresh bearer token before each request.
* Each catalog can have its own auth provider (different OIDC clients,
* different realms, etc.). Returns undefined for public catalogs.
*/
getAccessToken: () => Promise<string | undefined>;
};

/**
* Creates one IcebergApi instance that manages multiple catalogs.
* All catalog configs (endpoint, warehouse, token provider) are fixed at
* creation time — callers select which catalog to use via the `catalog` param.
*/
export function createDuckDbIcebergApi(params: {
sqlOlap: SqlOlap;
catalogs: IcebergCatalogConfig[];
}): IcebergApi {
const { sqlOlap, catalogs } = params;

function secretName(catalogName: string): string {

SonarCloud check warning on line 28 of web/src/core/adapters/icebergApi/icebergApi.ts: Move function 'secretName' to the outer scope.
See more on https://sonarcloud.io/project/issues?id=InseeFrLab_onyxia&issues=AZy-m4epvjWlMhfwQLvs&open=AZy-m4epvjWlMhfwQLvs&pullRequest=1059
return `iceberg_${catalogName}`;
}

// Eagerly install the iceberg extension, create secrets and attach all
// catalogs in a single connection so everything is ready before the first query.
const prDb = (async () => {
const { db } = await sqlOlap.getConfiguredAsyncDuckDb();

const conn = await db.connect();
try {
await conn.query("INSTALL iceberg;\nLOAD iceberg;");

for (const catalogConfig of catalogs) {
const token = await catalogConfig.getAccessToken();

if (token !== undefined) {
await conn.query(
[
`CREATE OR REPLACE SECRET "${secretName(catalogConfig.name)}" (`,
` TYPE iceberg,`,
` TOKEN '${token}'`,
");"
].join("\n")
);
Comment on lines +41 to +52
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd web && find . -name "icebergApi.ts" -type f

Repository: InseeFrLab/onyxia

Length of output: 105


🏁 Script executed:

cat -n web/src/core/adapters/icebergApi/icebergApi.ts | head -80

Repository: InseeFrLab/onyxia

Length of output: 3450


🏁 Script executed:

# Search for any token refresh or expiration handling in the codebase
rg -i "token.*refresh|refresh.*token|token.*expir|expir.*token" --type ts --type tsx

Repository: InseeFrLab/onyxia

Length of output: 88


🏁 Script executed:

# Look for getAccessToken implementation
rg -A 10 "getAccessToken" --type ts --type tsx

Repository: InseeFrLab/onyxia

Length of output: 88


🌐 Web query:

DuckDB WASM CREATE OR REPLACE SECRET support token refresh dynamic update

💡 Result:

  • CREATE OR REPLACE SECRET itself is supported in DuckDB SQL, but whether it’s useful in DuckDB-Wasm depends on whether the relevant secret types are available (they’re registered by extensions such as httpfs/aws). (duckdb.org)
  • In official DuckDB-Wasm builds, httpfs is not available (Wasm uses a separate browser-oriented HTTP implementation), so the common secret types provided by httpfs (e.g., s3, http, huggingface, etc.) are not available out of the box in DuckDB-Wasm. (duckdb.org)

Token refresh / dynamic update

  • DuckDB’s cloud-credential refresh is implemented as extension-driven secret refresh (e.g., aws supports REFRESH auto for credential refresh; httpfs has a refresh mechanism that can recreate/replace secrets when invalid). (duckdb.org)
  • For browser (Wasm) scenarios, there is no general “OAuth refresh token flow” built into DuckDB-Wasm secrets. The practical pattern is: refresh the token in JavaScript (or your host app), then re-run CREATE OR REPLACE SECRET to overwrite the stored token value.
  • To make that overwrite less error-prone / avoid embedding tokens in SQL text, DuckDB supports expressions in CREATE SECRET (e.g., set a variable, then create/replace the secret from getvariable(...); update by changing the variable and running CREATE OR REPLACE SECRET again). (duckdb.org)

Implication: if you need s3/http/bearer-token secrets with refresh in the browser, you generally must either (a) handle refresh outside DuckDB-Wasm and replace the secret manually, or (b) use a custom DuckDB-Wasm build/extension strategy that actually provides those secret types and refresh hooks (since the stock Wasm distribution doesn’t include httpfs). (duckdb.org)


Implement token refresh before queries to honor the interface contract.

The IcebergCatalogConfig.getAccessToken() interface documents that it returns "a fresh bearer token before each request", but the current implementation calls it only once during initialization. For long-running browser sessions, the token stored in the secret will expire while queries continue using the stale value, causing authentication failures.

Refresh the secrets before queries by calling getAccessToken() again and re-running CREATE OR REPLACE SECRET. DuckDB-Wasm supports this pattern for dynamic token updates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/core/adapters/icebergApi/icebergApi.ts` around lines 41 - 52, The
current loop only calls catalogConfig.getAccessToken() once and writes it to the
secret but must refresh the token before each query to avoid expiration; modify
the code that runs queries (the logic that uses conn.query against Iceberg
catalogs) to call catalogConfig.getAccessToken() again immediately before
executing any query and then re-run conn.query with the CREATE OR REPLACE SECRET
payload using secretName(catalogConfig.name) and the fresh token (same SQL
pattern shown) so the secret is updated dynamically prior to every request.
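
The refresh described in this comment could look roughly like the sketch below. It is not the PR's implementation: `QueryConnection`, `CatalogAuth`, `buildSecretSql` and `refreshIcebergSecret` are hypothetical names introduced for illustration, and the SQL follows the `CREATE OR REPLACE SECRET` pattern already used in the diff. Whether an already attached catalog picks up the replaced secret without re-attaching is an assumption that would need to be verified against DuckDB-Wasm.

```typescript
// Minimal stand-in for a DuckDB-Wasm connection; only `query` is needed here.
type QueryConnection = { query: (sql: string) => Promise<unknown> };

// Subset of IcebergCatalogConfig from the diff that matters for refreshing.
type CatalogAuth = {
    name: string;
    getAccessToken: () => Promise<string | undefined>;
};

// Doubles single quotes so a value is safe inside a SQL string literal.
function escapeSqlString(value: string): string {
    return value.replace(/'/g, "''");
}

// Builds the statement that overwrites the secret of one catalog.
function buildSecretSql(catalogName: string, token: string): string {
    // Double any double quote so the name stays a single quoted identifier.
    const secretName = `iceberg_${catalogName}`.replace(/"/g, '""');
    return [
        `CREATE OR REPLACE SECRET "${secretName}" (`,
        `    TYPE iceberg,`,
        `    TOKEN '${escapeSqlString(token)}'`,
        ");"
    ].join("\n");
}

// Hypothetical helper, meant to be awaited right before each catalog query.
// Resolves to false for public catalogs (no token, so nothing to refresh).
async function refreshIcebergSecret(
    conn: QueryConnection,
    catalog: CatalogAuth
): Promise<boolean> {
    const token = await catalog.getAccessToken();
    if (token === undefined) {
        return false;
    }
    await conn.query(buildSecretSql(catalog.name, token));
    return true;
}
```

In `fetchTablePreview`, a call such as `await refreshIcebergSecret(conn, catalogConfig)` would sit just before the `SELECT`. `listAllTables` spans every catalog, so it would presumably loop over `catalogs` first.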

}

const attachLines = [
`ATTACH '${catalogConfig.warehouse}' AS "${catalogConfig.name}" (`,
` TYPE iceberg,`,
...(token !== undefined

SonarCloud check warning on line 58 of web/src/core/adapters/icebergApi/icebergApi.ts: Unexpected negated condition.
See more on https://sonarcloud.io/project/issues?id=InseeFrLab_onyxia&issues=AZy-m4epvjWlMhfwQLvt&open=AZy-m4epvjWlMhfwQLvt&pullRequest=1059
? [` SECRET '${secretName(catalogConfig.name)}',`]
: []),
` ENDPOINT '${catalogConfig.endpoint}'`,
");"
];
await conn.query(attachLines.join("\n"));
Comment on lines +45 to +64
Contributor

⚠️ Potential issue | 🟠 Major

SQL injection risk from string interpolation.

Values are interpolated directly into SQL strings without escaping:

  • Line 49: TOKEN '${token}'
  • Line 56: ATTACH '${catalogConfig.warehouse}'
  • Line 61: ENDPOINT '${catalogConfig.endpoint}'

While these values originate from controlled sources (OIDC tokens, region config), a token or config value containing a single quote (') would break the query or enable injection. Consider escaping single quotes or using parameterized queries if DuckDB WASM supports them.

🛡️ Proposed minimal escaping
+function escapeSqlString(value: string): string {
+    return value.replace(/'/g, "''");
+}
+
 if (token !== undefined) {
     await conn.query(
         [
             `CREATE OR REPLACE SECRET "${secretName(catalogConfig.name)}" (`,
             `    TYPE iceberg,`,
-            `    TOKEN '${token}'`,
+            `    TOKEN '${escapeSqlString(token)}'`,
             ");"
         ].join("\n")
     );
 }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/core/adapters/icebergApi/icebergApi.ts` around lines 45 - 64, The SQL
built in the iceberg catalog code interpolates unescaped values into conn.query
strings (see conn.query usage and attachLines creation around secretName, token,
catalogConfig.warehouse, and catalogConfig.endpoint); to fix, stop inserting raw
values and either switch to parameterized queries if the DuckDB/WASM client
supports them or else implement proper SQL escaping by replacing any single
quote ' with two single quotes '' for every interpolated value (token,
catalogConfig.warehouse, catalogConfig.endpoint and any value passed into
secretName) before building the query string so injected quotes cannot break or
alter the SQL.
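
The proposed diff above only covers the `TOKEN` value. As a rough sketch of how the same escaping could extend to the `ATTACH` statement, the builder below quotes every interpolated value. `AttachParams`, `sqlString`, `sqlIdentifier` and `buildAttachSql` are hypothetical names, not part of the PR, and the statement shape is copied from the diff rather than from DuckDB documentation.

```typescript
// Inputs for one ATTACH statement; secretName is undefined for public catalogs.
type AttachParams = {
    name: string;
    warehouse: string;
    endpoint: string;
    secretName: string | undefined;
};

// Wraps a value as a SQL string literal, doubling embedded single quotes.
function sqlString(value: string): string {
    return `'${value.replace(/'/g, "''")}'`;
}

// Wraps a value as a quoted SQL identifier, doubling embedded double quotes.
function sqlIdentifier(value: string): string {
    return `"${value.replace(/"/g, '""')}"`;
}

// Builds the ATTACH statement with every interpolated value escaped.
function buildAttachSql(params: AttachParams): string {
    const { name, warehouse, endpoint, secretName } = params;
    const options = [
        "TYPE iceberg",
        // Positive condition first, which also avoids the negated ternary.
        ...(secretName === undefined ? [] : [`SECRET ${sqlString(secretName)}`]),
        `ENDPOINT ${sqlString(endpoint)}`
    ];
    return [
        `ATTACH ${sqlString(warehouse)} AS ${sqlIdentifier(name)} (`,
        options.map(option => `    ${option}`).join(",\n"),
        ");"
    ].join("\n");
}
```

Keeping the statement construction in a pure function like this makes the escaping testable without a database connection. Parameterized queries would remain the preferable option if DuckDB-Wasm accepts them for `ATTACH` and `CREATE SECRET`, which this sketch does not assume.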

}
} finally {
await conn.close();
}

return db;
})();

return {
listAllTables: async () => {
let db: import("@duckdb/duckdb-wasm").AsyncDuckDB;
try {
db = await prDb;
} catch {
return id<IcebergApi.ListAllTablesResult.Failed>({
errorCause: "network error"
});
}

const conn = await db.connect();
try {
const result = await conn.query(
`SELECT table_catalog AS database, table_schema AS schema, table_name AS name FROM information_schema.tables;`
);

const tables: IcebergApi.TableEntry[] = result.toArray().map(row => ({
catalog: String(row["database"]),
namespace: String(row["schema"]),
name: String(row["name"])
}));

return id<IcebergApi.ListAllTablesResult.Success>({ tables });
} catch (e) {
const cause = classifyError(e);
return id<IcebergApi.ListAllTablesResult.Failed>({
errorCause:
cause === "unauthorized" ? "unauthorized" : "network error"
});
} finally {
await conn.close();
}
},

fetchTablePreview: async ({ catalog: catalogName, namespace, table, limit }) => {
const catalogConfig = catalogs.find(c => c.name === catalogName);

if (catalogConfig === undefined) {
return id<IcebergApi.FetchTablePreviewResult.Failed>({
errorCause: "network error"
});
}

let db: import("@duckdb/duckdb-wasm").AsyncDuckDB;
try {
db = await prDb;
} catch {
return id<IcebergApi.FetchTablePreviewResult.Failed>({
errorCause: "network error"
});
}

const conn = await db.connect();
try {
const result = await conn.query(
`SELECT * FROM "${catalogName}"."${namespace}"."${table}" LIMIT ${limit};`
);
Comment on lines +128 to +130
Contributor

⚠️ Potential issue | 🟡 Minor

SQL injection risk in table preview query.

catalogName, namespace, and table are interpolated directly into the query. catalogName is validated against the known catalogs (line 109), and namespace and table come from listAllTables results, which should be safe. Even so, a defense-in-depth approach would escape or validate these identifiers.

-const result = await conn.query(
-    `SELECT * FROM "${catalogName}"."${namespace}"."${table}" LIMIT ${limit};`
-);
+const escapedCatalog = catalogName.replace(/"/g, '""');
+const escapedNamespace = namespace.replace(/"/g, '""');
+const escapedTable = table.replace(/"/g, '""');
+const result = await conn.query(
+    `SELECT * FROM "${escapedCatalog}"."${escapedNamespace}"."${escapedTable}" LIMIT ${limit};`
+);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/src/core/adapters/icebergApi/icebergApi.ts` around lines 128 - 130, The
SELECT built for previewing tables interpolates catalogName, namespace, and
table directly into conn.query (in icebergApi.ts) creating a potential SQL
injection point; validate or escape namespace and table (which originate from
listAllTables) before use—e.g., enforce a strict identifier regex or whitelist
allowed names for namespace and table, ensure limit is an integer, and only then
interpolate with proper quoted identifiers when calling conn.query; update the
code path that constructs the query (the conn.query call) to perform these
validations/escapes first.
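
One way to combine the identifier escaping and the integer check on `limit` mentioned above is sketched here. `PreviewParams`, `quoteIdentifier` and `buildPreviewSql` are hypothetical names. The sketch quotes identifiers instead of enforcing a strict regex, on the assumption that Iceberg namespaces and table names may legitimately contain characters such as dots or hyphens.

```typescript
// Inputs of the preview query, mirroring the fetchTablePreview parameters.
type PreviewParams = {
    catalog: string;
    namespace: string;
    table: string;
    limit: number;
};

// Wraps a value as a quoted SQL identifier, doubling embedded double quotes.
function quoteIdentifier(value: string): string {
    return `"${value.replace(/"/g, '""')}"`;
}

// Builds the preview SELECT; throws if limit is not a non-negative integer.
function buildPreviewSql(params: PreviewParams): string {
    const { catalog, namespace, table, limit } = params;
    if (!Number.isInteger(limit) || limit < 0) {
        throw new RangeError(`limit must be a non-negative integer, got ${limit}`);
    }
    const qualifiedName = [catalog, namespace, table].map(quoteIdentifier).join(".");
    return `SELECT * FROM ${qualifiedName} LIMIT ${limit};`;
}
```

With this shape, a table named `t"x` becomes the identifier `"t""x"`, and a fractional or `NaN` limit is rejected before any SQL is built.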


const columns: IcebergApi.Column[] = (
result.schema.fields as {
name: string;
type: { toString(): string };
nullable: boolean;
}[]
).map((field, index) => ({
fieldId: index,
name: field.name,
rawType: field.type.toString(),
isRequired: !field.nullable
}));

const rows: Record<string, unknown>[] = result
.toArray()
.map(row => Object.fromEntries(Object.entries(row)));

return id<IcebergApi.FetchTablePreviewResult.Success>({ columns, rows });
} catch (e) {
return id<IcebergApi.FetchTablePreviewResult.Failed>({
errorCause: classifyError(e)
});
} finally {
await conn.close();
}
}
};
}

// ---------------------------------------------------------------------------
// Error classification
// ---------------------------------------------------------------------------

function classifyError(e: unknown): "unauthorized" | "table not found" | "network error" {
const msg = (e instanceof Error ? e.message : String(e)).toLowerCase();
if (msg.includes("401") || msg.includes("unauthorized") || msg.includes("403")) {
return "unauthorized";
}
if (msg.includes("not found") || msg.includes("does not exist")) {
return "table not found";
}
return "network error";
}
1 change: 1 addition & 0 deletions web/src/core/adapters/icebergApi/index.ts
@@ -0,0 +1 @@
export * from "./icebergApi";
6 changes: 6 additions & 0 deletions web/src/core/adapters/onyxiaApi/ApiTypes.ts
@@ -126,6 +126,12 @@ export type ApiTypes = {
}
))[];
}>;
iceberg?: ArrayOrNot<{
warehouse: string;
endpoint: string;
catalog: string;
oidcConfiguration?: Partial<ApiTypes.OidcConfiguration>;
}>;
};
vault?: {
URL: string;
24 changes: 24 additions & 0 deletions web/src/core/adapters/onyxiaApi/onyxiaApi.ts
@@ -315,6 +315,30 @@
s3ConfigCreationFormDefaults
};
})(),
iceberg: (() => {
const icebergConfig_api = (() => {

SonarCloud check failure on line 319 of web/src/core/adapters/onyxiaApi/onyxiaApi.ts: Refactor this code to not nest functions more than 4 levels deep.
See more on https://sonarcloud.io/project/issues?id=InseeFrLab_onyxia&issues=AZy-m4eHvjWlMhfwQLvp&open=AZy-m4eHvjWlMhfwQLvp&pullRequest=1059
const value = apiRegion.data?.iceberg;

if (value === undefined) {
return [];
}

if (value instanceof Array) {

SonarCloud check warning on line 326 of web/src/core/adapters/onyxiaApi/onyxiaApi.ts: Avoid using `instanceof` for type checking as it can lead to unreliable results.
See more on https://sonarcloud.io/project/issues?id=InseeFrLab_onyxia&issues=AZy-m4eHvjWlMhfwQLvq&open=AZy-m4eHvjWlMhfwQLvq&pullRequest=1059
return value;
}

return [value];
})();
return icebergConfig_api.map(icebergConfig => ({

SonarCloud check failure on line 332 of web/src/core/adapters/onyxiaApi/onyxiaApi.ts: Refactor this code to not nest functions more than 4 levels deep.
See more on https://sonarcloud.io/project/issues?id=InseeFrLab_onyxia&issues=AZy-m4eHvjWlMhfwQLvr&open=AZy-m4eHvjWlMhfwQLvr&pullRequest=1059
warehouse: icebergConfig.warehouse,
endpoint: icebergConfig.endpoint,
catalog: icebergConfig.catalog,
oidcParams:
apiTypesOidcConfigurationToOidcParams_Partial(
icebergConfig.oidcConfiguration
)
}));
})(),
allowedURIPatternForUserDefinedInitScript:
apiRegion.services.allowedURIPattern,
kafka: (() => {
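
The SonarCloud annotations on this hunk flag the nested IIFEs and the `instanceof Array` check. A possible way to address both is a small module-level helper, sketched below. `toArray` is a hypothetical name, and the local `ArrayOrNot` type is an assumption about the shape of the one used in `ApiTypes`.

```typescript
// Assumed shape of the ArrayOrNot helper type used in ApiTypes.
type ArrayOrNot<T> = T | T[];

// Normalizes an optional "single value or array" field to an array.
// Array.isArray also works for arrays created in another realm (for example
// an iframe), which is where `instanceof Array` can give unreliable results.
function toArray<T>(value: ArrayOrNot<T> | undefined): T[] {
    if (value === undefined) {
        return [];
    }
    return Array.isArray(value) ? value : [value];
}
```

With such a helper, the `iceberg` entry could start from `toArray(apiRegion.data?.iceberg)` and map over the result directly, which removes one level of function nesting.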