add functionality to prefect by LizzAlice · Pull Request #215 · MaRDI4NFDI/docker-importer

LizzAlice · 2026-03-23T14:03:28Z

Connect Wikidata update functionality to Prefect

Summary by CodeRabbit

New Features
- The Wikidata update endpoint now accepts multiple QIDs (list or comma/space-separated) and processes updates asynchronously, returning job tracking details with HTTP 202 (Accepted).
Bug Fixes
- Requests missing QIDs now return HTTP 400 with an error message.
- Failures starting the background job return HTTP 500 with an error and details.

coderabbitai · 2026-03-23T14:03:45Z

📝 Walkthrough

The POST /update/wikidata endpoint was refactored to queue an asynchronous Prefect workflow for batch QIDs. A new service trigger starts the Prefect deployment and returns 202; Prefect tasks perform per-QID updates and report individual outcomes.

Changes

Cohort / File(s)	Summary
Flask Endpoint `flask_app/app.py`	Renamed handler to `update_wikidata_async()`. Accepts `qids` (list or comma/space-separated string), validates presence (400 on missing), calls `trigger_update_wikidata_async()`, returns Prefect payload with HTTP 202, and returns HTTP 500 with error details on trigger failure.
Prefect Workflow `prefect_workflow/prefect_mardi_importer.py`	Added `update_wikidata_batch(qids)` Prefect task that sets env vars, instantiates `WikidataImporter`, iterates QIDs calling `update_entities()`, captures per-QID status (`success`/`not_updated`/`error`) and aggregates `all_imported` flag; added `elif action == "update/wikidata"` branch to run this task.
Service Layer `services/import_service.py`	Added `trigger_update_wikidata_async(qids, workflow_name=DEFAULT_WORKFLOW_NAME)` which calls Prefect `run_deployment` with `{"action":"update/wikidata","qids": qids}`, `timeout=0`, and returns an "accepted" payload containing `deployment_id`, `id`, `flow_id`, `qids_queued`, and message.
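
To make the "list or comma/space-separated string" handling from the table concrete, here is a minimal sketch of how such input could be normalized. It is not the code from this PR: the helper name `normalize_qids` is hypothetical, and the rule that empty input yields an empty list (which the endpoint would then answer with HTTP 400) is an assumption based on the summary above.

```python
import re


def normalize_qids(raw):
    """Return a clean list of QID strings (hypothetical helper, not from the PR).

    Accepts a list of QIDs or one string with comma/space-separated QIDs.
    """
    if raw is None:
        return []
    if isinstance(raw, str):
        # Split on any run of commas and/or whitespace.
        parts = re.split(r"[,\s]+", raw)
    else:
        parts = [str(item) for item in raw]
    # Drop empty fragments left over from leading/trailing separators.
    return [part.strip() for part in parts if part and part.strip()]
```

An empty result is the case in which the endpoint would reject the request as missing `qids`.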

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Flask as Flask Endpoint
    participant Service as Import Service
    participant Prefect as Prefect Deployment
    participant Task as update_wikidata_batch Task
    participant Importer as WikidataImporter

    Client->>Flask: POST /update/wikidata {qids: [...]}
    Flask->>Flask: parse & validate qids
    Flask->>Service: trigger_update_wikidata_async(qids)
    Service->>Prefect: run_deployment(action: "update/wikidata", qids)
    Prefect-->>Service: {deployment_id, id, flow_id, ...}
    Service-->>Flask: accepted payload
    Flask-->>Client: 202 + Prefect payload

    Note over Prefect,Task: async background processing
    Prefect->>Task: invoke update_wikidata_batch(qids)
    Task->>Task: set env & instantiate Importer
    loop for each qid
        Task->>Importer: update_entities(qid)
        Importer-->>Task: result / error
        Task->>Task: record per-QID status
    end
    Task-->>Prefect: {results: {...}, all_imported: bool}
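
The loop in the diagram can be sketched as below, purely as an illustration under stated assumptions. `update_one` is a hypothetical stand-in for the call to `WikidataImporter.update_entities()`, whose real signature and return value are not shown in this thread; the flag name `all_imported` follows the diagram, and counting `not_updated` as a non-success is an assumption.

```python
from typing import Callable


def update_batch(qids: list[str], update_one: Callable[[str], bool]) -> dict:
    """Update each QID independently and record a per-QID status."""
    results = {}
    for qid in qids:
        try:
            updated = update_one(qid)
        except Exception as exc:
            # One failing QID must not abort the rest of the batch.
            results[qid] = {"status": "error", "error": str(exc)}
        else:
            results[qid] = {"status": "success" if updated else "not_updated"}
    # Assumption: only "success" counts toward the aggregate flag.
    all_ok = all(entry["status"] == "success" for entry in results.values())
    return {"results": results, "all_imported": all_ok}
```

Note that with an empty `qids` list the aggregate flag is `True`, so an upstream presence check (as the endpoint performs) remains necessary.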

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

fix import #194: Modifies the same Prefect workflow module; touches importer import paths that interact with the new task additions.
add doi asynch #193: Adds async import flows in the same Prefect workflow area; conceptually related to the new "update/wikidata" branch.
add wikidata update capability to the flask api #212: Previously changed the /update/wikidata Flask endpoint to a synchronous importer-based handler; this PR replaces that approach with an async Prefect-based flow.

Suggested reviewers

eloiferrer

Poem

🐰 Hopping code across the yard so spry,

QIDs queued up, they leap and fly,
Prefect wakes, the importer hums,
Background work now gently drums,
202 returns — off they go, bye-bye!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title is overly vague and generic; it uses non-descriptive language ('add functionality to prefect') that doesn't convey the specific change (connecting Wikidata updates to Prefect async workflows).	Revise the title to be more specific, such as 'Make Wikidata updates async via Prefect workflow' or 'Connect Wikidata updates to Prefect async processing'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

services/import_service.py (1)
283-310: Docstring says "import" but function performs "update".

Line 287 states "Trigger a Prefect Wikidata import workflow" but this function triggers an update workflow. The docstring should be corrected to match the function's purpose.

Additionally, this function is nearly identical to trigger_wikidata_async (lines 254-281), differing only in the action parameter and response message. Consider extracting a shared helper to reduce duplication.
🔧 Proposed docstring fix
 def trigger_update_wikidata_async(
     qids: list[str],
     workflow_name: str = DEFAULT_WORKFLOW_NAME,
 ) -> dict:
-    """Trigger a Prefect Wikidata import workflow.
+    """Trigger a Prefect Wikidata update workflow.
 
     Args:
         qids: List of Wikidata QIDs to update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/import_service.py` around lines 283 - 310, The docstring for
trigger_update_wikidata_async is incorrect and should say it triggers a Prefect
Wikidata update workflow; update the docstring text in
trigger_update_wikidata_async to reflect "update" rather than "import", and make
the function share logic with trigger_wikidata_async by extracting a helper
(e.g., _trigger_wikidata_deployment or run_wikidata_deployment) that accepts
parameters like action, qids, workflow_name and returns the same payload
structure so both trigger_update_wikidata_async and trigger_wikidata_async call
that helper (preserve unique fields like the response message by allowing a
message param).
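
One possible shape for the suggested shared helper is sketched below; it is an illustration, not the refactoring that was merged. The `runner` parameter is an assumption added so the sketch runs without a Prefect server (in the service it would be Prefect's `run_deployment`); the `status` field, the message text, the placeholder workflow name, and returning `qids_queued` as a list rather than a count are likewise assumptions.

```python
from types import SimpleNamespace
from typing import Callable

# Placeholder only; the real DEFAULT_WORKFLOW_NAME lives in services/import_service.py.
DEFAULT_WORKFLOW_NAME = "example-flow/example-deployment"


def _trigger_wikidata_deployment(
    action: str,
    qids: list[str],
    message: str,
    runner: Callable[..., object],
    workflow_name: str = DEFAULT_WORKFLOW_NAME,
) -> dict:
    """Start a deployment run and build the 'accepted' payload (hypothetical)."""
    flow_run = runner(
        name=workflow_name,
        parameters={"action": action, "qids": qids},
        timeout=0,  # return immediately instead of waiting for the run to finish
    )
    return {
        "status": "accepted",  # assumed field name
        "message": message,
        "deployment_id": str(flow_run.deployment_id),
        "id": str(flow_run.id),
        "flow_id": str(flow_run.flow_id),
        "qids_queued": list(qids),  # assumed to be the list; could be a count
    }


def trigger_update_wikidata_async(qids, runner, workflow_name=DEFAULT_WORKFLOW_NAME):
    """Trigger a Prefect Wikidata update workflow."""
    return _trigger_wikidata_deployment(
        "update/wikidata", qids, "Wikidata update queued", runner, workflow_name
    )


def fake_runner(**kwargs):
    """Stand-in for Prefect's run_deployment, returning an object with the same IDs."""
    return SimpleNamespace(id="run-1", flow_id="flow-1", deployment_id="dep-1")
```

`trigger_wikidata_async` would call the same helper with its own action string and message, neither of which is shown in this thread.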

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flask_app/app.py`:
- Around line 212-226: Fix the typos and inconsistent wording in
update_wikidata_async: correct the docstring to remove the stray backtick and
'a' so it reads that the JSON expects a "qids" field (and remove trailing extra
whitespace), and change the error response message from "missing qid" to
"missing qids" so it matches the normalized variable qids and the docstring.
Ensure the docstring clearly states qids can be a list or comma/space-separated
string.

In `@prefect_workflow/prefect_mardi_importer.py`:
- Around line 259-262: The failure check is looking for
result.get("all_imported", True) but update_wikidata_batch returns
"all_updated", so change the flow's check to use the correct key (e.g.,
result.get("all_updated", False)) or standardize to a common key (e.g., "all_ok"
or "success") across batch tasks; update the logic in the flow where result is
inspected (the block handling update_wikidata_batch and the similar checks at
lines referenced) so it reads the matching key returned by update_wikidata_batch
(or adapt update_wikidata_batch to return the standardized key) and ensure the
default value causes the flow to fail when the key is missing/false.
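
The effect of the key mismatch can be reproduced in isolation. The snippet below is a sketch with a made-up result dictionary, not the flow code itself; `batch_succeeded` is a hypothetical helper name.

```python
def batch_succeeded(result: dict, key: str) -> bool:
    """Treat a missing key as failure so a renamed flag cannot hide errors."""
    return bool(result.get(key, False))


# A result shaped like the one described for update_wikidata_batch,
# with one failed QID (contents invented for illustration).
result = {"results": {"Q42": "error"}, "all_updated": False}

# The check described in the comment: the key is absent, so the default wins
# and the failed batch looks successful.
lenient = result.get("all_imported", True)

# Reading the key that is actually present reports the failure.
strict = batch_succeeded(result, "all_updated")
```

Here `lenient` is `True` while `strict` is `False`, which is why a default of `False` (or a standardized key) is the safer choice.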

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e17f789-863a-4e29-a193-6a40c89a8ca0

📥 Commits

Reviewing files that changed from the base of the PR and between fcf876a and 805a405.

📒 Files selected for processing (3)

flask_app/app.py
prefect_workflow/prefect_mardi_importer.py
services/import_service.py

coderabbitai

♻️ Duplicate comments (1)

flask_app/app.py (1)

212-219: ⚠️ Potential issue | 🟡 Minor

Docstring is outdated and contains typos.

- Line 213: "Update person profile" is too narrow; this now handles batch QIDs for any Wikidata entity.
- Line 215: Double comma ", ," before "which".
- Line 216: Trailing whitespace.
- Lines 218-219: The return description mentions "QID that was updated or an empty list", but the endpoint now returns a Prefect trigger payload with HTTP 202.

✏️ Proposed fix

 @app.post("/update/wikidata")
 def update_wikidata_async():
-    """Update person profile from wikidata; this is async and happens in Prefect
+    """Update Wikidata entities asynchronously via Prefect.
 
-    Expects JSON with a ``qids`` field, , which may be a list or a string of
-    comma/space-separated Wikidata QIDs. 
+    Expects JSON with a ``qids`` field, which may be a list or a string of
+    comma/space-separated Wikidata QIDs.
 
     Returns:
-        Response, either the QID that was updated or an empty list
+        Flask response tuple with Prefect flow metadata and HTTP 202.
     """

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@flask_app/app.py` around lines 212 - 219, The docstring for
update_wikidata_async is outdated and contains typos; update it to describe that
the function handles batch Wikidata QIDs for any entity (not just person), fix
the punctuation/whitespace (remove the double comma and trailing space), clarify
the input format ("qids" can be a list or a comma/space-separated string of
QIDs), and replace the return description with the current behavior stating the
endpoint triggers a Prefect flow and returns a Prefect trigger payload with HTTP
202; keep all changes inside the update_wikidata_async docstring.

🧹 Nitpick comments (1)

flask_app/app.py (1)

226-226: Minor: Log message uses singular "QID" for a list.

For consistency and clarity, consider using "QIDs" (plural) in the log message.

✏️ Proposed fix

-    log.info("QID: %s", qids)
+    log.info("QIDs: %s", qids)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@flask_app/app.py` at line 226, The log statement using log.info("QID: %s",
qids) is misleading because qids is a list; update the message to use the plural
form (e.g., "QIDs") so it accurately reflects the data. Locate the log.info call
in flask_app/app.py that references qids and change the message text from "QID:
%s" to "QIDs: %s" (preserving the qids variable and formatting) to improve
clarity.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11bed487-3786-4694-8a4b-c86e1db839c1

📥 Commits

Reviewing files that changed from the base of the PR and between 805a405 and 6e7c382.

📒 Files selected for processing (2)

flask_app/app.py
prefect_workflow/prefect_mardi_importer.py

timconrad

looks good

add functionality to prefect

805a405

LizzAlice requested a review from timconrad March 23, 2026 14:03

coderabbitai bot reviewed Mar 23, 2026

fix typos

6e7c382

coderabbitai bot reviewed Mar 23, 2026

timconrad approved these changes Mar 23, 2026

timconrad merged commit 2875be8 into main Mar 23, 2026
2 checks passed

timconrad merged 2 commits into main from make_update_async
