
add functionality to prefect #215

Merged
timconrad merged 2 commits into main from make_update_async
Mar 23, 2026

Conversation

@LizzAlice
Contributor

@LizzAlice LizzAlice commented Mar 23, 2026

Connect Wikidata update functionality to Prefect.

Summary by CodeRabbit

  • New Features
    • The Wikidata update endpoint now accepts multiple QIDs (as a list or a comma/space-separated string) and processes updates asynchronously, returning job tracking details with HTTP 202 (Accepted).
  • Bug Fixes
    • Requests missing QIDs now return HTTP 400 with an error message.
    • Failures starting the background job return HTTP 500 with an error and details.
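A minimal, framework-agnostic sketch of the request handling summarized above. `parse_qids` and `handle_update_request` are hypothetical names, the error strings are illustrative, and `trigger` stands in for `trigger_update_wikidata_async`; in the actual Flask route the returned `(body, status)` tuple would be handed back to Flask as the response.

```python
from __future__ import annotations

import re
from typing import Any, Callable

# One or more commas and/or whitespace characters separate QIDs in a string.
_SEPARATORS = re.compile(r"[,\s]+")


def parse_qids(raw: Any) -> list[str]:
    """Normalize a list or a comma/space-separated string into a list of QIDs."""
    if isinstance(raw, str):
        items = _SEPARATORS.split(raw)
    elif isinstance(raw, (list, tuple)):
        items = [str(item) for item in raw]
    else:
        # None or an unexpected type is treated as "no QIDs given".
        return []
    # Strip whitespace and drop empty fragments left by leading/trailing separators.
    return [item.strip() for item in items if item.strip()]


def handle_update_request(
    payload: dict | None,
    trigger: Callable[[list[str]], dict],
) -> tuple[dict, int]:
    """Return (response body, HTTP status) for POST /update/wikidata."""
    qids = parse_qids((payload or {}).get("qids"))
    if not qids:
        return {"error": "missing qids"}, 400
    try:
        # The trigger only queues the Prefect run, so 202 Accepted is returned
        # before any QID has actually been updated.
        return trigger(qids), 202
    except Exception as exc:  # any failure while starting the background job
        return {"error": "failed to start update job", "details": str(exc)}, 500
```

Treating an empty list the same as a missing field keeps the 400 path simple; whether the merged endpoint does exactly the same is not shown in the PR.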

@LizzAlice LizzAlice requested a review from timconrad March 23, 2026 14:03
@coderabbitai

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Walkthrough

The POST /update/wikidata endpoint was refactored to queue an asynchronous Prefect workflow for a batch of QIDs. A new service function triggers the Prefect deployment and the endpoint returns HTTP 202; Prefect tasks then perform the per-QID updates and report individual outcomes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Flask Endpoint (`flask_app/app.py`) | Renamed handler to `update_wikidata_async()`. Accepts `qids` (list or comma/space-separated string), validates presence (400 on missing), calls `trigger_update_wikidata_async()`, returns Prefect payload with HTTP 202, and returns HTTP 500 with error details on trigger failure. |
| Prefect Workflow (`prefect_workflow/prefect_mardi_importer.py`) | Added `update_wikidata_batch(qids)` Prefect task that sets env vars, instantiates `WikidataImporter`, iterates QIDs calling `update_entities()`, captures per-QID status (success/not_updated/error) and aggregates an `all_imported` flag; added `elif action == "update/wikidata"` branch to run this task. |
| Service Layer (`services/import_service.py`) | Added `trigger_update_wikidata_async(qids, workflow_name=DEFAULT_WORKFLOW_NAME)` which calls Prefect `run_deployment` with `{"action":"update/wikidata","qids": qids}`, `timeout=0`, and returns an "accepted" payload containing `deployment_id`, `id`, `flow_id`, `qids_queued`, and `message`. |
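The per-QID status tracking in `update_wikidata_batch` can be sketched as a plain function. Here `update_one` is a hypothetical stand-in for `WikidataImporter.update_entities`, assumed to return a truthy value when the entity changed; the environment setup and the Prefect `@task` decorator are omitted so the sketch runs without Prefect. The aggregate key `all_imported` follows the walkthrough and diagram, and counting only "success" toward it is an assumption.

```python
from __future__ import annotations

from typing import Callable


def update_batch(qids: list[str], update_one: Callable[[str], bool]) -> dict:
    """Update each QID independently and record a per-QID status."""
    results: dict[str, dict] = {}
    for qid in qids:
        try:
            changed = update_one(qid)
        except Exception as exc:
            # A failure on one QID is recorded and the loop moves on.
            results[qid] = {"status": "error", "error": str(exc)}
            continue
        results[qid] = {"status": "success" if changed else "not_updated"}
    # Assumption: the batch only counts as fully imported if every QID succeeded.
    all_imported = all(r["status"] == "success" for r in results.values())
    return {"results": results, "all_imported": all_imported}
```

Catching exceptions per QID is what lets the task report individual outcomes instead of aborting the whole batch on the first failure.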

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client
    participant Flask as Flask Endpoint
    participant Service as Import Service
    participant Prefect as Prefect Deployment
    participant Task as update_wikidata_batch Task
    participant Importer as WikidataImporter

    Client->>Flask: POST /update/wikidata {qids: [...]}
    Flask->>Flask: parse & validate qids
    Flask->>Service: trigger_update_wikidata_async(qids)
    Service->>Prefect: run_deployment(action: "update/wikidata", qids)
    Prefect-->>Service: {deployment_id, id, flow_id, ...}
    Service-->>Flask: accepted payload
    Flask-->>Client: 202 + Prefect payload

    Note over Prefect,Task: async background processing
    Prefect->>Task: invoke update_wikidata_batch(qids)
    Task->>Task: set env & instantiate Importer
    loop for each qid
        Task->>Importer: update_entities(qid)
        Importer-->>Task: result / error
        Task->>Task: record per-QID status
    end
    Task-->>Prefect: {results: {...}, all_imported: bool}
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • fix import #194: Modifies the same Prefect workflow module; touches importer import paths that interact with the new task additions.
  • add doi asynch #193: Adds async import flows in the same Prefect workflow area; conceptually related to the new "update/wikidata" branch.
  • add wikidata update capability to the flask api #212: Previously changed the /update/wikidata Flask endpoint to a synchronous importer-based handler; this PR replaces that approach with an async Prefect-based flow.

Suggested reviewers

  • eloiferrer

Poem

🐰 Hopping code across the yard so spry,
QIDs queued up, they leap and fly,
Prefect wakes, the importer hums,
Background work now gently drums,
202 returns — off they go, bye-bye!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is overly vague and generic; it uses non-descriptive language ('add functionality to prefect') that doesn't convey the specific change (connecting Wikidata updates to Prefect async workflows). | Revise the title to be more specific, such as 'Make Wikidata updates async via Prefect workflow' or 'Connect Wikidata updates to Prefect async processing'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
services/import_service.py (1)

283-310: Docstring says "import" but function performs "update".

Line 287 states "Trigger a Prefect Wikidata import workflow" but this function triggers an update workflow. The docstring should be corrected to match the function's purpose.

Additionally, this function is nearly identical to trigger_wikidata_async (lines 254-281), differing only in the action parameter and response message. Consider extracting a shared helper to reduce duplication.

🔧 Proposed docstring fix
```diff
 def trigger_update_wikidata_async(
     qids: list[str],
     workflow_name: str = DEFAULT_WORKFLOW_NAME,
 ) -> dict:
-    """Trigger a Prefect Wikidata import workflow.
+    """Trigger a Prefect Wikidata update workflow.
 
     Args:
         qids: List of Wikidata QIDs to update.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@services/import_service.py` around lines 283 - 310, The docstring for
trigger_update_wikidata_async is incorrect and should say it triggers a Prefect
Wikidata update workflow; update the docstring text in
trigger_update_wikidata_async to reflect "update" rather than "import", and make
the function share logic with trigger_wikidata_async by extracting a helper
(e.g., _trigger_wikidata_deployment or run_wikidata_deployment) that accepts
parameters like action, qids, workflow_name and returns the same payload
structure so both trigger_update_wikidata_async and trigger_wikidata_async call
that helper (preserve unique fields like the response message by allowing a
message param).
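A sketch of the shared helper this nitpick suggests. `_trigger_wikidata_deployment` is one of the names proposed in the comment; the `run_deployment` parameter is an addition for this sketch so it runs without a Prefect server (in the service module it would be `prefect.deployments.run_deployment`, where `timeout=0` returns as soon as the flow run is created). The `"status": "accepted"` field, the `str()` conversion of IDs, and the message text are assumptions. `trigger_wikidata_async` would call the same helper with its own action string, which the PR does not show.

```python
from __future__ import annotations

from typing import Any, Callable


def _trigger_wikidata_deployment(
    action: str,
    qids: list[str],
    workflow_name: str,
    message: str,
    run_deployment: Callable[..., Any],
) -> dict:
    """Queue a Prefect deployment run and return an "accepted" payload."""
    # timeout=0: return as soon as the flow run is created; do not wait for it.
    flow_run = run_deployment(
        name=workflow_name,
        parameters={"action": action, "qids": qids},
        timeout=0,
    )
    # str() keeps the payload serializable with the stdlib json module.
    return {
        "status": "accepted",
        "deployment_id": str(flow_run.deployment_id),
        "id": str(flow_run.id),
        "flow_id": str(flow_run.flow_id),
        "qids_queued": qids,
        "message": message,
    }


def trigger_update_wikidata_async(
    qids: list[str],
    workflow_name: str,
    run_deployment: Callable[..., Any],
) -> dict:
    """Trigger a Prefect Wikidata update workflow."""
    return _trigger_wikidata_deployment(
        action="update/wikidata",
        qids=qids,
        workflow_name=workflow_name,
        message=f"Wikidata update queued for {len(qids)} QID(s)",
        run_deployment=run_deployment,
    )
```

Keeping the message as a parameter preserves the one field that differs between the import and update triggers, as the comment asks.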
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flask_app/app.py`:
- Around lines 212-226: Fix the typos and inconsistent wording in
update_wikidata_async: correct the docstring to remove the stray backtick and
'a' so it reads that the JSON expects a "qids" field (and remove trailing extra
whitespace), and change the error response message from "missing qid" to
"missing qids" so it matches the normalized variable qids and the docstring.
Ensure the docstring clearly states qids can be a list or comma/space-separated
string.

In `@prefect_workflow/prefect_mardi_importer.py`:
- Around line 259-262: The failure check is looking for
result.get("all_imported", True) but update_wikidata_batch returns
"all_updated", so change the flow's check to use the correct key (e.g.,
result.get("all_updated", False)) or standardize to a common key (e.g., "all_ok"
or "success") across batch tasks; update the logic in the flow where result is
inspected (the block handling update_wikidata_batch and the similar checks at
lines referenced) so it reads the matching key returned by update_wikidata_batch
(or adapt update_wikidata_batch to return the standardized key) and ensure the
default value causes the flow to fail when the key is missing/false.
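A minimal sketch of the fail-closed check this comment asks for. `batch_succeeded` and `check_batch` are hypothetical helpers, and `"all_updated"` is the default key only because the comment says that is what `update_wikidata_batch` returns; whichever key is chosen, the task and the flow have to agree on it.

```python
from __future__ import annotations


def batch_succeeded(result: dict, key: str = "all_updated") -> bool:
    """True only if the batch result explicitly reports success under `key`."""
    # Defaulting to False means a missing or misspelled key counts as failure,
    # unlike result.get("all_imported", True), which passes silently.
    return bool(result.get(key, False))


def check_batch(result: dict, key: str = "all_updated") -> dict:
    """Raise so the Prefect flow run is marked failed; otherwise pass the result through."""
    if not batch_succeeded(result, key):
        raise RuntimeError(f"Wikidata batch did not fully succeed: {result!r}")
    return result
```

With the original check, a result carrying only `all_updated: False` would still pass, because looking up the missing `all_imported` key falls back to `True`.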


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e17f789-863a-4e29-a193-6a40c89a8ca0

📥 Commits

Reviewing files that changed from the base of the PR and between fcf876a and 805a405.

📒 Files selected for processing (3)
  • flask_app/app.py
  • prefect_workflow/prefect_mardi_importer.py
  • services/import_service.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
flask_app/app.py (1)

212-219: ⚠️ Potential issue | 🟡 Minor

Docstring is outdated and contains typos.

  1. Line 213: "Update person profile" is too narrow – this now handles batch QIDs for any Wikidata entity
  2. Line 215: Double comma ", ," before "which"
  3. Line 216: Trailing whitespace
  4. Lines 218-219: Return description mentions "QID that was updated or an empty list" but the endpoint now returns a Prefect trigger payload with HTTP 202


✏️ Proposed fix
```diff
 @app.post("/update/wikidata")
 def update_wikidata_async():
-    """Update person profile from wikidata; this is async and happens in Prefect
+    """Update Wikidata entities asynchronously via Prefect.
 
-    Expects JSON with a ``qids`` field, , which may be a list or a string of
-    comma/space-separated Wikidata QIDs. 
+    Expects JSON with a ``qids`` field, which may be a list or a string of
+    comma/space-separated Wikidata QIDs.
 
     Returns:
-        Response, either the QID that was updated or an empty list
+        Flask response tuple with Prefect flow metadata and HTTP 202.
     """
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flask_app/app.py` around lines 212 - 219, The docstring for
update_wikidata_async is outdated and contains typos; update it to describe that
the function handles batch Wikidata QIDs for any entity (not just person), fix
the punctuation/whitespace (remove the double comma and trailing space), clarify
the input format ("qids" can be a list or a comma/space-separated string of
QIDs), and replace the return description with the current behavior stating the
endpoint triggers a Prefect flow and returns a Prefect trigger payload with HTTP
202; keep all changes inside the update_wikidata_async docstring.
🧹 Nitpick comments (1)
flask_app/app.py (1)

226-226: Minor: Log message uses singular "QID" for a list.

For consistency and clarity, consider using "QIDs" (plural) in the log message.

✏️ Proposed fix
```diff
-    log.info("QID: %s", qids)
+    log.info("QIDs: %s", qids)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flask_app/app.py` at line 226, The log statement using log.info("QID: %s",
qids) is misleading because qids is a list; update the message to use the plural
form (e.g., "QIDs") so it accurately reflects the data. Locate the log.info call
in flask_app/app.py that references qids and change the message text from "QID:
%s" to "QIDs: %s" (preserving the qids variable and formatting) to improve
clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11bed487-3786-4694-8a4b-c86e1db839c1

📥 Commits

Reviewing files that changed from the base of the PR and between 805a405 and 6e7c382.

📒 Files selected for processing (2)
  • flask_app/app.py
  • prefect_workflow/prefect_mardi_importer.py

Contributor

@timconrad timconrad left a comment


looks good

@timconrad timconrad merged commit 2875be8 into main Mar 23, 2026
2 checks passed


2 participants