Fix #876: Add external_id destination field to CSV import (Endpoint and UI) #990

0xRozier wants to merge 6 commits into we-promise:main

Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews, or the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉

📝 Walkthrough

Adds optional CSV external ID support: a new `external_id_col_label` on imports, an `external_id` column on import rows, external-ID-based deduplication in `TransactionImport`, and exposure of the field in the v1 imports API and UI.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API as API/Controller
    participant Import as Import Model
    participant TxnImport as TransactionImport
    participant DB as Database
    User->>API: POST /api/v1/imports (CSV + external_id_col_label)
    API->>Import: create import with external_id_col_label
    Import->>Import: generate_rows_from_csv (extract external_id using label)
    Import->>DB: save import_rows (include external_id)
    DB-->>Import: rows persisted
    Import->>TxnImport: process import rows
    loop per row
        alt row.external_id present
            TxnImport->>DB: find transaction by (external_id, source: "csv_import")
            alt found
                DB-->>TxnImport: existing txn
                TxnImport->>TxnImport: update fields, set import_locked
                TxnImport->>DB: save updated txn
            else not found
                TxnImport->>TxnImport: attempt legacy dedupe via adapter
                alt legacy match
                    TxnImport->>DB: update matched txn and backfill external_id
                else no match
                    TxnImport->>DB: create txn with external_id, source "csv_import", import_locked
                end
            end
        else no external_id
            TxnImport->>TxnImport: legacy dedupe/create flow
        end
    end
    DB-->>TxnImport: created/updated transactions
    TxnImport-->>API: processing complete
    API-->>User: return import result (includes external_id_col_label)
```
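The generate_rows_from_csv step in the diagram can be sketched with Ruby's stdlib CSV library. This is an illustrative sketch only: the "Txn ID" label and the row hash shape are assumptions for the example, not the app's actual columns.

```ruby
require "csv"

# Extract an optional external_id from each CSV row using a user-mapped
# column label. A blank cell yields nil, so rows without an external_id
# fall through to the legacy dedupe/create flow.
data = <<~CSV
  Date,Amount,Txn ID
  2024-01-01,-4.50,txn-001
  2024-01-02,-12.00,
CSV

external_id_col_label = "Txn ID" # hypothetical user-chosen mapping
rows = CSV.parse(data, headers: true).map do |row|
  {
    date: row["Date"],
    amount: row["Amount"],
    external_id: row[external_id_col_label] # nil when the cell is blank
  }
end
```

With this shape, downstream code only needs to branch on whether `row[:external_id]` is present.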
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks: ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ae4724766
```ruby
if row.external_id.present?
  # External ID-based deduplication: find existing entry by external_id + source
  existing_entry = mapped_account.entries.find_by(external_id: row.external_id, source: "csv_import")
```
There was a problem hiding this comment.
Fall back to legacy dedupe when external_id is unseen

In `TransactionImport#import!`, rows with an `external_id` now only try `find_by(external_id:, source: "csv_import")` and otherwise always create a new entry, skipping the previous date/amount/name duplicate-detection path. This causes duplicate transactions when re-importing data that was imported before external IDs were mapped (those existing entries have a NULL `external_id`), even though the old behavior would have updated the existing record. Add a fallback to `Account::ProviderImportAdapter#find_duplicate_transaction` when no external-ID match is found.
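As a rough illustration of the ordering this comment asks for (external_id match first, then legacy match with backfill, then create), here is a minimal pure-Ruby sketch. `Entry` and `resolve` are hypothetical stand-ins for the app's ActiveRecord model and import logic, and the matching rules are simplified.

```ruby
# Stand-in for the app's Entry model; plain Struct, no database.
Entry = Struct.new(:external_id, :source, :date, :amount, :name, keyword_init: true)

# Decide what to do with one CSV row against a list of existing entries.
def resolve(row, entries)
  # 1) Prefer an exact external_id + source match.
  if row[:external_id]
    match = entries.find { |e| e.external_id == row[:external_id] && e.source == "csv_import" }
    return [:update, match] if match
  end
  # 2) Fall back to legacy date/amount/name matching for pre-external_id data.
  legacy = entries.find do |e|
    e.date == row[:date] && e.amount == row[:amount] &&
      (row[:name].nil? || e.name == row[:name])
  end
  if legacy
    legacy.external_id ||= row[:external_id] # backfill so future imports hit step 1
    return [:backfill, legacy]
  end
  # 3) No match at all: create a new entry.
  [:create, nil]
end
```

Once a row has been backfilled, a re-import of the same row resolves via the external_id match instead of the legacy path.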
```ruby
add_column :imports, :external_id_col_label, :string
add_column :import_rows, :external_id, :string
```
Commit updated schema for new external_id columns

This migration adds `imports.external_id_col_label` and `import_rows.external_id`, but `db/schema.rb` is not updated in the same change (it still reflects an older schema version). Environments that initialize from schema (fresh setup/CI paths using `db:schema:load`) will miss these columns and then fail when import code writes or reads them. Regenerate and commit `db/schema.rb` alongside this migration to keep schema-based bootstraps consistent.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@app/models/transaction_import.rb`:
- Around lines 33-70: The external_id dedupe path currently only checks the DB via `mapped_account.entries.find_by` and misses duplicates created earlier in the same import batch. Add an in-memory set (e.g. `seen_external_ids`) and consult it alongside `claimed_entry_ids` when handling rows with `row.external_id`: if the ID is already in `seen_external_ids` or `claimed_entry_ids`, treat the row as a duplicate and implement "first row wins" by skipping creation (optionally updating notes/category on the existing new Transaction stored in `new_transactions`). When you create a new Transaction, push its external_id into `seen_external_ids`. Apply this logic in the external_id branch around `mapped_account.entries.find_by`, leaving `Transaction.import!` and the `claimed_entry_ids` behavior unchanged for the legacy path.
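The batch-level dedupe described above can be sketched in plain Ruby. `first_row_wins` is a hypothetical helper for illustration, not the app's actual method; it drops later rows whose external_id already appeared earlier in the batch, before any DB work happens.

```ruby
require "set"

# Keep only the first row per external_id within one CSV batch.
# Rows without an external_id always pass through to the legacy flow.
def first_row_wins(rows)
  seen = Set.new
  rows.select do |row|
    id = row[:external_id]
    next true if id.nil? || id.empty? # legacy rows (no external_id) always pass
    !seen.add?(id).nil?               # Set#add? returns nil when already present
  end
end
```

`Set#add?` gives an atomic membership-test-and-insert, which keeps the filter a single pass over the rows.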
🧹 Nitpick comments (2)
test/controllers/api/v1/imports_controller_test.rb (1)
224-229: New test uses Doorkeeper/OAuth instead of the required ApiKey authentication pattern.

The coding guidelines require Minitest API v1 tests to use `ApiKey.create!` with an `api_headers(api_key)` helper returning `{ "X-Api-Key" => api_key.display_key }`, not Doorkeeper Bearer tokens. This applies to the entire file (pre-existing), but since you're adding a new test, consider aligning the auth approach.

Suggested auth helper replacement:

```diff
   private

-  def valid_token_for(user)
-    application = Doorkeeper::Application.create!(name: "Test App", redirect_uri: "urn:ietf:wg:oauth:2.0:oob", scopes: "read read_write")
-    Doorkeeper::AccessToken.create!(application: application, resource_owner_id: user.id, scopes: "read read_write").token
-  end
+  def api_headers(api_key)
+    { "X-Api-Key" => api_key.display_key }
+  end
```

Then update `setup` to create an ApiKey and replace `headers: { Authorization: "Bearer #{@token}" }` with `headers: api_headers(@api_key)` throughout.

As per coding guidelines: "Minitest API endpoint tests must use `ApiKey.create!` with read and read_write scopes and the `api_headers(api_key)` pattern returning `{ \"X-Api-Key\" => api_key.display_key }`. Do not use OAuth/Bearer authentication."

test/models/transaction_import_test.rb (1)
398-441: Deduplication test looks solid, but consider adding a same-batch duplicate edge case.

This test correctly validates that re-importing with the same `external_id` deduplicates against the existing entry and updates its attributes. However, there's no test for when two rows within the same CSV share the same `external_id`, which currently would bypass deduplication (see model comment). Consider adding a test to cover that scenario and lock down the expected behavior.
…mport

When two rows within the same CSV file share the same external_id, only the first row is imported (first row wins). This prevents duplicate entries that would bypass the DB-only external_id deduplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace OAuth/Bearer token authentication with ApiKey pattern using X-Api-Key header, aligning with the project's API testing conventions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration added external_id_col_label to imports and external_id to import_rows, but schema.rb was not updated. CI uses db:schema:load so the missing columns caused the DB setup step to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a CSV row has an external_id but no existing entry matches by external_id (e.g. entries imported before external_id support), fall back to the legacy date/amount/name deduplication. If a legacy match is found, backfill the external_id and source on the existing entry instead of creating a duplicate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addressed review feedback

Codex P1: Fall back to legacy dedup when
🧹 Nitpick comments (5)
app/models/transaction_import.rb (3)
160-164: `external_id` added to `column_keys` but not to `csv_template`.

`column_keys` (line 161) includes `:external_id`, enabling users to map it in the UI. The `csv_template` (lines 179-184) doesn't include it, which is fine since it's optional, but documenting this asymmetry with a comment would help.

🤖 Prompt for AI Agents: Verify this finding against the current code and only fix it if needed. In `@app/models/transaction_import.rb` around lines 160-164, `column_keys` now exposes `:external_id` for UI mapping while `csv_template` omits it (optional); add a brief comment near the `csv_template` method explaining that `:external_id` is intentionally excluded from the generated CSV template even though `column_keys` allows mapping it, so maintainers understand the asymmetry.
145-149: Consider wrapping individual entry saves to surface row-level errors.

If any single `entry.save!` or `entry.transaction.save!` fails, the entire import rolls back (it's inside a `transaction` block on line 3), which is correct, but the error won't indicate which entry failed. For large imports this could make debugging painful.

This is pre-existing behavior and not a regression, so just flagging as optional.

🤖 Prompt for AI Agents: Verify this finding against the current code and only fix it if needed. In `@app/models/transaction_import.rb` around lines 145-149, wrap each entry save in a begin/rescue that catches any exception from `entry.transaction.save!` or `entry.save!` and re-raises a new exception that includes identifying info (e.g. `entry.id` or other unique attributes) plus the original error message, so the outer transaction still rolls back but the error says which row failed.
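The per-entry error wrapping suggested above can be sketched as follows. `save_all!` and the lambda-based saver are illustrative stand-ins for the ActiveRecord `save!` calls, chosen so the sketch runs without a database.

```ruby
# Re-raise any per-entry failure with the failing row identified.
# An enclosing DB transaction would still roll everything back.
def save_all!(entries, saver)
  entries.each do |entry|
    begin
      saver.call(entry)
    rescue => e
      raise e.class, "entry #{entry[:id]}: #{e.message}"
    end
  end
end
```

Preserving the original exception class keeps existing rescue clauses working while the message gains row context.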
58-100: Legacy fallback backfill path looks correct but doesn't update name/date/amount on the matched entry.

When a legacy match is found (lines 69-81), only `external_id`, `source`, `import`, and `import_locked` are backfilled. Unlike the external_id match path (lines 43-55), date/amount/name/currency are not updated.

This is defensible since the legacy match was found by those fields. However, if `row.name` is blank, `find_duplicate_transaction` skips the name filter (see `query = query.where(name: name) if name.present?` in the adapter), meaning the matched entry could have a different name that won't be updated. If that's intentional, a brief inline comment would help future readers.

🤖 Prompt for AI Agents: Verify this finding against the current code and only fix it if needed. In `@app/models/transaction_import.rb` around lines 58-100, either update the legacy-match backfill to mirror the external_id path by setting the matched entry's date, amount, name, and currency from the row (only when the incoming values are present), or, if leaving those fields unchanged is intentional, add a concise inline comment on `legacy_match` explaining why name/date/amount/currency are not updated.

test/models/transaction_import_test.rb (2)
398-441: Consider adding an assertion for `source` persistence after re-import update.

The test verifies `name` and `import_id` are updated on the existing entry after re-import, but doesn't assert that `source` remains `"csv_import"`. While unlikely to regress (the model doesn't change `source` in the external_id match path), it would strengthen the test for completeness.

Proposed addition:

```diff
   existing_entry.reload
   assert_equal "Coffee Shop Updated", existing_entry.name
   assert_equal @import.id, existing_entry.import_id
+  assert_equal "csv_import", existing_entry.source
 end
```

🤖 Prompt for AI Agents: Verify this finding against the current code and only fix it if needed. In `@test/models/transaction_import_test.rb` around lines 398-441, after calling `@import.publish` and reloading `existing_entry`, add an assertion checking `existing_entry.source == "csv_import"` (alongside the existing name and import_id assertions) to ensure the publish path that matches on external_id does not alter `Entry#source`.
443-479: Verify that the skipped duplicate row (row 2) did not create any side effects.

The test asserts 2 entries and checks their names/external_ids, which confirms first-row-wins. For extra confidence, you could assert that no entry has the skipped row's name (`"Coffee Shop Updated"`) to ensure the second row was truly dropped rather than merged.

Optional assertion:

```diff
   assert_equal "txn-uuid-unique", entries.second.external_id
   assert_equal "Grocery Store", entries.second.name
+  assert_equal 0, account.entries.where(name: "Coffee Shop Updated").count
 end
```

🤖 Prompt for AI Agents: Verify this finding against the current code and only fix it if needed. In `@test/models/transaction_import_test.rb` around lines 443-479, after calling `@import.generate_rows_from_csv` and `@import.publish`, add an assertion such as `refute Entry.exists?(name: "Coffee Shop Updated")` to verify the skipped duplicate row was dropped rather than merged.
Hi @alessiocappa, it would be nice if you could take a look at this (take your time though, there's no rush).
Signed-off-by: 0xRozier <rozier.exe@gmail.com>
Closes #876
Summary
Adds support for CSV external ID mapping in Enable Banking transaction imports. This allows proper tracking and deduplication of transactions using the external identifier from the banking API.
Changes
- Added `external_id` column to imports table
- Updated `TransactionImport` to handle external ID deduplication
- Exposed `external_id` in v1 imports endpoints

Technical Details
Testing
Files Changed
Summary by CodeRabbit
New Features
API
UI
Tests
Chores