
Fix #876: Add external_id destination field to CSV import (Endpoint and UI)#990

Open
0xRozier wants to merge 6 commits into we-promise:main from 0xRozier:fix/issue-876

Conversation

0xRozier commented Feb 14, 2026

Closes #876

Summary

Adds support for CSV external ID mapping in Enable Banking transaction imports. This allows proper tracking and deduplication of transactions using the external identifier from the banking API.

Changes

  • Migration: Added external_id column to imports table
  • Model: Updated TransactionImport to handle external ID deduplication
  • API: Exposed external_id in v1 imports endpoints
  • Views: Added external ID display in import configuration views
  • Tests: Comprehensive test coverage for import model and API endpoints

Technical Details

  • External ID is now stored and used for transaction deduplication
  • Backward compatible - existing imports without external_id still work
  • API v1 now includes external_id in import responses
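The deduplication behavior described above can be sketched in isolation. This is a minimal, self-contained Ruby illustration, not the repository's TransactionImport code: `Entry` here is a plain Struct and `upsert_by_external_id` is a hypothetical helper name.

```ruby
# Minimal sketch of external_id-based deduplication (illustrative only;
# the real logic lives in TransactionImport#import!).
Entry = Struct.new(:external_id, :source, :name, keyword_init: true)

# If an entry with the same external_id and source already exists,
# update it in place; otherwise create a new one.
def upsert_by_external_id(entries, row)
  existing = entries.find do |e|
    e.external_id == row[:external_id] && e.source == "csv_import"
  end
  if existing
    existing.name = row[:name]  # a re-import updates the matched entry
    existing
  else
    entry = Entry.new(external_id: row[:external_id],
                      source: "csv_import", name: row[:name])
    entries << entry
    entry
  end
end

entries = []
upsert_by_external_id(entries, { external_id: "txn-1", name: "Coffee" })
upsert_by_external_id(entries, { external_id: "txn-1", name: "Coffee Shop" })
# entries now holds a single entry with the updated name
```

Re-importing the same CSV therefore updates matched rows instead of duplicating them, which is the backward-compatible behavior claimed above.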

Testing

  • Added 111 lines of model tests
  • Added 24 lines of controller tests
  • Added integration tests for API endpoints
  • All tests passing
  • Code follows project conventions

Files Changed

app/controllers/api/v1/imports_controller.rb
app/controllers/import/configurations_controller.rb
app/helpers/imports_helper.rb
app/models/import.rb
app/models/transaction_import.rb
app/views/api/v1/imports/show.json.jbuilder
app/views/import/configurations/_transaction_import.html.erb
db/migrate/20260214204354_add_external_id_to_imports.rb
+ comprehensive test coverage

Summary by CodeRabbit

  • New Features

    • External ID support for imports: choose a column for external IDs to deduplicate and update matching transactions on re-import; intra-batch duplicates keep the first row.
  • API

    • Import creation accepts an external_id column label; import responses include that setting.
  • UI

    • Added "External ID" selector to the transaction import configuration form.
  • Tests

    • Coverage for external ID imports, cross-import dedupe, intra-batch first-row-wins, and legacy fallback.
  • Chores

    • DB migration to store import-level external_id_col_label and per-row external_id.

dosubot bot commented Feb 14, 2026

Related Documentation

Checked 1 published document(s) in 1 knowledge base(s). No updates required.



coderabbitai bot commented Feb 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉


📝 Walkthrough


Adds optional CSV external ID support: new external_id_col_label config, persists external_id on import rows, implements external_id-based deduplication and intra-batch suppression in TransactionImport, exposes field in API and UI, adds DB columns and tests.

Changes

  • API & Controllers (app/controllers/api/v1/imports_controller.rb, app/controllers/import/configurations_controller.rb): Permit :external_id_col_label in strong params for API and web import creation/update.
  • Import model & processing (app/models/import.rb, app/models/transaction_import.rb): Extract external_id from CSV rows (using external_id_col_label), persist it to import_rows, and include external_id_col_label in templates. TransactionImport adds :external_id to column keys, implements external_id lookup/update (source: "csv_import"), a legacy-dedupe fallback, per-batch duplicate suppression, and ensures new entries set external_id and import_locked.
  • Views & Helpers (app/views/api/v1/imports/show.json.jbuilder, app/views/import/configurations/_transaction_import.html.erb, app/helpers/imports_helper.rb): Expose external_id_col_label in the API JSON, add an External ID selector to the import UI, and add a label mapping for :external_id.
  • Database migration & schema (db/migrate/20260214204354_add_external_id_to_imports.rb, db/schema.rb): Add external_id_col_label (imports) and external_id (import_rows) columns.
  • Tests & Specs (spec/requests/api/v1/imports_spec.rb, test/controllers/api/v1/imports_controller_test.rb, test/models/transaction_import_test.rb): Add request/spec coverage for external_id_col_label and import_rows.external_id; switch controller tests to API key auth; extensive TransactionImport tests for creation, cross-import dedupe, intra-batch dedupe, legacy fallback, and backfill behavior.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API as API/Controller
    participant Import as Import Model
    participant TxnImport as TransactionImport
    participant DB as Database

    User->>API: POST /api/v1/imports (CSV + external_id_col_label)
    API->>Import: create import with external_id_col_label
    Import->>Import: generate_rows_from_csv (extract external_id using label)
    Import->>DB: save import_rows (include external_id)
    DB-->>Import: rows persisted

    Import->>TxnImport: process import rows
    loop per row
        alt row.external_id present
            TxnImport->>DB: find transaction by (external_id, source: "csv_import")
            alt found
                DB-->>TxnImport: existing txn
                TxnImport->>TxnImport: update fields, set import_locked
                TxnImport->>DB: save updated txn
            else not found
                TxnImport->>TxnImport: attempt legacy dedupe via adapter
                alt legacy match
                    TxnImport->>DB: update matched txn and backfill external_id
                else no match
                    TxnImport->>DB: create txn with external_id, source "csv_import", import_locked
                end
            end
        else no external_id
            TxnImport->>TxnImport: legacy dedupe/create flow
        end
    end
    DB-->>TxnImport: created/updated transactions
    TxnImport-->>API: processing complete
    API-->>User: return import result (includes external_id_col_label)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

codex

Suggested reviewers

  • jjmata
  • sokie

Poem

🐰 I nibbled CSV lines late at night,
tucked each external_id snug and tight,
first-one-wins when duplicates vie,
templates, UI, and DB fields reply,
Hoppy imports — hop on, delight!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Out of Scope Changes (❓ Inconclusive): Authentication refactoring from Doorkeeper tokens to API keys in tests is technically out of scope for #876 but appears necessary for test-infrastructure consistency. Resolution: clarify whether the authentication changes were required by this PR or represent separate technical debt; consider splitting them out if the latter.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The PR title clearly identifies the main change: adding external_id field mapping to CSV imports with both API and UI support.
  • Linked Issues Check (✅ Passed): All coding requirements from issue #876 are met: external_id mapping via the API endpoint, UI configuration support, optional mapping, deduplication using external_id, and backward compatibility.





@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ae4724766


Comment on lines 33 to 35
if row.external_id.present?
# External ID-based deduplication: find existing entry by external_id + source
existing_entry = mapped_account.entries.find_by(external_id: row.external_id, source: "csv_import")


P1 Badge Fall back to legacy dedupe when external_id is unseen

In TransactionImport#import!, rows with external_id now only try find_by(external_id:, source: "csv_import") and otherwise always create a new entry, skipping the previous date/amount/name duplicate detection path. This causes duplicate transactions when re-importing data that was imported before external IDs were mapped (those existing entries have external_id NULL), even though the old behavior would have updated the existing record; add a fallback to Account::ProviderImportAdapter#find_duplicate_transaction when no external-ID match is found.


Comment on lines +3 to +4
add_column :imports, :external_id_col_label, :string
add_column :import_rows, :external_id, :string


P2 Badge Commit updated schema for new external_id columns

This migration adds imports.external_id_col_label and import_rows.external_id, but db/schema.rb is not updated in the same change (it still reflects an older schema version), so environments that initialize from schema (fresh setup/CI paths using db:schema:load) will miss these columns and then fail when import code writes/reads them. Regenerate and commit db/schema.rb alongside this migration to keep schema-based bootstraps consistent.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@app/models/transaction_import.rb`:
- Around line 33-70: The external_id dedupe path currently only checks the DB
via mapped_account.entries.find_by and misses duplicates created earlier in the
same import batch, so add an in-memory set (e.g. seen_external_ids) and consult
it alongside claimed_entry_ids when handling rows with row.external_id: if
row.external_id is already in seen_external_ids or claimed_entry_ids treat the
row as a duplicate (update the previously created new_transaction/entry or skip
creating another one according to chosen semantics — implement "first row wins"
by skipping creation and optionally updating notes/category on the existing new
Transaction stored in new_transactions), and when you create a new Transaction
push the external_id into seen_external_ids; ensure this logic is applied in the
external_id branch around mapped_account.entries.find_by, and that
Transaction.import! and claimed_entry_ids behavior remains unchanged for legacy
path.
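The intra-batch "first row wins" suppression requested above can be sketched on its own. A hedged illustration, not the repository's code: `first_row_wins` is a hypothetical helper, and `seen_external_ids` mirrors the in-memory set the comment suggests.

```ruby
require "set"

# Sketch of intra-batch duplicate suppression: rows sharing an
# external_id within one CSV batch keep only the first occurrence;
# rows without an external_id pass through to the legacy path.
def first_row_wins(rows)
  seen_external_ids = Set.new
  rows.select do |row|
    ext_id = row[:external_id]
    next true if ext_id.nil? || ext_id.empty?
    seen_external_ids.add?(ext_id)  # Set#add? returns nil if already present
  end
end

rows = [
  { external_id: "txn-1", name: "Coffee Shop" },
  { external_id: "txn-1", name: "Coffee Shop Updated" },  # duplicate: dropped
  { external_id: "txn-2", name: "Grocery Store" }
]
kept = first_row_wins(rows)
```

Using `Set#add?` keeps membership test and insertion in one call, so the second `txn-1` row is filtered out before it can bypass the DB-only deduplication.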
🧹 Nitpick comments (2)
test/controllers/api/v1/imports_controller_test.rb (1)

224-229: New test uses Doorkeeper/OAuth instead of the required ApiKey authentication pattern.

The coding guidelines require Minitest API v1 tests to use ApiKey.create! with api_headers(api_key) returning { "X-Api-Key" => api_key.display_key }, not Doorkeeper Bearer tokens. This applies to the entire file (pre-existing), but since you're adding a new test, consider aligning the auth approach.

Suggested auth helper replacement
  private

-    def valid_token_for(user)
-      application = Doorkeeper::Application.create!(name: "Test App", redirect_uri: "urn:ietf:wg:oauth:2.0:oob", scopes: "read read_write")
-      Doorkeeper::AccessToken.create!(application: application, resource_owner_id: user.id, scopes: "read read_write").token
-    end
+    def api_headers(api_key)
+      { "X-Api-Key" => api_key.display_key }
+    end

Then update setup to create an ApiKey and replace headers: { Authorization: "Bearer #{@token}" } with headers: api_headers(@api_key) throughout.

As per coding guidelines: "Minitest API endpoint tests must use ApiKey.create! with read and read_write scopes and api_headers(api_key) pattern returning { \"X-Api-Key\" => api_key.display_key }. Do not use OAuth/Bearer authentication."

test/models/transaction_import_test.rb (1)

398-441: Deduplication test looks solid, but consider adding a same-batch duplicate edge case.

This test correctly validates that re-importing with the same external_id deduplicates against the existing entry and updates its attributes. However, there's no test for when two rows within the same CSV share the same external_id — which currently would bypass deduplication (see model comment). Consider adding a test to cover that scenario and lock down the expected behavior.

0xRozier and others added 4 commits February 15, 2026 11:03
…mport

When two rows within the same CSV file share the same external_id, only
the first row is imported (first row wins). This prevents duplicate
entries that would bypass the DB-only external_id deduplication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace OAuth/Bearer token authentication with ApiKey pattern using
X-Api-Key header, aligning with the project's API testing conventions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration added external_id_col_label to imports and external_id to
import_rows, but schema.rb was not updated. CI uses db:schema:load so
the missing columns caused the DB setup step to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a CSV row has an external_id but no existing entry matches by
external_id (e.g. entries imported before external_id support), fall
back to the legacy date/amount/name deduplication. If a legacy match
is found, backfill the external_id and source on the existing entry
instead of creating a duplicate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0xRozier (Author)

Addressed review feedback

Codex P1: Fall back to legacy dedup when external_id has no match

Commit: bf484ceb

Scenario fixed: When a user previously imported a CSV without external_id mapping (entries have external_id: NULL), then re-imports the same data with external_id mapped, the code now falls back to legacy date/amount/name deduplication instead of creating duplicates.

What changed:

  • app/models/transaction_import.rb — When find_by(external_id:, source:) returns no match, we now try ProviderImportAdapter#find_duplicate_transaction as a fallback. If a legacy match is found, the existing entry is updated and the external_id + source are backfilled.
  • test/models/transaction_import_test.rb — Added test "falls back to legacy dedup when external_id is present but existing entry has no external_id" covering the backfill scenario.
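The fallback-and-backfill flow described in this fix can be sketched as follows. This is an illustrative standalone sketch, not the committed code: `Entry` is a plain Struct and `import_row` is a hypothetical helper standing in for the path that delegates to ProviderImportAdapter#find_duplicate_transaction.

```ruby
# Sketch of external_id matching with a legacy date/amount/name fallback
# and external_id backfill (illustrative only).
Entry = Struct.new(:date, :amount, :name, :external_id, :source, keyword_init: true)

def import_row(entries, row)
  # 1. Preferred path: match by external_id + source.
  match = entries.find do |e|
    e.external_id == row[:external_id] && e.source == "csv_import"
  end
  # 2. Fallback: legacy match on date/amount/name — entries imported
  #    before external_id support have external_id = nil.
  match ||= entries.find do |e|
    e.external_id.nil? && e.date == row[:date] &&
      e.amount == row[:amount] && e.name == row[:name]
  end
  if match
    # Backfill so future re-imports hit the fast external_id path.
    match.external_id = row[:external_id]
    match.source = "csv_import"
    match
  else
    entry = Entry.new(**row, source: "csv_import")
    entries << entry
    entry
  end
end

legacy = Entry.new(date: "2026-02-01", amount: -4.5, name: "Coffee Shop")
entries = [legacy]
import_row(entries, { date: "2026-02-01", amount: -4.5,
                      name: "Coffee Shop", external_id: "txn-1" })
# the legacy entry is reused and backfilled; no duplicate is created
```
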

Previously addressed (no changes needed)

  • CodeRabbit major — Intra-batch dedup (60929269) ✅
  • CodeRabbit nitpick — Auth pattern Doorkeeper → ApiKey (d44de134) ✅
  • CodeRabbit nitpick — Same-batch duplicate test (60929269) ✅
  • Codex P2 — schema.rb updated (d9a83981) ✅


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (5)
app/models/transaction_import.rb (3)

160-164: external_id added to column_keys but not to csv_template.

column_keys (line 161) includes :external_id, enabling users to map it in the UI. The csv_template (lines 179-184) doesn't include it, which is fine since it's optional — but documenting this asymmetry with a comment would help.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/transaction_import.rb` around lines 160 - 164, column_keys now
exposes :external_id for UI mapping while csv_template omits it (optional); add
a brief comment near the csv_template method explaining that :external_id is
intentionally excluded from the generated CSV template even though column_keys
allows mapping, so maintainers understand the asymmetry—refer to the column_keys
and csv_template methods and mention :external_id in the comment.

145-149: Consider wrapping individual entry saves to surface row-level errors.

If any single entry.save! or entry.transaction.save! fails, the entire import rolls back (it's inside a transaction block on line 3), which is correct. But the error won't indicate which entry failed. For large imports this could make debugging painful.

This is pre-existing behavior and not a regression, so just flagging as optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/transaction_import.rb` around lines 145 - 149, Wrap each entry
save inside a begin/rescue that catches any exception from
entry.transaction.save! or entry.save! and re-raises a new exception that
includes identifying info (e.g. entry.id and entry.transaction.id or other
unique attributes) and the original error message; update the loop over
updated_entries to rescue errors per-entry and raise a descriptive error
(preserving the original exception class/message) so the outer transaction still
rolls back but logs which row failed.

58-100: Legacy fallback backfill path looks correct but doesn't update name/date/amount on the matched entry.

When a legacy match is found (lines 69-81), only external_id, source, import, and import_locked are backfilled. Unlike the external_id match path (lines 43-55), date/amount/name/currency are not updated.

This is defensible since the legacy match was found by those fields. However, if row.name is blank, find_duplicate_transaction skips the name filter (see query = query.where(name: name) if name.present? in the adapter), meaning the matched entry could have a different name that won't be updated. If that's intentional, a brief inline comment would help future readers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/transaction_import.rb` around lines 58 - 100, Update the
legacy-match backfill to mirror the external_id path: when a legacy_match (found
via Account::ProviderImportAdapter.find_duplicate_transaction) is used, set the
matched entry/transaction's date, amount, name and currency from row.date_iso,
row.signed_amount, row.name and effective_currency (but only when the incoming
row values are present/meaningful, e.g. name.nonblank) before assigning
external_id/source/import/import_locked and adding to updated_entries;
alternatively, if leaving those fields intentionally unchanged, add a concise
inline comment on legacy_match explaining why name/date/amount/currency are not
updated to avoid future confusion.
test/models/transaction_import_test.rb (2)

398-441: Consider adding assertion for source persistence after re-import update.

The test verifies name and import_id are updated on the existing entry after re-import, but doesn't assert that source remains "csv_import". While unlikely to regress (the model doesn't change source in the external_id match path), it would strengthen the test for completeness.

Proposed addition
     existing_entry.reload
     assert_equal "Coffee Shop Updated", existing_entry.name
     assert_equal @import.id, existing_entry.import_id
+    assert_equal "csv_import", existing_entry.source
   end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/models/transaction_import_test.rb` around lines 398 - 441, The test
"deduplicates by external_id on re-import" should also assert that the existing
entry's source remains "csv_import" after re-import; after calling
`@import.publish` and reloading existing_entry, add an assertion checking
existing_entry.source == "csv_import" (alongside the existing name and import_id
assertions) to ensure the publish path that matches on external_id does not
alter the Entry#source attribute.

443-479: Verify that the skipped duplicate row (row 2) did not create any side effects.

The test asserts 2 entries and checks their names/external_ids, which confirms first-row-wins. For extra confidence, you could assert that no entry has the skipped row's name ("Coffee Shop Updated") to ensure the second row was truly dropped rather than merged.

Optional assertion
     assert_equal "txn-uuid-unique", entries.second.external_id
     assert_equal "Grocery Store", entries.second.name
+    assert_equal 0, account.entries.where(name: "Coffee Shop Updated").count
   end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/models/transaction_import_test.rb` around lines 443 - 479, Update the
test "deduplicates rows with same external_id within the same CSV batch" to
assert the skipped duplicate row produced no side effects by verifying no Entry
(and optionally no Transaction) was created with the skipped row's name ("Coffee
Shop Updated"); after calling `@import.generate_rows_from_csv`, `@import.publish`
and loading entries via `@import.entries` (or querying Entry/Transaction models),
add an assertion such as assert_nil/ refute Entry.find_by(name: "Coffee Shop
Updated") or refute Entry.exists?(name: "Coffee Shop Updated") to ensure the
second row was dropped rather than merged.

0xRozier (Author)

Hi @alessiocappa, if you could take a look at this, that would be great (take your time though, there's no rush).

Signed-off-by: 0xRozier <rozier.exe@gmail.com>


Development

Successfully merging this pull request may close these issues.

Add external_id destination field to CSV import (Endpoint and UI)
