feat: add toindefarray struct tag option for CBOR indefinite-length arrays #770

Merged
fxamacker merged 1 commit into fxamacker:master from tdnguyenND:feature/toindefarray-v2
May 18, 2026

Conversation

Contributor

@tdnguyenND tdnguyenND commented May 5, 2026

Description

Adds the toindefarray struct tag option, gated by a new EncOptions.ToIndefArrayStructTag field. When enabled, a tagged struct encodes as a CBOR indefinite-length array (head 0x9f, break 0xff) instead of a definite-length array. This was discussed and welcomed in issue #762.

Motivation: Cardano Plutus Data interoperability. The reference plutus-core data encoder emits Constr field lists as indefinite-length arrays for non-empty payloads, and consumers in that ecosystem accept only that exact byte layout.

Public API additions (encode-side only):

type ToIndefArrayStructTagMode int

const (
    ToIndefArrayStructTagForbidden ToIndefArrayStructTagMode = iota // default
    ToIndefArrayStructTagAllowed
)

// EncOptions adds:
//   ToIndefArrayStructTag ToIndefArrayStructTagMode

Struct usage:

type Constr0 struct {
    _      struct{} `cbor:",toindefarray"`
    Field1 uint64
    Field2 uint64
}
// With EncOptions.ToIndefArrayStructTag = ToIndefArrayStructTagAllowed:
//   Marshal(Constr0{1, 2})    => 9f 01 02 ff
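
To make the byte layout concrete, here is a minimal standalone sketch that hand-encodes the two array forms. It does not use the library; the helper names (encodeSmallUint, encodeDefiniteArray, encodeIndefiniteArray) are hypothetical and it only handles small integers (0..23) and short arrays.

```go
package main

import "fmt"

// encodeSmallUint encodes an unsigned integer in the range 0..23 as a single
// CBOR byte (major type 0 with the value in the low five bits).
func encodeSmallUint(v uint64) byte {
	if v > 23 {
		panic("sketch only supports values 0..23")
	}
	return byte(v)
}

// encodeDefiniteArray writes the head 0x80|len followed by the elements.
// This is the layout a toarray struct produces for small field counts.
func encodeDefiniteArray(fields ...uint64) []byte {
	if len(fields) > 23 {
		panic("sketch only supports up to 23 elements")
	}
	out := []byte{0x80 | byte(len(fields))}
	for _, f := range fields {
		out = append(out, encodeSmallUint(f))
	}
	return out
}

// encodeIndefiniteArray writes the head 0x9f, the elements, then the break
// byte 0xff. The element count is not stored anywhere in the head.
func encodeIndefiniteArray(fields ...uint64) []byte {
	out := []byte{0x9f}
	for _, f := range fields {
		out = append(out, encodeSmallUint(f))
	}
	return append(out, 0xff)
}

func main() {
	fmt.Printf("% x\n", encodeDefiniteArray(1, 2))   // 82 01 02
	fmt.Printf("% x\n", encodeIndefiniteArray(1, 2)) // 9f 01 02 ff
	fmt.Printf("% x\n", encodeIndefiniteArray())     // 9f ff
}
```

The indefinite form costs one extra byte for non-empty arrays, which is why consumers that compare exact bytes cannot treat the two layouts as interchangeable.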

Behavior contract:

  • Default ToIndefArrayStructTagForbidden rejects encoding a struct tagged with toindefarray and returns a clear error pointing at the option name. Existing users see no change.
  • Setting ToIndefArrayStructTag = ToIndefArrayStructTagAllowed together with IndefLength = IndefLengthForbidden is rejected at EncMode() time as a contradictory combination.
  • toarray and toindefarray on the same struct are mutually exclusive and rejected at type-cache time, on both encoder and decoder side, with the same error message.
  • The decoder is unchanged at the byte level: it accepts both definite- and indefinite-length arrays for toindefarray-tagged structs (same as it does for toarray).
  • An empty struct produces exactly two bytes: 0x9f 0xff.
  • The new option only governs the struct tag mechanism. Indefinite-length encoding via the streaming API ((*Encoder).StartIndefiniteArray) continues to be governed by IndefLength.
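
The option-level rules in this contract can be modelled roughly as follows. This is a simplified sketch, not the code in encode.go: the options struct, the validate function, the error wording, and the maxToIndefArrayStructTagMode sentinel are assumptions made for illustration, and only the mode names come from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// IndefLengthMode is a stand-in for the existing option of the same name.
type IndefLengthMode int

const (
	IndefLengthAllowed IndefLengthMode = iota // default
	IndefLengthForbidden
)

// ToIndefArrayStructTagMode mirrors the enum proposed in this PR.
type ToIndefArrayStructTagMode int

const (
	ToIndefArrayStructTagForbidden ToIndefArrayStructTagMode = iota // default
	ToIndefArrayStructTagAllowed
	maxToIndefArrayStructTagMode // hypothetical sentinel for range checks
)

// options is a hypothetical, reduced stand-in for EncOptions.
type options struct {
	IndefLength           IndefLengthMode
	ToIndefArrayStructTag ToIndefArrayStructTagMode
}

// validate rejects out-of-range modes and the contradictory combination of
// allowing the struct tag while forbidding indefinite-length items.
func validate(o options) error {
	if o.ToIndefArrayStructTag < 0 || o.ToIndefArrayStructTag >= maxToIndefArrayStructTagMode {
		return fmt.Errorf("invalid ToIndefArrayStructTag %d", o.ToIndefArrayStructTag)
	}
	if o.ToIndefArrayStructTag == ToIndefArrayStructTagAllowed && o.IndefLength == IndefLengthForbidden {
		return errors.New("ToIndefArrayStructTagAllowed conflicts with IndefLengthForbidden")
	}
	return nil
}

func main() {
	fmt.Println(validate(options{}))
	fmt.Println(validate(options{ToIndefArrayStructTag: ToIndefArrayStructTagAllowed}))
	fmt.Println(validate(options{
		IndefLength:           IndefLengthForbidden,
		ToIndefArrayStructTag: ToIndefArrayStructTagAllowed,
	}))
}
```

Failing at option-validation time, rather than at the first Marshal call, surfaces the misconfiguration once and early.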

Files changed:

  • cache.go: hasToIndefArrayOption, getEncodingStructToIndefArrayType, mutual-exclusion check on both encoder and decoder cache, route toindefarray through the existing array decode path.
  • encode.go: ToIndefArrayStructTagMode enum, EncOptions field, encMode field, validation in encMode() (including the cross-option check against IndefLengthForbidden), encodeStructToIndefArray with runtime gate, dispatch in getEncodeFuncInternal.
  • toindefarray_test.go: new tests (see below).
  • encode_test.go: extended TestEncOptions to skip ToIndefArrayStructTag for the non-zero invariant (its non-zero value conflicts with IndefLengthForbidden, mirroring the existing escape hatch for TagsMd).

Test coverage:

  • TestEncodeStructToIndefArrayBasic — header/break/roundtrip on a typical struct.
  • TestEncodeStructToIndefArrayEmpty — empty struct emits exactly 0x9f 0xff.
  • TestEncodeStructToIndefArrayNested — nested toindefarray structs roundtrip.
  • TestEncodeStructToIndefArrayEmbeddedField — fields promoted from an embedded struct are encoded in order.
  • TestEncodeStructToIndefArrayParentIndefChildArray — parent toindefarray containing child toarray produces nested 9f 82 ... ff.
  • TestEncodeStructToIndefArrayParentArrayChildIndef — parent toarray containing child toindefarray produces nested 82 9f ... ff ....
  • TestEncodeStructToIndefArrayForbiddenByDefault — default EncOptions{} rejects with an error mentioning ToIndefArrayStructTag.
  • TestEncodeStructToIndefArrayMutuallyExclusive — struct with both toarray and toindefarray is rejected.
  • TestToIndefArrayStructTagOptionsRoundTrip — non-zero value of the option round-trips through EncOptions -> EncMode -> EncOptions.
  • TestInvalidToIndefArrayStructTagMode — out-of-range mode values are rejected by EncMode().
  • TestToIndefArrayStructTagRejectsIndefLengthForbidden — combination with IndefLengthForbidden is rejected at EncMode() time.
  • TestEncodeStructToIndefArrayWithCBORTag — verifies a struct registered with a CBOR tag through TagSet and tagged toindefarray produces the tag header followed by an indefinite-length array.
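
The nested and tagged layouts these tests check can be reproduced by hand. The sketch below is illustrative only: the helpers definite, indefinite, and tagged are hypothetical, the element values are arbitrary, and tag number 121 is an assumption chosen for the example rather than something taken from the tests.

```go
package main

import "fmt"

// definite wraps already-encoded items in a definite-length array head.
func definite(items ...[]byte) []byte {
	if len(items) > 23 {
		panic("sketch only supports up to 23 items")
	}
	out := []byte{0x80 | byte(len(items))}
	for _, it := range items {
		out = append(out, it...)
	}
	return out
}

// indefinite wraps already-encoded items between 0x9f and the break 0xff.
func indefinite(items ...[]byte) []byte {
	out := []byte{0x9f}
	for _, it := range items {
		out = append(out, it...)
	}
	return append(out, 0xff)
}

// tagged prefixes an item with a CBOR tag head. Tag numbers 24..255 are
// encoded as 0xd8 followed by the tag number in one byte.
func tagged(tag byte, item []byte) []byte {
	if tag < 24 {
		panic("sketch only supports tag numbers 24..255")
	}
	return append([]byte{0xd8, tag}, item...)
}

func main() {
	one, two, three := []byte{0x01}, []byte{0x02}, []byte{0x03}

	// Parent indefinite, child definite: [_ [1, 2], 3]
	fmt.Printf("% x\n", indefinite(definite(one, two), three)) // 9f 82 01 02 03 ff

	// Parent definite, child indefinite: [[_ 1, 2], 3]
	fmt.Printf("% x\n", definite(indefinite(one, two), three)) // 82 9f 01 02 ff 03

	// Tag header followed by an indefinite-length array: 121([_ 1, 2])
	fmt.Printf("% x\n", tagged(121, indefinite(one, two))) // d8 79 9f 01 02 ff
}
```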

PR Was Proposed and Welcomed in Currently Open Issue

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests (verified locally with go test ./... and go test -race -short ./...; coverage held at 97.0%)
  • Pass all lint checks in CI (goimports, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.

@tdnguyenND tdnguyenND force-pushed the feature/toindefarray-v2 branch 3 times, most recently from 982d383 to e1e8080 Compare May 5, 2026 08:54
@tdnguyenND tdnguyenND marked this pull request as ready for review May 5, 2026 10:03
Owner

@fxamacker fxamacker left a comment

Thanks for opening this PR with a detailed description and many tests.

I left some suggestions focused on reducing code duplication and preventing future code divergence/inconsistency.

The separate code path is a good approach to reduce coupling and it helps avoid modifying existing functions.

But in this case, the duplicate code is almost entirely the same. So I think we can refactor some existing functions to incorporate a few lines of differences, which will help us reuse code in a single path instead of maintaining mostly identical blocks used by two paths.

@tdnguyenND tdnguyenND force-pushed the feature/toindefarray-v2 branch from e1e8080 to d589f29 Compare May 16, 2026 23:34
@tdnguyenND
Contributor Author

Thanks for the careful review — you're right that the duplicate code was a maintenance trap waiting to happen. All suggestions applied in d589f29:

Refactor (code dedup)

  • arrayStructOptions(t, structOptions) helper extracts the mutual-exclusion check; reused by both getDecodingStructType and getEncodingStructType.
  • encodingStructType gains a toIndefArray bool field (set only when toArray is also true).
  • getEncodingStructToArrayType now takes toIndefArray bool; getEncodingStructToIndefArrayType is gone.
  • encodeStructToArray is the single encode function, branching on structType.toIndefArray for the gate check, head byte (0x9f vs definite head), and trailing break.
  • getEncodeFuncInternal dispatch collapsed to if hasToArrayOption(tag) || hasToIndefArrayOption(tag).
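
As a rough illustration of the shared mutual-exclusion check, the sketch below parses the option part of a cbor struct tag value and rejects the conflicting pair. It is not the library's arrayStructOptions helper: the function name arrayOptions, its signature, and the error text are assumptions, and the real helper works on the library's own parsed option type rather than a raw string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// arrayOptions inspects a tag value such as ",toarray" or "name,omitempty"
// and reports which array-shaping options are present. It returns an error
// when both toarray and toindefarray are specified.
func arrayOptions(tag string) (toArray, toIndefArray bool, err error) {
	parts := strings.Split(tag, ",")
	for _, opt := range parts[1:] { // parts[0] is the field name
		switch strings.TrimSpace(opt) {
		case "toarray":
			toArray = true
		case "toindefarray":
			toIndefArray = true
		}
	}
	if toArray && toIndefArray {
		return false, false, errors.New("toarray and toindefarray are mutually exclusive")
	}
	return toArray, toIndefArray, nil
}

func main() {
	for _, tag := range []string{",toarray", ",toindefarray", ",toarray,toindefarray"} {
		toArray, toIndefArray, err := arrayOptions(tag)
		fmt.Printf("%q: toArray=%v toIndefArray=%v err=%v\n", tag, toArray, toIndefArray, err)
	}
}
```

Keeping the check in one function means the encoder and decoder caches cannot drift apart on what counts as a conflict or on the error they report.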

Tests

  • Full byte comparison replaces the first/last-byte checks (TestEncodeStructToIndefArrayBasic, TestEncodeStructToIndefArrayNested).
  • reflect.DeepEqual(in, out) replaces field-by-field roundtrip comparison.
  • TestToIndefArrayStructTagOptionsRoundTrip now compares the entire EncOptions value via DeepEqual.
  • All byte-layout comments now include EDN above the hex.

Net diff: -69 lines (the refactor removed more than it added). Coverage is 97.2%, up from 97.0% before the refactor, because the merged path is now exercised by both the toarray and toindefarray tests.

Owner

@fxamacker fxamacker left a comment

Thanks for updating this PR. It looks good. I left some minor suggestions.

  • decouple encodingStructType.toArray and encodingStructType.toIndefArray
  • remove redundant hex comments in tests
  • add roundtrip decoding in new tests when they are feasible

BTW, there are some conflicts with the master branch that need to be resolved.

Thanks again for opening this PR and the updates!

Comment thread cache.go
Comment on lines +397 to +399
fields: encFlds,
toArray: true,
toIndefArray: toIndefArray,
Owner

Maybe add toArray function parameter to getEncodingStructToArrayType(), and only set toArray to true if it is actually specified by the user.

Suggested change
- fields: encFlds,
- toArray: true,
- toIndefArray: toIndefArray,
+ fields: encFlds,
+ toArray: toArray,
+ toIndefArray: toIndefArray,

Contributor Author

Done in 988e0fd: getEncodingStructToArrayType() now takes toArray, toIndefArray bool, and each flag reflects the user's literal struct tag input. Updated the field comments on encodingStructType accordingly. Note: isEmptyStruct at encode.go:2209 still checks only structType.toArray, so under the new semantics a toindefarray-only struct falls through to the map-shaped path there. That path happens to give the same answer because array-shaped structs never populate omitEmptyFieldsIdx. Let me know if you'd prefer that check to be toArray || toIndefArray for explicitness.

Owner

Good catch! 👍 Yes, please update isEmptyStruct() to check for structType.toArray || structType.toIndefArray for the fast path.

Even though the current code produces the correct CBOR data, updating the check is more future-proof.

Contributor Author

Done in 7b123cf: isEmptyStruct() now checks structType.toArray || structType.toIndefArray for the fast path.

Comment thread cache.go Outdated
omitEmptyFieldsIdx []int // Only populated if toArray is false
err error
toArray bool
toIndefArray bool // Only set when toArray is also true.
Owner

We can decouple encodingStructType.toArray and encodingStructType.toIndefArray here because these two options are mutually exclusive.

Decoupling will be more consistent with the user-specified struct tag options:

  • encodingStructType.toArray is true only if toarray is specified in struct tag;
  • encodingStructType.toIndefArray is true only if toindefarray is specified in struct tag.
Suggested change
- toIndefArray bool // Only set when toArray is also true.
+ toIndefArray bool // Only set when toindefarray is specified.

Contributor Author

Decoupled in 988e0fd. The field comments now state that toArray and toIndefArray independently reflect the user-specified option:

toArray bool // True iff the struct declares the `toarray` option
toIndefArray bool // True iff the struct declares the `toindefarray` option

@tdnguyenND tdnguyenND force-pushed the feature/toindefarray-v2 branch 2 times, most recently from 988e0fd to 851cf6c Compare May 18, 2026 00:06
@tdnguyenND
Contributor Author

Rebased onto current master (851cf6c). One conflict in encode.go from the Go 1.24 modernization (#774) — kept the modernized for i := range flds and grafted the toIndefArray head/break branches around it. All tests pass; coverage at 97.2%.

Owner

@fxamacker fxamacker left a comment

Thank you for rebasing and updating the PR. I just left one comment to reply about the isEmptyStruct() check that you identified and suggested. 👍

Encodes a Go struct as a CBOR indefinite-length array (0x9f ... 0xff) when
tagged `cbor:",toindefarray"`. Gated by EncOptions.ToIndefArrayStructTag,
default ToIndefArrayStructTagForbidden. Mutually exclusive with `toarray`.
Decoder is unchanged. Motivated by Cardano Plutus Data interoperability;
proposed in fxamacker#762.

Signed-off-by: tdnguyenND <tdnguyen.uet@gmail.com>
@tdnguyenND tdnguyenND force-pushed the feature/toindefarray-v2 branch from 851cf6c to 7b123cf Compare May 18, 2026 02:01
Owner

@fxamacker fxamacker left a comment

Nice! 👍 Thanks for contributing this PR!

@fxamacker fxamacker merged commit 642d93f into fxamacker:master May 18, 2026
14 checks passed