Skip to content

Test updates#10

Open
josibake wants to merge 14 commits into
masterfrom
test-updates
Open

Test updates#10
josibake wants to merge 14 commits into
masterfrom
test-updates

Conversation

@josibake
Copy link
Copy Markdown
Collaborator

Just fixing a few things that have been annoying me. Main change is to remove amounts.

Add __lt__ operator to allow for sorting lists of ECPubKeys.
The comparison is done lexicographically.
This function is only needed for generating the test vectors.
Amount is not used for anything, so better to remove it.
To ensure determinism with the tests, the addresses are now sorted lexicographically
before generating outputs. This is necessary because the outputs will change based
on the order in which addresses for the same recipient are created.

For the receiver, this doesnt matter. This is only relevant for determinism in the
test vectors.
Run ./generate-test-vectors.py with all the new changes:

1. Remove amounts
2. Add new labels test case
3. Different outputs for tests with multiple outputs for the same recipient
@josibake josibake force-pushed the test-updates branch 2 times, most recently from f60e49b to d462ac2 Compare March 21, 2024 19:22
Comment thread reference.py Outdated
# same sender but with different labels. To account for this in testing, set `perumute=True`. This will generate
# all possible sets of outputs (based on all possible orderings of recipient addresses).
#
# NOTE: THIS IS A TEST ONLY FEATURE AND NOT RELEVANT FOR NORMAL SENDING/RECEIVING
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that's a reference implementation that's not optimal to include test only code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an issue so long as its documented. I considered a few alternatives, but this seemed the simplest approach. Do you have another approach in mind?

@S3RK
Copy link
Copy Markdown
Owner

S3RK commented Apr 3, 2024

I don't understand the problem behind 55c3df6

Isn't the ordering of recipient addresses deterministic? They are listed in a particular order in the test.

Also, with the fix we have more expected outputs now in the test vectors. Do we expect to have all of them or some of them?

I'm very confused

@josibake
Copy link
Copy Markdown
Collaborator Author

josibake commented Apr 3, 2024

I don't understand the problem behind 55c3df6

The issue is that different orderings of recipient addresses will produce different output sets when sending to multiple labeled addresses for the same recipient in the same transaction (i.e. hash(shared_secret || k) + Bm results in different outputs based on the value of k and Bm).

For the sender, there is a requirement to group the addresses by Bspend which means it is very likely the a sending implementation will change the order of the addresses from how they are listed in the test. Furthermore, the sender should be able to change the order of addresses (so long as they are grouped correctly) since there is no order dependence in the protocol by design.

For testing sending and receiving this is not an issue since the receiver can always find their outputs for any valid output set the sender produces. However, if someone is implementing only sending, the tests need to provide a way for the sender to check if the outputs they have generated are valid. Previously, the tests were relying on a specific ordering (i.e. sorting by amount) so that the sender would always produce the same output set that could be checked against the test vectors. I ran into this issue while working on the libsecp256k1 module where the tests were passing until we added a heapsort step to group the addresses. heapsort is an unstable sorting algorithm (i.e. the relative order of equal elements is not preserved) so there was no way to rely on the ordering of the addresses in the test. We could have reordered the addresses in the test to match libsecp256k1s heapsort algorithm but that would have then caused implementations not using a heapsort or not sorting at all and using some other method for grouping to fail.

Also, with the fix we have more expected outputs now in the test vectors. Do we expect to have all of them or some of them?

Yep! This field now represents all possible valid address sets the sender can produce for the receiver. In reference.py, outputs are generated for all possible orderings of the recipient addresses. This means for any sending implementation, if they implement the protocol correctly the will match exactly one permutation (and ignore the rest of the outputs). Likewise, for the receiver, they will match exactly one permutation when scanning and ignore the rest.

@S3RK
Copy link
Copy Markdown
Owner

S3RK commented Apr 4, 2024

Thank you for detailed explanation. I think I fully understood the problem now.

For the sender, there is a requirement to group the addresses by Bspend which means it is very likely the a sending implementation will change the order of the addresses from how they are listed in the test.

I'd still say it's still a problem because output of sending phase could not match expected output from the test vectors.

Let me summarise the problem. The protocol doesn't prescribe deterministic result for a given input. Or to say it differently, for any input there is a number of results that are correct. This makes testing the protocol a bit tricky, because we can't just provide one result in the test vectors.

This means for any sending implementation, if they implement the protocol correctly the will match exactly one permutation (and ignore the rest of the outputs).

I think there is a problem here. How do they know if they fully matched a set and can safely ignore the rest?

Let's take a simple case when we have two outputs for the same address. There are two valid sets of two outputs. However the test vector has a flat list of four outputs. A wrong implementation can generate only one correct output and ignore the rest three. Not good.

One fix would be to provide a list<set<output>> of expected outputs, where each set in the list has to be matched fully. However, this is a bit complicated still as it breaks usual testing conventions. Normally one asserts equality of the results from your implementation agains expected result from the test vectors. It's definitely possible to implement the correct asserting semantics, but I'm not sure it's part of standard library and the implementor might need to understand how equality of containers (i.e. set<output>) is working in target language.

Alternatively, we can consider making the protocol deterministic. For example by defining ascending order of B_m in a grouped set. But that's adjusting protocol for testing purposes. I can see how this can be frowned upon, but testability is an important aspect of good design.

@josibake
Copy link
Copy Markdown
Collaborator Author

josibake commented Apr 4, 2024

I think there is a problem here. How do they know if they fully matched a set and can safely ignore the rest?

For every silent payment address provided, the sender must to generate a taproot scriptPubKey. From there, the sender needs to also check that each of their generated scriptPubKeys exists in the set of all possible scriptPubKeys that could have been generated. I don't think there's a problem here (aside from needing to update the reference implementation to explicitly check that len(generated_outputs) == len(sp_addresses).

Alternatively, we can consider making the protocol deterministic. For example by defining ascending order of B_m in a grouped set. But that's adjusting protocol for testing purposes. I can see how this can be frowned upon, but testability is an important aspect of good design.

I don't see the benefit here. We can fully test the protocol in the way I've described, so adding a sorting step to the protocol specifically for testing the sender side seems like extra complexity for no benefit? It would require the receiver to do a non-trivial sort as part of an already expensive scanning process.

Amount is not used for anything, so better to remove it.
To ensure determinism with the tests, the addresses are now sorted lexicographically
before generating outputs. This is necessary because the outputs will change based
on the order in which addresses for the same recipient are created.

For the receiver, this doesnt matter. This is only relevant for determinism in the
test vectors.
different orderings of the recipient addresses will produce different outputs in some scenarios.
for sending and receiving, this does not matter. however, for implementations only testing sending
e.g. testing that they can generate the correct outputs from an input set, this is fragile and leads
to confusion and wasted time. essentially, the tester needs to fiddle with the order of the recipient
addresses until the correct permutation is hit (when in reality any permutation is valid).
Run ./generate-test-vectors.py with all the new changes:

1. Remove amounts
2. All possible permutations in outputs for tests with multiple outputs
   for the same recipient
3. Make smallest outpoint an excluded input
@josibake
Copy link
Copy Markdown
Collaborator Author

@S3RK came back to this and cleaned some stuff up, mostly getting rid of duplicate code in the generate script. I also separated the generation code vs reference code by creating a copy of create_outputs which handles creating all possible output sets.

In addition, instead of having all output sets as one array, the sender now has an array of arrays of outputs: each possible "candidate set." This is less error prone than what I had before. For the receiver, I just use the first "candidate set" from the sender.

general refactor, namely:

* reduce reused code
* make test generation deterministic
* only generate multiple output sets for sender
* give each candidate set as its own result
* add new test case for lexicographic sort
* remove test related code from reference.py
Copy link
Copy Markdown
Owner

@S3RK S3RK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a pass and the changes look good.

It was a little bit hard to review big chunks of changes, so not sure I caught everything.

Need a little bit more time to verify new test vectors on my own secret implementation that I did from the bip without looking at your implementation.

Comment thread secp256k1.py
Comment thread reference.py


def decode_silent_payment_address(address: str, hrp: str = "tsp") -> Tuple[ECPubKey, ECPubKey]:
version, data = decode(hrp, address)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a026b44

I think we should check a) that version is in the expected range b) fail if version is 31

Comment thread generate-test-vector.py Outdated
label_address_two = reference.create_labeled_silent_payment_address(scan1, Spend1, m=l2, hrp=HRP)
label_address_one2 = reference.create_labeled_silent_payment_address(scan2, Spend2, m=l1, hrp=HRP)
label_address_two2 = reference.create_labeled_silent_payment_address(scan2, Spend2, m=l2, hrp=HRP)
addresses1 = [label_address_one, address, address2, label_address_one, label_address_one2, label_address_one, label_address_two2, label_address_two, label_address_two, label_address_two2]
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 404fe33

Is it on purpose you dropped this test?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to look again, but IIRC, this test wasn't adding any coverage by having 2 recipients, so was complicated in that it needed to two recipients but for no real benefit. Will take a look again to confirm, tho!

Comment thread generate-test-vector.py
# In[17]:


def generate_unknown_segwit_ver_test():
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 404fe33

nit: should we still have a test for unknown segwit version?

Comment thread reference.py

# Check if the found output public keys match the expected output public keys
assert add_to_wallet == expected["outputs"], "Receiving test failed"
# Note: order doesn't matter for creating/finding the outputs. However, different orderings of the recipient addresses
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment here is irrelevant for the recipient part, no?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good call. This is a leftover from when I did have all possible output sets for the recipient. But now for the recipient there is just one of the possible output sets generated by the sender, so this comment is indeed irrelevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants