
Conversation

milancurcic
Member

Alternative to #236.

Thank you for demoing Julienne, Damian, and for the inspiration. I've thought about the minimal amount of added code that NF needs to write more concise tests without an additional dependency. This PR shows it on two test suites, test_dense_layer and test_dense_network.

In summary:

  • Adds the tuff module in tuff.f90 (72 LOC), which provides test and test_result.
  • test can take a logical expression or a user-provided function that returns a test_result.
  • test can also act as a suite driver over a mix of logical-expression tests and user-function tests (a rough usage sketch follows below).
  • That's it.
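
Here's that rough usage sketch. It's illustrative only: the exact interfaces live in tuff.f90, and the passed component and the summary lines below are placeholders rather than verbatim code from the PR.

program example_test
  use tuff, only: test, test_result
  implicit none

  type(test_result), allocatable :: results(:)

  ! A test from a logical expression, and a test from a user-provided
  ! function that returns a test_result.
  results = [ &
    test('1 + 1 == 2', 1 + 1 == 2), &
    test(weights_updated_correctly) &
  ]

  ! Hypothetical reporting; the suite-driver form of test is meant to
  ! handle the summary and the non-zero exit code. The component name
  ! passed is assumed here for illustration.
  print '(i0, " of ", i0, " tests passed.")', &
    count(results % passed), size(results)
  if (.not. all(results % passed)) stop 1

contains

  function weights_updated_correctly() result(res)
    type(test_result) :: res
    ! Placeholder check; a real test would compare updated layer weights.
    res = test('updated weights are correct', .true.)
  end function weights_updated_correctly

end program example_test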

It's obviously less powerful than Julienne, but also much simpler (for me) to use.

If you notice any serious caveats, let me know. In my view, nothing more is needed for NF and I'd prefer a simpler approach than bringing in an external dependency.

milancurcic requested review from jvdp1 and rouson on October 13, 2025 16:31
@rouson
Collaborator

rouson commented Oct 13, 2025

@milancurcic your plan is a great way to start. Not coincidentally, what you've written is almost exactly where Julienne started! :) Subtracting blank lines, the genesis of Julienne was an 87-line utility first released in the Sourcery 3.1.0 library. I'll add a subsequent comment here to list the additional capabilities that you'll probably ultimately find useful, whether by adding them to your utility or by revisiting Julienne.

@rouson
Collaborator

rouson commented Oct 14, 2025

Over time, the aforementioned 87-line utility expanded to become Julienne in order to support

  1. Running test subsets whose descriptions contain a substring provided on the command line (a rough sketch follows after this list).
  2. Automatically generating diagnostic messages, including data, about test failures. (Writing conditional logic and printing diagnostics in each test function quickly becomes tedious and verbose.)
  3. Multi-image testing that reports a pass only if a given test passes on every image.
  4. Skipping tests (helpful if a given test crashes the test suite with a specific compiler). This adds a third test outcome to the otherwise binary pass/fail state captured by a logical outcome indicator.
  5. Offloading and centralizing a lot of the testing logic.
  6. A tally of the overall test-suite outcomes (passes, fails, and skips, if any). That saves a lot of time over scanning the full list, which will become especially important once the list is too long to fit on a single screen.
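
Here's that rough sketch for item 1 (illustrative only, not Julienne's actual implementation): read an optional substring from the command line and run only the tests whose descriptions contain it.

program filtered_tests
  implicit none
  character(len=32), parameter :: descriptions(3) = [character(len=32) :: &
    'dense layer forward pass', 'dense layer backward pass', 'optimizer update']
  character(len=:), allocatable :: filter
  integer :: filter_length, i

  ! An absent argument yields a zero-length filter, and index(x, '')
  ! returns 1, so every test runs by default.
  call get_command_argument(1, length=filter_length)
  allocate(character(len=filter_length) :: filter)
  call get_command_argument(1, value=filter)

  do i = 1, size(descriptions)
    if (index(descriptions(i), filter) > 0) &
      print '(a)', 'running: ' // trim(descriptions(i))
  end do
end program filtered_tests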

Regarding item 5, there's a lot of logic hidden in x .approximates. y .within. tolerance. For example, if x and y are arrays, you only want diagnostics about the specific elements that are out of tolerance. Writing such logic with nested loops and if/then/print every time you need it gets tedious. The above Julienne expression handles all that logic and output for you.
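
For illustration only (this is not Julienne's actual code), the hand-rolled equivalent for a pair of arrays looks roughly like the function below, and something like it has to be repeated for every such comparison:

module tolerance_check
  implicit none
contains
  ! Pass only if every element of x is within tolerance of y, and print
  ! diagnostics only for the offending elements.
  logical function within_tolerance(x, y, tolerance)
    real, intent(in) :: x(:), y(:), tolerance
    integer :: i
    within_tolerance = .true.
    do i = 1, size(x)
      if (abs(x(i) - y(i)) > tolerance) then
        within_tolerance = .false.
        print '("element ", i0, ": x = ", g0, ", y = ", g0, ", |x - y| = ", g0)', &
          i, x(i), y(i), abs(x(i) - y(i))
      end if
    end do
  end function within_tolerance
end module tolerance_check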

@milancurcic
Member Author

Thanks, @rouson. Point 2 especially I see as being important here. We can consider this PR a stepping stone toward Julienne, given that mostly only the scaffolding will need to change if I choose to make that move.

Collaborator

rouson left a comment

The source code changes LGTM.

With GCC 15.1.0 installed and in my PATH, running fpm test --profile release yielded the output below, which reports a runtime error after reporting many tests passing:

test_dense_layer: All tests passed.
test_optimizers: All tests passed.
test_multihead_attention_layer: All tests passed.
test_input1d_layer: All tests passed.
test_parametric_activation: All tests passed.
test_conv2d_layer: All tests passed.
test_maxpool2d_layer: All tests passed.
test_dropout_layer: All tests passed.
test_input3d_layer: All tests passed.
test_flatten_layer: All tests passed.
test_reshape_layer: All tests passed.
test_conv2d_network: All tests passed.
test_reshape2d_layer: All tests passed.
test_metrics: All tests passed.
test_loss: All tests passed.
incorrect updated weights.. failed
test_linear2d_layer: One or more tests failed.
STOP 1
test_conv1d_network: All tests passed.
test_layernorm_layer: All tests passed.
test_locally_connected2d_layer: All tests passed.
test_get_set_network_params: All tests passed.
test_embedding_layer: All tests passed.
test_conv1d_layer: All tests passed.
test_maxpool1d_layer: All tests passed.
test_dense_network: All tests passed.
test_insert_flatten: All tests passed.
test_input2d_layer: All tests passed.
<ERROR> Execution for object " test_linear2d_layer " returned exit code  1
<ERROR> *cmd_run*:stopping due to failed executions
STOP 1

@rouson
Collaborator

rouson commented Oct 14, 2025

@milancurcic I see now that the above output is the intended behavior for a failing test.

Here are a few more thoughts on what would likely save time for an unfamiliar user or new developer in interpreting the test output:

I think the fpm output stopping due to failed executions could be confusing. At first, failed executions suggested to me that a test crashed or somehow terminated early. I therefore inserted my own stops in the program at various intermediate points before ultimately concluding that the program was actually running all the way to what is essentially its end. It would be really useful to have a formatted summary, including pass/fail tallies, at the very end of the entire test-suite execution. That's the motivation for having just one main program and putting the actual tests in modules. I'm adding this as item 6 in the list in my previous comment.
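
As a sketch of what I mean (assuming the driver collects one logical outcome per test program; the data here are made up):

program suite_summary_sketch
  implicit none
  logical :: passed(4)
  passed = [.true., .true., .false., .true.]
  print '(i0, " of ", i0, " test programs passed.")', count(passed), size(passed)
  if (.not. all(passed)) stop 1
end program suite_summary_sketch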

Also, I suggest indenting the text incorrect updated weights.. failed to make clear that it's information grouped with one particular test. I also have a slight preference for that information to come after the summary for the corresponding test (i.e., after test_linear2d_layer: One or more tests failed.).

@milancurcic
Member Author

The rationale for stop 1 at the end of a failing program is to set a non-zero exit code. How else would you do this without stop? Or do you suggest that a failing suite should exit with 0?

@rouson
Collaborator

rouson commented Oct 14, 2025

@milancurcic good question. The final three lines in Julienne output are equivalent to the final three lines in your test suite output. The biggest difference is that just above those lines is a global tally of test passes. Seeing the global tally is an indication that the test suite ran to completion, which took me quite a while to figure out with the neural-fortran test suite.

Here's the trailing output when a Julienne test fails:

_____ 71 of 72 tests passed. 0 tests were skipped _____

Fortran ERROR STOP: Some tests failed.
IEEE arithmetic exceptions signaled: INEXACT
<ERROR> Execution for object " driver " returned exit code  1
<ERROR> *cmd_run*:stopping due to failed executions
STOP 1

Less of a big deal is that Julienne uses error termination (which matters for multi-image runs) with a character stop code: error stop "Some tests failed". The character stop code is at least a bit more descriptive because program termination might have been initiated for some reason other than a test failure. But I think the most important difference is the global tally.

@milancurcic
Member Author

tuff (this PR) reports the tally on failure as well; we just don't see it here because the two suites that I adapted are not failing. But yes, there was no tally on failure in the existing test programs.

@rouson
Collaborator

rouson commented Oct 15, 2025

It also took me until just now to figure out why the output shows STOP 1 twice: once after the failure and once at the end. I just realized that the first one is from the one failing test program and the second one must be from fpm. If you switch to a character stop code, then you get the first one but not the second one. Once you've converted the rest of the tests to tuff, you'll have the tally at the end, and it might be nice to switch to character stop codes (or no stop codes). I just verified that doing so eliminates the STOP 1 at the end, which is redundant with a tally.

Part of my confusion is that I'm used to running parallel programs, where it's common to see the same output from multiple images. So when I saw STOP 1 twice but knew that I had only executed one image, I was confused about how that could happen.
