Elixir lexer now with 300% more blazingly fast and 300% less memory usage #255

Open
madlep wants to merge 3 commits into ThePrimeagen:master from madlep:elixir-lexer-performance-optimise

@madlep madlep commented Jun 20, 2023

Was thinking about this, and worked in a bunch of optimisations to make things ✨ blazingly faster

TL;DR - 3x faster, 3x less memory usage

  • rather than doing a tail recursive loop in lex/2, and calling tokenize/2 to return a tuple of {token, rest}, just do everything pure tail recursively. This is the main speed and memory boost because:
    • building a tuple to return multiple values takes a bit of time
    • extracting values when the function returns takes a bit of time
    • it has to be done on heap, which is slower
    • it needs memory to be allocated on the heap, which takes time
    • it needs heap memory to eventually be GCed
    • calling a function to get a value, and then doing something with that value means the BEAM has to manage stack allocation, continuation pointers, stack deallocation etc. If everything is completely tail-call optimised, the BEAM can just set VM registers to pass arguments, and not worry about managing the stack, or return values etc, and use the call_last instruction rather than call, which is just faster.
  • when matching identifiers/numbers, instead of accumulating a character at a time into an iolist, just keep track of the integer position into the string containing the token. Then when we hit the end of the token, do a single binary pattern match with the calculated length. This gives us a sub binary match, which is just a reference into the original string in memory, and doesn't involve any copying, or building an intermediate accumulator that later needs to be GCed.
  • when matching a keyword, we've already got a huge headstart, so keep that match length, and just check the next character. If it's not another letter, we can just directly return a tuple. If it is, then carry on as if it were an identifier
  • rather than splitting logic for detecting whitespace/eof etc into a separate function, do as much as possible in one match. That means only one pattern match operation, but more importantly, it allows the compiler to see all the potential cases, and generate more optimal code based on them (in this case it builds a sort-of trie lookup based on prefix character combinations from the hard coded symbols/keywords/whitespace, with a fallback for identifiers/numbers, but depending on the cases it could do a linear search or binary search instead).
  • refactored main matches to be in a case expression rather than separate function heads. They both work identically (and have verified the generated BEAM byte code is the same in both cases), but because we're passing tokens, and doing another function call in each case, it gets messier as function heads. Because tokens is already an argument, we can just pass that along rather than having to explicitly declare it as an argument each time.
  • added a benchmark too, which I was using. Have commented out benchmarking the old code (cause it's not checked in), but to compare, you can copy+paste the old code into Monkey.OldLexer
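
To make the first few points concrete, here is a minimal sketch of the same ideas. It is not the code from this PR: `SketchLexer`, its token shapes, and the tiny token set are made up for illustration. It shows a purely tail-recursive `lex/2` that never returns a `{token, rest}` tuple, a keyword clause that reuses the bytes it has already matched, and identifier/number scanning that tracks a length and extracts the token with a single binary match.

```elixir
defmodule SketchLexer do
  # Hypothetical mini-lexer for illustration only, not the lexer in this PR.

  defguardp is_letter(c) when c in ?a..?z or c in ?A..?Z or c == ?_
  defguardp is_digit(c) when c in ?0..?9

  def lex(input) when is_binary(input), do: lex(input, [])

  # Every clause ends in a tail call. Tokens are consed onto the accumulator
  # and reversed once at the end, so no {token, rest} tuple is ever built.
  defp lex(<<>>, tokens), do: Enum.reverse([:eof | tokens])
  defp lex(<<c, rest::binary>>, tokens) when c in [?\s, ?\t, ?\r, ?\n], do: lex(rest, tokens)
  defp lex(<<"=", rest::binary>>, tokens), do: lex(rest, [:assign | tokens])
  defp lex(<<";", rest::binary>>, tokens), do: lex(rest, [:semicolon | tokens])

  # Keyword head start: "let" is already matched, so only the next byte decides
  # between the keyword and a longer identifier such as "letter".
  defp lex(<<"let", c, _::binary>> = input, tokens) when is_letter(c), do: ident(input, 4, tokens)
  defp lex(<<"let", rest::binary>>, tokens), do: lex(rest, [:let | tokens])

  defp lex(<<c, _::binary>> = input, tokens) when is_letter(c), do: ident(input, 1, tokens)
  defp lex(<<c, _::binary>> = input, tokens) when is_digit(c), do: int(input, 1, tokens)
  defp lex(<<c, rest::binary>>, tokens), do: lex(rest, [{:illegal, <<c>>} | tokens])

  # Track only the length scanned so far, then slice the token out of the
  # original binary in one match instead of accumulating characters.
  defp ident(input, len, tokens) do
    case input do
      <<_::binary-size(len), c, _::binary>> when is_letter(c) -> ident(input, len + 1, tokens)
      <<name::binary-size(len), rest::binary>> -> lex(rest, [{:ident, name} | tokens])
    end
  end

  defp int(input, len, tokens) do
    case input do
      <<_::binary-size(len), c, _::binary>> when is_digit(c) -> int(input, len + 1, tokens)
      <<digits::binary-size(len), rest::binary>> -> lex(rest, [{:int, digits} | tokens])
    end
  end
end

SketchLexer.lex("let five = 5;")
# => [:let, {:ident, "five"}, :assign, {:int, "5"}, :semicolon, :eof]
```

With only one keyword the head-start clause looks trivial, but the same shape repeats for each keyword, and putting all the literal prefixes in one set of clauses is what gives the compiler the chance to pick a good dispatch strategy.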

Apologies to @ryanwinchester . Was gonna make just a couple of tweaks, but ended up touching pretty much all the original code. The same logic is still there. It's just all being called a bit differently.

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.5
Erlang 26.0.1

Benchmark suite executing with the following configuration:
warmup: 20 s
time: 20 s
memory time: 5 s
reduction time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 1.67 min

Benchmarking Lexer ...
Benchmarking OldLexer ...

Name               ips        average  deviation         median         99th %
Lexer         515.43 K        1.94 μs  ±1439.50%        1.83 μs        2.13 μs
OldLexer      180.10 K        5.55 μs   ±316.21%        5.29 μs        9.79 μs

Comparison:
Lexer         515.43 K
OldLexer      180.10 K - 2.86x slower +3.61 μs

Memory usage statistics:

Name        Memory usage
Lexer            7.95 KB
OldLexer        24.13 KB - 3.04x memory usage +16.19 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name     Reduction count
Lexer                394
OldLexer             950 - 2.41x reduction count +556

**All measurements for reduction count were the same**
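
As a quick sanity check on the "3x" in the TL;DR, the ratios can be recomputed from the figures printed above. This is only arithmetic on the reported numbers, not a re-run of the benchmark, and the variable names are made up for this sketch.

```elixir
# Figures copied from the benchmark output above.
lexer_ips = 515.43
old_lexer_ips = 180.10
lexer_kb = 7.95
old_lexer_kb = 24.13
lexer_reductions = 394
old_lexer_reductions = 950

speedup = Float.round(lexer_ips / old_lexer_ips, 2)
memory_ratio = Float.round(old_lexer_kb / lexer_kb, 2)
reduction_ratio = Float.round(old_lexer_reductions / lexer_reductions, 2)

# Two jobs, each with warmup + time + memory time + reduction time, in seconds.
estimated_minutes = Float.round(2 * (20 + 20 + 5 + 5) / 60, 2)

IO.inspect({speedup, memory_ratio, reduction_ratio, estimated_minutes})
# => {2.86, 3.04, 2.41, 1.67}
```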

madlep added 3 commits June 20, 2023 14:07
- make everything pure tail recursive, rather than returning tuples from
  helper function
- figure out length of token, and do single binary pattern match to
  extract it in one go as a sub binary match rather than building an
  accumulator
- do one main pattern match in one place, rather than several separate
  ones
- when matching keywords, keep what we've matched rather than matching
  all over again when figuring out if it's a keyword or identifier at
  the end
Comment thread elixir/lib/lexer.ex
## Example

- iex> Lexer.init("let five = 5;")
+ iex> Monkey.Lexer.init("let five = 5;")
Contributor

@ryanwinchester ryanwinchester Jun 20, 2023

You don't need that if the alias is in the test ¯\_(ツ)_/¯, but fine with me either way

Author

Is weird, as it works fine in ExUnit, but when I had it in a livebook, it was complaining about it. I think it's a bug in livebook though. But adding it makes it work either way 🤷‍♂️

@ryanwinchester
Contributor

ryanwinchester commented Jun 20, 2023

I 100% had the intermediary step with variable assignment (returning {token, rest}) because I knew it was going to be reviewed on-stream and I wanted it to be easier to read and focus on the sweetness of binary pattern-matching.

They did the on-stream review, so cool with optimizing it.

Contributor

@ryanwinchester ryanwinchester left a comment

LGTM, but even though I'm listed as a code-owner, I'm not added as a collaborator so my reviews don't actually count towards approval 🥲.

@Fa-C-Shus
Contributor

I think your work and review matters, but I’m no collaborator either

@madlep
Author

madlep commented Jun 20, 2023

> I 100% had the intermediary step with variable assignment (returning {token, rest}) because I knew it was going to be reviewed on-stream and I wanted it to be easier to read and focus on the sweetness of binary pattern-matching.
>
> They did the on-stream review, so cool with optimizing it.

Yeah, I watched that, it was good to follow along with. The previous version is much more readable than this version. It's definitely sacrificing readability for performance here.
