refactor: better python performance #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: better python performance #500
Conversation
emerson-proenca left a comment:

Do you believe running PyPy would improve Python performance? PyPy + FastAPI = fastapi/fastapi#3944
Lewiscowles1986 left a comment:

Maybe, @emerson-proenca; if you make a PR I'd love to run it. I've not had much experience with PyPy, so I consider it esoteric at the moment; but maybe it's very easy and I'm just unclear on the approach. I do know AI was telling me I was barking up the wrong tree with this, but when I looked at all my latencies, they seemed to support me more than the AI review of my work did.
Here are some logs from the k6 test included with the PR, run on an M2 Air laptop:
execution: local
script: ../script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 137626 ✗ 0
data_received..................: 44 MB 1.5 MB/s
data_sent......................: 26 MB 878 kB/s
http_req_blocked...............: avg=2.77µs min=0s med=1µs max=18.46ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.22µs min=0s med=0s max=6.77ms p(90)=0s p(95)=0s
http_req_duration..............: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
{ expected_response:true }...: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
http_req_failed................: 0.00% ✓ 0 ✗ 137626
http_req_receiving.............: avg=16.37µs min=4µs med=10µs max=5.15ms p(90)=30µs p(95)=40µs
http_req_sending...............: avg=5.25µs min=1µs med=3µs max=4.96ms p(90)=8µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=21.56ms min=6.02ms med=18.23ms max=1.45s p(90)=28.53ms p(95)=37.08ms
http_reqs......................: 137626 4585.523346/s
iteration_duration.............: avg=21.79ms min=6.47ms med=18.45ms max=1.45s p(90)=28.77ms p(95)=37.32ms
iterations.....................: 137626 4585.523346/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
Lewiscowles1986 left a comment:

Apologies, I realised after re-running the baseline that I'd committed something I should not have. I've commented it out, but basically, I doubled the number of queries being run as part of a failed experiment and then accidentally committed it.
antonputra left a comment:
Thank you, @Lewiscowles1986, for the PR. Sorry I don't have clear guidelines for pull requests. Would you be able to just create a fastapi-app-v2 folder and keep the original as is? It's just easier to compare in the future.
Lewiscowles1986 left a comment:

To be honest, merging isn't the most important thing, as you'll only get at most a few thousand requests (not per second, overall 😄). Thank you for the tutorial; if you'd still like the changes, I'll make them.
antonputra left a comment:

@Lewiscowles1986 Yes please, I'll keep it. When it's time to refresh the benchmark, I'll go over and test everything again.
Lewiscowles1986 force-pushed from acf0486 to c57dc3f.
Lewiscowles1986 left a comment:

It now points to a separate folder; sorry, I've been unwell. It's also simpler than the original patch I'd sent over.

Before:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 124525 ✗ 0
data_received..................: 40 MB 1.3 MB/s
data_sent......................: 24 MB 794 kB/s
http_req_blocked...............: avg=2.93µs min=0s med=1µs max=7.66ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.78µs min=0s med=0s max=7.23ms p(90)=0s p(95)=0s
http_req_duration..............: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
{ expected_response:true }...: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
http_req_failed................: 0.00% ✓ 0 ✗ 124525
http_req_receiving.............: avg=17.23µs min=4µs med=10µs max=12.88ms p(90)=32µs p(95)=43µs
http_req_sending...............: avg=5.55µs min=1µs med=3µs max=6.5ms p(90)=9µs p(95)=13µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=23.84ms min=6.15ms med=19.78ms max=1.36s p(90)=32.4ms p(95)=42.37ms
http_reqs......................: 124525 4149.135507/s
iteration_duration.............: avg=24.08ms min=6.39ms med=20ms max=1.36s p(90)=32.65ms p(95)=42.69ms
iterations.....................: 124525 4149.135507/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 124525 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s

After:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 133066 ✗ 0
data_received..................: 43 MB 1.4 MB/s
data_sent......................: 26 MB 849 kB/s
http_req_blocked...............: avg=6.84µs min=0s med=1µs max=17.94ms p(90)=1µs p(95)=2µs
http_req_connecting............: avg=5.47µs min=0s med=0s max=14.93ms p(90)=0s p(95)=0s
http_req_duration..............: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
{ expected_response:true }...: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
http_req_failed................: 0.00% ✓ 0 ✗ 133066
http_req_receiving.............: avg=14.72µs min=4µs med=9µs max=5.26ms p(90)=28µs p(95)=37µs
http_req_sending...............: avg=4.94µs min=1µs med=4µs max=4.22ms p(90)=7µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=22.32ms min=8.03ms med=19.94ms max=231.85ms p(90)=28.26ms p(95)=33.85ms
http_reqs......................: 133066 4434.119736/s
iteration_duration.............: avg=22.53ms min=8.32ms med=20.12ms max=232.2ms p(90)=28.52ms p(95)=34.15ms
iterations.....................: 133066 4434.119736/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 133066 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s
Lewiscowles1986 left a comment:

@emerson-proenca I've read up on this, and apparently this should be as good as it gets using FastAPI under Python without pre-compiling to something else. While Python can have good IO-bound performance, at this scale the overhead of maintaining that appears to be CPU-bound. PyPy would apparently not help; although if you fancy forking any of this code, please do. One thing I do like about the new version is that there's no need for a class: it just uses the Python module instance for the database pool and lets the library manage that.
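As an editorial aside, here is a minimal sketch of that module-level pattern. All names (the `db` module, `init_pool`, the DSN, the `devices` table) are illustrative assumptions, not the PR's actual code:

```python
# db.py - module-level pool: the module object itself holds the single
# pool instance, and asyncpg manages the connection lifecycle. No class.
import asyncpg

pool: asyncpg.Pool | None = None

async def init_pool(dsn: str) -> None:
    global pool
    # Sizes are illustrative; tune for the workload.
    pool = await asyncpg.create_pool(dsn, min_size=10, max_size=10)

async def close_pool() -> None:
    if pool is not None:
        await pool.close()
```

```python
# main.py - wire the module-level pool into FastAPI's lifespan hook.
from contextlib import asynccontextmanager

from fastapi import FastAPI

import db

@asynccontextmanager
async def lifespan(app: FastAPI):
    await db.init_pool("postgresql://user:pass@localhost:5432/app")
    yield
    await db.close_pool()

app = FastAPI(lifespan=lifespan)

@app.get("/api/devices")
async def list_devices():
    rows = await db.pool.fetch("SELECT uuid, mac FROM devices")
    return [dict(r) for r in rows]
```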
antonputra left a comment:
Thank you!

Eventually I'll try PyPy and other Python interpreters/compilers to see if they can improve performance. To be honest, I agree with you and I don't think it'll make a difference; maybe if we remove logging it will?
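If anyone wants to test the logging hypothesis, here is a minimal sketch of silencing the per-request loggers. The `uvicorn.access` and `uvicorn.error` logger names are Uvicorn's documented ones; that the app runs under Uvicorn is an assumption:

```python
# quiet_logs.py - suppress per-request logging to measure its cost.
import logging

def quiet_access_logs() -> None:
    # Uvicorn emits one access-log record per request; disabling the
    # logger removes that work from the hot path entirely.
    logging.getLogger("uvicorn.access").disabled = True
    # Keep server errors visible, but drop info-level chatter.
    logging.getLogger("uvicorn.error").setLevel(logging.WARNING)
```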
Spitfire1900 left a comment:

Many thanks to everyone! I would recommend setting workers explicitly for both Granian and Bun to ensure better fairness in the benchmark; I noticed Uvicorn was set to 1 worker in the original test. It's always important to have a benchmark that uses "average code" rather than performing esoteric operations or configuration.
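For fairness, the worker count can be pinned explicitly on both sides; Granian exposes a `--workers` CLI flag for the same purpose. Below is a hedged sketch for the Uvicorn side, where the `main:app` module path and the count of 4 are illustrative assumptions:

```python
# serve.py - pin the worker process count explicitly so the benchmark
# compares both stacks under a known, identical setting.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",   # an import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,    # e.g. one process per core; pick deliberately
    )
```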
Lewiscowles1986 left a comment:

@Spitfire1900 I've no idea what you felt was esoteric, so I'm happy to have you explicitly point out what is, with regard to reducing the code; but also, workers defaults to 1 for Granian...
Spitfire1900 left a comment:

I don't think anything is esoteric at this point, but it's something to keep in mind. It looks like Bun auto-manages thread size, but supports explicitly setting threads via a hidden env var: https://github.com/oven-sh/bun/pull/14401/files#diff-52c167d9c0701a90829218516da76eefd4a0a05a2c57dd216df4599105c921e9
Another consideration: https://github.com/antonputra/tutorials/pull/500/files#diff-de1516dcd4f2ef07f2807fc25a08c5741d9bda870629052f59f969a9e5b12fd7R37 uses …
Lewiscowles1986 left a comment:

@Spitfire1900 FastAPI and Starlette do not support …

So I tried a few things, which get the minimum time in k6 down to microseconds for a request. They set slightly more aggressive timeouts, use Pydantic to marshal the response from the POST endpoint to the DB, and set a 1-second limit on every request's DB calls (the SET LOCAL DB call actually creates some overhead). I've not re-run the entire test; instead I've contributed a k6 test, which I used to verify on my local machine, within the also-modified docker-compose YAML.

The migrations are still manual, and you'll probably notice I tried increasing server connections. I think adjusting the min_size of DB connections could be another good way to gain performance. I'm pretty sure it still won't top Bun, but you could get a few more thousand requests per second.
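To make those ideas concrete, here is a hedged sketch of the combination described above: a Pydantic model marshalling the POST response, and a 1-second statement timeout applied per transaction via SET LOCAL. The table, field names, endpoint path, and timeout value are assumptions, not the PR's exact code:

```python
# sketch.py - illustrative only; reuses the module-level pool idea
# from the earlier sketch.
import asyncpg
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pool: asyncpg.Pool | None = None  # assume initialised at startup

class Device(BaseModel):
    uuid: str
    mac: str

@app.post("/api/devices", response_model=Device, status_code=201)
async def create_device(device: Device) -> Device:
    async with pool.acquire() as conn:
        async with conn.transaction():
            # SET LOCAL lasts only for this transaction, so every request
            # pays an extra round trip - the overhead mentioned above.
            await conn.execute("SET LOCAL statement_timeout = '1s'")
            row = await conn.fetchrow(
                "INSERT INTO devices (uuid, mac) VALUES ($1, $2) "
                "RETURNING uuid, mac",
                device.uuid,
                device.mac,
            )
    # Pydantic marshals the DB row back into the 201 response body.
    return Device(**dict(row))
```

A cheaper variant of the timeout is to set it once per connection when the pool is created, e.g. `asyncpg.create_pool(dsn, min_size=10, max_size=10, server_settings={"statement_timeout": "1s"})`; keeping `min_size` equal to `max_size` also means connections are opened up front rather than under load.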