Conversation

@Lewiscowles1986 (Contributor) commented Dec 30, 2025

So I tried a few things, which get the minimum time in k6 down to microseconds for a request. They set slightly more aggressive timeouts, use pydantic to marshal the response from the POST endpoint to the DB, and set a 1-second timeout on the DB calls made for every request (the `SET LOCAL` DB call actually adds some overhead).

I've not re-run the entire test; instead I've contributed a k6 test, which I used to verify the changes on my local machine against the also-modified docker-compose YAML.

The migrations are still manual, and you'll probably notice I tried increasing server connections. I think adjusting the min_size of DB connections could be another good way to gain performance.

I'm pretty sure it still won't top Bun, but you could get a few more thousand requests per second.
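The pydantic marshalling step can be sketched roughly like this; the `PostResponse` model name and its fields are hypothetical, not the PR's actual schema:

```python
# Hypothetical sketch of marshalling a DB row into a pydantic response model.
# Model name and fields are illustrative, not the PR's schema.
from pydantic import BaseModel

class PostResponse(BaseModel):
    id: int
    title: str

# A row as a plain mapping, e.g. an asyncpg Record converted to a dict.
row = {"id": 1, "title": "hello"}

resp = PostResponse(**row)          # validates and coerces field types
print(resp.model_dump())            # typed payload ready for the 201 response
```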

@emerson-proenca

Do you believe running PyPy would improve Python performance?

PyPy + FastAPI = fastapi/fastapi#3944
PyPy speed = https://speed.pypy.org/

@Lewiscowles1986 (Contributor, Author)

Maybe, @emerson-proenca; if you make a PR I'd love to run it. I've not had a lot of experience with PyPy, so I consider it esoteric at the moment; but maybe it's very easy and I'm just unclear on the approach.

I do know AI was telling me I was barking up the wrong tree with this, but when I looked at all my latencies, they seemed to support me more than the AI review of my work did.

@Lewiscowles1986 (Contributor, Author) commented Dec 30, 2025

Here are some logs from the k6 test included with the PR, run on an M2 Air laptop:
          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: ../script.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
              * default: 100 looping VUs for 30s (gracefulStop: 30s)


     ✓ is status 201

     checks.........................: 100.00% ✓ 137626      ✗ 0     
     data_received..................: 44 MB   1.5 MB/s
     data_sent......................: 26 MB   878 kB/s
     http_req_blocked...............: avg=2.77µs  min=0s     med=1µs     max=18.46ms p(90)=2µs     p(95)=2µs    
     http_req_connecting............: avg=1.22µs  min=0s     med=0s      max=6.77ms  p(90)=0s      p(95)=0s     
     http_req_duration..............: avg=21.58ms min=6.09ms med=18.25ms max=1.45s   p(90)=28.56ms p(95)=37.09ms
       { expected_response:true }...: avg=21.58ms min=6.09ms med=18.25ms max=1.45s   p(90)=28.56ms p(95)=37.09ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 137626
     http_req_receiving.............: avg=16.37µs min=4µs    med=10µs    max=5.15ms  p(90)=30µs    p(95)=40µs   
     http_req_sending...............: avg=5.25µs  min=1µs    med=3µs     max=4.96ms  p(90)=8µs     p(95)=12µs   
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s      max=0s      p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=21.56ms min=6.02ms med=18.23ms max=1.45s   p(90)=28.53ms p(95)=37.08ms
     http_reqs......................: 137626  4585.523346/s
     iteration_duration.............: avg=21.79ms min=6.47ms med=18.45ms max=1.45s   p(90)=28.77ms p(95)=37.32ms
     iterations.....................: 137626  4585.523346/s
     vus............................: 100     min=100       max=100 
     vus_max........................: 100     min=100       max=100 
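For context, a minimal k6 script producing the `is status 201` check above would look roughly like this; the endpoint URL, payload, and options here are placeholders, not the PR's actual script, and it runs under the k6 binary rather than Node:

```javascript
// Minimal k6 sketch of a POST load test with a 201 status check.
// URL and body are placeholders, not the PR's actual values.
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 100,        // matches the 100 looping VUs in the run above
  duration: '30s', // matches the 30s scenario
};

export default function () {
  const res = http.post(
    'http://localhost:8000/api/devices', // placeholder endpoint
    JSON.stringify({ name: 'test' }),
    { headers: { 'Content-Type': 'application/json' } },
  );
  check(res, { 'is status 201': (r) => r.status === 201 });
}
```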

@Lewiscowles1986 (Contributor, Author)

Apologies, I realised after re-running the baseline that I'd committed something I should not have. I've commented it out; basically, I doubled the number of queries being run as part of a failed experiment and then accidentally `git add .`-ed it. The rest is done now.

@antonputra (Owner) left a comment

Thank you, @Lewiscowles1986, for the PR. Sorry I don't have clear guidelines for pull requests. Would you be able to just create a fastapi-app-v2 folder and keep the original as is? It's just easier to compare in the future.

@Lewiscowles1986 (Contributor, Author)

To be honest, merging it isn't the most important thing, as you'll only gain at most a few thousand requests (not per second, overall 😄).

Thank you for the tutorial; if you'd still like the changes, I'll make them.

@antonputra (Owner)

@Lewiscowles1986 Yes please, I'll keep it. When it's time to refresh the benchmark, I'll go over and test everything again.

@Lewiscowles1986 force-pushed the refactor/better-python-performance branch from acf0486 to c57dc3f on January 2, 2026 18:39
@Lewiscowles1986 (Contributor, Author) commented Jan 2, 2026

It now points to a separate folder; sorry, I've been unwell. It's also simpler than the original patch I'd sent over.

Before
% k6 run ./script.js                                                

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: ./script.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
              * default: 100 looping VUs for 30s (gracefulStop: 30s)


     ✓ is status 201

     checks.........................: 100.00% ✓ 124525      ✗ 0     
     data_received..................: 40 MB   1.3 MB/s
     data_sent......................: 24 MB   794 kB/s
     http_req_blocked...............: avg=2.93µs  min=0s     med=1µs     max=7.66ms  p(90)=2µs     p(95)=2µs    
     http_req_connecting............: avg=1.78µs  min=0s     med=0s      max=7.23ms  p(90)=0s      p(95)=0s     
     http_req_duration..............: avg=23.87ms min=6.17ms med=19.8ms  max=1.36s   p(90)=32.42ms p(95)=42.4ms 
       { expected_response:true }...: avg=23.87ms min=6.17ms med=19.8ms  max=1.36s   p(90)=32.42ms p(95)=42.4ms 
     http_req_failed................: 0.00%   ✓ 0           ✗ 124525
     http_req_receiving.............: avg=17.23µs min=4µs    med=10µs    max=12.88ms p(90)=32µs    p(95)=43µs   
     http_req_sending...............: avg=5.55µs  min=1µs    med=3µs     max=6.5ms   p(90)=9µs     p(95)=13µs   
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s      max=0s      p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=23.84ms min=6.15ms med=19.78ms max=1.36s   p(90)=32.4ms  p(95)=42.37ms
     http_reqs......................: 124525  4149.135507/s
     iteration_duration.............: avg=24.08ms min=6.39ms med=20ms    max=1.36s   p(90)=32.65ms p(95)=42.69ms
     iterations.....................: 124525  4149.135507/s
     vus............................: 100     min=100       max=100 
     vus_max........................: 100     min=100       max=100 


running (0m30.0s), 000/100 VUs, 124525 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  30s
After
% k6 run ./script.js          

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

     execution: local
        script: ./script.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
              * default: 100 looping VUs for 30s (gracefulStop: 30s)


     ✓ is status 201

     checks.........................: 100.00% ✓ 133066      ✗ 0     
     data_received..................: 43 MB   1.4 MB/s
     data_sent......................: 26 MB   849 kB/s
     http_req_blocked...............: avg=6.84µs  min=0s     med=1µs     max=17.94ms  p(90)=1µs     p(95)=2µs    
     http_req_connecting............: avg=5.47µs  min=0s     med=0s      max=14.93ms  p(90)=0s      p(95)=0s     
     http_req_duration..............: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
       { expected_response:true }...: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 133066
     http_req_receiving.............: avg=14.72µs min=4µs    med=9µs     max=5.26ms   p(90)=28µs    p(95)=37µs   
     http_req_sending...............: avg=4.94µs  min=1µs    med=4µs     max=4.22ms   p(90)=7µs     p(95)=12µs   
     http_req_tls_handshaking.......: avg=0s      min=0s     med=0s      max=0s       p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=22.32ms min=8.03ms med=19.94ms max=231.85ms p(90)=28.26ms p(95)=33.85ms
     http_reqs......................: 133066  4434.119736/s
     iteration_duration.............: avg=22.53ms min=8.32ms med=20.12ms max=232.2ms  p(90)=28.52ms p(95)=34.15ms
     iterations.....................: 133066  4434.119736/s
     vus............................: 100     min=100       max=100 
     vus_max........................: 100     min=100       max=100 


running (0m30.0s), 000/100 VUs, 133066 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  30s

@Lewiscowles1986 (Contributor, Author) commented Jan 2, 2026

@emerson-proenca I've read up on this and apparently this should be as good as it gets using FastAPI under Python without pre-compiling to something else. While Python can have good IO-bound performance, at this scale it seems the overhead of maintaining that is CPU-bound. PyPy would apparently not help; although if you fancy forking any of this code, please do.

One thing I do like about the new version is that there's no need for a class: it just uses the Python module instance for the database pool and lets the library manage it.
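The module-as-singleton pattern described here can be sketched as follows; the pool object below is a stand-in so the sketch runs without a database (with asyncpg it would be `await asyncpg.create_pool(...)`):

```python
# Sketch of keeping the connection pool at module scope instead of in a class.
# The stand-in dict lets this run without a real database; in the app it would
# be a real driver pool, e.g. `await asyncpg.create_pool(dsn, min_size=...)`.
import asyncio

pool = None  # module-level handle; the driver library manages its internals

async def startup(dsn: str) -> None:
    global pool
    pool = {"dsn": dsn, "open": True}  # stand-in for a real pool object

async def shutdown() -> None:
    global pool
    if pool is not None:
        pool["open"] = False  # a real pool would be `await pool.close()`
        pool = None

async def handler() -> bool:
    # Request handlers use the shared module-level pool directly;
    # no wrapper class is needed.
    return pool is not None and pool["open"]

async def main() -> bool:
    await startup("postgres://localhost/app")
    ok = await handler()
    await shutdown()
    return ok

print(asyncio.run(main()))
```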

@antonputra (Owner) left a comment

Thank you!

@antonputra antonputra merged commit b5bf75d into antonputra:main Jan 3, 2026
@emerson-proenca

> @emerson-proenca I've read up on this and apparently this should be as good as it gets using FastAPI under Python without pre-compiling to something else. While Python can have good IO-bound performance, at this scale it seems the overhead of maintaining that is CPU-bound. PyPy would apparently not help; although if you fancy forking any of this code, please do.
>
> One thing I do like about the new version is that there's no need for a class: it just uses the Python module instance for the database pool and lets the library manage it.

Eventually I'll try PyPy and other Python interpreters/compilers to see if they can improve performance. To be honest, I agree with you and don't think it'll make a difference; maybe if we remove logging it will?

@Spitfire1900

Many thanks to everyone! I would recommend setting workers explicitly for both granian and Bun to ensure better fairness in the benchmark; I noticed uvicorn was set to 1 worker in the original test.

It's always important to have a benchmark that uses "average code" rather than esoteric operations or configuration.
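For reference, pinning granian to an explicit worker count via its `--workers` option would look something like this (`main:app` is a placeholder module path, not the repo's actual entrypoint):

```shell
# Pin granian to an explicit worker count instead of the default of 1.
granian --interface asgi --workers 4 main:app
```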

@Lewiscowles1986 (Contributor, Author)

@Spitfire1900 I have no idea what you felt was esoteric, so I'm happy to have you point out explicitly what is esoteric about reducing the code; but also, `workers` defaults to 1 for granian...

[Screenshot from 2026-01-04 of granian's README]

https://github.com/emmett-framework/granian

@Spitfire1900

I don't think anything is esoteric at this point, but it's something to keep in mind.

It looks like Bun auto-manages thread size, but supports explicitly setting threads via a hidden env var: https://github.com/oven-sh/bun/pull/14401/files#diff-52c167d9c0701a90829218516da76eefd4a0a05a2c57dd216df4599105c921e9

@Spitfire1900

Another consideration: https://github.com/antonputra/tutorials/pull/500/files#diff-de1516dcd4f2ef07f2807fc25a08c5741d9bda870629052f59f969a9e5b12fd7R37 uses `--interface asgi` instead of `--interface rsgi`, which may mean poorer performance. See https://github.com/emmett-framework/granian/blob/master/docs/spec/RSGI.md for more info.

@Lewiscowles1986 (Contributor, Author)

@Spitfire1900 FastAPI and Starlette do not support RSGI, which I'd argue is esoteric; I did indeed look into it when flipping to granian. I'm happy to have you open an alternative PR with your input, probably under a new sub-folder as I was directed, so that there's a practical two-way dialogue.
