refactor: better python performance #500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: better python performance #500
Conversation
emerson-proenca left a comment:

Do you believe running PyPy would improve Python performance? PyPy + FastAPI = fastapi/fastapi#3944
Lewiscowles1986 left a comment:

Maybe, @emerson-proenca; if you make a PR I'd love to run it. I've not had much experience with PyPy, so I consider it esoteric at the moment; but maybe it's very easy and I'm just unclear on the approach. I do know AI was telling me I was barking up the wrong tree with this, but when I looked at all my latencies, they seemed to support me more than the AI review of my work did.
Here are some logs from the k6 test included with the PR, run on an M2 Air laptop:
execution: local
script: ../script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 137626 ✗ 0
data_received..................: 44 MB 1.5 MB/s
data_sent......................: 26 MB 878 kB/s
http_req_blocked...............: avg=2.77µs min=0s med=1µs max=18.46ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.22µs min=0s med=0s max=6.77ms p(90)=0s p(95)=0s
http_req_duration..............: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
{ expected_response:true }...: avg=21.58ms min=6.09ms med=18.25ms max=1.45s p(90)=28.56ms p(95)=37.09ms
http_req_failed................: 0.00% ✓ 0 ✗ 137626
http_req_receiving.............: avg=16.37µs min=4µs med=10µs max=5.15ms p(90)=30µs p(95)=40µs
http_req_sending...............: avg=5.25µs min=1µs med=3µs max=4.96ms p(90)=8µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=21.56ms min=6.02ms med=18.23ms max=1.45s p(90)=28.53ms p(95)=37.08ms
http_reqs......................: 137626 4585.523346/s
iteration_duration.............: avg=21.79ms min=6.47ms med=18.45ms max=1.45s p(90)=28.77ms p(95)=37.32ms
iterations.....................: 137626 4585.523346/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
Lewiscowles1986 left a comment:

Apologies, I realised after re-running the baseline that I'd committed something I should not have. I've commented it out, but basically, I doubled the number of queries being run as part of a failed experiment and then accidentally committed it.
antonputra left a comment:
Thank you, @Lewiscowles1986, for the PR. Sorry I don't have clear guidelines for pull requests. Would you be able to just create a fastapi-app-v2 folder and keep the original as is? It's just easier to compare in the future.
Lewiscowles1986 left a comment:

To be honest, merging isn't the most important thing, as you'll only get at most a few thousand requests (not per second, overall 😄). Thank you for the tutorial; if you'd still like the changes, I'll make them.
antonputra left a comment:

@Lewiscowles1986 Yes please, I'll keep it. When it's time to refresh the benchmark, I'll go over and test everything again.
Lewiscowles1986 force-pushed from acf0486 to c57dc3f.
Lewiscowles1986 left a comment:

It now points to a separate folder; sorry, I've been unwell. It's also simpler than the original patch I'd sent over.

Before:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 124525 ✗ 0
data_received..................: 40 MB 1.3 MB/s
data_sent......................: 24 MB 794 kB/s
http_req_blocked...............: avg=2.93µs min=0s med=1µs max=7.66ms p(90)=2µs p(95)=2µs
http_req_connecting............: avg=1.78µs min=0s med=0s max=7.23ms p(90)=0s p(95)=0s
http_req_duration..............: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
{ expected_response:true }...: avg=23.87ms min=6.17ms med=19.8ms max=1.36s p(90)=32.42ms p(95)=42.4ms
http_req_failed................: 0.00% ✓ 0 ✗ 124525
http_req_receiving.............: avg=17.23µs min=4µs med=10µs max=12.88ms p(90)=32µs p(95)=43µs
http_req_sending...............: avg=5.55µs min=1µs med=3µs max=6.5ms p(90)=9µs p(95)=13µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=23.84ms min=6.15ms med=19.78ms max=1.36s p(90)=32.4ms p(95)=42.37ms
http_reqs......................: 124525 4149.135507/s
iteration_duration.............: avg=24.08ms min=6.39ms med=20ms max=1.36s p(90)=32.65ms p(95)=42.69ms
iterations.....................: 124525 4149.135507/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 124525 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s

After:

% k6 run ./script.js
execution: local
script: ./script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
* default: 100 looping VUs for 30s (gracefulStop: 30s)
✓ is status 201
checks.........................: 100.00% ✓ 133066 ✗ 0
data_received..................: 43 MB 1.4 MB/s
data_sent......................: 26 MB 849 kB/s
http_req_blocked...............: avg=6.84µs min=0s med=1µs max=17.94ms p(90)=1µs p(95)=2µs
http_req_connecting............: avg=5.47µs min=0s med=0s max=14.93ms p(90)=0s p(95)=0s
http_req_duration..............: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
{ expected_response:true }...: avg=22.34ms min=8.13ms med=19.96ms max=231.87ms p(90)=28.29ms p(95)=33.88ms
http_req_failed................: 0.00% ✓ 0 ✗ 133066
http_req_receiving.............: avg=14.72µs min=4µs med=9µs max=5.26ms p(90)=28µs p(95)=37µs
http_req_sending...............: avg=4.94µs min=1µs med=4µs max=4.22ms p(90)=7µs p(95)=12µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=22.32ms min=8.03ms med=19.94ms max=231.85ms p(90)=28.26ms p(95)=33.85ms
http_reqs......................: 133066 4434.119736/s
iteration_duration.............: avg=22.53ms min=8.32ms med=20.12ms max=232.2ms p(90)=28.52ms p(95)=34.15ms
iterations.....................: 133066 4434.119736/s
vus............................: 100 min=100 max=100
vus_max........................: 100 min=100 max=100
running (0m30.0s), 000/100 VUs, 133066 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs 30s
Lewiscowles1986 left a comment:

@emerson-proenca I've read up on this, and apparently this should be as good as it gets using FastAPI under Python without pre-compiling to something else. While Python can have good IO-bound performance, at this scale the overhead of maintaining that appears to be CPU-bound. PyPy would apparently not help; although if you fancy forking any of this code, please do. One thing I do like about the new version is that there's no need for a class: it just uses the Python module instance for the database pool and lets the library manage that.
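As an editorial aside, here is a minimal sketch of that module-level pattern. All names (the `db` module, `init_pool`, the DSN, the `devices` table) are illustrative assumptions, not the PR's actual code:

```python
# db.py - module-level pool: the module object itself holds the single
# pool instance, and asyncpg manages the connection lifecycle. No class.
import asyncpg

pool: asyncpg.Pool | None = None

async def init_pool(dsn: str) -> None:
    global pool
    # Sizes are illustrative; tune for the workload.
    pool = await asyncpg.create_pool(dsn, min_size=10, max_size=10)

async def close_pool() -> None:
    if pool is not None:
        await pool.close()
```

```python
# main.py - wire the module-level pool into FastAPI's lifespan hook.
from contextlib import asynccontextmanager

from fastapi import FastAPI

import db

@asynccontextmanager
async def lifespan(app: FastAPI):
    await db.init_pool("postgresql://user:pass@localhost:5432/app")
    yield
    await db.close_pool()

app = FastAPI(lifespan=lifespan)

@app.get("/api/devices")
async def list_devices():
    rows = await db.pool.fetch("SELECT uuid, mac FROM devices")
    return [dict(r) for r in rows]
```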
antonputra left a comment:
Thank you!

Eventually I'll try PyPy and other Python interpreters/compilers to see if they can improve performance. To be honest, I agree with you and I don't think it'll make a difference; maybe if we remove logging it will?
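If anyone wants to test the logging hypothesis, here is a minimal sketch of silencing the per-request loggers. The `uvicorn.access` and `uvicorn.error` logger names are Uvicorn's documented ones; that the app runs under Uvicorn is an assumption:

```python
# quiet_logs.py - suppress per-request logging to measure its cost.
import logging

def quiet_access_logs() -> None:
    # Uvicorn emits one access-log record per request; disabling the
    # logger removes that work from the hot path entirely.
    logging.getLogger("uvicorn.access").disabled = True
    # Keep server errors visible, but drop info-level chatter.
    logging.getLogger("uvicorn.error").setLevel(logging.WARNING)
```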
Spitfire1900 left a comment:

Many thanks to everyone! I would recommend setting workers explicitly for both Granian and Bun to ensure better fairness in the benchmark; I noticed Uvicorn was set to 1 worker in the original test. It's always important to have a benchmark that uses "average code" rather than performing esoteric operations or configuration.
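For fairness, the worker count can be pinned explicitly on both sides; Granian exposes a `--workers` CLI flag for the same purpose. Below is a hedged sketch for the Uvicorn side, where the `main:app` module path and the count of 4 are illustrative assumptions:

```python
# serve.py - pin the worker process count explicitly so the benchmark
# compares both stacks under a known, identical setting.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "main:app",   # an import string is required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,    # e.g. one process per core; pick deliberately
    )
```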
Lewiscowles1986 left a comment:

@Spitfire1900 I've no idea what you felt was esoteric, so I'm happy to have you explicitly point out what is, with regard to reducing the code; but also, workers defaults to 1 for Granian...
Spitfire1900 left a comment:

I don't think anything is esoteric at this point, but it's something to keep in mind. It looks like Bun auto-manages thread size, but supports explicitly setting threads via a hidden env var: https://github.com/oven-sh/bun/pull/14401/files#diff-52c167d9c0701a90829218516da76eefd4a0a05a2c57dd216df4599105c921e9
Another consideration: https://github.com/antonputra/tutorials/pull/500/files#diff-de1516dcd4f2ef07f2807fc25a08c5741d9bda870629052f59f969a9e5b12fd7R37 uses …
Lewiscowles1986 left a comment:

@Spitfire1900 FastAPI and Starlette do not support …

So I tried a few things, which get the minimum time in k6 down to microseconds for a request. They set slightly more aggressive timeouts, use Pydantic to marshal the response from the POST endpoint to the DB, and set a 1-second limit on every request's DB calls (the SET LOCAL DB call actually creates some overhead). I've not re-run the entire test; instead I've contributed a k6 test, which I used to verify on my local machine, within the also-modified docker-compose YAML.

The migrations are still manual, and you'll probably notice I tried increasing server connections. I think adjusting the min_size of DB connections could be another good way to gain performance. I'm pretty sure it still won't top Bun, but you could get a few more thousand requests per second.
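To make those ideas concrete, here is a hedged sketch of the combination described above: a Pydantic model marshalling the POST response, and a 1-second statement timeout applied per transaction via SET LOCAL. The table, field names, endpoint path, and timeout value are assumptions, not the PR's exact code:

```python
# sketch.py - illustrative only; reuses the module-level pool idea
# from the earlier sketch.
import asyncpg
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pool: asyncpg.Pool | None = None  # assume initialised at startup

class Device(BaseModel):
    uuid: str
    mac: str

@app.post("/api/devices", response_model=Device, status_code=201)
async def create_device(device: Device) -> Device:
    async with pool.acquire() as conn:
        async with conn.transaction():
            # SET LOCAL lasts only for this transaction, so every request
            # pays an extra round trip - the overhead mentioned above.
            await conn.execute("SET LOCAL statement_timeout = '1s'")
            row = await conn.fetchrow(
                "INSERT INTO devices (uuid, mac) VALUES ($1, $2) "
                "RETURNING uuid, mac",
                device.uuid,
                device.mac,
            )
    # Pydantic marshals the DB row back into the 201 response body.
    return Device(**dict(row))
```

A cheaper variant of the timeout is to set it once per connection when the pool is created, e.g. `asyncpg.create_pool(dsn, min_size=10, max_size=10, server_settings={"statement_timeout": "1s"})`; keeping `min_size` equal to `max_size` also means connections are opened up front rather than under load.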