Does anybody have experience running quickfix (specifically node-quickfix) on Amazon EC2? I'm having a very strange disconnect issue, and contacting Amazon tech support wasn't fruitful.
I'm not sure whether this is an issue with quickfix itself or with the node wrapper (I'm asking in both groups).
We have multiple containers running node-quickfix-client. After a while (sometimes an hour, sometimes four hours, sometimes a day), the instance shows high CPU usage and a lot of time spent in I/O wait. We see this in atop running on the instance, but nothing inside the containers shows it. The client then freezes for 1 to 4 minutes. More load in the containers seems to lengthen the freeze, but it is hard to tell; it could just be randomness. The server then drops the connection because it has received neither a heartbeat nor a response to its test request from the client. When the client comes out of its coma 1 to 4 minutes later, it begins its reconnect sequence, and everything runs fine until this plays out again.
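To show why a freeze of this length ends the session, here is a minimal sketch of how a FIX counterparty's heartbeat monitor typically reacts to silence. HeartBtInt (tag 108) is the heartbeat interval negotiated at Logon. The two multipliers (1.2× before sending a TestRequest, 2.4× before disconnecting) and the helper name `verdictAfterSilence` are assumptions for illustration, not the actual thresholds or API of quickfix or of any particular server.

```typescript
// Hypothetical model of a counterparty's heartbeat monitor.
// The grace multipliers are ASSUMED values; real engines choose their own.
type SessionVerdict = "ok" | "test-request-sent" | "disconnected";

interface HeartbeatPolicy {
  heartBtIntSec: number;     // HeartBtInt (FIX tag 108), agreed at Logon
  testRequestFactor: number; // assumed: silence multiple before a TestRequest
  timeoutFactor: number;     // assumed: silence multiple before dropping the session
}

// Given how long the client has been silent, decide what the counterparty does.
function verdictAfterSilence(silenceSec: number, p: HeartbeatPolicy): SessionVerdict {
  if (silenceSec >= p.heartBtIntSec * p.timeoutFactor) return "disconnected";
  if (silenceSec >= p.heartBtIntSec * p.testRequestFactor) return "test-request-sent";
  return "ok";
}

// Example: a 30 s heartbeat interval with the assumed 1.2x / 2.4x grace periods.
const policy: HeartbeatPolicy = { heartBtIntSec: 30, testRequestFactor: 1.2, timeoutFactor: 2.4 };

for (const freezeSec of [10, 60, 240]) {
  // A frozen client sends nothing, so the freeze length equals the silence length.
  console.log(`${freezeSec}s freeze -> ${verdictAfterSilence(freezeSec, policy)}`);
}
```

Under these assumptions, any freeze longer than about 72 seconds is enough for the server to give up. That would be consistent with every 1 to 4 minute stall described here ending in a disconnect, regardless of what caused the stall.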
I was hoping somebody here has run quickfix on Amazon EC2 and has either seen something similar or can say it runs fine for them. We've tried increasing the instance size, but that didn't change anything. Currently the sequence numbers are written to an EFS volume (Amazon's wrapper around NFS), but we are in the process of moving them to in-container storage to see if that helps.