Service Throughput Tradeoffs

2018-04-13, David Crawshaw

I am currently writing a service that lets users upload files. The typical file size is about 25KB, the maximum is 100MB, with file sizes following a power-law distribution.

The nature of the service is that I need to buffer the contents of the file until it is complete before processing it. As the files can be 100MB, my very first version used temporary files, which meant the service could handle a large number of concurrent requests without significant memory pressure.

Load testing for throughput

Then I wrote a simple load test, blasting files as quickly as I could at the service from a dozen threads. The throughput after a little tuning was abysmal, on the order of hundreds of QPS.

A quick look at pprof showed that the service spent almost all of its time writing and then reading back temporary files.

A very easy way to make the load test faster is to store the temporary files in RAM. Even at 100MB per upload, a reasonably small slice of a modern server's memory can buffer hundreds of simultaneous connections. Now the service can handle thousands of QPS, using less CPU per request.

So far this is fairly typical of the sorts of tradeoffs in resources (RAM, CPU) and features (throughput) you see when designing services. It gets more interesting.

Is the load test representative of real load?

Consider: what if the typical user connection is a slow link? Transferring large files may not take the ~1 second we see in the load test, but rather 5 minutes of trickling packets.

In this case, the RAM-heavy version of the service hurts. It has limited the maximum concurrent uploads from tens of thousands to hundreds, in the name of making those transfers more CPU-time and wall-time efficient. With low-bandwidth clients, we will always have plenty of CPU, and with the RAM-heavy version we have significantly reduced QPS.
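The concurrency ceiling falls out of simple division. A back-of-the-envelope calculation (the 64GB server is an assumption for illustration, not a measured figure):

```go
package main

import "fmt"

func main() {
	// Assumed server size for illustration only.
	const serverRAM = 64 << 30       // 64GB
	const worstCaseFile = 100 << 20  // the 100MB maximum
	const typicalFile = 25 << 10     // the typical 25KB file

	// With in-RAM buffering, worst-case uploads bound concurrency
	// at hundreds; the temp-file version is bounded instead by file
	// descriptors and sockets, typically tens of thousands or more.
	fmt.Println("worst-case concurrent uploads:", serverRAM/worstCaseFile)
	fmt.Println("typical-case concurrent uploads:", serverRAM/typicalFile)
}
```

The gap between the two numbers is the power-law distribution at work: provisioning RAM for the worst case wastes almost all of it in the typical case.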

For a large number of slow clients, the first version is better.

The distracting unreality of synthetic load tests

A load test needs to reflect real loads, so do not write one if you do not have real traffic to use as a baseline. Just as with micro-benchmarks, attempting to construct one by reasoning from a blank sheet of paper can mislead and confuse.

So rather than inventing more elaborate load tests, I am going to spend my time writing more elaborate logs, with upload timing information.

