Consider a component that provides address and/or postal code validation around the world. For the sake of illustration, let us consider 10 regions that each have the same volume of data.
Initial tests found that it took 25 GB of data to load all of them into memory. Working from the AWS EC2 price list, we find that an m4.2xlarge is needed to run it, at $0.479/hr. This gives us 8 CPUs.
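The sizing step above can be sketched as a lookup: given an in-memory footprint, pick the cheapest instance type with enough memory. The specs and prices below are the two types cited in this example (on-demand rates vary by region; check the current AWS price list before relying on them).

```python
# Simplified instance table using the two types cited in the text:
# (name, vCPUs, memory in GB, USD per hour)
INSTANCE_TYPES = [
    ("t2.medium",  2,  4.0, 0.052),
    ("m4.2xlarge", 8, 32.0, 0.479),
]

def cheapest_fit(required_gb):
    """Return the cheapest instance type with at least required_gb of memory."""
    candidates = [t for t in INSTANCE_TYPES if t[2] >= required_gb]
    if not candidates:
        raise ValueError(f"no instance type holds {required_gb} GB")
    return min(candidates, key=lambda t: t[3])

print(cheapest_fit(25.0)[0])  # all 10 regions in one process -> m4.2xlarge
print(cheapest_fit(2.5)[0])   # one region per process -> t2.medium
```

The same lookup drives both deployment models: 25 GB forces the large instance, while 2.5 GB per shard fits the small one.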
If we instead run 10 instances at 2.5 GB each, we end up with 10 t2.medium instances, each with 2 CPUs at a cost of $0.052/hr, or $0.52/hr total -- which on first impression is more expensive, except that we now have 20 CPUs instead of 8 and may see better performance. If one region is a hot spot (such as US addresses), we might end up with 9 instances supporting one region each and perhaps 5 instances supporting the US. To match that capacity with the single-instance model, we may need 5 instances.
In this case, we could end up with:
- Single Instance Model: 5 * $0.479 = $2.395/hr with 40 CPUs
- Sharded Instances Model: (9 + 5) * $0.052 = $0.728/hr with 28 CPUs
We have moved from the sharded model being roughly 10% more expensive to the single-instance model costing more than three times as much.
Flexibility in the initial design to support independent cloud instances with low resource requirements, as well as sharding, may be a key cost control mechanism for cloud applications.
It is impossible to determine the optimal design and deployment strategy a priori; it must be determined by experiment. Running experiments cheaply means that the components and architecture must be designed to support experimentation.
In some ways, cloud computing is like many phone plans -- you are forced to pay for a resource level and may not use all of the resources you pay for. Yes, the plans have steps, but if you need 18 GB of memory you may also have to pay for 8 CPUs that will never exceed 5% utilization (i.e. a single CPU is sufficient). Designing for flexibility of cloud instances is essential for cost savings.