Scalability
Cache
...
Clones
...
Denormalization
...
File-based caching
...
No joins
...
Overtime
...
Persistent
...
SQL tuning
...
Sharding
...
Web service
...
Cache
A simple key-value store that should reside as a buffering layer between your application and your data storage.
External persistent cache
An external persistent cache will have better performance than an external database.
Only if it's not in the cache should it then try to get the data from the main data source. Why should you do that?
Because a cache is lightning-fast. It holds every dataset in RAM and requests are handled as fast as technically possible.
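The read path described above is the classic cache-aside pattern. A minimal sketch, using a plain dict to stand in for an in-memory cache like Memcached or Redis; `get_user_from_db` is a hypothetical placeholder for your real database call:

```python
# Cache-aside read: check the cache first, fall back to the main data
# source only on a miss, then populate the cache for the next request.
cache = {}

def get_user_from_db(user_id):
    # Placeholder for a slow database round-trip.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                      # fast path: served from RAM
        return cache[key]
    data = get_user_from_db(user_id)      # slow path: main data source
    cache[key] = data                     # cache it for next time
    return data
```

With a real cache you would also set an expiration time on each key so stale entries eventually disappear.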
Path 2 problem
But even if you successfully switch to the latest and greatest NoSQL database and let your app do the dataset-joins, soon your database requests will again become slower and slower. You will need to introduce a cache.
External
By external I mean that the data store does not reside on the application servers. Instead, it is somewhere in or near the data center of your application servers.
There are 2 patterns of caching your data. An old one and a new one:
Cached Database Queries and Cached Objects
Redis
External persistent cache
Path 1
Hire a database administrator (DBA), tell them to set up master-slave replication (read from slaves, write to the master) and upgrade your master server by adding RAM, RAM, and more RAM.
Path 1 problem
In some months, your DBA will come up with words like "sharding", "denormalization" and "SQL tuning" and will look worried about the necessary overtime during the next weeks. At that point every new action to keep your database running will be more expensive and time consuming than the previous one.
What are the common in-memory caches?
Memcached or Redis
Only if the data is not in the cache should it then
try to get the data from the main data source.
Now the required changes are more radical than just adding more cloned servers and may even require some boldness. In the end, you can choose from 2 paths:
Path #1 is to stick with MySQL and keep the "beast" running. Path #2 means to denormalize right from the beginning and stop using joins in any database query.
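What "no more joins" looks like in practice: fetch the related datasets separately and join them in application code. A minimal sketch, with in-memory lists standing in for two collections in a denormalized store such as MongoDB:

```python
# Application-side join: instead of a SQL JOIN, fetch users and orders
# separately and stitch them together with a lookup table in code.
users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12},
          {"user_id": 2, "total": 7}]

def orders_with_user_names():
    # Build the lookup once, then attach the user name to each order.
    users_by_id = {u["id"]: u for u in users}
    return [{"user": users_by_id[o["user_id"]]["name"], "total": o["total"]}
            for o in orders]
```

The lookup-dict approach keeps the join at O(n) instead of scanning the user list for every order.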
Caching: what not to do
Please never do file-based caching; it makes cloning and auto-scaling of your servers a real pain.
Load balancer
Public servers of a scalable web service are hidden behind a load balancer. This load balancer evenly distributes load (requests from your users) onto your group/cluster of application servers.
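The "evenly distributes" behavior can be pictured as simple round-robin scheduling. A toy sketch (real load balancers like nginx or an AWS ELB do this, plus health checks, for you; the hostnames are made up for illustration):

```python
import itertools

# Round-robin distribution: each incoming request is handed to the next
# application server in the cluster, cycling back to the first.
servers = ["app1.internal", "app2.internal", "app3.internal"]
_rotation = itertools.cycle(servers)

def route_request():
    # Returns the server that should handle the next request.
    return next(_rotation)
```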
Redis
Redis can do several hundred thousand read operations per second when hosted on a standard server. Writes, especially increments, are also very, very fast. Try that with a database!
Sessions
Sessions need to be stored in a centralized data store which is accessible to all your application servers. It can be an external database or an external persistent cache, like Redis.
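A sketch of such a centralized session store. A dict stands in for the shared external Redis instance here; with a Redis client you would issue the equivalent SET/GET commands against one server that every application server connects to:

```python
import json
import uuid

# Centralized session store: any application server can create or load a
# session because the data lives in one shared store, not on local disk.
session_store = {}

def create_session(user_id):
    session_id = str(uuid.uuid4())
    # Serialize the session payload, keyed by a namespaced session ID.
    session_store[f"session:{session_id}"] = json.dumps({"user_id": user_id})
    return session_id

def load_session(session_id):
    raw = session_store.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```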
Cached objects
That's my strong recommendation, and I always prefer this pattern. In general, see your data as an object, as you already do in your code (classes, instances, etc.). Let your class assemble a dataset from your database and then store the complete instance of the class or the assembled dataset in the cache.
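A minimal sketch of the cached-objects pattern: the class assembles its dataset (simulated here) and the assembled result is cached as one unit under a key derived from the object's identity, so invalidation is just deleting that one key:

```python
# Cached objects: cache the fully assembled dataset per object, not the
# individual queries that produced it.
object_cache = {}

class Product:
    def __init__(self, product_id):
        self.product_id = product_id

    def _assemble_from_db(self):
        # Stand-in for several database queries (base data, price, reviews)
        # that the class would normally combine into one dataset.
        return {"id": self.product_id, "price": 9.99, "reviews": ["nice"]}

    def data(self):
        key = f"product:{self.product_id}"
        if key not in object_cache:
            object_cache[key] = self._assemble_from_db()
        return object_cache[key]
```

When the product changes, you delete `product:<id>` and the next read reassembles it; contrast that with hunting down every cached query that touched the product's rows.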
Cached Database Queries
That's still the most commonly used caching pattern.
But what about deployment? How can you make sure that a code change is sent to all your servers without one server still serving old code?
This tricky problem is fortunately already solved by the great tool Capistrano. It requires some learning, especially if you are not into Ruby on Rails, but it is definitely worth the effort.
Whenever your application has to read data, it should first
try to retrieve the data from your cache.
Cache
With "cache" I always mean in-memory caches like Memcached or Redis.
Path 2
You can stay with MySQL and use it like a NoSQL database, or you can switch to a better and easier-to-scale NoSQL database like MongoDB or CouchDB. Joins will now need to be done in your application code. The sooner you take this step, the less code you will have to change later.
Path 1 solution
You might have been better off if you had chosen Path #2 while your dataset was still small and easy to migrate.
After "outsourcing" your sessions and serving the same codebase from all your servers, you can now create an image file from one of these servers (AWS calls this an AMI, or Amazon Machine Image). Use this AMI as a "super-clone" that all your new instances are based upon. Whenever you start a new instance/clone, just do an initial deployment of your latest code and you are ready!
First rule for scalability
Every server contains exactly the same codebase and does not store any user-related data, like sessions or profile pictures, on local disk or in memory.
The next time you run the query, you first check the cache for an existing result. This pattern has several issues. The main issue is
expiration
A hashed version of your query
is the cache key.
After following Part 2 of this series, you now have a scalable database solution. You have no fear of storing terabytes anymore and the world is looking fine. But just for you. Your users still have to suffer slow page requests when a lot of data is fetched from the database. The solution
is the implementation of a cache.
It is hard to delete a cached result when you cache a complex query (who has not?).
The next time you run the query,
you first check if it is already in the cache.
When one piece of data changes (for example a table cell)
you need to delete all cached queries that may include that table cell.
Whenever you do a query to your database,
you store the result dataset in cache.
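A minimal sketch of the cached-database-queries pattern: the cache key is a hash of the query text, and the result set is stored under it. `run_query` is a hypothetical stand-in for a real database call:

```python
import hashlib

# Cached database queries: hash the SQL text to build the cache key and
# store the result dataset under it.
query_cache = {}

def run_query(sql):
    # Placeholder for a real (slow) database round-trip.
    return [("row1",), ("row2",)]

def cached_query(sql):
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key not in query_cache:
        query_cache[key] = run_query(sql)
    return query_cache[key]
```

The expiration problem is visible in the sketch: if one row in `users` changes, nothing here tells you which of the hashed keys cached a query that read it.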
Your servers can now scale horizontally and you can already serve thousands of concurrent requests. But somewhere down the road your application gets slower and slower and finally breaks down. The reason:
your database. It's MySQL, isn't it?