More Scaling Considerations

In a previous post here, I talked about some scaling decisions to make when implementing a new web application, but I have just realised that I skipped a whole earlier set of design questions and simply assumed that you had already decided on the site architecture.

So let's take a step back. I am looking at a new project that is not based on anything previously built. My canvas is blank, and I have no reason to choose any architecture, framework or deployment other than what is right for the job. The catch is that this site could become super popular, so here are some of the things I am already thinking about for this new super site (in no particular order).

You cannot predict the future and you won't even correctly predict exactly how your site will perform when it has loads of users. You can perform load testing but that will only match what you think will happen, not necessarily what will. That is fine. Do NOT worry about a perfect solution - you will not be able to create it. However, you can make some sensible decisions early on to help with the scaling, at least to the point when you can make more specialised decisions when you are really scraping the limits of performance at scale.
Consider translation. If your site becomes very popular, chances are you will want it available in at least a few other languages: Chinese, Spanish, Portuguese, German, French or Russian, perhaps. You should build the functionality in from the beginning and consider how it will work in practice, i.e. how you will extract the text to translate and inject the translations back in again. What will you do if the source text changes faster than the translations can keep up?
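To make the extract-and-inject question a little more concrete, here is a minimal sketch of one possible approach. Everything in it (the message ids, the `SOURCE` and `TRANSLATIONS` structures, the function names) is made up for illustration. Each translation remembers a hash of the source text it was made from, so when the source changes the translation is flagged as stale and the site falls back to the original language.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Short hash of the source text, used to spot translations that have gone stale."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


# Source strings keyed by a stable message id (hypothetical ids and wording).
SOURCE = {
    "welcome": "Welcome back, {name}!",
    "logout": "Sign out",
}

# Each translation records the fingerprint of the source text it was made from.
TRANSLATIONS = {
    "es": {
        "welcome": {
            "text": "¡Bienvenido de nuevo, {name}!",
            "source_hash": fingerprint("Welcome back, {name}!"),
        },
        # Translated when the source still said "Log out", so it is now stale.
        "logout": {"text": "Cerrar sesión", "source_hash": fingerprint("Log out")},
    }
}


def translate(message_id: str, locale: str, **params) -> str:
    """Return the translation, or the source text if it is missing or stale."""
    source_text = SOURCE[message_id]
    entry = TRANSLATIONS.get(locale, {}).get(message_id)
    if entry is None or entry["source_hash"] != fingerprint(source_text):
        return source_text.format(**params)
    return entry["text"].format(**params)


def stale_or_missing(locale: str) -> list:
    """Message ids that need to be sent (back) to the translators."""
    catalogue = TRANSLATIONS.get(locale, {})
    return sorted(
        message_id
        for message_id, source_text in SOURCE.items()
        if message_id not in catalogue
        or catalogue[message_id]["source_hash"] != fingerprint(source_text)
    )
```

Calling `stale_or_missing("es")` gives you the work list for the translators, and the site keeps running in the meantime. Whether falling back to the source language is acceptable, or whether you would rather show the slightly outdated translation, is a product decision this sketch does not settle.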
Consider your database design early on. If your application is quite data-heavy (many are), a single database will only last you so long, but there are various ways to scale out, each with its own challenges: replication/geo-caching; write-once, read-many; vertical and horizontal sharding. Part of this decision will depend on whether your site is mainly reading data, in which case writing to a single master and replicating across many read-only databases is fairly straightforward. You might also create multiple instances of your application across the world so that US data is stored in one database, European data in another, etc. Although you shouldn't spend too much time worrying about this in the early days, deciding what you will eventually have will dictate some of your decisions now. For instance, auto-incrementing integer primary keys are not replication friendly once more than one database instance accepts writes (what happens when two instances each have a different row at index 123?).
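The primary key problem is easy to show with a toy example. The `Instance` class below is only a stand-in for a database instance that accepts writes, not a real database API. Each instance hands out integer keys from its own counter, and also a random UUID for comparison.

```python
import itertools
import uuid


class Instance:
    """Stand-in for one database instance that accepts writes (hypothetical)."""

    def __init__(self):
        self._next_id = itertools.count(1)  # per-instance auto-increment
        self.rows_by_int = {}
        self.rows_by_uuid = {}

    def insert(self, value):
        int_key = next(self._next_id)
        uuid_key = uuid.uuid4()  # random, generated without any coordination
        self.rows_by_int[int_key] = value
        self.rows_by_uuid[uuid_key] = value
        return int_key, uuid_key


us, eu = Instance(), Instance()
us.insert("order placed in the US")
eu.insert("order placed in Europe")

# Both instances used integer key 1 for different rows, so merging them clashes.
int_collisions = us.rows_by_int.keys() & eu.rows_by_int.keys()

# The UUID keys do not overlap, so the two sets of rows merge cleanly.
uuid_collisions = us.rows_by_uuid.keys() & eu.rows_by_uuid.keys()
merged = {**us.rows_by_uuid, **eu.rows_by_uuid}
```

UUIDs are not the only way out. MySQL, for example, has the `auto_increment_increment` and `auto_increment_offset` settings so that each instance hands out a different interleaved series of integers. The point is only that the choice is much cheaper to make before the tables are full of data.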
How will you handle shared sessions when your site is likely to run on multiple web servers?
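One possible answer, sketched below, is to keep no session state on the web servers at all. The session data travels with the request in a signed token, and any server that holds the same secret can verify it. The names `SECRET_KEY`, `issue_token` and `read_token` are invented for this sketch, and it assumes every web server is configured with the same key. The usual alternative is a session store that all the web servers share.

```python
import base64
import hashlib
import hmac
import json

# Assumption: the same secret is deployed to every web server.
SECRET_KEY = b"replace-with-a-real-secret"


def issue_token(session: dict) -> str:
    """Serialise the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(session, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def read_token(token: str):
    """Return the session dict, or None if the token fails verification."""
    payload, _, signature = token.partition(".")
    try:
        payload_bytes = payload.encode("ascii")
        signature_bytes = signature.encode("ascii")
    except UnicodeEncodeError:
        return None
    expected = hmac.new(SECRET_KEY, payload_bytes, hashlib.sha256).hexdigest()
    # Constant-time comparison, so the check does not leak timing information.
    if not hmac.compare_digest(signature_bytes, expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload_bytes))
```

This sketch leaves out things a real site would need: the token is signed but not encrypted, so the user can read its contents, and there is no expiry or way to revoke a session. It is only meant to show that "shared session" does not have to mean "shared storage".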
In what ways can you cache data across web servers?
Most bottlenecks are in I/O, so how can you reduce traffic between web servers and databases (or any other servers you might be using)? Can you reduce the number of database calls by merging requests? Can you pre-emptively fetch data that the user is likely to read next, avoiding another call to the database?
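As a rough illustration of merging requests, the sketch below counts round trips against a fake in-memory database. `FakeDatabase` and both loader functions are invented for the example. In SQL terms, the batched version corresponds to a single `WHERE id IN (...)` query instead of one query per id.

```python
class FakeDatabase:
    """In-memory stand-in for a database that counts round trips (hypothetical)."""

    def __init__(self, rows):
        self._rows = rows
        self.round_trips = 0

    def get_many(self, ids):
        """One round trip that returns every requested row that exists."""
        self.round_trips += 1
        return {i: self._rows[i] for i in ids if i in self._rows}


def load_one_by_one(db, ids):
    """Naive version: one database call per requested id."""
    result = {}
    for i in ids:
        result.update(db.get_many([i]))
    return result


def load_batched(db, ids):
    """Merged version: de-duplicate the ids and fetch them in a single call."""
    return db.get_many(sorted(set(ids)))
```

For the ids `[1, 2, 3, 2]`, the naive loader makes four round trips and the batched one makes a single trip for the same result. The saving only matters if the per-call overhead dominates, which is usually true for network I/O but is worth measuring for your own setup.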
Consider how the largest organisations will use your system. In my case, the simple use case does not require a lot of functionality, but if I start there and only later consider the more complicated corporate case, I will have to retrofit a load of functionality that the large case almost certainly needs.
Do not be afraid to jump around between different types of design document. Hand-drawn pictures might work for some things, but then you might find that a data definition document or a translation spreadsheet would be more helpful. Don't feel forced to totally finish a particular document if it isn't really doing what you want. 100% documentation sounds great but is incredibly time-consuming and often involves a lot of additional work on formatting, copy, etc. that doesn't actually make it into the product.
Spend the most time thinking about the most important stuff. I have a User Management section for this new site but, for the most part, this is fairly well-known and common. Another section, covering object lifecycle, entry and exit points with transitions, and the ability to edit them, is more involved and I need to consider more things: permissions, widowed and orphaned objects, cascading changes (or not), etc.
Just because your site is designed to scale big, make sure you don't make it too complicated and unusable. Twitter is simple but scalable, and your site can be too. You might want to separate the more advanced options out or disable them by default so that people can use the simple case out of the box before trying other things out.
If you are successful and start to scale big, investment or sales should be close behind, at which point you can start worrying about whether MySQL is really up to the job or whether you need Cassandra instead, or whether you need an asynchronous queueing system to speed things up. The point is that, although you haven't predicted the future, you have built in enough slack to buy yourself the time to decide what you need to change and when.
Consider early on what you will do with side-channels like support. In some ways, it can be easier to include these in the main site, but if you are scaling big, the chances are that you will want them separated to reduce the burden on the main system. Only join systems that have to be joined!