“I am creating an Elgg network and I am expecting 10 million users, can it cope?”
This question gets asked in one form or another almost every week, but I believe this is the wrong question to be asking. Perhaps the more pertinent scalability question is:
“I am creating an Elgg network, how do I attract 10 million users?”
This is by far the hardest question to answer and it must be solved before you can seriously address the comparatively simple task of hardware and software scalability.
Attracting users can be accomplished in many ways, but it is mainly a matter of marketing and, of course, having a killer idea. As Ben discussed in his presentation at the recent Elgg Conference, this idea must be as useful to user 1 as it is to user 10 million.
The idea has to be solid from day one, so forget all the trendy “long tail” and “wisdom of crowds” buzzwords!
Once you manage to solve this most tricky of problems, you can begin to look at the infrastructure. So, can Elgg handle 10 million users out of the box?
Put simply, no script in the world can handle this level of usage straight away without some modification and a serious investment of both time and money. You will not be able to unpack Elgg on a cheap shared host and have it handle 10 million users.
This is not an issue with Elgg’s design (which actually lends itself to many scalability techniques), but simple realism. Elgg has had substantial work done on scalability and optimisation (reducing queries, caching and so on) and currently performs very well page for page against competitors like Ning and BuddyPress.
Asking how many users an Elgg install can support is also a pointless question, because the answer is always going to be “it depends”. How many users Elgg can support depends on your hardware, your host (shared or dedicated), your database server, how your users behave and how many of them are active at any given time.
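To make the “it depends” point concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption picked purely for illustration (the fraction of users active at once, the pages each one requests per hour, and what a single server can serve per second); none of them are measurements of Elgg, and the function name is hypothetical.

```python
import math

def estimate_servers(total_users, active_fraction,
                     requests_per_user_per_hour, requests_per_second_per_server):
    """Very rough estimate of request rate and web servers needed.

    All inputs are guesses you must replace with your own measurements.
    """
    active_users = total_users * active_fraction
    requests_per_second = active_users * requests_per_user_per_hour / 3600
    servers = math.ceil(requests_per_second / requests_per_second_per_server)
    return requests_per_second, servers

# Assumed scenario A: 1% of 10 million users active, 60 page requests per hour each,
# and a server that copes with 50 requests per second (all hypothetical figures).
rate_a, servers_a = estimate_servers(10_000_000, 0.01, 60, 50)

# Assumed scenario B: same network, but 5% of users active at once.
rate_b, servers_b = estimate_servers(10_000_000, 0.05, 60, 50)

print(f"A: about {rate_a:.0f} requests/s, {servers_a} servers")
print(f"B: about {rate_b:.0f} requests/s, {servers_b} servers")
```

The same 10 million registered users produce very different answers depending on how many are active at once, which is why a single “how many users can it support” figure tells you so little.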
So what should you pay attention to?
Elgg itself is fairly well optimised, and will improve over time. If you are dealing with millions of users, you will want to look at your server infrastructure: database server, bandwidth, memory and caching at every level. After this you can look at customised code to squeeze out the last percentage points of performance.
If you are serious about handling high load, there is no avoiding the need to spend some time and money investing in your infrastructure. But these are good problems to have, because they mean that you have a successful network!
So in conclusion, my answer to the scalability question is “Don’t worry about it until you have to worry about it!” Get your users in first. Make a killer service that is useful from day one, and then worry about how you will handle millions of concurrent users.
Scalability is a largely solved problem… building a successful service isn’t, and is the thing you should be concerned with.