Wikipedia simplifies IT infrastructure by moving to one Linux vendor (computerworld.com)
13 points by davidw on Oct 11, 2008 | hide | past | favorite | 11 comments


Somewhat unrelated: why is anyone using physical servers instead of AWS, aside from legacy considerations?


One good reason is that Wikipedia was around before Amazon offered web services. Since then, while I'm sure they've added more bandwidth and servers, I would guess that they are not mass-replacing old servers, instead simply adding new ones when they are needed. Since the site isn't CPU-intensive, 7-year-old servers are surely adequate in the back end.

I would guess that the newer, more powerful servers have a lot of memory and sit up front caching requests, while the older ones act as the database and process infrequent requests like edits.

That they are migrating to Ubuntu does not indicate that they are replacing their hardware. Surely they are just installing it on all their servers to make administration easier.

If Wikia did not already have a ton of existing machines and if Amazon gave them a bargain rate since they are non-profit, using AWS might make sense. As things exist though, using AWS would surely be less cost-effective.


True, I've updated my question to ask what I really want to know. Sorry to make your comment into an odd non sequitur :)


This article is about Wikipedia, which is run by the not-for-profit Wikimedia Foundation - not Wikia, which is a commercial wiki company founded by Wikipedia founder Jimmy Wales.

It's probably because it's not economical for the huge amount of traffic that Wikipedia shifts each month, and there's no real advantage to moving to AWS. I'm not totally convinced that AWS could deal with the traffic of the 8th biggest site on the web moving to use them overnight either.

Edit: The question has been changed making my answer a rather redundant non-sequitur.


Er, again, sorry for the edit.

You're probably right about Amazon handling the traffic, but here's the data I found trying to figure that out, in case anyone's interested:

- Web services already used more bandwidth than amazon.com at the end of '07 (http://bit.ly/T0hX1)

- Wikipedia.org gets about the same amount of traffic as amazon.com (http://bit.ly/1I4o7L)


Wikipedia needs better uptime than AWS. Don't get me wrong, AWS is probably a good idea but not just yet.


Hmm, my best research shows that Wikipedia's uptime is 99.8% [1], and AWS' SLA gives 99.9% [2].

[1] http://uptime.pingdom.com/site/month_summary/site_name/en.wi...

[2] http://aws.amazon.com/about-aws/media-coverage/2007/10/10/wi...


But the failures will probably compound, and the net result will be more like 0.999 * 0.998 = 0.997
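The compounding above can be checked with quick arithmetic. A sketch, assuming the two failure rates are independent (so availabilities multiply), using the Pingdom and SLA figures cited in the parent comments:

```python
# If Wikipedia-on-AWS only works when both layers are up, and the
# failures are independent, the availabilities multiply.
wikipedia_uptime = 0.998   # ~99.8%, the Pingdom figure above
aws_sla = 0.999            # 99.9%, the EC2 SLA figure above

combined = wikipedia_uptime * aws_sla
print(round(combined, 6))   # 0.997002, i.e. ~99.7%

# Expressed as downtime per 30-day month:
minutes_per_month = 30 * 24 * 60
print(round((1 - combined) * minutes_per_month, 1))  # ~129.5 minutes
```

In practice outages may overlap (the platform being down often hides the application's own failures), so 0.997 is a pessimistic floor rather than an exact prediction.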


Why does anyone drive their own cars when they could instead ride in a taxi, ride Greyhound, fly, or rent a car? The simple answer is that different approaches are appropriate in different situations.

To actually answer your question, though, AWS really constrains the way you do things. On your own servers, you can run whatever programs you want. You can write your own server from scratch (see http://python.org/doc/1.5.2p2/lib/module-BaseHTTPServer.html), you can run background jobs that reprocess the data sitting on your server without having to pay any extra for it, you can add memory or hard drives or double your RAID when hardware upgrades are needed, you can be confident in your backup system, etc. etc.
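As a sketch of that "server from scratch" point: the link above is to the old Python 1.5 BaseHTTPServer docs; in Python 3 the equivalent module is http.server. The handler class and response text here are invented for illustration - the point is just that a few lines give you a complete server you fully control:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class HelloHandler(BaseHTTPRequestHandler):
    """A hand-rolled HTTP handler: no framework, no host required."""

    def do_GET(self):
        body = b"hello from my own server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to an ephemeral port, serve from a background thread,
# then hit the server once to show the round trip works.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
reply = urllib.request.urlopen(url).read()
print(reply.decode())   # hello from my own server
server.shutdown()
```

On your own box you'd just call serve_forever() on a fixed port; the background thread and self-request here are only so the example runs and exits on its own.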

Personally, I have old computers around at home. I have a mostly-always-on cable connection. I have an outward-facing domain name for it thanks to http://www.dyndns.com/ -- a dynamically-updating nameserver. If I want to put up a new service and I'm not too worried about uptime, I can install lighttpd and run it on an old computer right out of my house. I can easily hook up to the live server to investigate any errors or problems I'm having.

If my service starts getting popular such that uptime is important but there still aren't a ton of visitors, I can move to a lighttpd shared hosting solution: just move the server root to the shared host. Now I have a great development platform at home (I could even point beta.mywebsite.com at it if I wanted), and I can commit tested functionality to the shared host. Do you have a nice AWS model at home that you can use to debug your AWS application?

Once uptime and bandwidth become an issue, then I can make the decision on whether it makes sense to go with a higher-tier shared hosting, a dedicated server, or with a service like AWS. In most cases, though, it's going to be hard to make AWS do everything that you need it to do. And with the necessity to have failover solutions for the three or more major outages that Amazon has every year, the need for some other solution is always going to be there.

And through all that, we haven't even touched on security or other guarantees that must be made to customers for certain applications, or on those instances where AWS is more expensive. In short, there are plenty of reasons to go with a solution that does not solely depend on AWS.


For one thing, Wikipedia is international. They have servers in South Korea, the US, and the Netherlands. While EC2 offers three data centers, they're all located on the US east coast. That matters when you're running a site as large as Wikipedia that serves a lot of international traffic.


AWS runs on physical servers too.



