on March 31, 2008


This brings up an interesting issue of scaling. Most larger sites would use a relational database for the backend store (right or wrong) and scale the frontend and backend independently. Because news.yc is using in-memory and disk storage, what are the options available to scale this type of backend architecture?

Assuming load was the issue earlier (which it most likely wasn't), would you move the store to something like a Berkeley DB with a network-accessible frontend, or the newer MemcacheDB? Maybe a relational database, or even Amazon's new DB web service. If this site were your startup, what would your next move be?


In-memory, in-process storage is by far the fastest option (assuming you know a hashtable from a granola bar). A single server can easily handle 1000+ requests/second, which is plenty. The only reason not to do it is if your data doesn't fit in RAM, which is probably not the case for News.YC.


This is a scaling question. Clearly the question is: assuming the current setup is no longer enough, what would be the best way to scale this architecture?


Probably the easiest would be to use memcache for some of the repeated queries, specifically the items that are relatively static, such as the original post or posted item details.

It probably wouldn't help too much on the front page, since that'll update constantly and therefore just be a waste of overhead.
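The cache-aside pattern being suggested looks roughly like this: cache the relatively static renders (item pages) with a TTL, and skip the front page since it updates constantly. A real deployment would use memcached over the network; a dict with expiry times illustrates the same pattern without a server, and all names here are assumptions.

```python
import time

_cache: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

def cached(key: str, ttl: float, compute):
    """Return a cached value if fresh, otherwise recompute and store it."""
    now = time.time()
    hit = _cache.get(key)
    if hit and hit[0] > now:
        return hit[1]                      # serve the cached render
    value = compute()                      # the expensive render/query
    _cache[key] = (now + ttl, value)
    return value

# item pages rarely change, so a 5-minute TTL is cheap and safe
page = cached("item:42", 300, lambda: "<html>item 42</html>")
assert cached("item:42", 300, lambda: "never called") == page
```

The front-page objection above is exactly the cache-invalidation cost: a TTL short enough to keep it fresh buys almost no hit rate.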


This doesn't really address the issue that there's no centralized information repository. With the content entirely in RAM, to distribute the app to two boxes would require either some sort of message broadcasting system, or re-architecting the data to be stored in some other way.
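One hypothetical shape for the "message broadcasting" option above: apply each write locally, then broadcast it to peer servers so every box keeps a full copy in RAM. This sketch ignores ordering, acknowledgement, and failure handling, which is exactly why the re-architecture is nontrivial.

```python
class Node:
    """A toy replica that holds the full dataset in memory."""
    def __init__(self):
        self.data: dict[int, int] = {}
        self.peers: list["Node"] = []

    def write(self, key: int, value: int) -> None:
        self.apply(key, value)
        for peer in self.peers:           # naive broadcast, no acks, no ordering
            peer.apply(key, value)

    def apply(self, key: int, value: int) -> None:
        self.data[key] = value

a, b = Node(), Node()
a.peers = [b]
a.write(7, 100)
assert b.data[7] == 100                   # the replica saw the write
```

With two boxes accepting writes concurrently, conflicting updates can arrive in different orders on each replica; that consistency problem is what pushes people toward a central store.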


Agreed that in-memory storage is by far the fastest form of storage available to a single server setup like this. However, I was thinking hypothetically if load was the issue, how would people scale the backend. I was curious if one option would emerge as a favorite.

I disagree that the only reason not to do it is if the data won't fit in RAM. Going to a multi-server setup has many advantages including redundancy for situations such as these, of course, at the cost of increased administration.


These sorts of questions have resulted in the traditional web setup you see today, with a database backend for storage.

Any reasonably large site will (relatively) quickly exhaust RAM, especially if the same server is handling the requests.

Plus, traditionally there are large sets of data that need to be searchable, hence the desirability of putting them in a database.

So instead of actually having things in memory, people now use something like memcache to reasonably simulate that for performance.


We're trying to figure it out. I just restarted the server. Will see if that fixes anything...


Later: Problem seems fixed. The server had been running for a comparatively long time, maybe two weeks. Presumably it (or Mzscheme) ran out of something. It seemed more important to make the site work than figure out what.
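The symptom described (fine for ~two weeks, fixed by a restart) smells like a slow leak. One cheap hedge, sketched below under the assumption of a Unix host, is to log the process's resident set size periodically and restart past a threshold; `resource.getrusage` is in the Python stdlib (`ru_maxrss` is reported in KB on Linux, bytes on macOS).

```python
import resource

def rss_kb() -> int:
    """Peak resident set size of this process, in kilobytes on Linux."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# a supervisor could poll this and restart the server past a limit
assert rss_kb() > 0
```

This doesn't find the leak, but it turns "presumably it ran out of something" into a graph you can look at.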


Are you sure it was news.yc? I was seeing slowness across different parts of the net at the same time... perhaps it's just a coincidence.


Yes, it was us.


I also noticed some random other slowness.


I notice random slowness everyday with Comcast...


I don't know about the nature of the news.YC architecture, but if it's an error that happens unexpectedly, then it may be due to some scheduled, automatically recurring function that fails somehow and eats up all the memory...

so it's probably in the web server or the stats logging

update: or the weighting algorithm, which is obviously a new addition


I think we should redirect the warp drive through the deflector array. Either that or jettison the anti-matter pods.

EDIT: We should also re-phase the dilithium crystals before we lose containment in the nacelles.

Ok Ok -- I deserve a score of zero for that one. But I couldn't resist. Sometimes humor can be instructive, right? : )


No, you fool!

Reroute auxiliary power through the plasma conduits!


You're missing "containment" somewhere in there.


couldn't resist to what exactly?


Assimilation.


Resistance is futile


it seems better now...


memory leak?


Did anyone else lose their comments, karma, etc? I'm down to 1 point. My profile settings seemed to also be reset, though it still remembers my saved articles.


Ugh. Looks like some accounts got corrupted when I shut down the server. If anyone else sees similar problems, please let me know here, and I'll fix your account from backups.


Data got corrupted because of server restarts? That doesn't sound like a robust design. I can see how some sessions can get kicked out but data corruption should not occur.

It didn't seem to happen to me but I feel bad for others who didn't realize their karma dropped.


That doesn't sound like a robust design.

I think you mean implementation. Software sometimes has bugs. But thanks for the implicit compliment.


That is just semantics, it sounds like he means design of the implementation.

However, I suppose it does depend on whether the design was flawed and allowed for data corruption or if it was strictly the result of a bug.
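On the robustness point: a common way to keep per-account files consistent across an abrupt shutdown is write-to-temp-then-rename, since `rename()` is atomic on POSIX filesystems, so a reader always sees either the old file or the new one, never a half-written mix. This is a generic sketch, not the actual news.yc persistence code.

```python
import os, tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data to path such that a crash leaves old or new, never half."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d)      # temp file on the same filesystem
    with os.fdopen(fd, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())               # data durable before the rename
    os.replace(tmp, path)                  # atomic swap

path = os.path.join(tempfile.gettempdir(), "profile.test")
atomic_write(path, "karma=1")
assert open(path).read() == "karma=1"
```

Whether the corruption here was a design gap or an implementation bug, this pattern removes the window where a shutdown mid-write truncates an account file.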


As I mentioned below, I have the same problem: all my karma and my comment history disappeared.


Maybe an isolated incident. My stats/history appears fine.


Thanks. I'll wait a bit longer and then ping the powers that be.


A lazily-evaluated, infinitely-recursive generator function decided to become ambitious.


I get an error like "unknown or expired link" maybe twice a day. Is that some kind of bug? And btw, the site was slower this morning.


I get the "unknown or expired link" error almost every time I comment... probably because it always takes me 10 minutes to sift through the muddled essay I write to myself before figuring out what I actually want to say.

Anyway, the error is annoying as hell.


It seems to happen if you spend a long time writing a comment or preparing a submission.
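That behavior is consistent with how continuation-based links work: each form link carries a one-off id mapped to a server-side closure, and the table evicts old entries, so composing for too long means your id is gone. The sketch below is an assumption about the mechanism; the names and the 30-minute TTL are invented, not the real Arc values.

```python
import time, secrets

_fnids: dict[str, tuple[float, object]] = {}  # fnid -> (expires_at, closure)
TTL = 30 * 60                                 # assumed lifetime

def register(closure) -> str:
    """Store a closure under a fresh one-off link id."""
    fnid = secrets.token_hex(8)
    _fnids[fnid] = (time.time() + TTL, closure)
    return fnid

def lookup(fnid: str):
    entry = _fnids.get(fnid)
    if entry is None or entry[0] < time.time():
        _fnids.pop(fnid, None)
        raise KeyError("unknown or expired link")
    return entry[1]

fid = register(lambda: "post comment")
assert lookup(fid)() == "post comment"
```

Eviction is what bounds memory use, and it is also why a slow comment draft dies with "unknown or expired link."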


How difficult would it be to host news.YC on X webservers with load balancing?

Is the implementation in arc currently horizontally scalable?


Arc is horizontally scalable so long as you can make sure each fnid is processed by the same server that created the continuation.

I haven't looked at the news.yc code specifically, but how data gets stored would determine how difficult it is to horizontally scale.
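The sticky-routing constraint above can be sketched simply: if the server id is embedded in the fnid, a load balancer can route each request back to the box that created the continuation by looking at a prefix. This encoding is hypothetical, not how Arc actually builds fnids.

```python
SERVERS = ["app1", "app2"]

def make_fnid(server: str, n: int) -> str:
    """Embed the originating server in the link id."""
    return f"{server}-{n:08x}"

def route(fnid: str) -> str:
    """Sticky routing: send the request back to the creator of the fnid."""
    server = fnid.split("-", 1)[0]
    assert server in SERVERS
    return server

fnid = make_fnid("app2", 0xbeef)
assert route(fnid) == "app2"
```

Requests without a fnid (plain page views) can go to any server, so only the continuation traffic is pinned.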


The entirety of NewsYC is kept in memory, if I recall correctly.


I hope no one minds if I kill this now that the problem seems to be fixed...


The comments/karma problem still isn't fixed though for ph0rque and myself.


Don't worry, I'll fix it. This happened once before and we were able to repair the damage from backups.


Great. We just submitted our YC app, and I wanted to make sure that my account history was going to be OK.


Hmmm... I seem to have lost my comment history and karma. Related?


You mean this isn't related to the spamming problem discussed earlier?


I thought it was just me.


So did I, but then it was a chance to use http://downforeveryoneorjustme.com/ - told me it was down for everyone.


Seems back to normal now; it was really slow and I could not log in about 2 hours ago.


Speaking of which, I would love to see news.yc on an EC2 setup. Perhaps YC's own Heroku...

edit: never mind, forgot Heroku was just for RoR apps


Now we're cooking


We were conversing a bit earlier and hypothesized it was a DDoS from the spam thread discussed yesterday. Was also thinking that maybe a spam filter put in place had a bug.


Out of RAM?



