
"... Translating is just the next step beyond that. It's a hard problem but not insurmountable. If someone wanted an idea for a company to sell to Google ..."

So: parse each image, use text recognition to find the text (and determine its language), translate it, then re-render the text in the same font (so you have to recognise fonts) and the same colour (with edge cases like colour fading), either by editing the original image or by re-building an approximation, then store the new image and display it.

A much better way would be to just substitute the text extracted from the image. At least you could read it (technically it works), but without some clever hackery the result would look crap.

Maybe a better way would be to simply extract the text and rebuild the page in a standard format?
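
A rough sketch of that "extract and rebuild" idea, just to show the shape of it. Everything here is made up for illustration: `TextBlock`, `rebuild_page`, and the `ocr` and `translate` callables are hypothetical stand-ins for whatever OCR engine and translation service you'd actually plug in, and plain HTML is only one guess at what a "standard format" could be.

```python
from dataclasses import dataclass
from html import escape
from typing import Any, Callable, Iterable, List


@dataclass
class TextBlock:
    """One run of recognised text plus its detected language (hypothetical type)."""
    text: str
    language: str


def rebuild_page(
    images: Iterable[Any],
    ocr: Callable[[Any], List[TextBlock]],
    translate: Callable[[str, str, str], str],
    target_lang: str = "en",
) -> str:
    """Extract text from each image and rebuild it as a plain HTML page.

    `ocr` and `translate` are assumed plug-ins, not real library calls:
    ocr(image) returns the text blocks found in one image, and
    translate(text, source_lang, target_lang) returns the translated string.
    """
    paragraphs = []
    for image in images:
        for block in ocr(image):
            if block.language == target_lang:
                text = block.text  # already readable, keep as-is
            else:
                text = translate(block.text, block.language, target_lang)
            # Escape so recognised text cannot break the rebuilt markup.
            paragraphs.append(f"<p>{escape(text)}</p>")
    body = "\n".join(paragraphs)
    return f'<html lang="{escape(target_lang)}"><body>\n{body}\n</body></html>'
```

You lose the fonts, colours and layout entirely, which is the point: no font recognition or re-rendering, just readable text in a container every browser already handles.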

"... It's a hard problem but not insurmountable. If someone wanted an idea for a company to sell to Google... ..."

good point.



Google just open-sourced their OCR the other day. It's multilingual, has natural language modeling, and has a plug-in system for layout analysis and character recognition.

http://googleblog.blogspot.com/2007/06/google-and-open-source-ocr.html



