"... Translating is just the next step beyond that. It's a hard problem but not insurmountable. If someone wanted an idea for a company to sell to Google ..."
So parse each image, run text recognition to find the text (and determine its language), then re-render the translated text in the same font (which means you have to recognise fonts) and the same colour (with edge cases like colour fading), render the changed image (or just rebuild an approximation), then store the image and display it.
A much simpler way would be to just substitute the text extracted from the image. At least you could read it (technically it works), but without some clever hackery the result would look crap.
Maybe a better way would be to simply extract the text & rebuild the page in a standard format?
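The pipeline described above can be sketched roughly like this. This is a minimal illustration, not a working system: `ocr`, `translate`, and the `TextRegion` fields are hypothetical stand-ins for real components (an OCR engine with font/colour detection, a machine-translation service), which is where all the hard parts actually live.

```python
# Sketch of the parse -> recognise -> translate -> re-render pipeline.
# ocr() and translate() are hypothetical placeholders for real components.

from dataclasses import dataclass

@dataclass
class TextRegion:
    text: str
    language: str
    font: str        # best-guess font name from the (hypothetical) font recogniser
    colour: str      # dominant glyph colour; may be wrong where ink has faded

def ocr(image_bytes):
    # Placeholder: a real OCR engine would return recognised text regions
    # with detected language, font, and colour.
    return [TextRegion("Bonjour", "fr", "Times", "#222222")]

def translate(text, source_lang, target_lang):
    # Placeholder: a real system would call a machine-translation service.
    return {"Bonjour": "Hello"}.get(text, text)

def translate_page_image(image_bytes, target_lang="en"):
    """Recognise text regions, translate them, and record how each region
    should be re-rendered (same font, same colour) onto the page image."""
    rendered = []
    for region in ocr(image_bytes):
        new_text = translate(region.text, region.language, target_lang)
        # A real system would rasterise new_text in region.font / region.colour
        # back onto the image here; this sketch just records the plan.
        rendered.append((new_text, region.font, region.colour))
    return rendered
```

The fallback in the simpler approach is just the `translate` step with the re-rendering dropped: readable, but it throws away the layout.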
"... It's a hard problem but not insurmountable. If someone wanted an idea for a company to sell to Google... ..."
Google just open-sourced their OCR the other day; it's multilingual, has natural-language modelling, and a plug-in system for layout analysis and character recognition.
good point.