> By definition you can't have a "WYSIWYG" for semantical information, and if you give people a WYSIWYG HTML editor to enter content, they'll just tweak the code until it "looks" right, which will bomb when said content has to be transferred to a new medium (say, non-HTML).
That problem can be mostly fixed with clever programming.
Does the content seem to include a bunch of one-line paragraphs, all of which share roughly the same font that is different from the rest of the content? Treat them as headlines. Have a bit of tolerance for people who can't distinguish 13pt from 14pt, but if the sizes or other styling differ considerably, try to decide which ones are <h1> and which ones are <h2>, etc.
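To make that heuristic concrete, here is a rough sketch, not a claim about how any real editor does it. It assumes the editor can hand you each paragraph's text and font size; the `Para` type, `guess_tags`, and the 1pt tolerance are all made up for illustration. Body text is taken to be whatever size covers the most characters, and short one-line paragraphs that are clearly bigger get ranked into heading levels.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Para:
    text: str
    size_pt: float  # hypothetical: font size as reported by the editor


def guess_tags(paras, tolerance=1.0, max_len=80):
    """Return one tag per paragraph: 'h1'..'h6' for likely headlines, else 'p'."""
    if not paras:
        return []

    # Body size = the font size that covers the most characters.
    weight = Counter()
    for p in paras:
        weight[p.size_pt] += len(p.text)
    body = weight.most_common(1)[0][0]

    def is_candidate(p):
        one_line = "\n" not in p.text and len(p.text) <= max_len
        return one_line and p.size_pt - body > tolerance

    # Group candidate sizes into levels, largest first. Sizes within
    # `tolerance` of a level's largest size share that level (13pt ~ 14pt).
    sizes = sorted({p.size_pt for p in paras if is_candidate(p)}, reverse=True)
    levels = []
    for s in sizes:
        if not levels or levels[-1] - s > tolerance:
            levels.append(s)

    tags = []
    for p in paras:
        if not is_candidate(p):
            tags.append("p")
            continue
        level = next(i for i, top in enumerate(levels, 1)
                     if top - p.size_pt <= tolerance)
        tags.append(f"h{min(level, 6)}")  # HTML stops at <h6>
    return tags
```

Feed it a 24pt line, a 14pt line, and a 13pt line scattered among 11pt body text, and the 24pt line comes out as `h1` while the 14pt and 13pt lines both land on `h2`. A real editor would also look at weight, color, and font family, and would let the user override the guess.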
Does the content include a bunch of consecutive paragraphs that begin with numerals (1. 2. 3.) or hyphens/asterisks? Turn them into <ol> and <ul> respectively. C'mon, even Microsoft Word knows how to do this, and the user can override it if the program guesses wrong. There's no reason why an open-source HTML editor can't do the same or better.
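The list detection is even easier to sketch. This is a minimal version under my own assumptions: paragraphs arrive as plain strings, a run needs at least two consecutive items to count as a list, and the names `paragraphs_to_html` and `min_items` are invented for the example.

```python
import re
from html import escape

OL_ITEM = re.compile(r"^\d+[.)]\s+(.*)$")  # "1. foo" or "1) foo"
UL_ITEM = re.compile(r"^[-*]\s+(.*)$")     # "- foo" or "* foo"


def classify(par):
    """Return (list_tag, item_text), or (None, par) for an ordinary paragraph."""
    for tag, pattern in (("ol", OL_ITEM), ("ul", UL_ITEM)):
        m = pattern.match(par)
        if m:
            return tag, m.group(1)
    return None, par


def paragraphs_to_html(paras, min_items=2):
    """Wrap runs of consecutive list-like paragraphs in <ol> or <ul>."""
    out = []
    i = 0
    while i < len(paras):
        tag, _ = classify(paras[i])
        items = []
        j = i
        # Collect the run of consecutive paragraphs of the same list kind.
        while tag and j < len(paras) and classify(paras[j])[0] == tag:
            items.append(classify(paras[j])[1])
            j += 1
        if tag and len(items) >= min_items:
            lis = "".join(f"<li>{escape(t)}</li>" for t in items)
            out.append(f"<{tag}>{lis}</{tag}>")
            i = j
        else:
            # A lone "3. something" is more likely a sentence than a list.
            out.append(f"<p>{escape(paras[i])}</p>")
            i += 1
    return out
```

The `min_items` threshold is the "guess conservatively" part. The "user can override it" part is a UI matter: keep the original paragraphs around so a single undo puts them back.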
You could even have the best of both worlds: the editor might recognize Markdown syntax and automatically convert it to the corresponding HTML markup for live preview. For example, if you begin and end a word with an underscore, put an <em> around it. (You can already configure Microsoft Word to do this.) If you begin a paragraph with ###, >, or four spaces, turn it into <h3>, <blockquote>, or <pre><code> respectively.
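Here is roughly what that conversion could look like for one paragraph at a time. It is a toy covering only the four rules mentioned above, not a Markdown parser, and `convert_paragraph` and `inline` are names I made up.

```python
import re
from html import escape

# A single word wrapped in underscores, not glued to other word characters,
# so snake_case_names are left alone.
EM = re.compile(r"(?<!\w)_([^\s_]+)_(?!\w)")


def inline(text):
    """Escape HTML first, then turn _word_ into <em>word</em>."""
    return EM.sub(r"<em>\1</em>", escape(text))


def convert_paragraph(par):
    """Map one paragraph of Markdown-ish text to an HTML block."""
    if par.startswith("    "):
        # Code block: drop the four-space indent, no inline markup inside.
        lines = [l[4:] if l.startswith("    ") else l for l in par.split("\n")]
        return f"<pre><code>{escape(chr(10).join(lines))}</code></pre>"
    if par.startswith("### "):
        return f"<h3>{inline(par[4:])}</h3>"
    if par.startswith(">"):
        return f"<blockquote>{inline(par[1:].lstrip())}</blockquote>"
    return f"<p>{inline(par)}</p>"
```

For example, `convert_paragraph("### Why _this_ works")` gives `<h3>Why <em>this</em> works</h3>`. The prefix check has to run before escaping, otherwise the leading `>` has already become `&gt;` by the time you look for it.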
Okay, but what if a piece of user-submitted content is so messed up that it is impossible to get any semantic information out of it? Well, think about it this way: whoever produced that content probably didn't intend to convey any semantic information anyway. By failing to determine the semantic structure of that content, the computer is actually guessing the author's intention correctly. Just slap a CSS normalizer on that piece of cow dung, and it will look more or less the same in all current and future browsers (which is probably what the author intended). The only thing that matters is that you don't produce such content. If someone else wants to shoot themselves in the balls with Comic Sans, why stop them? It's what they want.
The primary reason why WYSIWYG editors carry a stigma is that the first generation of popular editors, such as TinyMCE and FCKEditor (the precursor to CKEditor), tended to produce horribly broken markup. But that was 10 years ago, and now we have much better editors. And we can make them even better if we want to.