IBM Computer, Laptops and Servers

Back Homepage Content Directory Resource Guide Blog

Eitherconvertthemtoconfor...

Either convert them to conform to some new document type (with or without a DTD) and write a stylesheet to go with them; or edit them to conform to XHTML. It is necessary to convert existing HTML files because XML does not permit end-tag minimization (missing , etc), unquoted attribute values, and a number of other shortcuts which are normal in most HTML DTDs. However, many HTML authoring tools already produce almost (but not quite) well-formed XML. As a preparation for XML, the W3C's HTML Tidy program can clean up some of the formatting mess left behind by inadequate HTML editors, and even separate out some of the formatting to a stylesheet, but there is usually still some hand-editing to do.
Converting to a new document type
If you want to move your files out of HTML into some other DTD entirely, there are already many native XML application DTDs, and several XML versions of popular SGML DTDs like TEI and DocBook to choose from. There is a pilot site run by CommerceNet (http://www.xmlx.com/) for the exchange of XML DTDs. Alternatively you could just make up your own markup: so long as it makes sense and you create a well-formed file, you should be able to write a CSS or XSLT stylesheet and have your document displayed in a browser.
Converting valid HTML to XHTML
If your HTML files are valid (full formal validation with an SGML parser, not just a simple syntax check), then try validating them as XHTML. If you have been creating clean HTML without embedded formatting then this process should throw up only mismatches in upper/lowercase element and attribute names, and empty elements (plus perhaps the odd non-standard element type name if you use them). Simple hand-editing or a short script should be enough to fix these changes. If your HTML validly uses end-tag omission, this can be fixed automatically by a normalizer program like sgmlnorm (part of SP) or by the sgml-normalize function in an editor like Emacs/psgml (don't be put off by the names, they both do XML). If you have a lot of valid HTML files, could write a script to do this in a programming language which understands SGML/XML markup (such as Omnimark, Balise, SGMLC, or a system using one of the SGML libraries for Perl, Python, or Tcl), or you could even use editor macros if you know what you're doing.
Converting invalid HTML to well-formed XHTML
If your files are invalid HTML (95% of the Web) they can be converted to well-formed DTDless files as follows: replace the DOCTYPE Declaration with the XML Declaration . If there was no DOCTYPE Declaration, just prepend the XML Declaration.
change any EMPTY elements (eg every , , , , and in the header, and every ,
, , , , , , , ,

Laptop Battery , , , , , , , , , , , and in the body of the document) so that they end with /> instead, for example ;
make all element names and attribute names lowercase;
ensure there are correctly-matched explicit end-tags for all non-empty elements; eg every

According to the indictment, Jones would steal various IBM and Penguin computer servers from Verisign's warehouse in Virginia and sell them to Johnson. Johnson would then sell the servers to several individuals, who would sometimes place them for sale on eBay. As a result of this scheme, the indictment alleges that Jones and Johnson caused Verisign to lose more than $120, 000 worth of computer equipment. In the indictment, Jones and Johnson are charged in three counts with causing the interstate transportation of stolen property, namely IBM 330 and 335 servers, in violation of 18 U.S.C.

Thinkpad must have a , etc;
escape all ensure all attribute values are in quotes.
Be aware that many HTML browsers may not accept XML-style EMPTY elements with the trailing slash, so the above changes may not be backwards-compatible. An alternative is to add a dummy end-tag to all EMPTY elements, so becomes . This is still valid XML provided you guarantee never to put any text content in such elements. Adding a space before the slash (eg ) may also fool older browsers into accepting XHTML as HTML. If your HTML files fall into this category (HTML created by some WYSIWYG editors is frequently invalid) then they will almost certainly have to be converted manually, although if the deformities are regular and carefully constructed, the files may actually be almost well-formed, and you could write a program or script to do as described above. The oddities you may need to check for include: do the files contain markup syntax errors? For example, are there any missing angle-brackets, backslashes instead of forward slashes on end-tags, or elements which nest incorrectly (eg an element starting inside another but ending outside)?
are there any URLs (eg in hrefs or srcs) which use backslashes instead of forward slashes?
do the files contain markup which conflicts with HTML DTDs, such as headings or lists inside paragraphs, list items outside list environments, header elements like preceding the first , etc?
do the files use imaginary elements which are not in any known HTML DTD? (large amounts of these are used in proprietary markup systems masquerading as HTML). Although this is easy to transform to a DTDless well-formed file (because you don't have to define elements in advance) most proprietary or browser-specific extensions have never been formally defined, so it is often impossible to work out meaningfully where the element types can be used.
Are there any non-ISO Latin-1 (8859-1) characters or wrongly-coded characters in your files? Look especially for native Apple Mac characters left by careless designers, or any of the illegal characters (the 32 characters at decimal codes 128-159 inclusive) inserted by MS-Windows editors. These need to be converted to the correct characters in ISO 8859-1 or the relevant plane of Unicode (and the XML Declaration should show iso-8859-1 encoding unless you specifically know otherwise).
Do your files contain malformed (Mosaic/Netscape-style) comments? Comments must look with double-dashes each end and no double dashes in between (safest: no multiple dashes in between).
If you answer Yes to any of these, you can save yourself a lot of grief by fixing those problems first before doing anything else. You will likely then be getting close to having well-formed files. Markup which is syntactically correct but semantically meaningless or void should be edited out before conversion. Examples are spacing devices such as repeated empty paragraphs or linebreaks, empty tables, invisible spacing GIFs etc: XML uses stylesheets, so you won't need any of these. Unfortunately there is rather a lot of work to do if your files are invalid: this is why many professional Webmasters will always insist that only valid or well-formed files are used (and why you should instruct designers to do the same), in order to avoid unnecessary manual maintenance and conversion costs later.

Computer memory is the quickest, cheapest, and easiest way to improve the performance of your system. Find RAM memory upgrades for desktops, laptops, servers, and printers all backed by a lifetime warranty and guaranteed compatible with your computer. Shipping is an everyday low price of $1.99! Computer Memory Outlet sells memory compatible with all leading computer manufacturers like Dell, Apple, Compaq, HP, Sony, IBM, Lenovo, and many more.”

[ Comment, Edit or Article Submission ]

Share this:

Add To Yahoo MyWeb Add To Google Bookmarks Add To Furl Fav This With Technorati Add To Newsvine Add To Bloglines Add To Ask Add To Windows Live Add To Slashdot Stumble This Digg This Add To Del.icio.us Add To Reddit
Nov December 2008 Jan
Sun Mon Tue Wed Thu Fri Sat
  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31      

IBM Computer, Laptops and Servers Blog on Technorati Related Blog of IBM Computer, Laptops and Servers on Sphere
Content Directory
Resource Guide


TheNerds PC Hardware Office Electronics GPS

Website Links
IBM Computer, Laptops and Servers Copyright © 2008 www.ibmfans.com. All rights reserved. Site Map
Homepage | Blog | Advertise | Privacy Policy | Disclaimer | Contact Us | Links