Chinese Web Authoring

--A Work in Progress--

This is not meant as an exhaustive guide; instead it is a set of notes on issues faced by those putting together web sites that incorporate Chinese, including links to more information.

1. Character Sets and Charset Tags

There are three main Chinese character sets one encounters on the internet: Big-5 (traditional characters, used initially in Taiwan); GB (simplified characters, used initially in the PRC); and Unicode (a new world-wide standard that includes both simplified and traditional forms, with more characters of each kind than either of the preceding two character sets). Currently the Windows world is a bit ahead of the Mac world with respect to Unicode implementation (Windows 98 and some Windows 95 applications support Unicode fonts), but Apple is doing its best to catch up. System 8.1 is already able to display the portion of Unicode that overlaps with Big-5 and/or GB, assuming one has the Chinese Language Kit (CLK) installed, and System 8.5 makes further advances. See this page for a recent discussion by an Apple engineer about the future of Apple's Chinese language support.

When one first comes to a web page that contains Chinese, one typically has to tell one's browser to interpret the page as a Chinese character set in order to display the characters properly; in Netscape, for instance, one does this with the View:Encoding submenu. Authors of web pages can remove this requirement, though, by including the following tag (or an equivalent) in the header of their HTML file:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=Big5">

For more discussion and for alternative charset IDs, see this page, maintained by Robert Smitheram of Middlebury College.
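To make concrete what the tag accomplishes, here is a minimal Python sketch of how a program might find the declared charset and use it to decode the rest of the page. It is only an illustration, not a description of how any particular browser works. The class name MetaCharsetFinder and the sample page are invented for this example, and the sample uses the spelling "Big5" because that is the IANA-registered charset name and the one Python's codecs recognize.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: remembers the charset named in the first
    <META HTTP-EQUIV="content-type" ...> tag it sees."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        content = attr.get("content") or ""
        # content looks like "text/html; charset=Big5"
        for part in content.split(";")[1:]:
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset":
                self.charset = value.strip()


# A made-up sample page: ASCII markup around two Big5-encoded characters.
page = (
    b'<HTML><HEAD><META HTTP-EQUIV="content-type" '
    b'CONTENT="text/html; charset=Big5"></HEAD><BODY>'
    + "中文".encode("big5")
    + b"</BODY></HTML>"
)

# First pass: the markup is plain ASCII, so a byte-preserving decode
# (latin-1) is enough to locate the META tag.
finder = MetaCharsetFinder()
finder.feed(page.decode("latin-1"))
declared = finder.charset

# Second pass: decode the whole page with the charset the author declared.
body_text = page.decode(declared)
print(declared)
print(body_text)
```

Without the tag, the program (like the reader's browser) would have no way of knowing which charset to use for the second pass and would have to be told by hand.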

There is an additional benefit to including this tag in your documents: it may allow you to write Chinese in web authoring programs that are otherwise problematic. A major problem with web authoring software, discussed in more detail here, is that many programs automatically convert high-ASCII characters (byte values above 127) to HTML character codes. This destroys the Chinese encoding, which relies on these high-ASCII bytes. Some web authoring tools, though, use these headers to determine what charset the page is in, and thus whether to convert high-ASCII characters. On the Mac, GoLive CyberStudio 3.11 does precisely this, with excellent results. Adobe PageMill and Netscape Composer, on the other hand, destroy any Chinese they touch through these automatic conversions.
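The damage done by such automatic conversion can be sketched in a few lines of Python. This is an illustration under assumptions, not a reconstruction of what PageMill or Composer actually do internally: the function naive_entity_escape is a hypothetical stand-in for an editor that rewrites every high byte as a numeric HTML code.

```python
import html


def naive_entity_escape(raw: bytes) -> str:
    """Hypothetical stand-in for an editor that turns each byte above 127
    into a numeric HTML code, one byte at a time."""
    return "".join(chr(b) if b < 0x80 else f"&#{b};" for b in raw)


original = "中文"
big5_bytes = original.encode("big5")  # two bytes per character, all above 127 here

# What the editor writes to disk: the byte pairs are split into
# unrelated single-character codes.
mangled = naive_entity_escape(big5_bytes)

# What a browser then shows: each code is read as its own character,
# so the Big5 pairs can no longer be reassembled.
seen_by_browser = html.unescape(mangled)

# For comparison, the untouched bytes still decode correctly.
intact = big5_bytes.decode("big5")

print(mangled)
print(seen_by_browser)
print(intact)
```

Once the file has been saved in the mangled form, setting the browser's encoding to Big-5 no longer helps, because the original byte pairs are gone from the file.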

However, there also seems to be a drawback to including the header, at least as of this writing. According to reliable sources, the combination of Netscape 4.x and TwinBridge 4.8 through 4.98 (at least) fails to display Chinese when such a header is included. I have been told that this is a bug in TwinBridge, which I am willing to believe; I know of no better explanation.

© 1998 Stephen C. Angle
Last Updated: 11/24/98