
2. Difficulties of Using Chinese on a Linux System

This section attempts a general description of the obstacles you may encounter when using Chinese on Linux, so that you can find the key points more easily when you run into problems. In fact, the shortcomings described here appear not only on Linux but on other systems as well; they concern the computing environment as a whole. If this section is not to your taste, or you are eager to get started, you can jump to the section Display and Input Chinese!

As we all know, a Chinese character is stored as two bytes in a computer. The most popular encodings are BIG5, used in Taiwan, and GB, used in mainland China. The first byte almost always has a numeric value of 128 or above, which makes it what we call a non-ASCII code. (ASCII codes are the codes smaller than 128.)
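To make the two-byte layout concrete, here is a minimal Python sketch. The sample character and the codec names `big5` and `gb2312` are assumptions chosen for illustration; the text above only speaks of "BIG5" and "GB" in general.

```python
# Illustrative only: the sample character and codec names are assumptions.
char = "中"

big5_bytes = char.encode("big5")    # encoding used in Taiwan
gb_bytes = char.encode("gb2312")    # one GB encoding used in mainland China


def describe(label, data):
    """Return one line listing each byte in hex and whether it is ASCII."""
    parts = [
        f"0x{b:02X} ({'ASCII' if b < 128 else 'non-ASCII'})" for b in data
    ]
    return f"{label}: " + ", ".join(parts)


print(describe("BIG5", big5_bytes))
print(describe("GB", gb_bytes))
```

Each line of output lists two bytes, and the first byte of each pair is reported as non-ASCII.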

Yes, but so what? Here is the point: for various reasons, many early programs never considered the possibility that non-ASCII codes might be part of their input data.

Such programs assume that the data they manipulate is limited to the ASCII range. Worst of all, when they do meet non-ASCII codes, the most common response is to act as if such codes did not exist and to truncate the 8th bit. This is the so-called 8-bit clean problem.

Suppose, for example, that your program takes it for granted that all input is 7-bit ASCII. When you enter Chinese characters, it clears the 8th bit of each byte, and the Chinese input turns into garbage.
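The effect of clearing the 8th bit can be sketched in a few lines of Python. The function name `strip_eighth_bit` is hypothetical and stands in for whatever a program that is not 8-bit clean does internally; the sample text and the `big5` codec are likewise assumptions.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear the top bit of every byte, as a non-8-bit-clean program might."""
    return bytes(b & 0x7F for b in data)


original = "中文".encode("big5")      # assumed sample text and encoding
damaged = strip_eighth_bit(original)

print(original.hex())   # the bytes as they were entered
print(damaged)          # every byte is now plain ASCII; the Chinese is gone
```

Pure ASCII input passes through unchanged, which is why such programs appeared to work for so long.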

Communication programs on the Internet could often transmit only 7-bit data. A notorious example is the earlier sendmail program, which could send and receive only 7-bit mail. As a result, various odd encoding schemes (such as uuencode, base64 and QP) came to be used for sending Chinese mail, and they cause receivers a great deal of trouble. (I have often thought that if the inventors of email had shown more foresight, we might have fewer problems today.)
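As a rough illustration of how these schemes work around a 7-bit channel, the Python sketch below encodes the same bytes with base64 and QP (quoted-printable) using the standard library. The sample text, the `big5` codec and the helper `is_seven_bit` are assumptions made for the example; real mail software also adds headers that declare which encoding was used.

```python
import base64
import quopri

raw = "中文".encode("big5")          # assumed sample text and encoding

b64 = base64.b64encode(raw)          # base64 transfer encoding
qp = quopri.encodestring(raw)        # quoted-printable ("QP")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. survives a 7-bit channel."""
    return all(b < 128 for b in data)


print(is_seven_bit(raw), is_seven_bit(b64), is_seven_bit(qp))

# The receiver must undo the encoding to get the original bytes back.
print(base64.b64decode(b64) == raw, quopri.decodestring(qp) == raw)
```

The extra decoding step on the receiving side is the inconvenience mentioned above: a reader whose mail program does not decode the message sees only the encoded form.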

The problem is more complicated on the Internet. Even if both you and your recipient have machines whose sendmail can handle Chinese mail, the recipient may still get garbled mail. This is because a message may pass through several hosts on the Internet before reaching its destination; if the sendmail on any one of them cuts off the 8th bit, the message is ruined. For programs with a client/server architecture, the problem may lie on the client side, on the server side, or on both.

Apart from being unable to handle non-ASCII data, applications that cannot recognize Chinese encodings are also a major problem. That is, most programs (even those that handle 8-bit data correctly) treat a Chinese character as two individual bytes. This causes no trouble in some situations, but it is disastrous in others.

The most obvious case is this: even if you can input Chinese characters properly, each character is still treated as two separate parts. When you press the backspace key once to delete a whole character, only one byte (one column) is erased on the screen, and the remaining half becomes garbage. Beyond that, a text editor may break a line in the middle of a Chinese character, producing garbage, or it may treat a long Chinese sentence like a single unbroken run of English text and never wrap it, leaving the screen ugly and chaotic.
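The backspace case can be sketched as follows. Both function names are hypothetical, and the sample text and `big5` codec are assumptions; the point is only the difference between deleting one byte and deleting one character.

```python
ENCODING = "big5"   # assumed encoding for this example


def backspace_bytewise(buf: bytes) -> bytes:
    """What a byte-oriented program does: drop the last byte only."""
    return buf[:-1]


def backspace_charwise(buf: bytes, encoding: str = ENCODING) -> bytes:
    """What an encoding-aware program does: drop the last whole character."""
    text = buf.decode(encoding)
    return text[:-1].encode(encoding)


line = "中文".encode(ENCODING)       # two characters, four bytes

broken = backspace_bytewise(line)    # leaves half of the second character
fixed = backspace_charwise(line)     # leaves exactly the first character

print(len(line), len(broken), len(fixed))
```

The byte-wise result has an odd number of bytes, so its final byte is the orphaned first half of a character. The same reasoning applies to line wrapping: a break placed between the two bytes of a character damages it in the same way.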

Worse still, some Chinese characters contain byte values that have a special meaning to certain applications. When such a program meets these codes, it may malfunction badly or simply crash.
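One well-known instance of this, offered here as an illustration and not necessarily the case the text has in mind, is a BIG5 character whose second byte equals the ASCII backslash (0x5C), which many programs treat as an escape character. The sketch below uses an assumed sample character and two hypothetical helper functions; the lead-byte test is deliberately simplified.

```python
BACKSLASH = 0x5C                     # ASCII code of "\"

encoded = "功".encode("big5")        # assumed sample character


def naive_has_backslash(data: bytes) -> bool:
    """Byte-oriented check: sees a backslash wherever the value 0x5C occurs."""
    return BACKSLASH in data


def big5_aware_has_backslash(data: bytes) -> bool:
    """Skip two-byte characters so their second byte is never misread.

    Simplification: any byte >= 0x80 is treated as a lead byte.
    """
    i = 0
    while i < len(data):
        if data[i] >= 0x80:          # lead byte: skip it and its trail byte
            i += 2
            continue
        if data[i] == BACKSLASH:
            return True
        i += 1
    return False


print(encoded.hex())
print(naive_has_backslash(encoded), big5_aware_has_backslash(encoded))
```

A program that behaves like the naive check would treat half of the character as the start of an escape sequence, which is one way the faults described above can arise.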

The sections below propose some solutions, but they are piecemeal, incomplete and unsatisfactory. Perhaps only when all software supports Chinese will these problems really be solved.

However, more and more programs are taking internationalization seriously. For example, the sendmail programs on most hosts can now handle 8-bit mail correctly; it is not only Chinese mail that needs 8 bits, but also much multimedia mail. A lot of software needs no modification at all for use with Chinese, or only needs some special options turned on. At the same time, more and more people are devoting themselves to creating Chinese software. Let us wait and see.

