On Bilingual Weblogs, Part I: Some Thoughts

Although the ideas and methods in the two articles on this topic are meant to explain and demonstrate how to set up a bilingual weblog, they also apply to building multi-language weblogs or websites in general, except, of course, for the tools and plugins that are specific to WordPress.

If you are reading this article, you already have some interest in setting up a bilingual weblog. But what exactly do you mean by a “bilingual weblog”? I ask because the incentive may be obvious (reaching more targeted visitors and so increasing your site’s potential traffic), while the concept itself may not be.

OK, I can already hear you whispering through the screen: “What? Doesn’t a bilingual weblog just mean a weblog that offers two languages to its visitors? Where is the mystery in that?”

Yes, to some extent you are right, and I agree there is nothing mysterious about it. My point is simply that, as a serious webmaster, you should think carefully about the following questions:

Do you want to provide quality content to your users? Or do you just want to offer more than one language for the sake of increasing traffic?
Do you want your visitors to be able to change the language on the fly? Or do you want your users to set a language preference themselves, so that they always see the language they chose when they come back?
What is the real reason you are providing a bilingual website? Is it because you can offer services or content to more people? Is it because you really want to meet people online who speak a language other than your mother tongue?
Do you really know the other language? Do you want to moderate the content of your website? How will you moderate it?

I hope I didn’t scare you with these questions. As you can see, depending on your answers, one bilingual weblog may end up very different from another. If you just want to offer more than one language and don’t care about the quality of the content in your second language (that is, the language that is not your native one), you can simply put machine-translated text, for both interface and content, on your pages; you don’t even need to know the other language. If you only care about your web application’s interface (navigation links, menus, buttons), you can translate or localize just those and leave the content in whatever language it happens to be. But I think a “real” bilingual weblog should cover both aspects (changing the language on the fly, and localizing both interface and content), plus quality control. As a serious webmaster, you want not only to localize the interface but also to provide your content in two language versions; furthermore, you should make sure the content meets a certain level of quality. That means you do not rely 100% on machine translation, and you keep errors in the other language’s content to a minimum.

So a bilingual weblog has three aspects that we should consider and implement: interface, content, and flexibility.

The Interface.
Translating the interface is much the same process as software localization. Since a website is just a special kind of software, i.e., a web application, the two are really the same thing. Once you have developed a piece of software and the day comes to expand its audience geographically, the first thing you think of is localizing it for users in the target region. Here, software localization rarely involves anything other than the interface, because nothing else needs to be, or can be, translated. Take Windows for example: the English and Simplified Chinese versions of Windows XP have different interfaces (menus, buttons, documentation, etc.), but they share the same features, functionality, and ways of managing the system. One plus one always equals two, no matter where you are 🙂.

On the web, however, the content is more important, so there is the further question of content localization (discussed in just a minute). That question can be separated from software localization entirely, since what content goes on the web is up to the website owners, not the programmers. No wonder many of today’s website systems, the so-called CMSes (content management systems), only support a multi-language interface out of the box. Even many official plugins (a plugin is a piece of program code that works only when it is “plugged into”, that is, installed into, the host system) are considered to have done the job if they can switch the interface between multiple languages.

To offer a different language, you need a language file, or “language pack” as some software calls it. You can translate one by hand or use tools to produce it. If you decide to translate it yourself, you must translate every string in the user interface, and perhaps the documentation too, and you will get very little help unless you use dedicated tools and methods. Some software, such as Drupal (visit: drupal.org), does provide convenient ways to do this, but the easier and better way is to find language files that others have already translated. Unless you want to translate everything yourself (which is obviously tedious), find an existing language file and edit or modify it with a suitable tool. When you are done with the language file, you normally need to put it in a particular location. Then, once your weblog is configured correctly, your bilingual weblog is ready to go. In many cases there is a setting in your website’s administration panel where you can select the language; a better solution is a selection menu on the front end of your website, so that users can pick their preferred language themselves (more about flexibility later in this article).
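To make the idea of a language file concrete, here is a minimal sketch of how an application might look up interface strings. It is only an illustration under my own assumptions: the `LANGUAGE_PACKS` dictionary, the string keys, and the `translate` function are hypothetical, and real systems usually load their language files from disk (WordPress, for instance, reads gettext `.mo` files) rather than from a dictionary in the source code.

```python
# Hypothetical in-memory "language packs": each maps an interface
# string key to its translation. Real systems load these from files.
LANGUAGE_PACKS = {
    "en": {"home": "Home", "archives": "Archives", "leave_comment": "Leave a comment"},
    "zh": {"home": "首页", "archives": "存档"},  # "leave_comment" left untranslated on purpose
}
DEFAULT_LANG = "en"


def translate(key, lang):
    """Return the interface string for `key` in `lang`.

    Falls back to the default language when the pack or the string is
    missing, and finally to the key itself so the page never breaks.
    """
    pack = LANGUAGE_PACKS.get(lang, {})
    if key in pack:
        return pack[key]
    return LANGUAGE_PACKS[DEFAULT_LANG].get(key, key)


if __name__ == "__main__":
    for key in ("home", "archives", "leave_comment"):
        print(key, "->", translate(key, "zh"))
```

The fallback matters in practice: a language file found on the internet is often incomplete, and showing the default-language string is friendlier than showing nothing.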

The Content.
Translating your content is actually part of your real work, just like writing content for a weblog that offers no language choice. If you are serious about this, you should keep the quality of your content in good shape, which means you will not publish machine-translated text directly. That does not mean you cannot use it to help you translate better; even if you consider yourself a very good bilingual writer, a handy dictionary is a good assistant. Most free machine translation services are offered either as one of many services by big internet companies, such as Google’s Language Tools and Yahoo’s Babel Fish, or by specialized translation providers, such as translation2.paralink.com by PROMT Ltd (e-promt.com). Note that these providers also offer tools or web widgets that you can place directly on your web page to translate your whole site into another language; some even provide open APIs that let programmers write plugins that use their service. I must emphasize that I am not saying these services are not worth using. I just think that today’s technology has not reached the level where machine-translated text is very accurate, especially for a whole article (I do not trust it even for a whole sentence). In my opinion, you can only use it like a dictionary.

I have also thought about one technical reason for my caution about machine-translated text, and I want to share it here. Most machine-translated text is produced on the provider’s server and then transferred to our site’s web pages. The resulting text does not come from our web application’s database (if we use one) or from files on our own web server, and normally it is not stored in our database or on our web server either. Suppose we had a tool that could capture the machine-translated text, store it in the database, and give the site’s author or editor the opportunity to work with it. Don’t you think that would be a worthwhile feature, and that we might then use the free online translation services more often, and for more serious things?
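The tool imagined above could look roughly like the following sketch. Everything in it is an assumption made for illustration: `fake_machine_translate` is a hypothetical stand-in for a call to a remote translation service (no real provider API is used), and the `translations` table layout is invented. The point is only the workflow: fetch a draft once, store it locally, and let a human editor correct it and mark it as reviewed.

```python
import sqlite3


def fake_machine_translate(text, target_lang):
    # Hypothetical stand-in for a remote machine translation service.
    return f"[{target_lang} draft] {text}"


def open_store(path=":memory:"):
    """Open (or create) the local store for translated text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        " post_id INTEGER, lang TEXT, body TEXT,"
        " reviewed INTEGER DEFAULT 0,"
        " PRIMARY KEY (post_id, lang))"
    )
    return conn


def get_translation(conn, post_id, source_text, lang, translate=fake_machine_translate):
    """Return (body, reviewed). Calls the translator only on the first request."""
    row = conn.execute(
        "SELECT body, reviewed FROM translations WHERE post_id = ? AND lang = ?",
        (post_id, lang),
    ).fetchone()
    if row is not None:
        return row[0], bool(row[1])
    draft = translate(source_text, lang)
    conn.execute(
        "INSERT INTO translations (post_id, lang, body) VALUES (?, ?, ?)",
        (post_id, lang, draft),
    )
    conn.commit()
    return draft, False


def save_review(conn, post_id, lang, corrected_body):
    """Store the editor's corrected text and mark it as reviewed."""
    conn.execute(
        "UPDATE translations SET body = ?, reviewed = 1 WHERE post_id = ? AND lang = ?",
        (corrected_body, post_id, lang),
    )
    conn.commit()
```

A theme could then show only reviewed translations to visitors and keep the unreviewed drafts in the administration panel for the editor.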

Interestingly enough, quickonlinetips.com has published another article on this subject, titled “Human Translators Superior to Automatic Blog Translations”:

After installing the Global wordpress translation plugin, I have been subjected to constant Forbidden 403 Errors on all the translated pages. Though caching is essential to reduce the load on these translation services and prevent your blog from being blocked as spam, even 24 hours caching failed to work and a few pages that worked…

Along with these 403 errors, the article discusses many interesting aspects of a multi-language blog, including the point that machine translation is not accurate enough. It is worth checking out.

Anyway, my main point is that you should provide the different language versions of your content yourself. You can do it one-to-one, that is, provide two language versions of every entry on your blog. Or, of course, you can do it non-one-to-one, but remember to show the end user a friendly message such as “the article is not available in the language you requested right now”. The basic purpose is quality control of your content, and for that I think you must have enough knowledge of the language you want to add to your weblog.
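The non-one-to-one case can be sketched as follows. The `POSTS` data, the wording of the notices, and the `render_post` function are hypothetical examples, not part of any real weblog software; the sketch only shows the idea of falling back to the available version together with a friendly message.

```python
# Hypothetical post store: post id -> {language code: text}.
POSTS = {
    1: {"en": "Hello, world!", "zh": "你好，世界！"},  # one-to-one entry
    2: {"en": "An English-only entry."},               # no Chinese version yet
}

# Friendly notice, written in the language the visitor asked for.
NOT_AVAILABLE = {
    "en": "The article is not available in the language you requested right now.",
    "zh": "您请求的语言版本暂时无法提供。",
}


def render_post(post_id, lang, fallback_lang="en"):
    """Return the post in `lang`, or a notice followed by the fallback version."""
    versions = POSTS[post_id]
    if lang in versions:
        return versions[lang]
    notice = NOT_AVAILABLE.get(lang, NOT_AVAILABLE[fallback_lang])
    return notice + "\n\n" + versions[fallback_lang]


if __name__ == "__main__":
    print(render_post(1, "zh"))
    print(render_post(2, "zh"))
```

This sketch assumes every post exists at least in the fallback language; a real weblog would also have to handle a post that is missing there.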

The Flexibility.
The history of software localization has a very serious problem that still strongly influences today’s localized software: localized text “hard-coded” into the software’s source code. As a result, some software cannot actually run on a system whose language differs from the one it was developed for. For example, the Simplified Chinese version of Windows 95 used something called a “code page” to define the character set, in this case GB2312, while the English version might need only the ISO-8859-1 character set; you will find that some software that runs on the English version cannot run on the Chinese version, or, if it runs, its interface is a mess. Similar experiences can be had on the web (yes, even in today’s Web 2.0 era). Just visit a couple of websites and change your browser’s Character Encoding option; I am sure most of you will be surprised by what you find. The main cause is that in the early days there was no unified character set covering the characters of every human language, and each country or region had its own standard character set. This was not solved until the arrival of Unicode, which can represent the characters of any language and much more. Before Unicode, software that could switch between languages with one mouse click was hard to implement; now it is just common sense. Today, software that does not use Unicode lacks flexibility, and a website that does not use UTF-8 may run into problems at some point.
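The “mess” described above is easy to reproduce. The short sketch below is my own illustration (the sample strings are arbitrary): it encodes Chinese text as GB2312 bytes, reads those bytes back as if they were ISO-8859-1, and then shows that UTF-8 can carry English and Chinese in the same string.

```python
text = "中文"

# The same two characters become different bytes in different encodings.
gb_bytes = text.encode("gb2312")    # 2 bytes per character
utf8_bytes = text.encode("utf-8")   # 3 bytes per character

# Reading GB2312 bytes with the wrong character set gives garbage
# instead of an error, which is why the interface "is a mess".
garbled = gb_bytes.decode("iso-8859-1")

# ISO-8859-1 simply has no way to represent Chinese characters.
try:
    text.encode("iso-8859-1")
    latin1_can_encode = False  # not reached for this text
except UnicodeEncodeError:
    latin1_can_encode = False
else:
    latin1_can_encode = True

# UTF-8 handles both languages in one string and round-trips cleanly.
mixed = "Bilingual weblog 双语博客"
round_trip = mixed.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(len(gb_bytes), len(utf8_bytes))
    print(garbled)
    print(round_trip == mixed)
```

This is the same effect you see when you change the browser’s Character Encoding option on a page: the bytes stay the same, only their interpretation changes.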

Please note: this article is not specifically about character encoding, so the concepts of code page, character set, and Unicode are not explained here. If you are interested in character encoding, see “The Definitive Guide to Web Character Encoding” on sitepoint.com.

The flexibility of a bilingual weblog means not only offering your end users a one-click way to change the language, but also offering it wherever it should be offered. What does this mean? If a web page or an article exists in two languages, you should provide a button or link that switches between them. The same is true if you have two language versions of your RSS feed, user comments, and so on. A site-wide language choice is essential but not enough; better still, let registered users set, in their own account options, the language in which they prefer to view your weblog. (Since most weblogs have a single author, and end users may not need to register to post comments and can never edit or write articles, your weblog may not need a user account control panel as complex as those of large community forums or CMS websites.)
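One possible way to combine these choices is a simple priority order, sketched below. The order itself, the parameter names, and the use of the browser’s `Accept-Language` header as a last hint are my assumptions rather than something a particular weblog system prescribes; the header parsing is deliberately simplified and ignores the `q` weights.

```python
SUPPORTED = ("en", "zh")
DEFAULT_LANG = "en"


def resolve_language(query_lang=None, account_lang=None, cookie_lang=None,
                     accept_language=""):
    """Pick the language for one request.

    Priority (an assumption for this sketch):
    1. the language link the visitor just clicked (e.g. ?lang=zh),
    2. the preference saved in a registered user's account,
    3. a cookie remembered from an earlier visit,
    4. the browser's Accept-Language header,
    5. the site default.
    """
    for candidate in (query_lang, account_lang, cookie_lang):
        if candidate in SUPPORTED:
            return candidate
    # Simplified header handling: take entries in listed order and
    # compare only the primary two-letter language code.
    for part in accept_language.split(","):
        code = part.split(";")[0].strip().lower()[:2]
        if code in SUPPORTED:
            return code
    return DEFAULT_LANG
```

For example, `resolve_language(query_lang="zh", account_lang="en")` returns `"zh"`: the one-click switch on the page wins over the stored preference, so a visitor can always override what the account says.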

OK, I hope you now have a clearer picture of what a bilingual weblog is. Enough theory: in Part II, we will actually set up an English and Chinese weblog using WordPress 2.1.3, just like the blog you are reading.
