Internationalization

 


 

Before Internationalization

Suppose that you've written a program that displays three messages, as follows:


public class NotI18N {

    static public void main(String[] args) {

        System.out.println("Hello.");
        System.out.println("How are you?");
        System.out.println("Goodbye.");
    }
}
You've decided that this program needs to display these same messages for people living in France and Germany. Unfortunately your programming staff is not multilingual, so you'll need help translating the messages into French and German. Since the translators aren't programmers, you'll have to move the messages out of the source code and into text files that the translators can edit. Also, the program must be flexible enough so that it can display the messages in other languages, but right now no one knows what those languages will be.

It looks like the program needs to be internationalized.

 


 

After Internationalization

The source code for the internationalized program follows. Notice that the text of the messages is not hardcoded.

import java.util.*;

public class I18NSample {

    static public void main(String[] args) {

        String language;
        String country;

        if (args.length != 2) {
            language = "en";
            country = "US";
        } else {
            language = args[0];
            country = args[1];
        }

        Locale currentLocale;
        ResourceBundle messages;

        currentLocale = new Locale(language, country);

        messages = ResourceBundle.getBundle("MessagesBundle",
                                           currentLocale);
        System.out.println(messages.getString("greetings"));
        System.out.println(messages.getString("inquiry"));
        System.out.println(messages.getString("farewell"));
    }
}
To compile and run this program, you need the I18NSample.java source file and the MessagesBundle properties files described in the steps that follow.

 


 

Running the Sample Program

The internationalized program is flexible; it allows the end user to specify a language and a country on the command line. In the following example the language code is fr (French) and the country code is FR (France), so the program displays the messages in French:

% java I18NSample fr FR
Bonjour.
Comment allez-vous?
Au revoir.
In the next example the language code is en (English) and the country code is US (United States), so the program displays the messages in English:

% java I18NSample en US
Hello.
How are you?
Goodbye.

 


 

Internationalizing the Sample Program

If you look at the internationalized source code, you'll notice that the hardcoded English messages have been removed. Because the messages are no longer hardcoded and because the language code is specified at run time, the same executable can be distributed worldwide. No recompilation is required for localization. The program has been internationalized.

You may be wondering what happened to the text of the messages or what the language and country codes mean. Don't worry. You'll learn about these concepts as you step through the process of internationalizing the sample program.

 

 

1. Create the Properties Files

A properties file stores information about the characteristics of a program or environment. A properties file is in plain-text format. You can create the file with just about any text editor.

In the example the properties files store the translatable text of the messages to be displayed. Before the program was internationalized, the English version of this text was hardcoded in the System.out.println statements. The default properties file, which is called MessagesBundle.properties, contains the following lines:

greetings = Hello.
farewell = Goodbye.
inquiry = How are you?
Now that the messages are in a properties file, they can be translated into various languages. No changes to the source code are required. The French translator has created a properties file called MessagesBundle_fr_FR.properties, which contains these lines:

greetings = Bonjour.
farewell = Au revoir.
inquiry = Comment allez-vous?
Notice that the values on the right side of the equal sign have been translated but that the keys on the left side have not been changed. These keys must not change, because your program references them when it fetches the translated text.

The name of the properties file is important. For example, the name of the MessagesBundle_fr_FR.properties file contains the fr language code and the FR country code. These codes are also used when creating a Locale object.
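
To see how the key-value lines in a properties file turn into lookups, here is a minimal sketch. It assumes the French text is held in a string rather than in MessagesBundle_fr_FR.properties, so that it runs without any files on the class path; the class name PropertiesTextSketch and the parse helper are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class PropertiesTextSketch {

    // Same lines as MessagesBundle_fr_FR.properties, inlined for the sketch.
    static final String FRENCH_TEXT =
        "greetings = Bonjour.\n" +
        "farewell = Au revoir.\n" +
        "inquiry = Comment allez-vous?\n";

    // Hypothetical helper: parses properties-format text into a bundle.
    static ResourceBundle parse(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle messages = parse(FRENCH_TEXT);
        // The keys stay the same in every language; only the values differ.
        System.out.println(messages.getString("greetings"));
        System.out.println(messages.getString("inquiry"));
        System.out.println(messages.getString("farewell"));
    }
}
```

In the real program you would not parse the text yourself; getBundle locates and loads the file for you, as the next steps show.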

 

 

2. Define the Locale

The Locale object identifies a particular language and country. The following statement defines a Locale for which the language is English and the country is the United States:

aLocale = new Locale("en","US");

The next example creates Locale objects for the French language in Canada and in France:

caLocale = new Locale("fr","CA");
frLocale = new Locale("fr","FR");

The program is flexible. Instead of using hardcoded language and country codes, the program gets them from the command line at run time:

String language = args[0];
String country = args[1];
currentLocale = new Locale(language, country);

Locale objects are only identifiers. After defining a Locale, you pass it to other objects that perform useful tasks, such as formatting dates and numbers. These objects are locale-sensitive because their behavior varies according to Locale. A ResourceBundle is an example of a locale-sensitive object.

 

 

3. Create a ResourceBundle

ResourceBundle objects contain locale-specific objects. You use ResourceBundle objects to isolate locale-sensitive data, such as translatable text. In the sample program the ResourceBundle is backed by the properties files that contain the message text we want to display.

The ResourceBundle is created as follows:

messages = ResourceBundle.getBundle("MessagesBundle",
                                    currentLocale);

The arguments passed to the getBundle method identify which properties file will be accessed. The first argument, MessagesBundle, refers to this family of properties files:

MessagesBundle_en_US.properties
MessagesBundle_fr_FR.properties
MessagesBundle_de_DE.properties

The Locale, which is the second argument of getBundle, specifies which of the MessagesBundle files is chosen. When the Locale was created, the language code and the country code were passed to its constructor. Note that the language and country codes follow MessagesBundle in the names of the properties files.

Now all you have to do is get the translated messages from the ResourceBundle.

 

 

4. Fetch the Text from the ResourceBundle

The properties files contain key-value pairs. The values consist of the translated text that the program will display. You specify the keys when fetching the translated messages from the ResourceBundle with the getString method. For example, to retrieve the message identified by the greetings key, you invoke getString as follows:

String msg1 = messages.getString("greetings");
The sample program uses the key greetings because it reflects the content of the message, but it could have used another String, such as s1 or msg1. Just remember that the key is hardcoded in the program and it must be present in the properties files. If your translators accidentally modify the keys in the properties files, getString won't be able to find the messages.
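
When a key is missing, getString throws a MissingResourceException. The following sketch shows one possible way to surface the problem without crashing. The helper messageOrPlaceholder and the "!key!" placeholder convention are assumptions made for this illustration, not part of the sample program, and the bundle is built from a string so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class MissingKeySketch {

    // Hypothetical helper: returns the message for the key, or a visible
    // placeholder when the bundle does not contain the key.
    static String messageOrPlaceholder(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return "!" + key + "!";
        }
    }

    // A bundle in which a translator has accidentally renamed "greetings".
    static ResourceBundle damagedBundle() {
        String text = "salutations = Bonjour.\nfarewell = Au revoir.\n";
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle messages = damagedBundle();
        System.out.println(messageOrPlaceholder(messages, "greetings"));
        System.out.println(messageOrPlaceholder(messages, "farewell"));
    }
}
```

A placeholder makes a damaged properties file easy to spot during testing. Whether to show a placeholder, fall back to English, or fail fast is a design choice for your application.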

 

 

Conclusion

That's it. As you can see, internationalizing a program isn't too difficult. It requires some planning and a little extra coding, but the benefits are enormous. To provide you with an overview of the internationalization process, the sample program in this lesson was intentionally kept simple. As you read the lessons that follow, you'll learn about the more advanced internationalization features of the Java programming language.

 


 

Checklist

Many programs are not internationalized when first written. These programs may have started as prototypes, or perhaps they were not intended for international distribution. If you must internationalize an existing program, take the following steps:

 

 

Identify Culturally Dependent Data

Text messages are the most obvious form of data that varies with culture. However, other types of data may vary with region or language. The following list contains examples of culturally dependent data:
  • Messages
  • Labels on GUI components
  • Online help
  • Sounds
  • Colors
  • Graphics
  • Icons
  • Dates
  • Times
  • Numbers
  • Currencies
  • Measurements
  • Phone numbers
  • Honorifics and personal titles
  • Postal addresses
  • Page layouts

 

 

Isolate Translatable Text in Resource Bundles

Translation is costly. You can help reduce costs by isolating the text that must be translated in ResourceBundle objects. Translatable text includes status messages, error messages, log file entries, and GUI component labels. This text is hardcoded into programs that haven't been internationalized. You need to locate all occurrences of hardcoded text that is displayed to end users. For example, you should clean up code like this:

String buttonLabel = "OK";
...
JButton okButton = new JButton(buttonLabel);

See the section Isolating Locale-Specific Data for details.

 

 

Deal with Compound Messages

Compound messages contain variable data. In the message "The disk contains 1100 files." the integer 1100 may vary. This message is difficult to translate because the position of the integer in the sentence is not the same in all languages. The following message is not translatable, because the order of the sentence elements is hardcoded by concatenation:

Integer fileCount;
...
String diskStatus = "The disk contains " + fileCount.toString() 
                    + " files.";
Whenever possible, you should avoid constructing compound messages, because they are difficult to translate. However, if your application requires compound messages, you can handle them with the techniques described in the section Messages.
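
As a preview, a compound message can be expressed as a pattern with a numbered placeholder, so that a translator can move the placeholder anywhere in the sentence. The sketch below uses java.text.MessageFormat; the class name, the diskStatus helper, and the second (reordered) pattern are invented for this illustration.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class CompoundMessageSketch {

    // Fills the {0} placeholder of the pattern with the file count,
    // formatting the number according to the given Locale.
    static String diskStatus(String pattern, Locale locale, int fileCount) {
        MessageFormat formatter = new MessageFormat(pattern, locale);
        return formatter.format(new Object[] { fileCount });
    }

    public static void main(String[] args) {
        // In a real program these patterns would come from a ResourceBundle.
        String original = "The disk contains {0} files.";
        // Hypothetical pattern with the number in a different position.
        String reordered = "{0} files are on the disk.";

        System.out.println(diskStatus(original, Locale.US, 1100));
        System.out.println(diskStatus(reordered, Locale.US, 1100));
    }
}
```

Because the position of {0} is part of the pattern rather than part of the code, no concatenation order is hardcoded.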

 

 

Format Numbers and Currencies

If your application displays numbers and currencies, you must format them in a locale-sensitive manner. The following code is not yet internationalized, because it will not display the number correctly in all countries:

Double amount;
TextField amountField;
...
String displayAmount = amount.toString();
amountField.setText(displayAmount);

You should replace the preceding code with a routine that formats the number correctly. The Java programming language provides several classes that format numbers and currencies.
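
One possible replacement routine is sketched below. It assumes the Locale is passed in by the caller; the class name and the displayAmount helper are illustrative only.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class AmountDisplaySketch {

    // Formats the amount with the grouping and decimal separators
    // that are conventional for the given Locale.
    static String displayAmount(double amount, Locale locale) {
        NumberFormat formatter = NumberFormat.getNumberInstance(locale);
        return formatter.format(amount);
    }

    public static void main(String[] args) {
        double amount = 1234567.891;
        System.out.println(displayAmount(amount, Locale.US));
        System.out.println(displayAmount(amount, Locale.GERMANY));
    }
}
```

The string returned by displayAmount can then be passed to amountField.setText in place of amount.toString().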

 

 

Format Dates and Times

Date and time formats differ with region and language. If your code contains statements like the following, you need to change it:

Date currentDate = new Date();
TextField dateField;
...
String dateString = currentDate.toString();
dateField.setText(dateString);

If you use the date-formatting classes, your application can display dates and times correctly around the world.
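
The following sketch shows one way to do this with java.text.DateFormat. The class name and the displayDate helper are illustrative, and the time zone is pinned to UTC only so that the example behaves the same on every machine; the exact wording of the output depends on the locale data shipped with your JDK.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateDisplaySketch {

    // Formats the date in the LONG style of the given Locale.
    static String displayDate(Date date, Locale locale) {
        DateFormat formatter =
            DateFormat.getDateInstance(DateFormat.LONG, locale);
        // Fixed zone so that the sketch is reproducible.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // midnight, January 1, 1970 UTC
        System.out.println(displayDate(epoch, Locale.US));
        System.out.println(displayDate(epoch, Locale.FRANCE));
        System.out.println(displayDate(epoch, Locale.GERMANY));
    }
}
```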

 

 

Use Unicode Character Properties

The following code tries to verify that a character is a letter:

char ch;
...
if ((ch >= 'a' && ch <= 'z') || 
    (ch >= 'A' && ch <= 'Z'))       // WRONG!
Watch out for code like this, because it won't work with languages other than English. For example, the if statement misses the character ü in the German word Grün.

The Character comparison methods use the Unicode standard to identify character properties. Thus you should replace the previous code with the following:

char ch;
...
if (Character.isLetter(ch))
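
To compare the two checks side by side, you can run a small sketch like the one below. The class name and the isAsciiLetter helper are invented for the illustration, and the character is written as a Unicode escape so that the source file stays plain ASCII.

```java
public class LetterCheckSketch {

    // The range test from the WRONG example above.
    static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    public static void main(String[] args) {
        char uUmlaut = '\u00FC'; // the u with umlaut from the German example

        System.out.println(isAsciiLetter(uUmlaut));      // false
        System.out.println(Character.isLetter(uUmlaut)); // true
    }
}
```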

 

 

Compare Strings Properly

When sorting text you often compare strings. If the text is displayed, you shouldn't use the comparison methods of the String class. A program that hasn't been internationalized might compare strings as follows:

String target;
String candidate;
...
if (target.equals(candidate)) {
...
if (target.compareTo(candidate) < 0) {
...

The String.equals and String.compareTo methods perform binary comparisons, which are ineffective when sorting in most languages. Instead you should use the Collator class, which is described in the section Comparing Strings.
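
The difference is easy to see with mixed-case words. In the sketch below, the class name and the sortedFor helper are illustrative; the binary sort places every uppercase letter before every lowercase letter, whereas the Collator orders the words the way an English-speaking reader expects.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSortSketch {

    // Returns a copy of the words sorted by the rules of the given Locale.
    static String[] sortedFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = { "peach", "Banana", "apple" };

        String[] binary = words.clone();
        Arrays.sort(binary); // uses String.compareTo
        System.out.println(Arrays.toString(binary));
        // [Banana, apple, peach]

        System.out.println(Arrays.toString(sortedFor(Locale.US, words)));
        // [apple, Banana, peach]
    }
}
```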

 

 

Convert Non-Unicode Text

Characters in the Java programming language are encoded in Unicode. If your application handles non-Unicode text, you might need to translate it into Unicode.
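
For example, bytes received in a legacy encoding can be decoded by naming the character set explicitly. The sketch below assumes the input is ISO-8859-1 (Latin-1); the class name and the decodeLatin1 helper are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class CharsetConversionSketch {

    // Decodes ISO-8859-1 bytes into a (Unicode) String.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "G", "r", u with umlaut, "n" as ISO-8859-1 bytes.
        byte[] latin1 = { 0x47, 0x72, (byte) 0xFC, 0x6E };

        String text = decodeLatin1(latin1);
        System.out.println(text.length());             // 4 characters
        System.out.println(text.equals("Gr\u00FCn"));  // true

        // Re-encoding as UTF-8 needs two bytes for the umlaut.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);               // 5 bytes
    }
}
```

Naming the character set explicitly avoids depending on the platform default encoding, which differs between systems.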

 


 

Lesson: Setting the Locale

An internationalized program can display information differently throughout the world. For example, the program will display different messages in Paris, Tokyo, and New York. If the localization process has been fine-tuned, the program will display different messages in New York and London to account for the differences between American and British English. How does an internationalized program identify the appropriate language and region of its end users? Easy. It references a Locale object.

A Locale object is an identifier for a particular combination of language and region. If a class varies its behavior according to Locale, it is said to be locale-sensitive. For example, the NumberFormat class is locale-sensitive; the format of the number it returns depends on the Locale. Thus NumberFormat may return a number as 902 300 (France), or 902.300 (Germany), or 902,300 (United States). Locale objects are only identifiers. The real work, such as formatting and detecting word boundaries, is performed by the methods of the locale-sensitive classes.

 


 

Creating a Locale

To create a Locale object, you typically specify the language code and the country code. For example, to specify the French language and the country of Canada, you would invoke the constructor as follows:

aLocale = new Locale("fr", "CA");
The next example creates Locale objects for the English language in the United States and Great Britain:

bLocale = new Locale("en", "US");
cLocale = new Locale("en", "GB");
The first argument is the language code, a pair of lowercase letters that conform to ISO-639. You can find a full list of the ISO-639 codes at http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt.

The following table lists just a few of the language codes.

Language Code    Description
de               German
en               English
fr               French
ja               Japanese
jw               Javanese
ko               Korean
zh               Chinese

The second argument of the Locale constructor is the country code. It consists of two uppercase letters and conforms to ISO-3166. A copy of ISO-3166 can be found at http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html.

The following table contains several sample country codes.

Country Code     Description
CN               China
DE               Germany
FR               France
IN               India
US               United States

If you need to distinguish your Locale further, you can specify a third parameter, called the variant code. Usually you specify variant codes to identify differences caused by the computing platform. For example, font differences may force you to use different characters on Windows and UNIX. You could then define the Locale objects with the variant codes WINDOWS and UNIX as follows:

xLocale = new Locale("de", "DE", "UNIX");
yLocale = new Locale("de", "DE", "WINDOWS");
The variant codes conform to no standard. They are arbitrary and specific to your application. If you create Locale objects with variant codes, only your application will know how to deal with them.

The country and variant codes are optional. To omit the country code, specify an empty String (not null). You may create a Locale for the English language as follows:

enLocale = new Locale("en", "");

For your convenience the Locale class provides constants for some languages and countries. For example, you can create Locale objects by specifying the JAPANESE or JAPAN constants. The Locale objects created by the following two statements are equivalent:

j1Locale = Locale.JAPAN;
j2Locale = new Locale("ja", "JP");

When you specify a language constant, the country portion of the Locale is undefined. The next two statements create equivalent Locale objects:

j3Locale = Locale.JAPANESE;
j4Locale = new Locale("ja", "");
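
You can check these equivalences yourself with a sketch like the following. The class name and the describe helper are made up for the illustration. The constructors shown in this lesson still compile on current JDKs, although recent releases mark them as deprecated in favor of the Locale.of factory methods.

```java
import java.util.Locale;

public class LocaleConstantsSketch {

    // Shows the three parts that identify a Locale.
    static String describe(Locale locale) {
        return "language=" + locale.getLanguage()
             + " country=" + locale.getCountry()
             + " variant=" + locale.getVariant();
    }

    public static void main(String[] args) {
        // A constant and an explicitly constructed Locale are equal.
        System.out.println(Locale.JAPAN.equals(new Locale("ja", "JP")));   // true
        System.out.println(Locale.JAPANESE.equals(new Locale("ja", "")));  // true

        // A language constant has an empty country portion.
        System.out.println(describe(Locale.JAPANESE));
        System.out.println(describe(new Locale("de", "DE", "UNIX")));
    }
}
```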

 


 

Identifying Available Locales

You can create a Locale with any combination of valid language and country codes, but that doesn't mean that you can use it. Remember, a Locale object is only an identifier. You pass the Locale object to other objects, which then do the real work. These other objects, which we call locale-sensitive, do not know how to deal with all possible Locale definitions.

To find out which types of Locale definitions a locale-sensitive class recognizes, you invoke the getAvailableLocales method. For example, to find out which Locale definitions are supported by the DateFormat class, you could write a routine such as the following:

import java.util.*;
import java.text.*;

public class Available {
    static public void main(String[] args) {
        Locale[] list = DateFormat.getAvailableLocales();
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString());
        }
    }
}

The output of the previous program follows. Note that the String returned by toString contains the language and country codes, separated by an underscore:

ar_EG
be_BY
bg_BG
ca_ES
cs_CZ
da_DK
de_DE
.
.
.

If you want to display a list of Locale names to end users, you should show them something easier to understand than the language and country codes returned by toString. Instead you can invoke the Locale.getDisplayName method, which retrieves a localized String of a Locale object. For example, when toString is replaced by getDisplayName in the preceding code, the program prints the following lines:

Arabic (Egypt)
Byelorussian (Belarus)
Bulgarian (Bulgaria)
Catalan (Spain)
Czech (Czech Republic)
Danish (Denmark)
German (Germany)
.
.
.
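
The no-argument getDisplayName method localizes the name for the default Locale. If you want the names in a specific language, you can pass a Locale to getDisplayName, as in the sketch below. The class name and the englishName helper are illustrative, and the list you see depends on the locale data installed with your JDK.

```java
import java.text.DateFormat;
import java.util.Locale;

public class DisplayNamesSketch {

    // Returns the name of the Locale, written in English.
    static String englishName(Locale locale) {
        return locale.getDisplayName(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        int shown = 0;
        for (Locale locale : DateFormat.getAvailableLocales()) {
            // Skip language-only entries and stop after a few lines.
            if (locale.getCountry().isEmpty()) {
                continue;
            }
            System.out.println(locale + " -> " + englishName(locale));
            if (++shown == 5) {
                break;
            }
        }
    }
}
```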

 


 

The Scope of a Locale

The Java platform does not require you to use the same Locale throughout your program. If you wish, you can assign a different Locale to every locale-sensitive object in your program. This flexibility allows you to develop multilingual applications, which can display information in multiple languages.

However, most applications are not multilingual, and their locale-sensitive objects rely on the default Locale. Set by the Java Virtual Machine when it starts up, the default Locale corresponds to the locale of the host platform. To determine the default Locale of your Java Virtual Machine, invoke the Locale.getDefault method. You should not set the default Locale programmatically, because it is shared by all locale-sensitive classes.

Distributed computing raises some interesting issues. For example, suppose you are designing an application server that will receive requests from clients in various countries. If the Locale for each client is different, what should be the Locale of the server? Perhaps the server is multithreaded, with each thread set to the Locale of the client it services. Or perhaps all data passed between the server and the clients should be locale-independent.

Which design approach should you take? If possible, the data passed between the server and the clients should be locale-independent. This simplifies the design of the server by making the clients responsible for displaying the data in a locale-sensitive manner. However, this approach won't work if the server must store the data in a locale-specific form. For example, the server might store Spanish, English, and French versions of the same data in different database columns. In this case, the server might want to query the client for its Locale, since the Locale may have changed since the last request.

 


 

Isolating Locale-Specific Data

Locale-specific data must be tailored according to the conventions of the end user's language and region. The text displayed by a user interface is the most obvious example of locale-specific data. For example, an application with a Cancel button in the U.S. will have an Abbrechen button in Germany. In other countries this button will have other labels. Obviously you don't want to hardcode this button label. Wouldn't it be nice if you could automatically get the correct label for a given Locale? Fortunately you can, provided that you isolate the locale-specific objects in a ResourceBundle.

 

 

How a ResourceBundle is Related to a Locale

Conceptually each ResourceBundle is a set of related subclasses that share the same base name. The list that follows shows a set of related subclasses. ButtonLabel is the base name. The characters following the base name indicate the language code, country code, and variant of a Locale. ButtonLabel_en_GB, for example, matches the Locale specified by the language code for English (en) and the country code for Great Britain (GB).

ButtonLabel
ButtonLabel_de
ButtonLabel_en_GB
ButtonLabel_fr_CA_UNIX

To select the appropriate ResourceBundle, invoke the ResourceBundle.getBundle method. The following example selects the ButtonLabel ResourceBundle for the Locale that matches the French language, the country of Canada, and the UNIX platform.

Locale currentLocale = new Locale("fr", "CA", "UNIX");
ResourceBundle introLabels =
    ResourceBundle.getBundle("ButtonLabel", currentLocale);

If a ResourceBundle class for the specified Locale does not exist, getBundle tries to find the closest match. For example, if ButtonLabel_fr_CA_UNIX is the desired class and the default Locale is en_US, getBundle will look for classes in the following order:

ButtonLabel_fr_CA_UNIX
ButtonLabel_fr_CA
ButtonLabel_fr
ButtonLabel_en_US
ButtonLabel_en
ButtonLabel

Note that getBundle looks for classes based on the default Locale before it selects the base class (ButtonLabel). If getBundle fails to find a match in the preceding list of classes, it throws a MissingResourceException. To avoid throwing this exception, you should always provide a base class with no suffixes.
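
The search order described above can be modeled with a few lines of code. The sketch below is a deliberately simplified illustration, not the actual getBundle implementation: it only builds the list of candidate names from the language, country, and variant codes, and it does not load anything. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BundleLookupOrderSketch {

    // Candidate names for the desired Locale, then for the default
    // Locale, then the base name itself.
    static List<String> candidateNames(String baseName,
                                       Locale desired,
                                       Locale defaultLocale) {
        List<String> names = new ArrayList<>();
        addChain(names, baseName, desired);
        addChain(names, baseName, defaultLocale);
        names.add(baseName);
        return names;
    }

    // Adds the most specific name first, then drops one code at a time.
    private static void addChain(List<String> names, String baseName,
                                 Locale locale) {
        String language = locale.getLanguage();
        String country = locale.getCountry();
        String variant = locale.getVariant();
        if (!variant.isEmpty()) {
            names.add(baseName + "_" + language + "_" + country + "_" + variant);
        }
        if (!country.isEmpty()) {
            names.add(baseName + "_" + language + "_" + country);
        }
        if (!language.isEmpty()) {
            names.add(baseName + "_" + language);
        }
    }

    public static void main(String[] args) {
        Locale desired = new Locale("fr", "CA", "UNIX");
        Locale defaultLocale = new Locale("en", "US");
        for (String name : candidateNames("ButtonLabel", desired, defaultLocale)) {
            System.out.println(name);
        }
    }
}
```

Run with the Locale objects shown, the sketch prints the same six names, in the same order, as the list above.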

 

 

The ListResourceBundle and PropertyResourceBundle Subclasses

The abstract class ResourceBundle has two subclasses: PropertyResourceBundle and ListResourceBundle.

A PropertyResourceBundle is backed by a properties file. A properties file is a plain-text file that contains translatable text. Properties files are not part of the Java source code, and they can contain values for String objects only. If you need to store other types of objects, use a ListResourceBundle instead.

The ListResourceBundle class manages resources with a convenient list. Each ListResourceBundle is backed by a class file. You can store any locale-specific object in a ListResourceBundle. To add support for an additional Locale, you create another source file and compile it into a class file.

The ResourceBundle class is flexible. If you first put your locale-specific String objects in a PropertyResourceBundle and later decide to use a ListResourceBundle instead, there is no impact on your code. For example, the following call to getBundle will retrieve a ResourceBundle for the appropriate Locale, whether ButtonLabel is backed by a class or by a properties file:

ResourceBundle introLabels =
    ResourceBundle.getBundle("ButtonLabel", currentLocale);
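
The sketch below illustrates why the calling code is unaffected: it depends only on the ResourceBundle type. To keep the example in a single file, the two bundles are constructed directly instead of being loaded by getBundle, which is an assumption of this illustration; the class and helper names are made up as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ListResourceBundle;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BackingAgnosticSketch {

    // Class-backed variant, standing in for a compiled ButtonLabel class.
    static ResourceBundle classBacked() {
        return new ListResourceBundle() {
            @Override
            protected Object[][] getContents() {
                return new Object[][] {
                    { "OkKey", "OK" },
                    { "CancelKey", "Cancel" },
                };
            }
        };
    }

    // Properties-backed variant, standing in for ButtonLabel.properties.
    static ResourceBundle propertiesBacked() {
        String text = "OkKey = OK\nCancelKey = Cancel\n";
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client code: works with either kind of bundle.
    static String okLabel(ResourceBundle labels) {
        return labels.getString("OkKey");
    }

    public static void main(String[] args) {
        System.out.println(okLabel(classBacked()));
        System.out.println(okLabel(propertiesBacked()));
    }
}
```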

 

 

Key-Value Pairs

ResourceBundle objects contain an array of key-value pairs. You specify the key, which must be a String, when you want to retrieve the value from the ResourceBundle. The value is the locale-specific object. The keys in the following example are the OkKey and CancelKey strings:

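import java.util.ListResourceBundle;

class ButtonLabel_en extends ListResourceBundle {
    // English version
    public Object[][] getContents() {
        return contents;
    }
    // Each inner array is one key-value pair.
    static final Object[][] contents = {
        {"OkKey", "OK"},
        {"CancelKey", "Cancel"},
    };
}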

To retrieve the OK String from the ResourceBundle, you would specify the appropriate key when invoking getString on the bundle returned by getBundle (introLabels in the earlier example):

String okLabel = introLabels.getString("OkKey");

A properties file contains key-value pairs. The key is on the left side of the equal sign, and the value is on the right. Each pair is on a separate line. The values may represent String objects only. The following example shows the contents of a properties file named ButtonLabel.properties:

OkKey = OK
CancelKey = Cancel

 

 

Identifying the Locale-Specific Objects

If your application has a user interface, it contains many locale-specific objects. To get started, you should go through your source code and look for objects that vary with Locale. Your list might include objects instantiated from the following classes:

You'll notice that this list doesn't contain objects representing numbers, dates, times, or currencies. The display format of these objects varies with Locale, but the objects themselves do not. For example, you format a Date according to Locale, but you use the same Date object regardless of Locale. Instead of isolating these objects in a ResourceBundle, you format them with special locale-sensitive formatting classes.

In general, the objects stored in a ResourceBundle are predefined and ship with the product. These objects are not modified while the program is running. For instance, you should store a Menu label in a ResourceBundle because it is locale-specific and will not change during the program session. However, you should not isolate in a ResourceBundle a String object the end user enters in a TextField. Data such as this String may vary from day to day. It is specific to the program session, not to the Locale in which the program runs.

Usually most of the objects you need to isolate in a ResourceBundle are String objects. However, not all String objects are locale-specific. For example, if a String is a protocol element used by interprocess communication, it doesn't need to be localized, because the end users never see it.

The decision whether to localize some String objects is not always clear. Log files are a good example. If a log file is written by one program and read by another, both programs are using the log file as a buffer for communication. Suppose that end users occasionally check the contents of this log file. Shouldn't the log file be localized? On the other hand, if end users rarely check the log file, the cost of translation may not be worthwhile. Your decision to localize this log file depends on a number of factors: program design, ease of use, cost of translation, and supportability.

 

 

Organizing ResourceBundle Objects

You can organize your ResourceBundle objects according to the category of objects they contain. For example, you might want to load all of the GUI labels for an order entry window into a ResourceBundle called OrderLabelsBundle. Using multiple ResourceBundle objects offers several advantages:
  • Your code is easier to read and to maintain.
  • You'll avoid huge ResourceBundle objects, which may take too long to load into memory.
  • You can reduce memory usage by loading each ResourceBundle only when needed.

 


 

Using a ListResourceBundle

This section illustrates the use of a ListResourceBundle object with a sample program called ListDemo. The text that follows explains each step involved in creating the ListDemo program, along with the ListResourceBundle subclasses that support it.

 

 

1. Create the ListResourceBundle Subclasses

A ListResourceBundle is backed up by a class file. Therefore the first step is to create a class file for every supported Locale. In the ListDemo program the base name of the ListResourceBundle is StatsBundle. Since ListDemo supports three Locale objects, it requires the following three class files:

StatsBundle_en_CA.class
StatsBundle_fr_FR.class
StatsBundle_ja_JP.class

The StatsBundle class for Japan is defined in the source code that follows. Note that the class name is constructed by appending the language and country codes to the base name of the ListResourceBundle. Inside the class the two-dimensional contents array is initialized with the key-value pairs. The keys are the first element in each pair: GDP, Population, and Literacy. The keys must be String objects and they must be the same in every class in the StatsBundle set. The values can be any type of object. In this example the values are two Integer objects and a Double object.

import java.util.*;
public class StatsBundle_ja_JP extends ListResourceBundle {
    public Object[][] getContents() {
        return contents;
    }
    private Object[][] contents = {
        { "GDP", Integer.valueOf(21300) },
        { "Population", Integer.valueOf(125449703) },
        { "Literacy", Double.valueOf(0.99) },
    };
}

 

 

2. Specify the Locale

The ListDemo program defines the Locale objects as follows:

Locale[] supportedLocales = {
    new Locale("en", "CA"),
    new Locale("ja", "JP"),
    new Locale("fr", "FR")
};
Each Locale object corresponds to one of the StatsBundle classes. For example, the Japanese Locale, which was defined with the ja and JP codes, matches StatsBundle_ja_JP.class.

 

 

3. Create the ResourceBundle

To create the ListResourceBundle, invoke the getBundle method. The following line of code specifies the base name of the class (StatsBundle) and the Locale:

ResourceBundle stats =
    ResourceBundle.getBundle("StatsBundle", currentLocale);

The getBundle method searches for a class whose name begins with StatsBundle and is followed by the language and country codes of the specified Locale. For example, if the currentLocale is created with the ja and JP codes, getBundle returns a ListResourceBundle corresponding to the class StatsBundle_ja_JP.

 

4. Fetch the Localized Objects

Now that the program has a ListResourceBundle for the appropriate Locale, it can fetch the localized objects by their keys. The following line of code retrieves the literacy rate by invoking getObject with the Literacy key parameter. Since getObject returns an object, cast it to a Double:

Double lit = (Double)stats.getObject("Literacy");
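
The following single-file sketch puts the lookup and the casts together. It assumes the Japanese data is held in a nested class that is instantiated directly, rather than in StatsBundle_ja_JP loaded through getBundle; the names StatsLookupSketch, JapanStats, and literacy are made up for the illustration.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class StatsLookupSketch {

    // Same data as StatsBundle_ja_JP, nested to keep the sketch in one file.
    static class JapanStats extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "GDP", Integer.valueOf(21300) },
                { "Population", Integer.valueOf(125449703) },
                { "Literacy", Double.valueOf(0.99) },
            };
        }
    }

    // getObject returns Object, so the caller must cast to the stored type.
    static double literacy(ResourceBundle stats) {
        Double lit = (Double) stats.getObject("Literacy");
        return lit.doubleValue();
    }

    public static void main(String[] args) {
        ResourceBundle stats = new JapanStats();
        Integer gdp = (Integer) stats.getObject("GDP");
        Integer population = (Integer) stats.getObject("Population");

        System.out.println("GDP = " + gdp);
        System.out.println("Population = " + population);
        System.out.println("Literacy = " + literacy(stats));
    }
}
```

Casting to the wrong type, for example casting the Literacy value to an Integer, fails at run time with a ClassCastException, so every class in the set should store the same type under the same key.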

 

 

5. Run the Demo Program

The ListDemo program prints the data it fetched from the ResourceBundle for each supported Locale:

Locale = en_CA
GDP = 24400
Population = 28802671
Literacy = 0.97

Locale = ja_JP
GDP = 21300
Population = 125449703
Literacy = 0.99

Locale = fr_FR
GDP = 20200
Population = 58317450
Literacy = 0.99

 


 

Using Predefined Formats

By invoking the methods provided by the NumberFormat class, you can format numbers, currencies, and percentages according to Locale. The material that follows demonstrates formatting techniques with a sample program called NumberFormatDemo.

 

 

Numbers

You can use the NumberFormat methods to format primitive-type numbers, such as double, and their corresponding wrapper objects, such as Double.

The following code example formats a Double according to Locale. Invoking the getNumberInstance method returns a locale-specific instance of NumberFormat. The format method accepts the Double as an argument and returns the formatted number in a String.

Double amount = Double.valueOf(345987.246);
NumberFormat numberFormatter;
String amountOut;

numberFormatter = NumberFormat.getNumberInstance(currentLocale);
amountOut = numberFormatter.format(amount);
System.out.println(amountOut + " " + 
                   currentLocale.toString());

The output from this example shows how the format of the same number varies with Locale:

345 987,246	 fr_FR
345.987,246	 de_DE
345,987.246	 en_US

 

 

Currencies

If you're writing business applications, you'll probably need to format and to display currencies. You format currencies in the same manner as numbers, except that you call getCurrencyInstance to create a formatter. When you invoke the format method, it returns a String that includes the formatted number and the appropriate currency sign.

This code example shows how to format currency in a locale-specific manner:

Double currency = Double.valueOf(9876543.21);
NumberFormat currencyFormatter;
String currencyOut;

currencyFormatter = NumberFormat.getCurrencyInstance(currentLocale);
currencyOut = currencyFormatter.format(currency);
System.out.println(currencyOut + " " + 			
                   currentLocale.toString());

The output generated by the preceding lines of code is as follows:

9 876 543,21 F	 fr_FR
9.876.543,21 DM	 de_DE
$9,876,543.21	 en_US

At first glance this output may look wrong to you, because the numeric values are all the same. Of course, 9 876 543,21 F is not equivalent to 9.876.543,21 DM. However, bear in mind that the NumberFormat class is unaware of exchange rates. The methods belonging to the NumberFormat class format currencies but do not convert them.
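
If you want to see which currency a Locale maps to on your own system, a sketch like the following may help. The class and helper names are illustrative. The symbols printed depend on the locale data shipped with your JDK; current releases, for example, associate France and Germany with the euro rather than with the franc and the mark shown above.

```java
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

public class CurrencySketch {

    // Formats the amount as a currency value for the given Locale.
    static String formatCurrency(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    // Returns the ISO 4217 code of the currency used by the Locale's country.
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.GERMANY };
        for (Locale locale : locales) {
            System.out.println(formatCurrency(9876543.21, locale)
                               + " (" + currencyCode(locale) + ") " + locale);
        }
    }
}
```

As with the example above, only the presentation changes from Locale to Locale; no conversion between currencies takes place.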

 

 

Percentages

You can also use the methods of the NumberFormat class to format percentages. To get the locale-specific formatter, invoke the getPercentInstance method. With this formatter, a decimal fraction such as 0.75 is displayed as 75%.

The following code sample shows how to format a percentage.

Double percent = new Double(0.75);
NumberFormat percentFormatter;
String percentOut;

percentFormatter = NumberFormat.getPercentInstance(currentLocale);
percentOut = percentFormatter.format(percent);
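
The three kinds of formatters described so far can be tried side by side. The following is a minimal sketch rather than part of the tutorial's sample programs; the class name NumberFormatSketch and its helper methods are hypothetical. Expect the French and German output on a current Java runtime to differ from the listings above (for example, the euro sign in place of F and DM).

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatSketch {

    // Formats a plain number with the locale's grouping and decimal separators.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats a currency amount; the value is formatted, not converted.
    static String formatCurrency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    // Formats a decimal fraction as a percentage.
    static String formatPercent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE };
        for (Locale locale : locales) {
            System.out.println(locale + ": "
                + formatNumber(345987.246, locale) + " | "
                + formatCurrency(9876543.21, locale) + " | "
                + formatPercent(0.75, locale));
        }
    }
}
```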

 


 

Customizing Formats

You can use the DecimalFormat class to format decimal numbers into locale-specific strings. This class allows you to control the display of leading and trailing zeros, prefixes and suffixes, grouping (thousands) separators, and the decimal separator. If you want to change formatting symbols, such as the decimal separator, you can use the DecimalFormatSymbols in conjunction with the DecimalFormat class. These classes offer a great deal of flexibility in the formatting of numbers, but they can make your code more complex.

The text that follows uses examples that demonstrate the DecimalFormat and DecimalFormatSymbols classes. The code examples in this material are from a sample program called DecimalFormatDemo.

 

 

Constructing Patterns

You specify the formatting properties of DecimalFormat with a pattern String. The pattern determines what the formatted number looks like.

The example that follows creates a formatter by passing a pattern String to the DecimalFormat constructor. The format method accepts a double value as an argument and returns the formatted number in a String:

DecimalFormat myFormatter = new DecimalFormat(pattern);
String output = myFormatter.format(value);
System.out.println(value + " " + pattern + " " + output);

The output for the preceding lines of code is described in the following table. The value is the number, a double, that is to be formatted. The pattern is the String that specifies the formatting properties. The output, which is a String, represents the formatted number.

value	 pattern	 output	 Explanation
123456.789	 ###,###.###	 123,456.789	 The pound sign (#) denotes a digit, the comma is a placeholder for the grouping separator, and the period is a placeholder for the decimal separator.
123456.789	 ###.##	 123456.79	 The value has three digits to the right of the decimal point, but the pattern has only two. The format method handles this by rounding the value to two decimal places (half-even rounding by default).
123.78	 000000.000	 000123.780	 The pattern specifies leading and trailing zeros, because the 0 character is used instead of the pound sign (#).
12345.67	 $###,###.###	 $12,345.67	 The first character in the pattern is the dollar sign ($). Note that it immediately precedes the leftmost digit in the formatted output.
12345.67	 \u00A5###,###.###	 ¥12,345.67	 The pattern specifies the currency sign for Japanese yen (¥) with the Unicode value 00A5.
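
The rows of the table can be reproduced with a short program. This is a sketch under one assumption that is not in the original fragment: it passes U.S. DecimalFormatSymbols explicitly so that the result does not depend on the default Locale. The class name DecimalFormatPatternSketch is hypothetical. The last call in main illustrates half-even rounding on a value that lies exactly halfway between two candidates.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class DecimalFormatPatternSketch {

    // Applies the pattern with U.S. symbols so the output is the same on every machine.
    static String format(String pattern, double value) {
        DecimalFormatSymbols usSymbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat(pattern, usSymbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format("###,###.###", 123456.789));
        System.out.println(format("###.##", 123456.789));
        System.out.println(format("000000.000", 123.78));
        System.out.println(format("$###,###.###", 12345.67));
        System.out.println(format("\u00A5###,###.###", 12345.67));
        // 0.125 is an exact tie; half-even rounding keeps the even digit 2.
        System.out.println(format("0.00", 0.125));
    }
}
```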

 

 

Locale-Sensitive Formatting

The preceding example created a DecimalFormat object for the default Locale. If you want a DecimalFormat object for a nondefault Locale, you instantiate a NumberFormat and then cast it to DecimalFormat. Here's an example:

NumberFormat nf = NumberFormat.getNumberInstance(loc);
DecimalFormat df = (DecimalFormat)nf;
df.applyPattern(pattern);
String output = df.format(value);
System.out.println(pattern + " " + output + " " + 
	           loc.toString());

Running the previous code example results in the output that follows. The formatted number, which is in the second column, varies with Locale:

###,###.###	 123,456.789	 en_US
###,###.###	 123.456,789	 de_DE
###,###.###	 123 456,789	 fr_FR

So far the formatting patterns discussed here follow the conventions of U.S. English. For example, in the pattern ###,###.## the comma is the thousands-separator and the period represents the decimal point. This convention is fine, provided that your end users aren't exposed to it. However, some applications, such as spreadsheets and report generators, allow the end users to define their own formatting patterns. For these applications the formatting patterns specified by the end users should use localized notation. In these cases you'll want to invoke the applyLocalizedPattern method on the DecimalFormat object.
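
As a rough illustration of applyLocalizedPattern, the sketch below accepts a pattern written in the notation of a given Locale. For the German locale the period is the grouping separator and the comma is the decimal separator, so an end user would type ###.###,## rather than ###,###.##. The class and method names are hypothetical, and the instanceof check is a defensive assumption, since getNumberInstance is declared to return NumberFormat rather than DecimalFormat.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

class LocalizedPatternSketch {

    // Formats value with a pattern expressed in the separators of the given Locale.
    static String formatWithLocalizedPattern(String localizedPattern,
                                             double value,
                                             Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("No DecimalFormat for " + locale);
        }
        DecimalFormat df = (DecimalFormat) nf;
        df.applyLocalizedPattern(localizedPattern);
        return df.format(value);
    }

    public static void main(String[] args) {
        // The same number, with each pattern written the way a local user would type it.
        System.out.println(
            formatWithLocalizedPattern("###.###,##", 123456.789, Locale.GERMANY));
        System.out.println(
            formatWithLocalizedPattern("###,###.##", 123456.789, Locale.US));
    }
}
```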

 

 

Altering the Formatting Symbols

You can use the DecimalFormatSymbols class to change the symbols that appear in the formatted numbers produced by the format method. These symbols include the decimal separator, the grouping separator, the minus sign, and the percent sign, among others.

The next example demonstrates the DecimalFormatSymbols class by applying a strange format to a number. The unusual format is the result of the calls to the setDecimalSeparator, setGroupingSeparator, and setGroupingSize methods.

DecimalFormatSymbols unusualSymbols =
    new DecimalFormatSymbols(currentLocale);
unusualSymbols.setDecimalSeparator('|');
unusualSymbols.setGroupingSeparator('^');

String strange = "#,##0.###";
DecimalFormat weirdFormatter = 
               new DecimalFormat(strange, unusualSymbols);
weirdFormatter.setGroupingSize(4);

String bizarre = weirdFormatter.format(12345.678);
System.out.println(bizarre);

When run, this example prints the number in a bizarre format:

1^2345|678

 


 

Dates and Times

Date objects represent dates and times. You cannot display or print a Date object without first converting it to a String that is in the proper format. Just what is the "proper" format? First, the format should conform to the conventions of the end user's Locale. For example, Germans recognize 20.4.98 as a valid date, but Americans expect that same date to appear as 4/20/98. Second, the format should include the necessary information. For instance, a program that measures network performance may report on elapsed milliseconds. An online appointment calendar probably won't display milliseconds, but it will show the days of the week.

This section explains how to format dates and times in various ways and in a locale-sensitive manner. If you follow these techniques your programs will display dates and times in the appropriate Locale, but your source code will remain independent of any specific Locale.

 

 

Using Predefined Formats

The DateFormat class provides predefined formatting styles that are locale-specific and easy to use.

 

 

Customizing Formats

With the SimpleDateFormat class, you can create customized, locale-specific formats.

 

 

Changing Date Format Symbols

Using the DateFormatSymbols class, you can change the symbols that represent the names of months, days of the week, and other formatting elements.

 


 

Using Predefined Formats

The DateFormat class allows you to format dates and times with predefined styles in a locale-sensitive manner. The sections that follow demonstrate how to use the DateFormat class with a program called DateFormatDemo.java.

 

 

Dates

Formatting dates with the DateFormat class is a two-step process. First, you create a formatter with the getDateInstance method. Second, you invoke the format method, which returns a String containing the formatted date. The following example formats today's date by calling these two methods:

Date today;
String dateOut;
DateFormat dateFormatter;

dateFormatter = DateFormat.getDateInstance(DateFormat.DEFAULT,
					   currentLocale);
today = new Date();
dateOut = dateFormatter.format(today);

System.out.println(dateOut + " " + currentLocale.toString());

The output generated by this code follows. Notice that the formats of the dates vary with Locale. Since DateFormat is locale-sensitive, it takes care of the formatting details for each Locale.

9 avr 98	 fr_FR
9.4.1998	 de_DE
09-Apr-98	 en_US

The preceding code example specified the DEFAULT formatting style. The DEFAULT style is just one of the predefined formatting styles that the DateFormat class provides, as follows:

  • DEFAULT
  • SHORT
  • MEDIUM
  • LONG
  • FULL

The following table shows how dates are formatted for each style with the U.S. and French locales:

Style	 U.S. Locale	 French Locale
DEFAULT	 10-Apr-98	 10 avr 98
SHORT	 4/10/98	 10/04/98
MEDIUM	 10-Apr-98	 10 avr 98
LONG	 April 10, 1998	 10 avril 1998
FULL	 Friday, April 10, 1998	 vendredi, 10 avril 1998
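
To see what the predefined styles produce on your own Java runtime, you can loop over them with a fixed date. The following sketch uses hypothetical names (DateStyleSketch, formatDate, sampleDate). The exact strings depend on the locale data shipped with the runtime, so the output may not match the table above character for character.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateStyleSketch {

    // Formats the date part of a Date with one of the predefined DateFormat styles.
    static String formatDate(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    // April 10, 1998 at noon in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(1998, Calendar.APRIL, 10, 12, 0, 0).getTime();
    }

    public static void main(String[] args) {
        int[] styles = { DateFormat.DEFAULT, DateFormat.SHORT, DateFormat.MEDIUM,
                         DateFormat.LONG, DateFormat.FULL };
        String[] names = { "DEFAULT", "SHORT", "MEDIUM", "LONG", "FULL" };
        Date date = sampleDate();
        for (int i = 0; i < styles.length; i++) {
            System.out.println(names[i] + ": "
                + formatDate(date, styles[i], Locale.US) + " | "
                + formatDate(date, styles[i], Locale.FRANCE));
        }
    }
}
```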

 

 

Times

Date objects represent both dates and times. Formatting times with the DateFormat class is similar to formatting dates, except that you create the formatter with the getTimeInstance method, as follows:

DateFormat timeFormatter =
    DateFormat.getTimeInstance(DateFormat.DEFAULT,
                               currentLocale);

The table that follows shows the various predefined format styles for the U.S. and German locales:

Style	 U.S. Locale	 German Locale
DEFAULT	 3:58:45 PM	 15:58:45
SHORT	 3:58 PM	 15:58
MEDIUM	 3:58:45 PM	 15:58:45
LONG	 3:58:45 PM PDT	 15:58:45 GMT+02:00
FULL	 3:58:45 o'clock PM PDT	 15.58 Uhr GMT+02:00

 

 

Both Dates and Times

To display a date and time in the same String, create the formatter with the getDateTimeInstance method. The first parameter is the date style, and the second is the time style. The third parameter is the Locale. Here's a quick example:

DateFormat formatter = 
    DateFormat.getDateTimeInstance(DateFormat.LONG,
                                   DateFormat.LONG,
                                   currentLocale);

The following table shows the date and time formatting styles for the U.S. and French locales:

Style	 U.S. Locale	 French Locale
DEFAULT	 25-Jun-98 1:32:19 PM	 25 jun 98 22:32:20
SHORT	 6/25/98 1:32 PM	 25/06/98 22:32
MEDIUM	 25-Jun-98 1:32:19 PM	 25 jun 98 22:32:20
LONG	 June 25, 1998 1:32:19 PM PDT	 25 juin 1998 22:32:20 GMT+02:00
FULL	 Thursday, June 25, 1998 1:32:19 o'clock PM PDT	 jeudi, 25 juin 1998 22 h 32 GMT+02:00

 


 

Customizing Formats

The previous section, Using Predefined Formats, described the formatting styles provided by the DateFormat class. In most cases these predefined formats are adequate. However, if you want to create your own customized formats, you can use the SimpleDateFormat class.

The code examples that follow demonstrate the methods of the SimpleDateFormat class. You can find the full source code for the examples in the file named SimpleDateFormatDemo.

 

 

About Patterns

When you create a SimpleDateFormat object, you specify a pattern String. The contents of the pattern String determine the format of the date and time.

The following code formats a date and time according to the pattern String passed to the SimpleDateFormat constructor. The String returned by the format method contains the formatted date and time that are to be displayed.

Date today;
String output;
SimpleDateFormat formatter;

formatter = new SimpleDateFormat(pattern, currentLocale);
today = new Date();
output = formatter.format(today);
System.out.println(pattern + " " + output);

The following table shows the output generated by the previous code example when the U.S. Locale is specified:

Pattern	 Output
dd.MM.yy	 09.04.98
yyyy.MM.dd G 'at' hh:mm:ss z	 1998.04.09 AD at 06:15:55 PDT
EEE, MMM d, ''yy	 Thu, Apr 9, '98
h:mm a	 6:15 PM
H:mm	 18:15
H:mm:ss:SSS	 18:15:55:624
K:mm a,z	 6:15 PM,PDT
yyyy.MMMMM.dd GGG hh:mm aaa	 1998.April.09 AD 06:15 PM
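
Because the fragment above formats the current date, its output changes every time it runs. The sketch below pins the date to April 9, 1998 at 18:15:55 so that a few of the table rows can be checked directly. The class and method names are hypothetical, and the sample date is built in the default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class SimpleDatePatternSketch {

    // Formats date according to pattern, using the names and symbols of locale.
    static String format(String pattern, Date date, Locale locale) {
        return new SimpleDateFormat(pattern, locale).format(date);
    }

    // April 9, 1998 at 18:15:55 in the default time zone.
    static Date sampleDate() {
        return new GregorianCalendar(1998, Calendar.APRIL, 9, 18, 15, 55).getTime();
    }

    public static void main(String[] args) {
        String[] patterns = { "dd.MM.yy", "EEE, MMM d, ''yy", "h:mm a", "H:mm" };
        Date date = sampleDate();
        for (String pattern : patterns) {
            System.out.println(pattern + " " + format(pattern, date, Locale.US));
        }
    }
}
```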

 

 

Patterns and Locale

The SimpleDateFormat class is locale-sensitive. If you instantiate SimpleDateFormat without a Locale parameter, it will format the date and time according to the default Locale. Both the pattern and the Locale determine the format. For the same pattern, SimpleDateFormat may format a date and time differently if the Locale varies.

In the example code that follows, the pattern is hardcoded in the statement that creates the SimpleDateFormat object:

Date today;
String result;
SimpleDateFormat formatter;

formatter = new SimpleDateFormat("EEE d MMM yy",
				 currentLocale);
today = new Date();
result = formatter.format(today);
System.out.println("Locale: " + currentLocale.toString());
System.out.println("Result: " + result);

When the currentLocale is set to different values, the preceding code example generates this output:

Locale: fr_FR
Result: ven 10 avr 98
Locale: de_DE
Result: Fr 10 Apr 98
Locale: en_US
Result: Thu 9 Apr 98

 


 

Changing Date Format Symbols

The format method of the SimpleDateFormat class returns a String composed of digits and symbols. For example, in the String "Friday, April 10, 1998," the symbols are "Friday" and "April." If the symbols encapsulated in SimpleDateFormat don't meet your needs, you can change them with the DateFormatSymbols. You can change symbols that represent names for months, days of the week, and time zones, among others. The following table lists the DateFormatSymbols methods that allow you to modify the symbols:

Setter Method Example of a Symbol the Method Modifies
setAmPmStrings PM
setEras AD
setMonths December
setShortMonths Dec
setShortWeekdays Tue
setWeekdays Tuesday
setZoneStrings PST

The following example invokes setShortWeekdays to change the short names of the days of the week from capitalized names, such as Tue, to all-uppercase names, such as TUE. The full source code for this example is in DateFormatSymbolsDemo. The first element in the array argument of setShortWeekdays is an empty String, so the day names occupy indexes 1 through 7 and the array is effectively one-based rather than zero-based. The SimpleDateFormat constructor accepts the modified DateFormatSymbols object as an argument. Here is the source code:

Date today;
String result;
SimpleDateFormat formatter;
DateFormatSymbols symbols;
String[] defaultDays;
String[] modifiedDays;

symbols = new DateFormatSymbols(new Locale("en","US"));
defaultDays = symbols.getShortWeekdays();

for (int i = 0; i < defaultDays.length; i++) {
    System.out.print(defaultDays[i] + " ");
}
System.out.println();

String[] capitalDays = {
			"", "SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"};
symbols.setShortWeekdays(capitalDays);

modifiedDays = symbols.getShortWeekdays();
for (int i = 0; i < modifiedDays.length; i++) {
    System.out.print(modifiedDays[i] + " ");
}
System.out.println();
System.out.println();

formatter = new SimpleDateFormat("E", symbols);
today = new Date();
result = formatter.format(today);
System.out.println(result);

The preceding code generates this output:

   Sun	 Mon	 Tue	 Wed	 Thu	 Fri	 Sat
   SUN	 MON	 TUE	 WED	 THU	 FRI	 SAT
WED

 


 

Messages

We all like to use programs that let us know what's going on. Programs that keep us informed often do so by displaying status and error messages. Of course, these messages need to be translated so they can be understood by end users around the world. The section Isolating Locale-Specific Data discusses translatable text messages. Usually, you're done after you move a message String into a ResourceBundle. However, if you've embedded variable data in a message, you'll have to take some extra steps to prepare it for translation.

A compound message contains variable data. In each of the following compound messages, the names, numbers, amounts, and dates are variable data:

The disk named MyDisk contains 300 files.
The current balance of account #34-98-222 is $2,745.72.
405,390 people have visited your website since January 1, 1998.
Delete all files older than 120 days.

You might be tempted to construct the last message in the preceding list by concatenating phrases and variables as follows:

Double numDays;
ResourceBundle msgBundle;
...
String message = msgBundle.getString("deleteolder")
                 + numDays.toString()
                 + msgBundle.getString("days");

This approach works fine in English, but it won't work for languages in which the verb appears at the end of the sentence. Because the word order of this message is hardcoded, your localizers won't be able to create grammatically correct translations for all languages.

How can you make your program localizable if you need to use compound messages? You can do so by using the MessageFormat class, which is the topic of this section.

Compound messages are difficult to translate because the message text is fragmented. If you use compound messages, localization will take longer and cost more. Therefore you should use compound messages only when necessary.

 


 

Dealing with Compound Messages

A compound message may contain several kinds of variables: dates, times, strings, numbers, currencies, and percentages. To format a compound message in a locale-independent manner, you construct a pattern that you apply to a MessageFormat object, and store this pattern in a ResourceBundle.

By stepping through a sample program, this section demonstrates how to internationalize a compound message. The sample program makes use of the MessageFormat class. The full source code for this program is in the file called MessageFormatDemo.java.

 

 

1. Identify the Variables in the Message

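Suppose that you want to internationalize the following message:

At 1:15 PM on April 13, 1998, we detected 7 spaceships on the planet Mars.

The variable data in this message is the time (1:15 PM) and the date (April 13, 1998), both of which come from a single Date object; the number of spaceships (7), which is a Number object; and the name of the planet (Mars), which is a String.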

 

 

2. Isolate the Message Pattern in a ResourceBundle

Store the message in a ResourceBundle named MessageBundle, as follows:

ResourceBundle messages =
   ResourceBundle.getBundle("MessageBundle", currentLocale);

This ResourceBundle is backed by a properties file for each Locale. Since the ResourceBundle is called MessageBundle, the properties file for U.S. English is named MessageBundle_en_US.properties. The contents of this file are as follows:

template = At {2,time,short} on {2,date,long}, we detected \
	      {1,number,integer} spaceships on the planet {0}.
planet = Mars

The first line of the properties file contains the message pattern. If you compare this pattern with the message text shown in step 1, you'll see that an argument enclosed in braces replaces each variable in the message text. Each argument starts with a digit called the argument number, which matches the index of an element in an Object array that holds the argument values. Note that in the pattern the argument numbers are not in any particular order. You can place the arguments anywhere in the pattern. The only requirement is that the argument number have a matching element in the array of argument values.

The next step discusses the argument value array, but first let's look at each of the arguments in the pattern. The following table provides some details about the arguments:

Argument Description
{2,time,short} The time portion of a Date object. The short style specifies the DateFormat.SHORT formatting style.
{2,date,long} The date portion of a Date object. The same Date object is used for both the date and time variables. In the Object array of arguments the index of the element holding the Date object is 2. (This is described in the next step.)
{1,number,integer} A Number object, further qualified with the integer number style.
{0} The String in the ResourceBundle that corresponds to the planet key.

 

 

3. Set the Message Arguments

The following lines of code assign values to each argument in the pattern. The indexes of the elements in the messageArguments array match the argument numbers in the pattern. For example, the Integer element at index 1 corresponds to the {1,number,integer} argument in the pattern. Because it must be translated, the String object at element 0 will be fetched from the ResourceBundle with the getString method. Here is the code that defines the array of message arguments:

Object[] messageArguments = {
    messages.getString("planet"),
    new Integer(7),
    new Date()
};

 

 

4. Create the Formatter

Next, create a MessageFormat object. You set the Locale because the message contains Date and Number objects, which should be formatted in a locale-sensitive manner.

MessageFormat formatter = new MessageFormat("");
formatter.setLocale(currentLocale);

 

 

5. Format the Message Using the Pattern and the Arguments

This step shows how the pattern, message arguments, and formatter all work together. First, fetch the pattern String from the ResourceBundle with the getString method. The key to the pattern is template. Pass the pattern String to the formatter with the applyPattern method. Then format the message using the array of message arguments, by invoking the format method. The String returned by the format method is ready to be displayed. All of this is accomplished with just two lines of code:

formatter.applyPattern(messages.getString("template"));
String output = formatter.format(messageArguments);

 

 

6. Run the Demo Program

The demo program prints the translated messages for the English and German locales and properly formats the date and time variables. Note that the English and German verbs ("detected" and "entdeckt") are in different locations relative to the variables:

currentLocale = en_US
At 1:15 PM on April 13, 1998, we detected 7 spaceships
on the planet Mars.
currentLocale = de_DE
Um 13.15 Uhr am 13. April 1998 haben wir 7 Raumschiffe
auf dem Planeten Mars entdeckt.
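
The steps above can be condensed into a single runnable sketch. To stay self-contained, it keeps the patterns in String constants instead of loading them from MessageBundle properties files; that substitution, the class name CompoundMessageSketch, and the German pattern (reconstructed from the output shown above) are assumptions made for illustration. The sketch also passes the Locale to the MessageFormat constructor rather than calling setLocale and applyPattern separately.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class CompoundMessageSketch {

    // Formats pattern with the given arguments; locale governs dates and numbers.
    static String format(String pattern, Locale locale, Object... arguments) {
        MessageFormat formatter = new MessageFormat(pattern, locale);
        return formatter.format(arguments);
    }

    public static void main(String[] args) {
        // April 13, 1998 at 13:15 in the default time zone.
        Date when = new GregorianCalendar(1998, Calendar.APRIL, 13, 13, 15).getTime();

        // Stand-ins for the "template" entries of the properties files.
        String english = "At {2,time,short} on {2,date,long}, we detected "
            + "{1,number,integer} spaceships on the planet {0}.";
        String german = "Um {2,time,short} am {2,date,long} haben wir "
            + "{1,number,integer} Raumschiffe auf dem Planeten {0} entdeckt.";

        // The argument order is the same for both languages; only the pattern moves the verb.
        System.out.println(format(english, Locale.US, "Mars", 7, when));
        System.out.println(format(german, Locale.GERMANY, "Mars", 7, when));
    }
}
```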

 


 

Handling Plurals

The words in a message may vary if both plural and singular word forms are possible. With the ChoiceFormat class, you can map a number to a word or a phrase, allowing you to construct grammatically correct messages.

In English the plural and singular forms of a word are usually different. This can present a problem when you are constructing messages that refer to quantities. For example, if your message reports the number of files on a disk, the following variations are possible:

There are no files on XDisk.
There is one file on XDisk.
There are 2 files on XDisk.

The fastest way to solve this problem is to create a MessageFormat pattern like this:

There are {0,number} file(s) on {1}.

Unfortunately the preceding pattern results in incorrect grammar:

There are 1 file(s) on XDisk.

You can do better than that, provided that you use the ChoiceFormat class. In this section you'll learn how to deal with plurals in a message by stepping through a sample program called ChoiceFormatDemo. This program also uses the MessageFormat class.

 

 

1. Define the Message Pattern

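First, identify the variables in the message. In a message such as There are 2 files on XDisk., two parts vary: the phrase that reports the number of files (are 2 files) and the name of the disk (XDisk).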

Next, replace the variables in the message with arguments, creating a pattern that can be applied to a MessageFormat object:

There {0} on {1}.

The argument for the disk name, which is represented by {1}, is easy enough to deal with. You just treat it like any other String variable in a MessageFormat pattern. This argument matches the element at index 1 in the array of argument values. (See step 7.)

Dealing with argument {0} is more complex, for a couple of reasons:

  • The phrase that this argument replaces varies with the number of files. To construct this phrase at run time, you need to map the number of files to a particular String. For example, the number 1 will map to the String containing the phrase is one file. The ChoiceFormat class allows you to perform the necessary mapping.
  • If the disk contains multiple files, the phrase includes an integer. The MessageFormat class lets you insert a number into a phrase.

 

 

2. Create a ResourceBundle

Because the message text must be translated, isolate it in a ResourceBundle:

ResourceBundle bundle =
   ResourceBundle.getBundle("ChoiceBundle", currentLocale);

The sample program backs the ResourceBundle with properties files. The ChoiceBundle_en_US.properties file contains the following lines:

pattern = There {0} on {1}.
noFiles = are no files
oneFile = is one file
multipleFiles = are {2} files

The contents of this properties file show how the message will be constructed and formatted. The first line contains the pattern for MessageFormat. (See step 1.) The other lines contain phrases that will replace argument {0} in the pattern. The phrase for the multipleFiles key contains the argument {2}, which will be replaced by a number.

Here is the French version of the properties file, ChoiceBundle_fr_FR.properties:

pattern = Il {0} sur {1}.
noFiles = n'y a pas de fichiers
oneFile = y a un fichier
multipleFiles = y a {2} fichiers

 

 

3. Create a Message Formatter

In this step you instantiate MessageFormat and set its Locale:

MessageFormat messageForm = new MessageFormat("");
messageForm.setLocale(currentLocale);

 

 

4. Create a Choice Formatter

The ChoiceFormat object allows you to choose, based on a double number, a particular String. The range of double numbers, and the String objects to which they map, are specified in arrays:

double[] fileLimits = {0,1,2};
String [] fileStrings = {
    bundle.getString("noFiles"),
    bundle.getString("oneFile"),
    bundle.getString("multipleFiles")
};

ChoiceFormat maps each element in the double array to the element in the String array that has the same index. In the sample code the 0 maps to the String returned by calling bundle.getString("noFiles"). By coincidence the index is the same as the value in the fileLimits array. If the code had set fileLimits[0] to seven, ChoiceFormat would map the number 7 to fileStrings[0].

You specify the double and String arrays when instantiating ChoiceFormat:

ChoiceFormat choiceForm = new ChoiceFormat(fileLimits,
                                           fileStrings);
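
It may help to see ChoiceFormat on its own before combining it with MessageFormat. In the sketch below, each limit acts as the lower bound of a range: a number selects the String whose limit is the largest one not exceeding it, and a number below the first limit falls back to the first String. The class name and the literal phrases are hypothetical stand-ins for the ResourceBundle lookups.

```java
import java.text.ChoiceFormat;

class ChoiceLimitsSketch {

    // Maps a number to a phrase using the same limits as the sample program.
    static String choose(double number) {
        double[] limits = {0, 1, 2};
        String[] phrases = {"are no files", "is one file", "are several files"};
        return new ChoiceFormat(limits, phrases).format(number);
    }

    public static void main(String[] args) {
        // Values between limits and outside them still map to a phrase.
        double[] samples = {-5, 0, 1, 1.5, 2, 100};
        for (double sample : samples) {
            System.out.println(sample + " -> " + choose(sample));
        }
    }
}
```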

 

 

5. Apply the Pattern

Remember the pattern you constructed in step 1? It's time to retrieve the pattern from the ResourceBundle and apply it to the MessageFormat object:

String pattern = bundle.getString("pattern");
messageForm.applyPattern(pattern);

 

 

6. Assign the Formats

In this step you assign to the MessageFormat object the ChoiceFormat object created in step 4:

Format[] formats = {choiceForm, null,
                    NumberFormat.getInstance()};
messageForm.setFormats(formats);

The setFormats method assigns Format objects to the arguments in the message pattern. You must invoke the applyPattern method before you call the setFormats method. The following table shows how the elements of the Format array correspond to the arguments in the message pattern:

Array Element Pattern Argument
choiceForm {0}
null {1}
NumberFormat.getInstance() {2}

 

 

7. Set the Arguments and Format the Message

At run time the program assigns the variables to the array of arguments it passes to the MessageFormat object. The elements in the array correspond to the arguments in the pattern. For example, messageArguments[1] maps to pattern argument {1}, which is a String containing the name of the disk. In the previous step the program assigned a ChoiceFormat object to argument {0} of the pattern. Therefore the number assigned to messageArguments[0] determines which String the ChoiceFormat object selects. If messageArguments[0] is greater than or equal to 2, the String containing the phrase are {2} files replaces argument {0} in the pattern. The number assigned to messageArguments[2] will be substituted in place of pattern argument {2}. Here's the code that tries this out:

Object[] messageArguments = {null, "XDisk", null};
for (int numFiles = 0; numFiles < 4; numFiles++) {
    messageArguments[0] = new Integer(numFiles);
    messageArguments[2] = new Integer(numFiles);
    String result = messageForm.format(messageArguments);
    System.out.println(result);
}

 

 

8. Run the Demo Program

Compare the messages displayed by the program with the phrases in the ResourceBundle of step 2. Notice that the ChoiceFormat object selects the correct phrase, which the MessageFormat object uses to construct the proper message. The output of the ChoiceFormatDemo program is as follows:

currentLocale = en_US
There are no files on XDisk.
There is one file on XDisk.
There are 2 files on XDisk.
There are 3 files on XDisk.

currentLocale = fr_FR
Il n'y a pas de fichiers sur XDisk.
Il y a un fichier sur XDisk.
Il y a 2 fichiers sur XDisk.
Il y a 3 fichiers sur XDisk.
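
The eight steps can be collapsed into one small method for experimentation. This sketch is not the ChoiceFormatDemo program: it hardcodes the English phrases instead of reading ChoiceBundle properties files, and it attaches the ChoiceFormat with setFormatByArgumentIndex instead of setFormats. Both choices, along with the names PluralMessageSketch and fileMessage, are assumptions made to keep the example self-contained.

```java
import java.text.ChoiceFormat;
import java.text.MessageFormat;
import java.util.Locale;

class PluralMessageSketch {

    // Builds the "files on disk" message for the given count and disk name.
    static String fileMessage(int numFiles, String diskName) {
        // Stand-ins for the noFiles, oneFile, and multipleFiles entries.
        double[] fileLimits = {0, 1, 2};
        String[] fileStrings = {"are no files", "is one file", "are {2} files"};
        ChoiceFormat choiceForm = new ChoiceFormat(fileLimits, fileStrings);

        MessageFormat messageForm = new MessageFormat("There {0} on {1}.", Locale.US);
        // Argument {0} selects a phrase; the selected phrase may itself contain {2}.
        messageForm.setFormatByArgumentIndex(0, choiceForm);

        // The count appears twice: once to pick the phrase, once to fill {2}.
        Object[] messageArguments = {numFiles, diskName, numFiles};
        return messageForm.format(messageArguments);
    }

    public static void main(String[] args) {
        for (int numFiles = 0; numFiles < 4; numFiles++) {
            System.out.println(fileMessage(numFiles, "XDisk"));
        }
    }
}
```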

 


 

Checking Character Properties

You can categorize characters according to their properties. For instance, X is an uppercase letter and 4 is a decimal digit. Checking character properties is a common way to verify the data entered by end users. If you are selling books online, for example, your order entry screen should verify that the characters in the quantity field are all digits.

Developers who aren't used to writing global software might determine a character's properties by comparing it with character constants. For instance, they might write code like this:

char ch;
...

// This code is WRONG!

if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'))
  // ch is a letter
...
if (ch >= '0' && ch <= '9')
  // ch is a digit
...
if ((ch == ' ') || (ch =='\n') || (ch == '\t'))
  // ch is a whitespace

The preceding code is wrong because it works only with English and a few other languages. To internationalize the previous example, replace it with the following statements:

char ch;
...

// This code is OK!

if (Character.isLetter(ch))
...
if (Character.isDigit(ch))
...
if (Character.isSpaceChar(ch))

The Character methods rely on the Unicode Standard for determining the properties of a character. Unicode is a character encoding standard that supports the world's major languages. In the Java programming language a char value is a 16-bit UTF-16 code unit, which can represent any character in Unicode's Basic Multilingual Plane; supplementary characters are handled by the Character methods that accept an int code point. If you check the properties of a char with the appropriate Character method, your code will work with all major languages. For example, the Character.isLetter method returns true if the character is a letter in Chinese, German, Arabic, or another language.

The following list gives some of the most useful Character comparison methods. The Character API documentation fully specifies the methods.

  • isDigit
  • isLetter
  • isLetterOrDigit
  • isLowerCase
  • isUpperCase
  • isSpaceChar
  • isDefined
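
As a sketch of the quantity-field check mentioned earlier, the following hypothetical helper accepts any Unicode decimal digits, not only 0 through 9. The main method also shows one detail worth knowing when replacing hand-written whitespace tests: Character.isSpaceChar recognizes Unicode space separators but not the tab or newline characters, whereas Character.isWhitespace accepts both of those.

```java
class CharacterCheckSketch {

    // Returns true if text is non-empty and every code point is a Unicode decimal digit.
    static boolean isAllDigits(String text) {
        return !text.isEmpty() && text.codePoints().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("42"));            // ASCII digits
        System.out.println(isAllDigits("\u0664\u0662"));  // Arabic-Indic digits
        System.out.println(isAllDigits("4x"));            // contains a letter

        System.out.println(Character.isLetter('\u00E9')); // e with acute accent
        System.out.println(Character.isSpaceChar('\t'));  // tab is not a space separator
        System.out.println(Character.isWhitespace('\t')); // but it is Java whitespace
    }
}
```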

The Character.getType method returns the Unicode category of a character. Each category corresponds to a constant defined in the Character class. For instance, getType returns the Character.UPPERCASE_LETTER constant for the character A. For a complete list of the category constants returned by getType, see the Character API documentation. The following example shows how to use getType and the Character category constants. All of the expressions in these if statements are true:

if (Character.getType('a') == Character.LOWERCASE_LETTER)
...
if (Character.getType('R') == Character.UPPERCASE_LETTER)
...
if (Character.getType('>') == Character.MATH_SYMBOL)
...
if (Character.getType('_') == Character.CONNECTOR_PUNCTUATION)

 


 

Comparing Strings

Applications that sort through text perform frequent string comparisons. For example, a report generator performs string comparisons when sorting a list of strings in alphabetical order.

If your application audience is limited to people who speak English, you can probably perform string comparisons with the String.compareTo method. The String.compareTo method performs a binary comparison of the Unicode characters within the two strings. For most languages, however, this binary comparison cannot be relied on to sort strings, because the Unicode values do not correspond to the relative order of the characters.

Fortunately the Collator class allows your application to perform string comparisons for different languages.

 


 

Performing Locale-Independent Comparisons

Collation rules define the sort sequence of strings. These rules vary with locale, because various natural languages sort words differently. You can use the predefined collation rules provided by the Collator class to sort strings in a locale-independent manner.

To instantiate the Collator class invoke the getInstance method. Usually, you create a Collator for the default Locale, as in the following example:

Collator myDefaultCollator = Collator.getInstance();

You can also specify a particular Locale when you create a Collator, as follows:

Collator myFrenchCollator = Collator.getInstance(Locale.FRENCH);

The getInstance method returns a RuleBasedCollator, which is a concrete subclass of Collator. The RuleBasedCollator contains a set of rules that determine the sort order of strings for the locale you specify. These rules are predefined for each locale. Because the rules are encapsulated within the RuleBasedCollator, your program won't need special routines to deal with the way collation rules vary with language.

You invoke the Collator.compare method to perform a locale-independent string comparison. The compare method returns an integer less than, equal to, or greater than zero when the first string argument is less than, equal to, or greater than the second string argument. The following table contains some sample calls to Collator.compare:

Example                             Return Value   Explanation
myCollator.compare("abc", "def")    -1             "abc" is less than "def"
myCollator.compare("rtf", "rtf")    0              the two strings are equal
myCollator.compare("xyz", "abc")    1              "xyz" is greater than "abc"
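The calls in the table can be tried with a sketch like the one below. Because the compare method is documented only as returning a value less than, equal to, or greater than zero, the sketch reduces each result to its sign with Integer.signum rather than testing for an exact -1 or 1. The class name CompareSigns and the helper sign are hypothetical names chosen for this example.

```java
import java.text.Collator;
import java.util.Locale;

class CompareSigns {

    // Returns -1, 0, or 1 depending on how the collator orders a and b.
    static int sign(Collator collator, String a, String b) {
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        Collator myCollator = Collator.getInstance(Locale.US);

        System.out.println(sign(myCollator, "abc", "def")); // -1
        System.out.println(sign(myCollator, "rtf", "rtf")); // 0
        System.out.println(sign(myCollator, "xyz", "abc")); // 1
    }
}
```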

You use the compare method when performing sort operations. The sample program called CollatorDemo uses the compare method to sort an array of English and French words. This program shows what can happen when you sort the same list of words with two different collators:

Collator fr_FRCollator = Collator.getInstance(new Locale("fr","FR"));

Collator en_USCollator = Collator.getInstance(new Locale("en","US"));

The method for sorting, called sortStrings, can be used with any Collator. Notice that the sortStrings method invokes the compare method:

public static void sortStrings(Collator collator,
                               String[] words) {
    String tmp;
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            if (collator.compare(words[i], words[j]) > 0) {
                tmp = words[i];
                words[i] = words[j];
                words[j] = tmp;
            }
        }
    }
}
The English Collator sorts the words as follows:

peach
péché
pêche
sin

According to the collation rules of the French language, the preceding list is in the wrong order. In French péché should follow pêche in a sorted list. The French Collator sorts the array of words correctly, as follows:

peach
pêche
péché
sin
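Because Collator implements the Comparator interface, the hand-written loop is not the only option. The sketch below, which is not part of CollatorDemo, passes a Collator directly to Arrays.sort. The class name CollatorSortSketch and the method sortedCopy are hypothetical. No expected output is shown for the accented words, because their relative order depends on the collation data shipped with the Java runtime in use.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSortSketch {

    // Returns a sorted copy of words; the input array is left unchanged.
    static String[] sortedCopy(String[] words, Collator collator) {
        String[] copy = words.clone();
        // A Collator is a Comparator, so it can drive Arrays.sort directly.
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {
            "p\u00eache",       // pêche
            "sin",
            "p\u00e9ch\u00e9",  // péché
            "peach"
        };

        Collator frCollator = Collator.getInstance(Locale.FRANCE);
        Collator usCollator = Collator.getInstance(Locale.US);

        System.out.println(String.join(" ", sortedCopy(words, usCollator)));
        System.out.println(String.join(" ", sortedCopy(words, frCollator)));
    }
}
```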

 


 

Customizing Collation Rules

The previous section discussed how to use the predefined rules for a locale to compare strings. These collation rules determine the sort order of strings. If the predefined collation rules do not meet your needs, you can design your own rules and assign them to a RuleBasedCollator object.

Customized collation rules are contained in a String object that is passed to the RuleBasedCollator constructor. Here's a simple example:

String simpleRule = "< a < b < c < d";
RuleBasedCollator simpleCollator =  new RuleBasedCollator(simpleRule);

For the simpleCollator object in the previous example, a is less than b, which is less than c, and so forth. The simpleCollator.compare method references these rules when comparing strings. The full syntax used to construct a collation rule is more flexible and complex than this simple example. For a full description of the syntax, refer to the API documentation for the RuleBasedCollator class.
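To see that the rule string really does drive the comparison, the following sketch compares the same two letters under the simple rule and under a second rule that lists the letters in reverse. The reversed rule, the class name SimpleRuleSketch, and the helper compareWithRules are assumptions made for this illustration. The RuleBasedCollator constructor throws a ParseException when the rule string is malformed, which the helper converts to an unchecked exception to keep the example short.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class SimpleRuleSketch {

    // Builds a collator from the given rules and compares a with b.
    static int compareWithRules(String rules, String a, String b) {
        try {
            return new RuleBasedCollator(rules).compare(a, b);
        } catch (ParseException pe) {
            throw new IllegalArgumentException("Invalid collation rules", pe);
        }
    }

    public static void main(String[] args) {
        String simpleRule = "< a < b < c < d";
        String reversedRule = "< d < c < b < a"; // hypothetical rule for contrast

        System.out.println(compareWithRules(simpleRule, "a", "b") < 0);   // true
        System.out.println(compareWithRules(reversedRule, "a", "b") > 0); // true
    }
}
```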

The example that follows sorts a list of Spanish words with two collators. Full source code for this example is in RulesDemo.java.

The RulesDemo program starts by defining collation rules for English and Spanish. The program will sort the Spanish words in the traditional manner. When sorting by the traditional rules, the letters ch and ll and their uppercase equivalents each have their own positions in the sort order. These character pairs compare as if they were one character. For example, ch sorts as a single letter, following cz in the sort order. Note how the rules for the two collators differ:

String englishRules =
    ("< a,A < b,B < c,C < d,D < e,E < f,F " +
     "< g,G < h,H < i,I < j,J < k,K < l,L " +
     "< m,M < n,N < o,O < p,P < q,Q < r,R " +
     "< s,S < t,T < u,U < v,V < w,W < x,X " +
     "< y,Y < z,Z");

String smallnTilde = "\u00F1"; // ñ
String capitalNTilde = "\u00D1"; // Ñ

String traditionalSpanishRules =
    ("< a,A < b,B < c,C " +
     "< ch, cH, Ch, CH " +
     "< d,D < e,E < f,F " +
     "< g,G < h,H < i,I < j,J < k,K < l,L " +
     "< ll, lL, Ll, LL " +
     "< m,M < n,N " +
     "< " + smallnTilde + "," + capitalNTilde + " " +
     "< o,O < p,P < q,Q < r,R " +
     "< s,S < t,T < u,U < v,V < w,W < x,X " +
     "< y,Y < z,Z");

The following lines of code create the collators and invoke the sort routine:

try {
    RuleBasedCollator enCollator =
        new RuleBasedCollator(englishRules);
    RuleBasedCollator spCollator =
        new RuleBasedCollator(traditionalSpanishRules);

    sortStrings(enCollator, words);
    printStrings(words);

    System.out.println();

    sortStrings(spCollator, words);
    printStrings(words);
} catch (ParseException pe) {
    System.out.println("Parse exception for rules");
}

The sort routine, called sortStrings, is generic. It will sort any array of words according to the rules of any Collator object:

public static void sortStrings(Collator collator, String[] words) {
    String tmp;
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            if (collator.compare(words[i], words[j]) > 0) {
                tmp = words[i];
                words[i] = words[j];
                words[j] = tmp;
            }
        }
    }
}

When sorted with the English collation rules, the array of words is as follows:

chalina
curioso
llama
luz

Compare the preceding list with the following, which is sorted according to the traditional Spanish rules of collation:

curioso
chalina
luz
llama
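The effect of the ch and ll entries can be isolated with a reduced pair of rule strings. The sketch below is not taken from RulesDemo. It uses lowercase-only rules, and the names ContractionRulesSketch, PLAIN_RULES, TRADITIONAL_RULES, and build are invented for this illustration. The only difference between the two rule strings is that the second one gives ch and ll their own positions.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ContractionRulesSketch {

    // Lowercase alphabet with no multi-letter entries.
    static final String PLAIN_RULES =
        "< a < b < c < d < e < f < g < h < i < j < k < l < m "
        + "< n < o < p < q < r < s < t < u < v < w < x < y < z";

    // Same alphabet, but ch follows c and ll follows l as separate letters.
    static final String TRADITIONAL_RULES =
        "< a < b < c < ch < d < e < f < g < h < i < j < k < l < ll < m "
        + "< n < o < p < q < r < s < t < u < v < w < x < y < z";

    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException pe) {
            throw new IllegalArgumentException("Invalid collation rules", pe);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator plain = build(PLAIN_RULES);
        RuleBasedCollator traditional = build(TRADITIONAL_RULES);

        // Letter by letter, 'h' precedes 'u', so chalina sorts before curioso.
        System.out.println(plain.compare("chalina", "curioso") < 0);       // true
        // With ch treated as one letter after c, chalina sorts after curioso.
        System.out.println(traditional.compare("chalina", "curioso") > 0); // true
        // Likewise, ll as one letter after l moves llama behind luz.
        System.out.println(traditional.compare("llama", "luz") > 0);       // true
    }
}
```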

 


 

Improving Collation Performance

Sorting long lists of strings is often time consuming. If your sort algorithm compares strings repeatedly, you can speed up the process by using the CollationKey class.

A CollationKey object represents a sort key for a given String and Collator. Comparing two CollationKey objects involves a bitwise comparison of sort keys and is faster than comparing String objects with the Collator.compare method. However, generating CollationKey objects requires time. Therefore if a String is to be compared just once, Collator.compare offers better performance.

The example that follows uses a CollationKey object to sort an array of words. Source code for this example is in KeysDemo.java.

The KeysDemo program creates an array of CollationKey objects in the main method. To create a CollationKey, you invoke the getCollationKey method on a Collator object. You cannot compare two CollationKey objects unless they originate from the same Collator. The main method is as follows:

static public void main(String[] args) {
    Collator enUSCollator = 
              Collator.getInstance (new Locale("en","US"));
    String [] words = {
	"peach",
	"apricot",
	"grape",
	"lemon"
    };

    CollationKey[] keys = new CollationKey[words.length];

    for (int k = 0; k < keys.length; k ++) {
	keys[k] = enUSCollator.getCollationKey(words[k]);
    }

    sortArray(keys);
    displayWords(keys);
}

The sortArray method invokes the CollationKey.compareTo method. The compareTo method returns an integer less than, equal to, or greater than zero if the keys[i] object is less than, equal to, or greater than the keys[j] object. Note that the program compares the CollationKey objects, not the String objects from the original array of words. Here is the code for the sortArray method:

public static void sortArray(CollationKey[] keys) {
		
    CollationKey tmp;
    for (int i = 0; i < keys.length; i++) {
	for (int j = i + 1; j < keys.length; j++) {
	    if (keys[i].compareTo(keys[j]) > 0) {
		tmp = keys[i];
		keys[i] = keys[j];
		keys[j] = tmp; 
	    }
	}
    }
}

The KeysDemo program sorts an array of CollationKey objects, but the original goal was to sort an array of String objects. To retrieve the String representation of each CollationKey, the program invokes getSourceString in the displayWords method, as follows:

static void displayWords(CollationKey[] keys) {

    for (int i = 0; i < keys.length; i++) {
	System.out.println(keys[i].getSourceString());
    }
}

The displayWords method prints the following lines:

apricot
grape
lemon
peach
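CollationKey implements the Comparable interface, so the same result can be obtained without a hand-written sort routine. The following sketch is a variation on KeysDemo rather than a copy of it. The class name CollationKeySortSketch and the method sortWithKeys are hypothetical. Each String is converted to a key once, the keys are sorted with Arrays.sort, and the original strings are then read back with getSourceString.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationKeySortSketch {

    // Sorts words by way of CollationKey objects created from one Collator.
    static String[] sortWithKeys(String[] words, Collator collator) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            // The relatively costly step: done once per string, not per comparison.
            keys[i] = collator.getCollationKey(words[i]);
        }

        // Arrays.sort invokes CollationKey.compareTo for each comparison.
        Arrays.sort(keys);

        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator enUSCollator = Collator.getInstance(Locale.US);
        String[] words = { "peach", "apricot", "grape", "lemon" };

        for (String word : sortWithKeys(words, enUSCollator)) {
            System.out.println(word); // apricot, grape, lemon, peach (one per line)
        }
    }
}
```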

 


 

Detecting Text Boundaries

Applications that manipulate text need to locate boundaries within the text. For example, consider some of the common functions of a word processor: highlighting a character, cutting a word, moving the cursor to the next sentence, and wrapping a word at a line ending. To perform each of these functions, the word processor must be able to detect the logical boundaries in the text. Fortunately you don't have to write your own routines to perform boundary analysis. Instead, you can take advantage of the methods provided by the BreakIterator class.

 


 

About the BreakIterator Class

The BreakIterator class is locale-sensitive, because text boundaries vary with language. For example, the syntax rules for line breaks are not the same for all languages. To determine which locales the BreakIterator class supports, invoke the getAvailableLocales method, as follows:

Locale[] locales = BreakIterator.getAvailableLocales();

You can analyze four kinds of boundaries with the BreakIterator class: character, word, sentence, and potential line break. When instantiating a BreakIterator, you invoke the appropriate factory method:

  • getCharacterInstance
  • getWordInstance
  • getSentenceInstance
  • getLineInstance

Each instance of BreakIterator can detect just one type of boundary. If you want to locate both character and word boundaries, for example, you create two separate instances.

A BreakIterator has an imaginary cursor that points to the current boundary in a string of text. You can move this cursor within the text with the previous and the next methods. For example, if you've created a BreakIterator with getWordInstance, the cursor moves to the next word boundary in the text every time you invoke the next method. The cursor-movement methods return an integer indicating the position of the boundary. This position is the index of the character in the text string that would follow the boundary. Like string indexes, the boundaries are zero-based. The first boundary is at 0, and the last boundary is the length of the string. The following figure shows the word boundaries detected by the next and previous methods in a line of text:


[Figure: word boundaries detected by the next and previous methods in a line of text]

You should use the BreakIterator class only with natural-language text. To tokenize a programming language, use the StreamTokenizer class.
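The cursor movement described above can be sketched in a few lines. The example below is not from BreakIteratorDemo. The class name WordBoundarySketch and the method boundaries are invented for this illustration. It starts at the first boundary and calls next until the iterator returns BreakIterator.DONE, collecting each boundary position along the way.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBoundarySketch {

    // Collects every boundary position the iterator reports for text.
    static List<Integer> boundaries(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<Integer> positions = new ArrayList<>();
        for (int pos = iterator.first();
             pos != BreakIterator.DONE;
             pos = iterator.next()) {
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.US);

        // Boundaries fall before "Hello", after it, after the space,
        // and at the end of the string.
        System.out.println(boundaries(wordIterator, "Hello world")); // [0, 5, 6, 11]
    }
}
```

Note that the first boundary is 0 and the last is the length of the string, 11, as described earlier.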

The sections that follow give examples for each type of boundary analysis. The coding examples are from the source code file named BreakIteratorDemo.java.

 


 

Character Boundaries

You need to locate character boundaries if your application allows the end user to highlight individual characters or to move a cursor through text one character at a time. To create a BreakIterator that locates character boundaries, you invoke the getCharacterInstance method, as follows:

BreakIterator characterIterator =
         BreakIterator.getCharacterInstance(currentLocale);

This type of BreakIterator detects boundaries between user characters, not just Unicode characters.

A user character may be composed of more than one Unicode character. For example, the user character ü can be composed by combining the Unicode characters \u0075 (u) and \u0308 (the combining diaeresis). This isn't the best example, however, because the character ü may also be represented by the single Unicode character \u00fc. We'll draw on the Arabic language for a more realistic example.

In Arabic the word for house is:

This word contains three user characters, but it is composed of the following six Unicode characters:

String house = "\u0628" + "\u064e" + "\u064a" + 
	       "\u0652" + "\u067a" + "\u064f";

The Unicode characters at positions 1, 3, and 5 in the house string are diacritics. Arabic requires diacritics because they can alter the meanings of words. The diacritics in the example are nonspacing characters, since they appear above the base characters. In an Arabic word processor you cannot move the cursor on the screen once for every Unicode character in the string. Instead you must move it once for every user character, which may be composed of more than one Unicode character. Therefore you must use a BreakIterator to scan the user characters in the string.
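The same idea can be tried on a shorter string before turning to the Arabic word. The sketch below is a separate illustration, not code from BreakIteratorDemo. The class name UserCharacterSketch and the method boundaries are hypothetical. The text consists of the letter u, the combining diaeresis \u0308, and the letter x: three Unicode characters, but only two user characters.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class UserCharacterSketch {

    // Collects the user-character boundaries the iterator reports for text.
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getCharacterInstance(locale);
        iterator.setText(text);
        List<Integer> positions = new ArrayList<>();
        for (int pos = iterator.first();
             pos != BreakIterator.DONE;
             pos = iterator.next()) {
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        // u + combining diaeresis forms one user character; x is the second.
        String text = "u\u0308x";

        System.out.println(text.length());               // 3
        // No boundary at index 1: the diacritic stays with its base letter.
        System.out.println(boundaries(text, Locale.US)); // [0, 2, 3]
    }
}
```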

The sample program BreakIteratorDemo creates a BreakIterator to scan Arabic characters. The program passes this BreakIterator, along with the String object created previously, to a method named listPositions:

BreakIterator arCharIterator =
		BreakIterator.getCharacterInstance(new Locale