Localization is the act of making programs behave in a region-specific way. When a program formats numbers or dates in a way specific to your part of the world, or prints messages (or accepts input) in your native language, the program is said to be localized. This section describes the steps Subversion has taken toward localization.
Most modern operating systems have a notion of the “current locale”—that is, the region or country whose localization conventions are honored. These conventions—typically chosen by some runtime configuration mechanism on the computer—affect the way in which programs present data to the user, as well as the way in which they accept user input.
On most Unix-like systems, you can check the values of the locale-related runtime configuration options by running the locale command:
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
The output is a list of locale-related environment variables and their current values. In this example, the variables are all set to the default C locale, but users can set these variables to specific country/language code combinations. For example, if one were to set the LC_TIME variable to fr_CA, then programs would know to present time and date information formatted according to a French-speaking Canadian's expectations. And if one were to set the LC_MESSAGES variable to zh_TW, then programs would know to present human-readable messages in Traditional Chinese. Setting the LC_ALL variable overrides every other locale variable, effectively setting them all to the same value. The value of LANG is used as a default for any locale variable that is unset. To see the list of available locales on a Unix system, run the command locale -a.
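The lookup order just described can be sketched in Python: LC_ALL takes precedence, then the individual category variable, then LANG as the fallback. This is a minimal illustration of the POSIX rules, not code from Subversion. The function name effective_locale is hypothetical, empty values are treated as unset (as with the LANG= line in the sample output), and returning "C" when nothing is set is an assumption, since POSIX leaves that final default implementation-defined.

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper: return the locale name used for `category`
    (e.g. "LC_TIME") given a mapping of environment variables.

    Lookup order: LC_ALL, then the category variable, then LANG.
    Empty strings count as unset. The "C" fallback is an assumption.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


env = {"LANG": "en_US", "LC_TIME": "fr_CA", "LC_MESSAGES": "zh_TW"}
print(effective_locale("LC_TIME", env))      # fr_CA
print(effective_locale("LC_MONETARY", env))  # en_US (falls back to LANG)

env["LC_ALL"] = "de_DE"
print(effective_locale("LC_MESSAGES", env))  # de_DE (LC_ALL wins)
```

Treating LC_ALL as a blanket override is why setting it is a quick way to test a program under a different locale without touching the individual variables.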
On Windows, locale configuration is done via the “Regional and Language Options” control panel item. There you can view and select the values of individual settings from the available locales, and even customize (at a sickening level of detail) several of the display formatting conventions.
The Subversion client, svn, honors the current locale configuration in two ways. First, it notices the value of the LC_MESSAGES variable and attempts to print all messages in the specified language. For example:
$ export LC_MESSAGES=de_DE
$ svn help cat
cat: Gibt den Inhalt der angegebenen Dateien oder URLs aus.
Aufruf: cat ZIEL[@REV]...
…
This behavior works identically on both Unix and Windows systems. Note, though, that while your operating system might support a certain locale, the Subversion client might still be unable to speak that particular language. To produce localized messages, human volunteers must provide translations for each language. The translations are written using the GNU gettext package, which produces translation modules that end with the .mo filename extension. For example, the German translation file is named de.mo. These translation files are installed somewhere on your system. On Unix, they typically live in /usr/share/locale/, while on Windows they're often found in the \share\locale\ folder in Subversion's installation area. When installed, each module is renamed after the program it provides translations for and placed in a directory for its language. For example, the de.mo file may ultimately end up installed as /usr/share/locale/de/LC_MESSAGES/subversion.mo.
By browsing the installed .mo files, you can see which languages the Subversion client is able to speak.
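On Unix, that browsing can be automated with a short Python sketch that looks for a subversion.mo catalog under each language directory. The function name svn_languages is hypothetical, and the default /usr/share/locale is only the typical Unix location; pass a different directory (for example, the \share\locale\ folder of a Windows installation) if your catalogs live elsewhere.

```python
from pathlib import Path
from typing import List


def svn_languages(localedir: str = "/usr/share/locale") -> List[str]:
    """Hypothetical helper: list the language directories that contain
    <localedir>/<lang>/LC_MESSAGES/subversion.mo."""
    root = Path(localedir)
    # Each match looks like <root>/de/LC_MESSAGES/subversion.mo, so the
    # language name is the grandparent directory of the catalog file.
    return sorted(p.parent.parent.name
                  for p in root.glob("*/LC_MESSAGES/subversion.mo"))


print(svn_languages())
```

An empty list simply means no Subversion catalogs were found in that directory, in which case the client falls back to its built-in English messages.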
The second way in which the locale is honored involves how svn interprets your input. The repository stores all paths, filenames, and log messages in Unicode, encoded as UTF-8. In that sense, the repository is internationalized—that is, the repository is ready to accept input in any human language. This means, however, that the Subversion client is responsible for sending only UTF-8 filenames and log messages into the repository. In order to do this, it must convert the data from the native locale into UTF-8.
For example, suppose you create a file named caffè.txt, and then, when committing the file, you write the log message “Adesso il caffè è più forte” (“Now the coffee is stronger”). Both the filename and the log message contain non-ASCII characters, but because your locale is set to it_IT, the Subversion client knows to interpret them as Italian. It uses an Italian character set to convert the data to UTF-8 before sending it to the repository.
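The conversion step can be illustrated with Python's built-in codecs. The sketch assumes the it_IT locale uses the ISO-8859-1 (Latin-1) encoding, as it traditionally does on many Unix systems; under a UTF-8 locale such as it_IT.UTF-8, the input would already be UTF-8 and need no conversion. Subversion performs this work in its own C code, so treat native_to_utf8 (a hypothetical helper) as an illustration of the idea only.

```python
# Assumption: the native locale encoding is ISO-8859-1 (Latin-1).
# A real program would ask the locale, e.g. locale.getpreferredencoding(False).
NATIVE_ENCODING = "iso-8859-1"


def native_to_utf8(data: bytes, native_encoding: str) -> bytes:
    """Hypothetical helper: decode bytes from the native encoding and
    re-encode the resulting text as UTF-8."""
    return data.decode(native_encoding).encode("utf-8")


# The log message as the native (Latin-1) bytes the user typed.
native_bytes = "Adesso il caffè è più forte".encode(NATIVE_ENCODING)
utf8_bytes = native_to_utf8(native_bytes, NATIVE_ENCODING)

print(len(native_bytes), len(utf8_bytes))  # 27 30: each è and ù grows to 2 bytes
print("è".encode(NATIVE_ENCODING), "è".encode("utf-8"))  # b'\xe8' b'\xc3\xa8'
```

Guessing the wrong native encoding would either fail or silently produce the wrong characters, which is why the client depends on the locale setting to know what the input bytes mean.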
Note that while the repository demands UTF-8 filenames and log messages, it does not pay attention to file contents. Subversion treats file contents as opaque strings of bytes, and neither client nor server makes an attempt to understand the character set or encoding of the contents.