Upper and Lower Case: Chimara Reference Manual

Upper and Lower Case — Changing the case of strings

Functions

unsigned char	glk_char_to_lower ()
unsigned char	glk_char_to_upper ()
glui32	glk_buffer_to_lower_case_uni ()
glui32	glk_buffer_to_upper_case_uni ()
glui32	glk_buffer_to_title_case_uni ()

Includes

#include <libchimara/glk.h>

Description

Glk has functions to manipulate the case of both Latin-1 and Unicode strings. One Latin-1 lowercase character corresponds to one uppercase character, and vice versa, so the Latin-1 functions act on single characters. The Unicode functions act on whole strings, since the length of the string may change.

Functions

glk_char_to_lower ()

unsigned char
glk_char_to_lower (unsigned char ch);

You can convert Latin-1 characters between upper and lower case with two Glk utility functions, glk_char_to_lower() and glk_char_to_upper(). These have a few advantages over the standard ANSI tolower() and toupper() macros. They work for the entire Latin-1 character set, including accented letters; they behave consistently on all platforms, since they're part of the Glk library; and they are safe for all characters. That is, if you call glk_char_to_lower() on a lower-case character, or a character which is not a letter, you'll get the argument back unchanged.

The case-sensitive characters in Latin-1 are the ranges 0x41..0x5A, 0xC0..0xD6, 0xD8..0xDE (upper case) and the ranges 0x61..0x7A, 0xE0..0xF6, 0xF8..0xFE (lower case). These are arranged in parallel; so glk_char_to_lower() will add 0x20 to values in the upper-case ranges, and glk_char_to_upper() will subtract 0x20 from values in the lower-case ranges.
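
The range arithmetic described above can be sketched as follows. This is only an illustration of the stated ranges, not the code libchimara actually uses; the `sketch_` function names are hypothetical and are not part of the Glk API.

```c
/* Hypothetical stand-ins, NOT libchimara code: they only mirror the
 * Latin-1 ranges listed in the text above. */

/* Nonzero if ch lies in one of the three Latin-1 upper-case ranges. */
static int sketch_is_latin1_upper(unsigned char ch)
{
    return (ch >= 0x41 && ch <= 0x5A)
        || (ch >= 0xC0 && ch <= 0xD6)
        || (ch >= 0xD8 && ch <= 0xDE);
}

/* Nonzero if ch lies in one of the three Latin-1 lower-case ranges. */
static int sketch_is_latin1_lower(unsigned char ch)
{
    return (ch >= 0x61 && ch <= 0x7A)
        || (ch >= 0xE0 && ch <= 0xF6)
        || (ch >= 0xF8 && ch <= 0xFE);
}

/* Add 0x20 to upper-case characters; return everything else unchanged. */
unsigned char sketch_char_to_lower(unsigned char ch)
{
    return sketch_is_latin1_upper(ch) ? (unsigned char)(ch + 0x20) : ch;
}

/* Subtract 0x20 from lower-case characters; return everything else unchanged. */
unsigned char sketch_char_to_upper(unsigned char ch)
{
    return sketch_is_latin1_lower(ch) ? (unsigned char)(ch - 0x20) : ch;
}
```

Under these ranges, 0xD7 (multiplication sign), 0xDF (sharp s) and 0xFF (y with diaeresis) fall outside every range, so both sketch functions return them unchanged.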

Parameters

ch	A Latin-1 character.

Returns

A lowercase or non-letter Latin-1 character.

glk_char_to_upper ()

unsigned char
glk_char_to_upper (unsigned char ch);

If ch is a lowercase character in the Latin-1 character set, converts it to uppercase. Otherwise, leaves it unchanged. See glk_char_to_lower().

Parameters

ch	A Latin-1 character.

Returns

An uppercase or non-letter Latin-1 character.

glk_buffer_to_lower_case_uni ()

glui32
glk_buffer_to_lower_case_uni (glui32 *buf,
                              glui32 len,
                              glui32 numchars);

Unicode character conversion is trickier, and must be applied to character arrays, not single characters. These functions (glk_buffer_to_lower_case_uni(), glk_buffer_to_upper_case_uni(), and glk_buffer_to_title_case_uni()) provide two length arguments because a string of Unicode characters may expand when its case changes. The len argument is the available length of the buffer; numchars is the number of characters in the buffer initially. (So numchars must be less than or equal to len. The contents of the buffer after numchars do not affect the operation.)

The functions return the number of characters after conversion. If this is greater than len, the characters in the array will be safely truncated at len, but the true count will be returned. (The contents of the buffer after the returned count are undefined.)

The lower_case and upper_case functions do what you'd expect: they convert every character in the buffer (the first numchars of them) to its lower-case or upper-case equivalent, respectively, if there is such a thing.

See the Unicode spec (chapter 3.13, chapter 4.2, etc.) for the exact definitions of upper, lower, and title-case mapping.

Unicode has some strange case cases. For example, a combined character that looks like “ss” might properly be upper-cased into two “S” characters. Title-casing is even stranger; “ss” (at the beginning of a word) might be title-cased into a different combined character that looks like “Ss”. Strictly speaking, the glk_buffer_to_title_case_uni() function title-cases the first character of the buffer rather than upper-casing it, in case that makes a difference.
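
To make the len / numchars / return-value contract concrete, here is a minimal sketch of a stand-in function with the same calling convention. It is not the library's implementation: `sketch_buffer_to_upper` is a hypothetical name, the `glui32` typedef below is an assumption standing in for the one in glk.h, and the function knows only two rules (ASCII letters, and U+00DF expanding to "SS", which is that character's full upper-case mapping in Unicode). It assumes the “ss” character mentioned above is U+00DF.

```c
#include <stdint.h>

typedef uint32_t glui32; /* assumption: stands in for the typedef in glk.h */

/* Hypothetical stand-in, NOT the library's implementation.
 * Upper-cases ASCII letters and expands U+00DF to "SS"; every other
 * character passes through unchanged. Requires numchars <= len. */
glui32 sketch_buffer_to_upper(glui32 *buf, glui32 len, glui32 numchars)
{
    glui32 i, out, needed = 0;

    /* First pass: count how many characters the converted text needs. */
    for (i = 0; i < numchars; i++)
        needed += (buf[i] == 0xDF) ? 2 : 1;

    /* Second pass: fill from the end. Output positions are never lower
     * than the input position being read, so no unread input is
     * overwritten. Writes at or beyond len are dropped (truncation). */
    out = needed;
    for (i = numchars; i-- > 0; ) {
        glui32 ch = buf[i];
        if (ch == 0xDF) {
            out -= 2;
            if (out < len)     buf[out] = 'S';
            if (out + 1 < len) buf[out + 1] = 'S';
        } else {
            out -= 1;
            if (ch >= 'a' && ch <= 'z')
                ch -= 0x20;
            if (out < len)     buf[out] = ch;
        }
    }

    /* Return the true count, even if it exceeds len. */
    return needed;
}
```

A caller would pass the six characters of "straße" with numchars set to 6. With len at 7 or more the buffer ends up holding "STRASSE" and the function returns 7. With len equal to 6 the buffer holds the truncated "STRASS", but the return value is still 7, so comparing the result against len is how a caller would detect truncation.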

Parameters

buf	A character array in UCS-4.
len	Available length of `buf`.
numchars	Number of characters in `buf`.

Returns

The number of characters after conversion.

glk_buffer_to_upper_case_uni ()

glui32
glk_buffer_to_upper_case_uni (glui32 *buf,
                              glui32 len,
                              glui32 numchars);

Converts the first numchars characters of buf to their uppercase equivalents, if there is such a thing. See glk_buffer_to_lower_case_uni().

Parameters

buf	A character array in UCS-4.
len	Available length of `buf`.
numchars	Number of characters in `buf`.

Returns

The number of characters after conversion.

glk_buffer_to_title_case_uni ()

glui32
glk_buffer_to_title_case_uni (glui32 *buf,
                              glui32 len,
                              glui32 numchars,
                              glui32 lowerrest);

See glk_buffer_to_lower_case_uni(). The title_case function has an additional (boolean) flag, lowerrest. If the flag is zero, the function changes the first character of the buffer to upper-case, and leaves the rest of the buffer unchanged. If the flag is nonzero, it changes the first character to upper-case and the rest to lower-case.
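
The effect of the flag can be illustrated with an ASCII-only sketch. The function name `sketch_buffer_to_title_case` is hypothetical, the `glui32` typedef is an assumption standing in for the one in glk.h, and the real function applies Unicode title-case and lower-case mappings rather than the simple ASCII arithmetic used here.

```c
#include <stdint.h>

typedef uint32_t glui32; /* assumption: stands in for the typedef in glk.h */

/* Hypothetical ASCII-only stand-in, NOT the library's implementation.
 * Upper-cases the first character; lower-cases the remaining ones only
 * when lowerrest is nonzero. */
glui32 sketch_buffer_to_title_case(glui32 *buf, glui32 len,
                                   glui32 numchars, glui32 lowerrest)
{
    glui32 i;

    (void)len; /* ASCII case changes never expand, so len is unused here */

    if (numchars == 0)
        return 0;

    /* First character: lower-case ASCII becomes upper-case. */
    if (buf[0] >= 'a' && buf[0] <= 'z')
        buf[0] -= 0x20;

    /* Remaining characters: touched only if the flag is set. */
    if (lowerrest) {
        for (i = 1; i < numchars; i++) {
            if (buf[i] >= 'A' && buf[i] <= 'Z')
                buf[i] += 0x20;
        }
    }

    return numchars;
}
```

Given the five characters "hELLO", a zero flag yields "HELLO" (only the first character changes), while a nonzero flag yields "Hello".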

Earlier drafts of this spec had a separate function which title-cased the first character of every word in the buffer. I took this out after reading Unicode Standard Annex #29, which explains how to divide a string into words. If you want it, feel free to implement it.

Parameters

buf	A character array in UCS-4.
len	Available length of `buf`.
numchars	Number of characters in `buf`.
lowerrest	`TRUE` if the rest of `buf` should be lowercased, `FALSE` otherwise.

Returns

The number of characters after conversion.