unicode

import "std:unicode"

Provides Unicode code point classification, case conversion, and UTF-8 encoding helpers. Every function takes a rune, a 32-bit value holding a single Unicode code point. Coverage includes ASCII, Latin-1, and common Unicode blocks.

Constants

const (
    MaxRune         rune = 0x10FFFF
    ReplacementChar rune = 0xFFFD
    MaxASCII        rune = 0x7F
    MaxLatin1       rune = 0xFF
)

MaxRune is the largest valid Unicode code point (U+10FFFF). ReplacementChar is the Unicode replacement character (U+FFFD). MaxASCII is the largest ASCII value (U+007F). MaxLatin1 is the largest Latin-1 value (U+00FF).

Classification

func IsLetter(r rune) bool

Reports whether the rune is a letter (category L).

func IsDigit(r rune) bool

Reports whether the rune is a decimal digit.

func IsSpace(r rune) bool

Reports whether the rune is a whitespace character, including space, tab, newline, carriage return, form feed, and vertical tab.
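As a rough illustration of how a whitespace predicate like this is typically used, the sketch below splits a string into fields. It is written in Go, whose standard `unicode` package has a function of the same name; treating `unicode.IsSpace` as a stand-in for `std:unicode`'s `IsSpace` is an assumption, and `splitFields` is a hypothetical helper, not part of this module.

```go
package main

import (
	"fmt"
	"unicode"
)

// splitFields (hypothetical helper) returns the runs of non-whitespace
// runes in s, using unicode.IsSpace as the separator predicate.
func splitFields(s string) []string {
	var fields []string
	start := -1 // byte index where the current field began, or -1 if none
	for i, r := range s {
		if unicode.IsSpace(r) {
			if start >= 0 {
				fields = append(fields, s[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		fields = append(fields, s[start:])
	}
	return fields
}

func main() {
	// Input mixes space, tab, CR, LF, vertical tab, and form feed.
	fmt.Printf("%q\n", splitFields("  alpha\tbeta\r\n gamma\v\fdelta "))
}
```

All six whitespace characters named above act as separators here, so the call in `main` yields four fields.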

func IsUpper(r rune) bool

Reports whether the rune is an uppercase letter.

func IsLower(r rune) bool

Reports whether the rune is a lowercase letter.

func IsPunct(r rune) bool

Reports whether the rune is a Unicode punctuation character.

func IsControl(r rune) bool

Reports whether the rune is a control character.

func IsSymbol(r rune) bool

Reports whether the rune is a symbolic character.

func IsGraphic(r rune) bool

Reports whether the rune is a graphic character as defined by Unicode, including letters, marks, numbers, punctuation, symbols, and spaces.

func IsPrint(r rune) bool

Reports whether the rune is a printable character. Printable characters include letters, digits, punctuation, symbols, and the ASCII space character (U+0020). Unlike IsGraphic, IsPrint does not accept space characters other than the ASCII space.
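The classification predicates overlap (a tab is both a control character and whitespace, for example), so the order in which they are tested matters. The following sketch shows one possible ordering. It uses Go's standard `unicode` package, which has functions with the same names; that `std:unicode` returns the same results is an assumption, and `classify` and `isNonPrintingSpace` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode"
)

// classify (hypothetical helper) returns one coarse label per rune.
// Narrower predicates are tested before broader ones.
func classify(r rune) string {
	switch {
	case unicode.IsControl(r):
		return "control"
	case unicode.IsSpace(r):
		return "space"
	case unicode.IsUpper(r):
		return "upper"
	case unicode.IsLower(r):
		return "lower"
	case unicode.IsLetter(r):
		return "letter" // letters with no case, such as CJK ideographs
	case unicode.IsDigit(r):
		return "digit"
	case unicode.IsPunct(r):
		return "punct"
	case unicode.IsSymbol(r):
		return "symbol"
	default:
		return "other"
	}
}

// isNonPrintingSpace (hypothetical helper) reports whether r is graphic
// but not printable, which in Go means a space other than U+0020.
func isNonPrintingSpace(r rune) bool {
	return unicode.IsGraphic(r) && !unicode.IsPrint(r)
}

func main() {
	// '\u4E2D' is a CJK ideograph: a letter that is neither upper nor lower.
	for _, r := range []rune{'\t', ' ', 'A', 'z', '\u4E2D', '7', '!', '+'} {
		fmt.Printf("%U -> %s\n", r, classify(r))
	}
	// U+00A0 (no-break space) versus the ASCII space.
	fmt.Println(isNonPrintingSpace('\u00A0'), isNonPrintingSpace(' '))
}
```

In Go, the no-break space U+00A0 is graphic but not printable, which matches the distinction between IsGraphic and IsPrint described above.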

Case Conversion

func ToUpper(r rune) rune

Maps the rune to its uppercase form. Returns the original rune if it has no uppercase mapping.

func ToLower(r rune) rune

Maps the rune to its lowercase form. Returns the original rune if it has no lowercase mapping.

func ToTitle(r rune) rune

Maps the rune to its title case form. Returns the original rune if it has no title case mapping.

UTF-8 Helpers

func RuneLen(r rune) int

Returns the number of bytes required to UTF-8 encode the rune. Returns -1 if the rune is not a valid value to encode.
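To make the return values concrete, here is a sketch using Go's `unicode/utf8.RuneLen`, which has the same name and documented contract; whether `std:unicode`'s `RuneLen` agrees on every input is an assumption. `encodedSize` is a hypothetical helper that sums the lengths for a sequence of runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedSize (hypothetical helper) returns the total number of bytes
// needed to UTF-8 encode runes, or -1 if any rune cannot be encoded.
func encodedSize(runes []rune) int {
	total := 0
	for _, r := range runes {
		n := utf8.RuneLen(r)
		if n < 0 {
			return -1
		}
		total += n
	}
	return total
}

func main() {
	// One rune from each UTF-8 length class, then a surrogate and a
	// value above U+10FFFF, both of which cannot be encoded.
	samples := []rune{'a', '\u00E9', '\u20AC', '\U0001F600', 0xD800, 0x110000}
	for _, r := range samples {
		fmt.Printf("%#x needs %d byte(s)\n", r, utf8.RuneLen(r))
	}
	fmt.Println(encodedSize(samples[:4]))
}
```

The four encodable samples need 1, 2, 3, and 4 bytes respectively, so `encodedSize` reports 10 for them.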

func ValidRune(r rune) bool

Reports whether the rune is a valid Unicode code point. Surrogate code points (U+D800 to U+DFFF) and values above MaxRune are invalid.
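The validity rule can be written out as a pair of range checks. The sketch below is one way to express it in Go and is not this module's actual implementation. The names `validRune` and `sanitize` are hypothetical, and rejecting negative values is an assumption borrowed from Go's `utf8.ValidRune`, which is printed alongside for comparison.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxRune         rune = 0x10FFFF // mirrors MaxRune
	replacementChar rune = 0xFFFD   // mirrors ReplacementChar
	surrogateMin    rune = 0xD800
	surrogateMax    rune = 0xDFFF
)

// validRune (sketch) rejects negative values (an assumption), values
// above maxRune, and the surrogate range.
func validRune(r rune) bool {
	if r < 0 || r > maxRune {
		return false
	}
	return r < surrogateMin || r > surrogateMax
}

// sanitize (hypothetical helper) substitutes the replacement character
// for any invalid rune.
func sanitize(r rune) rune {
	if validRune(r) {
		return r
	}
	return replacementChar
}

func main() {
	// Boundary values on both sides of each excluded range.
	for _, r := range []rune{-1, 0, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000} {
		fmt.Printf("%#x sketch=%t go=%t\n", r, validRune(r), utf8.ValidRune(r))
	}
	fmt.Printf("%U\n", sanitize(0xD800))
}
```

Substituting U+FFFD for invalid input, as `sanitize` does, is a common convention rather than something this module is documented to do.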

func IsASCII(r rune) bool

Reports whether the rune is an ASCII character (U+0000 to U+007F).

func IsLatin1(r rune) bool

Reports whether the rune is a Latin-1 character (U+0000 to U+00FF).
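Both range predicates amount to a comparison against the corresponding constant. Go's `unicode` package has no `IsASCII` or `IsLatin1`, so the sketch below defines hypothetical equivalents from `unicode.MaxASCII` and `unicode.MaxLatin1`. Treating negative values as out of range is an assumption based on the stated U+0000 lower bound.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII (sketch) reports whether r lies in U+0000..U+007F.
func isASCII(r rune) bool {
	return r >= 0 && r <= unicode.MaxASCII
}

// isLatin1 (sketch) reports whether r lies in U+0000..U+00FF.
func isLatin1(r rune) bool {
	return r >= 0 && r <= unicode.MaxLatin1
}

// allASCII (hypothetical helper) reports whether every rune in s is ASCII.
func allASCII(s string) bool {
	for _, r := range s {
		if !isASCII(r) {
			return false
		}
	}
	return true
}

func main() {
	// Values on each side of the two upper bounds.
	for _, r := range []rune{0x7F, 0x80, 0xFF, 0x100} {
		fmt.Printf("%U ascii=%t latin1=%t\n", r, isASCII(r), isLatin1(r))
	}
	fmt.Println(allASCII("hello"), allASCII("h\u00E9llo"))
}
```

A check like `allASCII` is a typical use: text that passes it can be handled byte by byte without decoding.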

func SimpleFold(r rune) rune

Iterates over the Unicode code points that are equivalent under simple case folding. Among the code points equivalent to the rune (including the rune itself), SimpleFold returns the smallest rune greater than the given rune if one exists; otherwise it returns the smallest rune in the set. Calling it repeatedly therefore cycles through every equivalent code point and eventually returns to the starting rune.
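Because the result wraps around, repeated calls walk the whole set of equivalent code points. The sketch below uses Go's `unicode.SimpleFold`, which is documented with the same contract; that `std:unicode` matches it exactly is an assumption, and `foldOrbit` and `equalFold` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit (hypothetical helper) returns r followed by every other code
// point equivalent to it under simple case folding, in SimpleFold order.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

// equalFold (hypothetical helper) reports whether a and b are equivalent
// under simple case folding.
func equalFold(a, b rune) bool {
	for _, r := range foldOrbit(a) {
		if r == b {
			return true
		}
	}
	return false
}

func main() {
	// 'k' folds with 'K' and with U+212A (KELVIN SIGN).
	fmt.Printf("%U\n", foldOrbit('k'))
	// A rune with no case variants is its own orbit.
	fmt.Printf("%U\n", foldOrbit('1'))
	fmt.Println(equalFold('K', '\u212A'))
}
```

Starting from `'k'`, the walk visits U+212A next (the smallest equivalent rune above `'k'`), then wraps around to `'K'`, and stops when the next call would return `'k'` again.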