import "github.com/rivo/uniseg"
Package uniseg implements Unicode Text Segmentation according to Unicode Standard Annex #29 (http://unicode.org/reports/tr29/).
At this point, only the determination of grapheme cluster boundaries is implemented.
func GraphemeClusterCount(s string) (n int)
GraphemeClusterCount returns the number of user-perceived characters (grapheme clusters) for the given string. To calculate this number, it iterates through the string using the Graphemes iterator.
type Graphemes struct {
	// contains filtered or unexported fields
}
Graphemes implements an iterator over Unicode extended grapheme clusters, as specified in Unicode Standard Annex #29. Grapheme clusters correspond to "user-perceived characters". Such characters often consist of multiple code points. For example, the "woman kissing woman" emoji consists of 8 code points: woman + ZWJ + heavy black heart (2 code points) + ZWJ + kiss mark + ZWJ + woman. The rules described in Annex #29 must be applied to group those code points into clusters that the user perceives as one character.
Example
Type example.
Code:
gr := NewGraphemes("👍🏼!")
for gr.Next() {
	fmt.Printf("%x ", gr.Runes())
}
Output:
[1f44d 1f3fc] [21]
func NewGraphemes(s string) *Graphemes
NewGraphemes returns a new grapheme cluster iterator.
func (g *Graphemes) Bytes() []byte
Bytes returns a byte slice which corresponds to the current grapheme cluster. If the iterator is already past the end or Next() has not yet been called, nil is returned.
func (g *Graphemes) Next() bool
Next advances the iterator by one grapheme cluster and returns false if no clusters are left. This function must be called before the first cluster is accessed.
func (g *Graphemes) Positions() (int, int)
Positions returns the interval of the current grapheme cluster as byte positions into the original string. The first returned value "from" indexes the first byte and the second returned value "to" indexes the first byte that is not included anymore, i.e. str[from:to] is the current grapheme cluster of the original string "str". If Next() has not yet been called, both values are 0. If the iterator is already past the end, both values are 1.
func (g *Graphemes) Reset()
Reset puts the iterator into its initial state such that the next call to Next() sets it to the first grapheme cluster again.
func (g *Graphemes) Runes() []rune
Runes returns a slice of runes (code points) which corresponds to the current grapheme cluster. If the iterator is already past the end or Next() has not yet been called, nil is returned.
func (g *Graphemes) Str() string
Str returns a substring of the original string which corresponds to the current grapheme cluster. If the iterator is already past the end or Next() has not yet been called, an empty string is returned.