Package jexer.bits

Class ExtendedGraphemeClusterUtils


  • public class ExtendedGraphemeClusterUtils
    extends java.lang.Object
    ExtendedGraphemeClusterUtils implements most, but not all, of the grapheme cluster breaking rules of Unicode TR #29 section 3.1.1. It deviates from the specification as follows:
    • GB3 is deliberately ignored.
    • GB4 and GB5 will break at all control characters including CR.
    • GB9c is not implemented.
    • GB12 and GB13 do not count "evenness" of previous regional indicator (RI) symbols, instead always joining.
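The regional indicator deviation is the easiest one to observe. The sketch below is not Jexer's code: it is a self-contained illustration, using the hypothetical names `RiJoinSketch`, `countAlwaysJoin` and `countWithParity`, of how always joining adjacent RI symbols differs from the pairing that the Unicode rules prescribe. It models only the RI rule, so every non-RI codepoint is counted as its own cluster. The RI range U+1F1E6 to U+1F1FF is taken from the Unicode Standard.

```java
final class RiJoinSketch {

    /** Regional Indicator symbols occupy U+1F1E6..U+1F1FF. */
    static boolean isRi(int ch) {
        return ch >= 0x1F1E6 && ch <= 0x1F1FF;
    }

    /** Cluster count when two adjacent RI symbols always join. */
    static int countAlwaysJoin(String s) {
        int[] cps = s.codePoints().toArray();
        int clusters = 0;
        for (int i = 0; i < cps.length; i++) {
            boolean joins = i > 0 && isRi(cps[i - 1]) && isRi(cps[i]);
            if (!joins) {
                clusters++;
            }
        }
        return clusters;
    }

    /** Cluster count when RI symbols join only in pairs (parity is tracked). */
    static int countWithParity(String s) {
        int[] cps = s.codePoints().toArray();
        int clusters = 0;
        int riRun = 0; // number of RI symbols immediately before cps[i]
        for (int i = 0; i < cps.length; i++) {
            boolean joins = isRi(cps[i]) && riRun % 2 == 1;
            if (!joins) {
                clusters++;
            }
            riRun = isRi(cps[i]) ? riRun + 1 : 0;
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Two flags back to back: U S C A as regional indicators.
        String flags = new String(new int[] {0x1F1FA, 0x1F1F8, 0x1F1E8, 0x1F1E6}, 0, 4);
        System.out.println(countAlwaysJoin(flags)); // 1
        System.out.println(countWithParity(flags)); // 2
    }
}
```

Under the always-join behavior described above, a run of flags with no separator between them is treated as a single cluster, so callers that need flags kept apart may want to separate them themselves.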
    • Method Summary

      Modifier and Type Method Description
      static boolean isBraille(int ch)
      Check if character is in the braille range.
      static boolean isCjk(int ch)
      Check if character is in the CJK range.
      static boolean isControl(int ch)
      Check if codepoint has the Control Grapheme_Cluster_Break property.
      static boolean isCR(int ch)
      Check if codepoint has the CR Grapheme_Cluster_Break property.
      static boolean isEmoji(int ch)
      Check if character is in the emoji range (Emoji, Emoji_Component, Extended_Pictographic) AND not in the Basic Multilingual Plane.
      static boolean isEmojiBMP(int ch)
      Check if character is in the emoji range of the Basic Multilingual Plane (Emoji, Emoji_Component, Extended_Pictographic).
      static boolean isEmojiCombiner(int ch)
      Check if character will always be part of a larger emoji sequence.
      static boolean isEmojiComponent(int ch)
      Check if character is in the Emoji_Component range.
      static boolean isExtend(int ch)
      Check if codepoint has the Extend Grapheme_Cluster_Break property.
      static boolean isJexerDefaultGlyph(int ch)
      Check if character is a less-common Unicode symbol that is used by a default Jexer user interface component.
      static boolean isL(int ch)
      Check if codepoint has the L Grapheme_Cluster_Break property.
      static boolean isLegacyComputingSymbol(int ch)
      Check if character is in the Symbols for Legacy Computing range.
      static boolean isLF(int ch)
      Check if codepoint has the LF Grapheme_Cluster_Break property.
      static boolean isLV(int ch)
      Check if codepoint has the LV Grapheme_Cluster_Break property.
      static boolean isLVT(int ch)
      Check if codepoint has the LVT Grapheme_Cluster_Break property.
      static boolean isOther(int ch)
      Check if codepoint has the Other Grapheme_Cluster_Break property.
      static boolean isPrepend(int ch)
      Check if codepoint has the Prepend Grapheme_Cluster_Break property.
      static boolean isRegionalIndicator(int ch)
      Check if character is a Regional Indicator (RI) symbol.
      static boolean isSpacingMark(int ch)
      Check if codepoint has the SpacingMark Grapheme_Cluster_Break property.
      static boolean isT(int ch)
      Check if codepoint has the T Grapheme_Cluster_Break property.
      static boolean isV(int ch)
      Check if codepoint has the V Grapheme_Cluster_Break property.
      static boolean isZWJ(int ch)
      Check if codepoint has the ZWJ Grapheme_Cluster_Break property.
      static void main(java.lang.String[] args)
      Test the extended grapheme cluster boundary code.
      static boolean shouldBreak(int firstCh, int secondCh)
      See if a grapheme cluster break should occur between two codepoints, following most of the rules of Unicode TR #29 section 3.1.1.
      static java.util.List<ComplexCell> toComplexCells(java.lang.String input)
      Converts a string into a sequence of grapheme clusters following most of the rules of Unicode TR #29 section 3.1.1.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • isCjk

        public static boolean isCjk(int ch)
        Check if character is in the CJK range.
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the CJK range
      • isBraille

        public static boolean isBraille(int ch)
        Check if character is in the braille range.
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the braille range
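The description does not say which codepoints make up "the braille range". One plausible reading, and only an assumption here, is the Unicode Braille Patterns block (U+2800 to U+28FF). A minimal sketch of that reading using the JDK's own block table follows; `BrailleRangeSketch` and `inBraillePatterns` are hypothetical names, and Jexer's actual check may differ.

```java
final class BrailleRangeSketch {

    /** True if ch lies in the Unicode Braille Patterns block (assumed range). */
    static boolean inBraillePatterns(int ch) {
        // UnicodeBlock.of() rejects invalid codepoints, so guard first.
        return Character.isValidCodePoint(ch)
            && Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.BRAILLE_PATTERNS;
    }

    public static void main(String[] args) {
        System.out.println(inBraillePatterns(0x2801)); // true
        System.out.println(inBraillePatterns('A'));    // false
    }
}
```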
      • isEmojiBMP

        public static boolean isEmojiBMP(int ch)
        Check if character is in the emoji range of the Basic Multilingual Plane (Emoji, Emoji_Component, Extended_Pictographic).
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the emoji range of the Basic Multilingual Plane
      • isEmoji

        public static boolean isEmoji(int ch)
        Check if character is in the emoji range (Emoji, Emoji_Component, Extended_Pictographic) AND not in the Basic Multilingual Plane. For a full check of ALL emoji, use 'isEmoji(x) || isEmojiBMP(x)'.
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the emoji range and outside the Basic Multilingual Plane
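Because the emoji test is split across two methods by plane, it can help to see which plane a codepoint falls in. The sketch below uses only JDK calls and does not call Jexer; `PlaneSketch` and `describe` are hypothetical names chosen for illustration.

```java
final class PlaneSketch {

    /** Label a codepoint as BMP (U+0000..U+FFFF) or supplementary. */
    static String describe(int ch) {
        String plane = Character.isBmpCodePoint(ch) ? "BMP" : "supplementary";
        return String.format("U+%04X %s", ch, plane);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x2764));  // U+2764 BMP
        System.out.println(describe(0x1F600)); // U+1F600 supplementary
    }
}
```

With Jexer on the classpath, the combined check quoted in the description would read `isEmoji(ch) || isEmojiBMP(ch)`.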
      • isEmojiComponent

        public static boolean isEmojiComponent(int ch)
        Check if character is in the Emoji_Component range. Emoji_Component codepoints are part of larger sequences, but some of them can also stand alone to represent glyphs (Emoji, Extended_Pictographic).
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the emoji component range
      • isEmojiCombiner

        public static boolean isEmojiCombiner(int ch)
        Check if character will always be part of a larger emoji sequence.
        Parameters:
        ch - character to check
        Returns:
        true if this character is only used to combine or modify emoji codepoints
      • isRegionalIndicator

        public static boolean isRegionalIndicator(int ch)
        Check if character is a Regional Indicator (RI) symbol.
        Parameters:
        ch - character to check
        Returns:
        true if this character is a Regional Indicator (RI) symbol
      • isLegacyComputingSymbol

        public static boolean isLegacyComputingSymbol(int ch)
        Check if character is in the Symbols for Legacy Computing range.
        Parameters:
        ch - character to check
        Returns:
        true if this character is in the Symbols for Legacy Computing range
      • isJexerDefaultGlyph

        public static boolean isJexerDefaultGlyph(int ch)
        Check if character is a less-common Unicode symbol that is used by a default Jexer user interface component.
        Parameters:
        ch - character to check
        Returns:
        true if this character is used somewhere in the default Jexer user interface
      • isPrepend

        public static boolean isPrepend(int ch)
        Check if codepoint has the Prepend Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if Prepend
      • isCR

        public static boolean isCR(int ch)
        Check if codepoint has the CR Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if CR
      • isLF

        public static boolean isLF(int ch)
        Check if codepoint has the LF Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if LF
      • isControl

        public static boolean isControl(int ch)
        Check if codepoint has the Control Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if Control
      • isExtend

        public static boolean isExtend(int ch)
        Check if codepoint has the Extend Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if Extend
      • isSpacingMark

        public static boolean isSpacingMark(int ch)
        Check if codepoint has the SpacingMark Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if SpacingMark
      • isL

        public static boolean isL(int ch)
        Check if codepoint has the L Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if L
      • isV

        public static boolean isV(int ch)
        Check if codepoint has the V Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if V
      • isT

        public static boolean isT(int ch)
        Check if codepoint has the T Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if T
      • isLV

        public static boolean isLV(int ch)
        Check if codepoint has the LV Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if LV
      • isLVT

        public static boolean isLVT(int ch)
        Check if codepoint has the LVT Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if LVT
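L, V, T, LV and LVT are the Hangul syllable types of the Grapheme_Cluster_Break property. For precomposed Hangul syllables (U+AC00 to U+D7A3) the LV/LVT split follows from the composition arithmetic in the Unicode Standard: a syllable whose offset from U+AC00 is a multiple of 28 has no trailing consonant and is LV, and any other is LVT. The sketch below illustrates that arithmetic only; the names are hypothetical and it is not Jexer's implementation.

```java
final class HangulSyllableSketch {

    static final int S_BASE = 0xAC00;  // first precomposed syllable
    static final int S_COUNT = 11172;  // 19 leading * 21 vowel * 28 trailing slots
    static final int T_COUNT = 28;     // trailing slots, slot 0 means "none"

    /** True for precomposed Hangul syllables U+AC00..U+D7A3. */
    static boolean isPrecomposed(int ch) {
        return ch >= S_BASE && ch < S_BASE + S_COUNT;
    }

    /** LV: precomposed syllable without a trailing consonant. */
    static boolean isLvSyllable(int ch) {
        return isPrecomposed(ch) && (ch - S_BASE) % T_COUNT == 0;
    }

    /** LVT: precomposed syllable with a trailing consonant. */
    static boolean isLvtSyllable(int ch) {
        return isPrecomposed(ch) && (ch - S_BASE) % T_COUNT != 0;
    }

    public static void main(String[] args) {
        System.out.println(isLvSyllable(0xAC00));  // true
        System.out.println(isLvtSyllable(0xAC01)); // true
        System.out.println(isLvSyllable('A'));     // false
    }
}
```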
      • isZWJ

        public static boolean isZWJ(int ch)
        Check if codepoint has the ZWJ Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if ZWJ
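The ZWJ property applies to a single codepoint, U+200D ZERO WIDTH JOINER, which glues emoji into sequences such as a family glyph. As an illustration that does not depend on Jexer, the sketch below counts the joiners in such a sequence; `ZwjSketch` and `countZwj` are hypothetical names.

```java
final class ZwjSketch {

    static final int ZWJ = 0x200D; // ZERO WIDTH JOINER

    /** Number of ZWJ codepoints in the string. */
    static long countZwj(String s) {
        return s.codePoints().filter(cp -> cp == ZWJ).count();
    }

    public static void main(String[] args) {
        // MAN, ZWJ, WOMAN, ZWJ, GIRL: displayed as one glyph where supported.
        String family = new String(
            new int[] {0x1F468, 0x200D, 0x1F469, 0x200D, 0x1F467}, 0, 5);
        System.out.println(family.codePointCount(0, family.length())); // 5
        System.out.println(countZwj(family));                          // 2
    }
}
```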
      • isOther

        public static boolean isOther(int ch)
        Check if codepoint has the Other Grapheme_Cluster_Break property.
        Parameters:
        ch - character to check
        Returns:
        true if Other
      • shouldBreak

        public static boolean shouldBreak(int firstCh,
                                          int secondCh)
        See if a grapheme cluster break should occur between two codepoints, following most of the rules of Unicode TR #29 section 3.1.1.
        Parameters:
        firstCh - the first codepoint in the sequence
        secondCh - the second codepoint in the sequence
        Returns:
        true if a break should occur between these codepoints
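A pairwise test of this shape is typically applied to each adjacent pair of codepoints in a string. The sketch below shows one way to do that; it is an assumption about usage, not Jexer's own loop. `PairwiseSegmentationSketch`, `BreakRule` and `segment` are hypothetical names, and the toy rule in `main` stands in for the real break test so that the example runs without Jexer.

```java
import java.util.ArrayList;
import java.util.List;

final class PairwiseSegmentationSketch {

    /** Hypothetical stand-in for a (firstCh, secondCh) break test. */
    interface BreakRule {
        boolean shouldBreak(int firstCh, int secondCh);
    }

    /** Split input into clusters by asking the rule about each adjacent pair. */
    static List<String> segment(String input, BreakRule rule) {
        List<String> clusters = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prev = -1;
        for (int i = 0; i < input.length(); ) {
            int ch = input.codePointAt(i);
            if (current.length() > 0 && rule.shouldBreak(prev, ch)) {
                clusters.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(ch);
            prev = ch;
            i += Character.charCount(ch); // step over surrogate pairs
        }
        if (current.length() > 0) {
            clusters.add(current.toString());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Toy rule: break everywhere except before a non-spacing mark.
        BreakRule toy = (a, b) -> Character.getType(b) != Character.NON_SPACING_MARK;
        // 'e' + COMBINING ACUTE ACCENT stay together, 'a' is separate.
        System.out.println(segment("e\u0301a", toy).size()); // 2
    }
}
```

Since the documented signature is `(int, int)` returning `boolean`, passing `ExtendedGraphemeClusterUtils::shouldBreak` as the rule should work when Jexer is on the classpath.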
      • toComplexCells

        public static java.util.List<ComplexCell> toComplexCells(java.lang.String input)
        Converts a string into a sequence of grapheme clusters following most of the rules of Unicode TR #29 section 3.1.1.
        Parameters:
        input - a string of codepoints
        Returns:
        a sequence of grapheme clusters
      • main

        public static void main(java.lang.String[] args)
        Test the extended grapheme cluster boundary code.
        Parameters:
        args - command line arguments