Releases: elixir-unicode/unicode_string

Unicode String version 2.1.0

30 Apr 21:56

Bug Fixes

  • Improve line break segmentation conformance and compatibility with ICU.

Enhancements

  • Replace the regex-based segmentation engine with a single-pass DFA evaluator. Sentence breaking on a 4 KB unbroken sentence drops from ~9,200 ms to ~11 ms (~840×), and word breaking on a 4 KB sentence drops from ~7,000 ms to ~12 ms (~580×). Scaling is now linear in input length instead of O(N²).
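
As a rough illustration of why a single-pass state machine scales linearly, here is a toy sketch in Python. It is not the library's evaluator and does not implement the Unicode segmentation rules: the three character classes and the `char_class` and `segment` names are hypothetical simplifications chosen for this example. The point is only that each character is inspected once and the sole carried state is the class of the previous character.

```python
def char_class(ch):
    """Toy classifier (assumption): real rules use the Unicode Word_Break property."""
    if ch.isalnum():
        return "word"
    if ch.isspace():
        return "space"
    return "other"


def segment(text):
    """Split text in one left-to-right pass, keeping only the previous class as state."""
    segments = []
    start = 0
    state = None  # class of the previous character, None before the first one
    for i, ch in enumerate(text):
        cls = char_class(ch)
        # Break when the class changes; "other" characters always stand alone.
        if state is not None and (cls != state or cls == "other"):
            segments.append(text[start:i])
            start = i
        state = cls
    if start < len(text):
        segments.append(text[start:])
    return segments


print(segment("Hello, world!"))  # ['Hello', ',', ' ', 'world', '!']
```

Because the loop never rescans earlier input, doubling the input length roughly doubles the work, which is the linear behaviour the release note describes.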

Unicode String version 2.0.1

29 Apr 09:02

Bug Fixes

  • Fix compilation, dialyzer and tests when the optional :localize dependency is not present.

Unicode String version 2.0.0

14 Apr 05:10

Breaking change

  • Unicode String version 2.0 and later is supported on Elixir 1.17 or later only.

Enhancements

  • Replace ex_cldr with localize as the localization library

  • Fix titlecasing of the letter i, including correct handling in Turkic languages.

  • Use Localize.Locale.best_match/3 for locale matching

  • Fixes to the Unicode.Break module.

Unicode String version 1.8.0

18 Jan 18:41

Enhancements

Unicode String version 1.7.0

28 Mar 22:58

Bug Fixes

  • Converts all compile-time regex compilation to runtime to be compatible with OTP 28. Performance implications are not yet known.

Unicode String version 1.6.0

17 Mar 05:54

Bug Fixes

  • Fix word break detection when a \p{word_break=extend} codepoint is preceded by a letter and followed by a letter.

Enhancements

  • Updated to CLDR 47 break rules and test data.

Unicode String version 1.5.0

01 Jan 01:57

Enhancements

  • Update to CLDR 46.1 segmentation data and tests.

  • Pass dialyzer with :underspecs flag set.

Unicode String version 1.4.1

13 Mar 14:22

Bug Fixes

  • Fix performance regression in Unicode.String.Break.next/4. Added the script bench/next.exs to allow for regression testing. Thanks to @mntns for the report. Closes #6.

Unicode String version 1.3.1

05 Mar 20:52

Bug Fixes

  • Fix Unicode.String.split/2 and Unicode.String.next/2 when the passing rule is a :no_break rule. Thanks to @GregLMcDonald for the report. Closes #5.

Unicode String version 1.3.0

27 Feb 02:30

Bug Fixes

  • Fix case folding for codepoints that fold to themselves.

Enhancements

  • Adds case mapping functions Unicode.String.upcase/2, Unicode.String.downcase/2 and Unicode.String.titlecase/2. These functions implement the full Unicode Casing algorithm, including conditional mappings. They are locale-aware, and a locale can be specified as a string, an atom or a Cldr.LanguageTag, thereby providing basic integration between unicode_string and ex_cldr.

  • Case folding always follows the :full path which allows mapping of single code points to multiple code points. There is no practical reason to implement the :simple path. As a result, the type parameter to Unicode.String.Case.Folding.fold/2 is no longer required or supported.

  • Support an ex_cldr Language Tag as a parameter to Unicode.String.Case.Folding.fold/2. In fact, any map that has a :language key whose value is an ISO 639-1 language code as a lowercase atom may be passed as a parameter.
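
To make the :full case folding path in the notes above concrete, here is a small sketch in Python rather than Elixir. It relies only on Python's built-in `str.casefold()`, which applies full Unicode case folding, and it does not call unicode_string; the helper name `caseless_equal` is hypothetical. It shows a single code point folding to more than one code point, which a simple (one-to-one) folding cannot express.

```python
# Full case folding can map one code point to several.
folded = "ß".casefold()         # U+00DF folds to "ss"
ligature = "\ufb01".casefold()  # U+FB01 LATIN SMALL LIGATURE FI folds to "fi"

print(len("ß"), len(folded))    # 1 2
print(ligature)                 # fi


def caseless_equal(a, b):
    """Compare two strings after full case folding (illustrative helper)."""
    return a.casefold() == b.casefold()


# Lowercasing alone leaves "ß" unchanged, so it cannot match "ss".
print("ß".lower() == "ss")                   # False
print(caseless_equal("STRASSE", "straße"))   # True
```

Caseless matching of pairs such as "STRASSE" and "straße" depends on this one-to-many mapping.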