Releases · elixir-unicode/unicode_string · GitHub

30 Apr 21:56

kipcole9

Unicode String version 2.1.0 Latest


Bug Fixes

Improve line break segmentation conformance and compatibility with ICU.

Enhancements

Replaces the regex-based segmentation engine with a single-pass DFA evaluator. Sentence break on a 4 KB unbroken sentence drops from ~9,200 ms to ~11 ms (~840×); word break on a 4 KB sentence from ~7,000 ms to ~12 ms (~580×); scaling is now linear in input length instead of O(N²).
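As a rough illustration of why a single-pass state machine scales linearly, here is a minimal sketch of a toy sentence segmenter. It is not the unicode_string implementation: the module name `ToySegmenter`, its three states, and the simplified rule "break after terminators plus trailing spaces" are all assumptions made for the example, and the real Unicode sentence break rules are far richer. The point is only that each codepoint is examined once and the next state depends solely on the current state and the current codepoint.

```elixir
defmodule ToySegmenter do
  @moduledoc false
  # Hypothetical toy for illustration only; not the library's engine.

  # Codepoints treated as sentence terminators in this sketch.
  @terminators [?., ?!, ??]

  # Splits a string into sentences in one left-to-right pass.
  def sentences(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> scan(:text, [], [])
  end

  # End of input: emit any pending segment.
  defp scan([], _state, [], acc), do: Enum.reverse(acc)
  defp scan([], _state, current, acc), do: Enum.reverse([flush(current) | acc])

  # In :space, the first non-space codepoint starts a new segment.
  defp scan([c | rest], :space, current, acc) when c != ?\s do
    scan(rest, next_state(:text, c), [c], [flush(current) | acc])
  end

  # Otherwise append the codepoint and follow the transition table.
  defp scan([c | rest], state, current, acc) do
    scan(rest, next_state(state, c), [c | current], acc)
  end

  # Transition table: (state, codepoint) -> next state.
  defp next_state(_state, c) when c in @terminators, do: :term
  defp next_state(:term, ?\s), do: :space
  defp next_state(:space, ?\s), do: :space
  defp next_state(_state, _c), do: :text

  # Segments are accumulated in reverse; restore order on output.
  defp flush(current), do: current |> Enum.reverse() |> List.to_string()
end

IO.inspect(ToySegmenter.sentences("Hi there. How are you?  Fine!"))
# prints ["Hi there. ", "How are you?  ", "Fine!"]
```

Because no codepoint is ever revisited, doubling the input roughly doubles the work, which is the behaviour the release note describes for the new evaluator.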


29 Apr 09:02

kipcole9

Unicode String version 2.0.1

Bug Fixes

Fix compilation, dialyzer and tests when the optional :localize dependency is not present.


14 Apr 05:10

kipcole9

Unicode String version 2.0.0

Breaking change

Unicode String version 2.0 and later is supported on Elixir 1.17 or later only.

Enhancements

Replace ex_cldr with localize as the localization library.
Fix titlecasing of the letter i, including correct handling in Turkic languages.
Use Localize.Locale.best_match/3 for locale matching.
Fixes to the Unicode.Break module.
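To show what "correct handling in Turkic languages" refers to, the sketch below uses only Elixir's standard `String` module and its `:turkic` mode rather than the unicode_string API, so it says nothing about this library's own function signatures. In Turkish and Azerbaijani, dotted and dotless i are distinct letters, so the default i/I case pairing is wrong for those languages.

```elixir
# Standard-library illustration only; not the unicode_string API.

# Default mapping pairs "i" with "I".
default_upper = String.upcase("i")

# Turkic mapping pairs "i" with "İ" (U+0130, capital I with dot above)...
turkic_upper = String.upcase("i", :turkic)

# ...and "I" with "ı" (U+0131, dotless small i).
turkic_lower = String.downcase("I", :turkic)

IO.inspect({default_upper, turkic_upper, turkic_lower})
# prints {"I", "İ", "ı"}
```

Titlecasing a word that starts with i has to make the same choice for its first letter, which is why a locale is needed to get it right.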


18 Jan 18:41

kipcole9

Unicode String version 1.8.0

Enhancements

Updates to Unicode 17.0 data.


28 Mar 22:58

kipcole9

Unicode String version 1.7.0

Bug Fixes

Moves all regex compilation from compile time to runtime for compatibility with OTP 28. Performance implications are not yet known.


17 Mar 05:54

kipcole9

Unicode String version 1.6.0

Bug Fixes

Fix word break detection when a \p{word_break=extend} codepoint is preceded by a letter and followed by a letter.

Enhancements

Updated to CLDR 47 break rules and test data.


01 Jan 01:57

kipcole9

Unicode String version 1.5.0

Enhancements

Update to CLDR 46.1 segmentation data and tests.
Pass dialyzer with :underspecs flag set.


13 Mar 14:22

kipcole9

Unicode String version 1.4.1

Bug Fixes

Fix performance regression in Unicode.String.Break.next/4. Added the script bench/next.exs to allow for regression testing. Thanks to @mntns for the report. Closes #6.

Contributors

mntns


05 Mar 20:52

kipcole9

Unicode String version 1.3.1

Bug Fixes

Fix Unicode.String.split/2 and Unicode.String.next/2 when the passing rule is a :no_break rule. Thanks to @GregLMcDonald for the report. Closes #5.

Contributors

GregLMcDonald


27 Feb 02:30

kipcole9

Unicode String version 1.3.0

Bug Fixes

Fix case folding for codepoints that fold to themselves.

Enhancements

Adds case mapping functions Unicode.String.upcase/2, Unicode.String.downcase/2 and Unicode.String.titlecase/2. These functions implement the full Unicode Casing algorithm, including conditional mappings. They are locale-aware, and a locale can be specified as a string, an atom or a Cldr.LanguageTag, thereby providing basic integration between unicode_string and ex_cldr.
Case folding always follows the :full path, which allows mapping of single code points to multiple code points. There is no practical reason to implement the :simple path. As a result, the type parameter to Unicode.String.Case.Folding.fold/2 is no longer required or supported.
Support an ex_cldr Language Tag as a parameter to Unicode.String.Case.Folding.fold/2. In fact, any map that has a :language key whose value is an ISO 639-1 language code as a lowercase atom may be passed as a parameter.
