2020-03-27
Unicode 13 support ([#179]).
No longer report zero width for category Sk ([#167]).
cmake support improvements ([#173]).
2019-05-10
Unicode 12.1 support ([#156]).
New -DUTF8PROC_INSTALL=No option for cmake builds to disable installation ([#152]).
Better make support for HP-UX ([#154]).
Fixed incorrect UTF8PROC_VERSION_MINOR version number in header and bumped shared-library version.
2019-03-30
Unicode 12 support ([#148]).
New function utf8proc_unicode_version to return the supported Unicode version ([#151]).
Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 ([#150]).
Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters ([#149]).
Various build-system improvements ([#141], [#142], [#147]).
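The simplified width rule in this release (width 2 for East-Asian wide characters, width 1 for every other printable character) can be illustrated with a minimal sketch. This is not the library's implementation: the names sketch_range, sketch_wide_ranges, and sketch_charwidth are hypothetical, and the range table is a small hand-picked sample, whereas a real table would be generated from the Unicode data files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive codepoint range, used only for this sketch. */
typedef struct {
    int32_t first;
    int32_t last;
} sketch_range;

/* A small sample of wide/fullwidth ranges; deliberately incomplete. */
static const sketch_range sketch_wide_ranges[] = {
    { 0x1100, 0x115F },  /* Hangul Jamo initial consonants */
    { 0x3041, 0x3096 },  /* Hiragana letters */
    { 0x4E00, 0x9FFF },  /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },  /* Hangul Syllables */
    { 0xFF01, 0xFF60 }   /* Fullwidth ASCII variants */
};

/* Returns 2 if c falls in one of the sample ranges, 1 otherwise.
   Assumes the caller has already established that c is printable;
   control and combining characters are not handled here. */
int sketch_charwidth(int32_t c)
{
    size_t i;
    size_t n = sizeof sketch_wide_ranges / sizeof sketch_wide_ranges[0];

    for (i = 0; i < n; i++) {
        if (c >= sketch_wide_ranges[i].first && c <= sketch_wide_ranges[i].last)
            return 2;
    }
    return 1;
}
```

A linear scan keeps the sketch short; with a complete, sorted table a binary search would be the more natural choice.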
2018-07-24
Unicode 11 support ([#132] and [#140]).
utf8proc_NFKC_Casefold convenience function for NFKC_Casefold normalization ([#133]).
UTF8PROC_STRIPNA option to strip unassigned codepoints ([#133]).
Support building static libraries on Windows (callers need to #define UTF8PROC_STATIC) ([#123]).
cmake fix to avoid defining UTF8PROC_EXPORTS globally ([#121]).
toupper of ß (U+00DF) now yields ẞ (U+1E9E) ([#134]), similar to musl; case-folding still yields the standard "ss" mapping.
utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and for unassigned/PUA codepoints ([#135]).
2018-04-27
Fixed composition bug ([#128]).
Minor build fixes ([#94], [#99], [#113], [#125]).
2016-12-26:
New functions utf8proc_map_custom and utf8proc_decompose_custom to allow user-supplied transformations of codepoints, in conjunction with other transformations ([#89]).
New function utf8proc_normalize_utf32 to apply normalizations directly to UTF-32 data (not just UTF-8) ([#88]).
Fixed stack overflow that could occur due to incorrect definition of UINT16_MAX with some compilers ([#84]).
Fixed conflict with stdbool.h in Visual Studio ([#90]).
Updated font metrics to use Unifont 9.0.04.
2016-07-27:
Move -Wmissing-prototypes warning flag from Makefile to .travis.yml since MSVC does not understand this flag and it is occasionally useful to build using MSVC through the Makefile ([#79]).
Use a different variable name for a nested loop in bench/bench.c, and declare it in a C89 way rather than inside the for to avoid "error: 'for' loop initial declarations are only allowed in C99 mode" ([#80]).
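The C89-style loop declaration mentioned above can be sketched as follows. This is an illustrative function, not the code from bench/bench.c: sum_matrix and its parameters are hypothetical. Both counters are declared at the top of the block, and the nested loop uses its own variable name.

```c
#include <stddef.h>

/* Sums a rows x cols matrix stored in row-major order. */
long sum_matrix(const int *data, size_t rows, size_t cols)
{
    size_t i, j;      /* declared up front, as C89 requires */
    long total = 0;

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {   /* distinct name for the inner loop */
            total += data[i * cols + j];
        }
    }
    return total;
}
```

Writing `for (size_t i = 0; ...)` instead is what triggers the quoted error when a compiler is run in C89 mode.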
2016-07-13:
Bug fix in utf8proc_grapheme_break_stateful ([#77]).
Tests now use versioned Unicode files, so they will no longer break when a new version of Unicode is released ([#78]).
2016-07-13:
Updated for Unicode 9.0 ([#70]).
New utf8proc_grapheme_break_stateful to handle the complicated grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break is still provided, but may incorrectly identify grapheme breaks in some Unicode-9 sequences.
Smaller Unicode tables ([#62], [#68]). This required changes in the utf8proc_property_t structure, which breaks backward compatibility if you access this struct directly. The functions in the API remain backward-compatible, however.
Buffer overrun fix ([#66]).
2015-11-02:
Do not export symbol for internal function unsafe_encode_char() ([#55]).
Install relative symbolic links for shared libraries ([#58]).
Enable and fix compiler warnings ([#55], [#58]).
Add missing files to make clean ([#58]).
2015-07-06:
Updated for Unicode 8.0 ([#45]).
New utf8proc_tolower and utf8proc_toupper functions, portable replacements for towlower and towupper in the C library ([#40]).
Don't treat Unicode "non-characters" as invalid, and improved validity checking in general ([#35]).
Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t, to avoid collisions with other libraries ([#32]).
Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
Fix build breakage in the benchmark routines.
More fine-grained Makefile variables (PICFLAG etcetera), so that compilation flags can be selectively overridden, and in particular so that CFLAGS can be changed without accidentally eliminating necessary flags like -fPIC and -std=c99 ([#43]).
Updated character-width tables based on Unifont 8.0.01 ([#51]) and the Unicode 8 character categories ([#47]).
2015-03-28:
Updated for Unicode 7.0 ([#6]).
New function utf8proc_grapheme_break(c1,c2) that returns whether there is a grapheme break between c1 and c2 ([#20]).
New function utf8proc_charwidth(c) that returns the number of column-positions that should be required for c; essentially a portable replacement for wcwidth(c) ([#27]).
New function utf8proc_category(c) that returns the Unicode category of c (as one of the constants UTF8PROC_CATEGORY_xx). Also, a function utf8proc_category_string(c) that returns the Unicode category of c as a two-character string.
cmake script CMakeLists.txt, in addition to Makefile, for easier compilation on Windows ([#28]).
Various Makefile improvements: a make check target to perform tests ([#13]), make install, a rule to automate updating the Unicode tables, etcetera.
The shared library is now versioned (e.g. has a soname on GNU/Linux) ([#24]).
C++/MSVC compatibility ([#17]).
Most #defined constants are now enums ([#29]).
New preprocessor constants UTF8PROC_VERSION_MAJOR, UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time detection of the API version.
Doxygen-formatted documentation ([#29]).
The Ruby and PostgreSQL plugins have been removed due to lack of testing ([#22]).
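The version constants introduced in this release allow compile-time checks along the following lines. This is only a sketch: real code would include utf8proc.h to obtain the constants, the fallback values below are arbitrary placeholders so the example compiles on its own, and SKETCH_VERSION_AT_LEAST and sketch_meets_minimum are hypothetical helpers, not part of the library.

```c
/* Placeholder values, used only when the real header has not been
   included. They are arbitrary and do not describe any release. */
#ifndef UTF8PROC_VERSION_MAJOR
#define UTF8PROC_VERSION_MAJOR 2
#define UTF8PROC_VERSION_MINOR 0
#define UTF8PROC_VERSION_PATCH 0
#endif

/* Hypothetical helper: true when the API version is at least maj.min. */
#define SKETCH_VERSION_AT_LEAST(maj, min)                     \
    (UTF8PROC_VERSION_MAJOR > (maj) ||                        \
     (UTF8PROC_VERSION_MAJOR == (maj) &&                      \
      UTF8PROC_VERSION_MINOR >= (min)))

/* Returns 1 if the headers satisfy a hypothetical minimum of 1.2. */
int sketch_meets_minimum(void)
{
#if SKETCH_VERSION_AT_LEAST(1, 2)
    return 1;
#else
    return 0;
#endif
}
```

Because the comparison uses only integer constants, it works inside #if directives, so code for newer API features can be excluded entirely when building against older headers.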
2013-11-27:
c language name)
2009-08-20:
RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for ruby1.9 compatibility (and #define them, if not existent)
2009-10-02:
2009-10-08:
2009-10-16:
2009-06-14:
2009-08-19:
README file
2008-10-04:
utf8proc_version returning a string containing the version number of the library.
libutf8proc.dylib for MacOSX.
2009-05-01:
PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
2007-07-25:
2007-06-25:
unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
2007-07-22:
utf8proc_codepoint_valid to the C library.
Makefile from -g -O0 to -O2
utf8proc_data.c file, is now included in the distribution.
2007-03-16:
String#utf8chars).
2006-09-21:
Integer#utf8, which raises an exception, if the given code-point is invalid because of being too high (this was missing yet)
2006-12-26:
2006-09-20:
Release of version 1.0.1
2006-09-17:
LUMP option, which lumps certain characters together (see lump.md) (also used for the PostgreSQL unifold function)
STRIPMARK option, which strips marking characters (or marks of composed characters)
String#char_ary in favour of String#utf8chars
2006-07-18:
2006-08-04:
CHARBOUND)
String#chars, which is returning an array of UTF-8 encoded grapheme clusters
NLF2LF transformation in postgresql unifold function
DECOMPOSE option, if you neither use COMPOSE or DECOMPOSE, no normalization will be performed (different from previous versions)
2006-06-05:
2006-06-20:
2006-06-02: initial release of version 0.1