Something to consider is rerunning xgettext with different parameters in case it fails. The xgettext manual says:
By default the input files are assumed to be in ASCII.
Sometimes this will lead to incorrect results (or no results at all), and xgettext might need to be rerun with a different option. One example where xgettext fails is util-linux/fdisk.c from a recent BusyBox:
$ xgettext --omit-header --extract-all --no-wrap fdisk.c
xgettext: Non-ASCII string at fdisk.c:333.
Please specify the source encoding through --from-code.
The culprit here is actually this sequence:
"\x80" "Old Minix", /* Minix 1.4a and earlier */
where xgettext thinks this might be some UTF-8 character (but, of course, it is not a valid sequence). No output file is generated in this case.
https://git.busybox.net/busybox/tree/util-linux/fdisk.c?h=1_35_stable
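To illustrate why the byte behind "\x80" cannot be treated as UTF-8, here is a minimal Python sketch. The helper name is_valid_utf8 is made up for this example, and the check only mirrors what a strict UTF-8 decoder does; it is not xgettext's own logic.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The string from fdisk.c after the C escape has been interpreted:
# 0x80 is a UTF-8 continuation byte and cannot start a character.
minix_label = b"\x80" b"Old Minix"

print(is_valid_utf8(minix_label))          # False
print(is_valid_utf8("ë".encode("utf-8")))  # True
```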
Another example is the attached file (lineedit.c from BusyBox, zipped) where I have replaced a string on line 893.
$ xgettext --omit-header --extract-all --no-wrap lineedit.c
xgettext: Non-ASCII string at lineedit.c:893.
Please specify the source encoding through --from-code.
and no output file will be created.
When using the --from-code parameter the string will not be correctly extracted, but an output file will be created:
$ xgettext --omit-header --extract-all --no-wrap --from-code=UTF-8 lineedit.c
lineedit.c:442: warning: internationalized messages should not contain the '\r' escape sequence
lineedit.c:893: warning: The following msgid contains non-ASCII characters.
This will cause problems to translators who use a character encoding
different from yours. Consider using a pure ASCII msgid instead.
ë
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
It is not ideal, but better than getting no data at all. This could use some refinement.
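The retry idea could look roughly like the following Python sketch. It assumes that xgettext exits with a non-zero status when it refuses a file (which matches the "no output file" behaviour above, but I have not verified it for every error case), and it uses --output=- to capture the result on stdout. The function name extract_strings and the run parameter are made up for this example.

```python
import subprocess

# Options taken from the commands shown above.
BASE_ARGS = ["xgettext", "--omit-header", "--extract-all", "--no-wrap"]


def extract_strings(path, run=subprocess.run):
    """Run xgettext on path; on failure, retry with --from-code=UTF-8.

    Returns (extra_options_used, raw_output), or (None, b"") if every
    attempt failed. Assumption: a non-zero exit status means failure.
    """
    attempts = [[], ["--from-code=UTF-8"]]
    for extra in attempts:
        # "--output=-" makes xgettext write the result to standard output.
        cmd = BASE_ARGS + extra + ["--output=-", path]
        result = run(cmd, capture_output=True)
        if result.returncode == 0:
            return extra, result.stdout
    return None, b""
```

Calling extract_strings("lineedit.c") would first try the default (ASCII) run and only fall back to --from-code=UTF-8 if that fails. The list of attempts is the obvious place to add other encodings or to skip the fallback for languages where --from-code has no effect.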
Please note that --from-code does not apply to all languages, according to the xgettext manual:
--from-code=NAME
encoding of input files (except for Python, Tcl, Glade)