xgettext: rerun with UTF-8 encoding and/or properly process failures

Something to consider is to rerun `xgettext` with different parameters in case it fails. The xgettext manual says:

```
By default the input files are assumed to be in ASCII.
```

Sometimes this will lead to incorrect results (or no results at all) and `xgettext` might need to be rerun with a different option. One example where it fails is `util-linux/fdisk.c` from a recent BusyBox:

```
$ xgettext --omit-header --extract-all --no-wrap fdisk.c
xgettext: Non-ASCII string at fdisk.c:333.
          Please specify the source encoding through --from-code.
```

The culprit here is actually this sequence:

```
    "\x80" "Old Minix",        /* Minix 1.4a and earlier */
```

where `xgettext` thinks this might be some UTF-8 character (but, of course, it is not a valid sequence). No output file is generated in this case.

https://git.busybox.net/busybox/tree/util-linux/fdisk.c?h=1_35_stable

Another example is the attached file (`lineedit.c` from BusyBox, zipped) where I have replaced a string on line 893.

```
$ xgettext --omit-header --extract-all --no-wrap lineedit.c
xgettext: Non-ASCII string at lineedit.c:893.
          Please specify the source encoding through --from-code.
```

and no output file will be created.

When using the `--from-code` parameter, the string is still not extracted correctly, but an output file is created:

```
$ xgettext --omit-header --extract-all --no-wrap --from-code=UTF-8 lineedit.c
lineedit.c:442: warning: internationalized messages should not contain the '\r' escape sequence
lineedit.c:893: warning: The following msgid contains non-ASCII characters.
                         This will cause problems to translators who use a character encoding
                         different from yours. Consider using a pure ASCII msgid instead.
                         ë
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
lineedit.c:893: invalid multibyte sequence
```

It is not ideal, but better than getting no data at all. This could use some refinement.

Please note that `--from-code` does not apply to all languages, according to the `xgettext` manual:

```
       --from-code=NAME
              encoding of input files (except for Python, Tcl, Glade)
```
