GUADEC 2014

Sorry for the somewhat belated post. I attended GUADEC this summer in Strasbourg, France. This was my second GUADEC and it was again a great experience.

The keynotes were inspiring and the status updates on GNOME core components were really insightful. I particularly liked the Cheese, Documents/Photos, and GTK+ talks, as I learned a lot of new concepts, along with the ideas behind them. I also found the Python and GNOME talk thought-provoking.

I gave a talk on input methods. It mainly discussed smart input method features found on other platforms, while exploring ways of improving the current architecture. Afterwards, I had nice chats with Aron and William, who work in the same area.

On the BoF days, I had planned an I18N BoF, which ended up as a random hacking/chatting session. Sorry about that, but it was fun and there were interesting discussions about translation tools and spell checking.

Thanks to the GUADEC organizers for the great event. Thanks to the GNOME Foundation for sponsoring my trip.

GNOME.Asia Summit 2014

Sorry for the somewhat belated post. Last month I attended GNOME.Asia, held in Beijing this year in conjunction with FUDCon APAC. The conference was perfectly organized and I had a lot of fun with the enthusiastic people. Congratulations to the organizers on the successful event.

Photo by Rye Yao, licensed under CC BY-NC-SA 2.0

The conference started with keynotes from both the GNOME and Fedora sides, which gave me a nice overview of what’s happening/planned in those projects. The talks by the GNOME core developers were great. Allan presented the UI design for access control of sandboxed applications and Lennart talked about the systemd development plan.
I had a chance to meet a couple of speakers from Indonesia and found the geographical features and the culture very interesting from the I18N point of view. Anish and I gave a talk about input methods. He discussed the user interface, focusing on text prediction, and I talked about implementation matters (slides).
Oh, and there was a mention that Japan could be a potential host of a GNOME event. Luckily, an expert was there, so we can expect it to happen in the near future ;-)

Even after the conference we had a fun time at the speaker dinner, a one-day trip to the Great Wall, a Beijing GNU/Linux user group meetup, and a visit to the Red Hat Beijing office. Thanks to all the people I met.

Thanks to the GNOME Foundation for sponsoring my travel and accommodation.

Native “.desktop” file support in gettext

The “.desktop” file format is simple, but it is a pain for translation tools such as gettext, which assume that every output file (MO file, Java .properties file, etc.) is generated for a single locale.

A translated .desktop file looks like:

[Desktop Entry]
Type=Application
Name=foo
Name[fr]=(French translation of "foo")
Name[de]=(German translation of "foo")
...

As you can see, a single file mixes translations for multiple locales. This does not fit naturally into the current xgettext/msgfmt workflow. Traditionally, application developers have had to use a wrapper script (aka intltool) to deal with this.

On the other hand, there has been a long-standing request to support the .desktop file format in gettext, to eliminate the extra dependency. So, I decided to give it a try. With the current patch set, you can extract translatable strings with xgettext as usual:

$ xgettext yourapp.desktop -o yourapp.pot

This is not much different from intltool-extract, but xgettext properly interprets character escapes (\s, \n, \t, \r) and lists (foo;bar;baz;). This could prevent translation errors such as a missing ‘;’ at the end of a list value.
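To make the escape and list handling concrete, here is a minimal Python sketch of how such values can be decoded. It is not the code in the patch set; the function names unescape and split_list are made up for illustration, and the only escapes handled are the ones listed above plus the backslash itself and ‘\;’ inside lists.

```python
# Hypothetical helpers for illustration; not part of gettext.
ESCAPES = {"s": " ", "n": "\n", "t": "\t", "r": "\r", "\\": "\\"}


def unescape(value):
    """Decode \\s, \\n, \\t, \\r and \\\\ in a .desktop string value."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in ESCAPES:
            out.append(ESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_list(value):
    """Split a list value such as 'foo;bar;baz;' into unescaped items."""
    items, current = [], []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            if value[i + 1] == ";":
                current.append(";")  # '\;' is a literal semicolon
            else:
                current.append(ch + value[i + 1])  # left for unescape()
            i += 2
        elif ch == ";":
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        # Tolerate a missing trailing ';' when reading.
        items.append("".join(current))
    return [unescape(item) for item in items]
```

Because the translator then works on individual items rather than on the raw value, the trailing ‘;’ can be added back mechanically when the file is written.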

Then, you can merge translations back to a .desktop file with msgfmt, one locale at a time:

$ msgfmt --desktop --template=yourapp.desktop.in --locale=fr -o fr.desktop fr.po
$ msgfmt --desktop --template=fr.desktop --locale=de -o de.desktop de.po
...
$ msgfmt --desktop --template=yo.desktop --locale=zh -o yourapp.desktop zh.po

This could be inefficient, as it requires writing the entire file for each locale, while intltool-merge processes multiple locales at once. However, I don’t think it will be an issue, since .desktop files are usually small.

I’d like to land this in the next release, so GNOME3 applications can be localized without intltool (you might remember that gettext recently got support for GtkBuilder and GSettings schemas). I’ve created a sample project using those features, based on the GTK+ application tutorial.
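Conceptually, each invocation takes the previous output as its template and adds the entries for one more locale. The following Python sketch mimics that chaining on a list of lines. It is only an illustration and not how msgfmt is implemented; merge_locale and TRANSLATABLE_KEYS are hypothetical names, the set of translatable keys is an assumption, and the translated strings are placeholders.

```python
# Assumption for this sketch: only these keys carry translatable values.
TRANSLATABLE_KEYS = ("Name", "GenericName", "Comment")


def merge_locale(lines, locale, translations):
    """Return a copy of `lines` with `Key[locale]=...` entries added.

    `translations` maps an untranslated value (msgid) to its translation
    (msgstr) for `locale`. Strings without a translation are left alone.
    """
    out = []
    for line in lines:
        out.append(line)
        key, sep, value = line.partition("=")
        if sep and key in TRANSLATABLE_KEYS and value in translations:
            out.append(f"{key}[{locale}]={translations[value]}")
    return out


template = ["[Desktop Entry]", "Type=Application", "Name=foo"]
catalogs = {
    "fr": {"foo": "foo-in-French"},   # placeholder translations
    "de": {"foo": "foo-in-German"},
}

result = template
for locale, catalog in catalogs.items():
    # One pass per locale; each pass rewrites the whole file.
    result = merge_locale(result, locale, catalog)
```

In this sketch the entry added last ends up directly after the untranslated key, so the order of the localized lines differs from the order of the passes; that does not matter for readers of the file.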

libtextinput, a minimal input method framework, as a library

Lately, I’ve been working on a new input method framework called libtextinput. Well, actually, I was doing some experiments to make ibus-daemon lighter, but later I decided to write a new one with my own design. Yes, I know I’m reinventing the wheel, but the design is quite different from other frameworks:

  • It is not a session service, but a library, so it can be directly linked to a compositor, etc. The API is modelled after the Wayland input-method protocol and much simpler than other frameworks. The main difference is that there is no concept of “input context”, so clients can directly interact with IMEs.
  • While it mainly supports in-process IMEs, it also supports IMEs running as a separate process. This feature is particularly useful for heavy IMEs like Chinese Intelligent Pinyin or Japanese Kana Kanji conversion. Out-of-process IMEs are automatically restarted after a crash.

For more details, see the overview and TiEngineManager chapters in the reference (still terse though). It is still at an early stage of development, but a demo program has just started working:


If you are unable to play the movie with your browser, you can download it from here (WebM, 3.1MB).

ucd-substr

A few days ago, we on the RH i18n team had a lightning talk session using a TV conference system. Unfortunately, the system was non-free and not privacy-aware, so I presented the lowest-priority topic among my public todo items: a data format which efficiently represents the Unicode character database (UnicodeData.txt, note: 1.4MB) while providing flexible search functionality. Although similar libraries already exist, few of them provide partial keyword matching.

I showed a simple algorithm using two suffix arrays, along with the size estimates. Today, I’ve prototyped it in Python as mental gymnastics. For those who might be interested, here is the code (and also slightly modified slides).

It can be used like this:

$ ./build.py UnicodeData.txt

$ du -ah names.* words.*
208K	names.data
72K	names.id
284K	names.sa
100K	words.data
32K	words.id
204K	words.sa

$ ./search.py PROLO
KATAKANA-HIRAGANA PROLONGED SOUND MARK
HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK

$ ./search.py 'OF P'
SYRIAC END OF PARAGRAPH
SLICE OF PIZZA
PILE OF POO
END OF PROOF
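For readers unfamiliar with the technique, here is a much simplified Python sketch of substring search with a suffix array. It is not the prototype linked above: it uses a single in-memory suffix array over the concatenated names instead of two on-disk arrays, the construction is naive, and the names build_index and search are made up for this example.

```python
def build_index(names):
    """Concatenate names with newline separators and sort suffix offsets."""
    text = "\n".join(names) + "\n"
    # Naive construction that compares whole suffixes; fine for a handful
    # of names, far too slow for the full UnicodeData.txt.
    suffixes = sorted(range(len(text)), key=lambda i: text[i:])
    return text, suffixes


def search(text, suffixes, query):
    """Return the names containing `query`, in their original order.

    Assumes `query` contains no newline.
    """
    # Binary search for the first suffix whose prefix is >= query.
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffixes[mid]
        if text[start:start + len(query)] < query:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes starting with `query` are adjacent from here on.
    hits = set()
    while lo < len(suffixes) and text.startswith(query, suffixes[lo]):
        pos = suffixes[lo]
        begin = text.rfind("\n", 0, pos) + 1  # start of the enclosing name
        end = text.index("\n", pos)           # end of the enclosing name
        hits.add((begin, text[begin:end]))
        lo += 1
    return [name for _, name in sorted(hits)]


names = [
    "SLICE OF PIZZA",
    "PILE OF POO",
    "END OF PROOF",
    "KATAKANA-HIRAGANA PROLONGED SOUND MARK",
    "LATIN SMALL LETTER A",
]
text, suffixes = build_index(names)
print(search(text, suffixes, "OF P"))
```

Because every suffix is indexed, a query can start anywhere inside a name, which is what makes partial keyword matching possible.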

DSO experiment

At the GNU 30th meeting, I was doing some experiments toward a lightweight input method architecture.

IBus is based on an “everything is a process” model, where each engine runs as a separate process. This is good for security and also helps developers prototype IMEs in their favourite programming language, like Python. On the other hand, the approach has a potential drawback: performance. Handling a single key event requires context switches between processes and D-Bus IPC. As mentioned at the input method BOF at GUADEC, the performance penalty could be significant under Wayland, as more processes are involved: application, compositor, protocol translator (ibus-wayland), ibus-daemon, and engine.

So, basically, the idea is to reduce the number of processes. For IBus, given that almost all major IMEs have been ported to C, it should be possible to load them as DSOs instead of spawning them as separate processes.

Here’s the code for this experiment, called gisl (g* input source loader).

As noted in the README, the engine binary needs to be linked as a PIE (Position Independent Executable) and export a stub function. The engine can then be called through the simple API of gisl, as follows:

#include <stdlib.h>   /* for exit() */
#include <gisl/gisl.h>

static void
commit_text_cb (GislInputSource *source, const gchar *text)
{
  g_print ("Got commit_text ('%s')\n", text);
}

int
main (int argc, char **argv)
{
  GislLoader *loader;
  GislInputSource *source;
  GError *error;

  error = NULL;
  loader = gisl_loader_new ("/usr/libexec/ibus-engine-enchant", &error);
  if (!loader)
    {
      g_printerr ("Cannot load ibus-engine-enchant: %s\n",
                  error->message);
      g_error_free (error);
      exit (1);
    }

  error = NULL;
  source = gisl_loader_create_input_source (loader, "enchant", &error);
  if (!source)
    {
      g_printerr ("Cannot create enchant input source: %s\n",
                  error->message);
      g_error_free (error);
      exit (1);
    }

  g_signal_connect (source, "commit-text",
                    G_CALLBACK (commit_text_cb), NULL);

  g_print ("Calling focus_in\n");
  gisl_input_source_focus_in (source);

  g_print ("Calling process_key_event ('a')\n");
  gisl_input_source_process_key_event (source, 0x61, 38, 0);

  g_print ("Calling process_key_event ('\\n')\n");
  gisl_input_source_process_key_event (source, 0xff0d, 36, 0);

  g_object_unref (source);
  g_object_unref (loader);

  return 0;
}

Note that the engine binary itself still works with IBus. Also, the API is not IBus-specific, though it currently only supports IBus engines.

I don’t know where this project is going, but I’ll show some benchmark results with a complete IM example in the next post.

GHM 2013

I’m just back home after four weeks in Europe. The stay ended with the 7th GHM (GNU Hackers’ Meeting) at IRILL in Paris, France. As usual, the meeting was cosy and warm, but this time quite a few people (around 40) attended. The topics ranged from kernel (Hurd), compiler (GDC), distribution (Guix), network protocol (GNUnet), accessibility, Emacs, and statistics (PSPP) to free software activism (April), and all the talks were very interesting. Also, as a newbie GNU maintainer, I had some good discussions there, including one with Patrice Dumas on translating texinfo files using the Texinfo XML format.

Luca preparing the opening

Thanks to the organizers, speakers, and attendees. Thanks again to the GNOME Foundation for allowing me to attend another conference with the GUADEC travel sponsorship.

GUADEC 2013

From August 1 to 8, I attended GUADEC for the first time. The conference overall was lively and quite different from anything I’ve attended in the past. In the first four core days, there were a lot of talks from the core developers and contributors. I especially enjoyed the security talk by Stef Walter, about his efforts to minimize the distraction caused by security questions.

Brno city tour after day 4

Aside from the talks, there were several BOFs (including Wayland, accessibility, input methods, open fonts) where I had plenty of good discussions, which resulted in finally landing the Wayland patch for IBus (though it is still a technology preview and not enabled by default), GSettings schema support for gettext, etc.

After the conference, I have been staying in the city for one more week, working from the Red Hat Brno office and discussing input methods on Wayland further. Thanks to Rui and Giovanni for helping with my stay.

I’ll take a vacation from this weekend and move on to Prague, Paris, and then Tokyo, but will definitely miss the quiet atmosphere and quality beers in Brno.

Thanks to the GNOME Foundation for the travel sponsorship.

gettext 0.18.3 released

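gettext is mature software, but half a year after a release new features do pile up, so it is time to release a new version. By coincidence, the previous release fell on Christmas, and today is Tanabata.

The highlight of this release is improved programming language support. gettext calls can now be extracted (fairly) accurately from Glade 3, JavaScript, Lua, and Vala source code. In addition, when generating MO files from PO files, it is now possible to check the advanced format strings commonly used in Python 3.

Other changes include improvements to msgfmt and msgattrib, a fix for a setlocale() bug on Mac OS X, a fix for cross-compiling for mingw (not mingw-w64), and, on the less visible side, better compatibility with newer automake and support for running the tests in parallel. See the announcement for details.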