Babl throughput comparison

By Marti Maria | October 2, 2019

From time to time, I discover wonderful things like this:

GIMP 2.10 release notes

“GIMP now uses LittleCMS v2, which allows it to use ICC v4 color profiles. It also partially relies on the babl library for handling color transforms, since babl is simply up to 10 times faster than LCMS2 for the cases we tested both of them on. Eventually babl could replace LittleCMS in GIMP.”

OMG! Something seems very wrong with the Little CMS engine!! How can it be so slow despite all the optimizations it has internally? Should I plan a major rewrite of those parts?

OK, let’s face the problem. The first step in solving an issue is to accept it, then find the root cause, and finally dedicate time to it. So let’s write a small program to check the performance in terms of throughput, and find out how slow lcms2 really is.

The program does just a few things. It uses babl, lcms2, and lcms2 with the brand-new “fast-float plugin”. The latter is under GPL3 and promises some speed improvement… we will see.

I will try to measure the throughput of 8-bit transforms, which are by far the most used layout in any image editor, GIMP for example.

On the babl side, the program does this:

	static const Babl* openBablProfile(const char* profile_filename)
	{
		int len;
		const char* error = NULL;
		char* profile_data = readFile(profile_filename, &len);
		
		const Babl* profile = babl_space_from_icc(profile_data, len, 
							BABL_ICC_INTENT_RELATIVE_COLORIMETRIC, &error);
			
		if (profile == NULL)
			die(error);
		
		free(profile_data);
		
		return babl_format_with_space ("R'G'B' u8", profile);	
	}
	
	
	static void measure_babl(const char* profile_file_1, const char* profile_file_2)
	{
		clock_t clocks;
		
		babl_init();
	
		const Babl* space_1 = openBablProfile(profile_file_1);
		const Babl* space_2 = openBablProfile(profile_file_2);
		
		const Babl* transform = babl_fast_fish(space_1, space_2, "fast");
		...
		babl_process(transform, source, destination, PIXEL_COUNT);

As you can see, it creates two RGB spaces of 8 bits per component, and also allocates a “fish” that does the translation. These are equivalent to the lcms2 concepts of “profile” and “color transform”.

On the lcms2 side the code is very similar:

	static void measure_lcms2_raw(const char* profile_file_1, const char* profile_file_2,
	                              const char* who)
	{
		cmsHPROFILE hIn = cmsOpenProfileFromFile(profile_file_1, "r");
		cmsHPROFILE hOut = cmsOpenProfileFromFile(profile_file_2, "r");
	
		cmsHTRANSFORM xform = cmsCreateTransform(hIn, TYPE_RGB_8, hOut, TYPE_RGB_8,
		                                         INTENT_RELATIVE_COLORIMETRIC, 0);
		
		cmsCloseProfile(hIn);
		cmsCloseProfile(hOut);
		...
		cmsDoTransform(xform, source, destination, PIXEL_COUNT);

Since lcms2 can deal with profile files directly, I don’t need to load the profile into memory. Otherwise I’m using a standard color transform, with no particular hint about the speed/accuracy trade-off.
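Both listings elide the timing code behind the `...`. The following is only a minimal sketch of how such a measurement could be taken with the standard `clock()` function; it is not the code of the original test program. The names `transform_fn`, `copy_transform`, `time_transform` and `megapixels_per_second` are hypothetical, and the copy routine is just a stand-in for a real call to `babl_process()` or `cmsDoTransform()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: converts `pixels` 8-bit RGB pixels from src to dst. */
typedef void (*transform_fn)(const void* src, void* dst, size_t pixels);

/* Stand-in transform that just copies 3 bytes per pixel. A real run would
   call babl_process() or cmsDoTransform() here with the transform handle. */
static void copy_transform(const void* src, void* dst, size_t pixels)
{
	memcpy(dst, src, pixels * 3);
}

/* Calls fn `repeats` times and returns the processor time reported by
   clock(), in seconds. */
static double time_transform(transform_fn fn, const void* src, void* dst,
                             size_t pixels, int repeats)
{
	assert(fn != NULL && repeats > 0);

	clock_t start = clock();
	for (int i = 0; i < repeats; i++)
		fn(src, dst, pixels);

	return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Converts a pixel count and an elapsed time into megapixels per second. */
static double megapixels_per_second(size_t pixels, int repeats, double seconds)
{
	assert(seconds > 0.0);
	return ((double)pixels * (double)repeats) / (seconds * 1.0e6);
}
```

Timing many repetitions over the same buffers, rather than a single call, keeps the result above the resolution of `clock()`.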

The data used to measure the throughput is 8 bpc RGB, generated in a way that forces any cache to miss:

	0x00 0x01 0x02  |  0x03 0x04 0x05 | 0x06 0x07 0x08 | ...
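As an illustration, a buffer with this pattern could be filled as sketched below. The function name `fill_rgb8_ramp` is hypothetical, and wrapping back to 0x00 after 0xFF is an assumption, since only the first few values are shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills an interleaved 8-bit RGB buffer with consecutive byte values:
   0x00 0x01 0x02 | 0x03 0x04 0x05 | ...
   ASSUMPTION: the value wraps around to 0x00 after 0xFF. */
static void fill_rgb8_ramp(uint8_t* buffer, size_t pixel_count)
{
	assert(buffer != NULL);

	size_t byte_count = pixel_count * 3;   /* 3 channels per pixel */
	for (size_t i = 0; i < byte_count; i++)
		buffer[i] = (uint8_t)(i & 0xFF);
}
```

With this pattern every pixel differs from the one before it, so a cache that remembers only the last converted pixel never gets a hit.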

Alas, I found that the babl library can only handle matrix-shaper profiles, and in some cases V4 doesn’t work either. OK, let’s play the game and check it using two well-known V2 ICC profiles:

sRGB Color Space Profile.icm - This is the traditional sRGB V2 color space, present on every single computer in the world.
AdobeRGB1998.icc - Another V2 profile, broadly used.

Ok, so this is a really low profile (no pun intended) for our test. But anyway let’s check it out.

[Image: Throughput1]

Since this was measured on WSL running on a PC, I also tried a real Debian on a mini PC, with similar results:

[Image: Throughput2]

What a surprise! It turns out that lcms2 is actually not slow at all!

If you look at the output, on this low-end machine lcms2 takes 3.59 seconds versus 6.36 for babl. That is 77% faster, and this is with the raw lcms2, the one that comes with the MIT license and that you can use freely.

On the last line you can see the plug-in in action. Here the time drops to 1.7 seconds; it’s a huge 273% boost!
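The percentages read as throughput gains relative to babl's time. A small sketch of that arithmetic follows, assuming this interpretation; the helper name `percent_faster` is hypothetical.

```c
#include <assert.h>

/* Returns how much faster a run is than a baseline, in percent,
   computed as (baseline_time / time - 1) * 100. */
static double percent_faster(double baseline_seconds, double seconds)
{
	assert(baseline_seconds > 0.0 && seconds > 0.0);
	return (baseline_seconds / seconds - 1.0) * 100.0;
}
```

With the figures quoted above, `percent_faster(6.36, 3.59)` gives about 77%. With the rounded 1.7 seconds, `percent_faster(6.36, 1.7)` gives about 274%, so the 273% was presumably computed from the unrounded timing.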

While doing this test, I found that babl consistently gives results that are off by one digital count. lcms2 agrees with ColorSync and the Adobe engine (Photoshop). babl gives an offset of 1, even in “exact” mode. For example, where lcms2 returns (120, 122, 13), babl is likely to return (121, 123, 14). It is not visually important, but it looks like a round-off problem.
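A check of this kind can be sketched as a comparison of the two output buffers, channel by channel. This is only an illustration and not the code used for the test; the function name `max_channel_diff` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the largest absolute per-channel difference between two 8-bit
   buffers of the same length. A result of 1 means the outputs never differ
   by more than one digital count. */
static int max_channel_diff(const uint8_t* a, const uint8_t* b, size_t byte_count)
{
	assert(a != NULL && b != NULL);

	int max_diff = 0;
	for (size_t i = 0; i < byte_count; i++) {
		int diff = abs((int)a[i] - (int)b[i]);
		if (diff > max_diff)
			max_diff = diff;
	}
	return max_diff;
}
```

For the example pixel above, comparing (120, 122, 13) against (121, 123, 14) yields a maximum difference of 1.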

And finally, babl doesn’t like conversions where the source and destination profiles are the same. It may sound silly: who in the world would do such a conversion? The answer is that when you convert an image with an embedded profile to your workspace, you have no idea what the embedded profile will be. Many times images embed sRGB and the workspace is also sRGB.

[Image: Throughput3]

In the end, the claim did not hold, at least for 8 bits per sample, with the babl-dev packages Debian ships and with lcms2-2.11. I may be wrong, or maybe I missed something very important. Pardon me if this is the case. I would love to hear from other people who repeat the experiment, even with different code. Please contact me at marti.maria {at} littlecms {dot} com

Update:

My head kept buzzing with the thought that all of this might be wrong because I missed some important detail. What if their claim was about 16 bits per component? So I updated the test program to this one:

Here you can download the test2.c program

It is basically the same, but using 16 bits per component instead of 8. This is computationally more complex, and many optimizations that are valid for 8 bits can no longer be used. Anyway, the numbers follow the same trend.

[Image: Throughput2]

This is on WSL, which is indeed a very useful tool.

Update 2:

Finally found a combination where babl is faster! This is on 32-bit floating point, a rare situation for anybody but the most specialized photographers. But this can be solved easily by using the plug-in. In this case, the plug-in really shines!