js' blog

How @""-literals work
Created: 02.08.2013 10:54 UTC

I was recently asked how @""-literals work in the runtime and in ObjFW. The answer turned into a lengthy explanation that I thought might be worth blogging.

Whenever the compiler finds a @""-literal like @"foo", it first creates a struct for that string which looks like this:

static struct {
	Class isa;
	const char *string;
	unsigned int length;
} constant_string_1 = { Nil, "foo", 3 };

This is basically an object which is not allocated on the heap, but resides in the .data section and does not have its isa pointer set yet. As the isa pointer needs to be set correctly, some initialization is required by the runtime, for which the compiler creates another struct which looks like this:

static struct {
	const char *class_name;
	id instances[3];
} static_instances = {
	"OFConstantString",
	{
		(id)&constant_string_1,
		(id)&constant_string_2,
		NULL
	}
};

A pointer to static_instances is then stored in the symtab. The symtab contains pointers to all selectors used in the compilation unit, pointers to the structs for all classes and all categories implemented in the compilation unit and pointers to all lists of static instances in the compilation unit, together with the number of classes (cls_def_cnt) and the number of categories (cat_def_cnt). All pointers except the pointers to the selectors are stored in a single array. The first class is at index 0, the first category at index cls_def_cnt and the first list of static instances at index cls_def_cnt + cat_def_cnt. Of course, there can be multiple lists of static instances, as not all static instances need to be of the same class, but in practice, they usually are. After the last list of static instances, the array is terminated with a NULL pointer.
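The indexing described above can be sketched in plain C. This is only an illustration that follows the layout as described in this post: the struct and function names (sketch_symtab, sketch_static_instances, sketch_count_static_instances), the fixed array sizes and the field types are assumptions made for the example, and the selector pointers are left out. The real declarations live in runtime-private.h and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one list of static instances. */
struct sketch_static_instances {
	const char *class_name;
	void *instances[3];	/* NULL-terminated */
};

/* Hypothetical, simplified symtab; selector pointers are omitted. */
struct sketch_symtab {
	uint16_t cls_def_cnt;	/* number of classes */
	uint16_t cat_def_cnt;	/* number of categories */
	/* classes, then categories, then static instance lists, then NULL */
	void *defs[4];
};

/* Counts the static instances in all lists of a symtab. */
static size_t
sketch_count_static_instances(struct sketch_symtab *symtab)
{
	size_t count = 0;
	/* The first list comes right after all classes and categories. */
	void **list = &symtab->defs[symtab->cls_def_cnt + symtab->cat_def_cnt];

	for (; *list != NULL; list++) {
		struct sketch_static_instances *si = *list;

		for (void **obj = si->instances; *obj != NULL; obj++)
			count++;
	}

	return count;
}
```

With one class, one category and one list holding two strings, the first list is found at index 2 and the function counts two instances.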

This symtab is then pointed to by the module struct, which is a struct generated for every compilation unit that uses Objective-C. The module struct also contains the ABI version for the structs it references, the name of the compilation unit (this is usually unused) and its size.

Finally, the compiler emits a function that is declared with __attribute__((constructor)) which looks like this:

static void
init(void)
{
	__objc_exec_class(&module);
}
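The constructor mechanism can be tried in isolation with a small sketch. Everything here is made up for illustration: sketch_module only mirrors the fields mentioned above (field order, types and the version value are assumptions, and "its size" is assumed to mean the size of the module struct), and sketch_exec_class is a stand-in that merely records the module instead of doing what __objc_exec_class does. The only real feature used is __attribute__((constructor)), a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical, simplified module struct; layout is assumed. */
struct sketch_module {
	unsigned long version;	/* ABI version of the referenced structs */
	unsigned long size;	/* assumed: size of this struct */
	const char *name;	/* name of the compilation unit */
	void *symtab;		/* would point to the symtab */
};

static struct sketch_module sketch_this_module = {
	1,	/* placeholder, not a real ABI version */
	sizeof(struct sketch_module),
	"example.m",
	NULL
};

/* Stand-in for the runtime side: just remembers what got registered. */
static const struct sketch_module *sketch_registered[8];
static size_t sketch_registered_count;

static void
sketch_exec_class(const struct sketch_module *module)
{
	if (sketch_registered_count < 8)
		sketch_registered[sketch_registered_count++] = module;
}

/* Runs before main(), like the init() function the compiler emits. */
__attribute__((constructor))
static void
sketch_init(void)
{
	sketch_exec_class(&sketch_this_module);
}
```

By the time main() starts, the module has already been handed to the stand-in runtime function, which is why no explicit registration call is needed in user code.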

__objc_exec_class is a function of the runtime that parses the module, the symtab, the classes etc. and of course also the static instances. It keeps a list of all static instances for which it could not resolve the class yet, as it is possible that a compilation unit containing static instances of class Foo is registered before the compilation unit containing class Foo. Each time a new class is registered, the runtime checks that list to see if there are static instances whose class just became available and initializes them if that's the case. Eventually, all compilation units have been registered and the runtime has resolved all classes for the static instances, so that all static instances are valid objects.
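The deferred resolution can be sketched roughly as follows. This is not the code of the runtime: the names, the fixed-size tables and the linear lookup by class name are all assumptions chosen to keep the example short. It only shows the idea of parking lists whose class is not known yet and setting the isa pointers once that class gets registered.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX 16

struct sketch_class { const char *name; };
struct sketch_object { struct sketch_class *isa; };
struct sketch_static_instances {
	const char *class_name;
	struct sketch_object *instances[3];	/* NULL-terminated */
};

static struct sketch_class *sketch_classes[SKETCH_MAX];
static size_t sketch_class_count;
static struct sketch_static_instances *sketch_pending[SKETCH_MAX];
static size_t sketch_pending_count;

/* Sets the isa pointer of every object in the list. */
static void
sketch_set_isa(struct sketch_static_instances *si, struct sketch_class *cls)
{
	for (struct sketch_object **obj = si->instances; *obj != NULL; obj++)
		(*obj)->isa = cls;
}

/* Initializes the list now if its class is known, else defers it. */
static int
sketch_register_static_instances(struct sketch_static_instances *si)
{
	for (size_t i = 0; i < sketch_class_count; i++) {
		if (strcmp(sketch_classes[i]->name, si->class_name) == 0) {
			sketch_set_isa(si, sketch_classes[i]);
			return 0;
		}
	}

	if (sketch_pending_count == SKETCH_MAX)
		return -1;

	sketch_pending[sketch_pending_count++] = si;
	return 0;
}

/* Registers a class and initializes all lists that waited for it. */
static int
sketch_register_class(struct sketch_class *cls)
{
	size_t i = 0;

	if (sketch_class_count == SKETCH_MAX)
		return -1;

	sketch_classes[sketch_class_count++] = cls;

	while (i < sketch_pending_count) {
		if (strcmp(sketch_pending[i]->class_name, cls->name) == 0) {
			sketch_set_isa(sketch_pending[i], cls);
			/* Fill the gap with the last pending entry. */
			sketch_pending[i] = sketch_pending[--sketch_pending_count];
		} else
			i++;
	}

	return 0;
}
```

Registering the list first and the class second leaves the objects with a NULL isa in between. Registering in the opposite order initializes them immediately.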

While from the perspective of the runtime everything is done now, there is still some more to be done in ObjFW: as the object generated by the compiler and initialized by the runtime does not contain more than the C string and the length, it is necessary to calculate missing information like whether the string is a valid UTF-8 string, its Unicode length, etc. In order to do this, instances of class OFConstantString call -[finishInitialization] on any message sent to them and then resend that same message to self. -[finishInitialization] calculates the missing information and class-swizzles the object to class OFString_const, so that the resending of the message in OFConstantString actually calls the correct code instead of doing the initialization again.
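The pattern of finishing the initialization on the first message and then swizzling the class can be imitated in plain C with a function pointer per "class". This is only an analogy under several assumptions: the struct layout, all sketch_ names and the call counter are invented for the example, the extra information is stored directly in the object for simplicity, only a single "length" message exists, and the input is assumed to be valid UTF-8. ObjFW does this with real Objective-C classes and message sends.

```c
#include <stddef.h>

struct sketch_string;

/* A "class" with a single method: length. */
struct sketch_class {
	const char *name;
	size_t (*length)(struct sketch_string *self);
};

struct sketch_string {
	const struct sketch_class *isa;
	const char *cstring;
	unsigned int cstring_length;	/* length in bytes */
	size_t unicode_length;		/* unknown until finished */
};

static int sketch_finish_calls;	/* counts initializations, for the demo */

/* Method of the fully initialized class: just returns the cached value. */
static size_t
sketch_const_length(struct sketch_string *self)
{
	return self->unicode_length;
}

static const struct sketch_class sketch_string_const = {
	"OFString_const", sketch_const_length
};

/* Calculates the missing information and swizzles the class. */
static void
sketch_finish_initialization(struct sketch_string *self)
{
	size_t count = 0;

	/* Every byte that is not a UTF-8 continuation byte starts a character. */
	for (unsigned int i = 0; i < self->cstring_length; i++)
		if (((unsigned char)self->cstring[i] & 0xC0) != 0x80)
			count++;

	self->unicode_length = count;
	self->isa = &sketch_string_const;
	sketch_finish_calls++;
}

/* Method of the not yet initialized class: finish, then resend. */
static size_t
sketch_constant_length(struct sketch_string *self)
{
	sketch_finish_initialization(self);
	return self->isa->length(self);
}

static const struct sketch_class sketch_constant_string = {
	"OFConstantString", sketch_constant_length
};

/* Stand-in for a message send: dispatches through the isa pointer. */
static size_t
sketch_length(struct sketch_string *self)
{
	return self->isa->length(self);
}
```

The first call to sketch_length goes through the uninitialized class, finishes the initialization and resends. Every later call dispatches directly to the initialized class, so the calculation happens only once.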

After all this, the @""-literal is finally fully initialized and behaves just like any other OFString.

All structs used by the runtime can be found in src/runtime/runtime-private.h. The structs prefixed with objc_abi_ are the ones emitted by the compiler, while those that are only prefixed with objc_ are the ones used by the runtime. The structs generated by the compiler are actually not documented, and other runtimes don't strictly separate ABI structs from runtime structs: they only declare the runtime structs, so that for example a class name string is stored in a Class variable and then fixed in-place, while ObjFW's runtime uses two structs that point to the same data; the objc_abi_ structs are used before initialization and the objc_ structs after. This results in having structs that are exactly what the compiler emits and not adjusted to the runtime, so that runtime-private.h is actually what comes closest to documentation of the ABI. (And yes, creating those struct declarations sometimes felt like reverse engineering something proprietary, as there was no documentation ;).)

PS: This is also the 100th blog post! :)