Understanding Objective-C by transpiling it to C++


2023-12-02 08:45:26


Apple heavily pushes for Swift as the programming language for its platforms.
However, Objective-C is not going anywhere yet. A 2023 study shows that
“Objective-C is still at the core of iOS and is used directly or indirectly
by most apps”. Additionally, most frameworks shipped on macOS (as we saw in a
previous post) are still written in Objective-C.

As you probably know, Objective-C is a superset of C. In fact, the Objective-C
runtime is a plain C library. An awesome trick that Wojciech Reguła recently
introduced me to is to transpile Objective-C to C++. This is a great way to
learn more about the Objective-C runtime, and how Objective-C works under the
hood.

In this article, we’ll transpile an example Objective-C program to C++,
highlight some interesting parts of the generated code, and explore some of the
history and current status of this work in the LLVM project.

Example: Transpiling “Hello World”

Let’s take a look at an example, based on the following sample Objective-C program:

// main.m
#import <Foundation/Foundation.h>

int main() {
  @autoreleasepool {
    NSLog(@"Hello World");
  }

  return EXIT_SUCCESS;
}

To transpile this Objective-C program to C++, we can use Clang’s -rewrite-objc
option, along with the -Wno-everything option to quiet warnings that are
irrelevant for the sake of this post, and the -fno-ms-extensions option to
disable Microsoft-specific extensions (more on this later):

$ xcrun clang main.m -o main.cc -rewrite-objc -Wno-everything -fno-ms-extensions

The main.cc output will be a pretty big C++ file (over 60k lines on my
system) that looks something like this:

#ifndef __OBJC2__
#define __OBJC2__
#endif
struct objc_selector; struct objc_class;
struct __rw_objc_super {
	struct objc_object *object;
	struct objc_object *superClass;
	__rw_objc_super(struct objc_object *o, struct objc_object *s) : object(o), superClass(s) {}
};

// ...

int main() {
  /* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool;
    NSLog((NSString *)&__NSConstantStringImpl__var_folders_sy_wb_f149x2v9_j6xdhfrtr9c00000gn_T_main_fca8a5_mi_0);
  }
  return 0;
}
static struct IMAGE_INFO { unsigned version; unsigned flag; } _OBJC_IMAGE_INFO = { 0, 2 };

Let’s explore some interesting parts of the resulting code, starting with a
simple one.

While we won’t showcase it in this article, -rewrite-objc can also be used
to transpile Objective-C++ to C++.

Inspecting NSString static strings

Here is our initial simple NSLog invocation:

NSLog(@"Hello World");

Which the re-writer translated to:

NSLog((NSString *)&__NSConstantStringImpl__var_folders_sy_wb_f149x2v9_j6xdhfrtr9c00000gn_T_main_6b2f4b_mii_0);

Our “Hello World” constant string is statically allocated as a __NSConstantStringImpl:

static __NSConstantStringImpl __NSConstantStringImpl__var_folders_sy_wb_f149x2v9_j6xdhfrtr9c00000gn_T_main_6b2f4b_mii_0 __attribute__ ((section ("__DATA, __cfstring"))) = {__CFConstantStringClassReference,0x000007c8,"Hello World",11};

The __NSConstantStringImpl structure looks like this:

struct __NSConstantStringImpl {
  int *isa;
  int flags;
  char *str;
#if _WIN64
  long long length;
#else
  long length;
#endif
};

Cross-referencing this with the brace initialization of our
__NSConstantStringImpl instance, we can determine that the object’s isa points
to __CFConstantStringClassReference, that it has the flags 0x000007c8, that
the actual string is Hello World, and that its length is 11. If you are
curious about the flags integer, the CFString implementation, part of the Core
Foundation framework, tells us that it’s an immutable, UTF-8 string that uses
the default allocator, and whose contents are not freed up.
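
To make the mapping between the structure and its brace initializer concrete,
here is a minimal standalone sketch. It is not the generated code: the
ConstantStringSketch type, the fake_class_reference placeholder, and the
length_matches helper are names invented for illustration, and str is declared
const char * so the snippet compiles as standard C++.

```cpp
#include <cstring>

// Simplified stand-in for the structure emitted by the re-writer.
// The _WIN64 branch is omitted; `long` matches the non-Windows case.
struct ConstantStringSketch {
  int *isa;         // class reference that identifies the object's type
  int flags;        // CFString flag bits, 0x000007c8 in the article
  const char *str;  // pointer to the C string contents
  long length;      // character count, excluding the trailing NUL
};

// Hypothetical placeholder for __CFConstantStringClassReference, which the
// real program resolves against CoreFoundation at link time.
static int fake_class_reference[1] = {0};

// Brace initialization assigns values in member declaration order:
// isa, flags, str, length.
static ConstantStringSketch hello = {
    fake_class_reference, 0x000007c8, "Hello World", 11};

// Returns true when the stored length agrees with the actual C string.
bool length_matches(const ConstantStringSketch &s) {
  return std::strlen(s.str) == static_cast<std::size_t>(s.length);
}
```

Calling length_matches(hello) is a quick way to check that the fourth
initializer really is the character count of the third.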

The (section ("__DATA, __cfstring")) attribute specifies that the string must
be stored in the __cfstring section of the __DATA (read/write) segment of
the resulting Mach-O executable. To better understand this, let’s compile the
“Hello World” Objective-C program (in the usual way) and inspect it using the
open-source MachOView desktop application.

Inspecting `__DATA_CONST.__cfstring` and `__TEXT.__cstring` with MachOView

In this example, C string literals are stored at specific offsets of the
__cstring section of the __TEXT (read-only) segment, and the CFString
objects are stored in the __cfstring section of the __DATA_CONST segment,
pointing back at the offset of the C strings.

Note that the Clang Objective-C to C++ re-writer doesn’t add a const
qualifier to the __NSConstantStringImpl instance, resulting in the object
being stored in the __DATA segment, instead of the __DATA_CONST segment
as the normal Objective-C compilation process seems to do. We’ll touch on
why these differences exist later in the post.

Even more interestingly, we can see the members of the __NSConstantStringImpl
structure being laid out in the executable. The first entry corresponds to the
isa offset, the second entry corresponds to the flags integer, the third
entry corresponds to the str C string offset (as we saw before), and the
fourth entry corresponds to the length of the string.
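
The same ordering can be observed from C++ with offsetof. The sketch below
uses a hypothetical ConstantStringLayout type that copies the non-Windows
member list, plus an invented FieldOffsets helper. The exact byte offsets
depend on the target ABI; on a typical 64-bit macOS target one would expect
0, 8, 16 and 24, with padding after flags.

```cpp
#include <cstddef>

// Hypothetical copy of the structure's member list (non-Windows branch).
struct ConstantStringLayout {
  int *isa;
  int flags;
  const char *str;
  long length;
};

// Byte offset of each member from the start of the object.
struct FieldOffsets {
  std::size_t isa;
  std::size_t flags;
  std::size_t str;
  std::size_t length;
};

// Collects the offsets in declaration order, which is the order the
// entries appear in the section contents.
FieldOffsets field_offsets() {
  return {offsetof(ConstantStringLayout, isa),
          offsetof(ConstantStringLayout, flags),
          offsetof(ConstantStringLayout, str),
          offsetof(ConstantStringLayout, length)};
}
```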

Mach-O example of `__NSConstantStringImpl`

Coming back to the generated C++ code, before invoking NSLog, the address of
the __NSConstantStringImpl instance is cast to NSString *, where NSString is
defined as follows:

// @class NSString;
#ifndef _REWRITER_typedef_NSString
#define _REWRITER_typedef_NSString
typedef struct objc_object NSString;
typedef struct {} _objc_exc_NSString;
#endif

According to the above definition, NSString is an alias (typedef) to
objc_object, which, according to the Objective-C runtime, represents an
arbitrary Objective-C object. That is, a pointer to objc_object is the
well-known id Objective-C type. In fact, the generated C++ code defines id
like this:

typedef struct objc_class *Class;
struct objc_object {
    Class _Nonnull isa __attribute__((deprecated));
};
typedef struct objc_object *id;
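
One consequence of these typedefs is that the generated C++ has no static
distinction between NSString * and id. The sketch below checks this with
std::is_same. The NSNumber alias and the as_id helper are assumptions added
for illustration (the article only shows the NSString typedef), and the
nullability and deprecation annotations on isa are dropped.

```cpp
#include <type_traits>

struct objc_class;
typedef struct objc_class *Class;
struct objc_object {
  Class isa;
};
typedef struct objc_object *id;

// As emitted by the re-writer for NSString.
typedef struct objc_object NSString;
// Hypothetical second class, assuming it would get the same treatment.
typedef struct objc_object NSNumber;

// Every class pointer collapses to the same C++ type as id.
static_assert(std::is_same<NSString *, id>::value,
              "NSString * and id are the same type");
static_assert(std::is_same<NSString *, NSNumber *>::value,
              "distinct Objective-C classes share one C++ type");

// No cast is needed to pass an NSString * where an id is expected.
id as_id(NSString *s) { return s; }
```

In other words, type checking between Objective-C classes is gone by the time
the code reaches C++; the object's class is only known at run time through isa.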

Inspecting @autoreleasepool blocks

Since the introduction of ARC (Automatic Reference Counting), the
NSAutoreleasePool class can’t be directly used, and was replaced by
@autoreleasepool blocks.

If we take a look at the generated C++ code, we can see that Clang re-wrote the
@autoreleasepool block as follows:

/* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool;
  NSLog((NSString *)&__NSConstantStringImpl__var_folders_sy_wb_f149x2v9_j6xdhfrtr9c00000gn_T_main_fca8a5_mi_0);
}

The key here is the __AtAutoreleasePool class, defined close to the beginning
of the generated file:

struct __AtAutoreleasePool {
  __AtAutoreleasePool() {atautoreleasepoolobj = objc_autoreleasePoolPush();}
  ~__AtAutoreleasePool() {objc_autoreleasePoolPop(atautoreleasepoolobj);}
  void * atautoreleasepoolobj;
};

This is a C++ RAII (Resource Acquisition Is Initialization) wrapper over the
objc_autoreleasePoolPush and objc_autoreleasePoolPop private C functions of
the runtime.
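
To see why this wrapper reproduces @autoreleasepool semantics, the following
sketch replaces the runtime calls with hypothetical stand-ins (fake_pool_push
and fake_pool_pop, which only record events) so that it runs without the
Objective-C runtime. ScopedPool and run_scoped_body are also invented names;
only the constructor/destructor shape is taken from the generated code.

```cpp
#include <string>
#include <vector>

// Event log and depth counter used by the stand-in functions below.
static std::vector<std::string> events;
static int pool_depth = 0;

// Hypothetical stand-in for objc_autoreleasePoolPush: returns an opaque token.
void *fake_pool_push() {
  events.push_back("push");
  ++pool_depth;
  return &pool_depth;
}

// Hypothetical stand-in for objc_autoreleasePoolPop: consumes the token.
void fake_pool_pop(void *token) {
  (void)token;
  events.push_back("pop");
  --pool_depth;
}

// Same shape as __AtAutoreleasePool: push on construction, pop on destruction.
struct ScopedPool {
  ScopedPool() { token = fake_pool_push(); }
  ~ScopedPool() { fake_pool_pop(token); }
  void *token;
};

// Mirrors the rewritten block: the pool object lives exactly as long as the
// braces that replaced @autoreleasepool { ... }.
std::vector<std::string> run_scoped_body() {
  events.clear();
  {
    ScopedPool pool;
    events.push_back("body");
  }  // the destructor pops the pool here, even on early exit or exception
  return events;
}
```

The recorded order is push, body, pop: the pool is popped when the scope
closes, without any explicit call in the body.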

These functions are not covered by the Apple documentation, and are not
declared in the public headers of the Objective-C runtime, which you can
confirm with the following grep(1) command:

$ grep objc_autorelease $(xcode-select --print-path)/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/objc/*

In a previous article, we explored how to extract the dyld shared cache of
your system libraries. Assuming your extracted cache is located at
$HOME/dyld-cache-arm64e, you can confirm that objc_autoreleasePoolPush and
objc_autoreleasePoolPop are globally exposed symbols of libobjc.A.dylib using
nm(1):

$ nm -g $HOME/dyld-cache-arm64e/usr/lib/libobjc.A.dylib | grep objc_autorelease
00000001800a4afc T __objc_autoreleasePoolPop
00000001800a4b00 T __objc_autoreleasePoolPrint
00000001800a4af8 T __objc_autoreleasePoolPush
0000000180075850 T _objc_autorelease
00000001800739ec T _objc_autoreleasePoolPop
00000001800738ac T _objc_autoreleasePoolPush
0000000180076b8c T _objc_autoreleaseReturnValue

You can also find references to these functions in the TBD file that declares
the exported symbols for libobjc.A.dylib:

$ grep objc_autorelease < $(xcode-select --print-path)/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libobjc.A.tbd
           __objc_atfork_parent, __objc_atfork_prepare, __objc_autoreleasePoolPop,
           __objc_autoreleasePoolPrint, __objc_autoreleasePoolPush, __objc_beginClassEnumeration,
           _objc_allocateProtocol, _objc_autorelease, _objc_autoreleasePoolPop,
           _objc_autoreleasePoolPush, _objc_autoreleaseReturnValue, _objc_begin_catch,

Coming back to our generated code, these private functions that are not
declared in the Objective-C runtime headers are consumed like this:

extern "C" __declspec(dllimport) void * objc_autoreleasePoolPush(void);
extern "C" __declspec(dllimport) void objc_autoreleasePoolPop(void *);

Microsoft Extensions

You might be puzzled by the seemingly Windows-specific __declspec(dllimport)
attribute.

Let’s dig a bit into it. I’m running AppleClang 1500.0.40.1 (Xcode 15.0.1),
which corresponds to LLVM 16. In LLVM 16, the Objective-C re-writer we’re
using is implemented in clang/lib/Frontend/Rewrite/RewriteModernObjC.cpp.

You might have noticed clang/lib/Frontend/Rewrite/RewriteObjC.cpp, which
corresponds to the old -rewrite-legacy-objc Clang option. That re-writer is
deprecated and shouldn’t be used anymore.

Looking into RewriteModernObjC.cpp, we can see that the re-writer has
various conditionals around LangOpts.MicrosoftExt for performing
Microsoft-specific rewrites. For example, lines 5930 to 5935 contain the
following logic:

if (LangOpts.MicrosoftExt) {
  Preamble += "#define __OBJC_RW_DLLIMPORT extern \"C\" __declspec(dllimport)\n";
  Preamble += "#define __OBJC_RW_STATICIMPORT extern \"C\"\n";
}
else
  Preamble += "#define __OBJC_RW_DLLIMPORT extern\n";

As you might expect, this is the reason we initially passed the
-fno-ms-extensions option. However, these Microsoft-specific conditionals are
not consistently handled at the moment. For example, you might find FIXME
comments like the one in lines 1012 to 1014:

// FIXME. Is this attribute correct in all cases?
Setr = "\nextern \"C\" __declspec(dllimport) "
"void objc_setProperty (id, SEL, long, id, bool, bool);\n";

More specific to our case, the re-writer (incorrectly?) hardcodes
__declspec(dllimport) for objc_autoreleasePoolPush and
objc_autoreleasePoolPop in lines 6045 to 6046:

Preamble += "extern \"C\" __declspec(dllimport) void * objc_autoreleasePoolPush(void);\n";
Preamble += "extern \"C\" __declspec(dllimport) void objc_autoreleasePoolPop(void *);\n\n";

Is Objective-C just a transpiler?

If you got this far, you might be wondering how LLVM uses this
Objective-C re-writer. When you compile Objective-C, this re-writer is not
used.

Instead, LLVM has an Objective-C frontend that directly compiles to LLVM
IR (Intermediate Representation), which is transformed to machine code by the
LLVM backend. You can peek into the production-ready Objective-C frontend for
LLVM 16 at clang/lib/CodeGen/CGObjC.cpp.

Limitations of the re-writer

The fact that normal Objective-C compilation follows a different process
explains some inconsistencies we saw with the re-writer in this article, like
the fact that static strings are put in the __DATA segment instead of in the
__DATA_CONST segment, and the missing conditionals around Microsoft-specific
extensions and dllimport.

Apart from minor inconsistencies, the re-writer seems to have many other
issues. Unless you provide trivial examples that don’t make use of the
Foundation framework, the generated C++ code doesn’t compile. For example,
while experimenting with the “Hello World” program presented at the beginning
of this article, I found references to wrong structure names, some Objective-C
@property declarations not being re-written, invalid typedef aliases, and more.

If we take a detour into LLVM again, Clang’s README states that “Clang is
useful for a number of things beyond just compiling code: we intend for Clang
to be host to a number of different source-level tools.” It turns out that the
Objective-C re-writer is just a best-effort side experiment started in 2007 by
Chris Lattner, creator of LLVM and Swift.

Over the last 15 years, this re-writer experiment has had consistent casual
contributions and a growing end-to-end test suite. Even if it is still not
perfect, you can already learn many things about Objective-C with it!


Thanks for stopping by!

If you like my work, keep an eye on JSON BinPack, an open-source binary format for the Internet of Things with a strong focus on space-efficiency.


Feedback? Drop me a line at jv@jviotti.com
