Self-registering class descriptors: deleting MINIT boilerplate in a PHP extension
A C23 pattern using __attribute__((constructor)) and a topo-sorted registry to replace the hand-maintained MINIT class list in the ScyllaDB PHP driver — plus the build-time generator that emits the de
The ScyllaDB PHP driver used to open MINIT with ~70 calls to hand-written define_*() functions, one per registered class. The order mattered — parent before child, interface before implementer, Value before Bigint — and the only way to know you'd got it wrong was a segfault during php -m.
That whole block is now one call. Each class declares itself at file scope, a __attribute__((constructor)) appends it to a linked list at .so load time, and php_scylladb_class_registry_minit() topologically sorts by FQN and registers in dependency order. Missing or cyclic deps fail loudly at MINIT with the class name in the message.
This post is the second in the ScyllaDB PHP driver series. The pattern isn't novel (kernel module init tables and static_init-style registries have done this in C for decades), but it solves a problem PHP extension authors hit on day one and most of us paper over with a hand-maintained, order-sensitive call list. If you've ever stared at a 50-line MINIT and wished the file declared its own registration, this is what that looks like.
the problem MINIT actually has
A PHP extension declares its classes in PHP_MINIT_FUNCTION(ext). The mechanical part is INIT_CLASS_ENTRY + zend_register_internal_class (or, since PHP 8.3, the register_class_*() helpers generated from .stub.php). The political part is that you have to call them in the right order.
zend_register_internal_class_ex wants a parent zend_class_entry*. If the parent hasn't been registered yet, you pass NULL and the child silently extends nothing. Interface implementers want the interface ce* to be present before they call zend_class_implements. So MINIT becomes:
// MINIT — the version this PR replaced
define_Cassandra_Value(); // interface, no deps
define_Cassandra_Numeric(); // interface, no deps
define_Cassandra_RetryPolicy(); // interface, no deps
define_Cassandra_Bigint(); // implements Value, Numeric
define_Cassandra_RetryPolicy_DefaultPolicy(); // implements RetryPolicy
// … 65 more lines, all order-sensitive
There's no checker. The compiler can't tell you that Bigint came before Value; it's just two function calls in a .c file. Renaming a class, adding an interface, or reordering for readability all have the same failure mode: a class entry with the wrong parent, discovered the first time someone instantiates it.
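To make that failure mode concrete, here is a toy model. It is not Zend code: `toy_ce`, `toy_find` and `toy_register` are hypothetical stand-ins for a class entry, a lookup of an already-registered class, and a define_*() function. The only point is that a dependency resolved at call time comes back NULL when the calls are in the wrong order, and nothing complains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for zend_class_entry; every name here is hypothetical. */
typedef struct toy_ce {
    const char *name;
    const struct toy_ce *parent;
} toy_ce;

#define TOY_MAX 8
static toy_ce toy_table[TOY_MAX];
static size_t toy_count = 0;

/* Linear lookup among the classes registered so far. */
static const toy_ce *toy_find(const char *name) {
    for (size_t i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i].name, name) == 0) return &toy_table[i];
    }
    return NULL;
}

/* Registers a class. The parent is resolved right now, so if it has not
 * been registered yet the child quietly ends up with parent == NULL. */
static const toy_ce *toy_register(const char *name, const char *parent_name) {
    assert(toy_count < TOY_MAX);
    toy_ce *ce = &toy_table[toy_count++];
    ce->name = name;
    ce->parent = parent_name != NULL ? toy_find(parent_name) : NULL;
    return ce;
}
```

Calling `toy_register("Bigint", "Value")` before `toy_register("Value", NULL)` leaves Bigint with a NULL parent and no diagnostic; any class registered after Value resolves it correctly.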
The other problem is locality. The class lives in src/Bigint.c. Its interfaces live in src/Numeric.stub.php and src/Value.stub.php. But the registration order lives in src/php_driver.c — a file most contributors never edit. Adding a class meant editing three or four places, in two languages, and remembering an ordering rule that wasn't enforced anywhere.
the registry
Every class file declares a descriptor at file scope:
// src/RetryPolicy/DefaultPolicy.c (excerpt)
PHP_SCYLLADB_REGISTER_CLASS(
retry_policy_default_policy, // C identifier suffix
"Cassandra\\RetryPolicy\\DefaultPolicy", // FQN
&php_scylladb_retry_policy_default_policy_ce, // where to publish the ce*
"Cassandra\\RetryPolicy", // parent FQN (or nullptr)
php_scylladb_retry_policy_default_policy_register
)
The macro expands to a static descriptor struct plus a constructor function that appends it to a global linked list:
// src/Registry/Registry.h
/* Declares one descriptor per class and a load-time constructor that adds it to the registry. */
#define PHP_SCYLLADB_REGISTER_CLASS(slug, _name, _ce_out, _parent, _register_fn) \
    static const char *const scylladb_cls_##slug##_deps[] = { (_parent), nullptr }; \
    static php_scylladb_class_descriptor_t scylladb_cls_##slug = { \
        .name = (_name), \
        .deps = ((_parent) == nullptr ? nullptr : scylladb_cls_##slug##_deps), \
        .ce_out = (_ce_out), \
        .register_ = (_register_fn), \
        .next = nullptr, \
        .registered = false, \
    }; \
    __attribute__((constructor)) \
    static void scylladb_cls_register_##slug##_ctor(void) { \
        php_scylladb_class_registry_add(&scylladb_cls_##slug); \
    }
__attribute__((constructor)) is a GCC/Clang extension. MSVC does not accept it and needs the .CRT$XCU section trick instead. The attribute makes the dynamic linker run the function while the shared object's initializers execute, before dlopen returns to PHP and before MINIT fires. By the time php_scylladb_class_registry_minit() runs, every descriptor in every .o linked into the extension has already added itself to registry_head.
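The mechanism is easier to see outside the Zend headers. The following is a minimal standalone sketch, not the driver's code: `demo_desc`, `demo_registry_add` and `DEMO_REGISTER` are made-up names, and the descriptor carries only a name. It assumes GCC or Clang, and it uses NULL rather than nullptr so it also compiles before C23.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down descriptor: just a name and a list link. */
typedef struct demo_desc {
    const char *name;
    struct demo_desc *next;
} demo_desc;

/* Statically initialised to NULL, so it is valid before any constructor runs. */
static demo_desc *demo_head = NULL;

/* Prepends for brevity; list order is irrelevant when a later pass sorts by deps. */
static void demo_registry_add(demo_desc *d) {
    assert(d != NULL && d->next == NULL);
    d->next = demo_head;
    demo_head = d;
}

/* One static descriptor plus one load-time constructor per class. */
#define DEMO_REGISTER(slug, _name) \
    static demo_desc demo_desc_##slug = { .name = (_name), .next = NULL }; \
    __attribute__((constructor)) \
    static void demo_ctor_##slug(void) { demo_registry_add(&demo_desc_##slug); }

DEMO_REGISTER(value, "Cassandra\\Value")
DEMO_REGISTER(bigint, "Cassandra\\Bigint")

/* Number of descriptors that registered themselves. */
static size_t demo_registry_count(void) {
    size_t n = 0;
    for (const demo_desc *d = demo_head; d != NULL; d = d->next) n++;
    return n;
}

/* Returns 1 if a descriptor with this name is in the list, else 0. */
static int demo_registry_has(const char *name) {
    for (const demo_desc *d = demo_head; d != NULL; d = d->next) {
        if (strcmp(d->name, name) == 0) return 1;
    }
    return 0;
}
```

By the time `main` (or, in an extension, MINIT) runs, both descriptors are already on the list. Nothing in the sketch relies on the order in which the two constructors fire.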
There's one linker subtlety to flag now, because it'll bite you later: the static linker only pulls a .o out of a static archive when something references one of its symbols. A descriptor object that nothing references is left out of the final binary, and its constructor never runs. The CMake wiring deals with this by linking the descriptor archives with $<LINK_LIBRARY:WHOLE_ARCHIVE,…>, a generator expression that needs CMake 3.24 or newer:
# cmake/GenStubs.cmake — module libs are linked whole, no GC
target_link_libraries(php_scylladb_ext PRIVATE
  "$<LINK_LIBRARY:WHOLE_ARCHIVE,php_scylladb_retry_policy>"
  "$<LINK_LIBRARY:WHOLE_ARCHIVE,php_scylladb_value>"
  # …
)
Without that, the linker sees DefaultPolicy_descriptor.o, notices nothing else references it, drops it, and the constructor, which exists purely for its side effect, never runs. The class then doesn't exist at MINIT. It's the constructor-attribute equivalent of forgetting extern.
topo-sort at MINIT
The runtime side is small enough to paste in full:
// src/Registry/Registry.c — topo sort, Kahn-ish
void php_scylladb_class_registry_minit(void) {
    bool progress = true;
    while (progress) {
        progress = false;
        for (php_scylladb_class_descriptor_t *d = registry_head; d; d = d->next) {
            if (d->registered) continue;
            zend_class_entry *resolved[MAX_DEPS_PER_CLASS] = { nullptr };
            size_t n_deps = 0;
            bool defer = false;
            if (d->deps != nullptr) {
                for (size_t i = 0; d->deps[i] != nullptr; i++) {
                    bool dep_deferred = false;
                    zend_class_entry *ce = resolve_dep(d->deps[i], &dep_deferred);
                    if (dep_deferred) { defer = true; break; }
                    if (ce == nullptr) {
                        zend_error_noreturn(E_CORE_ERROR,
                            "scylladb registry: class '%s' declares dep '%s' which is neither registered nor known to Zend",
                            d->name, d->deps[i]);
                    }
                    resolved[i] = ce;
                    n_deps = i + 1;
                }
            }
            if (defer) continue;
            zend_class_entry *ce = d->register_(n_deps > 0 ? resolved : nullptr);
            *(d->ce_out) = ce;
            d->registered = true;
            progress = true;
        }
    }
    /* Anything still un-registered means a cyclic or missing dep. */
    for (php_scylladb_class_descriptor_t *d = registry_head; d; d = d->next) {
        if (!d->registered) {
            zend_error_noreturn(E_CORE_ERROR,
                "scylladb registry: class '%s' could not be registered (cyclic or missing registry-owned dep)",
                d->name);
        }
    }
}
It's O(n²): each pass over the list registers every descriptor whose deps are now resolved, and the loop reruns until a full pass makes no progress. The driver has ~120 classes, MINIT runs once per process, and the constant factor is strcmp on FQN strings. The full registration pass is below 1 ms on a 2023 M2.
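The same repeat-until-no-progress loop can be tried without Zend. This is a sketch under simplifying assumptions, not the driver's implementation: the `node` type, `find_node`, `register_all` and `demo_topo` are hypothetical, nodes live in an array instead of a linked list, and an unknown dependency is treated like a cycle (the node simply never becomes ready) rather than aborting or falling back to the engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    const char *name;
    const char *const *deps; /* NULL-terminated list, or NULL for no deps */
    bool registered;
    int order;               /* position in registration order, -1 if never */
} node;

static node *find_node(node *nodes, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].name, name) == 0) return &nodes[i];
    }
    return NULL;
}

/* Sweeps the array until a full pass registers nothing new.
 * Returns how many nodes were registered; the rest have a cyclic
 * or unknown dependency. */
static size_t register_all(node *nodes, size_t n) {
    size_t done = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            node *d = &nodes[i];
            if (d->registered) continue;
            bool ready = true;
            for (size_t k = 0; d->deps != NULL && d->deps[k] != NULL; k++) {
                const node *dep = find_node(nodes, n, d->deps[k]);
                if (dep == NULL || !dep->registered) { ready = false; break; }
            }
            if (!ready) continue;
            d->registered = true;
            d->order = (int)done++;
            progress = true;
        }
    }
    return done;
}

/* Bigint is listed before its deps on purpose; A and B form a cycle. */
static size_t demo_topo(void) {
    static const char *const bigint_deps[] = { "Value", "Numeric", NULL };
    static const char *const a_deps[] = { "B", NULL };
    static const char *const b_deps[] = { "A", NULL };
    node nodes[] = {
        { "Bigint",  bigint_deps, false, -1 },
        { "Value",   NULL,        false, -1 },
        { "Numeric", NULL,        false, -1 },
        { "A",       a_deps,      false, -1 },
        { "B",       b_deps,      false, -1 },
    };
    size_t done = register_all(nodes, sizeof nodes / sizeof nodes[0]);
    assert(nodes[0].order > nodes[1].order); /* Bigint after Value */
    assert(nodes[0].order > nodes[2].order); /* Bigint after Numeric */
    assert(!nodes[3].registered && !nodes[4].registered);
    return done;
}
```

In `demo_topo`, the first pass registers Value and Numeric, the second registers Bigint, and the third makes no progress, leaving the A/B cycle unregistered. That leftover set is what the real code reports as an error.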
resolve_dep does the only interesting thing in the file: it tries the registry first (so registry-owned classes resolve cleanly), and falls back to zend_lookup_class for anything it doesn't own:
static zend_class_entry *resolve_dep(const char *fqn, bool *deferred) {
    *deferred = false;
    php_scylladb_class_descriptor_t *p = registry_find(fqn);
    if (p != nullptr) {
        if (!p->registered) { *deferred = true; return nullptr; }
        return *(p->ce_out);
    }
    zend_string *lookup = zend_string_init(fqn, strlen(fqn), 0);
    zend_class_entry *ce = zend_lookup_class(lookup);
    zend_string_release(lookup);
    return ce;
}
That second branch is what makes SPL parents work. \Cassandra\Exception\DivideByZeroException ultimately inherits from \RangeException, which the driver doesn't own and can't INIT_CLASS_ENTRY on. zend_lookup_class finds it in Zend's global class table, the registry treats it as already-resolved, and the child registers normally.
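resolve_dep effectively returns one of three answers, encoded as a pointer plus an out-flag: resolved, deferred, or missing. The sketch below spells that out with an explicit enum. It is an illustration only: `dep_status`, `owned_class` and `classify_dep` are hypothetical names, and a plain array of strings stands in for Zend's class table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEP_RESOLVED, DEP_DEFERRED, DEP_MISSING } dep_status;

/* A class the registry owns, and whether it has been registered yet. */
typedef struct {
    const char *name;
    bool registered;
} owned_class;

/* Registry first, engine second: the same precedence as resolve_dep. */
static dep_status classify_dep(const owned_class *owned, size_t n_owned,
                               const char *const *engine, size_t n_engine,
                               const char *fqn) {
    assert(fqn != NULL);
    for (size_t i = 0; i < n_owned; i++) {
        if (strcmp(owned[i].name, fqn) == 0) {
            /* Ours: either usable now, or try again on a later pass. */
            return owned[i].registered ? DEP_RESOLVED : DEP_DEFERRED;
        }
    }
    for (size_t i = 0; i < n_engine; i++) {
        /* Not ours, but the engine already knows it (e.g. an SPL class). */
        if (strcmp(engine[i], fqn) == 0) return DEP_RESOLVED;
    }
    return DEP_MISSING; /* nobody declares it: the fatal-error case */
}
```

With an owned list holding a registered Cassandra\Value and an unregistered Cassandra\Bigint, and an engine list holding RangeException, the four possible lookups map to RESOLVED, DEFERRED, RESOLVED and MISSING respectively.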
The error messages name the culprit. If you add a class with a dep on "Cassandra\\Foo" and nothing declares Foo, MINIT aborts with:
PHP Fatal error: scylladb registry: class 'Cassandra\Bar' declares dep
'Cassandra\Foo' which is neither registered nor known to Zend
That's the bit I care about most. The old MINIT failed silently: a class would register with parent_ce = NULL and the bug would surface during instanceof checks. The registry fails at module load, before a single request runs.
the part you don't write
Two generators, one stub. The split between them is the whole point.
gen_stub.php ships with php-src and every modern extension uses it. Feed it DefaultPolicy.stub.php and it emits DefaultPolicy_arginfo.h containing register_class_Cassandra_RetryPolicy_DefaultPolicy(). That function is where the zend_class_entry is actually born — INIT_CLASS_ENTRY, class flags, property declarations, zend_register_internal_class_ex. Mature, version-aware, not mine to touch.
tools/gen_descriptor/gen_class_descriptor.php is mine. 513 lines of straight PHP, no dependencies. Same stub, different output: DefaultPolicy_descriptor.c.
// tools/gen_descriptor/gen_class_descriptor.php
/**
* What it produces:
* - the `php_scylladb_<snake>_ce` global — the pointer the rest of the
* extension reads; the class entry itself is built by register_class_*()
* from _arginfo.h
* - the `zend_object_handlers php_scylladb_<snake>_handlers` global
* - a register fn that calls register_class_*() with deps wired from
* the stub's extends/implements, then applies create_object + handler
* overrides via weakly-declared callbacks, and calls a weak
* post_register hook
* - the PHP_SCYLLADB_REGISTER_CLASS / _DEPS macro invocation
*/
Read the split this way: php-src builds the class entry, my generator hangs it on the registry. Both run on every build, both write into the build directory, neither file lives in git. CMake exposes them as siblings (php_scylladb_generate_arginfo() and php_scylladb_generate_descriptor()), and you point both at the same .stub.php.
That leaves four things per class for a human to write. ZEND_METHOD bodies. The convention-named callbacks — _new, _free, _compare, _gc, _clone, _cast, _hash_value, _post_register — declared as weak symbols so the ones you skip stay NULL. Any public C helpers like _instantiate. And the stub itself. Everything in between is build output.
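The weak-symbol convention can be tried in isolation. This is a rough sketch, not what the generator emits: the callback names `demo_foo_new` and `demo_foo_free`, the `demo_handlers` struct and `demo_foo_wire` are all hypothetical, and everything sits in one translation unit, whereas in the driver the weak declarations and the definitions would live in different files. It assumes GCC or Clang on an ELF target such as Linux; other platforms treat undefined weak references differently.

```c
#include <assert.h>
#include <stddef.h>

/* Weak declarations: what a generated descriptor might declare for every
 * convention-named callback, whether or not the author wrote it. */
__attribute__((weak)) void *demo_foo_new(void);
__attribute__((weak)) void demo_foo_free(void *obj);

/* The class author implements only _new. _free is deliberately skipped,
 * so the linker resolves its address to NULL instead of failing. */
static int demo_foo_storage;
void *demo_foo_new(void) { return &demo_foo_storage; }

typedef struct {
    void *(*create)(void);
    void (*destroy)(void *);
} demo_handlers;

/* Wires up only the callbacks that actually exist at link time. */
static demo_handlers demo_foo_wire(void) {
    demo_handlers h = { NULL, NULL };
    if (demo_foo_new) h.create = demo_foo_new;
    if (demo_foo_free) h.destroy = demo_foo_free;
    assert(h.create != NULL); /* this sketch always defines _new */
    return h;
}
```

After `demo_foo_wire()`, `create` points at the hand-written function and `destroy` stays NULL, which is the "ones you skip stay NULL" behaviour described above.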
There are two annotations:
- @scylladb-value-handlers in a class docblock opts into the php_scylladb_value_handlers struct (standard handlers + an extra hash_value slot). Applied to Bigint, Set, Map, Blob, Inet, and the rest of the Value-typed classes.
- @scylladb-no-generate is the escape hatch for classes that need a fully hand-written descriptor. Currently used by Custom (no stub at all; legacy INIT_CLASS_ENTRY path) and the SPL-rooted exception batch in src/Exception/exceptions.c, which registers 23 classes in one call and doesn't want the per-class machinery.
Adding a class now looks like this:
- Write Foo.stub.php with extends/implements declared in PHP.
- Define php_scylladb_foo_new (and any other convention-named callbacks) in Foo.c.
- List the stub in the module's CMakeLists.txt under php_scylladb_generate_arginfo and php_scylladb_generate_descriptor.
No edits to php_scylladb.c. No edits to include/php_scylladb_types.h. No order to remember.
what it isn't
- It's not portable to MSVC. __attribute__((constructor)) is a GCC/Clang feature; MSVC needs the .CRT$XCU section trick instead. The driver doesn't target Windows yet, so this hasn't bitten us.
- It does not save you from cycles in the dep graph. The topo-sort detects them at MINIT and aborts; it does not break them. A circular A → B → A is your bug.
- It is not a substitute for ZEND_BEGIN_MODULE_GLOBALS or the rest of MINIT. Functions, INI entries, constants, persistent resources: all still register the old way. The registry only owns class entries.
- The whole-archive link is non-negotiable. If you skip the LINK_LIBRARY:WHOLE_ARCHIVE guard for any module, you get a silently-missing class, which is the exact failure mode the registry was designed to eliminate. Audit it on every new module.
- Generated descriptors mean stub correctness matters. A typo in extends \Cassandra\Foo in a .stub.php now becomes a MINIT-time failure with a clear message, not a compile error. That's a feature for me; if you prefer compile-time failure, this is a regression.
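For the MSVC point, one common way to hide the difference is a small wrapper macro. The sketch below is an assumption about how that could look, not something the driver ships: `DEMO_INITIALIZER` and the `demo_*` names are hypothetical, and the MSVC branch follows the widely used .CRT$XCU idiom but has not been exercised here. An optimising MSVC build may additionally need the pointer kept alive with a linker /include directive.

```c
#include <assert.h>

#if defined(_MSC_VER) && !defined(__clang__)
/* MSVC: put a function pointer into the CRT's initializer table so the
 * runtime calls it at load time. Untested sketch of the usual idiom. */
#pragma section(".CRT$XCU", read)
#define DEMO_INITIALIZER(fn) \
    static void fn(void); \
    __declspec(allocate(".CRT$XCU")) void (*fn##_ptr)(void) = fn; \
    static void fn(void)
#else
/* GCC/Clang: the attribute used throughout this post. */
#define DEMO_INITIALIZER(fn) \
    __attribute__((constructor)) static void fn(void)
#endif

static int demo_init_calls = 0;

/* Runs once at load time, before main() or MINIT. */
DEMO_INITIALIZER(demo_on_load) {
    demo_init_calls++;
}

/* Call after load: confirms the initializer ran exactly once. */
static int demo_initializer_ran(void) {
    assert(demo_init_calls >= 0);
    return demo_init_calls == 1;
}
```

A registration macro built on `DEMO_INITIALIZER` instead of the bare attribute would keep the per-class descriptor code identical across compilers.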
where to start
The full implementation is two files, ~170 lines of C: Registry.h and Registry.c. The generator is one PHP file at tools/gen_descriptor/gen_class_descriptor.php. All three are MIT-licensed; lift them into your own extension if the pattern fits.
I wrote the registry the week I deleted ZendCPP. Once the per-class descriptor was a generated file, the C++ that backed it stopped having a job.