Now Reading
Including weak references to CHICKEN

Including weak references to CHICKEN

2023-06-28 03:44:02

Just lately, a person was asking on IRC why CHICKEN does not give correct line numbers for error name traces within the interpreter. That is certainly quite odd, as a result of the compiler provides numbered traces simply wonderful.

After a quick dialogue, we figured that this was in all probability as a result of line numbers are saved in a “database” (hash desk) which maps the supply expression to a line quantity. As a result of you possibly can name preserve calling (eval) with recent enter, the variety of expressions evaluated is probably infinite.

This may result in unbounded progress of the road quantity database, finally consuming up all accessible reminiscence.

We would like to repair this. An amazing answer for this drawback could be weak references. A weak reference is a reference to a price that does not trigger the rubbish collector to carry on to that worth. If different issues nonetheless consult with the worth, it’s preserved and the weak reference is maintained. If solely weak references consult with a price, it could be collected.

The road quantity database might then include such weak references to supply expressions. If an expression is not held onto, it may be collected and the road quantity finally eliminated. This may flip the database from an everyday hash desk right into a weak hash desk.

This requires hooking into the rubbish collector in order that it is aware of about these references and might drop them. Since our collector is a copying collector, addresses of objects change over time, and a weak reference must be up to date if one thing remains to be hanging onto it, with out itself inflicting the weakly held object to stay round.

A quick recap of garbage collection

To elucidate how we modified our rubbish collector, I’ll begin with a fast high-level recap of how the rubbish assortment works. I am going to clarify simply sufficient to cowl what is required to grasp weak references. For a extra in-depth have a look at CHICKEN’s rubbish assortment, see my post on the GC.

First, we have now to grasp that CHICKEN shops all its values in a machine phrase. It distinguishes between “rapid” values and non-immediate or “block” values. Speedy values could be both fixnums (bit 0 is about) or different “small” values (bit 1 is about). Non-immediates are recognised as a result of the decrease two bits are zero, which could be very handy, as a pointer to a word-aligned tackle occurs to have the decrease two bits cleared as properly!

In different phrases, non-immediate values are merely C pointers, whereas the rapid values are encoded lengthy integers. So, non-immediates are allotted in reminiscence and represented by a pointer to them. On the pointer’s tackle we discover a header which encodes what precisely is saved there, and (optionally) the information, the place we retailer the contents of compound objects.

  typedef struct
  {
    C_word header;
    C_word information[];    
  } C_SCHEME_BLOCK;

In typical Scheme objects, the information consists of slots. For instance, the automotive and the cdr of a pair are slots, the weather of a vector are slots, and many others. Every slot comprises a Scheme worth. In a graphical illustration, a block object seems to be like this:

The info illustration shall be vital to grasp how we implement weak pairs, and to comply with the rubbish assortment course of.

If you need extra particulars, see my post about data representation in CHICKEN.

A tale of two spaces

CHICKEN divides the heap up in two halves. Whereas this system is operating, just one half is in use. When this half fills up, we begin the rubbish assortment course of. We hint all the info that is nonetheless thought of “dwell” (aka GC “roots”) and duplicate it over to the opposite half.

Let us take a look at an instance:

  (cons 5 '())    (outline x (cons (vector 42) '())) 

The half that was full is named the fromspace and the half that begins out empty is named the tospace. Initially, the tospace is empty, and the info is simply within the fromspace:

Then, we transfer the objects, one by one, beginning on the roots. So within the instance, the pair that we named “x” will get moved first to tospace, whereas its slots nonetheless level into fromspace:

The adjustments are in yellow. After tracing the roots, we hint all of the contents of the objects in fromspace and duplicate over the pointed-to objects:

Lastly, we’re carried out. Any remaining objects in fromspace are rubbish and could be ignored, successfully “clearing” fromspace, and we flip the 2 areas in order that fromspace turns into tospace and vice versa:

Onward! I mean, forward!

This was a quite simple case. A extra advanced case is when there are a number of objects with mutual references. We should by some means preserve observe of which objects obtained moved the place, in order that we do not copy the identical object greater than as soon as.

One other instance:

  (outline shr (vector 42))
  (outline x (cons shr 1)
  (outline y (cons shr 2))

This may seem like the next when visualized:

As I’ve defined, roots are copied first, so as an instance shr will get copied first:

As you possibly can see, this introduces one thing new. There is a forwarding pointer saved on the tackle the place we initially discovered the header of the pair often known as shr. That is carried out in order that we do not have to traverse all of the objects pointing to shr (which might even be cyclic!). As a substitute, each time we discover an object that holds an tackle the place there now’s a forwarding pointer, we derefence the pointer and alter the item to level to the goal tackle.

So, if x is copied subsequent, we get the next image:

We really *at all times* go away behind a forwarding pointer when an object is copied, as a result of the GC doesn’t “know” whether or not something references the item. I skipped that within the preliminary examples to maintain it easy. And at last, y could be copied:

Now you possibly can see, the forwarding pointers are nonetheless there in fromspace, however nothing references them anymore. The GC is now carried out. After all, fromspace and tospace shall be flipped (however I am not displaying that right here).

How weak pairs work in the GC

Let’s make just a few adjustments to our instance:

  (outline zero (weak-cons (vector 0) 0))    (weak-cons (vector 9) 9)   (outline shr (vector 42))
  (outline x (cons shr 1)
  (outline y (weak-cons shr 2))   (set! shr #f)   (outline z (weak-cons (vector #f) 3)) 

Now, y is modified to be a weak pair holding shr in its automotive, whereas x remains to be a standard pair with the identical object in its automotive. The shr variable can be cleared, in order that solely x holds onto shr‘s outdated worth strongly. We have added just a few extra weak pairs whereas we’re at it.

This weak-cons process doesn’t exist in CHICKEN 5.3.0; that is what we’ll be including. It creates a pair whose automotive area is a weak reference. The cdr area is an everyday reference. The rationale for having the cdr be an everyday reference is that this permits us to construct up lists of things which can be collected. The “backbone” of the checklist will stay intact, however the checklist entries shall be cleared (see under).

Let’s check out our new preliminary scenario:

The GC will begin with the dwell objects, as traditional. Let’s begin on the high, copying zero:

Since zero is a weak pair, we will not traverse the automotive slot, maintaining the vector it factors to in fromspace. That is intentional, because the GC treats weak pairs in a different way; solely the cdr slot will get scanned, whereas the automotive will get skipped and left as a “dangling” pointer into fromspace.

We’ll repair that later, as we do not wish to preserve this example after the rubbish assortment course of is completed.

Usually we would choose up shr subsequent, however that isn’t a root anymore. Let’s proceed with x as a substitute:

Then, as we scan over the contents of x, we encounter what was shr and transfer it, updating the pointer in x:

Subsequent, we transfer y:

You may discover that as a result of it is a weak pair, its automotive slot doesn’t get up to date to the brand new location of what was shr, although that vector has already been copied. As talked about earlier than, we’ll repair this up later.

Lastly, z will get moved:

I am certain you’ve got observed that now we’re left with a giant spaghetti bowl of pointers, a few of that are nonetheless pointing into fromspace. So let’s examine how we will repair this mess.

Fixing up the dangling weak pairs, naively

It simply so occurs that CHICKEN internally already has help for weak pairs. These are pairs whose automotive area holds objects that could be collected. These are used solely within the image desk, in order that symbols aren’t maintained indefinitely. That is vital as a result of it stops image desk stuffing assaults. These are a type of Denial of Service (DoS) by forcing the system to eat up all reminiscence.

On this present implementation, the one weak pairs in existence are these within the image desk’s hash bucket chains. So, the answer there’s quite simple: traverse all of the image desk buckets and replace weak pointers.

So we traverse the slots like throughout regular GC’s mark operation, however with two exceptions:

  • Once we mark the slot, we do not trigger the contents to be copied over into tospace, however solely chase any forwarding pointers.
  • Once we encounter a pointer into fromspace, we exchange the slot with a particular “damaged weak pair” marker.

That may yield this ultimate image:

See Also

As you possibly can see, all the weak pairs have their automotive slots up to date, even those that we might contemplate as rubbish, like the 2 on the high. These get the particular worth #!bwp, or “damaged weak pointer” to point that the worth initially saved there isn’t any longer dwell.

Smarter fixing up of weak pairs

Traversing all of the weak pairs is ok after we know each single one within the system. When solely hash desk buckets within the image desk could also be weak, that is the case. However now we wish to expose weak pairs to the person, what can we do?

The best answer could be so as to add a desk or checklist of all of the weak pairs as they’re constructed, or because the GC encounters them. This has just a few issues:

  • It’s wasteful of reminiscence, as the additional pointers on this desk could be redundant.
  • If carried out throughout rubbish assortment, we’ll must dynamically allocate throughout a GC, when reminiscence is already tight (and it is gradual).
  • We would find yourself traversing probably masses and a great deal of useless weak pairs that may be themselves collected, as within the instance above.

Alternatively, you possibly can traverse the complete heap once more after GC and mark the weak pairs. In truth, that is what Chez Scheme does. Nevertheless, Chez has separate heaps for various kinds of objects (often known as a BIBOP scheme). This has the benefit of traversing solely the dwell weak pairs, as a substitute of each single dwell object.

In CHICKEN, we would be traversing each single object, as we retailer every thing on a single heap. If the heap is massive, this might be wasteful. However at the least we would not must traverse useless weak pairs!

When wanting into this drawback, I made a decision to review the Schemes which have weak pointers. These have been Chez Scheme, Chibi Scheme and MIT Scheme. Of these, solely MIT Scheme has a Cheney-style copying rubbish collector with a single heap like CHICKEN does. The sources to MIT Scheme are well-documented, so I rapidly discovered how they do it, and boy is it intelligent!

MIT Scheme’s answer:

  • Traverses solely the still-live weak pairs,
  • introduces solely a single phrase of reminiscence overhead,
  • and is brilliantly easy when you get the way it works.

Garbage collecting weak pairs, MIT style

Let’s take a more in-depth have a look at that finish state of that final GC run. Do not forget that a forwarding pointer overwrites the unique header of the item that used to dwell there. However the reminiscence of the slots is nonetheless sitting there, allotted however unused!

The good thought in MIT Scheme is that we will “recycle” that unused reminiscence and retailer a linked checklist of still-live weak pairs in that house. The additional phrase of overhead I discussed earlier than acts as the top of that checklist. Let us take a look at how that appears throughout the GC, by replying that final run, however with MIT Scheme’s algorithm for monitoring weak pairs:

The preliminary state is identical, besides we have now the extra weak pair chain head. The GC once more begins on the high, copying zero, identical as earlier than, however with one addition. When the forwarding pointer will get created, we discover that it is a weak pair. This causes the weak pair chain head to get modified so it factors to the brand new forwarding pointer. The unique worth of the chain head (the empty checklist marker) will get saved the place the outdated weak pair’s automotive slot was.

Once more, we proceed with x, which is an everyday pair, so nothing attention-grabbing occurs to the house behind the forwarding pointer:

Then, as we scan over the contents of x, we once more encounter what was shr and transfer it, updating the pointer in x. The forwarding pointer for shr can be not put within the chain as a result of shr isn’t a weak pair:

Subsequent, we transfer y. As a result of it’s a weak pair, we once more hyperlink up its forwarding pointer into the chain that we’re build up. This implies the pointer within the weak chain head (which pointed to zero‘s authentic tackle) will get put within the outdated automotive area of y, and the top itself now factors to y‘s forwarding pointer:

Lastly, z will get moved. Since it’s also a weak pair, when making the forwarding pointer we additionally hyperlink it up into the chain:

Lastly, we now solely must traverse the weak pairs by following the path of pointers beginning on the head of the dwell weak pair chain. For every of the forwarding pointers within the chain:

  • Observe the forwarding pointer to its corresponding dwell weak pair,
  • take that pair’s automotive,
  • examine if the automotive refers to a forwarding pointer. If it does:
  • replace the automotive to the forwarding pointer’s vacation spot.

I am not going to do that one after the other because it’s fairly apparent. I’ll present you the results of this stroll:

You may see that we have solely up to date three out of 4 weak pairs – precisely the variety of weak pairs that obtained copied throughout this GC and are nonetheless thought of dwell. As a result of solely weak pairs that exist in tospace have to be up to date, we’re doing absolutely the minimal work crucial.

The good factor about that is, that even if you happen to produce tons of rubbish weak pairs, they haven’t any extra affect on GC efficiency. Solely these weak pairs that survive rubbish assortment shall be scanned and up to date.

When implementing this enchancment I observed that our efficiency on benchmarks improved just a little bit, although they do not use weak pairs straight. This was all on account of solely scanning copied weak pairs and never having to scan the whole image desk on each GC.

With the help for user-facing weak pairs in place, supporting line numbers within the interpreter was relatively easy. Which means debugging your code shall be simpler beginning with the upcoming 5.4 launch of CHICKEN!

Further reading

Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top