When LIMIT 9 works but LIMIT 10 hangs


2023-05-31 09:11:49

I received a Slack message from colleagues at a major partner. They'd updated their dev environment to support WebSockets, so that Neon's serverless driver could be used there, but then they'd run into a weird issue.

The nub of it was this:

This hangs:

const { rows } = await sql`SELECT * FROM pokemon ORDER BY random() LIMIT 10`;

This works:

const { rows } = await sql`SELECT * FROM pokemon LIMIT 10`;

Reproducibly, the query without an ORDER BY ran fine every time. Reproducibly, the query with ORDER BY random() failed every time. Usually, it hung. Occasionally, it came back with an immediate Error: Connection terminated unexpectedly. Either way: it never worked.

I was intrigued by this. I was also keen to fix it.

Our serverless driver redirects the Postgres binary protocol — which is ordinarily carried over TCP — over a WebSocket connection, so that it can be used from serverless platforms that support WebSockets but not TCP.

Our partner was using undici to support WebSockets in Node 18 and above, and ws to support them in Node 17 and below. Node 17 and below worked, Node 18 and above didn't. Plus, undici's WebSockets support was labelled as 'experimental'. So they'd identified undici as a good place to start looking for the problem.
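As a sketch of the kind of version switch involved (this wiring is my assumption, not the partner's actual code), one might pick the WebSocket package by Node major version:

```javascript
// Hypothetical sketch: choose a WebSocket package by Node major version.
// undici ships a (then-experimental) WebSocket for Node 18+; the ws
// package covers older versions.
const major = Number(process.versions.node.split('.')[0]);
const wsPackageName = major >= 18 ? 'undici' : 'ws';
console.log(`Node ${major}: would use '${wsPackageName}' for WebSockets`);
```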

I created a minimal test case to run in Node, and saved it as wstest.mjs:

import { neonConfig, Pool } from '@neondatabase/serverless';

import { WebSocket } from 'undici';
// import WebSocket from 'ws';

neonConfig.webSocketConstructor = WebSocket;

const query = 'SELECT * FROM pokemon ORDER BY random() LIMIT 10';
// const query = 'SELECT * FROM pokemon LIMIT 10';

const connectionString =
  'postgres://jawj:passwd@proj-name.eu-central-1.aws.neon.tech/main';

const pool = new Pool({ connectionString });
const result = await pool.query(query);
console.log(result);
pool.end();

Of the four possible combinations of uncommenting here — (ws, undici) x (short query, long query) — the long query sent via undici was the only one that kept failing.

Wiresharking

To try to see where things were getting stuck, I opened Wireshark, entered a capture filter (host proj-name.eu-central-1.aws.neon.tech), ran my test script, and began eyeballing network packets.

But — oh. Our driver's WebSockets are secure, over `wss:`, so all I saw was the start of a TLS 1.3 negotiation and then lots of encrypted TLS 'application data' packets.

What I needed was to supply Wireshark with the TLS keys. Fortunately, it turns out that Node makes this very easy. A quick web search later, I ran:

node --tls-keylog=keylog.txt wstest.mjs

I then pointed Wireshark at keylog.txt (via Preferences … > Protocols > TLS > (Pre-)Master-Secret log filename), restarted the capture, and ran the script again.

(Screenshot 1)

Progress: now we can see the decrypted data. Highlighted in green we have a GET request followed by a HTTP/1.1 101 Switching Protocols message, indicating that the WebSocket connection got accepted by the server. But following that, Wireshark doesn't identify any WebSocket traffic. Only a single substantive message is recorded, and Wireshark sees it only as a decrypted (but still opaque) TLS transmission.

Note: a database password is shown in, or recoverable from, several screenshots on this page. This password has since been changed!

How does this compare with what we get when we use the ws WebSocket package instead?

(Screenshot 2)

The next substantive packet after the Switching Protocols message has the same 222-byte length as before. But this time it's identified as a WebSocket Binary message (we can also see that it's masked, and that it constitutes the final — and indeed only — instalment of the data being sent in this particular transmission).

Let's unmask the data using the relevant tab at the bottom of the bottom-right-hand pane, and see what's being sent.

(Screenshot 3)

There are three separate short binary Postgres-protocol messages jammed together here: a StartupMessage, a PasswordMessage, and a Query message (we pipeline these for speed, as I've discussed before). This is all fine and just as expected.

So: what's the difference between screenshots 1 and 2, which are the undici and ws versions of the same communication? Byte 1 is the same in both: it says that this is the final (only) packet of this binary transmission. Byte 2 is the same in both, too: it says that the transmission is masked, and that we should treat the next two bytes as a 16-bit payload length.

So then bytes 3 and 4 should be that 16-bit payload length — and this is where things fall apart. The ws message says we have 126 bytes (00000000 01111110) of payload. That sounds plausible. The undici message says we have 25,888 bytes (01100101 00100000) of payload … in a 222-byte packet? Yeah: this one is fishy.

I tried some more requests using undici, and the 2-byte payload length field was different every time. It appeared to contain essentially random numbers. Usually, the supposed payload length was longer than the real length of 126 bytes, and then the request hung, waiting on the promised additional data that was never actually going to be sent. Occasionally, the supposed payload length was shorter than the real length, and in those cases I got that Connection terminated unexpectedly error I noted earlier.
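To make those header bytes concrete, here's a minimal sketch (my own illustration, not undici's code) of reading a frame's declared payload length, applied to the two headers above:

```javascript
// Sketch: read the declared payload length from the start of a
// WebSocket frame header (0x82 = FIN + binary opcode; 0xfe = mask bit + 126).
function declaredLength(header) {
  const len7 = header[1] & 0x7f; // low 7 bits of byte 2, after the mask bit
  if (len7 < 126) return len7;                     // 7-bit length
  if (len7 === 126) return header.readUInt16BE(2); // 16-bit extended length
  return Number(header.readBigUInt64BE(2));        // 64-bit extended length
}

// ws's header: bytes 3 and 4 are 00000000 01111110
console.log(declaredLength(Buffer.from([0x82, 0xfe, 0x00, 0x7e]))); // 126

// undici's header: garbage bytes 01100101 00100000
console.log(declaredLength(Buffer.from([0x82, 0xfe, 0x65, 0x20]))); // 25888
```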

WebSocket payload lengths

It was time to read up on the WebSocket framing protocol and, in particular, how payload lengths are communicated. In the name of saving a few bytes here and there, it turns out there are three different ways a WebSocket payload length can be encoded in a WebSocket frame. Lengths up to 125 bytes are expressed in the last 7 bits (after the mask bit) of the second byte of the frame. For lengths between 126 and 65,535 bytes, those seven bits are set to 126 (1111110), and bytes 3 and 4 hold the length. And for lengths of more than 65,535 bytes, the seven bits are set to 127 (1111111), and the length is then found in the next 8 bytes, 3 through 10.

The RFC illustrates this with the following figure:

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                |
+---------------------------------------------------------------+
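As an illustrative sketch (again, mine rather than undici's), the three encodings can be written like this:

```javascript
// Sketch: encode just the length portion of a frame header per RFC 6455.
// The mask bit is omitted here for clarity.
function encodePayloadLength(len) {
  if (len <= 125) {
    return Buffer.from([len]);          // 7-bit length, in byte 2
  } else if (len <= 65535) {
    const b = Buffer.alloc(3);
    b[0] = 126;                         // marker: 16-bit length follows
    b.writeUInt16BE(len, 1);            // bytes 3 and 4
    return b;
  } else {
    const b = Buffer.alloc(9);
    b[0] = 127;                         // marker: 64-bit length follows
    b.writeBigUInt64BE(BigInt(len), 1); // bytes 3 through 10
    return b;
  }
}

console.log(encodePayloadLength(125)); // <Buffer 7d>
console.log(encodePayloadLength(126)); // <Buffer 7e 00 7e>
```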

A-ha! There's a clue here about why a shorter query might succeed where a longer query fails. We already saw that the longer query above, SELECT * FROM pokemon ORDER BY random() LIMIT 10, brings the total payload to 126 bytes. As luck would have it, that's exactly the smallest value that requires a 16-bit extended payload length.

In contrast, the shorter query (the one without an ORDER BY clause) will obviously result in a shorter payload length: one that thus fits in the 7 bits of the second byte, and thus exercises a different code path. In fact, we can see that shortening our failing query by just one character — for example, LIMIT 9 in place of LIMIT 10 — would be enough to change the path taken here.


Hacking on undici

With all this in mind, I began to poke around in the undici GitHub repo. It was mercifully easy to find the relevant code in lib/websocket/frame.js. I homed in on these few lines:

let payloadLength = bodyLength

if (bodyLength > maxUnsigned16Bit) {
  payloadLength = 127  // marker: 64-bit extended length
} else if (bodyLength > 125) {
  payloadLength = 126  // marker: 16-bit extended length
}

const buffer = Buffer.allocUnsafe(bodyLength + offset)

if (payloadLength === 126) {
  new DataView(buffer.buffer).setUint16(2, bodyLength)
}

These lines are how the 16-bit length value gets written: by creating a throwaway DataView into the buffer that holds the WebSocket frame, and calling its setUint16() method. It's perhaps a little unexpected that the value isn't written using the buffer's own writeUInt16BE() method. But the logic looks basically sound, right? Right?

At this point, I think I made a cup of tea.

I came back to the laptop, recaffeinated, and it hit me! I remembered I'd had a similar am-I-going-mad? run-in with JavaScript's DataView while working on subtls. Data wasn't being written where it was supposed to be written, and I'd been at a loss to understand why. After a few minutes tearing my hair out, I'd figured out the fix, and as part of a larger overall commit I'd added two extra arguments to new DataView():

(Screenshot: the commit diff)

Buffers, ArrayBuffers and subarrays

How did that fix work? Well, any particular Buffer (or typed array) in Node is backed by an ArrayBuffer. That can either be a dedicated ArrayBuffer, which has the same size as the Buffer itself, or it can be a slice of a larger ArrayBuffer.

A common way to create the latter kind of Buffer — one that's backed by some part of a larger ArrayBuffer — is to call the subarray() method of an existing buffer. No memory gets copied here. What's returned by subarray() is a new Buffer object backed by part of the same block of memory that backs the buffer it was called on. This is very useful for optimizing speed and memory use.

Of course, when buffers share the same memory, we have to remember that writing to either of them may make changes to the other. And, of course, when we do anything with a larger ArrayBuffer that underlies a smaller Buffer — such as passing it to a new DataView, as undici did — we absolutely must make sure we reference the right part of it. (To be honest, it's a pretty nasty footgun that new DataView() defaults byteOffset to zero and byteLength to the ArrayBuffer's length, rather than insisting that these arguments are provided explicitly.)

Long story short: adding the relevant byteLength and byteOffset to the call to new DataView() immediately fixed undici's WebSockets issue. I filed a PR and logged off for the day feeling pretty good about it.

Wait, what?

I was awake for an hour or two in the middle of the night, thoughts wandering, when it occurred to me that I didn't fully understand why this fix had worked here. Sure, it couldn't do any harm to add in byteLength and byteOffset, but why should they be needed? After all, the buffer whose ArrayBuffer is passed to the DataView was allocated just a few lines before. Nobody appears to have called subarray() here.

Looking at the docs the next morning, I learned something new. Buffer.allocUnsafe() isn't unsafe only because it omits to zero out the bytes of the buffer it hands over, which was what I'd thought I knew. Buffer.allocUnsafe() also reserves the right to allocate small buffers as sub-arrays of a global pool it keeps lying around for the purpose. (This pool on my install of Node is set to 8192 bytes, and buffers up to half that size may be allocated from it.)
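We can see the pooling at work (exact offsets vary with pool state and Node version, so treat this as a sketch):

```javascript
// Buffer.allocUnsafe() serves small buffers from a shared pool.
console.log(Buffer.poolSize); // 8192 by default

const a = Buffer.allocUnsafe(64);
const b = Buffer.allocUnsafe(64);

// Both buffers are small enough (< poolSize / 2) to come from the pool,
// so they typically share one big backing ArrayBuffer …
console.log(a.buffer === b.buffer);  // usually true
// … and b usually sits at a non-zero offset within it.
console.log(b.byteOffset);           // usually non-zero, somewhere past a's bytes
console.log(a.buffer.byteLength);    // 8192: the pool itself
```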

The upshot is that the buffer allocated for a modestly-sized WebSocket frame will very likely have a non-zero byteOffset into its backing ArrayBuffer, because its backing ArrayBuffer is that global pool. Assuming a zero byteOffset for it is therefore Very Bad in two different ways:

  1. The 16-bit payload length that was supposed to be written as bytes 3 and 4 of the WebSocket frame is not written there. So, instead, what ends up as the alleged payload length, in bytes 3 and 4, is whatever arbitrary data was there already.
  2. That 16-bit payload length is actually written somewhere else, as bytes 3 and 4 of Buffer.allocUnsafe()'s global pool. Those bytes could be part of another active buffer, and thus we also stand a good chance of corrupting data elsewhere.

A happy ending

I now knew why the fix worked.

The undici team asked me to also add a unit test for the change, which I did. At the same time, I took the opportunity to simplify the code by using writeUInt16BE() as an alternative to a DataView. My pull request was merged, and has now been released in version 5.22.1 of undici.

Performance: a footnote

A follow-up pull request by an undici dev explains that the DataView approach was originally taken for reasons of performance. I was surprised that creating a throw-away object for this single write would be the fastest way to do things, so I did a little benchmarking of some alternatives:

// 1. DataView, ignoring any byteOffset (the buggy original)
new DataView(b.buffer).setUint16(2, value)

// 2. DataView with explicit byteOffset and byteLength (the fix)
new DataView(b.buffer, b.byteOffset, b.byteLength).setUint16(2, value)

// 3. Buffer's own method
b.writeUInt16BE(value, 2);

// 4. Bit-twiddling
b[2] = value >>> 8;
b[3] = value & 0xff;

The bit-twiddling option is the fastest of these, at 90ms for 1,000,000 iterations on my Intel MacBook. But the writeUInt16BE() approach is only marginally slower at 100ms, easier on the brain, and more than twice as fast as using DataView. To be honest, these differences (over 1,000,000 iterations) are so small that I'd go for readability every time. In any case, the use of DataView here was an optimization that, though intended to speed things up, actually slowed things down … and broke them entirely in the process.
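For the record, a minimal harness along these lines reproduces the comparison (timings will of course vary by machine and Node version):

```javascript
// Rough benchmark of the write strategies discussed above.
const b = Buffer.alloc(16);
const N = 1_000_000;

function bench(name, fn) {
  const t0 = process.hrtime.bigint();
  for (let i = 0; i < N; i++) fn(i & 0xffff);
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  console.log(`${name}: ${ms.toFixed(1)} ms`);
}

bench('DataView (with offsets)', v =>
  new DataView(b.buffer, b.byteOffset, b.byteLength).setUint16(2, v));
bench('writeUInt16BE', v => b.writeUInt16BE(v, 2));
bench('bit-twiddling', v => { b[2] = v >>> 8; b[3] = v & 0xff; });
```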
