Introduction to Loro’s Wealthy Textual content CRDT – Loro
Introduction to Loro’s Wealthy Textual content CRDT
This text presents the wealthy textual content CRDT algorithm applied in Loro, complying with Peritext (opens in a new tab)‘s standards for seamless wealthy textual content collaboration. Moreover, it may be constructed on high of any Record CRDT algorithms and switch them into wealthy textual content CRDTs.
If CRDTs are new to you, our article What are CRDTs gives a short introduction.
Background
Loro is based on the Replayable Event Graph (REG) algorithm proposed by Seph Light, however this algorithm can not combine the unique model of Peritext. This motivates us to create a brand new wealthy textual content algorithm. It’s unbiased of the particular Record CRDTs, thus working properly with REG, and is developed on high of them to ascertain a wealthy textual content CRDT.
Earlier than diving into the algorithm of Loro’s wealthy textual content CRDT, I might wish to briefly introduce REG and Peritext, and why Peritext can’t be used on REG.
Recap on Record CRDTs
Recap on Record CRDTs
Unlike OT, most List-oriented CRDTs assign a unique ID to each item or character, often corresponding to the operation ID of its insertion. With unique IDs for each character, we can reliably reference a character or position through its ID.
The unique ID eliminates concerns about consistent position descriptions during synchronization. For instance, deletions are straightforward by specifying the deleted character’s ID, and insertions are described using the IDs of adjacent characters. In cases of concurrent insertions at the same location, List CRDT algorithms resolve the consistency issues.
However, a notable limitation of List CRDTs is the use of ‘tombstones’. Upon deletion of a character, it is not fully removed but replaced with a tombstone, maintaining the ID’s position. Depending on the algorithm, this tombstone may be removed once all participating nodes acknowledge the deletion. However, it can be challenging to determine if all peers have received the corresponding deletion operation. This information often means additional overhead for many CRDTs. Thus, the simplest solution is not to perform any tombstone collection.
Brief Introduction to Replayable Event Graph
Seph Gentle (opens in a new tab), the writer of Replayable Occasion Graph, is presently engaged on the paper, and it is extremely anticipated.
REG is a novel CRDT algorithm that mixes the strengths of each OT and CRDTs. It has the distributed nature of CRDT that allows P2P collaboration and information possession. Furthermore, it achieves minimal overhead in eventualities devoid of concurrent edits, just like OT.
Whether or not in real-time collaboration or multi-end synchronization, a directed acyclic graph (DAG) kinds over the historical past of those parallel edits, just like Git’s historical past. The REG algorithm data the historical past of person edits on the DAG.
In contrast to typical CRDTs, REG can file simply the unique description of operations, not the metadata of CRDTs. As an illustration, in textual content modifying eventualities, the RGA algorithm (opens in a new tab) wants the op ID and Lamport timestamp (opens in a new tab) of the character to the left to find out the insertion level. Yjs (opens in a new tab)/Fugue, nevertheless, requires the op ID of each the left and proper characters at insertion. In distinction, REG simplifies this by solely recording the index on the time of insertion. Loro, which makes use of Fugue (opens in a new tab) upon REG, inherits these benefits.
An index shouldn’t be a secure place descriptor, because the index of an operation will be affected by different operations. For instance, if you happen to spotlight content material from index=x
to index=y
, and concurrently somebody inserts n characters at index=n
the place n<x
, then your highlighted vary ought to shift to cowl from x+n
to y+n
. However REG can decide the precise place of this index by replaying historical past. Thus, it might probably reconstruct the corresponding CRDT construction by replaying historical past.
Reconstructing historical past might sound time-consuming, however REG can backtrack just some. When merging updates from distant sources, it solely must replay operations parallel to the distant replace, reconstructing the native CRDTs to calculate the diff after making use of distant operations to the present doc.
The REG algorithm excels with its quick native replace speeds and eradicate issues about tombstone assortment in CRDTs.As an illustration, if an operation has been synchronized throughout all endpoints, no new operations will happen concurrently with it, permitting it to be safely faraway from the historical past.
What’s Fugue
Fugue is a brand new CRDT textual content algorithm, introduced in The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing (opens in a new tab) by Matthew Weidner (opens in a new tab) et al., properly solves the interleaving drawback.
The interleaving drawback was proposed within the paper Interleaving anomalies in collaborative text editors (opens in a new tab) by Martin Kleppmann et al.
An instance of interleaving:
- A sort “Hi there ” from left to proper/proper to left
- B sort “Hello ” from left to proper/proper to left
- The anticipated outcome: “Hi there Hello ” or “Hello Hi there “
- The interleaving outcome could seem like: “HHeil lo”
- This occurs when typing from proper to left in RGA.
An instance of an interleaving anomaly when utilizing fractional indexing (opens in a new tab) CRDT on textual content content material.
Supply: **Martin Kleppmann, Victor B. F. Gomes, Dominic P. Mulligan, and Alastair R. Beresford. 2019. Interleaving anomalies in collaborative textual content editors. https://doi.org/10.1145/3301419.3323972 (opens in a new tab)
The Fugue paper (opens in a new tab) summarizes the present state of the interleaving issues within the desk.
Souece: Weidner, M., Light, J., & Kleppmann, M. (2023). The Artwork of the Fugue: Minimizing Interleaving in Collaborative Textual content Enhancing. ArXiv. /abs/2305.00583
The interleaving drawback typically are unsolvable when there are greater than 2 websites. See Fugue (opens in a new tab) paper Appendix B, Proof of Theorem 5 for detailed clarification.
The case the place the interleaving drawback is unsolvable
Supply: Weidner, M., Light, J., & Kleppmann, M. (2023). The Artwork of the Fugue: Minimizing Interleaving in Collaborative Textual content Enhancing. ArXiv. /abs/2305.00583
Nevertheless, we are able to nonetheless decrease the prospect of interleaving. Fugue introduces the idea of maximal non-interleaving and solves it with a chic algorithm that’s simple to optimize. The definition of maximal non-interleaving makes a variety of sense to me and leaves little room for ambiguity. I will not reiterate the definition right here. However the fundamental concept is first to unravel ahead interleaving by leftOrigin. If there’s nonetheless ambiguity, then clear up the backward interleaving by rightOrigin. (The leftOrigin and rightOrigin consult with the ids of the unique neighbors when the character is inserted, identical to Yjs)
Transient Introduction to Peritext
Peritext (opens in a new tab) was proposed by Geoffrey Litt et al. It is the primary paper to debate wealthy textual content CRDTs. It may well merge concurrent edits in wealthy textual content format whereas preserving users’ intent as much as possible (opens in a new tab). Its main focus is merging the codecs and annotations of wealthy textual content content material, comparable to daring, italic, and feedback. It was applied in Automerge (opens in a new tab) and crdt-richtext (opens in a new tab).
💡 The precise definition of person intent within the context of concurrent wealthy textual content modifying cannot be clearly defined in just a few phrases. It is best understood by explicit examples.
Peritext is designed to unravel a few important challenges:
Firstly, it addresses the anticipated issues arising from conflicting model edits. As an illustration, contemplate a textual content instance, “The fast fox jumped.” If Consumer A highlights “The fast” in daring and Consumer B highlights “fast fox jumped,” the best merge ought to lead to the complete sentence, “The fast fox jumped,” being daring. Nevertheless, current algorithms may not meet this expectation, leading to both “The fast fox” or “The” and “jumped” being daring as a substitute.
Moreover, Peritext manages conflicts between model and textual content edits. In the identical instance, if Consumer A highlights “The fast” in daring, however Consumer B adjustments the textual content to “The quick fox jumped,” the best merge ought to lead to “The quick” being daring.
What’s extra, Peritext takes into consideration completely different expectations for increasing kinds. For instance, if you happen to sort after a daring textual content, you’ll sometimes need the brand new textual content to proceed being daring. Nevertheless, if you happen to’re typing after a hyperlink or a remark, you possible would not need the brand new enter to change into a part of the hyperlink or remark.
Why Unique Peritext Cannot Be Instantly Used with REG
On the one hand, Peritext’s algorithm expresses style ranges through character OpIDs (opens in a new tab). With out replaying historical past, CRDTs primarily based on REG can not decide the particular positions corresponding to those OpIDs.
Alternatively, it is not possible to mannequin Peritext on REG by replaying. It’s because REG’s “native backtracking suffices” depends on the algorithm satisfying “the identical operation will produce the identical impact, whatever the present state,” which Peritext doesn’t adhere to. For instance, when inserting the character “x” at place p
, whether or not “x” is daring depends upon “whether or not p
is surrounded by daring” and “whether the tombstones at p
contain boundaries of bold and other styles (opens in a new tab).”
Loro’s Wealthy Textual content CRDT
Algorithm
Loro implements rich text using special control characters called ‘style anchors’. Each matching pair of start anchor and end anchor contains the following information:
- The op ID of the style operation
- The style’s key-value pair
- The style’s Lamport timestamp (opens in a new tab)
- Type enlargement habits: Determines whether or not newly inserted textual content earlier than or after the model boundaries ought to inherit the model.
The strategy to find out a personality’s model is as follows:
- Discover all model anchor pairs that embrace the character, the place every pair is created by the identical model operation
- Mixture pairs based on the important thing. There could also be a number of model pairs with the identical key however completely different values. In such instances, the worth with the best Lamport timestamp is chosen (if Lamport timestamps are equal, then use the peer ID to interrupt the tie)
Opposite to Yjs’s method of using control characters (opens in a new tab) for wealthy textual content, our algorithm pairs begin and finish anchors once they originate from the identical model operation. This strategy precisely handles the next eventualities:
These particular management characters should not uncovered to the person; every management character is successfully of zero size from the person’s perspective. Our information construction helps varied strategies of measuring textual content size for indexing textual content content material. Apart from Unicode, UTF-16, and UTF-8, we additionally measure our wealthy textual content size in Entity size
. It treats every model anchor as an entity with a size of 1 and measures plain textual content in Unicode size.
Native Habits
Multiple valid insertion points can exist when users insert text at a specific Unicode index. It occurs due to style anchors, which are zero-length elements from the user’s perspective.
For example, in the case of <b>Hello</b> world
, when a user inserts content at Unicode index=5
, they face the choice of inserting to the left or right of </b>
. If the user sets the expand behavior of bold to expand forward, the new character will be inserted to the left of </b>
, making the inserted text bold as well.
When users delete text, Loro uses an additional mapping layer to avoid deleting style anchors within the text range.
To model the deletion of a style, a new style anchor pair with a null value is added.
We can implement the following optimizations to remove redundant style anchors:
- The style anchors that include no text can be removed.
- When styles completely negate each other, like a span of bold is canceled by a span of unbold, we can remove their style anchors.
All these behaviors happen locally, and the algorithm is independent of the specific List CRDT.
Behavior When Inserting Text at Style Boundaries
Most modern rich text editors (Google Doc, Microsoft Word, Notion) behave as follows: when new text is entered right after bold text, the new text should inherit the bold style; when entered after a hyperlink, the new content should not inherit the hyperlink style. Different styles have varying preferences for text insertion positions, leading to potential conflicts. This is reflected in the degree of freedom we have when inserting new text.
Users interact with rich text based on text-based indexes, like the Unicode index. Since style anchors have a Unicode Length of 0, a Unicode index with n style anchors presents n + 1 potential insertion positions.
We select the insertion position based on the following rules:
- Insertions occur before a start anchor of a style that should not expand backward.
- Insertions occur before style anchors that signify the end of bold-like marks (expand = “after” or expand = “both”).
- Insertions occur after style anchors that signify the end of link-like marks (expand = “none” or expand = “before”).
Rule 1 should be prioritized over rules 2 and 3 to prevent the accidental creation of a new style (opens in a new tab).
The present technique first scans ahead to seek out the final place satisfying guidelines 1 and a couple of.
Then, it scans backward to seek out the primary place satisfying rule 3.
Merging Distant Updates
Loro treats style anchors as a special element and handles them using the same List CRDT for resolving concurrent conflicts. The logic related to rich text is independent of the particular List CRDT. Therefore, this algorithm can rely on any List CRDT algorithm for merging remote operations. Loro utilizes the Fugue (opens in a new tab) Record CRDT algorithm.
When new model anchors are inserted by distant updates, new kinds are added; if outdated model anchors are deleted, the corresponding outdated kinds are eliminated.
Robust Eventual Consistency
The internal state of this algorithm consists of a list of elements, each either a text segment or a style anchor. The rich text document is derived from this internal state.
The internal state achieves strong eventual consistency through the upstream List CRDT.
Identical internal states result in identical rich text documents. Hence, the same set of updates will produce the same rich text documents, evidencing the strong eventual consistency of this algorithm.
Criteria in Peritext
The Peritext paper (opens in a new tab) specifies the intent-preserving merge habits for wealthy textual content inline format. Loro’s wealthy textual content algorithm efficiently passes all take a look at instances outlined therein.
1. Concurrent Formatting and Insertion
Loro easily supports this case by treating style anchors as special elements alongside text.
2. Overlapping Formatting
This case has been analyzed earlier. Since our style anchors contain style op ID information, we know there are two bold segments: one from 0 to 5 and another from 3 to 11, allowing us to merge them.
Multiple style types are easily supported.
Like Peritext, we model unbolding by adding a new style with the key bold
and the value null
.
The final value of each style key on each character is determined by the style with the greatest Lamport (opens in a new tab) timestamp that features the character. Thus, it simply helps this case.
3. Textual content Insertion at Span Boundaries
Insertion right after a bold style should result in the newly inserted text also being bold.
However, insertion right after a link style should result in the newly inserted text not having the hyperlink style.
4. Styles that Support Overlapping
The problem of overlapping styles is related to how we represent them.
We represent the rich text using Quill’s Delta (opens in a new tab) format.
[
{ insert: 'Gandalf', attributes: { bold: true } },
{ insert: ' the ' },
{ insert: 'Grey', attributes: { color: '#cccccc' } }
]
Nevertheless, it can not deal with instances with a number of values assigned to the identical key. So, it is a headache to deal with the kinds that help overlapping.
For instance, within the above case, the textual content “fox” is commented on by each Alice and Bob. We will not characterize it with Quill’s Delta format immediately.
So the doable workaround contains:
Flip the attribute worth into an inventory
[
{ insert: "The ", attributes: { comment: [{ ...commentA }] } },
{
insert: "fox",
attributes: { remark: [{ ...commentA }, { ...commentB }] },
},
{ insert: " jumped", attributes: { remark: [{ ...commentB }] } },
]
Use op ID that creates the op as the important thing of the attribute
[
{ insert: "The ", attributes: { "id:0@A": { key: "comment", ...commentA } } },
{
insert: "fox",
attributes: {
"id:0@A": { key: "comment", ...commentA },
"id:0@B": { key: "comment", ...commentB },
},
},
{
insert: " jumped",
attributes: { "id:0@B": { key: "comment", ...commentA } },
},
]
However each require particular behaviors for each CRDT lib and for software code, that are painful to work with.
Lastly, we discovered that the optimum strategy to characterize an overlappable model was to make use of <key>:
as a prefix and permit customers to assign a singular suffix to create a definite model key.
This technique simplifies the CRDTs library code, because it does not require dealing with particular instances.
It successfully addresses eventualities the place a number of feedback overlap and can be user-friendly for software coding.
[
{ insert: "The ", attributes: { "comment:alice": "Hi" } },
{
insert: "fox",
attributes: { "comment:alice": "Hi", "comment:bob": "Jump" },
},
{ insert: " jumped", attributes: { "comment:bob": "Jump" } },
]
Following is the instance code in Loro:
const doc = new Loro();
doc.configTextStyle({
remark: { broaden: "none", }
})
const textual content = doc.getText("textual content");
textual content.insert(0, "The fox jumped.");
textual content.mark({ begin: 0, finish: 7 }, "remark:alice", "Hello");
textual content.mark({ begin: 4, finish: 14 }, "remark:bob", "Leap");
anticipate(textual content.toDelta()).toStrictEqual([
{
insert: "The ", attributes: { "comment:alice": "Hi" },
},
{
insert: "fox",
attributes: {
"comment:alice": "Hi",
"comment:bob": "Jump"
},
},
{
insert: " jumped", attributes: { "comment:bob": "Jump" },
},
{
insert: ".",
}
])
Implementation of Loro’s Wealthy Textual content Algorithm
The following is an overview of Loro’s implementation as of January, 2024.
Architecture of Loro
In line with the properties of Replayable Event Graph, Loro uses OpLog
and DocState
as the internal state.
OpLog
is dedicated to recording history, while DocState
only records the current document state and does not include historical operation information. When applying updates from remote sources, Loro uses the relevant operations from OpLog
and computes the diff through a DiffCalculator
. This diff is then applied to DocState
. This architecture also makes time travel easier to implement.
For more details, see the documentation on DocState and OpLog (opens in a new tab).
Implementation of Loro’s Wealthy Textual content CRDT
For rich text, Loro reuses the same DiffCalculator
as Loro List, based on the Fugue (opens in a new tab) algorithm. Consequently, the first logic associated to wealthy textual content is concentrated in DocState
. This contains expressing kinds, inserting new characters, and representing a number of index codecs.
Within the illustration of wealthy textual content state, we distinguish between the info construction ContentTree
, which expresses the textual content (together with model anchors), and StyleRangeMap
, which expresses kinds.
Each constructions are constructed on B+Bushes.
ContentTree
is accountable for environment friendly textual content discovering, insertion, and deletion. It may well index particular insertion/deletion positions utilizing Unicode/UTF-8/UTF-16/Entity index. It doesn’t retailer what particular model every textual content phase ought to have.
We constructed the next B+Tree construction primarily based on our generic-btree library (opens in a new tab) to specific textual content in reminiscence:
- Every inside node within the B+Tree shops the Unicode char size, UTF-16 size, UTF-8 size, and Entity size of its subtree. The Entity size considers the size of fashion anchors as 1, in any other case 0.
- The leaf nodes of the B+Tree are textual content or model anchors.
StyleRangeMap
is accountable for environment friendly updating/querying of fashion ranges.
Within the StyleRangeMap
B+Tree expressing kinds:
- Every inside node shops the
entity size
of its subtree. - Every leaf node shops the gathering of fashion info for the corresponding vary and its `entity size.
Separating the textual content ContentTree
and magnificence StyleRangeMap
into two constructions goals for higher efficiency optimization. On wealthy textual content, model info is usually not plentiful and tends to have good continuity, comparable to a number of paragraphs having the identical format, which will be expressed with a single leaf node. Nevertheless, our construction for storing textual content is unsuitable for leaf nodes with giant content material, as conversion time between completely different encoding codecs would change into excessively lengthy.
When a person inserts a brand new character at Unicode index
= i, the next happens:
- Discover the place at
Unicode index
= i inContentTree
. - Examine if there are any adjoining model anchors at this place. If not, immediately insert.
- If there are, resolve whether or not to insert to the left or proper of the corresponding model anchor primarily based on its sort and properties. If there are a number of such model anchors, insert them based on the earlier part on “Behavior When Inserting Text at Style Boundaries”.
Testing
We have written tests for the criteria proposed by Peritext and passed all of them.
To ensure the correctness of our CRDTs, we have added numerous fuzzing tests to simulate different collaborative behaviors, synchronization behaviors, and time-travel behaviors. These tests check for the strong eventual consistency and the correctness of internal invariants. We run these fuzzing tests continuously for several days after every critical modification to avoid oversights.
How to Use
Before using the Loro’s rich text module, it is necessary to define the configuration for rich text styles, specifying the expand behavior for different keys and whether overlap is allowed.
Here is an example of using Loro’s rich text in JavaScript:
const doc = new Loro();
doc.configTextStyle({
bold: {
expand: "after",
},
comment: {
expand: "none",
overlap: true
},
link: {
expand: "none"
}
});
const text = doc.getText("text");
text.insert(0, "Hello world!");
text.mark({ start: 0, end: 5 }, "bold", true);
expect(text.toDelta()).toStrictEqual([
{
insert: "Hello",
attributes: { bold: true },
},
{
insert: " world!",
}
] as Delta<string>[]);
text.insert(5, "!");
expect(text.toDelta()).toStrictEqual([
{
insert: "Hello!",
attributes: { bold: true },
}, {
insert: " world!",
}
] as Delta<string>[]);
Summary
This article presents Loro’s rich text algorithm design and implementation. Its correctness is readily demonstrable. It can be built upon any existing List CRDT algorithm. It allows Loro to support rich text collaboration using REG and Fugue (opens in a new tab), combining the strengths of a number of CRDT algorithms.
We’re repeatedly refining its design and actively looking for design companions. We’re open to all types of suggestions and constructive criticism. Ought to you’ve gotten any proposals for collaboration, please attain out to zx@loro.dev