Myers' diff algorithm in Scala
Dissonance implements Eugene Myers' diff algorithm in Scala as a pure function on immutable data structures. Using it is as simple as calling diff(left, right)
, where left
and right
are sequences of like-typed data to be compared; the result is an instance of Diff
, a sequence of additions, deletions and no-change nodes representing each item in the left and right sequence.
- implements Myers' diff algorithm
- can be used with any data type, not just strings
- specify a custom comparison function enabling fine-grained merges on "similar" data
- diffs are simple immutable structures of
Del
,Ins
andPar
nodes
Dissonance is available as a binary for Scala 3.4.0 and later, from Maven Central. To include it in an sbt
build, use the coordinates:
libraryDependencies += "dev.soundness" % "dissonance-core" % "0.3.0"
Dissonance provides only a single method, diff
, which takes a left
and right
parameter, both sequences of the same type, and generates a sequence of the edits required to transform the left sequence into the right sequence, a result of type Diff
. Each element of the resultant sequence is either a Par
value (for parity), corresponding to a value in both the left and right sequences, an Ins
value (for insertions) which exists only in the right sequence, or a Del
value (for deletions) which exists only in the left sequence.
The naming of these enumeration cases corresponds to a translation of the left sequence into the right sequence, but could describe a translation from the right sequence to the left if the roles are reversed. The Diff#flip
method can automatically reverse the translation.
Each of the three possible Edit
cases, Ins
, Del
and Par
includes the relevant value, as well as the indices of that value in each sequence it exists in: for Ins
, the right; for Del
, the left, and for Par
both a left
and right
index.
By default, elements of the left and right sequences will be considered the same (producing Par
values) if they are equal according to Java's universal equality method, AnyRef#equals
. However, other forms of equality (or similarity) may exist, and it may be useful to consider two elements to be the same, even if they are not equal according to AnyRef#equals
. A common example would be if they had the same ID, even if their content is different.
The diff
method takes an optional third parameter, compare
, of type (ElemType, ElemType) -> Boolean
which determines whether two elements are the same for the purposes of the diff.
Given a Diff
instance, created from two sequences, its collate
method can be used to group subsequences of changes together into changed and unchanged regions. For any given diff result, this collation can be done unambiguously: runs of adjacent Par
s are grouped together into an Unchanged
instance, and between each run will be one or more Ins
and/or Del
values which are grouped into a Changed
instance, consisting of a sequence of deletions and a sequence of insertions.
Conventionally, the diff algorithm will output deletions before insertions, but in terms of correctness, the order in which deletions and insertions are applied does not matter. The result of collation will always be an alternating sequence of Changed
and Unchanged
sections.
Diff
also provides a method, rdiff
, which will use these collated changes and perform a further diff on each changed section which contains at least one insertion and at least one deletion, since some of these may be more usefully viewed as substitutions or replacements, rather than unrelated insertions and deletions. The rdiff
method requires a similarity function, (ElemType, ElemType) -> Boolean
, which can identify values in the left and right sequences which are similar (but which have already been identified as non-equal). These are thus presented as Sub
nodes.
Since the order in which each deletion or insertion is applied within a changed section does not affect correctness, each nested diff may decide to identify certain insertion/deletion pairs as "substitutes" and fit the remaining insertions and deletions around them.
The result is an instance of RDiff
containing a (now flattened) sequence of Ins
s, Del
s, Par
s and Sub
s. Note that all but Sub
have the supertype, Edit
, while all have the supertype, Change
.
Furthermore, rdiff
has a second, optional, parameter, subSize
, which may be used to automatically convert short, changed sections with an equal number of insertions and deletions, into a sequence of Sub
s, provided they are less than subSize
. The value defualts to 1
. So, as isolated insertion/deletion region of length subSize
would become a series of Sub
s, regardless of whether the similarity function returns true
or false
.
A Diff
instance, say diff
, may be applied to a sequence, seq
, with,
diff.(seq)
to produce a new sequence. This is performed lazily, so the result is a LazyList
.
By default, a Par
"edit" will leave the element in the original sequence unchanged. However, has an optional second parameter,
update
, which can be provided to specify how Par
edits should be handled. Of course, if the edit is a Par
, the original and diff versions of element should be equal anyway, but may not be if, for example, the diff was constructed with a different "compare" function or the diff is being incorrectly applied to the wrong input.
Three reasonable alternatives for update
are:
- to ignore the value in the diff, and keep the original (the default),
- to ignore the original value and replace it with the value in the diff
- to check that the original and diff elements are equal, and fail if they are not
But more involved possibilities are available which could "merge" the original and diff elements, or prefer one or the other.
Dissonance is classified as maturescent. For reference, Soundness projects are categorized into one of the following five stability levels:
- embryonic: for experimental or demonstrative purposes only, without any guarantees of longevity
- fledgling: of proven utility, seeking contributions, but liable to significant redesigns
- maturescent: major design decisions broady settled, seeking probatory adoption and refinement
- dependable: production-ready, subject to controlled ongoing maintenance and enhancement; tagged as version
1.0.0
or later - adamantine: proven, reliable and production-ready, with no further breaking changes ever anticipated
Projects at any stability level, even embryonic projects, can still be used, as long as caution is taken to avoid a mismatch between the project's stability level and the required stability and maintainability of your own project.
Dissonance is designed to be small. Its entire source code currently consists of 358 lines of code.
Dissonance will ultimately be built by Fury, when it is published. In the meantime, two possibilities are offered, however they are acknowledged to be fragile, inadequately tested, and unsuitable for anything more than experimentation. They are provided only for the necessity of providing some answer to the question, "how can I try Dissonance?".
Copy the sources into your own project
Read the
fury
file in the repository root to understand Dissonance's build structure, dependencies and source location; the file format should be short and quite intuitive. Copy the sources into a source directory in your own project, then repeat (recursively) for each of the dependencies.The sources are compiled against the latest nightly release of Scala 3. There should be no problem to compile the project together with all of its dependencies in a single compilation.
Build with Wrath
Wrath is a bootstrapping script for building Dissonance and other projects in the absence of a fully-featured build tool. It is designed to read the
fury
file in the project directory, and produce a collection of JAR files which can be added to a classpath, by compiling the project and all of its dependencies, including the Scala compiler itself.Download the latest version of
wrath
, make it executable, and add it to your path, for example by copying it to/usr/local/bin/
.Clone this repository inside an empty directory, so that the build can safely make clones of repositories it depends on as peers of
dissonance
. Runwrath -F
in the repository root. This will download and compile the latest version of Scala, as well as all of Dissonance's dependencies.If the build was successful, the compiled JAR files can be found in the
.wrath/dist
directory.
Contributors to Dissonance are welcome and encouraged. New contributors may like to look for issues marked beginner.
We suggest that all contributors read the Contributing Guide to make the process of contributing to Dissonance easier.
Please do not contact project maintainers privately with questions unless there is a good reason to keep them private. While it can be tempting to repsond to such questions, private answers cannot be shared with a wider audience, and it can result in duplication of effort.
Dissonance was designed and developed by Jon Pretty, and commercial support and training on all aspects of Scala 3 is available from Propensive OÜ.
Dissonance's purpose is to detect differences—or dissonance, the places where they are not in agreement—between different objects.
In general, Soundness project names are always chosen with some rationale, however it is usually frivolous. Each name is chosen for more for its uniqueness and intrigue than its concision or catchiness, and there is no bias towards names with positive or "nice" meanings—since many of the libraries perform some quite unpleasant tasks.
Names should be English words, though many are obscure or archaic, and it should be noted how willingly English adopts foreign words. Names are generally of Greek or Latin origin, and have often arrived in English via a romance language.
The logo shows three sets of concentric circles, offset from each other, and creating an interference pattern, alluding to dissonance between them.
Dissonance is copyright © 2025 Jon Pretty & Propensive OÜ, and is made available under the Apache 2.0 License.