analysis!rsc: Model SAT
Dependency hell is NP-complete. However possibly we are able to climb out.
The package deal model choice downside is to discover a set of dependencies that can be utilized to construct a top-level package deal P that’s full (all dependencies glad) and appropriate (no two incompatible packages are chosen). There could also be no such set, due to the diamond dependency downside: maybe A wants B and C; B wants D model 1, not 2; and C wants D model 2, not 1. On this case, assuming it is not potential to decide on each variations of D, there isn’t any approach to construct A.
A package deal supervisor wants an algorithm to pick package deal variations: whenever you run apt-get set up perl
, it might assume you imply the most recent model of Perl, however then it has to discover a approach to fulfill Perl’s transitive dependencies, or else to print an comprehensible clarification of why Perl cannot be put in. You would possibly moderately marvel: how costly is it to resolve this downside, within the worst case? You most likely don’t need your package deal supervisor to take hours or days or years to resolve whether or not it may well set up Perl.
Sadly, the model choice downside is NP-complete,
which implies that we’re exceedingly unlikely to search out an algorithm assured to run rapidly on each enter.
This submit offers a proof of NP-completeness for model choice,
seems to be at how present package deal managers cope,
and briefly discusses potential approaches to keep away from an NP-complete process.
Proof of NP-Completeness
To contemplate NP-completeness, we have to shift from our trendy world of algorithms with wealthy outputs to the restricted world of complexity concept,
the place algorithms have one boolean output: sure or no.
On this world of complexity concept, we’ll outline the VERSION downside (they’re at all times all caps) to ask whether or not there’s a legitimate model choice.
This boolean VERSION downside is barely half of our authentic downside, and we are able to show that it is NP-complete.
To take action, we have to show two separate information: that VERSION is in NP and that VERSION is NP-hard.
An issue is in NP if each “sure” reply has an easily-checked polynomial-size clarification.
VERSION is in NP, as a result of any “sure” reply might be defined by itemizing the chosen package deal variations.
This listing is not any larger than the enter and might be checked for correctness in time no worse than quadratic
within the enter (most likely linear, relying on particulars of the computing mannequin).
An issue is NP-hard if an environment friendly resolution for that downside might be tailored into an environment friendly resolution to each different downside in NP.
That is a reasonably tall order, however it’s sufficient for us to point out learn how to adapt an environment friendly resolution for VERSION
into an environment friendly resolution for one different NP-hard downside (name it HARD) after which depend on the truth that
another person has confirmed that an environment friendly resolution for HARD might be tailored into an environment friendly resolution
for each different downside in NP.
A helpful instance of an NP-complete (in NP and NP-hard) downside is 3-SAT.
In 3-SAT, the enter is a boolean method over some variety of boolean variables,
constrained to be a conjunction (an AND) of some variety of disjunctions (ORs)
of three literals every, the place a literal is a variable or its negation.
For instance, right here is an enter for 3-SAT (∧ means AND, ∨ means OR, and ¬ means NOT):
It’s satisfiable by precisely one task to the variables—x1 = 0, x2 = 1, x3 = 1, x4 = 0—so the reply is sure.
If we lengthen it so as to add yet another clause,
then it’s unsatisfiable by any task to the variables, so the reply is not any.
The overall type of a 3-SAT occasion is a method F that’s the conjunction of
clauses C1 via Cn over variables V1 via Vm,
the place every Ci is a disjunction of three literals, every of the shape
xj or ¬ xj for some variable xj .
Duplicate literals in a clause are allowed, as in (¬ x2 ∨ ¬ x2 ∨ x3) and (x2 ∨ x2 ∨ x2) above.
We will convert any 3-SAT occasion to a VERSION occasion with the identical reply.
In regards to the package deal supervisor we’ll assume solely that:
- A package deal can listing zero or extra packages or particular package deal variations as dependencies.
- To put in a package deal, all its dependencies should be put in.
- Every model of a package deal can have totally different dependencies.
- Two totally different variations of a package deal can’t be put in concurrently.
We’ll abbreviate package deal P
model V
as P:V
(now utilizing fixed-width font for packages to tell apart from the usual math italics for formulation).
A dependency on P:V
should be glad by model V
precisely, not V
-1 and never V
+1.
Given a 3-SAT method, we are able to create a package deal F
representing the entire method,
packages C1
, C2
, …, Cn
representing every clause,
and packages X1
, X2
, …, Xm
representing every variable.
Every package deal Xj
has two variations Xj:0
and Xj:1
.
As assumed above, Xj:0
and Xj:1
battle and can’t each be put in.
Xj:1
being put in corresponds to xj = 1 within the authentic method.
Bundle Ci
has three variations numbered 0, 1, 2, every of which will depend on a literal from the corresponding clause.
For instance, if C5 is (x1 ∨ ¬ x2 ∨ x4), then C5:0
will depend on X1:1
, C5:1
will depend on X2:0
, and C5:2
will depend on X4:1
.
Ci:ok
being put in corresponds to Ci‘s ok’th literal being true (and subsequently Ci being true) within the authentic method.
Bundle F
will depend on C1
, C2
, …, Cn
.
F
being put in implies that every one the Ci
are put in, which corresponds to all of the Ci being true and subsequently to F being true.
If the package deal supervisor can discover a approach to set up package deal F
, then a satisfying task for the
authentic method might be learn out from the set up standing of Xj:1
for every variable xj.
Equally, if the method is satisfiable, the satisfying task offers a method the package deal supervisor
may efficiently set up F
.
Subsequently, we have transformed the 3-SAT occasion right into a corresponding VERSION occasion with the identical reply,
which establishes that 3-SAT might be solved utilizing VERSION, so VERSION is NP-hard.
Since VERSION is in NP and is NP-hard, VERSION is NP-complete.
Implementations
The assumptions above are fairly minimal:
packages have a listing of dependencies,
a package deal’s dependencies can change with its personal model to model,
a package deal’s dependencies might be restricted to particular variations of these dependencies,
and it’s potential for 2 variations of a package deal to battle with one another.
That could be the naked minimal for a package deal supervisor to be helpful.
Some package deal managers may not enable a dependency to listing a selected model,
as an alternative requiring a variety, however we are able to simply change the model necessities 0 and 1
to ≤ 0 and ≥ 1.
Some package deal managers may not assume that totally different variations of a package deal
battle by default, nevertheless it should be no less than potential to specify such a battle:
there cannot be two /bin/bash
on a Unix system, or two definitions of printf
constructed right into a C program.
The assumptions are true of each package deal supervisor I’ve checked out:
Debian’s APT, RedHat’s RPM, Rust’s Cargo, Node’s npmjs, Java’s Maven, Haskell’s Cabal, and extra.
The implication is that these package deal managers faces an NP-complete process.
Every should select between presumably taking a really very long time
to resolve on an set up technique or presumably reporting an installable
package deal as uninstallable.
(After all, a given implementation could inadvertently do each.)
Knuth writes in Volume 4, Fascicle 6:
The story of satisfiability is the story of a triumph of software program engineering,
blended with wealthy doses of lovely arithmetic. Because of elegant new knowledge
constructions and different methods, trendy SAT solvers are capable of deal routinely
with sensible issues that contain many 1000’s of variables, though such
issues had been thought to be hopeless just some years in the past.
In follow, it does appear that trendy package deal managers are transferring towards utilizing SAT solvers:
0install began with heuristics however found it necessary to change to a SAT solver.
Chef, a techniques integration framework, makes use of the dep-selector Ruby bindings for the Gecode constraint solver.
Dart’s pub features a backtracking solver that often takes a long time.
Debian’s apt-get makes use of heuristics by default however can invoke a SAT solver
and might
take user preferences into account.
The Debian High quality Assurance staff additionally runs a solver to establish uninstallable packages of their repos.
Eclipse makes use of the sat4j SAT solver to manage installation of its plugins.
Fedora’s DNF (“Dandified yum”) makes use of a SAT solver in an experimental mode.
FreeBSD’s pkg, additionally utilized by DragonflyBSD, makes use of the picosat SAT solver.
OCaml’s OPAM can invoke a SAT solver locally or remotely over a network. Like with Debian’s apt-get, OPAM’s solver can take person preferences into consideration,
and the OPAM repos are scanned for uninstallable packages.
OpenSUSE‘s package deal supervisor makes use of libsolv, “a free package deal dependency solver utilizing a satisfiability algorithm.” There’s additionally OpenSUSE’s zypper, which makes use of its personal libzypp SAT solver.
Python’s Anaconda makes use of a SAT solver however can take a long time.
Rust’s Cargo makes use of a basic backtracking solver. It additionally permits a number of variations of a crate to be linked into the ultimate binary.
Solaris’s pkg, additionally utilized by Illumos and generally often called IPS, uses the minisat SAT solver.
Swift’s package manager makes use of a basic backtracking solver.
[I would like to add more package managers here. If you know details for one (or something here is wrong), please email me or send a tweet. Thanks.]
Alternatives?
How ought to we react to the truth that package deal model choice is NP-complete?
One response is to embrace the complexity and be grateful that
SAT solvers are nearly as good as they’re.
One other response is to ask whether or not this may presumably be a good suggestion.
Possibly we shouldn’t be constructing instruments that require fixing this downside.
Possibly one thing has gone fallacious in the best way we develop software program.
If package deal model choice is NP-complete, which means
the search area of potential package deal mixtures
is simply too giant and complicated for environment friendly systematic evaluation;
what about environment friendly systematic testing?
If a search finds a conflict-free mixture, why ought to we consider the mixture will work?
The absence of a model battle could point out solely that the mixture is untested.
If a search does not discover a conflict-free mixture, how can that failure
be defined to a developer in a manner that makes it clear
what to do subsequent?
Software program is difficult sufficient to get proper with out admitting
NP-complete issues into our software program configuration selections.
Let’s reexamine how we bought right here and the way we’d escape.
The proof above will depend on these of assumptions, copied from above:
- A package deal can listing zero or extra packages or particular package deal variations as dependencies.
- To put in a package deal, all its dependencies should be put in.
- Every model of a package deal can have totally different dependencies.
- Two totally different variations of a package deal can’t be put in concurrently.
The standard knowledge, as I urged above, is that these are roughly
the “the naked minimal for a package deal supervisor to be helpful,”
however possibly we are able to discover a approach to cut back them in any case.
One approach to keep away from NP-completeness is to assault assumption 1:
what if, as an alternative of permitting a dependency to listing particular package deal variations,
a dependency can solely specify a minimal model?
Then there’s a trivial algorithm for locating the packages to make use of:
begin with the most recent model of what you need to set up,
after which get the most recent model of all its dependencies,
recursively.
Within the authentic diamond dependency at first of this text,
A wants B and C, and B and C want totally different variations of D.
If B wants D 1.5 and C wants D 1.6, the construct can use D 1.6 for each.
If B does not work with D 1.6,
then both the model of B we’re contemplating is buggy or D 1.6 is buggy.
The buggy model ought to be faraway from circulation completely,
after which a brand new launched model ought to repair the issue.
Including a battle to the dependency graph as an alternative
is like documenting a bug as an alternative of fixing it.
One other approach to keep away from NP-completeness is to assault assumption 4:
what if two totally different variations of a package deal may very well be put in
concurrently?
Then virtually any search algorithm will discover a mixture of
packages to construct this system; it simply may not be the
smallest potential mixture (that is nonetheless NP-complete).
If B wants D 1.5 and C wants D 2.2, the construct can embrace each
packages within the closing binary, treating them as distinct packages.
I discussed above that there cannot be two definitions of printf
constructed right into a C program,
however languages with express module techniques should not have any
downside together with separate copies of D (beneath totally different fully-qualified names) right into a program.
One other approach to keep away from NP-completeness is to mix the earlier two.
Because the examples already trace at,
if packages observe semantic versioning,
a package deal supervisor would possibly robotically use the most recent model
of a dependency inside a serious model however then deal with
totally different main variations as totally different packages.
One rationale for such restrictions is that builders are seemingly not considering
about the whole area of all potential package deal mixtures when constructing
or testing software program. It could assist for the builders and their instruments to
agree about how software program is constructed.
If any of those approaches might be made to work in follow,
it may go a good distance towards simplifying the operation
and understandability of language package deal managers.
Proofs that Debian and RedHat package installation are both NP-complete are given in
“EDOS deliverable WP2-D2.1: Report on Formal Management of Software Dependencies” (2005), pages 49-50.
The tough step within the discount of 3-SAT to package deal set up
is learn how to assemble a disjunction.
The EDOS proofs encode the disjunction utilizing the package deal supervisor’s
capacity to specify a listing of options for a single dependency,
both immediately (in Debian) or utilizing “offers” directives (in RedHat).
For instance, these techniques enable a pseudo-package text-editor
to be outlined that’s thought of put in when any of the actual packages
ed
, vi
, or acme
is put in.
The dependency specs for a language package deal supervisor
like Rust’s Cargo are dramatically easier than these for Debian and RedHat,
and so the EDOS proofs don’t apply.
One would possibly subsequently hope that language package deal managers face
a better (not NP-complete) job.
The brand new proof above dashes that hope.
(One approach to view the proof above is that it simulates the “offers” directive
within the final instance by defining a text-editor
package deal with three variations,
one in all which will depend on ed
, one on vi
, and one on acme
.)
By encoding the disjunction within the altering dependencies
of various variations of a package deal, the brand new proof works with out modification
for each Debian’s and RedHat’s package deal managers but additionally applies to primarily
any foreseeable working system or language package deal supervisor.
I believe that the majority language package deal supervisor authors assumed
the issue they confronted was NP-complete, however I have been unable to search out
prior written proofs of that truth.
Just a few dependency techniques use constraint solvers as an alternative of SAT solvers,
however the underlying downside is still NP-complete.
In 2008, Daniel Burrows wrote a weblog submit about using dpkg to solve Sudoku problems.
Because of Sam Boyer for pointing me on the EDOS report
and for his glorious overview of package management.
Roberto Di Cosmo has written a lot of followups to the EDOS report,
listed here, specifically,
“Dependency solving: a separate concern in component evolution management,”
which accommodates an up to date proof.
That line of analysis applies SAT solvers but additionally works to take person preferences into consideration.
One other associated line of labor is “OPIUM: Optimal Package Install/Uninstall Manager” by Tucker et al., ICSE 2007. OPIUM was the starting point for 0install’s solver.
Jaroslav Tulach found the same proof as above in 2009.
Because of HN reader edwintorok for the link.
The discussion of Tulach’s proof on LtU mentions Daniel Burrows’s 2005 paper “Modelling and Resolving Software Dependencies,” however that paper’s proof is extra just like the EDOS proof than Tulach’s proof / the proof above.
Many readers despatched extra hyperlinks to references and to package deal managers with SAT solvers. Because of all.