# Prolog for data science – Emir’s blog

*by* Phil Tadros

I demonstrate a broadly applicable pattern which integrates Prolog as an essential part of a data science analysis. Analytic methods are used to generate properties about the data under study, and Prolog is used to reason about the data via the generated properties. The post includes some examples of piece-wise regression on timeseries data by symbolic reasoning. I also discuss the general pattern of application a bit.

Given some data, the majority of “data science” for me is the study of what the data implies and whether it can be coerced into the context-specific role usually determined by somebody other than me. Depending on the context, the goal may be to find support for a hypothesis, to forecast, to predict a latent variable, to explain a phenomenon, and so on. Nearly all of the value I bring rests in taking as input the data and the role it’s expected to play, and producing as output a computationally tractable problem, the solution to which enables that role for the data.

In my own practice – at least in its mature form – I find myself mostly relying on reasoning with simple methods such as Linear Regression, kNN, kMeans and PCA. I suspect this is mostly because I can understand what the results imply for the “shape” of the data, while reasoning about the results does the heavy lifting.

Despite the importance of reasoning in my work, I never made it explicit, nor did I consciously separate it from its subject matter. When I became aware of this, it seemed to me that I should make my reasoning explicit, at least so that I could (1) reason more accurately, and (2) directly learn to reason better.

It’s easier to demonstrate what it means to make reasoning explicit and the effect it has on the structure of an analysis. What follows are a couple of illustrative examples. I use SWI-Prolog as the symbolic reasoning engine and logic language, and Python for everything else.

I’ll start with a contrived example meant to illustrate the separation between the reasoning element and the rest. Below (black line) is some mock 2D data constructed by joining together five randomly chosen lines into a series. The x-axis is just an index which runs from 0-999. The coloured segments are a piecewise linear fitting derived symbolically using Prolog. I’ll describe how next.

Say I don’t know anything about the construction of the data series above except that it’s made up of lines, and the aim of the analysis is to find all the linear sections. My general scheme would be to use Python to create symbols which represent facts about the data, and then to use Prolog to reason about those symbols. One broadly applicable way of generating symbols is to partition the data at random along the explanatory variables (the (x) in this case) and fit linear regressions to each partition segment. This is then repeated many times in order to capture views of the data across many partitions. The result is lots of overlapping segments for which I can record information such as:

- Where the segment begins and ends.
- The fitting parameters.
- Correlation with the data.

Given some data (X,Y), the code below will produce rounds of random partitions (4-10) with segments at least 20 units wide, until there are at least (N) qualifying (perfectly linear) segments.

```
import numpy as np
from sklearn.linear_model import LinearRegression

i = 0
np.random.seed(10)
while i < N:
    knots = make_knots(
        np.random.randint(4, 10),
        20,
        np.min(X),
        np.max(X))
    for start, stop in zip(knots[:-1], knots[1:]):
        # Boolean mask selecting the points inside this partition segment.
        cond = (X >= start) & (X <= stop)
        x, y = X[cond].reshape(-1, 1), Y[cond]
        m = LinearRegression().fit(x, y)
        i += 1
        coef = np.round(m.coef_[0], 4)
        score = m.score(x, y)
        # Keep only perfectly linear segments.
        if score == 1.0:
            print(f"segment(i{i},{start},{stop},{coef},{score}).")
```
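The `make_knots` helper isn’t shown above. A minimal sketch of what it might do – this is a hypothetical reconstruction, assuming it returns the two endpoints plus `n - 1` interior cut points (so `n` segments), resampled until every pair of consecutive knots is at least `min_gap` apart:

```python
import numpy as np

def make_knots(n, min_gap, lo, hi, rng=np.random):
    """Hypothetical reconstruction: n - 1 interior cut points plus the
    endpoints (so n segments), resampled until consecutive knots are
    at least min_gap apart."""
    while True:
        inner = np.sort(rng.uniform(lo, hi, size=n - 1))
        knots = np.concatenate(([lo], inner, [hi]))
        if np.all(np.diff(knots) >= min_gap):
            return knots
```

Rejection sampling is crude but adequate here, since the minimum gap (20) is small relative to the data range.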

Note that the code prints out strings in the following format:

`segment(Id,Start,End,Gradient,Score).`

I copy/paste these directly into Prolog for demonstration purposes, but there are less manual ways.
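One less manual route, as a sketch (the filename here is illustrative): have Python write the facts to a file which Prolog can then consult.

```python
# Assume `facts` holds generated fact strings, e.g. collected from the
# loop above instead of printed.
facts = [
    "segment(i1,0,110,0.5,1.0).",
    "segment(i2,50,160,0.5,1.0).",
]

# Write one fact per line; SWI-Prolog can then load them with
# ?- consult('facts.pl').
with open("facts.pl", "w") as f:
    f.write("\n".join(facts) + "\n")
```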

The simple procedure above produces plenty of symbols to reason about. For example, segments may be separate, overlapping, subsuming, adjacent, and so on. Gradients may be equal, similar, opposing, and so on. For these problems, I’ll only need to use a fraction of the information available. To produce the segmentation illustrated in the graph above, it is enough to note that if segments are perfectly linear and they overlap, then they must be the same line. I can use this assertion as a way of merging segments together in order to create larger continuous linear *spans*. In Prolog I would express this as follows:

```
overlap(I1,I2) :-
    segment(I1,Start1,End1,_,_),
    segment(I2,Start2,End2,_,_),
    Start2 > Start1,
    End1 > Start2,
    End2 > End1.

:- table reach/2.

reach(I1,I2) :-
    overlap(I1,I2).
reach(I1,I2) :-
    overlap(I1,X),
    reach(X,I2).
```

That’s to say, segment (I1) overlaps with segment (I2) if it begins outside of (I2) but ends inside it. A segment (I2) can be “reached” from segment (I1) *iff* there is a daisy chain of overlapping segments connecting them. Based on the relationships above, the following query would produce all reachable spans (daisy chains of segments):

`?- setof((I1,I2),reach(I1,I2),Spans).`

**Prolog digression**: *overlap* is intentionally non-commutative to prevent backtracking from looping. I use tabling on the *reach* predicate because SLD resolution is very inefficient in this case.
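To make the daisy-chaining concrete, the same reachability relation can be sketched in Python as a naive transitive closure over the overlap pairs (a toy illustration only – the post does this in Prolog, with tabling doing the memoisation):

```python
def transitive_closure(overlaps):
    """Daisy-chain reachability: the transitive closure of a set of
    (i, j) overlap pairs, computed by naive iteration to fixpoint."""
    closure = set(overlaps)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```

For example, if segment 1 overlaps 2 and 2 overlaps 3, then 3 is reachable from 1, while disconnected segments stay unrelated.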

To calculate an optimal piece-wise linear fitting, I need to find the subset of all the non-overlapping spans generated by *reach* which results in the maximal coverage of the data. This can be as simple as sorting the spans by length in descending order and adding them to a list whilst making sure new additions do not overlap with the old ones. In Prolog:

```
:- use_module(library(apply)).
:- use_module(library(pairs)).

span_overlap((I1,I2),(I3,I4)) :-
    segment(I1,Start1,_,_,_),
    segment(I2,_,End1,_,_),
    segment(I3,Start2,_,_,_),
    segment(I4,_,End2,_,_),
    ( (End1 >= Start2, End2 >= End1)
    ; (End2 >= Start1, End1 >= End2)
    ; (Start1 >= Start2, End2 >= Start1)
    ; (Start2 >= Start1, End1 >= Start2) ).

not_span_overlap((I1,I2),(I3,I4)) :-
    \+ span_overlap((I1,I2),(I3,I4)).

% Negated length, so keysort puts the longest spans first.
span_length((I1,I2),R) :-
    segment(I1,Start,_,_,_),
    segment(I2,_,End,_,_),
    R is Start-End.

span_set([],R,R).
span_set([(X,Y)|T],A,R) :-
    maplist(not_span_overlap((X,Y)),A),
    (X=Y;reach(X,Y)),!,
    span_set(T,[(X,Y)|A],R).
span_set([_|T],A,R) :-
    span_set(T,A,R).

spans([],R,R).
spans([(I1,I2)|T],A,R) :-
    segment(I1,Start,_,_,_),
    segment(I2,_,End,_,_),
    spans(T,[(Start,End)|A],R).

possible(X,Y) :-
    segment(X,Start1,_,_,_),
    segment(Y,Start2,_,_,_),
    Start2 >= Start1.

find_cover(SortSpan) :-
    findall((X,Y),possible(X,Y),Possible),
    map_list_to_pairs(span_length,Possible,Pairs),
    keysort(Pairs,Sorted),
    pairs_values(Sorted,BySpan),
    span_set(BySpan,[],SpanById),
    spans(SpanById,[],Span),
    sort(Span,SortSpan).
```

The *span_set* predicate does the important work, whilst the *find_cover* predicate is the entry point. I could have used *reach* directly, but it massively over-generates because there are usually many ways of getting between two segments. It turns out to be much more efficient to list out the possible spans and test whether they are reachable directly, as above.
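As a sanity check on the logic, the greedy selection that *span_set* performs can be sketched in plain Python (a toy equivalent only, assuming spans are given as `(start, end)` pairs):

```python
def greedy_cover(spans):
    """Select non-overlapping spans greedily, longest first, mirroring
    the sort-then-filter strategy described above."""
    chosen = []
    for start, end in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        # Keep a span only if it is disjoint from everything chosen so far.
        if all(end < s or start > e for s, e in chosen):
            chosen.append((start, end))
    return sorted(chosen)
```

For instance, `greedy_cover([(0, 100), (90, 200), (150, 400)])` keeps the 250-unit span first, discards the 110-unit span that clashes with it, and then keeps the remaining disjoint span.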

Finally, I can simply query for the cover:

`?- find_cover(Cover).`

which outputs the segmentation in the plot above:

```
Cover = [
    (0, 110), (112, 201), (204, 336),
    (343, 647), (653, 758), (760, 998)].
```

The above example is contrived and easy to solve by more direct means, but hopefully it made the method fairly clear. In this example I apply much the same strategy, but this time to perform a piece-wise isotonic fitting of the FTSE 100 stock index. The graph below shows the last 5 years of the FTSE-100 index at weekly intervals. The segmentation was created by reasoning about random segments just as in the example above, except an Isotonic Regression was fitted instead of a Linear Regression.
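The generation code for this variant isn’t shown in the post; a hypothetical sketch of the adjustment on the Python side, where the recorded direction is whichever of the increasing/decreasing isotonic fits scores better on the segment (function name and fact layout are my own):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def isotonic_fact(seg_id, X, Y, start, stop):
    """Emit a segment/5 fact whose 4th argument is the direction of the
    better-scoring isotonic fit. Hypothetical sketch, not the post's code."""
    cond = (X >= start) & (X <= stop)
    x, y = X[cond], Y[cond]
    best_dir, best_score = None, -np.inf
    for direction, increasing in (("up", True), ("down", False)):
        m = IsotonicRegression(increasing=increasing).fit(x, y)
        score = m.score(x, y)  # R^2 against the data
        if score > best_score:
            best_dir, best_score = direction, score
    return f"segment(i{seg_id},{start},{stop},{best_dir},{np.round(best_score, 4)})."
```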

The following information was recorded:

`segment(Id,Start,Stop,Direction,Fit)`

I made the following minimal adjustment to the Prolog code:

```
same_direction(I1,I2) :-
    segment(I1,_,_,G1,_),
    segment(I2,_,_,G2,_),
    G1 = G2.

:- table reach/2.

reach(I1,I2) :-
    same_direction(I1,I2),
    overlap(I1,I2).
reach(I1,I2) :-
    same_direction(I1,X),
    same_direction(X,I2),
    overlap(I1,X),
    reach(X,I2).
```

I’ve added the *same_direction* predicate which requires that the direction of the isotonic regression is the same in overlapping segments for them to be reachable. I’ve also relaxed the random partitioning condition to a correlation of (≥ 0.9). I can now query for the cover as before:

`?- find_cover(Cover).`

which outputs the segmentation in the plot above:

`Cover = [(1, 22), (23, 88), (97, 260)].`