# Capturing Instance Level Ontology Evolution for DL-Lite

Evgeny Kharlamov and Dmitriy Zheleznyakov

KRDB Research Centre, Free University of Bozen-Bolzano, Italy

last_name@inf.unibz.it

Abstract. Evolution of Knowledge Bases (KBs) expressed in Description Logics (DLs) has proved its importance. Recent studies of the topic have mostly focused on model-based approaches (MBAs), where an evolution (of a KB) results in a set of models. For KBs expressed in tractable DLs, such as DL-Lite, it was shown that the evolution suffers from inexpressibility, i.e., the result of evolution cannot be expressed in DL-Lite. What is missing in these studies is an understanding of: in which DL-Lite fragments evolution can be captured, what causes the inexpressibility, which logics are sufficient to express evolution, and whether and how one can approximate it in DL-Lite. This work provides some understanding of these issues for eight MBAs which cover the cases of both update and revision. We found what causes inexpressibility and isolated a fragment of DL-Lite where evolution is expressible. For this fragment we provided polynomial-time algorithms to compute evolution results. For the general case we proposed techniques (based on what we called prototypes) to capture DL-Lite evolution corresponding to the well-known approach of Winslett in the DL SHOIQ (which is subsumed by OWL 2 DL). We also showed how to approximate this evolution in DL-Lite.

1 Introduction

Description Logics (DLs) provide excellent mechanisms for representing structured knowledge by means of Knowledge Bases (KBs) K that are composed of two components: a TBox (describes intensional or general knowledge about an application domain) and an ABox (describes facts about individual objects). DLs constitute the foundations for various dialects of OWL, the Semantic Web ontology language.

Traditionally DLs have been used for modeling static and structural aspects of application domains [1]. Recently, however, the scope of KBs has broadened, and they are now also used to support the maintenance and evolution phases of information systems. This makes it necessary to study evolution of Knowledge Bases [2], where the goal is to incorporate new knowledge N into an existing KB K so as to take into account changes that occur in the underlying application domain. In general, N is represented by a set of formulas denoting properties that should be true after K has evolved, and the result of evolution, denoted K ⋄ N, is also intended to be a set of formulas. In the case where N interacts with K in an undesirable way, e.g., by causing the KB or relevant parts of it to become unsatisfiable, N cannot simply be added to the KB. Instead, suitable changes need to be made in K so as to avoid this undesirable interaction, e.g., by deleting parts of K conflicting with N. Different choices for changes are possible, corresponding to different approaches to semantics for KB evolution [3,4,5].

An important group of approaches to evolution semantics, on which we focus in this paper, is called model-based (MBAs). Under MBAs the result of evolution K ⋄ N is a set of models of N that are minimally distant from models of K. Depending on what the distance between models is and how to measure it, eight different MBAs were introduced (see Section 2.2 for details). Since K ⋄ N is a set of models, while K and N are logical theories, it is desirable to represent K ⋄ N as a logical theory using the same language as for K and N. Thus, looking for representations of K ⋄ N is the main challenge in studies of evolution under MBAs. When K and N are propositional theories, representing K ⋄ N is well understood [5], while it becomes dramatically more complicated as soon as K and N are first-order, e.g., DL KBs [6].

Model-based evolution of KBs where K and N are written in a language of the DL-Lite family [7] has recently been studied extensively [6,8,9]. The focus on DL-Lite is not surprising since it is the basis of OWL 2 QL, a tractable OWL 2 profile. It has been shown that for each of the eight MBAs one can find DL-Lite K and N such that K ⋄ N cannot be expressed in DL-Lite [10,11], i.e., DL-Lite is not closed under MBA evolution. This phenomenon was also noted in [6,10] for some of the eight semantics.

What is missing in all these studies of evolution for DL-Lite is an understanding of:

(1) DL-Lite wrt evolution: What DL-Lite fragments are closed under MBAs? What DL-Lite formulas are responsible for inexpressibility?

(2) Evolution wrt DL-Lite: Is it possible to capture evolution of DL-Lite KBs in richer logics, and if so, how? What are these logics?

(3) Approximation of evolution results: For a DL-Lite KB K and an ABox N, is it possible to compute “good” approximations of K ⋄ N in DL-Lite, and if so, how?

In this paper we study problems (1)-(3) for so-called ABox evolution, i.e., N is a new ABox and the TBox of K should remain the same after the evolution. ABox evolution is important for areas such as artifact-centered service interoperation (http://www.acsi-project.eu/), where the structural knowledge (TBox) is well crafted and stable, while (ABox) facts about individuals may change. These ABox changes should be reflected in KBs in a way that the TBox is not affected. Our study covers both ABox updates and ABox revision [4].

The contributions of the paper are: We provide relationships between MBAs for DL-Lite_R by showing which approaches subsume each other (Section 3). We introduce DL-Lite_R^pr, a restriction of DL-Lite_R in which disjointness of concepts with role projections is forbidden. We show that DL-Lite_R^pr is closed under most MBA evolutions and provide polynomial-time algorithms to compute (representations of) K ⋄ N (Section 4). For DL-Lite_R we focus on an important MBA corresponding to the well-accepted semantics of Winslett and show how to capture K ⋄ N for this semantics in the DL SHOIQ (Section 5). We show what combination of assertions in T together with N can lead to inexpressibility of (T, A) ⋄ N in DL-Lite_R (Section 5.1). For the case when K ⋄ N is not expressible in DL-Lite_R we study how to approximate it in DL-Lite_R (Section 5.4).

2 Preliminaries

2.1 DL-Lite_R

We introduce some basic notions of DLs (see [1] for more details). We consider a logic DL-Lite_R of the DL-Lite family of DLs [7,12]. DL-Lite_R has the following constructs for (complex) concepts and roles: (i) B ::= A | ∃R, (ii) C ::= B | ¬B, (iii) R ::= P | P⁻, where A and P stand for an atomic concept and role, respectively, which are just names.

A knowledge base (KB) K = (T, A) is composed of two sets of assertions: a TBox T and an ABox A. DL-Lite_R TBox assertions are concept inclusion assertions of the form B ⊑ C and role inclusion assertions R1 ⊑ R2, while ABox assertions are membership assertions of the form A(a), ¬A(a), and R(a, b). The active domain of K, denoted adom(K), is the set of all constants occurring in K. In Section 5 we will also talk about the DL SHOIQ [1], which we do not define here due to space limits.

The semantics of DL-Lite KBs is given in the standard way, using first-order interpretations, all over the same countable domain ∆. An interpretation I is a function ·^I that assigns to each C a subset C^I of ∆, and to R a binary relation R^I over ∆ in a way that (¬B)^I = ∆ \ B^I, (∃R)^I = {a | ∃a'.(a, a') ∈ R^I}, and (P⁻)^I = {(a2, a1) | (a1, a2) ∈ P^I}. We assume that ∆ contains the constants and that c^I = c (we adopt standard names). Alternatively, we view interpretations as sets of atoms and say that A(a) ∈ I iff a ∈ A^I and P(a, b) ∈ I iff (a, b) ∈ P^I. An interpretation I is a model of a membership assertion A(a) (resp., ¬A(a)) if a ∈ A^I (resp., a ∉ A^I), of P(a, b) if (a, b) ∈ P^I, and of an assertion D1 ⊑ D2 if D1^I ⊆ D2^I.

As usual, we use I |= F to denote that I is a model of an assertion F, and I |= K denotes that I |= F for each F in K. We use Mod(K) to denote the set of all models of K. A KB is satisfiable if it has at least one model. The DL-Lite family has nice computational properties, for example, KB satisfiability has polynomial-time complexity in the size of the TBox and logarithmic-space complexity in the size of the ABox [12,13]. We use entailment on KBs K |= K' in the standard sense. An ABox A T-entails an ABox A', denoted A |=_T A', if T ∪ A |= A', and A is T-equivalent to A', denoted A ≡_T A', if A |=_T A' and A' |=_T A.
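The model conditions above can be illustrated with a minimal Python sketch. It assumes a finite domain (the paper works over a countable ∆), and the encodings are hypothetical choices made only for this illustration: atoms are tuples such as `("A", "a")` or `("P", "a", "b")`, inverse roles are written `"P-"`, and complex concepts are `("exists", R)` and `("not", B)`.

```python
from typing import Set, Tuple

Atom = Tuple[str, ...]  # ("A", "a") or ("P", "a", "b"); hypothetical encoding


def role_ext(interp: Set[Atom], role: str) -> Set[Tuple[str, str]]:
    """Extension of a role P, or of its inverse written as 'P-'."""
    inverse = role.endswith("-")
    name = role[:-1] if inverse else role
    pairs = {(t[1], t[2]) for t in interp if len(t) == 3 and t[0] == name}
    return {(b, a) for (a, b) in pairs} if inverse else pairs


def concept_ext(interp: Set[Atom], concept, domain: Set[str]) -> Set[str]:
    """Extension of A, ('exists', R) or ('not', B) over a finite domain."""
    if isinstance(concept, tuple) and concept[0] == "exists":
        return {a for (a, _) in role_ext(interp, concept[1])}
    if isinstance(concept, tuple) and concept[0] == "not":
        return set(domain) - concept_ext(interp, concept[1], domain)
    return {t[1] for t in interp if len(t) == 2 and t[0] == concept}


def satisfies_inclusion(interp, lhs, rhs, domain) -> bool:
    """I is a model of lhs ⊑ rhs iff the extension of lhs is contained in that of rhs."""
    return concept_ext(interp, lhs, domain) <= concept_ext(interp, rhs, domain)


def satisfies_membership(interp, concept_name, individual, positive=True) -> bool:
    """I is a model of A(a) iff A(a) ∈ I, and of ¬A(a) iff A(a) ∉ I."""
    return ((concept_name, individual) in interp) == positive


# Hypothetical example data, not taken from the paper.
domain = {"mary", "john"}
interp = {("Wife", "mary"), ("HasHusband", "mary", "john")}
```

With this data, `interp` satisfies Wife ⊑ ∃HasHusband but not ∃HasHusband⁻ ⊑ Husband, since john is in the range of HasHusband while the extension of Husband is empty.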

The deductive closure of a TBox T, denoted cl(T), is the set of all TBox assertions F such that T |= F. For satisfiable KBs K = (T, A), a full closure of A (wrt T), denoted fcl_T(A), is the set of all membership assertions f (both positive and negative) over adom(K) such that A |=_T f. In DL-Lite_R both cl(T) and fcl_T(A) are computable in time quadratic in, respectively, |T|, i.e., the number of assertions of T, and |T ∪ A|. In our work we assume that all TBoxes and ABoxes are closed, while results are extendable to arbitrary KBs.

A homomorphism h from a model I to a model J is a mapping from ∆ to ∆ satisfying: (i) h(a) = a for every constant a; (ii) if α ∈ A^I (resp., (α, β) ∈ P^I), then h(α) ∈ A^J (resp., (h(α), h(β)) ∈ P^J) for every A (resp., P). A canonical model of K is a model which can be homomorphically embedded in every model of K, denoted I^can_K or just I^can when K is clear from the context.

2.2 Evolution of Knowledge Bases

This section is based on [10]. Let K = (T, A) be a DL-Lite_R KB and N a “new” ABox. We study how to incorporate N's assertions into K, that is, how K evolves under N [2].

More practically, we study evolution operators that take K and N as input and return, possibly in polynomial time, a DL-Lite_R KB K' = (T, A') (with the same TBox as K) that captures the evolution, and which we call the (ABox) evolution of K under N. Based on the evolution principles of [10], we require K and K' to be satisfiable. A DL-Lite_R KB K = (T, A) and an ABox N form an evolution setting if K and (T, N) are satisfiable.


Summing up across the different possibilities, we have three dimensions, which give eight semantics of evolution according to MBAs by choosing: (1) the local or the global approach, (2) atoms or symbols for deﬁning distances, and (3) set inclusion or cardinality to compare symmetric differences. In Figure 1, right, we depict these three dimensions.

We denote each of these eight possibilities by a combination of three symbols, indicating the choice in each dimension, e.g., L^a_# denotes the local semantics where the distances are expressed in terms of cardinality of sets of atoms.
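To make the dimensions concrete, here is a small Python sketch of the local approach over explicitly listed finite sets of interpretations. Several things are assumptions of this illustration and not definitions from this excerpt: interpretations are frozensets of atoms, the atom-based difference is the symmetric difference of the two atom sets, the symbol-based difference is the set of symbols on which the two interpretations disagree, and the real semantics range over all models, which need not be finitely many. The global approach is not covered here.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Atom = Tuple[str, ...]
Interp = FrozenSet[Atom]


def atom_diff(i: Interp, j: Interp) -> Set:
    """Atom-based difference: atoms on which I and J disagree."""
    return set(i ^ j)


def symbol_diff(i: Interp, j: Interp) -> Set:
    """Symbol-based difference: symbols whose extensions differ in I and J."""
    return {atom[0] for atom in i ^ j}


def local_evolution(models_k: Iterable[Interp], models_n: Iterable[Interp],
                    diff: Callable[[Interp, Interp], Set],
                    by_cardinality: bool) -> Set[Interp]:
    """For each I in models_k, keep the J in models_n closest to I."""
    candidates = list(models_n)
    result = set()
    for i in models_k:
        diffs = [(j, diff(i, j)) for j in candidates]
        for j, d in diffs:
            if by_cardinality:
                minimal = all(len(d) <= len(other) for _, other in diffs)
            else:
                # Minimal wrt set inclusion: no strictly smaller difference.
                minimal = not any(other < d for _, other in diffs)
            if minimal:
                result.add(j)
    return result


# Hypothetical example data.
i = frozenset({("A", "a"), ("B", "a"), ("C", "a")})
j1 = frozenset({("A", "a"), ("B", "a")})
j2 = frozenset({("A", "a"), ("B", "a"), ("C", "a"), ("D", "a"), ("E", "a")})
j3 = frozenset({("A", "a")})
```

In this data the differences of `j1` and `j2` from `i` are incomparable under set inclusion, so both are kept by the inclusion-based variant, while the cardinality-based variant keeps only `j1`.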

Closure Under Evolution. Let D be a DL and M one of the eight MBAs introduced above. We say D is closed under evolution wrt M (or evolution wrt M is expressible in D) if for every evolution setting K and N written in D, there is a KB K' written in D such that Mod(K') = K ⋄ N, where K ⋄ N is the evolution result under semantics M.

We showed in [10,11] that DL-Lite is not closed under any of the eight model based semantics. The observation underlying these results is that on the one hand, the minimality of change principle intrinsically introduces implicit disjunction in the evolved KB. On the other hand, since DL-Lite is a slight extension of Horn logic [14], it does not allow one to express genuine disjunction (see Lemma 1 in [10] for details).

Let M be a set of models that resulted from the evolution of (T, A) with N. A KB (T, A') is a sound approximation of M if M ⊆ Mod(T, A'). A sound approximation (T, A') is minimal if for every sound approximation (T, A'') inequivalent to (T, A'), it holds that Mod(T, A'') ⊄ Mod(T, A'), i.e., (T, A') is minimal wrt “⊆”.
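As a rough illustration of these two notions, the following Python sketch works with explicitly given finite sets of model identifiers. This is an assumption made for the example only: in the paper the model sets are in general infinite, and the candidate names and model labels below are hypothetical.

```python
from typing import Dict, List, Set


def is_sound(evolved: Set[int], candidate_models: Set[int]) -> bool:
    """A candidate is sound if its models include every evolved model."""
    return evolved <= candidate_models


def minimal_sound(evolved: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Names of sound candidates with no strictly smaller sound candidate."""
    sound = {name: mods for name, mods in candidates.items()
             if is_sound(evolved, mods)}
    return sorted(name for name, mods in sound.items()
                  if not any(other < mods for other in sound.values()))


# Hypothetical example: models are labelled by integers.
evolved = {1, 2}
candidates = {
    "A1": {1, 2, 3},
    "A2": {1, 2, 3, 4},
    "A3": {1},
}
```

Here `A3` is not sound because it excludes model 2, and `A2` is sound but not minimal because the models of `A1` form a strict subset of its models.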

**3 Relationships Between Model-Based Semantics**

Let S1 and S2 be two evolution semantics and D a logic language. Then S1 is subsumed by S2 wrt D, denoted (S1 ≼_sem S2)(D), or just S1 ≼_sem S2 when D is clear from the context, if K ⋄_S1 N ⊆ K ⋄_S2 N for all satisfiable KBs K and N written in D, where K ⋄_Si N denotes evolution under Si. Two semantics S1 and S2 are equivalent (wrt D), denoted (S1 ≡_sem S2)(D), if (S1 ≼_sem S2)(D) and (S2 ≼_sem S1)(D). Further in this section we will consider K and N written in DL-Lite_R. The following theorem shows the subsumption relation between different semantics. We depict these relations in Figure 2 using solid arrows. The figure is complete in the following sense: there is a solid path (a sequence of solid arrows) between any two semantics S1 and S2 iff there is a subsumption S1 ≼_sem S2.

Algorithm 1: Algorithm AlignAlg((T, A), N) for A' deterministic computation

Consider an algorithm AlignAlg (see Algorithm 1) that takes as input an evolution setting K, N and returns the alignment Align(I^can, N) of a canonical model I^can of K: it drops all the assertions of fcl_T(A) contradicting N and keeps the rest. Using AlignAlg we can compute a representation of K ⋄ N in DL-Lite_R^pr.
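The drop-and-keep step described above can be sketched in Python as follows. This is not the paper's Algorithm 1, whose listing is not reproduced in this excerpt. It is a simplified sketch under stated assumptions: both full closures are precomputed and passed in, literals are encoded as hypothetical tuples `(sign, name, individual...)`, and an assertion is taken to contradict N exactly when its complement occurs in the closure of N. Whether and how N itself is added to the result is left to the caller.

```python
from typing import Set, Tuple

# Hypothetical encoding: ("+", "A", "a"), ("-", "A", "a") or ("+", "P", "a", "b").
Literal = Tuple[str, ...]


def complement(lit: Literal) -> Literal:
    """Flip the sign of a literal and keep the rest unchanged."""
    sign, *rest = lit
    return ("-" if sign == "+" else "+", *rest)


def align_alg(fcl_a: Set[Literal], fcl_n: Set[Literal]) -> Set[Literal]:
    """Keep the assertions of the closure of A whose complement is not in the closure of N.

    Assumption: 'contradicting N' is approximated by this complement check.
    """
    return {lit for lit in fcl_a if complement(lit) not in fcl_n}


# Hypothetical example, assuming a TBox with Priest ⊑ Cleric and Priest ⊑ ¬Husband.
fcl_a = {
    ("+", "Priest", "john"),
    ("+", "Cleric", "john"),
    ("-", "Husband", "john"),
    ("+", "LivesIn", "john", "rome"),
}
fcl_n = {("+", "Husband", "john"), ("-", "Priest", "john")}
aligned = align_alg(fcl_a, fcl_n)
```

In this example the assertions Priest(john) and ¬Husband(john) are dropped, while Cleric(john) and LivesIn(john, rome) survive because nothing in the closure of N contradicts them.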

4.2 Capturing Symbol-Based Evolution

Observe that symbol-based semantics behave differently from atom-based ones: the two local semantics (on set inclusion and cardinality) coincide, as do the two global semantics, while there is no subsumption between local and global ones, as depicted in Figure 2.

As a corollary of Theorem 5, in general the approach presented in Theorem 2 does not work for computing K ⋄ N under any of the symbol-based MBAs. At the same time, as follows from Theorems 6 and 8 below, this approach gives complete approximations of all symbol-based semantics, while it approximates global semantics better than the local ones.

Consider the algorithm SymAlg in Algorithm 2 that will be used for evolutions on symbols. It works as follows: it takes as input an evolution setting (T, A), N and a unary property Π of assertions. Then for every atom φ in N it checks whether φ satisfies Π (Line 4). If this is the case, SymAlg deletes from AlignAlg((T, A), N) all literals φ' that share a concept name with φ. Both local and global semantics have their own Π: Π_G and Π_L.
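The deletion loop just described can be sketched in Python as below. It is only an illustration under assumptions: the output of AlignAlg is passed in as a precomputed set, literals use a hypothetical tuple encoding `(sign, name, individual...)`, the property Π is an arbitrary callable because Π_G and Π_L are not defined in this excerpt, and only concept literals (three-element tuples) are deleted.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical encoding: ("+", "A", "a"), ("-", "A", "a") or ("+", "P", "a", "b").
Literal = Tuple[str, ...]


def sym_alg(aligned: Set[Literal], new_abox: Iterable[Literal],
            prop: Callable[[Literal], bool]) -> Set[Literal]:
    """Delete from `aligned` every concept literal sharing a name with an atom of N that satisfies `prop`."""
    result = set(aligned)
    for phi in new_abox:
        if prop(phi):
            result = {lit for lit in result
                      if not (len(lit) == 3 and lit[1] == phi[1])}
    return result


# Hypothetical example data and stand-in properties (not the paper's Π_G or Π_L).
aligned = {
    ("+", "Cleric", "john"),
    ("-", "Cleric", "mary"),
    ("+", "LivesIn", "john", "rome"),
}
new_abox = [("+", "Cleric", "bob")]


def always(phi: Literal) -> bool:
    return True


def never(phi: Literal) -> bool:
    return False
```

With the stand-in `always`, both Cleric literals are removed and only the role assertion remains. With `never`, the aligned set is returned unchanged.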