Abstract
The World Wide Web has evolved from the original hypertext system envisioned by physicists into a planet-wide medium that has already transformed most of our lives. Web 3.0 refers to a combination of advances that will change the internet radically. It is the advent of a brave new paradigm where we will interact and solve problems together through a network of A.I. assistants. It is also known as the semantic web, a term coined by Tim Berners-Lee, the inventor of the World Wide Web. In essence, this new paradigm is a place where machines can read Web pages much as we humans read them, a place where search engines and software agents can better trawl the Net and find what we are looking for. The vision is to extend the principles of the Web from documents to data. This extension will allow more of the Web's potential to be fulfilled, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually. A very basic definition of Web 3.0 is that it is an expert system in which there is a software agent for any task assigned by the user; the agent takes an input, runs it through a knowledge database, and then generates an output through inference.
Web 3.0 is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. For this web to function, computers must have access to structured collections of information and to sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.
HOW THE INTERNET WORKS
Introduction
Web 3.0 is essentially a layer of intelligence added to the existing web, and it can be better understood by looking at the underlying implementation of the present-day web. The Internet's workings include a technical design and a management structure. The management structure consists of a generally democratic collection of loosely-coupled organizations and working groups with mostly non-overlapping responsibilities. The technical design is founded on a complex, interlocking set of hierarchical tree-like structures, such as Internet Protocol addresses and domain names, mixed with networked structures like packet switching and routing protocols, all tied together with millions of lines of sophisticated software that continues to get better all the time.
The Internet's architecture is described in its name, a short form of the compound word "inter-networking". This architecture is based on the specification of the standard TCP/IP protocol, designed to connect any two networks which may be very different in internal hardware, software, and technical design. Once two networks are interconnected, communication with TCP/IP is enabled end-to-end, so that any node on the Internet has the near-magical ability to communicate with any other, no matter where they are. This openness of design has enabled the Internet architecture to grow to a global scale.
HOW YOUR URL GETS ROUTED
The Domain Name System (DNS) as a whole consists of a network of servers that map Internet domain names like www.yahoo.com to IP addresses. The DNS enables domain names to stay constant while the underlying network topology and IP addresses change. This provides stability at the application level while enabling network applications to find and communicate with each other using the Internet protocol, no matter how the underlying physical network changes.
Internet domain names come in four main types -- top-level domains, second-level domains, third-level domains, and country domains. Internet domain names are the alphanumeric identifiers we use to refer to hosts on the Internet, like "www.getme_donnel.com".
Internet domain names are organized by their levels, with the higher levels on the right. For example, in the domain "www.getme_donnel.com" the top-level domain is "com", the second-level domain is "getme_donnel", and the third-level domain is "www" (giving the full third-level domain name "www.getme_donnel.com").
Top-level Internet domains like ".com" are shared by all the organizations in the domain. Second-level domain names like "yahoo.com" and "livinginternet.com" are registered by individuals and organizations. Second-level domains are the addresses commonly used for Internet applications such as web hosting and email addressing.
Third-level Internet domain names are created by those that own second-level domains. Third-level domains can be used to set up individual domains for specific purposes, such as a domain for web access and one for mail, or a separate site for a special purpose:
• www.livinginternet.com
• mail.livinginternet.com
• rareorchids.livinginternet.com
Each country in the world has its own top-level Internet domain with a unique alphabetic designation, for example the "in" in www.google.co.in.
The Domain Name System (DNS) servers distribute the job of mapping domain names to IP addresses among servers allocated to each domain.
Each second-level domain must have at least one domain name server responsible for maintaining information about that domain and all subsidiary domains, and for responding to queries about those domains from other computers on the Internet. For example, management of domain name information and queries for the LivingInternet.com domain is handled by a specific DNS server that takes care of the required load. This distributed architecture was designed to enable the Internet to grow, so that as the number of domains grows, the number of DNS servers can grow to keep pace with the load.
Today, everyone who registers a second-level domain name must at the same time designate two DNS servers to manage queries and return the current IP address for addresses in that domain. The primary domain name server is always consulted first, and the secondary domain name server is queried if the primary doesn't answer, providing a backup and important support to overall Internet reliability.
The application that underlies almost all DNS server software on the Internet is an open source program called BIND, currently maintained by the Internet Systems Consortium. When your computer was added to the Internet, one of the initial setup tasks was to specify a default domain name server, usually maintained by your local Internet Service Provider, and almost certainly a variant of the BIND server software.
When your computer tries to access a domain like "www.livinginternet.com", the domain name system works like this:
• Your computer asks your default DNS server if it knows the IP address for www.livinginternet.com. If the DNS server has been asked that question recently, then it will have the answer stored in its local cache, and can answer immediately.
• Otherwise, your DNS server queries the central zone files for the address of the primary domain name server for livinginternet.com, and is answered with something like "ns1.livinginternet.com".
• Your DNS server will ask the livinginternet.com DNS server for the IP address of www.livinginternet.com, which will then look up the answer and send it back.
• Your DNS server will store the IP address returned in its local cache, and make the address available to your computer.
• Your computer then contacts www.livinginternet.com with the standard Internet routing protocols by using the returned IP address.
The IP address assigned to a computer may change frequently because of physical moves or network reconfigurations. The major advantage of the network of DNS servers is that domain names stay the same even when IP addresses change, and so the domain name servers can transparently take care of the mapping.
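To see the client side of this process in action, the short Python sketch below asks the operating system's resolver, and hence the default DNS server described above, for the current IP address of a host. It is only a minimal illustration; the hostname is an example, and the address returned will vary over time for exactly the reasons given here.

    # Minimal sketch: resolve a domain name to an IPv4 address using the
    # system resolver (which queries the default DNS server configured for
    # this machine). The hostname is illustrative.
    import socket

    hostname = "www.livinginternet.com"
    ip_address = socket.gethostbyname(hostname)
    print(hostname, "->", ip_address)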
WEB 2.0
The phrase Web 2.0 refers to a perceived second-generation of web-based communities and hosted services — such as social-networking sites, wikis and folksonomies — which aim to facilitate collaboration and sharing between users. It became popular following the first O'Reilly Media Web 2.0 conference in 2004, and has since become widely adopted.
Although the term suggests a new version of the World Wide Web, it does not refer to an update to Web technical specifications, but to changes in the ways software developers and end-users use the web as a platform. According to Tim O'Reilly, "Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform."
Some technology experts, notably Tim Berners-Lee, have questioned whether one can use the term in a meaningful way, since many of the technology components of "Web 2.0" have existed since the early days of the Web.
Characteristics of "Web 2.0"
While interested parties continue to debate the definition of a Web 2.0 application, a Web 2.0 website may exhibit some basic common characteristics. These might include:
• "Network as platform" — delivering (and allowing users to use) applications entirely through a browser.
• Users owning the data on a site and exercising control over that data.
• Architecture of participation that encourages users to add value to the application as they use it. This stands in sharp contrast to hierarchical access-control in applications, in which systems categorize users into roles with varying degrees of functionality.
• A rich, interactive, user-friendly interface based on Ajax or similar frameworks.
• Some social-networking aspects.
The concept of Web-as-participation-platform captures many of these characteristics. Bart Decrem, a founder and former CEO of Flock, calls Web 2.0 the "participatory Web" and regards the Web-as-information-source as Web 1.0.
Relationship of Web 3.0 to the Hypertext Web
Markup
Many files on a typical computer can be loosely divided into documents and data. Documents, like mail messages, reports and brochures, are read by humans. Data, like calendars, address books, playlists and spreadsheets, are presented using an application program which lets them be viewed, searched and combined in many ways.
Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. The semantic web involves publishing the data in a language, the Resource Description Framework (RDF), designed specifically for data, so that it can be manipulated and combined just as data files can be on a local computer.
The HTML language describes documents and the links between them. RDF, by contrast, describes arbitrary things such as people, meetings, and airplane parts. For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as "this document's title is 'Widget Superstore'". But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be positioned near "Acme Gizmo" and "€ 199", etc. There is no way to say "this is a catalog" or even to establish that "Acme Gizmo" is a kind of title or that "€ 199" is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.
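To make the contrast concrete, the sketch below expresses the same catalog item as explicit RDF triples using the Python rdflib library. The "shop" vocabulary, the item URI and the property names are invented for illustration; a real catalog would use a published vocabulary.

    # Sketch: the catalog item as machine-readable RDF triples (rdflib).
    # The SHOP vocabulary and the item URI are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    SHOP = Namespace("http://example.org/shop#")
    item = URIRef("http://example.org/catalog/X586172")

    g = Graph()
    g.add((item, SHOP.name, Literal("Acme Gizmo")))
    g.add((item, SHOP.price, Literal("199", datatype=XSD.decimal)))
    g.add((item, SHOP.category, Literal("consumer product")))

    print(g.serialize(format="turtle"))

Unlike the HTML span of text, each triple states unambiguously which resource the name, price and category belong to.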
Descriptive and extensible
The semantic web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and facilitating automated information gathering and research by computer.
TRANSFORMING THE WEB INTO A DATABASE
The first step towards a "Web 3.0" is the emergence of "The Data Web" as structured data records are published to the Web in reusable and remotely queryable formats, such as XML, RDF and microformats. The recent growth of SPARQL technology provides a standardized query language and API for searching across distributed RDF databases on the Web. The Data Web enables a new level of data integration and application interoperability, making data as openly accessible and linkable as Web pages. The Data Web is the first step on the path towards the full Semantic Web. In the Data Web phase, the focus is principally on making structured data available using RDF. The full Semantic Web stage will widen the scope such that both structured data and even what is traditionally thought of as unstructured or semi-structured content (such as Web pages, documents, etc.) will be widely available in RDF and OWL semantic formats.
AN EVOLUTIONARY PATH TO ARTIFICIAL INTELLIGENCE
Web 3.0 has also been used to describe an evolutionary path for the Web that leads to artificial intelligence that can reason about the Web in a quasi-human fashion. Some parts of this new web are based on results of Artificial Intelligence research, such as knowledge representation (e.g., for ontologies), model theory (e.g., for the precise semantics of RDF and RDF Schemas), or various types of logic (e.g., for rules). Even though some regard this as an unattainable vision, companies such as IBM and Google are implementing new data-mining technologies that are yielding surprising results, such as predictions about the stock market. There is also debate over whether the driving force behind Web 3.0 will be intelligent systems, or whether intelligence will emerge in a more organic fashion, from systems of intelligent people, such as via collaborative filtering services like del.icio.us, Flickr and Digg that extract meaning and order from the existing Web and how people interact with it.
BASIC WEB 3.0 CONCEPTS
Knowledge domains
A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.
Information vs. Knowledge
To a machine, knowledge is comprehended information, that is, new information produced through the application of deductive reasoning to existing information. To a machine, information is only data until it is processed and comprehended.
Ontologies
Ontologies are neither knowledge nor information; they are meta-information. In other words, ontologies are information about information. In the context of Web 3.0, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by Info Agents, allowing them to reason out new conclusions from existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable from the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of logical theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
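As a small, concrete sketch of what such an encoding can look like, the Python fragment below uses rdflib to state one axiom (every Author is a Person) and one fact (john is an Author); a separate RDFS/OWL reasoner could then infer that john is also a Person. The EX namespace and the class names are invented for illustration.

    # Sketch: a tiny domain ontology as RDF triples (rdflib).
    # EX is a hypothetical namespace; a reasoner applying the rdfs:subClassOf
    # rule would conclude (EX.john, RDF.type, EX.Person).
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/ontology#")
    g = Graph()

    # Axioms of the ontology
    g.add((EX.Person, RDF.type, OWL.Class))
    g.add((EX.Author, RDF.type, OWL.Class))
    g.add((EX.Author, RDFS.subClassOf, EX.Person))

    # A fact stated in terms of the ontology
    g.add((EX.john, RDF.type, EX.Author))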
Inference Engines
In the context of Web 3.0, inference engines will combine the latest innovations from the artificial intelligence (AI) field with domain-specific ontologies, domain inference rules, and query structures to enable deductive reasoning at the machine level.
Info Agents
Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
Proofs and Answers
The interesting thing about Info Agents is that they will be capable not only of deducing answers from existing information but also of formally testing propositions (represented in some query logic) that are made directly or implied by the user. This test-of-truth feature assumes the use of an ontology language (as a formal logic system) and an ontology where all propositions (or formal statements) that can be made can be computed (i.e. proved true or false) and where all such computations are decidable in finite time. The language may be OWL-DL or any language that, together with the ontology in question, satisfies the completeness and decidability conditions.
Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among the virtual AI-enhanced workforce, each having access to a different domain-specific comprehension space and all communicating with each other to build a collective consciousness.
Questions can be posed to your info agent or agents, for example to find you the nearest restaurant that serves Italian cuisine, something that is not possible with current search engines. But that is just a very simple example of the deductive reasoning machines will be able to perform on the information they have.
Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents that are specialized in different domains of knowledge to produce a collective consciousness that encompasses all human knowledge.
SOFTWARE AGENTS
The real power of Web 3.0 or the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.
Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information.
Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it.
COMPONENTS
XML, XML Schema, RDF, OWL, SPARQL
The semantic web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema, OWL and SPARQL. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
W3C Semantic Web Layer Cake
• XML: provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema: is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF: is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
• RDF Schema: is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.
• OWL: adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL: is a protocol and query language for semantic web data sources.
RESOURCE DESCRIPTION FRAMEWORK
Introduction
The Resource Description Framework (RDF) is an infrastructure that enables the encoding, exchange and reuse of structured metadata. RDF is an application of XML that imposes needed structural constraints to provide unambiguous methods of expressing semantics. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies designed to encourage the reuse and extension of metadata semantics among disparate information communities. The structural constraints RDF imposes to support the consistent encoding and exchange of standardized metadata provide for the interchangeability of separate packages of metadata defined by different resource description communities.
The World Wide Web affords unprecedented access to globally distributed information. Metadata, or structured data about data, improves discovery of and access to such information. The effective use of metadata among applications, however, requires common conventions about semantics, syntax, and structure. Individual resource description communities define the semantics, or meaning, of metadata that address their particular needs. Syntax, the systematic arrangement of data elements for machine-processing, facilitates the exchange and use of metadata among multiple applications. Structure can be thought of as a formal constraint on the syntax for the consistent representation of semantics.
The Resource Description Framework (RDF), developed under the auspices of the World Wide Web Consortium (W3C), is an infrastructure that enables the encoding, exchange, and reuse of structured metadata. This infrastructure enables metadata interoperability through the design of mechanisms that support common conventions of semantics, syntax, and structure. RDF uses XML (eXtensible Markup Language) as a common syntax for the exchange and processing of metadata. The XML syntax is a subset of the international text processing standard SGML (Standard Generalized Markup Language [SGML]) specifically intended for use on the Web. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures. By exploiting the features of XML, RDF imposes structure that provides for the unambiguous expression of semantics and, as such, enables consistent encoding, exchange, and machine-processing of standardized metadata. RDF supports the use of conventions that will facilitate modular interoperability among separate metadata element sets. These conventions include standard mechanisms for representing semantics that are grounded in a simple, yet powerful, data model discussed below. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies. Vocabularies are the set of properties, or metadata elements, defined by resource description communities. The ability to standardize the declaration of vocabularies is anticipated to encourage the reuse and extension of semantics among disparate information communities.
The RDF Data Model
RDF provides a model for describing resources. Resources have properties (attributes or characteristics). RDF defines a resource as any object that is uniquely identifiable by a Uniform Resource Identifier (URI). The properties associated with resources are identified by property-types, and property-types have corresponding values. Property-types express the relationships of values associated with resources. In RDF, values may be atomic in nature (text strings, numbers, etc.) or other resources, which in turn may have their own properties. A collection of these properties that refers to the same resource is called a description. At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions. The following graphic (Figure 1) illustrates a generic RDF description.
Figure 1
The application and use of the RDF data model can be illustrated by concrete examples. Consider the following statements:
1. "The author of Document 1 is John Smith"
2. "John Smith is the author of Document 1"
To humans, these statements convey the same meaning (that is, John Smith is the author of a particular document). To a machine, however, these are completely different strings. Whereas humans are extremely adept at extracting meaning from differing syntactic constructs, machines remain grossly inept. Using a triadic model of resources, property-types and corresponding values, RDF attempts to provide an unambiguous method of expressing semantics in a machine-readable encoding.
RDF provides a mechanism for associating properties with resources. So, before anything about Document 1 can be said, the data model requires the declaration of a resource representing Document 1. Thus, the data model corresponding to the statement "the author of Document 1 is John Smith" has a single resource Document 1, a property-type of author and a corresponding value of John Smith. To distinguish characteristics of the data model, the RDF Model and Syntax specification represents the relationships among resources, property-types, and values in a directed labeled graph. In this case, resources are identified as nodes, property-types are defined as directed labeled arcs, and string values are quoted. Given this representation, the data model corresponding to the statement is graphically expressed as (Figure 2):
Figure 2
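The same statement can be written down directly as a triple. The sketch below does so with the Python rdflib library; the URIs and the "author" property-type are illustrative stand-ins for whatever a resource description community would actually define.

    # Sketch: "the author of Document 1 is John Smith" as one RDF triple.
    # The EX namespace and the Document1 URI are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/terms#")
    doc1 = URIRef("http://example.org/Document1")

    g = Graph()
    g.add((doc1, EX.author, Literal("John Smith")))

    for resource, property_type, value in g:
        print(resource, property_type, value)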
If additional descriptive information regarding the author were desired, e.g., the author's email address and affiliation, an elaboration on the previous example would be required. In this case, descriptive information about John Smith is desired. As was discussed in the first example, before descriptive properties can be expressed about the person John Smith, there needs to be a uniquely identifiable resource representing him. Given the directed labeled graph notation of the previous example, the data model corresponding to this description is graphically represented as (Figure 3):
Figure 3
In this case, "John Smith" the string is replaced by a uniquely identified resource denoted by Author_001 with the associated property-types of name, email and affiliation. The use of unique identifiers for resources allows for the unambiguous association of properties. This is an important point, as the person John Smith may be the value of several different property-types. John Smith may be the author of Document 1, but also may be the value of a particular company describing the set of current employees. The unambiguous identification of resources provides for the reuse of explicit, descriptive information.
In the previous example the uniquely identifiable resource for the author was created, but not for the author's name, email or affiliation. The RDF model allows for the creation of resources at multiple levels. Concerning the representation of personal names, for example, a resource representing the author's name could additionally have been described using "firstname", "middlename" and "surname" property-types. Clearly, this iterative descriptive process could continue down many levels. What, however, are the practical and logical limits of these iterations?
There is no one right answer to this question. The answer is dependent on the domain requirements. These issues must be addressed and decided upon in the standard practice of individual resource description communities. In short, experience and knowledge of the domain dictate which distinctions should be captured and reflected in the data model.
The RDF data model additionally provides for the description of other descriptions. For instance, often it is important to assess the credibility of a particular description (e.g., "The Library of Congress told us that John Smith is the author of Document 1"). In this case the description tells us something about the statement "John Smith is the author of Document 1", specifically, that the Library of Congress asserts this to be true. Similar constructs are additionally useful for the description of collections of resources. For instance, "John Smith is the author of Documents 1, 2, and 3". While these statements are significantly more complex, the same data model is applicable.
The RDF Syntax
RDF defines a simple, yet powerful model for describing resources. A syntax representing this model is required to store instances of this model into machine-readable files and to communicate these instances among applications. XML is this syntax. RDF imposes formal structure on XML to support the consistent representation of semantics.
RDF provides the ability for resource description communities to define semantics. It is important, however, to disambiguate these semantics among communities. The property-type "author", for example, may have broader or narrower meaning depending on different community needs. As such, it is problematic if multiple communities use the same property-type to mean very different things. To prevent this, RDF uniquely identifies property-types by using the XML namespace mechanism. XML namespaces provide a method for unambiguously identifying the semantics and conventions governing the particular use of property-types by uniquely identifying the governing authority of the vocabulary. For example, the property-type "author" is defined by the Dublin Core Initiative as the "person or organization responsible for the creation of the intellectual content of the resource" and is specified by the Dublin Core CREATOR element. An XML namespace is used to unambiguously identify the schema for the Dublin Core vocabulary by pointing to the definitive Dublin Core resource that defines the corresponding semantics. Additional information on RDF Schemas is discussed later. If the Dublin Core RDF Schema, however, is abbreviated as "DC", the data model representation for this example would be (Figure 4):
Figure 4
This more explicit declaration identifies a resource Document 1 with the semantics of property-type Creator unambiguously defined in the context of DC (the Dublin Core vocabulary). The value of this property-type is John Smith.
The corresponding syntactic way of expressing this statement using XML namespaces to identify the use of the Dublin Core Schema is:
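As a rough sketch of such a serialization, assuming rdflib (the document URI is illustrative), one can build the description and let rdflib emit the RDF/XML with the Dublin Core element namespace bound to the prefix "dc":

    # Sketch: the Document 1 / DC:Creator description serialized as RDF/XML.
    # DC is rdflib's binding for http://purl.org/dc/elements/1.1/;
    # the document URI is hypothetical.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    doc1 = URIRef("http://example.org/Document1")

    g = Graph()
    g.bind("dc", DC)
    g.add((doc1, DC.creator, Literal("John Smith")))

    print(g.serialize(format="xml"))  # emits rdf:RDF with an xmlns:dc declaration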
In this case, both the RDF and Dublin Core schemas are declared and abbreviated as "RDF" and "DC" respectively. The RDF Schema is declared as a boot-strapping mechanism for the declaration of the vocabulary needed for expressing the data model. The Dublin Core Schema is declared in order to utilize the vocabulary defined by this community. The URI associated with each namespace declaration references the corresponding schema.
In the more advanced example, where additional descriptive information regarding the author is required, similar syntactic constructs are used. In this case, while it may still be desirable to use the Dublin Core CREATOR property-type to represent the person responsible for the creation of the intellectual content, additional property-types "name", "email" and "affiliation" are required. Since the semantics for these elements are not defined in Dublin Core, an additional resource description standard may be utilized. It is feasible to assume that an RDF schema with semantics similar to the vCard specification, which is designed to automate the exchange of the personal information typically found on a traditional business card, could be introduced to describe the author of the document. The data model representation for this example, with the corresponding business card schema defined as CARD, would be (Figure 5):
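As a rough sketch of the structure this data model describes, assuming rdflib and an invented CARD namespace (the author URI and the personal details are placeholders):

    # Sketch: DC:Creator now points to a resource described with a
    # hypothetical vCard-like CARD vocabulary. All URIs and values are
    # illustrative.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC

    CARD = Namespace("http://example.org/card#")
    doc1 = URIRef("http://example.org/Document1")
    author = URIRef("http://example.org/Author_001")

    g = Graph()
    g.bind("dc", DC)
    g.bind("card", CARD)

    g.add((doc1, DC.creator, author))
    g.add((author, CARD.name, Literal("John Smith")))
    g.add((author, CARD.email, Literal("john.smith@example.org")))
    g.add((author, CARD.affiliation, Literal("Example University")))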
In the corresponding syntax, the RDF, Dublin Core, and "Business Card" schemas are declared and abbreviated as "RDF", "DC" and "CARD" respectively. In this case, the value associated with the property-type DC:Creator is now a resource. While the reference to the resource is an internal identifier, an external URI, for example to a controlled authority of names, could have been used as well. Additionally, in this example, the semantics of the Dublin Core CREATOR element have been refined by the semantics defined by the schema referenced by CARD.
The RDF Schema
RDF Schemas are used to declare vocabularies, the sets of semantic property-types defined by a particular community. RDF schemas define the valid properties in a given RDF description, as well as any characteristics or restrictions of the property-type values themselves. The XML namespace mechanism serves to identify RDF Schemas.
A human- and machine-processable description of an RDF schema may be accessed by de-referencing the schema URI. If the schema is machine-processable, it may be possible for an application to learn some of the semantics of the property-types named in the schema. To understand a particular RDF schema is to understand the semantics of each of the properties in that description. RDF schemas are structured based on the RDF data model. Therefore, an application that has no understanding of a particular schema will still be able to parse the description into property-types and corresponding values and will be able to transport the description intact (e.g., to a cache or to another application).
The exact details of RDF schemas are currently being discussed in the W3C RDF Schema working group. It is anticipated, however, that the ability to formalize human-readable and machine-processable vocabularies will encourage the exchange, use, and extension of metadata vocabularies among disparate information communities. RDF schemas are being designed to provide this type of formalization.
XML
The Extensible Markup Language (XML) is a subset of SGML that is completely described in its specification. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
Introduction
Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents.
XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.
A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. The XML specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup.
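A short sketch of an XML processor at work, using Python's standard xml.etree.ElementTree module; the document below reuses the illustrative catalog item from earlier and is not drawn from any real catalog.

    # Sketch: parse a small XML document and hand its logical structure
    # (elements, attributes, character data) to the application.
    import xml.etree.ElementTree as ET

    document = """<catalog>
      <item id="X586172">
        <name>Acme Gizmo</name>
        <price currency="EUR">199</price>
      </item>
    </catalog>"""

    root = ET.fromstring(document)            # the document (root) element
    for item in root.findall("item"):
        price = item.find("price")
        print(item.get("id"), item.findtext("name"),
              price.text, price.get("currency"))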
XML SCHEMA
XML Schema, published as a W3C Recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C.
Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent of validation resulting in a collection of information adhering to specific datatypes, which can be useful in the development of XML document processing software, but which has also provoked criticism.
An XML Schema instance is an XML Schema Definition (XSD) and typically has the filename extension ".xsd". The language itself is sometimes informally referred to as XSD. It has been suggested that WXS (for W3C XML Schema) is a more appropriate initialism, though this acronym has not been in widespread use and the W3C working group rejected it. XSD is also an initialism for XML Schema Datatypes, the datatype portion of XML Schema.
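As an illustration of what validation against an XSD looks like in practice, here is a minimal sketch assuming the third-party lxml library and placeholder file names ("catalog.xsd", "catalog.xml"):

    # Sketch: validate an XML instance document against an XML Schema (XSD)
    # using lxml. The file names are placeholders.
    from lxml import etree

    schema = etree.XMLSchema(etree.parse("catalog.xsd"))
    document = etree.parse("catalog.xml")

    if schema.validate(document):
        print("catalog.xml is valid against catalog.xsd")
    else:
        print(schema.error_log)   # why the document failed to validate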
OWL
Introduction
The Web Ontology Language (OWL) is a language for defining and instantiating Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL is based on earlier languages OIL and DAML+OIL, and is now a W3C recommendation.
OWL is seen as a major technology for the future implementation of a Semantic Web. It is playing an important role in an increasing number and range of applications, and is the focus of research into tools, reasoning techniques, formal foundations and language extensions.
OWL was designed to provide a common way to process the semantic content of web information. It was developed to augment the facilities for expressing semantics (meaning) provided by XML, RDF, and RDF-S. Consequently, it may be considered an evolution of these web languages in terms of its ability to represent machine-interpretable semantic content on the web. Since OWL is based on XML, OWL information can be easily exchanged between different types of computers using different operating systems and application languages. Because the language is intended to be read by computer applications, it is sometimes not considered to be human-readable, although this may be a tool issue. OWL is being used to create standards that provide a framework for asset management, enterprise integration, and data sharing on the Web.
An extended version of OWL, (sometimes called OWL 1.1, but with no official status) has been proposed which includes increased expressiveness, a simpler data model and serialization, and a collection of well-defined sub-languages each with known computational properties.
OWL currently has three sublanguages (sometimes also referred to as 'species'): OWL Lite, OWL DL, and OWL Full. These three increasingly expressive sublanguages are designed for use by specific communities of implementers and users.
• OWL Lite supports those users primarily needing a classification hierarchy and simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1. It should be simpler to provide tool support for OWL Lite than its more expressive relatives, and OWL Lite provides a quick migration path for thesauri and other taxonomies. OWL Lite also has a lower formal complexity than OWL DL; see the section on OWL Lite in the OWL Reference for further details.
• OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computed) and decidability (all computations will finish in finite time). OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.
Ontology developers adopting OWL should consider which sublanguage best suits their needs. The choice between OWL Lite and OWL DL depends on the extent to which users require the more-expressive constructs provided by OWL DL. The choice between OWL DL and OWL Full mainly depends on the extent to which users require the meta-modeling facilities of RDF Schema (e.g. defining classes of classes, or attaching properties to classes). When using OWL Full as compared to OWL DL, reasoning support is less predictable since complete OWL Full implementations do not currently exist.
OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be legal OWL Lite or OWL DL documents. Because of this, some care has to be taken when a user wants to migrate an RDF document to OWL. When the expressiveness of OWL DL or OWL Lite is deemed appropriate, some precautions have to be taken to ensure that the original RDF document complies with the additional constraints imposed by OWL DL and OWL Lite. Among others, every URI that is used as a class name must be explicitly asserted to be of type owl:Class (and similarly for properties), every individual must be asserted to belong to at least one class (even if only owl:Thing), and the URIs used for classes, properties and individuals must be mutually disjoint.
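The explicit typing described above can be written down directly as triples. The sketch below does so with rdflib; the EX namespace and the names in it are invented for illustration.

    # Sketch: the explicit assertions OWL DL expects of a migrated RDF
    # document. EX is a hypothetical namespace.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF

    EX = Namespace("http://example.org/ontology#")
    g = Graph()
    g.bind("owl", OWL)

    g.add((EX.Person, RDF.type, OWL.Class))          # class explicitly typed
    g.add((EX.Author, RDF.type, OWL.Class))
    g.add((EX.wrote, RDF.type, OWL.ObjectProperty))  # property explicitly typed
    g.add((EX.john, RDF.type, EX.Author))            # individual belongs to a class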
Why OWL?
The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language that can formally describe the meaning of terminology used in Web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema. The OWL Use Cases and Requirements Document provides more details on ontologies, motivates the need for a Web Ontology Language in terms of six use cases, and formulates design goals, requirements and objectives for OWL.
OWL has been designed to meet this need for a Web Ontology Language. OWL is part of the growing stack of W3C recommendations related to the Semantic Web.
SPARQL
SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. It is undergoing standardization by the RDF the World Wide Web Consortium. SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF). There's a big difference between blindly searching the entire Web and querying actual data models which makes it of an advantage.
Most uses of the SPARQL acronym refer to the RDF query language. In this usage, SPARQL is a syntactically-SQL-like language for querying RDF databases. It can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
The SPARQL protocol is a means of conveying SPARQL queries from query clients to query processors. It is described abstractly with WSDL 2.0 (Web Services Description Language) and contains a single interface, SparqlQuery, which in turn contains a single operation, query, used to convey a SPARQL query string and, optionally, an RDF dataset description.
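To close the loop with the earlier examples, here is a minimal sketch of a SPARQL query executed against a local RDF graph with rdflib; the data is the illustrative Dublin Core description used above, plus one more invented document. A remote SPARQL endpoint would accept the same query string over the protocol.

    # Sketch: query a small RDF graph with SPARQL (rdflib). The URIs and
    # data are illustrative.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    g = Graph()
    g.add((URIRef("http://example.org/Document1"), DC.creator, Literal("John Smith")))
    g.add((URIRef("http://example.org/Document2"), DC.creator, Literal("Jane Doe")))

    query = """
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?doc WHERE { ?doc dc:creator "John Smith" . }
    """

    for row in g.query(query):
        print(row.doc)   # prints http://example.org/Document1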