1. Introduction
In the term server operating system, the word server describes the system's role. A server operating system is one that is intended for, or better able to, run server applications. A server computer (often called a server for short) is a computer system designated for running one or more specific server applications. A computer designated for only one server application is often named after that application; for example, when the Apache HTTP Server software is a company's web server, the computer running it is also called the web server. Server applications can be divided among server computers over a wide range, depending on the workload; under light loading, every server application can run concurrently on a single computer.
1.1 Server hardware
CPU speed is far less critical for many servers than for many desktops. Not only are typical server tasks more likely to be delayed by I/O requests than by processor limits, but the absence of a graphical user interface (GUI) in many servers frees up a significant amount of processing power for other tasks, lowering the overall processor requirement. If a server needs a great deal of processing power, the tendency is to add more CPUs rather than increase the speed of a single CPU, again for reasons of reliability and redundancy. The major difference between servers and desktop computers is not the hardware but the software: servers often run operating systems designed specifically for server use.
1.2 Server operating systems
The Microsoft Windows operating system is predominant among desktop computers, but in the world of servers, the most popular operating systems—such as FreeBSD, Solaris, and GNU/Linux—are derived from or similar to the UNIX operating system. UNIX was originally a minicomputer operating system, and as servers gradually replaced traditional minicomputers, UNIX was a logical and efficient choice of operating system for servers.
1.3 Servers on the Internet
Almost the entire structure of the Internet is based on a client-server model. Many millions of servers are connected to the Internet and run continuously throughout the world. Among the many services provided by Internet servers are the Web, the Domain Name System, electronic mail, file transfer, instant messaging, streaming audio and video, online gaming, and countless others. Virtually every action taken by an ordinary Internet user requires one or more interactions with one or more servers.
Connects to:
• Internet, via Ethernet or modem
Common manufacturers:
• International Business Machines (IBM)
• Sun Microsystems
• HP
• Apple Computer
• Motorola
• Dell
• Supermicro
2. History
Complete microcomputers were placed on cards and packaged in standard 19-inch racks in the 1970s soon after the introduction of 8-bit microprocessors. This architecture was used in the industrial process control industry as an alternative to minicomputer control systems. Programs were stored in EPROM on early models and were limited to a single function with a small realtime executive.
The VMEbus architecture (ca. 1981) defined a computer interface that included implementation of a board-level computer installed in a chassis backplane with multiple slots for pluggable boards providing I/O, memory, or additional computing. The PCI Industrial Computer Manufacturers Group (PICMG) developed a chassis/blade structure, called CompactPCI, for the then-emerging Peripheral Component Interconnect (PCI) bus. Common to these chassis-based computers was the fact that the entire chassis was a single system. While a chassis might include multiple computing elements to provide the desired level of performance and redundancy, there was always one master board coordinating the operation of the entire system. PICMG expanded the CompactPCI specification with the use of standard Ethernet connectivity between boards across the backplane. The PICMG 2.16 CompactPCI Packet Switching Backplane specification was adopted in September 2001 (PICMG specifications). This provided the first open architecture for a multi-server chassis. PICMG followed with the larger and more feature-rich AdvancedTCA specification, targeting the telecom industry's need for a high-availability, dense computing platform with extended product life (10+ years). While AdvancedTCA system and board pricing is typically higher than that of blade servers, AdvancedTCA suppliers claim that low operating expenses and total cost of ownership can make AdvancedTCA-based solutions a cost-effective alternative for many building blocks of the next-generation telecom network.
The name blade server appeared when a card included the processor, memory, I/O, and non-volatile program storage (flash memory or small hard disks). This allowed a complete server, with its operating system and applications, to be packaged on a single card/board/blade. These blades could then operate independently within a common chassis, doing the work of multiple separate server boxes more efficiently. Less space consumption is the most obvious benefit of this packaging, but additional efficiency benefits have become clear in power, cooling, management, and networking, due to the pooling or sharing of common infrastructure to support the entire chassis rather than providing each of these on a per-server-box basis.
[One opinion is that…] The architecture of blade servers is expected to move closer to mainframe architectures. Although current systems act as a cluster of independent computers, future systems may add resource virtualization and higher levels of integration with the operating system to increase reliability.
[An alternate opinion is that…] The architecture of blade servers will remain a set of separate servers within a common chassis, but the chassis will be offered in a wider variety of sizes, providing different levels of aggregation and different cost points. On top of this independent server-blade structure, management capabilities will be enhanced to allow a collection of servers to be operated as a single compute resource pool. Administrators and users will not need to care which server an application runs on, since the management software will automatically allocate server (and I/O) resources to the application as needed. Over time, individual servers will not need to be uniquely managed, but the overall robustness of the data center will still benefit from the separation of individual servers, whose interconnect is provided by standard networking technology. This provides isolation at a very low level, so that a fault in one piece of hardware affects only the work executing on that specific hardware at that time. The broader management software can then recover the affected work by restarting it on another server blade elsewhere in the data center.
3. Overview
It is now estimated that, after just three years on the market, blade servers account for seven percent of all server shipments, with estimates showing that by 2008 they will make up thirty percent of all servers sold. They are expected to be the fastest-growing server form factor through 2009. There are many factors to consider when deciding whether blade servers are right for your organization, including but not limited to cost, workload and how predictable it is, performance and power consumption, uptime and repair, proprietary designs, and disk storage. With third-generation blade servers on the market, the jury is still out on whether they will truly fulfill their promise of resiliency, repair efficiency, cost efficiency, and dynamic load handling. Blades must continue to evolve toward this vision while still offering improvements in modularity, backplane performance, and the aggregation of blades into a single virtual-server image.
Currently, five manufacturers account for more than 75% of all blade servers sold: HP, IBM, Sun, Fujitsu Siemens, and Dell. With the list of target applications for blade servers growing and the number of manufacturers increasing, the future for this technology certainly looks bright.
4. Why Blade Servers?
The provisioning process in our existing rack-mounted server environment is both lengthy and inflexible. Our search for a better approach led us to blade servers, an emerging technology that promised reduced costs in areas such as capital expenditures, operational expenses, and physical plant requirements while improving efficiency. Blade servers allow up-front provisioning of the chassis and switch components, with compute blades added as needed. This provides a more dynamic provisioning model, which results in just-in-time provisioning as well as the ability to work around data center freeze periods.
5. What are Blade Servers?
Blade servers are a new trend in data center technology. Hardware manufacturers combine all server system hardware—one or more microprocessors, memory, disk drives, and network controllers—onto a single electronic circuit board, or blade. The blades are designed to plug into a system chassis and share common components such as power supplies, fans, CD-ROM and floppy drives, Ethernet switches, and system ports. Blade servers are sometimes referred to as headless devices because they have no monitor or keyboard of their own.
Individual blade servers come in various heights, including 5.25 inches (3U), 1.75 inches (1U), and smaller. (A U is a standard measure of vertical height in an equipment cabinet, equal to 1.75 inches.) The thin, modular design of these blades allows more computing power to fit in a smaller space than typical 1U rack servers.
Blade servers can help cut network administration costs and improve server manageability by consolidating many widely distributed servers onto one rack. This decreases the amount of space needed in a data center and eliminates the need to travel between server locations. The shared networking and power infrastructure reduces the number of cables coming off a rack and lowers the chance of an administrator accidentally unplugging the wrong device. The reduced heat output and power consumption can also help trim energy costs.
Another advantage of blade deployment is the ease and speed with which you can add more servers. Rather than spending hours installing a rack-mounting rail kit, administrators can plug in a new blade server in seconds. Replacing a damaged server is just as quick. Each blade is completely independent of the others; inserting or removing a blade has no effect on any other operational blade in the same chassis. Blade servers are ideal for organizations with large data centers and for those that use server farms to host Web sites. They are typically used in a server cluster dedicated to a single application (such as file sharing, Web page serving and caching, or streaming audio and video content). In a clustered environment, blade servers can be set up to perform load balancing and provide failover capabilities.
6. Architecture
A general blade server architecture is shown in Figure 1. The hardware components of a blade server are the switch blade, the chassis (with fans, temperature sensors, etc.), and multiple compute blades. The outside world connects through the rear of the chassis to a switch card in the blade server. The switch card is provisioned to distribute packets to blades within the blade server. All these components are tied together with network management system software provided by the blade server vendor. The specifics of the blade server architecture vary from vendor to vendor.
7. Advantages of blade servers
Blade servers make it possible to accommodate a large number of servers and switches in one chassis at high density. The hardware units housed in the chassis are called server blades, and the server blades are connected through a midplane to switches called switch blades. Compared to traditional servers, blade servers have the following advantages:
1. More servers can be placed in each rack.
2. They can accommodate computing devices and network devices in one chassis.
3. Switches and servers do not need to be connected by cables.
4. Server blades and switch blades are hot-pluggable.
5. They can accommodate equipment called management blades that allow administrators to collectively manage the entire hardware, for example, server blades, switch blades, power units, and fans, in a chassis.
7.1 Management Functions:
Management blades provide the following management functions:
• Configuration management
Manages the configuration and state of the blades in a chassis and detects changes that occur in the blades.
• Fault management
Detects faults occurring in server blades, switch blades, power units, and fans. This function also issues alarms to management tools using Simple Network Management Protocol (SNMP) traps (a minimal trap-listener sketch follows below).
• Power management
Performs on/off switching and rebooting of a chassis, server blades, and switch blades.
Figure 1 shows an example hardware configuration of a blade server.
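As a purely illustrative aside on the fault-management function above, SNMP traps are simply UDP datagrams sent to port 162 of a management station. The following minimal Python sketch is not part of any vendor's management software; it only listens for such datagrams and reports their arrival, without decoding the SNMP payload.

```python
# Minimal illustration only: listen for incoming SNMP trap datagrams.
# Real management tools decode the SNMP payload; this sketch merely
# reports that a datagram arrived on the standard trap port.
import socket

TRAP_PORT = 162  # standard SNMP trap port (binding may require admin rights)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", TRAP_PORT))
print(f"listening for SNMP traps on UDP/{TRAP_PORT} ...")

while True:
    data, (sender, _port) = sock.recvfrom(4096)
    print(f"received a {len(data)}-byte trap datagram from {sender}")
```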
8. Issues
Business systems that use blade servers have a large number of server blades, and when a large number are in use, it is important to ensure they do not degrade system reliability. Although many factors can degrade a system’s reliability, the most important ones to consider are:
1) Hardware faults
2) Software faults
3) Insufficient resources to deal with unexpected load increases
4) Human error
Regarding factors 1) and 2), it is expensive to improve the quality of hardware and software. A blade server helps to reduce the service downtime, because new servers can be added and failed servers can be replaced without disconnecting and reconnecting the complicated network cables. Regarding factor 3), it is difficult to accurately estimate system size in an open system.
Moreover, in a Web system, it is difficult to accurately predict the number of transactions. Therefore, it is necessary to build a system that includes an allowance for peak workloads. As a result, many resources are usually left unused, yet systems inevitably go down when the number of transactions exceeds the predicted level. On the other hand, the number of active servers in a blade computing system can easily be increased, so only the minimum number of servers needs to be used in the initial installation, and servers can easily be added when the system load increases. Regarding factor 4), it is impossible to eliminate human error completely, but the risk of it occurring can be reduced by automating system operation as much as possible. However, in a traditional system, servers and networks are separated, so the system configuration depends strongly on human operation. With a blade computing system, networks and servers are consolidated in a single chassis, so it is easier to automate system operation, and therefore minimize human error, than in a traditional system.
9. Systems using blade servers
A high-availability and cost-effective system using blade servers is constructed as follows:
1. Bare-metal server blades are prepared for a server pool that can be shared by several business systems.
2. If a fault occurs, the environment and settings are restored to another server blade in the server pool.
3. When a server resource shortage occurs in the load sharing server group because of a load increase, a server from the server pool is added to the load sharing server group. Then, the environment settings are restored to extend the load sharing server group.
4. When the fault is repaired or the system load returns to normal, the server that was replaced or added is returned to the server pool.
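The cycle in steps 1-4 can be pictured as a simple pool-management loop. The sketch below is an invented illustration of that bookkeeping only (the blade names and methods are hypothetical); in practice the vendor's management software performs these actions, including restoring the environment and settings onto the replacement blade.

```python
# Invented sketch of the server-pool cycle described in steps 1-4 above.
class BladePool:
    def __init__(self, spare_blades):
        self.pool = list(spare_blades)   # bare-metal spares (step 1)
        self.active = []                 # load-sharing server group

    def handle_fault(self, failed_blade):
        # step 2: bring a spare from the pool into service in its place
        self.active.remove(failed_blade)
        replacement = self.pool.pop()
        self.active.append(replacement)
        return replacement

    def handle_load_increase(self):
        # step 3: extend the load-sharing group from the pool
        blade = self.pool.pop()
        self.active.append(blade)
        return blade

    def release(self, blade):
        # step 4: return a repaired or no-longer-needed blade to the pool
        self.active.remove(blade)
        self.pool.append(blade)

pool = BladePool(["blade-03", "blade-04"])
pool.active = ["blade-01", "blade-02"]
print(pool.handle_load_increase())   # 'blade-04' joins the active group
```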
Figure 2 shows an image of a high-availability system using blade servers. Provisioning provides a quick and flexible service by preparing IT resources in advance.
It brings the following benefits to customers:
• It enables customers to visualize the impact of physical faults on services and thereby reduce the time needed to recover from faults.
• It quickly reconstructs a system when a fault occurs and therefore reduces system downtime.
• It quickly adds servers when the system load increases, reducing the amount of time the service level is degraded and making it easier to use server resources effectively.
• By automating intervention tasks such as installation and maintenance, it reduces the risk of human error and makes operation more cost-effective.
Management components
The management components are implemented by the blade server hardware and the existing management software. Table 1 shows the management functions required for provisioning on blade servers.
10. Comparison
10.1 Blade Servers
• Power consumption reduced from 260 amps to 60
• Heat generation reduced by 15%
• Total environment cost down 10%
• Proprietary design
• Cheaper than rack servers after purchasing 8 to 10 blades
10.2 Standard Rack Servers
• Power consumption reduced from 260 amps to 220
• Same heat generation
• Total environment cost remains the same
11. Major components of the xSeries 336 Type 8837 server
Light Path LEDs and buttons:
The Light Path LEDs and buttons are on top of the operator information panel. The following illustration shows the LEDs on the Light Path Diagnostics panel, followed by a description of the buttons and each LED.
12. Future Developments
At present, an administrator must analyze the root cause and make decisions to take action before provisioning. However, if human intervention remains necessary, it is difficult to respond quickly to business demands, which change at dizzying speed. Moreover, it is difficult to take action in advance, before a problem occurs. In the future, systems themselves will perform analyses, make decisions autonomously based on business requirements, and reconstruct systems organically. The key technologies for achieving these functions are considered to be system composition and autonomic control.
12.1 System composition
Designs logical system layouts and generates suitable physical arrangements using resource management. Its goal is to design a system based on business requirements (e.g., reduced system cost) and a service level agreement (SLA) to provide, for example, a guaranteed maximum response time.
12.2 Autonomic control
Automates provisioning tasks when a problem occurs. It predicts the future service level from the viewpoint of business requirements (e.g., load increases in a specific period) and logged data from the past (e.g., the amount of resources used). If it predicts an inability to satisfy the SLA, it automatically performs dynamic system reconstruction. If the technologies and resource management described in this paper work together, autonomic provisioning can be realized. Figure 7 shows the relationships among these components.
Conclusion
Blade servers can greatly improve the reliability of business systems. Provisioning enables swift action and efficient resource utilization when faults or load increases occur. In order to realize provisioning, resource management for configuration management, pool management, and automation is indispensable. In the future, provisioning solutions will have extended support ranges, and these new solutions will form the basis of autonomic provisioning.
WEB 3.0
Abstract
The World Wide Web has evolved from the original hypertext system envisioned by physicists into a planet-wide medium that has already transformed most of our lives. Web 3.0 refers to a combination of advances that will change the Internet radically. It is the advent of a new paradigm in which we will interact and solve problems together through a network of A.I. assistants. It is also known as the semantic web, a term coined by Tim Berners-Lee, the inventor of the World Wide Web. In essence, this new paradigm is a place where machines can read Web pages much as we humans read them, a place where search engines and software agents can better trawl the Net and find what we are looking for. The vision is to extend the principles of the Web from documents to data. This extension will allow more of the Web's potential to be fulfilled, in that it will allow data to be shared effectively by wider communities and to be processed automatically by tools as well as manually. A very basic definition of Web 3.0 is that it is an expert system in which there is a software agent for any task assigned by the user; the agent takes an input, runs it through a knowledge database, and then generates an output through inference.
Web 3.0 is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. For this web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.
HOW THE INTERNET WORKS
Introduction
Web 3.0 essentially adds a layer of intelligence to the existing web, which can be better understood by looking at the underlying implementation of the present-day web. The Internet's workings include a technical design and a management structure. The management structure consists of a generally democratic collection of loosely coupled organizations and working groups with mostly non-overlapping responsibilities. The technical design is founded on a complex, interlocking set of hierarchical tree-like structures, such as Internet Protocol addresses and domain names, mixed with networked structures like packet switching and routing protocols, all tied together with millions of lines of sophisticated software that continues to get better all the time.
The Internet's architecture is described in its name, a short form of the compound word "inter-networking". This architecture is based on the specification of the standard TCP/IP protocol, designed to connect any two networks that may be very different in internal hardware, software, and technical design. Once two networks are interconnected, communication with TCP/IP is enabled end-to-end, so that any node on the Internet has the near-magical ability to communicate with any other, no matter where they are. This openness of design has enabled the Internet architecture to grow to a global scale.
HOW YOUR URL GETS ROUTED
The Domain Name System (DNS) as a whole consists of a network of servers that map Internet domain names like www.yahoo.com to IP addresses. The DNS enables domain names to stay constant while the underlying network topology and IP addresses change. This provides stability at the application level while enabling network applications to find and communicate with each other using the Internet protocol, no matter how the underlying physical network changes.
Internet domain names come in four main types -- top-level domains, second-level domains, third-level domains, and country domains. Internet domain names are the alphanumeric identifiers we use to refer to hosts on the Internet, like "www.getme_donnel.com".
Internet domain names are organized by their levels, with the higher levels on the right. For example, for the domain "www.getme_donnel.com" the top-level domain is "com", the second-level domain is "getme_donnel", and the third-level domain is "www.getme_donnel.com".
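To make the right-to-left ordering concrete, the small helper below (a hypothetical function, not part of any DNS software) lists the levels of a name cumulatively, in the same style as the example above.

```python
# Hypothetical helper: list a domain name's levels, highest level first.
def domain_levels(name: str) -> list[str]:
    labels = name.split(".")
    return [".".join(labels[-i:]) for i in range(1, len(labels) + 1)]

print(domain_levels("www.getme_donnel.com"))
# ['com', 'getme_donnel.com', 'www.getme_donnel.com']
```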
Top-level Internet domains like ".com" are shared by all the organizations in the domain. Second-level domain names like "yahoo.com" and "livinginternet.com" are registered by individuals and organizations. Second-level domains are the addresses commonly used to host Internet applications like web hosting and email addressing.
Third-level Internet domain names are created by those that own second-level domains. Third-level domains can be used to set up individual domains for specific purposes, such as a domain for web access and one for mail, or a separate site for a special purpose:
• www.livinginternet.com
• mail.livinginternet.com
• rareorchids.livinginternet.com
Each country in the world has its own top-level Internet domain with a unique alphabetic designation, for example the "in" in www.google.co.in.
The Domain Name System (DNS) servers distribute the job of mapping domain names to IP addresses among servers allocated to each domain.
Each second-level domain must have at least one domain name server responsible for maintaining information about that domain and all subsidiary domains, and for responding to queries about those domains from other computers on the Internet. For example, management of domain name information and queries for the LivingInternet.com domain is handled by a specific DNS server that takes care of the load required. This distributed architecture was designed to enable the Internet to grow, so that as the number of domains grows, the number of DNS servers can grow to keep pace with the load.
Today, everyone who registers a second-level domain name must at the same time designate two DNS servers to manage queries and return the current IP address for addresses in that domain. The primary domain name server is always consulted first, and the secondary domain name server is queried if the primary doesn't answer, providing a backup and important support to overall Internet reliability.
The application that underlies almost all DNS server software on the Internet is an open source program called BIND, currently maintained by the Internet Systems Consortium. When your computer was added to the Internet, one of the initial setup tasks was to specify a default domain name server, usually maintained by your local Internet Service Provider, and almost certainly a variant of the BIND server software.
When your computer tries to access a domain like "www.livinginternet.com", the domain name system works like this:
• Your computer asks your default DNS server if it knows the IP address for www.livinginternet.com. If the DNS server has been asked that question recently, then it will have the answer stored in its local cache, and can answer immediately.
• Otherwise, your DNS server queries the central zone files for the address of the primary domain name server for livinginternet.com, and is answered with something like "ns1.livinginternet.com".
• Your DNS server will ask the livinginternet.com DNS server for the IP address of www.livinginternet.com, which will then look up the answer and send it back.
• Your DNS server will store the IP address returned in its local cache, and make the address available to your computer.
• Your computer then contacts www.livinginternet.com with the standard Internet routing protocols by using the returned IP address.
The IP address assigned to a computer may change frequently because of physical moves or network reconfigurations. The major advantage of the network of DNS servers is that domain names stay the same even when IP addresses change, and so the domain name servers can transparently take care of the mapping.
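As a small illustration of the lookup sequence above, the standard-library call below hands a host name to the operating system's configured resolver, which performs the recursive query and caching just described and returns the current IP address.

```python
# Ask the system's default DNS resolver for the IP address of a host name.
# The recursive lookup and caching happen inside the resolver/OS.
import socket

name = "www.livinginternet.com"  # example host used in the text
try:
    address = socket.gethostbyname(name)
    print(f"{name} currently resolves to {address}")
except socket.gaierror as error:
    print(f"DNS lookup for {name} failed: {error}")
```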
WEB 2.0
The phrase Web 2.0 refers to a perceived second-generation of web-based communities and hosted services — such as social-networking sites, wikis and folksonomies — which aim to facilitate collaboration and sharing between users. It became popular following the first O'Reilly Media Web 2.0 conference in 2004, and has since become widely adopted.
Although the term suggests a new version of the World Wide Web, it does not refer to an update to Web technical specifications, but to changes in the ways software developers and end-users use the web as a platform. According to Tim O'Reilly, "Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform."
Some technology experts, notably Tim Berners-Lee, have questioned whether one can use the term in a meaningful way, since many of the technology components of "Web 2.0" have existed since the early days of the Web.
Characteristics of "Web 2.0"
While interested parties continue to debate the definition of a Web 2.0 application, a Web 2.0 website may exhibit some basic common characteristics. These might include:
• "Network as platform" — delivering (and allowing users to use) applications entirely through a browser.
• Users owning the data on a site and exercising control over that data.
• Architecture of participation that encourages users to add value to the application as they use it. This stands in sharp contrast to hierarchical access-control in applications, in which systems categorize users into roles with varying degrees of functionality.
• A rich, interactive, user-friendly interface based on Ajax or similar frameworks.
• Some social-networking aspects.
The concept of Web-as-participation-platform captures many of these characteristics. Bart Decrem, a founder and former CEO of Flock, calls Web 2.0 the "participatory Web" and regards the Web-as-information-source as Web 1.0.
Relationship of Web 3.0 to the Hypertext Web
Markup
Many files on a typical computer can be loosely divided into documents and data. Documents, like mail messages, reports, and brochures, are read by humans. Data, like calendars, address books, playlists, and spreadsheets, are presented using an application program which lets them be viewed, searched, and combined in many ways.
Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. The semantic web involves publishing the data in a language, Resource Description Framework (RDF), specifically for data, so that it can be manipulated and combined just as can data files on a local computer.
The HTML language describes documents and the links between them. RDF, by contrast, describes arbitrary things such as people, meetings, and airplane parts. For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as "this document's title is 'Widget Superstore'". But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be positioned near "Acme Gizmo" and "€ 199", etc. There is no way to say "this is a catalog" or even to establish that "Acme Gizmo" is a kind of title or that "€ 199" is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.
Descriptive and extensible
The semantic web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and facilitating automated information gathering and research by computer.
TRANSFORMING THE WEB INTO A DATABASE
The first step towards a "Web 3.0" is the emergence of "The Data Web" as structured data records are published to the Web in reusable and remotely queryable formats, such as XML, RDF and microformats. The recent growth of SPARQL technology provides a standardized query language and API for searching across distributed RDF databases on the Web. The Data Web enables a new level of data integration and application interoperability, making data as openly accessible and linkable as Web pages. The Data Web is the first step on the path towards the full Semantic Web. In the Data Web phase, the focus is principally on making structured data available using RDF. The full Semantic Web stage will widen the scope such that both structured data and even what is traditionally thought of as unstructured or semi-structured content (such as Web pages, documents, etc.) will be widely available in RDF and OWL semantic formats.
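As a rough sketch of what such a remotely queryable Data Web record looks like in practice, the example below loads a couple of invented RDF statements and runs a SPARQL query over them; it assumes the third-party rdflib package, which includes a SPARQL engine.

```python
# Sketch only: a tiny RDF graph (invented data) queried with SPARQL.
# Assumes the rdflib package is installed.
from rdflib import Graph

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:Document-1 ex:author "John Smith" .
    ex:Document-2 ex:author "Jane Doe" .
""", format="turtle")

results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?doc ?who WHERE { ?doc ex:author ?who }
""")
for doc, who in results:
    print(doc, who)
```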
AN EVOLUTIONARY PATH TO ARTIFICIAL INTELLIGENCE
Web 3.0 has also been used to describe an evolutionary path for the Web that leads to artificial intelligence that can reason about the Web in a quasi-human fashion. Some parts of this new web are based on results of artificial-intelligence research, such as knowledge representation (e.g., for ontologies), model theory (e.g., for the precise semantics of RDF and RDF Schemas), or various types of logic (e.g., for rules). Even though some regard this as an unattainable vision, companies such as IBM and Google are implementing new data-mining technologies that are yielding surprising results in making predictions, for example regarding the stock market. There is also debate over whether the driving force behind Web 3.0 will be intelligent systems, or whether intelligence will emerge in a more organic fashion, from systems of intelligent people, such as via collaborative filtering services like del.icio.us, Flickr and Digg that extract meaning and order from the existing Web and how people interact with it.
BASIC WEB 3.0 CONCEPTS
Knowledge domains
A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.
Information vs. Knowledge
To a machine, knowledge is comprehended information (that is, new information produced through the application of deductive reasoning to existing information). To a machine, information is only data until it is processed and comprehended.
Ontologies
Ontologies are neither knowledge nor information; they are meta-information. In other words, ontologies are information about information. In the context of Web 3.0, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the Info Agents to reason new conclusions from existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
Inference Engines
In the context of Web 3.0, inference engines will combine the latest innovations from the artificial intelligence (AI) field with domain-specific ontologies, domain inference rules, and query structures to enable deductive reasoning at the machine level.
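A deliberately tiny, invented sketch of the idea (not a real Web 3.0 inference engine) is shown below: facts are (subject, predicate, object) triples, a domain rule derives new triples, and the engine applies its rules until no new conclusions appear.

```python
# Minimal forward-chaining sketch; facts and the rule are invented examples.
facts = {
    ("Document1", "hasAuthor", "JohnSmith"),
    ("JohnSmith", "memberOf", "AcmeCorp"),
}

def rule_author_is_creator(triples):
    # Hypothetical domain rule: the author of anything is a Creator.
    return {(o, "isA", "Creator") for s, p, o in triples if p == "hasAuthor"}

def infer(triples, rules):
    closed = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(closed) - closed
        if not new:                 # nothing more can be deduced
            return closed
        closed |= new

print(infer(facts, [rule_author_is_creator]))
```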
Info Agents
Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
Proofs and Answers
The interesting thing about Info Agents is that they will be capable not only of deducing answers from existing information but also of formally testing propositions (represented in some query logic) that are made directly or implied by the user. This test-of-truth feature assumes the use of an ontology language (as a formal logic system) and an ontology where all propositions (or formal statements) that can be made can be computed (i.e., proved true or false) and where all such computations are decidable in finite time. The language may be OWL-DL or any language that, together with the ontology in question, satisfies the completeness and decidability conditions.
Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among the virtual AI-enhanced workforce each having access to different domain specific comprehension space and all communicating with each other to build a collective consciousness.
Questions can be posed to your info agent or agents to find you the nearest restaurant that serves Italian cuisine, which is not possible with the current search engines. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents specialized in different domains of knowledge to produce a collective consciousness that encompasses all human knowledge.
SOFTWARE AGENTS
The real power of Web 3.0 or the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.
Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information.
Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it.
COMPONENTS
XML, XML Schema, RDF, OWL, SPARQL
The semantic web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema and OWL. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
W3C Semantic Web Layer Cake
• XML : provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema : is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF : is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
• RDF Schema : is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized-hierarchies of such properties and classes.
• OWL : adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL : is a protocol and query language for semantic web data sources
RESOURCE DESCRIPTION FRAMEWORK
Introduction
The Resource Description Framework (RDF) is an infrastructure that enables the encoding, exchange and reuse of structured metadata. RDF is an application of XML that imposes needed structural constraints to provide unambiguous methods of expressing semantics. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies designed to encourage the reuse and extension of metadata semantics among disparate information communities. The structural constraints RDF imposes to support the consistent encoding and exchange of standardized metadata provides for the interchangeability of separate packages of metadata defined by different resource description communities.
The World Wide Web affords unprecedented access to globally distributed information. Metadata, or structured data about data, improves discovery of and access to such information. The effective use of metadata among applications, however, requires common conventions about semantics, syntax, and structure. Individual resource description communities define the semantics, or meaning, of metadata that address their particular needs. Syntax, the systematic arrangement of data elements for machine-processing, facilitates the exchange and use of metadata among multiple applications. Structure can be thought of as a formal constraint on the syntax for the consistent representation of semantics.
The Resource Description Framework (RDF), developed under the auspices of the World Wide Web Consortium (W3C), is an infrastructure that enables the encoding, exchange, and reuse of structured metadata. This infrastructure enables metadata interoperability through the design of mechanisms that support common conventions of semantics, syntax, and structure. RDF uses XML (eXtensible Markup Language) as a common syntax for the exchange and processing of metadata. The XML syntax is a subset of the international text processing standard SGML (Standard Generalized Markup Language [SGML]) specifically intended for use on the Web. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures. By exploiting the features of XML, RDF imposes structure that provides for the unambiguous expression of semantics and, as such, enables consistent encoding, exchange, and machine-processing of standardized metadata. RDF supports the use of conventions that will facilitate modular interoperability among separate metadata element sets. These conventions include standard mechanisms for representing semantics that are grounded in a simple, yet powerful, data model discussed below. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies. Vocabularies are the set of properties, or metadata elements, defined by resource description communities. The ability to standardize the declaration of vocabularies is anticipated to encourage the reuse and extension of semantics among disparate information communities.
The RDF Data Model
RDF provides a model for describing resources. Resources have properties (attributes or characteristics). RDF defines a resource as any object that is uniquely identifiable by a Uniform Resource Identifier (URI). The properties associated with resources are identified by property-types, and property-types have corresponding values. Property-types express the relationships of values associated with resources. In RDF, values may be atomic in nature (text strings, numbers, etc.) or other resources, which in turn may have their own properties. A collection of these properties that refers to the same resource is called a description. At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions. The following graphic (Figure 1) illustrates a generic RDF description.
Figure 1
The application and use of the RDF data model can be illustrated by concrete examples. Consider the following statements:
1. "The author of Document 1 is John Smith"
2. "John Smith is the author of Document 1"
To humans, these statements convey the same meaning (that is, John Smith is the author of a particular document). To a machine, however, these are completely different strings. Whereas humans are extremely adept at extracting meaning from differing syntactic constructs, machines remain grossly inept. Using a triadic model of resources, property-types and corresponding values, RDF attempts to provide an unambiguous method of expressing semantics in a machine-readable encoding.
RDF provides a mechanism for associating properties with resources. So, before anything about Document 1 can be said, the data model requires the declaration of a resource representing Document 1. Thus, the data model corresponding to the statement "the author of Document 1 is John Smith" has a single resource Document 1, a property-type of author and a corresponding value of John Smith. To distinguish characteristics of the data model, the RDF Model and Syntax specification represents the relationships among resources, property-types, and values in a directed labeled graph. In this case, resources are identified as nodes, property-types are defined as directed label arcs, and string values are quoted. Given this representation, the data model corresponding to the statement is graphically expressed as (Figure 2):
Figure 2
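A rough code sketch of the same statement, assuming the third-party rdflib package and an invented example namespace, makes the resource/property-type/value triple explicit.

```python
# Sketch only: "the author of Document 1 is John Smith" as one RDF triple.
# Assumes rdflib; the http://example.org/ URIs are invented placeholders.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/terms/")
g = Graph()
g.bind("ex", EX)

document_1 = URIRef("http://example.org/Document-1")    # the resource (node)
g.add((document_1, EX.author, Literal("John Smith")))   # labeled arc -> value

print(g.serialize(format="turtle"))  # rdflib 6+ returns the Turtle as a string
```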
If additional descriptive information regarding the author were desired, e.g., the author's email address and affiliation, an elaboration on the previous example would be required. In this case, descriptive information about John Smith is desired. As was discussed in the first example, before descriptive properties can be expressed about the person John Smith, there needs to be a unique identifiable resource representing him. Given the directed label graph notation in the previous example, the data model corresponding to this description is graphically represented as (Figure 3):
Figure 3
In this case, "John Smith" the string is replaced by a uniquely identified resource denoted by Author_001 with the associated property-types of name, email and affiliation. The use of unique identifiers for resources allows for the unambiguous association of properties. This is an important point, as the person John Smith may be the value of several different property-types. John Smith may be the author of Document 1, but also may be the value of a particular company describing the set of current employees. The unambiguous identification of resources provides for the reuse of explicit, descriptive information.
In the previous example the unique identifiable resource for the author was created, but not for the author's name, email or affiliation. The RDF model allows for the creation of resources at multiple levels. Concerning the representation of personal names, for example, the creation of a resource representing the author's name could have additionally been described using "firstname", "middlename" and "surname" property-types. Clearly, this iterative descriptive process could continue down many levels. What, however, are the practical and logical limits of these iterations?
There is no one right answer to this question. The answer is dependent on the domain requirements. These issues must be addressed and decided upon in the standard practice of individual resource description communities. In short, experience and knowledge of the domain dictate which distinctions should be captured and reflected in the data model.
The RDF data model additionally provides for the description of other descriptions. For instance, often it is important to assess the credibility of a particular description (e.g., "The Library of Congress told us that John Smith is the author of Document 1"). In this case the description tells us something about the statement "John Smith is the author of Document 1", specifically, that the Library of Congress asserts this to be true. Similar constructs are additionally useful for the description of collections of resources. For instance, "John Smith is the author of Documents 1, 2, and 3". While these statements are significantly more complex, the same data model is applicable.
The RDF Syntax
RDF defines a simple, yet powerful model for describing resources. A syntax representing this model is required to store instances of this model into machine-readable files and to communicate these instances among applications. XML is this syntax. RDF imposes formal structure on XML to support the consistent representation of semantics.
RDF provides the ability for resource description communities to define semantics. It is important, however, to disambiguate these semantics among communities. The property-type "author", for example, may have broader or narrower meaning depending on different community needs. As such, it is problematic if multiple communities use the same property-type to mean very different things. To prevent this, RDF uniquely identifies property-types by using the XML namespace mechanism. XML namespaces provide a method for unambiguously identifying the semantics and conventions governing the particular use of property-types by uniquely identifying the governing authority of the vocabulary. For example, the property-type "author" is defined by the Dublin Core Initiative as the "person or organization responsible for the creation of the intellectual content of the resource" and is specified by the Dublin Core CREATOR element. An XML namespace is used to unambiguously identify the schema for the Dublin Core vocabulary by pointing to the definitive Dublin Core resource that defines the corresponding semantics. Additional information on RDF Schemas is discussed later. If the Dublin Core RDF Schema, however, is abbreviated as "DC", the data model representation for this example would be (Figure 4):
Figure 4
This more explicit declaration identifies a resource Document 1 with the semantics of property-type Creator unambiguously defined in the context of DC (the Dublin Core vocabulary). The value of this property-type is John Smith.
The corresponding syntactic way of expressing this statement, using XML namespaces to identify the use of the Dublin Core Schema, is an RDF/XML listing (not reproduced here) in which an RDF:Description of the resource http://uri-of-Document-1 carries a DC:Creator property with the value "John Smith".
In this case, both the RDF and Dublin Core schemas are declared and abbreviated as "RDF" and "DC" respectively. The RDF Schema is declared as a boot-strapping mechanism for the declaration of the vocabulary needed for expressing the data model. The Dublin Core Schema is declared in order to use the vocabulary defined by this community. The URI associated with each namespace declaration references the corresponding schema. The RDF:RDF element (the element RDF in the context of the RDF namespace) is a simple wrapper that marks the boundaries in an XML document where the content is explicitly intended to be mappable into an RDF data model instance. The RDF:Description element (the element Description in the context of the RDF namespace) is correspondingly used to denote or instantiate a resource with the corresponding URI http://uri-of-Document-1. And the DC:Creator element, in the context of the Description, represents a property-type DC:Creator and a value of "John Smith". The syntactic representation is designed to reflect the corresponding data model.
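A hedged approximation of that serialization with current tooling, assuming rdflib and the present-day Dublin Core namespace URI (the document URI is the placeholder used in the text), would be:

```python
# Approximate modern equivalent of the description above, for illustration.
# Assumes rdflib; http://uri-of-Document-1 is the placeholder from the text.
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")
g = Graph()
g.bind("dc", DC)
g.add((URIRef("http://uri-of-Document-1"), DC.creator, Literal("John Smith")))

print(g.serialize(format="xml"))  # an rdf:RDF wrapper with one rdf:Description
```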
In the more advanced example, where additional descriptive information regarding the author is required, similar syntactic constructs are used. In this case, while it may still be desirable to use the Dublin Core CREATOR property-type to represent the person responsible for the creation of the intellectual content, additional property-types "name", "email" and "affiliation" are required. For this case, since the semantics for these elements are not defined in Dublin Core, an additional resource description standard may be utilized. It is feasible to assume the creation of an RDF schema with the semantics similar to the vCard specification designed to automate the exchange of personal information typically found on a traditional business card, could be introduced to describe the author of the document. The data model representation for this example with the corresponding business card schema defined as CARD would be (Figure 5):
The corresponding XML serialization (not reproduced here) declares the RDF, Dublin Core, and "Business Card" schemas, abbreviated as "RDF", "DC" and "CARD" respectively. In this case, the value associated with the property-type DC:Creator is now a resource. While the reference to the resource is an internal identifier, an external URI, for example to a controlled authority of names, could have been used as well. Additionally, in this example, the semantics of the Dublin Core CREATOR element have been refined by the semantics defined by the schema referenced by CARD. The structural constraints RDF imposes to support the consistent encoding and exchange of standardized metadata provide for the interchangeability of separate packages of metadata defined by different resource description communities.
The RDF Schema
RDF Schemas are used to declare vocabularies, the sets of semantic property-types defined by a particular community. RDF schemas define the valid properties in a given RDF description, as well as any characteristics or restrictions of the property-type values themselves. The XML namespace mechanism serves to identify RDF Schemas.
A human and machine-processable description of an RDF schema may be accessed by de-referencing the schema URI. If the schema is machine-processable, it may be possible for an application to learn some of the semantics of the property-types named in the schema. To understand a particular RDF schema is to understand the semantics of each of the properties in that description. RDF schemas are structured based on the RDF data model. Therefore, an application that has no understanding of a particular schema will still be able to parse the description into the property-type and corresponding values and will be able to transport the description intact (e.g., to a cache or to another application).
The exact details of RDF schemas are currently being discussed in the W3C RDF Schema working group. It is anticipated, however, that the ability to formalize human-readable and machine-processable vocabularies will encourage the exchange, use, and extension of metadata vocabularies among disparate information communities. RDF schemas are being designed to provide this type of formalization.
XML
The Extensible Markup Language (XML) is a subset of SGML, completely described in the XML 1.0 specification. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
Introduction
Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents.
XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.
[Definition: A software module called an XML processor is used to read XML documents and provide access to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of another module, called the application.] This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup.
XML SCHEMA
XML Schema, published as a W3C Recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C.
Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent of validation resulting in a collection of information adhering to specific datatypes, which can be useful in the development of XML document processing software, but which has also provoked criticism.
An XML Schema instance is an XML Schema Definition (XSD) and typically has the filename extension ".xsd". The language itself is sometimes informally referenced as XSD. It has been suggested that WXS (for W3C XML Schema) is a more appropriate initialism though this acronym has not been in a widespread use and W3C working group rejected it. XSD is also an initialism for XML Schema Datatypes, the datatype portion of XML Schema.
OWL
Introduction
The Web Ontology Language (OWL) is a language for defining and instantiating Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL is based on earlier languages OIL and DAML+OIL, and is now a W3C recommendation.
OWL is seen as a major technology for the future implementation of a Semantic Web. It is playing an important role in an increasing number and range of applications, and is the focus of research into tools, reasoning techniques, formal foundations and language extensions.
OWL was designed to provide a common way to process the semantic content of web information. It was developed to augment the facilities for expressing semantics (meaning) provided by XML, RDF, and RDF-S. Consequently, it may be considered an evolution of these web languages in terms of its ability to represent machine-interpretable semantic content on the web. Since OWL is based on XML, OWL information can be easily exchanged between different types of computers using different operating systems, and application languages. Because the language is intended to be read by computer applications, it is sometimes not considered to be human-readable, although this may be a tool issue. OWL is being used to create standards that provide a framework for asset management, enterprise integration, and data sharing on the Web.
An extended version of OWL, (sometimes called OWL 1.1, but with no official status) has been proposed which includes increased expressiveness, a simpler data model and serialization, and a collection of well-defined sub-languages each with known computational properties.
OWL currently has three sublanguages (sometimes also referred to as 'species'): OWL Lite, OWL DL, and OWL Full. These three increasingly expressive sublanguages are designed for use by specific communities of implementers and users.
• OWL Lite supports those users primarily needing a classification hierarchy and simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1. It should be simpler to provide tool support for OWL Lite than its more expressive relatives, and OWL Lite provides a quick migration path for thesauri and other taxonomies. OWL Lite also has a lower formal complexity than OWL DL; see the section on OWL Lite in the OWL Reference for further details.
• OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computed) and decidability (all computations will finish in finite time). OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.
Ontology developers adopting OWL should consider which sublanguage best suits their needs. The choice between OWL Lite and OWL DL depends on the extent to which users require the more-expressive constructs provided by OWL DL. The choice between OWL DL and OWL Full mainly depends on the extent to which users require the meta-modeling facilities of RDF Schema (e.g. defining classes of classes, or attaching properties to classes). When using OWL Full as compared to OWL DL, reasoning support is less predictable since complete OWL Full implementations do not currently exist.
OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be a legal OWL Lite or OWL DL document. Because of this, some care has to be taken when a user wants to migrate an RDF document to OWL. When the expressiveness of OWL DL or OWL Lite is deemed appropriate, some precautions have to be taken to ensure that the original RDF document complies with the additional constraints imposed by OWL DL and OWL Lite. Among others, every URI that is used as a class name must be explicitly asserted to be of type owl:Class (and similarly for properties), every individual must be asserted to belong to at least one class (even if only owl:Thing), the URI's used for classes, properties and individuals must be mutually disjoint
1.2 Why OWL?
The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language what can formally describe the meaning of terminology used in Web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema. The OWL Use Cases and Requirements Document provides more details on ontologies, motivates the need for a Web Ontology Language in terms of six use cases, and formulates design goals, requirements and objectives for OWL.
OWL has been designed to meet this need for a Web Ontology Language. OWL is part of the growing stack of W3C recommendations related to the Semantic Web.
SPARQL
SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. It is undergoing standardization by the RDF the World Wide Web Consortium. SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF). There's a big difference between blindly searching the entire Web and querying actual data models which makes it of an advantage.
Most uses of the SPARQL acronym refer to the RDF query language. In this usage, SPARQL is a syntactically-SQL-like language for querying RDF databases. It can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
SPARQL protocol is a means of conveying SPARQL queries from query clients to query processors. It is described abstractly with WSDL 2.0 (Web service description language) which contains one interface, Sparql Query, which in turn contains one operation, query. Sparql Query is the protocol's only interface. It contains one operation, query, which is used to convey a SPARQL query string and, optionally, an RDF dataset description.
Abstract
The World Wide Web has evolved from the original hypertext system envisioned by physicists into a planet-wide medium that has already transformed most of our lives. Web 3.0 refers to a combination of advances that will change the internet radically: the advent of a brave new paradigm in which we will interact and solve problems together through a network of A.I. assistants. It is also known as the semantic web, a term coined by Tim Berners-Lee, the man who invented the (first) World Wide Web. In essence, this new paradigm is a place where machines can read Web pages much as we humans read them, a place where search engines and software agents can better trawl the Net and find what we are looking for. The vision is to extend principles of the Web from documents to data. This extension will allow more of the Web's potential to be fulfilled, in that it will allow data to be shared effectively by wider communities and to be processed automatically by tools as well as manually. A very basic definition of Web 3.0 is that it is an expert system in which there is a software agent for any task assigned by the user; the agent takes an input, runs it through a knowledge database, and then generates an output through inference.
Web 3.0 is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. For this web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning. Artificial-intelligence researchers have studied such systems since long before the Web was developed. Knowledge representation, as this technology is often called, is currently in a state comparable to that of hypertext before the advent of the Web: it is clearly a good idea, and some very nice demonstrations exist, but it has not yet changed the world. It contains the seeds of important applications, but to realize its full potential it must be linked into a single global system.
HOW THE INTERNET WORKS
Introduction
Web 3.0 is essentially the addition of a layer of intelligence to the existing web, and it can be better understood by looking at the underlying implementation of the present-day web. The Internet's workings include a technical design and a management structure. The management structure consists of a generally democratic collection of loosely-coupled organizations and working groups with mostly non-overlapping responsibilities. The technical design is founded on a complex, interlocking set of hierarchical tree-like structures, such as Internet Protocol addresses and domain names, mixed with networked structures like packet switching and routing protocols, all tied together with millions of lines of sophisticated software that continues to get better all the time.
The Internet's architecture is described in its name, a short form of the compound word "inter-networking". This architecture is based on the very specification of the standard TCP/IP protocol, designed to connect any two networks which may be very different in internal hardware, software, and technical design. Once two networks are interconnected, communication with TCP/IP is enabled end-to-end, so that any node on the Internet has the near-magical ability to communicate with any other, no matter where they are. This openness of design has enabled the Internet architecture to grow to a global scale.
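As a minimal sketch of this end-to-end property, the snippet below opens a TCP connection from any Internet node to any other by name and port; the host and request shown are placeholders, not part of the original text.

import socket

# Open a TCP/IP connection across whatever networks lie between us and the remote host.
with socket.create_connection(("example.com", 80), timeout=5) as conn:
    conn.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")  # a tiny HTTP request
    print(conn.recv(200))            # first bytes of the server's reply, delivered end-to-end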
HOW YOUR URL GETS ROUTED
The Domain Name System (DNS) as a whole consists of a network of servers that map Internet domain names like www.yahoo.com to IP addresses. The DNS enables domain names to stay constant while the underlying network topology and IP addresses change. This provides stability at the application level while enabling network applications to find and communicate with each other using the Internet protocol no matter how the underlying physical network changes.
Internet domain names come in four main types -- top-level domains, second-level domains, third-level domains, and country domains. Internet domain names are the alphanumeric identifiers we use to refer to hosts on the Internet, like "www.getme_donnel.com".
Internet domain names are organized by their levels, with the higher levels on the right. For example, for the domain "www.getme_donnel.com" the top-level domain is "com", the second-level domain is "getme_donnel", and the third-level domain is "www".
Top-level Internet domains like ".com" are shared by all the organizations in the domain. Second-level domain names like "yahoo.com" and "livinginternet.com" are registered by individuals and organizations. Second-level domains are the addresses commonly used to host Internet applications like web hosting and email addressing.
Third-level Internet domain names are created by those that own second-level domains. Third-level domains can be used to set up individual domains for specific purposes, such as a domain for web access and one for mail, or a separate site for a special purpose:
• www.livinginternet.com
• mail.livinginternet.com
• rareorchids.livinginternet.com
Each country in the world has its own top-level Internet domain with a unique alphabetic designation, for example the "in" in www.google.co.in.
The Domain Name System (DNS) servers distribute the job of mapping domain names to IP addresses among servers allocated to each domain.
Each second-level domain must have at least one domain name server responsible for maintaining information about that domain and all subsidiary domains, and for responding to queries about those domains from other computers on the Internet. For example, management of domain name information and queries for the LivingInternet.com domain is handled by a specific DNS server that takes care of the required load. This distributed architecture was designed to enable the Internet to grow, so that as the number of domains grows, the number of DNS servers can grow to keep pace with the load.
Today, everyone who registers a second-level domain name must at the same time designate two DNS servers to manage queries and return the current IP address for addresses in that domain. The primary domain name server is always consulted first, and the secondary domain name server is queried if the primary doesn't answer, providing a backup and important support to overall Internet reliability.
The application that underlies almost all DNS server software on the Internet is an open source program called BIND, currently maintained by the Internet Systems Consortium. When your computer was added to the Internet, one of the initial setup tasks was to specify a default domain name server, usually maintained by your local Internet Service Provider, and almost certainly a variant of the BIND server software.
When your computer tries to access a domain like "www.livinginternet.com", the domain name system works like this:
• Your computer asks your default DNS server if it knows the IP address for www.livinginternet.com. If the DNS server has been asked that question recently, then it will have the answer stored in its local cache, and can answer immediately.
• Otherwise, your DNS server queries the central zone files for the address of the primary domain name server for livinginternet.com, and is answered with something like "ns1.livinginternet.com".
• Your DNS server will ask the livinginternet.com DNS server for the IP address of www.livinginternet.com, which will then look up the answer and send it back.
• Your DNS server will store the IP address returned in its local cache, and make the address available to your computer.
• Your computer then contacts www.livinginternet.com with the standard Internet routing protocols by using the returned IP address.
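The lookup sequence above is what happens behind a single call in application code. A minimal sketch (Python standard library only; the operating system's resolver and your default DNS server do the recursion and caching on your behalf):

import socket

name = "www.livinginternet.com"
address = socket.gethostbyname(name)   # triggers the query/caching chain described in the steps above
print(name, "resolves to", address)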
The IP address assigned to a computer may change frequently because of physical moves or network reconfigurations. The major advantage of the network of DNS servers is that domain names stay the same even when IP addresses change, and so the domain name servers can transparently take care of the mapping.
WEB 2.0
The phrase Web 2.0 refers to a perceived second-generation of web-based communities and hosted services — such as social-networking sites, wikis and folksonomies — which aim to facilitate collaboration and sharing between users. It became popular following the first O'Reilly Media Web 2.0 conference in 2004, and has since become widely adopted.
Although the term suggests a new version of the World Wide Web, it does not refer to an update to Web technical specifications, but to changes in the ways software developers and end-users use the web as a platform. According to Tim O'Reilly, "Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform."
Some technology experts, notably Tim Berners-Lee, have questioned whether one can use the term in a meaningful way, since many of the technology components of "Web 2.0" have existed since the early days of the Web.
Characteristics of "Web 2.0"
While interested parties continue to debate the definition of a Web 2.0 application, a Web 2.0 website may exhibit some basic common characteristics. These might include:
• "Network as platform" — delivering (and allowing users to use) applications entirely through a browser.
• Users owning the data on a site and exercising control over that data.
• Architecture of participation that encourages users to add value to the application as they use it. This stands in sharp contrast to hierarchical access-control in applications, in which systems categorize users into roles with varying degrees of functionality.
• A rich, interactive, user-friendly interface based on Ajax or similar frameworks.
• Some social-networking aspects.
The concept of Web-as-participation-platform captures many of these characteristics. Bart Decrem, a founder and former CEO of Flock, calls Web 2.0 the "participatory Web" and regards the Web-as-information-source as Web 1.0.
Relationship of Web 3.0 to the Hypertext Web
Markup
Many files on a typical computer can be loosely divided into documents and data. Documents, like mail messages, reports and brochures, are read by humans. Data, like calendars, address books, playlists and spreadsheets, are presented using an application program which lets them be viewed, searched and combined in many ways.
Currently, the World Wide Web is based mainly on documents written in Hypertext Markup Language (HTML), a markup convention that is used for coding a body of text interspersed with multimedia objects such as images and interactive forms. The semantic web involves publishing the data in a language designed specifically for data, the Resource Description Framework (RDF), so that it can be manipulated and combined just as data files on a local computer can be.
The HTML language describes documents and the links between them. RDF, by contrast, describes arbitrary things such as people, meetings, and airplane parts. For example, with HTML and a tool to render it (perhaps Web browser software, perhaps another user agent), one can create and present a page that lists items for sale. The HTML of this catalog page can make simple, document-level assertions such as "this document's title is 'Widget Superstore'". But there is no capability within the HTML itself to assert unambiguously that, for example, item number X586172 is an Acme Gizmo with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be positioned near "Acme Gizmo" and "€ 199", etc. There is no way to say "this is a catalog" or even to establish that "Acme Gizmo" is a kind of title or that "€ 199" is a price. There is also no way to express that these pieces of information are bound together in describing a discrete item, distinct from other items perhaps listed on the page.
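A hedged sketch of the difference (rdflib assumed; the catalog vocabulary and URIs below are illustrative placeholders, not a real schema): the same item can be stated in RDF as explicit, machine-readable assertions rather than spans of nearby text.

from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, XSD

SHOP = Namespace("http://example.org/shop/")          # hypothetical catalog vocabulary
item = URIRef("http://example.org/catalog/X586172")   # placeholder URI for the item

g = Graph()
g.add((item, RDF.type, SHOP.ConsumerProduct))         # "this is a consumer product"
g.add((item, SHOP.title, Literal("Acme Gizmo")))      # "Acme Gizmo" is a title, not just nearby text
g.add((item, SHOP.retailPrice, Literal("199", datatype=XSD.decimal)))  # the price, typed as a number

for subject, predicate, value in g:
    print(subject, predicate, value)                  # each assertion is an unambiguous triple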
Descriptive and extensible
The semantic web addresses this shortcoming, using the descriptive technologies Resource Description Framework (RDF) and Web Ontology Language (OWL), and the data-centric, customizable Extensible Markup Language (XML). These technologies are combined in order to provide descriptions that supplement or replace the content of Web documents. Thus, content may manifest as descriptive data stored in Web-accessible databases, or as markup within documents (particularly, in Extensible HTML (XHTML) interspersed with XML, or, more often, purely in XML, with layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the structure of the knowledge we have about that content. In this way, a machine can process knowledge itself, instead of text, using processes similar to human deductive reasoning and inference, thereby obtaining more meaningful results and facilitating automated information gathering and research by computer.
TRANSFORMING THE WEB INTO A DATABASE
The first step towards a "Web 3.0" is the emergence of "The Data Web" as structured data records are published to the Web in reusable and remotely queryable formats, such as XML, RDF and microformats. The recent growth of SPARQL technology provides a standardized query language and API for searching across distributed RDF databases on the Web. The Data Web enables a new level of data integration and application interoperability, making data as openly accessible and linkable as Web pages. The Data Web is the first step on the path towards the full Semantic Web. In the Data Web phase, the focus is principally on making structured data available using RDF. The full Semantic Web stage will widen the scope such that both structured data and even what is traditionally thought of as unstructured or semi-structured content (such as Web pages, documents, etc.) will be widely available in RDF and OWL semantic formats.
AN EVOLUTIONARY PATH TO ARTIFICIAL INTELLIGENCE
Web 3.0 has also been used to describe an evolutionary path for the Web that leads to artificial intelligence that can reason about the Web in a quasi-human fashion. Some parts of this new web are based on results of Artificial Intelligence research, like knowledge representation (e.g., for ontologies), model theory (e.g., for the precise semantics of RDF and RDF Schemas), or various types of logic (e.g., for rules). Even though some regard this as an unattainable vision, companies such as IBM and Google are implementing new data-mining technologies that are yielding surprising results, for example in making predictions about the stock market. There is also debate over whether the driving force behind Web 3.0 will be intelligent systems, or whether intelligence will emerge in a more organic fashion, from systems of intelligent people, such as via collaborative filtering services like del.icio.us, Flickr and Digg that extract meaning and order from the existing Web and how people interact with it.
BASIC WEB 3.0 CONCEPTS
Knowledge domains
A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.
Information vs. Knowledge
To a machine, knowledge is comprehended information (that is, new information produced through the application of deductive reasoning to existing information). To a machine, information is only data until it is processed and comprehended.
Ontologies
Ontologies are neither knowledge nor information; they are meta-information. In other words, ontologies are information about information. In the context of Web 3.0, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the Info Agents to reason new conclusions from existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of a logical theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query without being driven by the same software.
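A minimal, self-contained sketch of this idea: a tiny set of axioms plus one inference rule lets software derive statements that were never asserted directly. The "ontology" and the single subclass rule below are illustrative assumptions, not a real ontology language.

# Axioms: a class hierarchy and one known fact about an individual.
subclass_of = {
    "ItalianRestaurant": "Restaurant",
    "Restaurant": "Business",
}
instance_of = {"LuigisPlace": "ItalianRestaurant"}

def classes_of(individual):
    """Infer every class an individual belongs to by applying the subclass rule repeatedly."""
    derived = []
    cls = instance_of.get(individual)
    while cls is not None:
        derived.append(cls)
        cls = subclass_of.get(cls)   # rule: an instance of a subclass is an instance of its superclass
    return derived

print(classes_of("LuigisPlace"))     # ['ItalianRestaurant', 'Restaurant', 'Business']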
Inference Engines
In the context of Web 3.0, inference engines will combine the latest innovations from the artificial intelligence (AI) field with domain-specific ontologies, domain inference rules, and query structures to enable deductive reasoning at the machine level.
Info Agents
Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
Proofs and Answers
The interesting thing about Info Agents is that they will be capable not only of deducing answers from existing information but also of formally testing propositions (represented in some query logic) that are made directly or implied by the user. This test-of-truth feature assumes the use of an ontology language (as a formal logic system) and an ontology where all propositions (or formal statements) that can be made can be computed (i.e. proved true or false) and where all such computations are decidable in finite time. The language may be OWL DL or any language that, together with the ontology in question, satisfies the completeness and decidability conditions.
Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or many info agents) among the virtual AI-enhanced workforce, each having access to a different domain-specific comprehension space and all communicating with each other to build a collective consciousness.
Questions can be posed to your info agent or agents to find you the nearest restaurant that serves Italian cuisine, which is not possible with the current search engines. But that is just a very simple example of the deductive reasoning machines will be able to perform on information they have.
Far more awesome implications can be seen when you consider that every area of human knowledge will be automatically within the comprehension space of your info agents. That is because each info agent can communicate with other info agents who are specialized in different domains of knowledge to produce a collective consciousness that encompasses all human knowledge.
SOFTWARE AGENTS
The real power of Web 3.0 or the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available. The Semantic Web promotes this synergy: even agents that were not expressly designed to work together can transfer data among themselves when the data come with semantics.
Another vital feature will be digital signatures, which are encrypted blocks of data that computers and agents can use to verify that the attached information has been provided by a specific trusted source. Agents should be skeptical of assertions that they read on the Semantic Web until they have checked the sources of information.
Many automated Web-based services already exist without semantics, but other programs such as agents have no way to locate one that will perform a specific function. This process, called service discovery, can happen only when there is a common language to describe a service in a way that lets other agents "understand" both the function offered and how to take advantage of it.
COMPONENTS
XML, XML Schema, RDF, OWL, SPARQL
The semantic web comprises the standards and tools of XML, XML Schema, RDF, RDF Schema and OWL. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
W3C Semantic Web Layer Cake
• XML: provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
• XML Schema: is a language for providing and restricting the structure and content of elements contained within XML documents.
• RDF: is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
• RDF Schema: is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes.
• OWL: adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
• SPARQL: is a protocol and query language for semantic web data sources.
RESOURCE DESCRIPTION FRAMEWORK
Introduction
The Resource Description Framework (RDF) is an infrastructure that enables the encoding, exchange and reuse of structured metadata. RDF is an application of XML that imposes needed structural constraints to provide unambiguous methods of expressing semantics. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies designed to encourage the reuse and extension of metadata semantics among disparate information communities. The structural constraints RDF imposes to support the consistent encoding and exchange of standardized metadata provide for the interchangeability of separate packages of metadata defined by different resource description communities.
The World Wide Web affords unprecedented access to globally distributed information. Metadata, or structured data about data, improves discovery of and access to such information. The effective use of metadata among applications, however, requires common conventions about semantics, syntax, and structure. Individual resource description communities define the semantics, or meaning, of metadata that address their particular needs. Syntax, the systematic arrangement of data elements for machine-processing, facilitates the exchange and use of metadata among multiple applications. Structure can be thought of as a formal constraint on the syntax for the consistent representation of semantics.
The Resource Description Framework (RDF), developed under the auspices of the World Wide Web Consortium (W3C), is an infrastructure that enables the encoding, exchange, and reuse of structured metadata. This infrastructure enables metadata interoperability through the design of mechanisms that support common conventions of semantics, syntax, and structure. RDF uses XML (eXtensible Markup Language) as a common syntax for the exchange and processing of metadata. The XML syntax is a subset of the international text processing standard SGML (Standard Generalized Markup Language [SGML]) specifically intended for use on the Web. The XML syntax provides vendor independence, user extensibility, validation, human readability, and the ability to represent complex structures. By exploiting the features of XML, RDF imposes structure that provides for the unambiguous expression of semantics and, as such, enables consistent encoding, exchange, and machine-processing of standardized metadata. RDF supports the use of conventions that will facilitate modular interoperability among separate metadata element sets. These conventions include standard mechanisms for representing semantics that are grounded in a simple, yet powerful, data model discussed below. RDF additionally provides a means for publishing both human-readable and machine-processable vocabularies. Vocabularies are the set of properties, or metadata elements, defined by resource description communities. The ability to standardize the declaration of vocabularies is anticipated to encourage the reuse and extension of semantics among disparate information communities.
The RDF Data Model
RDF provides a model for describing resources. Resources have properties (attributes or characteristics). RDF defines a resource as any object that is uniquely identifiable by a Uniform Resource Identifier (URI). The properties associated with resources are identified by property-types, and property-types have corresponding values. Property-types express the relationships of values associated with resources. In RDF, values may be atomic in nature (text strings, numbers, etc.) or other resources, which in turn may have their own properties. A collection of these properties that refers to the same resource is called a description. At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions. The following graphic (Figure 1) illustrates a generic RDF description.
Figure 1. A generic RDF description
The application and use of the RDF data model can be illustrated by concrete examples. Consider the following statements:
1. "The author of Document 1 is John Smith"
2. "John Smith is the author of Document 1"
To humans, these statements convey the same meaning (that is, John Smith is the author of a particular document). To a machine, however, these are completely different strings. Whereas humans are extremely adept at extracting meaning from differing syntactic constructs, machines remain grossly inept. Using a triadic model of resources, property-types and corresponding values, RDF attempts to provide an unambiguous method of expressing semantics in a machine-readable encoding.
RDF provides a mechanism for associating properties with resources. So, before anything about Document 1 can be said, the data model requires the declaration of a resource representing Document 1. Thus, the data model corresponding to the statement "the author of Document 1 is John Smith" has a single resource Document 1, a property-type of author and a corresponding value of John Smith. To distinguish characteristics of the data model, the RDF Model and Syntax specification represents the relationships among resources, property-types, and values in a directed labeled graph. In this case, resources are identified as nodes, property-types are defined as directed labeled arcs, and string values are quoted. Given this representation, the data model corresponding to the statement is graphically expressed as (Figure 2):
Figure 2. Data model for "the author of Document 1 is John Smith"
If additional descriptive information regarding the author were desired, e.g., the author's email address and affiliation, an elaboration on the previous example would be required. In this case, descriptive information about John Smith is desired. As was discussed in the first example, before descriptive properties can be expressed about the person John Smith, there needs to be a uniquely identifiable resource representing him. Given the directed labeled graph notation of the previous example, the data model corresponding to this description is graphically represented as (Figure 3):
Figure 3. Data model with the author represented as a resource (Author_001)
In this case, "John Smith" the string is replaced by a uniquely identified resource denoted by Author_001 with the associated property-types of name, email and affiliation. The use of unique identifiers for resources allows for the unambiguous association of properties. This is an important point, as the person John Smith may be the value of several different property-types. John Smith may be the author of Document 1, but also may be the value of a particular company describing the set of current employees. The unambiguous identification of resources provides for the reuse of explicit, descriptive information.
In the previous example the unique identifiable resource for the author was created, but not for the author's name, email or affiliation. The RDF model allows for the creation of resources at multiple levels. Concerning the representation of personal names, for example, the creation of a resource representing the author's name could have additionally been described using "firstname", "middlename" and "surname" property-types. Clearly, this iterative descriptive process could continue down many levels. What, however, are the practical and logical limits of these iterations?
There is no one right answer to this question. The answer is dependent on the domain requirements. These issues must be addressed and decided upon in the standard practice of individual resource description communities. In short, experience and knowledge of the domain dictate which distinctions should be captured and reflected in the data model.
The RDF data model additionally provides for the description of other descriptions. For instance, often it is important to assess the credibility of a particular description (e.g., "The Library of Congress told us that John Smith is the author of Document 1"). In this case the description tells us something about the statement "John Smith is the author of Document 1", specifically, that the Library of Congress asserts this to be true. Similar constructs are additionally useful for the description of collections of resources. For instance, "John Smith is the author of Documents 1, 2, and 3". While these statements are significantly more complex, the same data model is applicable.
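A sketch of such a description of a description (rdflib assumed; the URIs and the assertedBy property are illustrative placeholders): the statement "John Smith is the author of Document 1" is itself made the subject of further statements, recording who asserts it.

from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF

EX = Namespace("http://example.org/terms/")
DC = Namespace("http://purl.org/dc/elements/1.1/")
doc1 = URIRef("http://example.org/Document_1")

g = Graph()
stmt = BNode()                                     # a node standing for the statement itself
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, doc1))
g.add((stmt, RDF.predicate, DC.creator))
g.add((stmt, RDF.object, Literal("John Smith")))
g.add((stmt, EX.assertedBy, URIRef("http://example.org/LibraryOfCongress")))  # hypothetical property
print(g.serialize(format="turtle"))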
The RDF Syntax
RDF defines a simple, yet powerful model for describing resources. A syntax representing this model is required to store instances of this model into machine-readable files and to communicate these instances among applications. XML is this syntax. RDF imposes formal structure on XML to support the consistent representation of semantics.
RDF provides the ability for resource description communities to define semantics. It is important, however, to disambiguate these semantics among communities. The property-type "author", for example, may have broader or narrower meaning depending on different community needs. As such, it is problematic if multiple communities use the same property-type to mean very different things. To prevent this, RDF uniquely identifies property-types by using the XML namespace mechanism. XML namespaces provide a method for unambiguously identifying the semantics and conventions governing the particular use of property-types by uniquely identifying the governing authority of the vocabulary. For example, the property-type "author" is defined by the Dublin Core Initiative as the "person or organization responsible for the creation of the intellectual content of the resource" and is specified by the Dublin Core CREATOR element. An XML namespace is used to unambiguously identify the schema for the Dublin Core vocabulary by pointing to the definitive Dublin Core resource that defines the corresponding semantics. Additional information on RDF Schemas is discussed later. If the Dublin Core RDF Schema, however, is abbreviated as "DC", the data model representation for this example would be (Figure 4):
Figure 4. Data model using the Dublin Core CREATOR property-type (DC vocabulary)
This more explicit declaration identifies a resource Document 1 with the semantics of property-type Creator unambiguously defined in the context of DC (the Dublin Core vocabulary). The value of this property-type is John Smith.
The corresponding syntactic way of expressing this statement uses XML namespaces to identify the use of the Dublin Core Schema.
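A hedged sketch of such a serialization (rdflib is assumed as the toolkit and the Document 1 URI is a placeholder; the prefix capitalization, "DC" here versus "dc", is immaterial):

from rdflib import Graph, Namespace, URIRef, Literal

DC = Namespace("http://purl.org/dc/elements/1.1/")
g = Graph()
g.bind("DC", DC, replace=True)      # declare and abbreviate the Dublin Core schema as "DC"
g.add((URIRef("http://example.org/Document_1"), DC.Creator, Literal("John Smith")))
print(g.serialize(format="xml"))    # rdflib 6+ returns the RDF/XML as a string

# Expected shape of the output:
# <rdf:RDF xmlns:DC="http://purl.org/dc/elements/1.1/"
#          xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
#   <rdf:Description rdf:about="http://example.org/Document_1">
#     <DC:Creator>John Smith</DC:Creator>
#   </rdf:Description>
# </rdf:RDF>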
In this case, both the RDF and Dublin Core schemas are declared and abbreviated as "RDF" and "DC" respectively. The RDF Schema is declared as a boot-strapping mechanism for the declaration of the necessary vocabulary needed for expressing the data model. The Dublin Core Schema is declared in order to utilize the vocabulary defined by this community. The URI associated with each namespace declaration references the corresponding schema. The RDF element itself serves as a wrapper that marks the portion of the XML document whose content is to be mapped into an RDF data model instance.
In the more advanced example, where additional descriptive information regarding the author is required, similar syntactic constructs are used. In this case, while it may still be desirable to use the Dublin Core CREATOR property-type to represent the person responsible for the creation of the intellectual content, additional property-types "name", "email" and "affiliation" are required. Since the semantics for these elements are not defined in Dublin Core, an additional resource description standard may be utilized. It is feasible to assume that an RDF schema with semantics similar to the vCard specification, which is designed to automate the exchange of the personal information typically found on a traditional business card, could be introduced to describe the author of the document. The data model representation for this example, with the corresponding business card schema defined as CARD, would be (Figure 5):
in which the RDF, Dublin Core, and the "Business Card" schemas are declared and abbreviated as "RDF", "DC" and "CARD" respectively. In this case, the value associated with the property-type DC:Creator is now a resource. While the reference to the resource is an internal identifier, an external URI, for example to a controlled authority of names, could have been used as well. Additionally, in this example, the semantics of the Dublin Core CREATOR element have been refined by the semantics defined by the schema referenced by CARD. The structural constraints RDF imposes to support the consistent encoding and exchange of standardized metadata provide for the interchangeability of separate packages of metadata defined by different resource description communities.
The RDF Schema
RDF Schemas are used to declare vocabularies, the sets of semantic property-types defined by a particular community. RDF schemas define the valid properties in a given RDF description, as well as any characteristics or restrictions of the property-type values themselves. The XML namespace mechanism serves to identify RDF Schemas.
A human and machine-processable description of an RDF schema may be accessed by de-referencing the schema URI. If the schema is machine-processable, it may be possible for an application to learn some of the semantics of the property-types named in the schema. To understand a particular RDF schema is to understand the semantics of each of the properties in that description. RDF schemas are structured based on the RDF data model. Therefore, an application that has no understanding of a particular schema will still be able to parse the description into the property-type and corresponding values and will be able to transport the description intact (e.g., to a cache or to another application).
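As a hedged sketch of what such a vocabulary declaration can look like in practice (rdflib is assumed, and the CARD namespace URI is a placeholder echoing the business-card example above), a community might declare a class, a property-type, and the restrictions on its use:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

CARD = Namespace("http://example.org/card#")      # hypothetical "business card" vocabulary
g = Graph()
g.add((CARD.Person, RDF.type, RDFS.Class))        # declare a class
g.add((CARD.name, RDF.type, RDF.Property))        # declare a property-type
g.add((CARD.name, RDFS.domain, CARD.Person))      # restriction: "name" describes Persons
g.add((CARD.name, RDFS.range, RDFS.Literal))      # restriction: its value is a literal
g.add((CARD.name, RDFS.comment, Literal("The full name of a person.")))
print(g.serialize(format="turtle"))               # a human- and machine-processable schema description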
The exact details of RDF schemas are currently being discussed in the W3C RDF Schema working group. It is anticipated, however, that the ability to formalize human-readable and machine-processable vocabularies will encourage the exchange, use, and extension of metadata vocabularies among disparate information communities. RDF schemas are being designed to provide this type of formalization.
XML
The Extensible Markup Language (XML) is a subset of SGML. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
Introduction
Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language. By construction, XML documents are conforming SGML documents.
XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.
A software module called an XML processor is used to read XML documents and provide access to their content and structure. An XML processor is assumed to be doing its work on behalf of another module, called the application. The XML specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup.
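A minimal sketch of these pieces in use (Python's standard ElementTree module; the document content is an illustrative placeholder): a document entity containing a declaration, a comment, elements and character data, read by an XML processor on behalf of an application.

import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!-- a comment: markup describing the document, not character data -->
<catalog>
  <item id="X586172">Acme Gizmo</item>
</catalog>"""

root = ET.fromstring(document)         # the XML processor parses the document entity
for item in root.iter("item"):
    print(item.get("id"), item.text)   # the application receives structure and content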
XML SCHEMA
XML Schema, published as a W3C Recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C.
Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent that validation should produce a collection of information adhering to specific datatypes, which can be useful in the development of XML document processing software but which has also provoked criticism.
An XML Schema instance is an XML Schema Definition (XSD) and typically has the filename extension ".xsd". The language itself is sometimes informally referred to as XSD. It has been suggested that WXS (for W3C XML Schema) is a more appropriate initialism, though this acronym has not been in widespread use and the W3C working group rejected it. XSD is also an initialism for XML Schema Datatypes, the datatype portion of XML Schema.
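A sketch of XML Schema in use (the third-party lxml library and the element names below are assumptions): a small XSD constrains a document to price elements typed as xs:decimal, so validation is datatype-aware rather than a bare well-formedness check.

from lxml import etree

XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="priceList">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="price" type="xs:decimal" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD_TEXT))
doc = etree.fromstring("<priceList><price>199.00</price></priceList>")
print(schema.validate(doc))   # True: the structure conforms and "199.00" is a valid xs:decimal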
OWL
Introduction
The Web Ontology Language (OWL) is a language for defining and instantiating Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL is based on earlier languages OIL and DAML+OIL, and is now a W3C recommendation.
OWL is seen as a major technology for the future implementation of a Semantic Web. It is playing an important role in an increasing number and range of applications, and is the focus of research into tools, reasoning techniques, formal foundations and language extensions.
OWL was designed to provide a common way to process the semantic content of web information. It was developed to augment the facilities for expressing semantics (meaning) provided by XML, RDF, and RDF-S. Consequently, it may be considered an evolution of these web languages in terms of its ability to represent machine-interpretable semantic content on the web. Since OWL is based on XML, OWL information can easily be exchanged between different types of computers using different operating systems and application languages. Because the language is intended to be read by computer applications, it is sometimes not considered to be human-readable, although this may be a tool issue. OWL is being used to create standards that provide a framework for asset management, enterprise integration, and data sharing on the Web.
An extended version of OWL (sometimes called OWL 1.1, but with no official status) has been proposed which includes increased expressiveness, a simpler data model and serialization, and a collection of well-defined sub-languages, each with known computational properties.
OWL currently has three sublanguages (sometimes also referred to as 'species'): OWL Lite, OWL DL, and OWL Full. These three increasingly expressive sublanguages are designed for use by specific communities of implementers and users.
• OWL Lite supports those users primarily needing a classification hierarchy and simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1. It should be simpler to provide tool support for OWL Lite than its more expressive relatives, and OWL Lite provides a quick migration path for thesauri and other taxonomies. OWL Lite also has a lower formal complexity than OWL DL; see the section on OWL Lite in the OWL Reference for further details.
• OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computed) and decidability (all computations will finish in finite time). OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.
Ontology developers adopting OWL should consider which sublanguage best suits their needs. The choice between OWL Lite and OWL DL depends on the extent to which users require the more-expressive constructs provided by OWL DL. The choice between OWL DL and OWL Full mainly depends on the extent to which users require the meta-modeling facilities of RDF Schema (e.g. defining classes of classes, or attaching properties to classes). When using OWL Full as compared to OWL DL, reasoning support is less predictable since complete OWL Full implementations do not currently exist.
OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Every OWL (Lite, DL, Full) document is an RDF document, and every RDF document is an OWL Full document, but only some RDF documents will be legal OWL Lite or OWL DL documents. Because of this, some care has to be taken when a user wants to migrate an RDF document to OWL. When the expressiveness of OWL DL or OWL Lite is deemed appropriate, some precautions have to be taken to ensure that the original RDF document complies with the additional constraints imposed by OWL DL and OWL Lite. Among others, every URI that is used as a class name must be explicitly asserted to be of type owl:Class (and similarly for properties), every individual must be asserted to belong to at least one class (even if only owl:Thing), and the URIs used for classes, properties and individuals must be mutually disjoint.
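A hedged sketch of the extra vocabulary OWL layers on top of RDF Schema (rdflib assumed; the class and property URIs are illustrative): explicit owl:Class typing, a disjointness axiom, and a cardinality restriction, all stated as ordinary RDF triples.

from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

EX = Namespace("http://example.org/ontology#")
g = Graph()
g.add((EX.Gizmo, RDF.type, OWL.Class))             # every class name explicitly typed as owl:Class
g.add((EX.Widget, RDF.type, OWL.Class))
g.add((EX.Gizmo, OWL.disjointWith, EX.Widget))     # relation between classes: disjointness

restriction = BNode()                              # cardinality: exactly one serial number per Gizmo
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.serialNumber))
g.add((restriction, OWL.cardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((EX.Gizmo, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))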
Why OWL?
The Semantic Web is a vision for the future of the Web in which information is given explicit meaning, making it easier for machines to automatically process and integrate information available on the Web. The Semantic Web will build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. The first level above RDF required for the Semantic Web is an ontology language that can formally describe the meaning of terminology used in Web documents. If machines are expected to perform useful reasoning tasks on these documents, the language must go beyond the basic semantics of RDF Schema. The OWL Use Cases and Requirements Document provides more details on ontologies, motivates the need for a Web Ontology Language in terms of six use cases, and formulates design goals, requirements and objectives for OWL.
OWL has been designed to meet this need for a Web Ontology Language. OWL is part of the growing stack of W3C recommendations related to the Semantic Web.
SPARQL
SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. It is undergoing standardization by the RDF the World Wide Web Consortium. SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF). There's a big difference between blindly searching the entire Web and querying actual data models which makes it of an advantage.
Most uses of the SPARQL acronym refer to the RDF query language. In this usage, SPARQL is a syntactically SQL-like language for querying RDF databases. It can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware.
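A small sketch of a SPARQL query in action (rdflib assumed; the data and URIs are placeholders): find the creator of every document in a graph, exactly as one would regardless of whether the RDF is stored natively or exposed by middleware.

from rdflib import Graph, Namespace, URIRef, Literal

DC = Namespace("http://purl.org/dc/elements/1.1/")
g = Graph()
g.add((URIRef("http://example.org/Document_1"), DC.creator, Literal("John Smith")))

query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?doc ?creator
WHERE { ?doc dc:creator ?creator . }
"""
for row in g.query(query):
    print(row.doc, row.creator)   # http://example.org/Document_1  John Smith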
The SPARQL protocol is a means of conveying SPARQL queries from query clients to query processors. It is described abstractly with WSDL 2.0 (Web Services Description Language) and contains a single interface, SparqlQuery, which in turn contains a single operation, query, used to convey a SPARQL query string and, optionally, an RDF dataset description.
Friday, October 12, 2007
Free Space Optics
1. INTRODUCTION.
A fiber optic communication link uses light sources and detectors to send and receive information through a fiber optic cable. Similarly, FSO uses light sources and detectors to send and receive information, but through the atmosphere instead of a cable. The motivation for FSO is to eliminate the cost, time, and effort of installing fiber optic cable, yet retain the benefit of high data rates (up to 1 Gb/s and beyond) for transmission of voice, data, images, and video. However, swapping light propagation through a precisely manufactured dielectric waveguide for propagation through the atmosphere imposes significant penalties on performance. Specifically, the effective distance of FSO links is limited; depending on atmospheric conditions the maximum range is 2-3 km, but 200-500 meters is typical to meet telco grades of availability. Thus, at present, FSO systems are used primarily in last-mile applications to connect end users to a broadband network backbone as shown in Figure 1. Although FSO equipment is undergoing continuous development, the emphasis is on improving its application to local area networks (LAN) and, in some cases, MANs (e.g., to close a short gap in a ring network), but not to long-haul relay systems. The design goal of a long-haul transmission system is to maximize the separation of relays in spanning distances between cities and countries. For that purpose, FSO is uneconomical compared to fiber optic or microwave radio systems.
Figure 1. Example of End-user Access to Backbone Network using FSO System
2. HISTORY
The engineering maturity of Free Space Optics (FSO) is often underestimated, due to a misunderstanding of how long Free Space Optics (FSO) systems have been under development. Historically, Free Space Optics (FSO), or optical wireless communications, was first demonstrated by Alexander Graham Bell in the late nineteenth century (prior to his demonstration of the telephone!). Bell's Free Space Optics (FSO) experiment converted voice sounds into telephone signals and transmitted them between receivers through free air space along a beam of light for a distance of some 600 feet. Calling his experimental device the "photophone," Bell considered this optical technology, and not the telephone, his preeminent invention, because it did not require wires for transmission. Although Bell's photophone never became a commercial reality, it demonstrated the basic principle of optical communications. In the decades that followed, much of the subsequent development took place in the aerospace and defense sector; by addressing the principal engineering challenges of Free Space Optics (FSO), this activity established a strong foundation upon which today's commercial laser-based Free Space Optics (FSO) systems are based.
3. OVERVIEW
Optical wireless communication has emerged as a viable technology for next generation indoor and outdoor broadband wireless applications. Applications range from short-range wireless communication links providing network access to portable computers, to last-mile links bridging gaps between end users and existing fiber optic communications backbones, and even laser communications in outer-space links. Indoor optical wireless communication is also called wireless infrared communication, while outdoor optical wireless communication is commonly known as free space optical (FSO) communication. In applying wireless infrared communication, non-directed links, which do not require precise alignment between transmitter and receiver, are desirable. They can be categorized as either line-of-sight (LOS) or diffuse links. LOS links require an unobstructed path for reliable communication, whereas diffuse links rely on multiple optical paths from surface reflections. FSO communication, on the other hand, usually involves directed LOS, point-to-point laser links from transmitter to receiver through the atmosphere. FSO communication over distances of a few kilometers has been demonstrated at multi-Gbps data rates. FSO technology offers the potential of broadband communication capacity using unlicensed optical wavelengths. However, inhomogeneities in the temperature and pressure of the atmosphere lead to refractive index variations along the transmission path. These refractive index variations lead to spatial and temporal variations in the optical intensity incident on a receiver, resulting in fading. In FSO communication, faded links caused by such atmospheric effects can cause performance degradation manifested by increased bit error rate (BER) and transmission delays. FSO technology has also emerged as a key technology for the development of rapidly deployable, secure communication and surveillance systems, which can cooperate with other technologies to provide a robust, advanced sensor communication network. However, the LOS requirement for optical links reduces flexibility in forming FSO communication networks. Compared with broadcast radio frequency (RF) networks, FSO networks do not have an obvious, simple ability to distribute data and control information within the network.
The objective of the research work presented here is to answer the following questions: 1) how to improve the performance of FSO links for long-range FSO communications, where atmospheric turbulence effects can be severe; and 2) how to accommodate the broadcast requirements of short-range FSO sensor networking applications. These challenging problems are addressed by two different approaches, yet there is a possibility that the two techniques can be combined and realized in one general-purpose FSO communication system.
3.1 Comparison of Free Space Optical and Radio Frequency Technologies
Traditionally, wireless technology is almost always associated with radio transmission, although transmission by carriers other than RF waves, such as optical waves, might be more advantageous for certain applications. The principal advantage of FSO technology is very high bandwidth availability, which could provide broadband wireless extensions to Internet backbones serving end users. This could enable delay-free web browsing and data library access, electronic commerce, streaming audio and video, video-on-demand, video teleconferencing, real-time medical imaging transfer, enterprise networking, and work-sharing capabilities, which could require as much as a 100 Mbps data rate on a sustained basis. In addition, FSO permits the use of narrow-divergence, directional laser beams, which, if deployed appropriately, offer essentially very secure channels with low probability of interception or detection (LPI/LPD). Narrow FSO beams also have considerable obscuration-penetrating capability. For example, penetration of dense fog over a kilometer distance is quite feasible at Gbps data rates with a beam divergence of 0.1 mrad. The tight antenna patterns of FSO links allow considerable spatial re-use, and wireless networks using such connectivity are highly scalable, in marked contrast to ad-hoc RF networks, which are intrinsically non-scalable. However, FSO has some drawbacks as well. Since a LOS path is required from transmitter to receiver, narrow-beam point-to-point FSO links are subject to atmospheric turbulence and obscuration from clouds, fog, rain, and snow, causing performance degradation and possible loss of connectivity. In addition, FSO links can have a relatively short range, because the noise from ambient light is high, and also because the square-law nature of a direct-detection receiver doubles the effective path loss (in dB) compared to a linear detector. Obviously, FSO communication will not replace RF communication; rather, the two will co-exist. Hybrid FSO/RF networks combine the advantages and avoid the disadvantages of FSO or RF alone. Even if FSO connectivity cannot be provided all the time, the aggregate data rate in such networks is markedly greater than if RF links were used alone. RF alone does not have the bandwidth for the transfer of certain types of data, for example high-definition, full-spectrum motion imagery. Hybrid wireless networks will provide maximum availability and capacity.
3.2 Free Space Optical Networking
The individual FSO link between transmitter and receiver can be naturally extended to an FSO network topology. FSO networks could serve in a metropolitan area to form a backbone of base stations providing service to both fixed and mobile users. For other applications where no base station is present, FSO transceivers may need to communicate with one another to form an ad-hoc sensor network. In both cases, a scalable, robust, and controlled network topology is required.
A major drawback of directed LOS systems is their inability to deal with broadcast communication modes. In networking, broadcasting capability is frequently required in order to establish communication among multiple nodes. With this capability, networking data and control messages can easily be flooded over the whole network. This problem can be eliminated by using non-directed LOS optical links, which can be described as omnidirectional links. There have been a number of systems based on non-directed communications. They provide reliable performance mainly in short-range applications, i.e. several to tens of meters. For longer range networking applications, such as in a military context with a network of high-altitude aircraft or in battlefield scenarios, only directional FSO links can provide high data rate capability and channel security. For directional FSO links, pointing, acquisition, and tracking (PAT) schemes are necessary in order to establish and allow the flow of information within the networks. PAT involves a beam-steering device, which may be mechanical, such as a galvo-mirror or gimbal, or non-mechanical, using an acousto-optic crystal or piezo-electric actuator. In typical directional FSO systems, there is a trade-off between the requirements of maximizing received optical power, yet minimizing the PAT sensitivity of the system.
Free Space Optics (FSO) communication, also called Free Space Photonics (FSP) or Optical Wireless, refers to the transmission of modulated visible or infrared (IR) beams through the atmosphere to obtain optical communications. Like fiber, FSO uses lasers to transmit data, but instead of enclosing the data stream in a glass fiber, it is transmitted through the air. FSO works on the same basic principle as infrared television remote controls, wireless keyboards, or wireless Palm® devices.
4. HOW FSO WORKS
Free Space Optics (FSO) transmits invisible, eye-safe light beams from one "telescope" to another using low-power infrared lasers in the terahertz spectrum. The beams of light in FSO systems are transmitted by laser light focused onto highly sensitive photon-detector receivers. These receivers use telescopic lenses to collect the photon stream and recover the digital data, which may contain a mix of Internet messages, video images, radio signals, or computer files. Commercially available systems offer capacities in the range of 100 Mbps to 2.5 Gbps, and demonstration systems report data rates as high as 160 Gbps. FSO systems can function over distances of several kilometers. As long as there is a clear line of sight between the source and the destination, and enough transmitter power, FSO communication is possible.
4.1 Technology description.
4.1.1 General Framework.
Communication system design is concerned with tradeoffs between channel length, bit rate, and error performance. The generalized schema of a single-link communication system in Figure 2 provides the necessary framework to compare fiber optic and FSO technologies. Under each block are characteristics that transform its signal input into the different physical form of the signal output. The superscript N for each block transform represents noise contributed to the signal. For example, the “channel” block degrades the transmitter output signal due to the processes listed under the block for fiber optic cable or FSO. Although both are optical communication systems, the fundamental difference between fiber optic and FSO systems is their propagation channels: dielectric waveguide versus the atmosphere. As a consequence, signal propagation, equipment design, and system planning are different for each type of system. The main thesis of the following discussion is that, because of their different propagation channels, the performance of FSO cannot be expected to match that of advanced fiber optic systems; therefore FSO applications will be more limited.
Figure 2. Single-link Communication System
4.2 FSO Characteristics.
Figure 3. Block Diagram, FSO Communication System
A generalized FSO system is shown in Figure 3. The baseband transmission bit stream is an input to the modulator, which turns the direct-current bias on and off to modulate the laser diode (LD) or light emitting diode (LED) light source. The modulated beam then passes through a collimating lens that forms the beam into a parallel ray propagating through the atmosphere. A fundamental physical constraint, the diffraction limit, comes into play at this point: the beam of an intensity-modulated (non-coherent) light source cannot be focused to an area smaller than that at its source. Apart from the effects of atmospheric processes, even in vacuum, a light beam propagating through free space undergoes divergence or spreading. Recalling the single-link communication system in Figure 2, the transmitted FSO beam is transformed by several physical processes inherent to the atmosphere: frequency-selective (line) absorption, scattering, turbulence, and sporadic misalignment of transmitter and receiver due to displacement (twist and sway) of the buildings or structures upon which the FSO equipment is mounted. These processes are non-stationary, which means that their influence on a link changes unpredictably with time and position. At the distant end, a telescope collects and focuses a fraction of the light beam onto a photo-detector that converts the optical signal to an electrical signal. The detected signal is then amplified and passed to processing, switching, and distribution stages. Figure 5 is an illustration of a simplified single-beam FSO transceiver that shows how the major functional blocks of the equipment are arranged and integrated.
Figure 5. Single-beam FSO Transceiver.
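To make the intensity-modulation/direct-detection chain described above concrete, the following sketch (not from the original text) simulates it as NRZ on-off keying with an assumed channel loss and Gaussian receiver noise, then estimates the bit error rate by threshold detection. All numerical values are illustrative assumptions.

```python
# A minimal sketch: the Figure 3 chain modeled as on-off keying plus noise.
import numpy as np

rng = np.random.default_rng(0)
n_bits = 100_000
bits = rng.integers(0, 2, n_bits)

# "Modulator": bias current on/off -> normalized optical power 1 or 0.
tx_power = bits.astype(float)

# "Channel + detector": assumed fixed channel loss and Gaussian noise
# (thermal + ambient light) at the photodetector output.
channel_loss = 0.25          # fraction of transmitted power collected
noise_sigma = 0.05           # receiver noise standard deviation
rx = channel_loss * tx_power + rng.normal(0.0, noise_sigma, n_bits)

# Threshold detection halfway between the received "0" and "1" levels.
threshold = channel_loss / 2
decisions = (rx > threshold).astype(int)

ber = np.mean(decisions != bits)
print(f"Estimated BER: {ber:.2e}")
```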
5. FACTORS AFFECTING THE PERFORMANCE OF THE FSO SYSTEM
The non-stationary atmospheric processes of absorption, scattering, refractive turbulence, and displacement are the factors that most limit the performance of FSO systems. A brief description of each is given in the following paragraphs. Divergence determines how much useful signal energy will be collected at the receive end of a communication link. It also determines how sensitive a link will be to displacement disturbances (see below). Of the processes that cause attenuation, divergence is the only one that is independent of the transmission medium; it occurs in vacuo just as much as in a stratified atmosphere. Laser light can be characterized as partially coherent, quasi-monochromatic electromagnetic waves passing a point in a wave field. At the transmitter, beam divergence is caused by diffraction around the circular aperture at the end of the telescope. In practice, an FSO transmit beam is defocused from the diffraction limit enough to be larger than the diameter of the telescope at the receive end, and thus maintain alignment with the receiver in the face of random displacement disturbances.
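As a rough illustration of how divergence alone attenuates the link, the sketch below (not from the original text) estimates the geometric loss of a defocused beam under a simple uniform-spot, small-angle assumption; the aperture sizes, divergence, and ranges are assumed values.

```python
# A minimal sketch: geometric (divergence) loss of a defocused FSO beam.
import math

def geometric_loss_db(tx_aperture_m, rx_aperture_m, divergence_rad, range_m):
    """Loss from beam spreading, assuming a uniform spot of diameter
    d_tx + theta * L at the receiver (small-angle approximation)."""
    spot_diameter = tx_aperture_m + divergence_rad * range_m
    collected_fraction = min(1.0, (rx_aperture_m / spot_diameter) ** 2)
    return -10.0 * math.log10(collected_fraction)

# Example: 5 cm apertures, 1 mrad full divergence, 500 m and 2 km links.
for rng_m in (500.0, 2000.0):
    print(rng_m, "m:", round(geometric_loss_db(0.05, 0.05, 1e-3, rng_m), 1), "dB")
```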
5.1 Absorption.
Molecules of some gases in the atmosphere absorb laser light energy, primarily water vapor, carbon dioxide (CO2), and methane (CH4). The presence of these gases along a path changes unpredictably with the weather over time, so their effect on the availability of the link is also unpredictable. Another way of stating this is that different spectral windows of transmission open up at different times, but to take advantage of these, the transmitter would have to be able to switch (or retune) to different wavelengths in a sort of wavelength-diversity technique.
5.2 Scattering.
Another cause of light wave attenuation in the atmosphere is scattering from aerosols and particles. The actual mechanism is known as Mie scattering, in which aerosols and particles comprising fog, clouds, and dust, roughly the same size as the light’s wavelength, deflect the light from its original direction. Some scattered wavelets travel a longer path to the receiver, arriving out of phase with the direct (unscattered) ray. Thus destructive interference may occur, which causes attenuation. Note how attenuation is much more pronounced for the spectrum in Figure 6(b), for transmission through fog.
Figure 6. Transmission Spectra for Light Traveling through (a) Clear Air, and (b) Moderate Fog.
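In practice, scattering loss is often estimated from visibility. The sketch below (not from the original text) implements the Kim empirical model as I recall it; the piecewise coefficients and sample visibilities are assumptions and should be checked against a reference before use.

```python
# A minimal sketch: Kim-model estimate of scattering attenuation vs. visibility.
def kim_attenuation_db_per_km(visibility_km, wavelength_nm=1550.0):
    v = visibility_km
    if v > 50:
        q = 1.6
    elif v > 6:
        q = 1.3
    elif v > 1:
        q = 0.16 * v + 0.34
    elif v > 0.5:
        q = v - 0.5
    else:
        q = 0.0
    return (3.91 / v) * (wavelength_nm / 550.0) ** (-q)

# Clear air (~20 km visibility) vs moderate fog (~200 m visibility).
print(kim_attenuation_db_per_km(20.0))   # ~0.05 dB/km at 1550 nm
print(kim_attenuation_db_per_km(0.2))    # roughly 20 dB/km
```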
5.3 Refractive turbulence.
The photograph in Figure 7 shows the change from a smooth laminar structure of the atmosphere to turbulence. In the laminar region light refraction is predictable and constant, whereas in the turbulent region it changes from point to point, and from instant to instant. Small temperature fluctuations in regions of turbulence along a path cause changes in the index of refraction. One effect of the varying refraction is scintillation, the twinkling or shimmer of objects on a horizon, which is caused by random fluctuations in the amplitude of the light. Another effect is random fluctuations in the phases of the light’s constituent wavelengths, which reduces the resolution of an image.
Figure 7. Transition from a Laminar to a Turbulent Atmospheric Structure.
Refractive turbulence is common on rooftops, where solar heating of the surface during daylight hours causes heat to be radiated upward throughout the day. Rooftop air conditioning units are another source of refractive turbulence. These factors must be considered when installing FSO transceivers in order to minimize signal fluctuations and beam shifts over time.
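A common first-order measure of how strongly such turbulence will make the received intensity scintillate is the Rytov variance. The sketch below (not from the original text) evaluates the classical plane-wave expression from memory; the constant, exponents, and sample Cn^2 values are assumptions for illustration.

```python
# A minimal sketch: Rytov variance as a rough scintillation-strength metric.
import math

def rytov_variance(cn2, wavelength_m, path_length_m):
    k = 2.0 * math.pi / wavelength_m          # optical wavenumber
    return 1.23 * cn2 * k ** (7.0 / 6.0) * path_length_m ** (11.0 / 6.0)

# Cn^2 ~ 1e-15 m^(-2/3) (mild) vs 1e-13 (strong daytime surface turbulence),
# for a 1550 nm beam over a 1 km path.
for cn2 in (1e-15, 1e-13):
    print(cn2, "->", round(rytov_variance(cn2, 1.55e-6, 1000.0), 3))
```

Values well below 1 indicate weak turbulence; values near or above 1 indicate strong scintillation and correspondingly degraded BER.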
5.4 Displacement.
For an FSO link, alignment is necessary to ensure that the transmit beam divergence angle matches up with the field of view of the receive telescope. However, since FSO beams are quite narrow, misalignment due to building twist and sway as well as refractive turbulence can interrupt the communication link. One method of combating displacement is to defocus the beam so that a certain amount of displacement is possible without breaking the link. Another method is to design the FSO head with a spatial array of multiple beams so that at least one is received when the others are displaced. The latter technique circumvents the problem of displacement without sacrificing the intensity of the beam.
FSO is technologically very similar to communication using fiber optic cables. Both use laser light to carry the 1s and 0s of digital data. But while traditional fiber optics transmits the laser light through a strand of glass, FSO sends the laser light through the air (“free space”). Since the two technologies are so similar, they share the same advantages of high data rate capacity and protocol independence. Both technologies are also very secure.
Some principal attributes of FSO communication are:
1. Directional transmission with an extremely narrow transmit beam for point-to-point (line of sight) connectivity
2. The absence of “side lobe” signals
3. Complete, uninterrupted links required for successful communication
4. Protocol transparent transmission
5. Physical Layer operation
6. “Plug and Play” devices
These key features allow for very secure transmission over an FSO channel. To understand why this is the case, we first need to consider what must take place to successfully steal a communication signal. Two criteria must be satisfied for an individual to overcome the security in a network:
(1) they must intercept enough of the signal to reconstruct data packets and
(2) they must be able to decode that information.
If these two primary requirements cannot be met, the security of the network will remain intact. Given these two conditions, we will now examine how the above attributes of FSO transmission can be used to maintain a secure data link.
6. NETWORKING CONSIDERATIONS.
6.1 Characteristics of Transmission Control Protocol (TCP).
Because TCP does not differentiate between packet loss due to link errors and packet loss due to network congestion, FSO networking can be seriously crippled by packet loss due to signal attenuation (such as that caused by heat, fog, sand, or dirt). The effect of attenuation-induced packet loss is to invoke TCP’s congestion control algorithms, seriously reducing throughput on the affected link.
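To see roughly how sensitive TCP throughput is to this kind of loss, the sketch below (not from the original text) evaluates the well-known Mathis et al. approximation for steady-state TCP throughput; the MSS, RTT, and loss rates are assumed values.

```python
# A minimal sketch: Mathis approximation of TCP throughput under packet loss.
import math

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate TCP Reno throughput: (MSS/RTT) * C / sqrt(p), with C ~ 1.22."""
    return (mss_bytes * 8 / rtt_s) * 1.22 / math.sqrt(loss_rate)

# A 1 Gb/s FSO hop with 10 ms RTT: even 0.1% loss caps a single TCP flow
# far below the optical line rate.
for p in (1e-5, 1e-3, 1e-2):
    print(f"loss {p:.0e}: {tcp_throughput_bps(1460, 0.010, p)/1e6:.1f} Mb/s")
```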
6.2 Routing Protocol Issues.
To maintain link and path availability, multiple routes from each node must be maintained because of the easily disrupted nature of FSO networking. Because FSO links are easily disrupted by occlusion and other factors, both on a very short time scale (milliseconds to minutes) and on a longer scale (minutes or more), normal routing protocols are not adequate. Normal routing protocols do not deal well with very short disruptions and, by design, are intended to deal only with longer disruptions (minutes or more). Three common routing protocols, Routing Information Protocol (RIP), Open Shortest Path First (OSPF), and Enhanced Interior Gateway Routing Protocol (EIGRP), can take 10 to 90 seconds to discover a wireless link failure and re-route the traffic accordingly, during which time data will be lost as the network continues to attempt to use the failed link. Reacquiring or reestablishing a link that went down for perhaps a second or less at an inopportune time in the route status discovery cycle could take just as long. Mobile Ad Hoc Network (MANET) protocols are being developed to be more responsive to topology dynamics, but they are better suited to bandwidth-constrained links because they trade routing performance for a reduction in network overhead. The best option seen to date to overcome the routing problem is to exploit the ability of OSPF and EIGRP to respond to a loss of carrier at the physical interface. One study has shown that, after linking this to the existing re-route triggering mechanism in EIGRP, re-routing can occur after 10 milliseconds as opposed to an average of 12 seconds.
6.3 Serial Networking Considerations.
Technical control facilities (TCF) are currently based on multiplexing data serially. The majority of the information processed through a TCF is serial data and voice. The usual multiplexing technique is Time Division Multiplexing (TDM) where each user is assigned to one (or more) ports of a multiplexer. All of the ports are then aggregated into one data stream. The current infrastructure allows transmission from point to point by many different means including radio transmission, wire, and fiber. FSO is able to transmit and receive this data seamlessly. User networks and the networks in the TCFs have started migrating to Internet Protocol (IP) based systems and will continue to do so. FSO is able to handle the transmission requirements for this migration.
7. MATURITY OF THE TECHNOLOGY.
As noted earlier, the free space propagation channel is essentially uncontrollable, so that FSO is more akin to microwave radio than to fiber optics. The opportunities for advancing the FSO art fall into two areas: equipment enhancements at the physical layer and system enhancements at the network layer. The physical layer enhancements would mitigate atmospheric and displacement disturbances, whereas the network layer would implement decision logic to buffer, retransmit, or reroute traffic in the event of an impassable link.
Changeable atmospheric conditions along a path favor different wavelengths at different times; no single wavelength is optimal under all conditions. This raises the question whether FSO link performance can be improved by adaptively changing the source wavelength to match the conditions. Quantum cascade lasers (QCL), for example, can be tuned over a wide range of long-infrared (IR) wavelengths that includes the known atmospheric low-absorption windows. Adaptive retuning to an optimal transmission wavelength, in response to dynamic conditions, might be done using either a single laser or an array of fixed-wavelength lasers. In any case, one study indicates that adaptive retuning may result in only marginal improvements to link performance.
At the receive end of a link, it turns out that the thermal noise from an array of small photo detectors is less than the noise from a single large detector with an equivalent field of view. Thus a significant improvement in the noise performance of FSO receivers is possible using a photo detector array. Scattering through fog and dust causes pulse spreading that leads to inter-symbol interference. A decision feedback adaptive equalizer has been proposed to combat this effect, but the authors caution that it would be effective only at relatively low data rates. Furthermore, adaptive optics could use wavefront sensors, deformable mirrors, and lenses to reduce FSO wavefront distortion from refractive turbulence. One author claims that, under certain circumstances, adaptive optics could provide several orders of magnitude improvement in BER against scintillation caused by turbulence.
Several commercial FSO products use pointing and tracking control systems to compensate for displacement-induced alignment errors. Existing systems employ electromechanical two-axis gimbal designs, which are relatively expensive to adjust and maintain. As a non-mechanical alternative, optical phased arrays (OPA) are under development, in which the phase differences of an array of lasers are controlled to form a desired beamwidth and orientation. Such arrays would be part of both the transmitter and receiver assemblies so as to achieve the best alignment over a path. The algorithms for such control systems are also an active research area in which the goal is to replace simple proportional-integral-derivative (PID) loops with adaptive neural-network-based algorithms that enable more accurate estimates of the stochastic processes of particular FSO links.
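For a sense of what the simple PID pointing loops mentioned above look like, here is a toy sketch (not from the original text) in which a discrete PID controller steers a beam angle to follow an assumed sinusoidal building-sway disturbance with added jitter; the gains, loop rate, and disturbance model are all illustrative assumptions.

```python
# A minimal sketch: discrete PID loop tracking a drifting pointing target.
import numpy as np

kp, ki, kd = 0.8, 0.2, 0.001           # illustrative PID gains (assumed)
dt = 0.01                              # 100 Hz control loop
rng = np.random.default_rng(1)

beam_angle = 0.0                       # steered beam direction, mrad
integral, prev_error = 0.0, 0.0
errors = []

for step in range(1000):
    # Assumed disturbance: slow sinusoidal building sway plus small jitter.
    target = 0.5 * np.sin(2 * np.pi * 0.2 * step * dt) + 0.02 * rng.normal()
    error = target - beam_angle
    integral += error * dt
    derivative = (error - prev_error) / dt
    beam_angle += kp * error + ki * integral + kd * derivative
    prev_error = error
    errors.append(error)

rms = np.sqrt(np.mean(np.square(errors[-100:])))
print(f"RMS pointing error over last 100 steps: {rms:.4f} mrad")
```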
At the network level buffering and retransmitting data are conventional communication protocol strategies, but they are less than optimal for networks bearing real-time services such as voice and video in addition to computer data. The concept of topology control has been proposed as a method of dealing with link degradation or outages without interrupting services. The idea is to establish a mesh of stations over a desired coverage area that would adaptively reroute traffic in response to link interruptions. This scheme requires either a proliferation of point-to-point transceivers for the network or an advanced pointing and tracking control system to accomplish the rerouting. Sophisticated software would also be required to monitor and control the route switching.
8. BENEFITS OF THE TECHNOLOGY.
The attraction of FSO is its high data transmission rate and its exemption from spectrum regulation. The latter is especially significant for military ground forces setting up camps and forward operating bases overseas. Whereas application for frequency assignments in the United States is a ponderous process, in a foreign country it is all the more so, and fraught with some uncertainty; the request may be denied, or services may be impaired by interferers due to poor frequency planning or intentional jamming. At the very least it is time consuming. To be able to circumvent the spectrum management bureaucracy is a huge advantage given urgent communication requirements. Since light beams do not interfere with each other as long as they are not coaxial, commanders need not be concerned with electromagnetic compatibility problems. FSO is as ready a resource as a light bulb in a socket, and installation of FSO equipment is quick and inexpensive. FSO’s drawbacks in the commercial world are perhaps not as serious in the military context. Using short FSO repeater spacings for camp communications may still be more economical than installing fiber optic cable and it allows more flexibility for re-routing lines of communication as the camp grows. In the Southwest Asia Theater for example, FSO could free up tactical equipment that has been used as a stopgap for camp communications, and eliminate runs of loose field wire. FSO would carry all communication services, not just voice or data separately. In the future the layout of new camps should perhaps plan for lanes for the paths of an FSO network. The transceivers should be placed low to the ground to employ short rigid mounts, but not so low as to be adversely affected by the bottom atmospheric layer disturbed by radiative heat energy from the ground surface.
9. CHALLENGES OF THE TECHNOLOGY.
9.1 Laser eye safety.
It is important to keep in mind, especially if FSO is to gain widespread use for camp communications, that lasers must be operated within certain levels of irradiance [W/m²] for eye safety. The harmful level of exposure is a function of wavelength and is tabulated in American National Standards Institute (ANSI) Standard Z136.1.
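As a purely geometric illustration (not from the original text, and not a substitute for the ANSI Z136.1 limit tables), the sketch below computes the average irradiance of a diverging beam at a given range under a uniform-spot assumption; the transmit power, aperture, and divergence are assumed values.

```python
# A minimal sketch: average irradiance of a diverging beam vs. range.
import math

def irradiance_w_per_m2(power_w, tx_aperture_m, divergence_rad, range_m):
    spot_diameter = tx_aperture_m + divergence_rad * range_m
    area = math.pi * (spot_diameter / 2.0) ** 2
    return power_w / area

# 100 mW transmitter, 5 cm aperture, 1 mrad divergence.
print(irradiance_w_per_m2(0.1, 0.05, 1e-3, 0.0))     # ~51 W/m^2 at the aperture
print(irradiance_w_per_m2(0.1, 0.05, 1e-3, 100.0))   # ~5.7 W/m^2 at 100 m
```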
9.2 Disruption by weather.
Although FSO may at times be capable of greater range, its greater susceptibility to degradation from incidents of heavy fog or dust will drive down its attainable availability figures. The impact depends on the region of the world in which FSO is planned for use. For example, dust storms severe enough to cause blackout conditions occur frequently in tactical desert environments. Furthermore, the summer heat in the desert and along coastlines induces extreme refractive turbulence that causes optical defocusing and beam wander.
10. ADVANTAGES
The selection of a transmission medium is based on many differing engineering requirements, with cost and schedule being major considerations. FSO for serial transmission may be advantageous when requirements call for short transmission paths and quick installation. FSO devices have advantages over radio- and fiber-based systems when speed of installation is the dominant concern in providing last-mile connectivity. Setup of these systems is quick, and as long as the distance requirements are within their scope of operation, these devices may be considered a viable option.
Main advantages are:
Quick link setup
License-free operation
High transmission security
High bit rates
Low bit error rate
No Fresnel zone necessary
Low snow and rain impact
Full duplex transmission
Protocol transparency
No interference
Great EMI behavior
In some devices, the beam can be visible, facilitating aiming and detection of failures.
11. DISADVANTAGES
As the serial data nature of TCFs changes to IP-based infrastructures, point-to-point applications will decrease in favor of network-centric infrastructures. The limited link distance provided by FSO equipment restricts consideration of transmission applications to last-mile applications. Path selection must be engineered to ensure that there are no obstacles that would impair signal quality. When used in a vacuum, for example for inter-spacecraft communication, FSO may provide performance similar to that of fiber-optic systems. For terrestrial applications, however, the principal limiting factors are:
•Atmospheric absorption
•Rain (lower attenuation)
•Fog (10 to ~100 dB/km attenuation)
•Snow (lower attenuation)
•Scintillation (lower attenuation)
•Background light
•Shadowing
•Pointing stability in wind
•Pollution / smog
•If the sun goes exactly behind the transmitter, it can swamp the signal.
These factors cause an attenuated receiver signal and lead to a higher bit error ratio (BER). To overcome these issues, vendors have developed solutions such as multi-beam or multi-path architectures, which use more than one transmitter and more than one receiver. Some state-of-the-art devices also have a larger fade margin (extra power reserved for rain, smog, and fog). To maintain an eye-safe environment, good FSO systems have a limited laser power density and support laser classes 1 or 1M. Atmospheric and fog attenuation, which are exponential in nature, limit the practical range of FSO devices to several kilometres.
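To illustrate why such attenuation bounds the practical range, the sketch below (not from the original text) scans for the range at which geometric spreading plus a given specific attenuation exhausts an assumed link fade margin; the margin, apertures, divergence, and attenuation values are assumptions.

```python
# A minimal sketch: range at which total loss exhausts an assumed fade margin.
import math

def total_loss_db(range_km, atten_db_per_km, divergence_rad=1e-3,
                  tx_ap_m=0.05, rx_ap_m=0.10):
    spot = tx_ap_m + divergence_rad * range_km * 1000.0
    geometric = -10.0 * math.log10(min(1.0, (rx_ap_m / spot) ** 2))
    return geometric + atten_db_per_km * range_km

link_margin_db = 40.0                   # assumed total margin (power, sensitivity, optics)
for atten in (0.2, 3.0, 20.0):          # clear air, haze, moderate fog (dB/km)
    r = 0.05
    while total_loss_db(r, atten) < link_margin_db:
        r += 0.05
    print(f"{atten:5.1f} dB/km -> max range ~ {r:.2f} km")
```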
12. PRODUCTS.
12.1 Current Products.
Current FSO technology is still developing, and the number of manufacturers and types of systems is growing. In traditional FSO technology, a single light source transmits to a single receiver. These systems typically have a throughput of 1 Gb/s. The transmission distance is very limited, from 200 to 1000 meters (typical systems operate up to 500 meters). Reliability of these devices is typically 99.9 percent in clear conditions, varying greatly depending on distance and weather conditions. The current cost of these systems is from $2500 to $3000 per unit (twice that per link).
These traditional types of FSO products were evaluated by USAISEC’s engineering and evaluation facility, the Technology Integration Center (TIC) at Fort Huachuca, Arizona. The evaluations were to determine whether an FSO solution could provide extensions to, a backup for, or an alternative to wired link technology in support of the Installation Information Infrastructure Modernization Program (I3MP). Recommendations for use were made for LightPointe Flight Spectrum 1.25G (TR. No. AMSEL-IE-TI-03067, July 2003), MRV TS3000G (TR. No. AMSEL-IE-TI-03070, July 2003), and Alcatel SONAbeam (TR. No. AMSEL-IE-TI-03081, September 2003). The Terabeam Elliptica (TR. No. AMSEL-IE-TI-03068, July 2003) was recommended as a backup link only, due to bandwidth limitations (TR No. AMSEL-IE-04009, November 2003). Another product, AirFiber 5800 (TR No. AMSEL-IE-TI-03059, July 2003), was not recommended because the manufacturer is no longer in business. Field testing was scheduled (TR No. AMSEL-IE-TI-05003) in Germany to test FSO technology over time and varying weather conditions. The preliminary field tests indicated that weather was a significant factor in link performance.
In another military field application, at the Pentagon, the SONAbeam S-Series FSO configuration performed with no link outages except when the line-of-sight path was blocked by helicopter air traffic. This was a point-to-point link, and the loss of the line-of-sight path caused link outages. The link between the Pentagon and the Navy Annex covered approximately 500 meters. The loss-of-line-of-sight issue was significant at the Pentagon due to repeated path blockage by air traffic, eventually leading to the link being discontinued after one year of service.
Industry has recognized weather as a significant issue. SONAbeam and WaveBridge systems have four redundant lasers transmitting to a receiver. This provides physical diversity, increases link performance, and allows a limited range extension over single-source FSO products. The range increase provides an additional 1000 meters, extending the total link distance to 2000-plus meters. Several products, such as Pulse’s Omni-Node, use active pointing and tracking control systems. FSO mesh network systems have also been developed. The Omni-Node by Pulse provides three transceivers per device with an active tracking system; this product also offers redundant link fail-over. Hybrid systems using FSO and millimeter-wave microwave technology are also available, from AirFiber and LightPointe. Hybrid systems approach carrier-class reliability of 99.999 percent over 1 km at 1.25 Gb/s. These systems reduce the vulnerability of FSO during heavy fog by using the millimeter-wave path and, conversely, reduce the vulnerability of millimeter-wave microwave during heavy rain by using the FSO system. The two weather conditions rarely occur simultaneously.
Distance limitations are still less than 2 km.
12.2 Near Future Products.
Crinis Networks has introduced an FSO product that competes with Ethernet and Fast Ethernet LAN connectivity for indoor applications. Crinis uses the terminology “indoor Free Space Optics (iFSO)” to describe this application. The Federal Communications Commission (FCC) issued license guidance for "E-Band" in October 2003. E-Band is an upper millimeter-wave band that operates over the 71-76 Gigahertz (GHz), 81-86 GHz, and 92-95 GHz bands. It is licensed by the link, which can be done online in a matter of days. It is intended to give industry a last-mile solution for broadband applications. This technology should compete with FSO and/or complement it as part of a hybrid system. Bandwidth of these devices is 1.25 Gb/s, range is up to 2 km, and costs are approximately $20K per link. Manufacturers include Loea and ElvaLink.
13. POTENTIAL APPLICATIONS
The current reliability of FSO systems under varying weather conditions severely limits widespread military application of these devices. Under conditions of rapid deployment requiring interconnected network nodes, these products provide a good temporary solution, especially in urban areas. Because links can be disrupted by obstruction and weather instability, the systems should be replaced with a cable infrastructure when possible. Mesh systems and multiple-transmitter systems are an upgrade to the original FSO concept but have similar reliability issues. Hybrid systems offer higher reliability and performance approaching carrier class. Hybrid systems offer the most likely solution for military systems, but they need further testing in varying conditions to confirm reliability in the deployed environment.
Typically scenarios for use are:
•LAN-to-LAN connections on campuses at Fast Ethernet or Gigabit Ethernet speeds.
•LAN-to-LAN connections in a city (for example, a metropolitan area network).
•To cross a road or other barriers.
•Speedy service delivery of high bandwidth access to fiber networks.
•Converged Voice-Data-Connection.
•Temporary network installation (for events or other purposes).
•Reestablish high-speed connection quickly (disaster recovery).
•As an alternative or upgrade add-on to existing wireless technologies.
•As a safety add-on for important fiber connections (redundancy).
•For communications between spacecraft, including elements of a satellite constellation.
•For interstellar communication.
The light beam can be very narrow, which makes FSO hard to intercept, improving security. Because it uses light instead of microwaves, FSO also provides vastly improved EMI behavior.
14. CONCLUSION.
While FSO is obviously an up-and-coming technology, it can also be described as only mature enough in its current state for use in limited applications. The applications to which FSO technology seems most suited are clear-weather, short-distance links, such as last-mile connections to broadband network backbones and backbone links between buildings in a MAN or CAN environment. There is also significant potential for this technology in temporary networks, where the advantages of being able to establish a CAN quickly, or to relocate the network on relatively short notice, outweigh the network reliability issues. Tactical implementations of this technology, or any highly mobile implementation, are possible, but in its current state FSO has difficulty providing adequate reliability for the mobile Warfighter without resorting to a hybrid solution of FSO paired with another transmission technology (typically millimeter wave). Finally, past and current implementations and tests indicate that any future implementation of FSO technology should be carefully evaluated to ensure that potential link interruptions are not a factor before the decision is made to actually implement an FSO link.