Linked Data Vocabulary
Simple Knowledge Organization System (SKOS) [SKOS-REFERENCE] is a vocabulary description language for RDF designed for representing traditional knowledge organization systems, such as enterprise taxonomies.
Linked Data API
A REST API that allows data publishers to provide URLs to lists of things and clients to retrieve machine-readable data from those URLs.
Friend of a Friend
A Semantic Web vocabulary describing people and their relationships for use in resource descriptions. Commonly called "FOAF".
Neutral URI
A URI that avoids the exposure of implementation details within the URI itself.
Linked Data client
A Web client that supports HTTP content negotiation for retrieving Linked Data from URLs and/or SPARQL endpoints. A Linked Data client understands standard REST APIs, for example the Linked Data REST API. Examples of Linked Data clients include Tim Berners-Lee's early Tabulator browser, gFacet, and the Callimachus Shell (CaSH).
Dataset, RDF
A collection of RDF data, comprising one or more RDF graphs, that is published, maintained or aggregated by a single provider. In SPARQL, an RDF Dataset represents a collection of RDF graphs over which a query may be performed.
Linkset
A collection of RDF links between two datasets.
Graph
A collection of objects (represented by "nodes"), any two of which may be connected by a link.
Quad Store
A colloquial phrase for an RDF database that stores RDF triples plus an additional element of information, often used to collect statements into groups.
Linked Open Data Cloud
A colloquial phrase for the total collection of Linked Data published on the Web.
cURL
A command line Open Source/Free Software client that can transfer data, including machine readable RDF, from or to a server using one of its many supported protocols.
Linking Open Data Project
A community activity started in 2007 by the W3C's Semantic Web Education and Outreach (SWEO) Interest Group. The project's stated goal is to "make data freely available to everyone".
DBpedia
A community effort to extract structured information from Wikipedia and make it available on the Web. DBpedia is often depicted as a hub for the Data Cloud. It provides an RDF representation of the metadata held in Wikipedia, made available for SPARQL queries on the World Wide Web.
Closed World
A concept from artificial intelligence that refers to a model of uncertainty that an agent assumes about the external world. In a closed world, the agent presumes that what is not known to be true must be false. This is a common assumption underlying relational databases and most forms of logic programming.
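A minimal Python sketch of the difference between the closed world assumption and the open world assumption usually associated with RDF and OWL. The facts, names and the three-valued `Truth` type are hypothetical, chosen only for illustration.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical knowledge base of facts known to be true.
FACTS = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "globex"),
}
# Hypothetical facts explicitly known to be false.
KNOWN_FALSE = {("bob", "worksFor", "acme")}


def ask_closed_world(fact):
    """Closed world: anything not known to be true is treated as false."""
    return Truth.TRUE if fact in FACTS else Truth.FALSE


def ask_open_world(fact):
    """Open world: a missing fact is unknown unless it is explicitly false."""
    if fact in FACTS:
        return Truth.TRUE
    if fact in KNOWN_FALSE:
        return Truth.FALSE
    return Truth.UNKNOWN
```

Asking whether alice works for globex returns `Truth.FALSE` under the closed world assumption but `Truth.UNKNOWN` under the open world assumption, because the absence of a statement is not treated as evidence that it is false.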
Connection
A concept from computer networking. It refers to a transport layer virtual circuit established between two programs for the purpose of communication.
Data Market
A data market, also called a Data Marketplace, is an online (broker) service that enables discovery of and access to a large collection of datasets offered by a range of data providers. Examples include Infochimps, Azure Marketplace and Factual. Data Markets may include open as well as paid-for data, and may offer value-added services such as APIs, visualizations and programmatic data access.
Linked Open Data Cloud diagram
A diagram representing datasets published by the Linking Open Data project from 2007 to 2011. The diagram stopped being updated when the total number of datasets became too large for each one to be meaningfully represented in a single diagram.
Directed Graph
A directed graph is a graph in which the links between nodes are directional, i.e., they only go from one node to another. RDF represents things (nouns) and the relationships between them (verbs) in a directed graph. In RDF, links are labelled by being assigned unique URIs.
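As an illustration, the sketch below stores RDF-style triples as labelled, directed edges indexed by their source node. The `http://example.org/` URIs are placeholders, and a real application would normally use an RDF library rather than this hand-rolled index.

```python
from collections import defaultdict

EX = "http://example.org/"  # placeholder namespace
KNOWS = EX + "knows"        # hypothetical predicate URI labelling the edges

# Each triple is a labelled, directed edge: subject --predicate--> object.
triples = [
    (EX + "alice", KNOWS, EX + "bob"),
    (EX + "bob", KNOWS, EX + "carol"),
]


def build_graph(triples):
    """Index (predicate, object) pairs by subject; edge direction is preserved."""
    out_edges = defaultdict(list)
    for subject, predicate, obj in triples:
        out_edges[subject].append((predicate, obj))
    return out_edges


graph = build_graph(triples)
```

Because edges are directional, the `knows` link from alice to bob says nothing about a link from bob to alice.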
Resource Description Framework (RDF)
A family of international standards for data interchange on the Web produced by W3C. Resource Description Framework (RDF) is based on the idea of identifying things using Web identifiers or HTTP URIs, and describing resources in terms of simple properties and property values.
CC-BY-SA License
A form of Creative Commons license for resources released online. If a work is available under a CC-BY-SA license, you may include it in any other work provided that you give proper attribution. If you create derivative works (such as modified or extended versions), you must also license them under CC-BY-SA.
Ontology
A formal model that allows knowledge to be represented for a specific domain. An ontology describes the types of things that exist (classes), the relationships between them (properties) and the logical ways those classes and properties can be used together (axioms).
Internationalized Resource Identifier
A global identifier standardized by joint action of the World Wide Web Consortium and the Internet Engineering Task Force. An IRI may or may not be resolvable on the Web. IRIs generalize URIs by allowing characters from the Universal Character Set (Unicode), and they are slowly replacing URIs.
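RFC 3987 maps an IRI to a URI by encoding non-ASCII characters as UTF-8 and percent-encoding the resulting octets. A simplified Python sketch of that idea follows. It assumes the host name is already ASCII and does not handle internationalized domain names, which use a separate (IDNA) encoding.

```python
from urllib.parse import quote

# Characters with reserved or unreserved meaning in a URI are left untouched.
# '%' is included so that existing percent-escapes are not double-encoded.
URI_SAFE = ":/?#[]@!$&'()*+,;=-._~%"


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters as UTF-8 octets (simplified sketch).

    Assumption: the host part is already ASCII; IDNA is not applied.
    """
    return quote(iri, safe=URI_SAFE)  # quote() encodes str as UTF-8 by default
```

For example, `iri_to_uri("http://example.org/résumé")` yields `http://example.org/r%C3%A9sum%C3%A9`, since "é" is the two UTF-8 octets C3 A9.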
Persistent Identifier Scheme
A mechanism for resolution of virtual resources. Persistent Uniform Resource Locators (PURLs) implement one form of persistent identifier for virtual resources. PURLs are valid URLs and their components must map to the URL specification. The scheme part tells a computer program, such as a Web browser, which protocol to use when resolving the address. The scheme used for PURLs is generally HTTP. Other persistent identifier schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. All persistent identification schemes provide unique identifiers for (possibly changing) virtual resources, but not all schemes provide curation opportunities.
RDF-JSON
A name for one of the early proposals for serializing RDF in JavaScript Object Notation (JSON) [RFC4627], originally proposed as the Talis Platform API Output Type. RDF-JSON is still widely used. See also the more recent W3C documents describing a concrete syntax in JSON [RFC4627] for RDF as defined in the RDF Concepts and Abstract Syntax [RDF-CONCEPTS] W3C Recommendation, and JSON-LD.
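The sketch below builds the nested subject, predicate, object-list layout used by the RDF-JSON proposal, in which each object is described by a `type` (`uri`, `literal` or `bnode`), a `value` and optional `lang` or `datatype` keys. The helper name and the example.org URIs are hypothetical; check the layout against the specification before relying on it.

```python
import json
from collections import defaultdict

EX = "http://example.org/"           # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF namespace


def to_rdf_json(triples):
    """Group (subject, predicate, object_description) triples into RDF-JSON shape."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)
    # Convert the nested defaultdicts into plain dicts for serialization.
    return {s: dict(preds) for s, preds in grouped.items()}


triples = [
    (EX + "alice", FOAF + "name", {"type": "literal", "value": "Alice", "lang": "en"}),
    (EX + "alice", FOAF + "knows", {"type": "uri", "value": EX + "bob"}),
]
print(json.dumps(to_rdf_json(triples), indent=2))
```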
Namespace IRI
A namespace IRI is a base IRI shared by all terms in a given vocabulary or ontology.
Linked Data
A pattern for hyperlinking machine-readable data sets to each other using Semantic Web techniques, especially RDF and URIs. Linked Data enables distributed SPARQL queries of the data sets and a browsing or discovery approach to finding information (as compared to a search strategy). Linked Data is intended for access by both humans and machines. Linked Data uses the RDF family of standards for data interchange (e.g., RDF/XML, RDFa, Turtle) and query (SPARQL). If Linked Data is published on the public Web, it is generally called Linked Open Data.
Dublin Core Metadata Initiative
A public, not-for-profit organization with a mission to promote interoperable metadata design and innovative practice. The Dublin Core Metadata Initiative (DCMI) manages the long-term curation and development of metadata standards such as the Dublin Core Element Set and DCMI Metadata Terms.
Semantic Web Search Engine
A search engine capable of making use of semantic technologies to model its knowledge base and to deliver content.
Protocol
A set of instructions for transferring data from one computer to another over a network. A protocol standard defines both message formats and the rules for sending and receiving those messages. One of the most common Internet protocols is the Hypertext Transfer Protocol (HTTP).
Linked Data Platform
A specification that defines a REST API to read and write Linked Data for the purposes of enterprise application integration. The Linked Data Platform describes the use of a REST API for accessing, updating, creating and deleting resources from servers. See also [LDP-ONE].
N-Triples
A subset of Turtle that defines a line-based format to encode a single RDF graph. Used primarily as an exchange format for RDF data.
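To show the line-based layout, here is a small Python sketch that writes one triple per line in N-Triples style, terminating each with ` .`. It is deliberately incomplete, an assumption made for brevity: only IRIs and plain string literals are handled, with no blank nodes, language tags or datatypes.

```python
def escape_literal(text):
    """Escape characters that may not appear unescaped in an N-Triples string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def ntriples_line(subject, predicate, obj, is_literal=False):
    """Serialize one triple as a single N-Triples line ending in ' .'."""
    obj_term = f'"{escape_literal(obj)}"' if is_literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


def to_ntriples(triples):
    """Serialize (subject, predicate, object[, is_literal]) tuples, one per line."""
    return "\n".join(ntriples_line(*t) for t in triples) + "\n"
```

Because every line is a complete statement, N-Triples files can be split, concatenated or streamed line by line, which is part of why the format suits data exchange.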
Comma Separated Values (CSV)
A tabular data format in which columns of information are separated by comma characters. CSV files are a non-proprietary format and are considered 3-star data on the 5-star scale.
RDF database
A type of database designed specifically to store and retrieve RDF information. May be implemented as a triple store, quad store or other type.
Dublin Core Metadata Terms
A vocabulary of bibliographic terms used to describe both physical publications and those on the Web. It is an extended set of terms that goes beyond the basic terms found in the Dublin Core Metadata Element Set.
Content Negotiation
Also called "conneg"; the mechanism by which a client and a server agree on the format of a response. In the HTTP Protocol, the client uses a message header to indicate which response formats it will accept. Content negotiation allows HTTP servers to provide different versions of a resource representation in response to any given URI request. See also [HTTP Protocol 1.1]. See also Connection.
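A simplified server-side sketch of content negotiation in Python: it parses the `Accept` header, lets the most specific matching media range decide each candidate's quality value, and returns the preferred available format. Header parameters other than `q` are ignored, and the function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Rank ranges so that exact types beat type/*, which beats */*."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def covers(media_range, media_type):
    """True if media_range (e.g. 'text/*') matches media_type (e.g. 'text/turtle')."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def negotiate(accept_header, available):
    """Return the available media type with the highest quality, or None."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:  # earlier entries win ties (server preference)
        matching = [(specificity(r), q) for r, q in ranges if covers(r, media_type)]
        if not matching:
            continue
        q = max(matching)[1]  # most specific range decides; ties broken by higher q
        if q > best_q:
            best, best_q = media_type, q
    return best
```

For instance, `negotiate("text/turtle;q=0.9, application/rdf+xml;q=0.5", ["application/rdf+xml", "text/turtle"])` selects `text/turtle`. A `None` result corresponds to the case where a server might answer 406 Not Acceptable.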
API
An Application Programming Interface (API) is an abstraction implemented in software that defines how others should make use of a software package such as a library or other reusable program. APIs are used to provide developers access to data and functionality from a given system.
Resource Description Framework in attributes (RDFa)
An RDF syntax encoded in HTML documents. RDFa provides a set of markup attributes to augment the visual information on the Web with machine-readable hints. It is a standard of the World Wide Web Consortium.
RDF/XML
An RDF syntax encoded in XML. A standard of the W3C. [RDF]
ETL
An abbreviation for extract, transform, load. Linked Data developers routinely extract data from a relational database, transform data to RDF Triples, and load it into an RDF database.
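The three ETL steps can be sketched in Python with an in-memory SQLite table as the relational source. The `person` table, the `http://example.org/person/` base URI and the use of a Python set in place of an RDF database are all assumptions made to keep the example self-contained; the FOAF and `rdf:type` URIs are the standard ones.

```python
import sqlite3

BASE = "http://example.org/person/"  # hypothetical namespace for minted URIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def extract(conn):
    """Extract: read rows from the (hypothetical) relational table."""
    return conn.execute("SELECT id, name FROM person ORDER BY id").fetchall()


def transform(rows):
    """Transform: turn each row into RDF triples, minting one URI per primary key."""
    for pk, name in rows:
        subject = f"{BASE}{pk}"
        yield (subject, RDF_TYPE, FOAF_PERSON)
        yield (subject, FOAF_NAME, name)


def load(triples, store):
    """Load: add triples to a store; a Python set stands in for an RDF database."""
    store.update(triples)
    return store


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
store = load(transform(extract(conn)), set())
```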
Response
An action by a server taken as the result of a request by a client. In HTTP, a response provides a resource representation to the calling client.
REST API
An application programming interface (API) implemented using HTTP and the principles of REST to allow actions on Web resources. The most common actions are to create, retrieve, update and delete resources.
Representational State Transfer (REST)
An architectural style for information systems used on the Web. It explains some of the Web's key features, such as extreme scalability and robustness to change. REST is the foundation of the World Wide Web and the dominant Web service design model. The term "Representational State Transfer" was introduced and defined in 2000 by Roy Thomas Fielding in his doctoral dissertation. See also "Architectural Styles and the Design of Network-based Software Architectures" by Roy Thomas Fielding.
Semantic Web
An evolution or part of the World Wide Web that consists of machine-readable data in RDF and an ability to query that information in standard ways (e.g., via SPARQL).
Apache License
Apache License, version 2.0 is used for many Linked Data tools and projects. It is a popular Open Source license published by the Apache Software Foundation.
Controlled Vocabulary
Carefully selected sets of terms that are used to describe units of information; used to create taxonomies, thesauri and ontologies. In traditional settings the terms in a controlled vocabulary are words or phrases; in a Linked Data setting they are normally assigned unique identifiers (URIs), which in turn link to descriptive phrases.
CURIEs
Compact URI expressions (CURIEs) are an RDFa approach for shortening URIs.
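Expanding a CURIE amounts to looking up its prefix and concatenating the mapped namespace IRI with the reference part. A minimal Python sketch follows; the `ex` prefix is hypothetical, and refinements of the CURIE syntax such as default prefixes and bracketed safe CURIEs are ignored.

```python
# Prefix mappings: rdf, foaf and dcterms are widely used namespaces; "ex" is hypothetical.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "http://example.org/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' by concatenating the namespace IRI and the reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + reference
```

So `foaf:name` expands to `http://xmlns.com/foaf/0.1/name`, which also shows the role of a namespace IRI as the base shared by every term in a vocabulary.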
DCAT
Data Catalog Vocabulary (DCAT) is an RDF vocabulary. It is designed to facilitate interoperability between data catalogs published on the Web. See also Data Catalog Vocabulary (DCAT).
Data Cloud
Data cloud, also called the Linked Data Cloud, is a visual representation of datasets published as Linked Data. Many academic institutions republish data from their respective governments as Linked Data, often enhancing the representation in the process.
Machine Readable Data
Data formats that may be readily parsed by computer programs without access to proprietary libraries. For example, CSV, TSV and RDF formats are machine readable, but PDF and Microsoft Excel are not. Creating and publishing data following Linked Data principles helps search engines and humans find, access and re-use data. Once information is found, computer programs can re-use it without the need for custom scripts to manipulate the content. Publishing machine-readable data using Linked Data principles provides both a human-readable and a machine-readable version. For example, Wikipedia includes a Web page about the color Red. DBpedia, the database of structured content extracted from Wikipedia, allows a Linked Data client to look up "Red" [http://wikipedia.org/wiki/Red] on the dbpedia.org host by changing "wiki" to "data" and appending the appropriate file extension:
$ curl -L http://dbpedia.org/data/Red.ttl
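The URL rewriting described for the "Red" example can be expressed as a small Python helper. The function name is hypothetical, and it assumes DBpedia still serves data at `http://dbpedia.org/data/<title>.<extension>`, as in the curl command.

```python
from urllib.parse import urlparse


def dbpedia_data_url(wikipedia_url, extension="ttl"):
    """Map a Wikipedia article URL to the DBpedia data URL (hypothetical helper)."""
    path = urlparse(wikipedia_url).path  # e.g. "/wiki/Red"
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    title = path[len(prefix):]
    return f"http://dbpedia.org/data/{title}.{extension}"


# Fetching needs network access, so it is only shown as a comment:
# import urllib.request
# with urllib.request.urlopen(dbpedia_data_url("http://wikipedia.org/wiki/Red")) as resp:
#     print(resp.read()[:200])
```

Like `curl -L`, `urllib.request.urlopen` follows HTTP redirects by default.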
Data Modeling
Data modeling is a process of organizing data and information describing it into a faithful representation of a specific domain of knowledge. Linked data modeling applies modeling techniques based on Linked Data Principles.
Provenance
Data related to where, when and how information was acquired.
Description Logic
Description Logic (DL) is a family of knowledge representation languages with varying and adjustable expressivity. DL is used in artificial intelligence for formal reasoning on the concepts of an application domain. The Web Ontology Language (OWL) provides a standards-based way to exchange ontologies and includes a Description Logic semantics as well as an RDF based semantics. Biomedical informatics applications often use DL for codification of healthcare and life sciences knowledge.
Document Type Definition
Document Type Definition (DTD) refers to a type of schema for defining a markup language, such as in XML or HTML (or their predecessor SGML).
Domain Name System (DNS)
Domain Name System (DNS) refers to the Internet's mechanism for mapping between a human-readable host name (e.g. www.example.com) and an Internet Protocol (IP) Address (e.g. 203.20.51.10).
Dublin Core Metadata Element Set
Dublin Core Metadata Element Set refers to a vocabulary of fifteen properties for use in resource descriptions, such as may be found in a library card catalog (creator, publisher, etc.). The Dublin Core Metadata Element Set, also known as "DC Elements", is the most commonly used vocabulary for Linked Data applications.
Free/Libre/Open Source Software
Free/Libre/Open Source Software (FLOSS) is a generic, internationalized term for software released under an Open Source license.
Natural Keys
Human-readable categories and sub-identifiers within a URI that reflect what the identifier describes. They are recommended when creating URIs so that people reading RDF in its source format (mostly developers) will be able to more quickly understand it.
HyperText Transfer Protocol
HyperText Transfer Protocol (HTTP) is the standard transmission protocol [RFC2616] used on the World Wide Web to transfer hypertext requests and information between Web servers and Web clients (such as browsers). It is an IETF standard.
Internet Engineering Task Force
IETF is an open international community concerned with the evolution of Internet architecture and the operation of the Internet. It has defined standards such as HTTP and DNS.
International Organization for Standardization
ISO refers to a network of the national standards institutes of over 160 countries that cooperate to define international standards. It defines many standards including, in the linked data context, formats for dates and currency.
Resource
In an RDF context, a resource can be anything that an RDF graph describes. A resource can be addressed by a Uniform Resource Identifier (URI).
Object
In the context of RDF, the object is the final part of an RDF statement.
Entity
In the sense of an entity-attribute-value model, an entity is synonymous with the Subject of an RDF Triple.
What earns 5 stars in 5 Star Open Data?
In your RDF, use identifiers that are links (URLs) to useful data sources.
Inference
Inference is the process of deriving logical conclusions from a set of starting assumptions. Using Linked Data, existing relationships are modeled as a set of (named) relationships between resources. Linked Data helps humans and machines to find new relationships through automatic procedures that generate new relationships based on the data and based on some additional information in the form of a vocabulary.
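A toy illustration of inference in Python: two RDFS-style rules (transitivity of `rdfs:subClassOf`, and propagation of `rdf:type` to superclasses) are applied repeatedly until no new triples appear. The compact `ex:` names are hypothetical, and real reasoners implement many more rules far more efficiently.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(triples):
    """Forward-chain two rules until a fixed point is reached:
    (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    (x type A) and (A subClassOf B)        =>  (x type B)
    """
    inferred = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        new = set()
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in inferred:
            if p == RDF_TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= inferred:  # nothing new: stop
            return inferred
        inferred |= new


data = [
    ("ex:Tiger", SUBCLASS, "ex:Cat"),
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:shere_khan", RDF_TYPE, "ex:Tiger"),
]
result = infer_types(data)
```

The data never states that `ex:shere_khan` is a mammal, yet `("ex:shere_khan", "rdf:type", "ex:Mammal")` appears in `result` because it follows from the class hierarchy.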
Metadata
Information used to administer, describe, preserve, present, use or link other information held in resources, especially knowledge resources, be they physical or virtual. Metadata may be further subcategorized into several types (including general, access and structural metadata). Linked Data incorporates human- and machine-readable metadata along with it, making it self-describing.
Metadata Object Description Schema
It is a bibliographic description system intended to be a compromise between MARC and DC metadata. It is implemented in XML Schema. See DC, MARC, XSD.
JSON
JavaScript Object Notation (JSON) is a syntax for storing and exchanging text-based information. JSON has proven to be a highly useful and popular object serialization and messaging format for the Web. See also: the application/json Media Type for JavaScript Object Notation (JSON) [RFC4627].
JSON-LD
JavaScript Object Notation for Linking Data (JSON-LD) [JSON-LD] is a language-independent data format for representing Linked Data, based on JSON. JSON-LD is capable of serializing any RDF graph or dataset, and most, but not all, JSON-LD documents can be directly transformed to RDF. JSON-LD syntax is easy for humans to read and write, as well as easy for machines to parse and generate. JSON-LD is an appropriate Linked Data interchange language for JavaScript environments, Web services and NoSQL databases.
Creative Commons Licenses
Licenses that include legal statements by the owner of copyright in intellectual property specifically allowing people to use or redistribute the copyrighted work in accordance with conditions specified therein.
Linked Open Data
Linked Data published on the public Web and licensed under one of several open licenses permitting reuse. Publishing Linked Open Data enables distributed SPARQL queries of the data sets and a "browsing" or "discovery" approach to finding information, as compared to a search strategy. See also: "Linked Data: Structured Data on the Web" [LD-FOR-DEVELOPERS] and "Linked Data: Evolving the Web into a Global Data Space" [HOWTO-LODP]
Government Open Data
Many government authorities have mandated publication of data to the public Web. The broad intention is to facilitate the maintenance of open societies and support governmental accountability and transparency initiatives. To realize the goals of improved efficiency, transparency and accountability, re-use of structured content available on the Web is enhanced by following Linked Data Principles.
Modeling Process
Modeling process in the context of RDF refers to subject matter experts working with developers to capture the context of data and define the relationships within it. This yields higher-quality Linked Data, because capturing organizational knowledge about the meaning of the data within the RDF data model makes the data more likely to be reused correctly. Well-defined context ensures better understanding and proper reuse, and is critical when establishing linkages to other data sets.
N3
N3 is an abbreviation for Notation3, a readable RDF syntax used for expressing assertions and logic. N3 [N3] is a superset of RDF, extending the RDF model by adding formulae (literals which are graphs themselves), variables, logical implication and functional predicates. See also [Turtle].
ORG Ontology
ORG is an RDF vocabulary to enable publication of information about organizations and organizational structures, even at governmental level.
Data Warehouse
One approach to data integration in which data from various operational data systems is extracted, cleaned, transformed and copied to a centralized repository. The centralized repository can then be used for data mining or answering analytical queries. By contrast, Linked Data assumes a distributed approach to data management, using HTTP URIs to describe and access information resources. A Linked Data approach is seen as a valid alternative to the centralized data warehouse approach, especially when integrating datasets available on the public Web.
Query
Programmatic retrieval of resources and their relationships. Using the SPARQL language, developers issue queries based on (triple) patterns.
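The core of a SPARQL query is a basic graph pattern: a set of triple patterns whose variables must bind consistently. The naive Python sketch below evaluates such a pattern over an in-memory list of triples; the data and function names are hypothetical, and a real SPARQL engine adds indexing, filters, optional patterns and much more.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on failure.
    Terms starting with '?' are variables, as in SPARQL."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if p_term in result and result[p_term] != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def evaluate(patterns, triples):
    """Return every binding that satisfies all triple patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]
# Roughly: SELECT * WHERE { ?x foaf:knows ?y . ?y foaf:name ?name }
results = evaluate([("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?name")], TRIPLES)
```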
Linked Data Principles
The Linked Data Principles provide a common API for data on the Web, which is more convenient than many separately and differently designed APIs published by individual data suppliers. Tim Berners-Lee proposed the following principles upon which Linked Data is based: (1) Use URIs to name things; (2) Use HTTP URIs so that things can be referred to and looked up ("dereferenced") by people and user agents; (3) When someone looks up a URI, provide useful information, using open Web standards such as RDF and SPARQL; (4) Include links to other related things using their URIs when publishing on the Web.
What earns 1 star in 5 Star Open Data?
Publish data on the Web in any format (e.g., PDF, JPEG) accompanied by an explicit Open License (expression of rights).
What earns 4 stars in 5 Star Open Data?
Publish structured data on the Web as RDF (e.g., Turtle, RDFa, JSON-LD, or via a SPARQL endpoint).
What earns 3 stars in 5 Star Open Data?
Publish structured data on the Web in a documented, non-proprietary data format (e.g., CSV, KML).
What earns 2 stars in 5 Star Open Data?
Publish structured data on the Web in a machine-readable format (e.g., XML).
R2RML
R2RML (RDB to RDF Mapping Language) is a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author's choice.
Raw Data
Raw data refers to machine-readable files released as-is, without any specific effort to make them applicable to a particular application. Raw data typically requires additional scripts or programs to process the data.
5 Star Linked Open Data
Refers to an incremental framework for deploying data. Tim Berners-Lee, the inventor of the Web and initiator of the Linked Data project, suggested a 5 star deployment scheme for Linked Open Data. The 5 Star Linked Data system is cumulative: each additional star presumes the data meets the criteria of the previous step(s). 5 Star Linked Open Data includes an Open License (expression of rights) and assumes publication on the public Web. Organizations may instead elect to publish 5 Star Linked Data, without the word "open"; this implies that the data does not include an Open License (expression of rights) and is not necessarily published on the public Web.
Open Government Data
Refers to content that is published on the public Web by government authorities in a variety of non-proprietary formats.
Request
Request refers to a stage in the HTTP protocol. A request message from a client to a server includes, within the first line of that message, the method to be applied to the resource, the identifier of the resource, and the protocol version in use.
Schema
Schema refers to a data model that represents the relationships between a set of concepts. Some types of schemas include relational database schemas (which define how data is stored and retrieved), taxonomies and ontologies.
Sesame
Sesame is an Open Source Software implementation of a Semantic Web development framework. It supports the storage, retrieval and analysis of RDF information. See also [Open RDF].
Sindice
Sindice was a search engine for Linked Data. It offered search and querying capabilities across the data it knew about, as well as specialized APIs and tools for presenting Linked Data summaries.
Semantic Web Standards
Standards of the World Wide Web Consortium relating to the Semantic Web, including RDF [RDF], RDFa [RDFa-PRIMER], SKOS [SKOS-REFERENCE], OWL [OWL2] and SPARQL 1.1 Overview [SPARQL-11].
Data Hub, The
The Data Hub is a specific site offering a community-run catalogue of data sets on the Internet, powered by the open-source data portal platform CKAN. The Data Hub is an openly editable open data catalogue in the style of Wikipedia.
Message
The basic unit of HTTP communication. It consists of a structured sequence of octets matching the syntax defined as an HTTP Message and transmitted via the connection.
Semantic Technologies
The broad set of technologies that relate to the extraction, representation, storage, retrieval and analysis of machine-readable information.
Predicate
The middle term (the linkage, or "verb") in an RDF statement. For example, in the statement "Alice knows Bob", "knows" is the predicate that connects "Alice" (the subject of the statement) to "Bob" (the object of the statement).
Fragment Identifier
The part of an HTTP URI that follows a hash symbol ('#'). Fragment identifiers are not passed to Web servers by Web clients such as Web browsers.
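Python's standard `urllib.parse.urldefrag` performs the same split a browser makes before sending a request. The sketch uses it to show that several hash URIs can share one retrievable document; the function names and example URIs are placeholders.

```python
from urllib.parse import urldefrag


def split_fragment(uri):
    """Return (part sent to the server, fragment kept by the client)."""
    result = urldefrag(uri)
    return result.url, result.fragment


def documents_to_fetch(uris):
    """Distinct documents a client must request to look up a set of hash URIs."""
    return sorted({urldefrag(u).url for u in uris})
```

Looking up both `http://example.org/people#alice` and `http://example.org/people#bob` therefore requires only one request, for `http://example.org/people`.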
HyperText Markup Language
The predominant markup language for hypertext pages on the Web. HyperText Markup Language (HTML) defines the structure of Web pages and is a family of W3C standards.
RDF Schema
The simplest RDF vocabulary description language. It provides much less descriptive capability than the Simple Knowledge Organization System (SKOS) or the Web Ontology Language (OWL). A standard of the W3C [RDFS]
Persistent Uniform Resource Locator
URLs that act as permanent identifiers in the face of a dynamic and changing Web infrastructure. Persistent Uniform Resource Locators (PURLs) redirect to the current location of or proxy specific Web content. A user of a PURL always uses the same Web address, even though the resource in question may have moved or changed ownership.
Dereferenceable URIs
When an HTTP client can look up a URI using the HTTP protocol and retrieve a description of the resource, the URI is called a dereferenceable URI. The term applies both to URIs that identify classic HTML documents and to URIs that are used in the Linked Data context [COOL-SWURIS] to identify real-world objects and abstract concepts.
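A client-side sketch of dereferencing in Python: build a GET request that asks for Turtle via the `Accept` header. The function name and example URI are hypothetical. Actually sending the request needs network access, so that part is left as a comment.

```python
import urllib.request


def dereference_request(uri, accept="text/turtle"):
    """Build (but do not send) an HTTP GET asking for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


# Sending the request (requires network access):
# with urllib.request.urlopen(dereference_request("http://dbpedia.org/resource/Red")) as resp:
#     print(resp.headers.get("Content-Type"))
```

`urlopen` follows redirects automatically, including the 303 See Other responses that [COOL-SWURIS] describes for URIs identifying real-world objects.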