Tech Terms
Distributed Computing
When the capacity of a single server is reached, you need to "scale out" and distribute the load across multiple servers.
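A minimal sketch of how load might be spread across servers, assuming simple hash-based routing. The server names and the `pick_server` helper are hypothetical, and real systems often use more elaborate schemes such as consistent hashing.

```python
import hashlib
from typing import Dict, List


def pick_server(key: str, servers: List[str]) -> str:
    """Route a key to one server by hashing it (plain modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Hypothetical server names, used only for illustration.
servers = ["server-a", "server-b", "server-c"]

# Count how many of 1000 made-up user keys land on each server.
load: Dict[str, int] = {name: 0 for name in servers}
for i in range(1000):
    load[pick_server(f"user-{i}", servers)] += 1

print(load)
```

The same key always maps to the same server, so each request for that key can be sent to one machine without any shared lookup table.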
Database Tuning
Database Tuning is the activity of making a database application run more quickly. "More quickly" usually means higher throughput, though it may mean lower response time for time-critical applications.
Types of NoSQL Databases (4)
(1) *Key-Value Stores* - These databases pair keys to values. Examples are: Dynamo, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS, HyperDex, Azure Table Storage
(2) *Graph Stores* - These excel at dealing with interconnected data. Graph databases consist of connections, or edges, between nodes. Examples include: AllegroGraph, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog, Sesame
(3) *Column Stores* - Relational databases store all the data in a particular table's rows together on-disk, making retrieval of a particular row fast. Column-family databases generally serialize all the values of a particular column together on-disk, which makes retrieval of a large amount of a specific attribute fast. Examples include: Accumulo, Cassandra, Druid, HBase, Vertica
(4) *Document Stores* - These databases store records as "documents" where a document can generally be thought of as a grouping of key-value pairs. Keys are always strings, and values can be stored as strings, numbers, Booleans, arrays, and other nested key-value pairs. Examples include: Lotus Notes, Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB, Qizx, Cloudant, Azure DocumentDB
Relational vs Non-Relational Data
*Relational databases* like MySQL, PostgreSQL and SQLite3 represent and store data in tables and rows. They're based on a branch of algebraic set theory known as relational algebra. Meanwhile, *non-relational databases* like MongoDB represent data in collections of JSON documents.
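As a rough illustration of the difference, the sketch below stores the same made-up user once as rows in two related tables (using Python's built-in `sqlite3` module) and once as a single JSON document. The table and field names are assumptions chosen for the example.

```python
import json
import sqlite3

# Relational form: data lives in tables and rows, linked by keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE emails (user_id INTEGER REFERENCES users(id), address TEXT NOT NULL)"
)
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [(1, "a@example.com"), (1, "alice@example.org")],
)

# A join reassembles the user from the two tables.
rows = conn.execute(
    "SELECT u.name, e.address FROM users u "
    "JOIN emails e ON e.user_id = u.id ORDER BY e.address"
).fetchall()

# Document form: the same user as one self-contained JSON document.
doc = {"_id": 1, "name": "Alice", "emails": ["a@example.com", "alice@example.org"]}
doc_json = json.dumps(doc)

print(rows)
print(doc_json)
```

The relational version needs a join to bring the addresses back together, while the document version keeps them nested inside the record.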
Bare Metal
A 'bare-metal server' is a computer server that is a 'single-tenant physical server'. The term is used nowadays to distinguish it from modern forms of virtualization and cloud hosting. Bare-metal servers have a single 'tenant'. They are not shared between customers. A bare metal compute instance gives you dedicated physical server access for highest performance and strong isolation.
Non-Relational Databases
A Non-Relational (NoSQL) Database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. The most popular NoSQL databases include MongoDB, DocumentDB, Cassandra, Couchbase, HBase, Redis, and Neo4j. These databases are usually grouped into four categories: Key-value stores, Graph stores, Column stores, and Document stores.
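The four categories can be pictured with plain Python data structures. This is only a sketch of the data models, not of how any particular product stores data; all names and values are invented for illustration.

```python
# Key-value store: a value is looked up by its key.
key_value = {"session:42": "alice"}

# Document store: a record is a nested grouping of key-value pairs.
document = {
    "_id": "u1",
    "name": "Alice",
    "tags": ["admin"],
    "address": {"city": "Oslo"},
}

# Graph store: nodes plus the edges that connect them.
edges = [("alice", "follows", "bob"), ("bob", "follows", "carol")]


def neighbors(node):
    """Return the nodes reachable from `node` over one edge."""
    return [dst for src, _, dst in edges if src == node]


# Column store: values of one attribute are kept together,
# instead of keeping each row together.
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Carol", "age": 35},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Reading a single attribute touches only one list.
average_age = sum(columns["age"]) / len(columns["age"])

print(neighbors("alice"), columns["age"], average_age)
```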
Virtual Machine (VM)
A Virtual Machine (VM) is an independent computing environment that runs on top of physical bare metal hardware. The virtualization makes it possible to run multiple VMs that are isolated from each other. VMs are ideal for running applications that do not require the performance and resources (CPU, memory, network bandwidth, storage) of an entire physical machine. An Oracle Cloud Infrastructure VM compute instance runs on the same hardware as a Bare Metal instance, leveraging the same cloud-optimized hardware, firmware, software stack, and networking infrastructure.
Blockchain
A blockchain, originally block chain, is a continuously growing list of records, called blocks, which are linked and secured using cryptography.
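A toy sketch of the linking idea, assuming each block stores the SHA-256 hash of the previous block. The helper names and record strings are hypothetical, and the sketch leaves out everything a real blockchain adds on top (consensus, signatures, proof of work).

```python
import copy
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents in a stable key order."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that records the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still points at its predecessor's hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain: list = []
for record in ["genesis", "alice pays bob 5", "bob pays carol 2"]:
    append_block(chain, record)

# Changing an earlier record breaks the link stored in the next block.
tampered = copy.deepcopy(chain)
tampered[1]["data"] = "alice pays bob 500"

print(is_valid(chain), is_valid(tampered))
```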
Computer Cluster
A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.
Service Oriented Architecture (SOA)
A service-oriented architecture (SOA) is a style of software design where services are provided to the other components by application components, through a communication protocol over a network. The basic principles of service-oriented architecture are independent of vendors, products and technologies. A service is a discrete unit of functionality that can be accessed remotely and acted upon and updated independently, such as retrieving a credit card statement online. A service has four properties according to one of many definitions of SOA: It logically represents a business activity with a specified outcome. It is self-contained. It is a black box for its consumers. It may consist of other underlying services.
Data Warehouse
A data warehouse (DW) is a system used for *reporting* and *data analysis* and is considered a core component of *business intelligence*. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used to create analytical reports for workers throughout the enterprise. The typical extract, transform, load (ETL)-based data warehouse uses staging, data integration, and access layers to house its key functions.
Agile Development
Agile software development describes an approach to software development under which requirements and solutions evolve through the collaborative effort of self-organizing cross-functional teams and their customer/end users.
Application Server
An application server is a software framework that provides both facilities to create web applications and a server environment to run them. Application server frameworks contain a comprehensive service layer model. An application server acts as a set of components accessible to the software developer through a standard API defined for the platform itself. For Web applications, these components usually run in the same environment as their web server(s), and their main job is to support the construction of dynamic pages. However, many application servers target much more than just Web page generation: they implement services like clustering, fail-over, and load-balancing, so developers can focus on implementing the business logic. In the case of Java application servers, the server behaves like an extended virtual machine for running applications, transparently handling connections to the database on one side, and, often, connections to the Web client on the other.
Integrated Development Environment (IDE)
An integrated development environment is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of a source code editor, build automation tools, and a debugger.
Apache Hadoop
Apache Hadoop is an open-source software framework used for distributed storage and processing of datasets of big data using the MapReduce programming model. It consists of computer clusters built from commodity hardware.
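The MapReduce programming model can be sketched in plain Python as a word count with separate map, shuffle, and reduce steps. This does not use any Hadoop API; the function names and input lines are assumptions, and on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big clusters", "data moves to clusters"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```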
Apache Kafka
Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka is used for building real-time data pipelines and streaming apps. It is horizontally scalable, fault-tolerant, fast, and runs in production at thousands of companies.
Apache Spark
Apache Spark is an open-source cluster-computing framework.
API
Application Programming Interface
Continuous Delivery/Deployment (CD)
Continuous Delivery (CD) is the continual delivery of code to an environment once the developer feels the code is ready to ship.
Continuous Integration (CI)
Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.
Database Virtualization
Database virtualization is the decoupling of the database layer, which lies between the storage and application layers within the application stack. ... This enables both the sharing of single server resources for multi-tenancy, as well as the pooling of server resources into a single logical database or cluster.
Elastic Compute
Elastic computing is a concept in cloud computing in which computing resources can be scaled up and down easily by the cloud service provider. Elastic computing is the ability of a cloud service provider to provision flexible computing power when and wherever required.
XML
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Non-relational databases usually work with JSON or XML.
JSON
JSON, or JavaScript Object Notation, is a minimal, readable format for structuring data (considered semi-structured data). It is used primarily to transmit data between a server and web application, as an alternative to XML. Non-relational databases usually work with JSON or XML.
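To compare the two formats side by side, the sketch below encodes the same made-up record as XML and as JSON and parses both with Python's standard library. The element and field names are assumptions for the example.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, once as XML and once as JSON.
xml_text = (
    '<user id="1">'
    "<name>Alice</name>"
    "<email>a@example.com</email>"
    "<email>alice@example.org</email>"
    "</user>"
)
json_text = '{"id": "1", "name": "Alice", "emails": ["a@example.com", "alice@example.org"]}'

# XML: walk the element tree and pick out attributes and child text.
root = ET.fromstring(xml_text)
from_xml = {
    "id": root.get("id"),
    "name": root.findtext("name"),
    "emails": [element.text for element in root.findall("email")],
}

# JSON: one call yields nested dictionaries and lists directly.
from_json = json.loads(json_text)

print(from_xml == from_json)
```

JSON maps directly onto dictionaries and lists, while the XML version needs explicit code to decide which children become fields and which become list items.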
Machine Learning
Machine learning is a field of computer science that gives computer systems the ability to "learn" with data, without being explicitly programmed.
OLAP
OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used in data mining. An OLAP database holds aggregated, historical data stored in multi-dimensional schemas (usually a star schema).
OLTP
OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF).
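The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module: several short write transactions (OLTP style) followed by one aggregating query (OLAP style). The `orders` table and its values are invented, and a real OLAP system would normally use a separate, differently organized database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)

# OLTP style: many short transactions, each touching one row.
for region, amount in [("north", 10.0), ("south", 20.0), ("north", 5.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
with conn:
    conn.execute("UPDATE orders SET amount = 25.0 WHERE id = 2")

# OLAP style: one query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

print(totals)
```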
Container Database (CDB)
On the surface, a container database (CDB) seems very similar to a conventional Oracle database, as it contains most of the working parts you will already be familiar with (controlfiles, datafiles, undo, tempfiles, redo logs, etc.). It also houses the data dictionary for those objects that are owned by the root container and those that are visible to all Pluggable Databases (PDBs).
Database Diagnostics
Oracle Diagnostics Pack offers a comprehensive set of automatic performance diagnostics and monitoring functionality built into the core database engine and Oracle Enterprise Manager.
Pluggable Database
Oracle introduced the "container database" concept with release 12c. A container database is the main "physical" database and can contain "pluggable" databases; this is also called the multitenant architecture. Each pluggable database is logically whole, but can share memory, undo, redo, archived logs, etc. At least initially, this can lower the resources required.
Deep Learning
Part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised.
Polyglot
Polyglot is the ability to speak multiple languages. Typically, polyglot is a term used in the context of Platform as a Service (PaaS). But whether it is Cloud or not, developers today need to learn more than one language and runtime.
RESTful Programming
REST (*RE*presentational *S*tate *T*ransfer) is an architectural style for designing networked applications. It relies on a stateless, client-server, cacheable communications protocol, and in virtually all cases the HTTP protocol is used.
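The stateless, resource-oriented idea can be sketched without a web server as a single function that maps an HTTP-style method and path to a response. The `/notes/<id>` resource, the `handle` function, and the in-memory `notes` dictionary are all hypothetical; a real service would sit behind an HTTP framework.

```python
import json
from typing import Dict, Optional, Tuple


def handle(
    method: str, path: str, store: Dict[str, dict], body: Optional[str] = None
) -> Tuple[int, str]:
    """Handle one request; everything it needs arrives in the arguments."""
    parts = [part for part in path.split("/") if part]
    if len(parts) != 2 or parts[0] != "notes":
        return 404, json.dumps({"error": "not found"})
    note_id = parts[1]

    if method == "GET":
        if note_id not in store:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(store[note_id])
    if method == "PUT":
        store[note_id] = json.loads(body or "{}")
        return 200, json.dumps(store[note_id])
    if method == "DELETE":
        if store.pop(note_id, None) is None:
            return 404, json.dumps({"error": "not found"})
        return 204, ""
    return 405, json.dumps({"error": "method not allowed"})


notes: Dict[str, dict] = {}
created = handle("PUT", "/notes/1", notes, '{"text": "hi"}')
fetched = handle("GET", "/notes/1", notes)
deleted = handle("DELETE", "/notes/1", notes)
missing = handle("GET", "/notes/1", notes)

print(created, fetched, deleted, missing)
```

Because no request depends on what the handler remembers from an earlier one, any server holding the same data could answer it.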
Relational Databases
Relational databases, which can also be called relational database management systems (RDBMS) or Structured Query Language (SQL) databases, usually work with structured data. The most popular of these are Microsoft SQL Server, Oracle Database, MySQL, and IBM DB2. These RDBMSs are mostly used in large enterprise scenarios, with the exception of MySQL, which is mostly used to store data for web applications, typically as part of the popular LAMP stack (Linux, Apache, MySQL, PHP/Python/Perl).
Pluggable Database (PDB)
Since the CDB contains most of the working parts for the database, the PDB only needs to contain information specific to itself. It does not need to worry about controlfiles, redo logs, undo, etc. Instead, it is just made up of datafiles and tempfiles to handle its own objects. This includes its own data dictionary, containing information about only those objects that are specific to the PDB. From Oracle 12.2 onward a PDB can, and should, have a local undo tablespace.
Structured vs Unstructured Data
Structured data is comprised of clearly defined data types whose pattern makes them easily searchable, while unstructured data - "everything else" - is comprised of data that is usually not as easily searchable, including formats like audio, video, and social media postings.
Structured Data
Structured data usually resides in relational databases (RDBMS). Fields store length-delineated data: phone numbers, Social Security numbers, or ZIP codes. Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity. Structured Query Language (SQL) enables queries on this type of structured data within relational databases.
Database Schema
The database schema of a database system is its structure described in a formal language supported by the database management system (DBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases). The formal definition of a database schema is a set of formulas (sentences) called integrity constraints imposed on a database. These integrity constraints ensure compatibility between parts of the schema.
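A small sketch of a schema with integrity constraints, using Python's built-in `sqlite3` module. The tables, columns, and the `try_insert` helper are made up for illustration; the point is only that the database rejects rows that violate the declared constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# The schema: tables, column types, and integrity constraints.
conn.executescript(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    """
)
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.commit()


def try_insert(customer_id: int, quantity: int) -> bool:
    """Return True if the order satisfies every constraint."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)",
                (customer_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, 3)        # valid row
bad_quantity = try_insert(1, 0)    # violates CHECK (quantity > 0)
bad_customer = try_insert(99, 1)   # violates the foreign key

print(accepted, bad_quantity, bad_customer)
```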
Unstructured Data
Unstructured data is essentially everything else. Unstructured data has internal structure but is not structured via pre-defined data models or schema. It may be textual or non-textual, and human- or machine-generated. It may also be stored within a non-relational (NoSQL) database.
Typical human-generated unstructured data includes:
Text files: Word processing, spreadsheets, presentations, email, logs.
Email: Email has some internal structure thanks to its metadata, and we sometimes refer to it as semi-structured. However, its message field is unstructured and traditional analytics tools cannot parse it.
Social Media: Data from Facebook, Twitter, LinkedIn.
Website: YouTube, Instagram, photo sharing sites.
Mobile data: Text messages, locations.
Communications: Chat, IM, phone recordings, collaboration software.
Media: MP3, digital photos, audio and video files.
Business applications: MS Office documents, productivity applications.
Typical machine-generated unstructured data includes:
Satellite imagery: Weather data, land forms, military movements.
Scientific data: Oil and gas exploration, space exploration, seismic imagery, atmospheric data.
Digital surveillance: Surveillance photos and video.
Sensor data: Traffic, weather, oceanographic sensors.