db final


26.6 - Discuss how time is represented in temporal databases and compare the different time dimensions.

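In temporal databases, time is considered to be an ordered sequence of points in some granularity that is determined by the application. The main time dimensions are valid time (when a fact is true in the real world), transaction time (when the fact is current in the database system), and bitemporal time (both dimensions recorded together).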

25.11. What are the core properties of a job in MapReduce?

The core properties of a MapReduce job are: the Map task; the Reduce task; the input that the job is to run on, typically specified as one or more HDFS paths; the format (structure) of the input; the output path; the output structure; and the reduce-side parallelism.

24.2. What are the main categories of NOSQL systems? List a few of the NOSQL systems in each category.

1. Document-based NOSQL systems: These systems store data in the form of documents using well-known formats, such as JSON (JavaScript Object Notation). Documents are accessible via their document id, but can also be accessed rapidly using other indexes.
2. NOSQL key-value stores: These systems have a simple data model based on fast access by the key to the value associated with the key; the value can be a record, an object, a document, or an even more complex data structure.
3. Column-based or wide column NOSQL systems: These systems partition a table by column into column families, where each column family is stored in its own files. They also allow versioning of data values.
4. Graph-based NOSQL systems: Data is represented as graphs, and related nodes can be found by traversing the edges using path expressions.
5. Hybrid NOSQL systems: These systems have characteristics from two or more of the above four categories.
6. Object databases
7. XML databases

30.4 - Discuss the types of privileges at the account level and those at the table level.

Account level: The DBA specifies the particular privileges that each account holds independently of the tables in the database. Examples: the CREATE SCHEMA or CREATE TABLE privilege; the CREATE VIEW privilege; the ALTER privilege, to apply schema changes such as adding or removing attributes from relations; the DROP privilege; the MODIFY privilege; and the SELECT privilege.
Table level: Each table T is assigned an owner account, and the owner is given all privileges on that table. The owner can grant privileges on any owned table to other users, such as the SELECT (retrieval or read) privilege on T, the modification privilege on T, and the references privilege on T. Privileges at the table level specify, for each user, the individual tables on which each type of command can be applied. Some privileges also refer to individual columns (attributes) of tables.

24.12. What are the main CRUD operations in HBase? Discuss the storage and distributed system methods used in HBase.

Create: creates a new table, specifying one or more column families at the time of creation.
Put: inserts new data items into the data cells of a table, or enters new versions of existing data items.
Get: fetches the data items associated with a single row of a table.
Scan: fetches the data items from all rows of a table.
The HBase data model consists of a set of tables, each with column families and rows. Each table must have a primary key; in HBase the row key acts as the primary key, and any access to an HBase table uses it. Each column denotes an attribute of the corresponding object.
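The four operations above can be sketched with a toy in-memory table. This is only an illustration of the semantics described in the answer, not the HBase client API: the class `ToyHBaseTable`, its method names, and the integer version counter standing in for timestamps are all assumptions made for the example.

```python
import itertools


class ToyHBaseTable:
    """Hypothetical in-memory stand-in for an HBase table (not the real API)."""

    def __init__(self, name, column_families):
        # Create: column families are fixed when the table is created.
        self.name = name
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": [(version, value), ...]}
        self._rows = {}
        # Assumption: a simple counter plays the role of cell timestamps.
        self._clock = itertools.count(1)

    def put(self, row_key, column, value):
        """Put: add a value, or a new version of an existing value."""
        family = column.split(":", 1)[0]
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        cells = self._rows.setdefault(row_key, {})
        cells.setdefault(column, []).append((next(self._clock), value))

    def get(self, row_key):
        """Get: latest version of every cell in a single row."""
        cells = self._rows.get(row_key, {})
        return {col: versions[-1][1] for col, versions in cells.items()}

    def scan(self):
        """Scan: every row, visited in row-key order."""
        for row_key in sorted(self._rows):
            yield row_key, self.get(row_key)


table = ToyHBaseTable("EMPLOYEE", ["Name", "Details"])
table.put("row2", "Name:Fname", "Alicia")
table.put("row1", "Name:Fname", "John")
table.put("row1", "Name:Fname", "Jon")      # new version of the same cell
table.put("row1", "Details:Job", "Engineer")
print(table.get("row1"))
print(list(table.scan()))
```

Every access goes through the row key, which mirrors the statement that the row key acts as the primary key.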

16.11 - Describe DBA operations as database life cycle.

DBA operations are commonly defined and divided according to the phases of the Database Life Cycle (DBLC):
Database planning, including standards, procedures, and enforcement
Database requirements gathering and conceptual design
Database logical and transaction design
Database physical design and implementation
Database testing and debugging
Database operations and maintenance
Database training and support
Data quality monitoring and management

24.9. Discuss the data modeling concepts in DynamoDB and Data types of DynamoDB.

The data model is based on the concepts of tables, items, and attributes. A table is a collection of self-describing items. Each item consists of (attribute, value) pairs, and attribute values can be single-valued or multivalued. Each table has a primary key for locating items in the table. The primary key can be one of the following types:
Single attribute: used by the DynamoDB system to build a hash index on the items in the table; known as a hash type primary key.
Pair of attributes (A, B): known as a hash and range type primary key. Attribute A is used for hashing, and because several items can have the same A value, the B values are used to order the items that share the same A value.
Attribute data types:
Scalar: these types represent a single value, and include number, string, binary, Boolean, and null.
Document: these types represent a complex structure with nested attributes, and include lists and maps.
Set: these types represent multiple scalars, and include string sets, number sets, and binary sets.
Action data types:
AttributeDefinition: represents an attribute used to describe the key schema of a table and its indexes.
Capacity: represents the quantity of throughput consumed by a table or index.
CreateGlobalSecondaryIndexAction: represents a new global secondary index added to a table.
LocalSecondaryIndex: represents local secondary index properties.
ProvisionedThroughput: represents the provisioned throughput for an index or table.
PutRequest: represents a PutItem request.
TableDescription: represents table properties.
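A hash and range type primary key can be pictured as a two-level lookup: hash on the first attribute, then order by the second. The sketch below is a toy model of that idea, not the DynamoDB API; the class `HashRangeTable`, its methods, and the sample attribute names are hypothetical.

```python
from collections import defaultdict


class HashRangeTable:
    """Hypothetical toy table keyed by a (hash attribute, range attribute) pair."""

    def __init__(self, hash_attr, range_attr):
        self.hash_attr = hash_attr
        self.range_attr = range_attr
        # hash-attribute value -> {range-attribute value: item}
        self._buckets = defaultdict(dict)

    def put_item(self, item):
        """Store a self-describing item; an item with the same key is replaced."""
        a, b = item[self.hash_attr], item[self.range_attr]
        self._buckets[a][b] = item

    def get_item(self, a, b):
        """Locate a single item by its full primary key."""
        return self._buckets.get(a, {}).get(b)

    def query(self, a):
        """All items sharing hash value a, ordered by the range attribute."""
        bucket = self._buckets.get(a, {})
        return [bucket[b] for b in sorted(bucket)]


orders = HashRangeTable(hash_attr="CustomerId", range_attr="OrderDate")
# Items need not share the same attributes, and a value may be a set.
orders.put_item({"CustomerId": "c1", "OrderDate": "2024-03-01", "Total": 40})
orders.put_item({"CustomerId": "c1", "OrderDate": "2024-01-15", "Tags": {"gift", "rush"}})
orders.put_item({"CustomerId": "c2", "OrderDate": "2024-02-10"})
print([item["OrderDate"] for item in orders.query("c1")])
```

Items with the same hash value land in the same bucket, and the range attribute gives them a deterministic order within it.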

26.25 - What are deductive databases?

A deductive database uses facts and rules; an inference engine can deduce new facts from the database by applying the rules. Facts and rules are written in Prolog/Datalog notation, which is based on providing predicates with unique names. A predicate has an implicit meaning and a fixed number of arguments. If the arguments are all constant values, the predicate states that a certain fact is true. If the arguments are variables, the predicate is considered a query or part of a rule or constraint.
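As a minimal sketch of how an inference engine derives new facts, the following applies two Datalog-style rules repeatedly until nothing new can be deduced. The predicates `supervise` and `superior`, the sample constants, and the helper names are illustrative assumptions; a real deductive database would use a general rule evaluator rather than hard-coded rules.

```python
# Facts are tuples (predicate, arg1, arg2) whose arguments are all constants.
facts = {
    ("supervise", "franklin", "john"),
    ("supervise", "james", "franklin"),
}


def infer_superior(facts):
    """Forward-chain two rules until a fixed point is reached:
         superior(X, Y) :- supervise(X, Y).
         superior(X, Y) :- supervise(X, Z), superior(Z, Y).
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (p, x, z) in derived:
            if p != "supervise":
                continue
            new.add(("superior", x, z))            # first rule
            for (q, z2, y) in derived:
                if q == "superior" and z2 == z:    # second rule
                    new.add(("superior", x, y))
        if not new <= derived:
            derived |= new
            changed = True
    return derived


def query(facts, predicate, first):
    """Answer predicate(first, Y)? by returning every matching Y."""
    return sorted(y for (p, x, y) in facts if p == predicate and x == first)


all_facts = infer_superior(facts)
print(query(all_facts, "superior", "james"))
```

The fact that james is a superior of john is never stored; it is deduced from the two stored facts and the rules.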

30.7 - List the types of privileges available in SQL.

Granting and revoking: The GRANT statement grants privileges on a data object. If a privilege is granted on an SQL function or procedure, the user can only EXECUTE that function or procedure; the user cannot directly access the tables that it uses. The REVOKE statement takes privileges away from users. Its arguments are similar to those of the GRANT statement; the major differences are the additional RESTRICT or CASCADE keyword and the optional GRANT OPTION FOR clause. Note: if none of the privileges being revoked actually exist, an error is raised.

24.7. What are the data modeling concepts used in MongoDB?

Individual MongoDB documents are stored in a collection. For example, a project collection can be created to hold all the documents related to projects. The ObjectId field, named _id, uniquely identifies a document in a collection and can be either user specified or system generated.
CRUD operations:
Insert: inserts documents into their respective collections
Remove: removes documents from their respective collections
Update: updates a document, or replaces a particular document with another document
Find: used for read queries

16.19 - Describe the DBA's technical roles.

Main tasks:
Evaluate, select, and install DBMS software and related utilities and tools according to the needs of the business.
Design and implement databases and applications.
Test and evaluate databases and applications.
Operate the DBMS, utilities, tools, and applications.
Train and support users.
Maintain the DBMS, utilities, and applications, and advise on new database software as needed.

25.9. Describe the execution workflow of the MapReduce programming environment.

MapReduce is based on two functions, Map and Reduce. The Map function receives each input key-value pair and produces zero or more key-value pairs as output; its signature specifies the data types of its inputs and outputs. The framework groups the Map output by key. The Reduce function receives a key together with the values that the Map function emitted for it and returns zero or more key-value pairs as output; its signature likewise specifies the data types of its inputs and outputs. The output type of Map must match the input type of Reduce. In the word-count example, each invocation of Reduce receives the list of frequencies for one word, computed on the Map side. Both functions interact with the framework through a context. Clients use the context to send configuration information to tasks; tasks use it to access HDFS (Hadoop Distributed File System) and read data directly from it, to output key-value pairs, and to send status back to the client.
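The workflow can be sketched in a single process using the word-count example. This is a simplification under stated assumptions: `run_job` is a hypothetical stand-in for the framework, there is no HDFS, no context object, and no parallelism, and the input records are made up.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: (offset, line) -> zero or more (word, 1) pairs."""
    for word in line.lower().split():
        yield (word, 1)


def reduce_fn(word, counts):
    """Reduce: (word, [1, 1, ...]) -> one (word, total) pair."""
    yield (word, sum(counts))


def run_job(records, mapper, reducer):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # shuffle: group Map output by key
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


lines = [(0, "the quick fox"), (1, "the lazy dog"), (2, "The fox")]
word_counts = run_job(lines, map_fn, reduce_fn)
print(word_counts)
```

The output type of `map_fn` (word, int) matches the input that `reduce_fn` expects, which is the type constraint the answer describes.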

24.3. What are the main characteristics of NOSQL systems?

NOSQL characteristics related to distributed databases and distributed systems:
1. Scalability: In NOSQL systems, horizontal scalability is generally used, where the distributed system is expanded by adding more nodes for data storage and processing as the volume of data grows.
2. Availability, replication, and eventual consistency: Replication improves data availability and can also improve read performance. Many NOSQL applications do not require serializable consistency, so more relaxed forms of consistency known as eventual consistency are used.
3. Replication models: Two major replication models are used in NOSQL systems: master-slave and master-master replication.
4. Sharding of files: Files (or collections of data objects) can have many millions of records, and these records can be accessed concurrently by thousands of users, so it is not practical to store the whole file in one node. Sharding (also known as horizontal partitioning) is often employed in NOSQL systems.
5. High performance data access: To find individual records or objects (data items) from among the millions of data records or objects in a file, most NOSQL systems use one of two techniques: hashing or range partitioning on object keys. In hashing, a hash function h(K) is applied to the key K, and the location of the object with key K is determined by the value of h(K). In range partitioning, the location is determined via a range of key values.
NOSQL characteristics related to data models and query languages:
6. Schema not required: The flexibility of not requiring a schema is achieved in many NOSQL systems by allowing semi-structured, self-describing data. There are various languages for describing semi-structured data, such as JSON (JavaScript Object Notation) and XML (Extensible Markup Language). As there may not be a schema to specify constraints, any constraints on the data would have to be programmed in the application programs that access the data items.
7. Less powerful query languages: NOSQL systems typically provide a set of functions and operations as a programming API, so reading and writing the data objects is accomplished by calling the appropriate operations. In many cases, the operations are called CRUD operations, for Create, Read, Update, and Delete. Some NOSQL systems do not have the full power of SQL; only a subset of SQL querying capabilities is provided.
8. Versioning: Some NOSQL systems provide storage of multiple versions of the data items, with the timestamps of when each data version was created.
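The two placement techniques named under high performance data access can be compared with a small sketch. The node names, the split points, and the choice of MD5 as the hash function h(K) are assumptions made for illustration; real systems choose their own hash functions and key ranges.

```python
import bisect
import hashlib

NODES = ["node0", "node1", "node2"]   # hypothetical storage nodes


def hash_partition(key, nodes=NODES):
    """Hashing: the node is determined by h(K) modulo the number of nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Assumed split points: keys < "h" go to node0, "h" <= keys < "q" to node1,
# and keys >= "q" to node2.
RANGE_BOUNDS = ["h", "q"]


def range_partition(key, bounds=RANGE_BOUNDS, nodes=NODES):
    """Range partitioning: the node is determined by the key range containing K."""
    return nodes[bisect.bisect_right(bounds, key)]


for key in ["alice", "mallory", "zed"]:
    print(key, hash_partition(key), range_partition(key))
```

Range partitioning keeps neighbouring keys on the same node, which suits range queries, while hashing spreads keys without regard to their order.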

24.1. For which types of applications were NOSQL systems developed?

NOSQL systems focus on storage of "big data". Typical applications that use NOSQL include social media, Web links, user profiles, marketing and sales, posts and tweets, road maps and spatial data, and e-mail storage.

24.14. What are the data modeling concepts used in the graph-oriented NOSQL system Neo4j?

Nodes: represent the entities of the database and can have several attributes.
Node labels: identify the type of a node; a node can have zero or more labels.
Properties: the attributes of a node.
Relationships and relationship types: relationships are directed but can be traversed in either direction; the relationship type identifies the kind of relationship. Properties can also be specified for a relationship.

16.18 - What are the DBA's managerial roles? Explain in detail.

Offer end-user support
Enforce policies, procedures, and standards for correct data creation, usage, and distribution within the database
Provide data security, privacy, and integrity
Supply data backup and recovery
Disaster management: planning, organizing, and testing of database contingency plans and recovery procedures
Ensure data is distributed to the right people, at the right time, and in the right format
Backup and recovery measures must include at least:
Periodic data and application backups
Proper backup identification
Convenient and safe backup storage
Physical protection of both hardware and software
Personal access control to the software of a database installation
Insurance coverage for the data in the database

24.5. What is the CAP theorem? Which of the three properties are most important in NOSQL systems?

CAP stands for Consistency, Availability, and Partition tolerance, three desirable properties of a distributed system with replicated data. Consistency means that the nodes see the same copies of a replicated data item. Availability means that each read or write request is either processed successfully or answered with a message that the operation cannot be completed. Partition tolerance means that the system can keep functioning even when a network fault splits the nodes into partitions that cannot communicate with each other. According to the CAP theorem, all three properties cannot be guaranteed at the same time. In NOSQL systems, availability and partition tolerance are usually the most important, and a weaker form of consistency is acceptable.

30.19 - What is a statistical database? Discuss the problem of statistical database security.

Statistical databases are used to provide statistics about various populations. Users are permitted to retrieve statistical information about the populations, such as averages, sums, counts, maximums, minimums, and standard deviations, but must be prohibited from retrieving protected data about individuals. The security problem is that in some cases it is possible to infer the values of individual tuples from a sequence of statistical queries. As an illustration, consider the following statistical queries:
Q1: SELECT COUNT (*) FROM PERSON WHERE <condition>;
Q2: SELECT AVG (Income) FROM PERSON WHERE <condition>;
Now suppose that we are interested in finding the income of Jane Smith, and we know that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue the statistical query Q1 with the following condition:
(Last_degree='Ph.D.' AND Sex='F' AND City='Bellaire' AND State='Texas')
If we get a result of 1 for this query, we can issue Q2 with the same condition and find the income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number, say 2 or 3, we can issue statistical queries using the functions MAX, MIN, and AVG to identify the possible range of values for the income of Jane Smith.
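The inference described above can be reproduced with Python's built-in sqlite3 module. The table contents are invented for the example. The `guarded_avg` function and its `MIN_GROUP_SIZE` threshold illustrate one common countermeasure (refusing statistics over very small populations); that countermeasure is an addition for illustration and is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PERSON (Name TEXT, Last_degree TEXT, Sex TEXT, "
    "City TEXT, State TEXT, Income REAL)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO PERSON VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Jane Smith", "Ph.D.", "F", "Bellaire", "Texas", 98000.0),
        ("Ann Lee", "M.S.", "F", "Bellaire", "Texas", 71000.0),
        ("Bob Ray", "Ph.D.", "M", "Bellaire", "Texas", 88000.0),
        ("Sue Kim", "Ph.D.", "F", "Houston", "Texas", 93000.0),
    ],
)

CONDITION = (
    "Last_degree = 'Ph.D.' AND Sex = 'F' AND City = 'Bellaire' AND State = 'Texas'"
)

# Q1: how many people match? Q2: their average income.
count = conn.execute(f"SELECT COUNT(*) FROM PERSON WHERE {CONDITION}").fetchone()[0]
avg_income = conn.execute(f"SELECT AVG(Income) FROM PERSON WHERE {CONDITION}").fetchone()[0]
print(count, avg_income)  # a count of 1 means the "average" is one person's income

MIN_GROUP_SIZE = 3  # assumed threshold, chosen only for this example


def guarded_avg(conn, condition):
    """Refuse to answer when the selected population is too small."""
    n = conn.execute(f"SELECT COUNT(*) FROM PERSON WHERE {condition}").fetchone()[0]
    if n < MIN_GROUP_SIZE:
        return None
    return conn.execute(f"SELECT AVG(Income) FROM PERSON WHERE {condition}").fetchone()[0]
```

The condition string is interpolated directly into the SQL only because it is a fixed constant in this sketch; user-supplied conditions would need proper validation.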

30.5 - What is meant by granting a privilege? What is meant by revoking a privilege?

Granting: Suppose that the DBA creates four accounts (JOHN, PHILLIP, JEFFERY, and OREL) and wants only JOHN to be able to create base tables. To do this, the DBA must issue the following GRANT command in SQL:
GRANT CREATETAB TO JOHN;
The CREATETAB (create table) privilege gives account JOHN the capability to create new database tables (base tables) and is hence an account privilege.
Revoking: In some cases, it is desirable to grant a privilege to a user temporarily. For example, the owner of a table may want to grant the SELECT privilege to a user for a specific task and then revoke that privilege once the task is completed. In SQL, a REVOKE command is included for the purpose of canceling privileges. Now suppose that JOHN decides to revoke the SELECT privilege on the EMPLOYEE table from JEFFERY; JOHN can then issue this command:
REVOKE SELECT ON EMPLOYEE FROM JEFFERY;
The DBMS must now revoke the SELECT privilege on EMPLOYEE from JEFFERY, and it must also automatically revoke the SELECT privilege on EMPLOYEE from OREL. This is because JEFFERY granted that privilege to OREL, but JEFFERY no longer has the privilege.
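The cascading revoke in the JOHN, JEFFERY, and OREL example can be modelled with a small grant table. This is a simplified sketch, not how any particular DBMS implements authorization: the class `PrivilegeTable` and its methods are hypothetical, and the model ignores the GRANT OPTION and cyclic grants between users.

```python
class PrivilegeTable:
    """Hypothetical model of privileges on one table, with cascading revoke."""

    def __init__(self, owner):
        self.owner = owner      # the owner holds every privilege
        self.grants = set()     # (grantor, grantee, privilege)

    def holds(self, user, privilege):
        return user == self.owner or any(
            grantee == user and priv == privilege
            for (_grantor, grantee, priv) in self.grants
        )

    def grant(self, grantor, grantee, privilege):
        if not self.holds(grantor, privilege):
            raise PermissionError(f"{grantor} does not hold {privilege}")
        self.grants.add((grantor, grantee, privilege))

    def revoke(self, grantor, grantee, privilege):
        self.grants.discard((grantor, grantee, privilege))
        # Cascade: if the grantee has lost the privilege entirely,
        # everything the grantee passed on must be revoked as well.
        if not self.holds(grantee, privilege):
            passed_on = [g for g in self.grants
                         if g[0] == grantee and g[2] == privilege]
            for (g_from, g_to, g_priv) in passed_on:
                self.revoke(g_from, g_to, g_priv)


employee = PrivilegeTable(owner="JOHN")
employee.grant("JOHN", "JEFFERY", "SELECT")
employee.grant("JEFFERY", "OREL", "SELECT")
before = employee.holds("OREL", "SELECT")

employee.revoke("JOHN", "JEFFERY", "SELECT")
after = (employee.holds("JEFFERY", "SELECT"), employee.holds("OREL", "SELECT"))
print(before, after)
```

If OREL had also received SELECT from another user who still holds it, the cascade in this sketch would leave that second grant in place.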

Q-1. What are the main types of Cloud Computing? Briefly describe each one.

There are three main service models of cloud computing: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
IaaS (Infrastructure as a Service): This is the most common service model of cloud computing, as it offers the fundamental infrastructure of virtual servers, network, operating systems, and data storage drives. It allows for the flexibility, reliability, and scalability that many businesses seek with the cloud, and removes the need for hardware in the office. This makes it ideal for small and medium sized organisations looking for a cost-effective IT solution to support business growth. IaaS is a fully outsourced pay-for-use service and is available as a public, private, or hybrid infrastructure.
PaaS (Platform as a Service): Cloud computing providers deploy the infrastructure and software framework, and businesses develop and run their own applications on it. Web applications can be created quickly and easily via PaaS, and the service is flexible and robust enough to support them. PaaS solutions are scalable and ideal for business environments where multiple developers are working on a single project. It is also handy for situations where an existing data source (such as a CRM tool) needs to be leveraged.
SaaS (Software as a Service): This cloud computing solution involves the deployment of software over the internet to various businesses, who pay via subscription or a pay-per-use model. It is a valuable tool for CRM and for applications that need a lot of web or mobile access, such as mobile sales management software. SaaS is managed from a central location, so businesses don't have to worry about maintaining it themselves, and it is ideal for short-term projects.

26.12 - How do spatial databases differ from regular databases and what are the different types of spatial data?

Typical databases process numeric and character data; additional functionality needs to be added for databases to process spatial data types. A query such as "List all the customers located within twenty miles of company headquarters" requires the processing of spatial data types that are typically outside the scope of standard relational algebra (RDBMS), and may involve consulting an external geographic database that maps the company headquarters and each customer to a 2-D map based on their address.
The types of spatial data are:
Map data: includes various geographic or spatial features of objects in a map, such as an object's shape and the location of the object within the map. The three basic types of features are points, lines, and polygons (or areas).
Attribute data: the descriptive data that GIS systems associate with map features. For example, suppose that a map contains features that represent counties within a U.S. state (such as Florida). Attributes for each county feature (object) could include population, largest city/town, area in square miles, congressional districts, census tracts, and so on.
Image data: includes data such as satellite images and aerial photographs, which are typically created by cameras. Objects of interest, such as buildings and roads, can be identified and overlaid on these images. Images can also be attributes of map features.
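The "within twenty miles of headquarters" query can be sketched once each address has been mapped to a point. The coordinates, the customer names, and the use of the haversine great-circle formula are assumptions for illustration; a spatial database would normally answer this with a spatial index rather than a full scan.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


# Hypothetical point features: headquarters and customer locations.
HEADQUARTERS = (30.0, -95.0)
CUSTOMERS = {
    "A": (30.1, -95.0),   # about 7 miles north
    "B": (31.0, -95.0),   # about 69 miles north
    "C": (30.0, -95.2),   # about 12 miles west
}


def within_radius(center, points, radius_miles):
    """Names of all points whose distance from center is at most radius_miles."""
    return sorted(name for name, loc in points.items()
                  if haversine_miles(center, loc) <= radius_miles)


nearby = within_radius(HEADQUARTERS, CUSTOMERS, 20)
print(nearby)
```

The distance predicate is exactly the kind of operation that plain numeric and character comparisons in relational algebra do not provide.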

25.6. What are the four major characteristics of big data? Provide examples drawn from current practice of each characteristic

Volume: the quantity of data to be stored; refers to the size of the data managed by the system.
Scaling up: keeping the same number of systems but migrating each one to a larger system.
Scaling out: when the workload exceeds server capacity, it is spread out across a number of servers.
Velocity: the speed of data creation, ingestion, and processing; the speed at which data is entered into the system and must be processed.
Stream processing: focuses on input processing and requires analysis of the data stream as it enters the system.
Feedback loop processing: analysis of data to produce actionable results.
Variety: refers to the type of data source and the variations in the structure of the data to be stored (structured, unstructured, and semi-structured).
Structured data: fits into a predefined data model.
Unstructured data: does not fit into a predefined model.
Semi-structured data: combines elements of both.
Veracity: the credibility of the source; the trustworthiness of the data and its suitability for the target audience, evaluated through quality testing or credibility analysis.

25.4. How do you define big data?

Big data is an accumulation of data that is too large and complex for processing by traditional database management tools.

30.8 - What is the difference between discretionary and mandatory access control?

Discretionary: The typical method of enforcing discretionary access control in a database system is based on the granting and revoking of privileges.
Mandatory: Most mainstream RDBMSs currently provide mechanisms only for discretionary access control. However, the need for multilevel security exists in government, military, and intelligence applications, as well as in many industrial and corporate applications. Because of the overriding concerns for privacy, in many systems the levels are determined by who has what access to what private information. In many such applications, an additional security policy is needed that classifies data and users based on security classes. This approach is known as mandatory access control (MAC).

