Team 1: Database Admin

What are some of the options available when creating a database/user?

- CREATE DATABASE / CREATE USER
- You can create a database/user FROM someone else's space
- You can transfer databases/users from one owner to another
- PERM, SPOOL, and TEMP space may be specified

What is a derived table?

- Derived tables are temporary tables that are created in spool and dropped when the query completes. A derived table is a good choice when the temporary table is required for one query only, or when the query will be run only once with this data. Source: Adv SQL module 9

What is Multi-Value Compression? What performance advantages does it offer? What are the rules for MVC for TD1310 and TD14?

- Multi-Value Compression (MVC) is a technique that lets users specify which values and columns they wish to compress
- It is used predominantly to save space
- The rules are:
  - For multi-valued compression, there is no default compress value
  - All values in the multi-value list must be specified explicitly
  - Nulls in a nullable column are always compressed automatically, even if not explicitly specified
  - You can compress as few as zero or as many as all columns except the primary index column set of a table; you also cannot compress partitioning column values when the primary index or table is row-partitioned
  - The maximum number of distinct values that can be compressed per column is 255
  - If modifying or adding a column definition would cause the table row header to exceed the maximum row length for the Teradata Database, the responsible ALTER TABLE request fails
  - You can modify an existing column to have multi-valued compression
  - You can drop compression from an existing column, or make values uncompressible for a new column, by specifying the NO COMPRESS option
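A minimal sketch of how MVC is declared; the table and column names are illustrative, not from the source:

```sql
-- Hypothetical staging table: compress the most frequent region codes.
-- NULLs in the nullable column are compressed automatically.
CREATE TABLE sales_staging
 ( store_id  INTEGER NOT NULL
 , region    CHAR(8)  COMPRESS ('NORTH', 'SOUTH', 'EAST', 'WEST')
 , comments  VARCHAR(200)
 )
PRIMARY INDEX (store_id);
```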

What are some common traits of Volatile and Global Temporary Tables?

- Similarities:
  - Both are local to a session
  - Both are discarded when the session is over (the GTT instance is discarded; the definition is kept in the Data Dictionary)
  - Both can be used repeatedly throughout a session
  - Both have LOG and ON COMMIT PRESERVE/DELETE options
  - Neither table's materialized contents are sharable with other sessions
  - Neither supports join or hash indexes

What is a No Primary Index table and what are some practical uses?

- A table with no PI means that rows are inserted at the end of the table, because there are no row-ordering constraints. NoPI tables must be multiset (allow duplicates) because (prior to TD 13.0) if a PI isn't specified but a PK is, then the PK or any UNIQUE column is automatically assigned as a UPI. No duplicate checks are performed when loading data. (Advanced SQL 13.6) (Intro to TD - Module 6)
- To create a NoPI table, simply don't define a PI in the CREATE TABLE statement. This makes the hash value and the AMP ownership arbitrary. NoPI tables are treated as hashed tables, but all rows will typically have the same hash bucket value.
- NoPI rows are still distributed across AMPs, but a random generator determines where they go, not their hash value. Within each AMP, rows are appended to the end of the table.
- FastLoad/TPump into a NoPI table is faster because rows are distributed in round-robin fashion by the random generator and simply appended on arbitrary AMPs. This makes NoPI tables useful as staging tables.
- Used for columnar (column partitioning): break the table into sets of columns that are commonly accessed together and store those sets in separate column partitions.
- Reduced I/O and CPU, because AMPs don't need to check for duplicates or redistribute rows by hash.
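A sketch of a NoPI staging table; the names are illustrative. NO PRIMARY INDEX is stated explicitly so the DBS Control default cannot silently pick a PI:

```sql
-- NoPI tables must be MULTISET; rows are placed by the random
-- generator and appended, with no duplicate-row checks.
CREATE MULTISET TABLE stg_orders
 ( order_id  INTEGER
 , order_amt DECIMAL(9,2)
 )
NO PRIMARY INDEX;
```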

Where does a Volatile Table get its space from?

- Volatile tables are materialized in spool space, whereas the table definition is stored in cache memory. For these reasons, they will not survive a system restart. (Advanced SQL 14.6)
- Example:
  CREATE VOLATILE TABLE vt, LOG
   (dept   SMALLINT,
    avgsal DEC(9,2),
    maxsal DEC(9,2))
  ON COMMIT PRESERVE ROWS;
- LOG is the default. Using NO LOG is faster because no transaction journal is kept.
- ON COMMIT PRESERVE ROWS lets you use the table for the entire session. The default is ON COMMIT DELETE ROWS.

What does a fully qualified object name mean?

-A fully qualified object name includes the names of all parent objects up to the level of the containing database. A common use case is a fully qualified column name, which consists of a database name, table name, and column name.

How do you clear peak spool? Why would you clear peak spool?

- Execute the DBC.ClearPeakDisk macro.
- Peak spool is a high-water mark that accumulates over time; it must be cleared/reset to restart the data collection process and keep the recorded peak values meaningful.
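A sketch of the reset, assuming access to the DBC macro named above:

```sql
-- Reset the peak accumulators so the high-water marks start
-- collecting afresh.
EXEC DBC.ClearPeakDisk;
```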

What are statistics and why are they important? What happens if statistics are not present? How often should you collect statistics? What are the different options for collecting statistics? If you collect statistics at the table level what happens? Can you run collect statistics on an empty table? What should you be aware of if you do this? How do you determine what columns to collect statistics on? Do you need to Collect Statistics on Join Indexes if the underlying columns have statistics collected? What type of Lock does Collect Statistics take on the table? What type of access right is needed to collect statistics or drop statistics?

- Stats are used by the optimizer when running SQL statements to choose the best execution plan, based on where the data is and how it's related.
- Without stats, the cost of running the query increases, and the optimizer makes estimates with "low confidence" or no confidence.
- Stats help in accessing columns/indexes with uneven value distribution (the optimizer can utilize NUSI bit mapping).
- Stats should be collected every time a significant change is made to the data (for example, a large percentage of rows affected or a significant alteration of the data). They need to be up to date to actually help the optimizer with complex queries and joins. However, stats remain valid across a reconfiguration of the system because of the information/history stored.

Different ways stats can be collected:
- COLLECT STATISTICS statements (the user must have access rights on the target to collect stats on it)
- TSET is a good way to transfer stats from one database or table to another
- Collecting stats can happen when the user creates a table by specifying collect stats
- Macros or triggers can be set up to collect stats
- Table-level summary stats are now available: no histogram is built, but they provide row count, average row size, and other information for extrapolations
- Stats can be collected at the column, table, or database level
- You can collect stats on an empty table, but it is not recommended, since the table has no data distributed among the AMPs yet

What columns to collect stats on:
- All indexes of a Join Index table
- All non-unique secondary indexes, with the ALL option
- All value-ordered NUSIs (VOSIs)
- All partitioned tables
- The Primary Index/Key
- All Foreign Keys

Histograms are stored in the Data Dictionary. Stats are kept in frequency-distribution histograms with four pieces:
- Summary info: general information about the stats
- High-biased values: represent values that are skewed
- Equal-height intervals: used when values are evenly distributed; each covers about the same number of rows (default 250 intervals, maximum 500)
- History records: previously collected summary info and histograms

Other notes:
- If you drop stats, you lose the accumulated history
- If there are no summary stats, then no stats have been collected on the empty table; use the HELP statement to see what happened with your empty table
- See the developer exchange for guidance on what columns to use
- A user must be granted the right to collect statistics, at either the table or database level; stats can be collected on DBC tables without needing a granted access right
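A few hedged examples of the collection levels described above; the table and column names are illustrative:

```sql
-- Column-level statistics:
COLLECT STATISTICS COLUMN (o_orderdate) ON orders;

-- Table-level summary statistics (row count, average row size; no histogram):
COLLECT SUMMARY STATISTICS ON orders;

-- Sampled statistics, cheaper on very large tables:
COLLECT STATISTICS USING SAMPLE COLUMN (o_custkey) ON orders;

-- Inspect what has been collected (useful to verify an empty table, too):
HELP STATISTICS orders;
```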

What is the Data Dictionary Directory (DD/D)? What are some of the tables/information available there?

- The Data Dictionary Directory (DD/D) is an integrated set of system tables that contain definitions of, and information about, all objects in the system. It is entirely maintained by the RDBMS and is often described as "data about the data," or metadata. The DD/D is distributed across all AMPs. Here are some of the DD/D views:
  DBC.Tables ---> information about all tables
  DBC.Users ---> information about all users
  DBC.AllRights ---> information about access rights
  DBC.AllSpace ---> information about space utilization
- The DD/D contains all the database object definitions in the system database. The databases provide a logical grouping for information; the DD/D therefore holds all the system views and columns, and lists the referenced table columns for each view column.
- It is composed of tables and views in the DBC and Sys_Calendar databases. The tables within the DD/D are reserved for system use. These tables contain metadata regarding the system objects, privileges, events, and usage. (A privilege is permission to access or to manipulate an object.)
- Sets of dictionary tables hold information on database space usage (example: DBC.Dbase), resource usage data (example: DBC.Acctg), logs of events (example: DBC.Events), and access rights (example: DBC.AllRights).
- The DD/D allows users to retrieve frequently used data from its tables through predefined views. (Source: Teradata Factory and Teradata 14.10 Manual - Data Dictionary)
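A small sketch of querying the DD/D through one of its predefined views; the database name is illustrative:

```sql
-- List every table-like object a database contains, straight from the DD/D.
SELECT TableName, TableKind
FROM DBC.TablesV
WHERE DatabaseName = 'Sales'   -- illustrative database name
ORDER BY TableName;
```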

What does Tablesize tell you? Where is it stored? Is data collection automatic? Is there history recorded in Tablesize?

- The TableSize view provides AMP-by-AMP (vproc) information about disk space usage (not including spool) for each database, table (including journal tables), stored procedure, join index, hash index, or account.
- Database: DBC; category: Accounting.
- Stored in the system tables (exposed through DBC.TableSizeV).
- Collection is automatic: initial system setup creates the predefined data dictionary views, and the information in the underlying tables is automatically updated to reflect the current state of the system.
- Data Dictionary system logs are not automatic; backup logs can be made off the active log.
- TableSize also keeps track of PeakPerm, MaxPerm, and CurrentPerm, so in that sense it does record some history.
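A hedged example of using the view described above to see both totals and the per-AMP spread behind them:

```sql
-- Total perm per table, and the largest single-AMP share hiding
-- behind the total (a large gap suggests skew).
SELECT DatabaseName, TableName,
       SUM(CurrentPerm) AS TotalPerm,
       MAX(CurrentPerm) AS MaxAmpPerm
FROM DBC.TableSizeV
GROUP BY DatabaseName, TableName
ORDER BY TotalPerm DESC;
```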

What is another way to implement an Aggregate Join Index? What are the trade-offs? What additional aggregate operators are allowed in Join Indexes on Teradata 14.10? What are the trade-offs while implementing it?

- You can create a table with an aggregate function and manually update the table.
- Trade-offs of implementing an AJI:
  - TPT Load (FastLoad) cannot be used to load tables on which join indexes are defined
  - A join index takes up PERM space
  - You can specify no more than 64 columns per referenced base table per join index
  - BLOB and CLOB data types cannot be defined within a join index
  - With a multi-table join index, you cannot use a FULL OUTER join
  - Beyond the general join index considerations, SUM and COUNT can be assigned only one data type, FLOAT
  - To create any join index, you must have DROP TABLE rights on each of the base tables, plus one of the following two privileges: CREATE TABLE on the database or user that will own the join index, or the INDEX privilege on each of the base tables
- TD 14.10 additionally allows MIN/MAX.
- The trade-offs are the same as above.

What are the four locking modes available in Teradata?

1. Exclusive - prevents any other type of concurrent access
2. Write - prevents other reads, writes, and exclusives
3. Read - prevents writes and exclusives
4. Access - prevents exclusives only
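A sketch of requesting one of these modes explicitly with the LOCKING modifier; the table name is illustrative:

```sql
-- Downgrade the default READ lock to ACCESS so this report neither
-- blocks nor is blocked by concurrent writers (it may read in-flight data).
LOCKING TABLE orders FOR ACCESS
SELECT COUNT(*) FROM orders;
```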

What is Database Query Log (DBQL)? Where is it stored? What type of information is available? Is data collection automatic? What is a suggested logging strategy? Is it backed up in DBC archive? What are the normalized columns doing?

1. The Database Query Log (DBQL) is a feature that lets you log query processing activity for later analysis (p. 2516).
2. DBQL stores information in cache memory and writes to disk only when the cache is full (p. 2542).
3. DBQL makes available a historical record of queries and their duration, performance, and target activity (p. 2542).
4. Data collection is not automatic, since query logging is "invoked and revoked" by SQL statements: BEGIN QUERY LOGGING, END QUERY LOGGING, and REPLACE QUERY LOGGING (p. 2543).
5. The Factory Manual does not provide broad strategies. Some recommendations: be aware of limits that can be placed, such as SQLTEXT and THRESHOLD (p. 2556); Teradata Administrator can be used to begin and end query logging effectively and easily (p. 2573).
6. Yes, it is backed up in the DBC archive in a variety of tables (p. 2547).

Access Rights What are the different types of access rights? Where would you look to see access rights? What are some user access rights that are commonly granted? What does "with grant option" mean? What does the "allness" flag do? If an end user has no perm space and is not allowed to create objects, how many access rights does he need? How do you get rid of them? How many access rights does the logon that created you need on your logon? How many access rights does the database above you in the hierarchy need on your logon? In the schema, End User selects against a View Database which references a Data Database, how would the access rights between the view and data databases need to be scripted? What are these type of rights called?

1. There are four types of access rights: automatic rights, explicit rights, ownership rights, and inherited rights. Automatic rights are given to creators and, in the case of users and databases, to their created objects. Explicit rights are given by the GRANT statement. Ownership rights are the implicit rights that parents have over children. Inherited rights are rights given to children when parents grant rights with an ALL statement (p. 2356).
2. A user can look at the three access rights views: DBC.AllRights[V][X], DBC.UserRights[V], and DBC.UserGrantedRights[V]. The DBC.UserRights view shows all of the rights that a user has acquired (p. 2384).
3. A full list of the dozens of access rights is included on p. 2386. Some common ones include CREATE, DROP, EXECUTE, and UPDATE.
4. "WITH GRANT OPTION" means that the recipient has "grant authority," i.e., the recipient can then grant his own access rights to other users, databases, or roles (p. 2364).
5. The "allness" flag indicates whether the privilege should be granted to all users (p. 2388).
6. It depends on what the end user is trying to achieve. Perm space can be given through the CREATE DATABASE/GIVE/DROP DATABASE commands without any particular access rights; for querying, SELECT access rights should be sufficient.
7. The REVOKE command takes away access rights.
8. One: the CREATE USER access right.
9. None: the immediate owner has all access rights needed because of implicit (ownership) rights.
10. The script would be: GRANT SELECT ON DATA_DATABASE TO VIEW_DATABASE WITH GRANT OPTION; These types of rights are called explicit rights.

What are some of the differences between a database and a user?

A "user" is a database that can be logged onto with a password. Databases and users are both collections of objects (such as tables, views, macros, etc.) that must be assigned perm space.

What types of objects can you create a Comment on?

A comment is a user-defined description of a user-defined database object or definition in the data dictionary. COMMENT is a Teradata extension to the ANSI SQL:2011 standard. The types of objects that can be commented on include:
- A parameter in a macro, SQL procedure, or user-defined function
- A column in a user base table or view
- A specific function, macro, profile, role, SQL procedure, base table, trigger, or view name contained by a database or user
- A database or user
- A replication group (you cannot specify the name of a containing database or user if object_kind is GROUP or if object_name is a replication group)
- A UDT (you can add a comment to a particular attribute of a structured UDT by specifying database_name.udt_name.attribute_name)
- A method (if you specify a method, you must use its specific method name)
- A GLOP set

How do you change/assign a different default database of a user? How long does the new setting last?

A default database can be assigned through:
- CREATE USER statement (DEFAULT DATABASE = database_name)
- CREATE PROFILE statement (DEFAULT DATABASE = database_name)
It can be changed through:
- MODIFY USER (DEFAULT DATABASE = database_name)
- MODIFY PROFILE (DEFAULT DATABASE = database_name)
- The DATABASE command (DATABASE database_name;), used only to change the default database for a session
- SET SESSION (DATABASE database_name, within a SQL statement that sets other values for the session)
- SET SESSION DATABASE (SET SESSION DATABASE database_name; only sets the database)
The new database assignment lasts until it is changed again; the session-level commands last only for the duration of the session.
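A sketch contrasting the persistent and session-only forms listed above; the user and database names are illustrative:

```sql
-- Persistent change: survives across sessions, until modified again.
MODIFY USER etl_user AS DEFAULT DATABASE = sales_db;

-- Session-only changes: revert when the session ends.
DATABASE sales_db;
SET SESSION DATABASE sales_db;
```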

What is a Partitioned Primary Index (PPI)? (row partitioning) What are the implementation options?

A partitioned primary index permits rows to be assigned to user-defined data partitions on the AMPs, enabling enhanced performance for range queries. The PPI feature allows a class of queries to access a portion of a large table instead of the whole table. The traditional uses of the primary index for data placement and rapid access of the data when the PI values are specified are retained. Partitions can be done by case or by range. Case partitions are declared with CASE_N and create partitions based on user-defined circumstances for each row. Range partitions are declared with RANGE_N and create partitions based on the range of a certain user-defined value using a user-defined segmentation. Multi-level partitioning allows each partition at a given level to be further partitioned into sub-partitions. Each partition for a level is sub-partitioned the same per a partitioning expression defined for the next lower level.
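A sketch of a RANGE_N row partitioning definition; the table, columns, and date range are illustrative:

```sql
-- One partition per month of 2024; range queries on order_date can
-- then eliminate whole partitions instead of scanning the table.
CREATE TABLE orders
 ( order_id   INTEGER NOT NULL
 , order_date DATE NOT NULL
 , amount     DECIMAL(9,2)
 )
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (order_date BETWEEN DATE '2024-01-01'
                                     AND DATE '2024-12-31'
                                     EACH INTERVAL '1' MONTH);
```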

What is the Primary Index for?

A primary index is used to distribute rows across the AMPs. The hash value of the primary index influences the distribution of rows across all AMPs. Rows with the same PI value distribute to the same AMP. Accessing a row via its primary index value is a one AMP operation. The primary index value goes into the hashing algorithm and the resulting row hash's hash bucket number is used to identify the target AMP. The entire row hash is used by the AMP to locate the row.

Profiles What benefit(s) does using profiles provide? What parameters can be used on a profile? How many profiles can you be assigned to at one time? What are some challenges when implementing profiles? Where would you look to see Profiles defined in the database? Where would you look to see Profiles assigned to Users?

A profile is a set of common user attributes that can be applied to a group of users; profiles simplify user management. Profile parameters include: account ID(s), default database, spool space allocation, temporary space allocation, and password attributes (expiration, etc.). Only one profile can be assigned to a user at a time. One challenge when implementing profiles is viewing user information such as disk space, rights, or account information: if a user has a profile, one has to look at the profile to see the overriding settings. Profiles defined in the database can be seen in DBC.ProfileInfoV. DBC.UsersV shows the profile assigned to a user, as does DBC.ProfileAsgdSecConstraintsV.

What does the term "skewed table" mean? What performance implications are potentially there? How can it be corrected?

A skewed table is a table whose rows are not evenly distributed among the AMPs in the system. Unevenly distributed data can cause performance problems: since some operations on the system (such as full table scans) cannot complete until every AMP has finished, AMPs holding larger amounts of data hold the system up until they finish their task. In addition, database space limits are applied on a per-AMP basis, so too much data on one AMP can cause a "database full" error even if the other AMPs are not full. Skewed data can be corrected by making sure the hashing algorithm spreads the data to the AMPs evenly, avoiding hash synonyms. This can be done through a Unique Primary Index, a Non-Unique Primary Index with very few repeating values, or No Primary Index (which distributes the data through the random generator code).
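A hedged way to spot the skew described above, using the dictionary view mentioned elsewhere in these notes (database and table names are illustrative):

```sql
-- Per-AMP perm for one table: a wide spread between the largest and
-- smallest CurrentPerm values indicates skew.
SELECT Vproc, CurrentPerm
FROM DBC.TableSizeV
WHERE DatabaseName = 'Sales' AND TableName = 'orders'
ORDER BY CurrentPerm DESC;
```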

What is a Volatile Table?

A volatile table stores data physically, and you can access that data multiple times during your session. It is a temporary table that is held until the end of the session and consumes spool space.
- Local to a session rather than a specific query (uses SPOOL space)
- Uses CREATE VOLATILE TABLE syntax
- Allows the table to be used repeatedly within a session
- Automatically discarded when the session ends, if it has not been dropped manually
- The data dictionary has no knowledge of volatile tables
- Space for volatile tables comes from the user's spool space
Similar to derived tables:
- Materialized in spool
- No data dictionary access or transient locks
- Table definition kept in cache
- Designed for optimal performance
Different from derived tables:
- Local to the session, not the query
- Can be used with multiple queries in the same session
- Dropped manually at any time, or automatically after the session
- Requires a CREATE VOLATILE TABLE statement
Restrictions (not allowed with volatile tables):
- Join indexes or hash indexes
- Permanent journaling
- Referential integrity
- CHECK constraints
- Column compression
- Column default values
- Column titles
- Named indexes

What is Algorithmic Compression? How is it implemented?

Algorithmic compression allows users to apply a compression algorithm to data at the column level in a row. For example, you can compress two-byte Unicode into one byte when the data is LATIN. Compression is implemented by specifying a UDF; Teradata provides some UDFs to compress UNICODE and LATIN data columns. Note: these UDFs are installed in TD_SYSFNLIB.
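A sketch of applying the Teradata-supplied transform UDFs mentioned above; the table and column names are illustrative:

```sql
-- UNICODE column compressed at the column level with paired
-- compress/decompress UDFs from TD_SYSFNLIB.
CREATE TABLE messages
 ( msg_id  INTEGER NOT NULL
 , msg_txt VARCHAR(1000) CHARACTER SET UNICODE
     COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
     DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode
 )
PRIMARY INDEX (msg_id);
```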

What are Amp Worker Tasks (AWTs)? Why do we study them for performance management? What steps can be taken to manage the number in use?

AMP Worker Tasks (AWTs) are the specific tasks assigned inside each AMP that get the database operations done. AWTs are studied for performance management because the AMP software includes utilities to perform system management functions, some of which include configuring the system and rebuilding tables. There are two DBS Control parameters (MaxLoadTasks and MaxLoadAWT) that can be used to control and manage the number of AWTs in use. The default maximum number of AWTs is 80; however, the total number of AWTs per AMP can be increased beyond 80. When it is greater than 80, the number of AWTs contained in the reserve pools remains the same; only the number of AWTs in the unassigned pool increases.
- Formula for AWTs: # AWTs/AMP = # AWTs in standard reserve pools + (Work00 + Work01 reserves) + # AWTs in the unassigned pool

What is an Aggregate Join Index (AJI)? Do we do a duplicate row check during the build?

An Aggregate Join Index aggregates one or more columns of a single table, or of multiple tables, into a summary table to help with performance. An aggregate join index may specify the MIN, MAX, SUM, and COUNT aggregate operations, or functions that extract a DATE value; no other aggregate functions are permitted in the definition of a join index. Aggregate join indexes may be created as either single-table or multi-table join indexes. There is no duplicate row check during the build of an aggregate join index.
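A sketch of a single-table AJI; the table and column names are illustrative:

```sql
-- Pre-aggregated daily sales per store; the optimizer can answer
-- matching GROUP BY queries from the index instead of the base table.
CREATE JOIN INDEX sales_aji AS
SELECT store_id
     , sales_date
     , SUM(amount) AS sum_amount
     , COUNT(*)    AS row_cnt
FROM daily_sales
GROUP BY store_id, sales_date;
```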

What option does Teradata provide you to increase the probability of stats being used in a query containing expression? (hint: expressions are Extract, substr(), concat, etc.. ) How do you implement it? What are its limitations?

An expression, or a list of expressions, for which statistics are to be collected can be specified. The result of the expression must be a data type on which statistics can be collected directly, such as an integer, character, or date. You cannot specify nondeterministic expressions such as RANDOM, nondeterministic UDFs, and nondeterministic CAST or EXTRACT operations. Syntax examples:

COLLECT STATISTICS USING SYSTEM SAMPLE
  COLUMN BEGIN(eff_dt_duration) AS stats_begin_eff_dt_duration
  ON orders;

COLLECT STATISTICS USING SYSTEM SAMPLE
  COLUMN (EXTRACT(MONTH FROM o_orderdate), o_totalprice) AS stats_month_price
  ON orders;

COLLECT STATISTICS
  COLUMN CASE WHEN CAST(o_orderdatetime AS DATE AT 0) = CURRENT_DATE
              THEN 'orders-today' ELSE 'orders-old' END
         AS stats_orders_today_and_old
  ON orders;

What is a covering index?

An index whose subtable contains all columns referenced in a query. The index is said to cover the table in the query.

What are the key advantages for column partitioning?

Column Partitioning (CP), also known as Teradata Columnar, is a new physical database design implementation option that allows single columns or sets of columns of a NoPI table to be stored in separate partitions. Advantages:
- Improved query performance
- Reduced disk space
- Increased flexibility
- Reduced I/O
- Ease of use
- Automatic optimization occurs for queries that specify a restrictive condition on the partitioning column
- The partition expression definition is the only thing the DBA or database designer needs to supply; there is no separate partition layout and no disk layout for partitions
- Even data distribution and even processing of a logical partition is automatic
- No modifications of queries required

What is Teradata Columnar (column partitioning)? (TD14)

Column partitioning sets up a table as subsets of columns, as opposed to the traditional partitioning technique (row partitioning). Columnar allows single columns or sets of columns of a table to be stored in separate partitions; note that a column-partitioned table must be a NoPI table. Column partitioning can also be applied to join indexes. Teradata Columnar offers the ability to partition a table or join index by column, and can be used alone or in combination with row partitioning in multilevel partitioning definitions. Column partitions may be stored using traditional 'ROW' storage or, alternatively, using the new 'COLUMN' storage option. In either case, Columnar can automatically compress physical rows where appropriate. The major advantage of column partitioning is improved performance for queries that access a subset of the columns of a table, whether for predicates (e.g., the WHERE clause) or projections (i.e., the SELECTed columns). Because sets of one or more columns are stored in separate column partitions, only the column partitions containing the columns the query needs have to be accessed. Just as row partitioning can eliminate rows that need not be read, column partitioning eliminates columns that are not needed.
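A minimal sketch of the NoPI requirement in action; the table and columns are illustrative:

```sql
-- A columnar table has to be NoPI; each column (or grouped set of
-- columns) is stored in its own column partition.
CREATE TABLE web_clicks
 ( click_ts TIMESTAMP(0)
 , user_id  INTEGER
 , url      VARCHAR(500)
 )
NO PRIMARY INDEX
PARTITION BY COLUMN;
```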

What is Block-Level Compression? How is it implemented? What is Temperature-based Block-Level Compression? TD14

Block-level compression is performed by Teradata at the file system level on whole data blocks, before the data blocks are actually written to or read from storage devices. When loading data, it can be controlled with:
SET QUERY_BAND = 'BLOCKCOMPRESSION=YES/NO;' FOR SESSION;
The Ferret utility has new commands: COMPRESS/UNCOMPRESS table.
Temperature-based block-level compression uses the temperature of data, as maintained by Teradata Virtual Storage, to determine what to compress. For example, COLD or WARM data might be compressed automatically, while HOT data might be decompressed automatically if it was previously compressed. Temperature-based block-level compression applies only to permanent user data tables.

What are the cost considerations for Secondary and Join indexes?

Cost summary for secondary indexes:
- Update operations (including delete, insert, and update) require more I/O to process because of the required transient journaling, the possibility that secondary index subtables must be updated, fallback table updates, and so on. USIs impose a performance cost because their index subtables must be maintained by the system each time the referenced column set in the base table is updated.
- USIs require a two-AMP access.
- NUSIs require all AMPs, and don't use a hash map because access is all-AMP.
A secondary index is an alternate path to the data. Secondary indexes are used to improve performance by allowing the user to avoid scanning the entire table. A secondary index is like a primary index in that it allows the user to locate rows; it is unlike a primary index in that it has no influence on the way rows are distributed among AMPs. A database designer typically chooses a secondary index because it provides faster set selection. Primary index requests require the services of only one AMP to access rows, while secondary indexes require at least two and possibly all AMPs, depending on the index and the type of operation. A secondary index search will typically be less expensive than a full table scan. Secondary indexes add overhead to the table, both in terms of disk space and maintenance; however, they may be dropped when not needed and recreated whenever they would be helpful.

Cost summary for join indexes:
Creation and maintenance costs for join indexes can be a resource burden because of their processing overhead. Join index maintenance is an ongoing burden that lasts for the life of the index. Costs vary considerably among the various types of joins used, as well as between simple and aggregate types. The performance analyses support the following conclusions:
- Maintenance overhead for in-place aggregate join indexes ranges from 1.0 to 2.9.
- Maintenance overhead for aggregate join indexes that redistribute rows ranges from 2.6 to 9.5.
- Maintenance overhead for simple join indexes ranges from 1.0 to 23.1, increasing as the number of hits per data block increases.
- Maintenance overhead for aggregate join indexes decreases as the number of hits per data block increases, because of the efficiencies of block-at-a-time operations in the file system.

What are some architecture considerations when using Join Indexes?

Creating a join index by replicating a column of a table is a denormalization technique, so you must be willing to denormalize your data when using a join index. You must also be willing to spend PERM space when creating one. If a join index contains repeating data, compressing that data will reduce the storage it requires.

What does logonoff tell you? Where is it stored? Is data collection automatic? Is it backed up in DBC archive? How many rows represent a logon/logoff sequence? Can you find the logon time in the logoff row?

DBC.LogOnOff provides information about logon attempts, both successful and unsuccessful. Data for this view is stored in DBC.EventLog. Collection is automatic for this data. It is not backed up in the DBC archive. Two rows represent a logon/logoff sequence. You can find the logon time in the logoff row by looking at a combination of the columns "LogonDate" and "LogonTime".
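A hedged example of querying the view described above; the event-name literals are typical values, not guaranteed by the source:

```sql
-- Recent non-routine logon/logoff events (e.g. failed attempts).
SELECT LogDate, LogTime, UserName, Event
FROM DBC.LogOnOffV
WHERE Event NOT IN ('Logon', 'Logoff')
ORDER BY LogDate DESC, LogTime DESC;
```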

What dictionary table was added in TD14 to support statistics?

DBC.StatsTbl

What are the 3 locking object levels available in Teradata?

- Database: applies to all tables/views in the database
- Table/View: applies to all rows in the table/view
- Row Hash: applies to all rows with the same row hash

Where is perm space assigned to?

Databases and users (if they are going to contain tables, UDFs or stored procedures)

How is date data type internally stored?

Dates are stored as Integers in the database using the following formula: (year - 1900) * 10000 + (month * 100) + day. (TDSQL Intro, Module 6)
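The encoding can be verified by casting a DATE to INTEGER. For 1999-12-31 the formula gives (1999 - 1900) * 10000 + 12 * 100 + 31 = 991231:

```sql
-- DATE values expose their internal integer encoding when cast
SELECT CAST(DATE '1999-12-31' AS INTEGER);  -- 991231
SELECT CAST(DATE '2024-03-05' AS INTEGER);  -- 1240305 = (2024-1900)*10000 + 3*100 + 5
```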

What primary index is chosen if you neglect to declare a primary index in the create table statement? What performance implications are potentially there?

Depends on the DBS Control settings, but by default the first column of the table becomes a NUPI. The performance implications: the first column may distribute the data poorly (skew), and a NUPI does not guarantee the single-row, one-AMP direct access path that a UPI gives you.

How would you determine internal column size? Row length? Tablesize? Index Size?

Internal column size is determined by taking into account the size of each data type (see image on google doc). The row layout is:
(Row Length (2)) + (Row ID = Row Hash (4) + Uniq Value (4)) + (Flag Byte (1)) + (1st Presence Byte (1)) + (Partition (2/8)) + (Optional Presence Bytes (1 - n)) + (VAR Column Offsets (2) + (2) + (2)) + (Fixed Length Columns) + (Compression) + (VAR Length Columns (0 - n bytes)) + (Ref. Array ptr. (2/4))
Row size can be determined through a "Row Size Calculation Form". To complete the form you categorize all of the data types present; the form multiplies each data type's size by the number of columns of that type:
- BYTEINT (1), SMALLINT (2), INTEGER (4), BIGINT (8)
- DATE (4), TIME (6 or 8), TIME WITH ZONE (8), TIMESTAMP (10 or 12), TIMESTAMP WITH ZONE (12)
- DECIMAL 1-2 digits (1), 3-4 (2), 5-9 (4), 10-18 (8), 19-38 (16)
- FLOAT (8)
- SUM(a) = sum of the AVERAGE number of bytes expected for the variable columns
- SUM(n) = sum of the fixed CHAR and GRAPHIC column bytes
These give the LOGICAL SIZE. Then add OVERHEAD: Partitioned Primary Index overhead (2 or 8), variable column offsets (n * 2 + 2), bits for compressible and nullable columns divided by 8, and add 1 to the physical row size if it is odd.
Determining the table size is a simple multiplication: row size x number of rows (double for Fallback).
Determining the index size can also be done with a formula:
- USI Size = Row Count * (IndexValueSize + 31) (plus another possible 1 if the number is odd)
- NUSI Size = (Row Count * 10) + (#distinct values) * (IndexValueSize + 21 (plus another possible 1 if the number is odd)) * MIN(#AMPs, rows/value)
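In practice, actual table size is usually read from the dictionary rather than computed by hand. A sketch (database and table names are hypothetical; columns per the DBC.TableSizeV view):

```sql
-- Current size of one table, summed across all AMP vprocs;
-- comparing MAX to AVG per AMP hints at distribution skew
SELECT  DatabaseName, TableName,
        SUM(CurrentPerm) AS TotalPerm,
        MAX(CurrentPerm) * 1.0 / NULLIF(AVG(CurrentPerm), 0) AS SkewRatio
FROM    DBC.TableSizeV
WHERE   DatabaseName = 'Sales'        -- hypothetical
  AND   TableName    = 'Orders'       -- hypothetical
GROUP BY 1, 2;
```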

What is a "dirty read"? When is it useful?

Dirty reads happen when a transaction reads a row before another transaction has committed the changes it made to it. Dirty reads are useful on large tables that are updated only by small, single-row changes: they allow queries to continue running without creating read locks and slowing other queries down.
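In Teradata a dirty read is requested with a locking modifier that downgrades the default read lock to an access lock (table name is illustrative):

```sql
-- "Dirty read": the SELECT proceeds even while writers hold locks,
-- at the cost of possibly seeing uncommitted data
LOCKING TABLE Sales.Orders FOR ACCESS
SELECT COUNT(*)
FROM   Sales.Orders;
```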

What does diskspace tell you? Where is it stored? Is data collection automatic? Is there history recorded in diskpace? What is the difference between allspace and diskspace?

DiskSpace provides AMP vproc information about disk space usage at the database level. The underlying data is stored in DBC.DataBaseSpace. Yes, data collection is automatic: when you create a database, rows are created in DBC.DataBaseSpace without having to turn on logging. No history is recorded in DiskSpace; the PeakSpoolSpace, PeakPermSpace, and PeakTempSpace columns can be used to determine the maximum amount of space a user has used, but there is no time element associated. The size of a table can be monitored on a weekly or monthly time frame if a history table is created to record the max space used on a daily or weekly basis using the information from DiskSpace. (Source: Database Administration p. 225) DiskSpace provides information for any database or account. AllSpace provides information for any database, account, or table. AllSpace isn't recommended because it can return misleading results: it sums both summary and detail rows together. DiskSpace should be used for space usage information at the database/user level.
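A typical DiskSpace query, sketched against the DBC.DiskSpaceV view (column names per that view), aggregates the per-AMP rows up to the database level:

```sql
-- Space usage per database, summed over AMP vprocs
SELECT  DatabaseName,
        SUM(MaxPerm)     AS MaxPerm,
        SUM(CurrentPerm) AS CurrentPerm,
        SUM(PeakPerm)    AS PeakPerm
FROM    DBC.DiskSpaceV
GROUP BY 1
ORDER BY CurrentPerm DESC;
```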

What factors should you consider while recollecting stats? What option does Teradata provide you in the stats recollection process to save resource utilization (hint: stats overhead)? Can you specify this option in collect statistics statement? Where else can you specify this option? (Hint: System level) how does this help?

Factors: age of collection, table growth. One option is SAMPLE statistics: you can specify this in the COLLECT STATISTICS statement with the USING SAMPLE clause, and you can also set the sample size at the DBS Control level. This helps if recollection on columns with evenly distributed data is too costly. Another option is THRESHOLD, which you can also specify in the COLLECT STATISTICS statement. If the system threshold option is specified, the threshold is determined by considering many different parameters. Alternatively, the threshold can be set explicitly using a DBA-defined global setting, which overrides the system threshold. Finally, a threshold defined explicitly on an individual statement overrides either other level of threshold definition. Being able to specify the threshold you wish to use is helpful because it gives you input into whether the optimizer chooses to recollect statistics.
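Both options can be sketched as follows (syntax along TD 14.10 lines; database, table, and column names are hypothetical):

```sql
-- Sampled statistics: scan only a sample instead of the full column
COLLECT STATISTICS USING SAMPLE
    COLUMN (order_id)
    ON Sales.Orders;

-- Threshold: skip recollection until the table has grown at least 10 percent
COLLECT STATISTICS USING THRESHOLD 10 PERCENT
    COLUMN (order_date)
    ON Sales.Orders;
```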

What is a Global Temporary Table?

Global temporary tables are known to the data dictionary, where a permanent definition is kept. They are materialized within a session and discarded when the session ends (the DDL remains in the Data Dictionary; data stored in temp space is deleted). Space for global temporary tables comes from the user's temporary space. They are created using the CREATE GLOBAL TEMPORARY TABLE command and can be altered using the ALTER command. After creation, the table is considered materialized once a DML command has been issued, such as an INSERT SELECT. Once materialized, the table appears in the DBC.TempTables view. Deleting all rows from a temporary table does not de-materialize it; the instance of the table must be dropped or the session must end for it to be discarded.
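A minimal sketch (names are illustrative): the definition persists in the dictionary, and each session materializes its own instance on first DML.

```sql
-- Definition lives in the dictionary; instances are per session
CREATE GLOBAL TEMPORARY TABLE Sales.gt_daily_totals
( sale_date DATE
, total_amt DECIMAL(18,2) )
PRIMARY INDEX (sale_date)
ON COMMIT PRESERVE ROWS;   -- keep rows across transactions within the session

-- First DML materializes the instance in the user's TEMP space
INSERT INTO Sales.gt_daily_totals
SELECT order_date, SUM(amount)
FROM   Sales.Orders
GROUP BY 1;
```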

What function(s) can be used to estimate a PI distribution?

HASHROW, HASHBUCKET, HASHAMP and HASHBAKAMP. The output of HASHAMP(HASHBUCKET(HASHROW(Primary_Index))) would be the id of the AMP that 'Primary_Index' would be hashed to.
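A common use is to preview how a candidate primary index column would distribute rows across AMPs (table and column names are hypothetical):

```sql
-- Rows per AMP for a candidate PI column; large spread = skewed PI choice
SELECT  HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,
        COUNT(*) AS row_cnt
FROM    Sales.Orders
GROUP BY 1
ORDER BY row_cnt DESC;
```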

What types of Referential Integrity is available for Teradata? How are they implemented?

Hard: a foreign key must exist elsewhere as a primary key to enforce referential integrity. This type of referential integrity is most concerned with data loading and is implemented differently for parent and child tables. Parent tables: --inserting: no validation needed. A primary key can exist on its own without being a foreign key elsewhere. --updating/deleting: returns an error if the row has any children. You would lose referential integrity if you only updated or deleted a primary key and did not update the instances where it is a foreign key. Child tables: --if you add a foreign key in an ALTER TABLE statement and the child table has some foreign key values without parents, the ALTER statement will process, but an error table (the table name suffixed with _0) will show the problem rows. --inserting/updating: the system must first validate the foreign key against the parent. If a parent doesn't exist, the system returns an error. --deleting: no validation needed. Primary keys can still exist when their instances as foreign keys are deleted. (Advanced SQL 13.20) Soft: more concerned with accessing data than loading it. It does not enforce constraints, but it lets the optimizer know that foreign key references exist. This helps the optimizer join only to the dimension tables it needs. It is your job, however, to make sure that the foreign key references actually do exist ("trust me").
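Hard RI is declared with a plain REFERENCES constraint; the referenced parent column must be unique (a PK or USI). A sketch with hypothetical tables:

```sql
-- Standard (hard) RI: child inserts/updates are validated against the parent
CREATE TABLE Sales.OrderLine
( order_id INTEGER  NOT NULL
, line_no  SMALLINT NOT NULL
, FOREIGN KEY (order_id) REFERENCES Sales.Orders (order_id) )
PRIMARY INDEX (order_id);
```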


What is the default database of a user?

If a default database is not entered in the DEFAULT DATABASE attribute when a user is created, the creating user's default database is used as the default. (Student Manual 15-10)

Why would the optimizer choose to not use a NUSI?

If the number of physical reads exceeds the number of data blocks, the optimizer may choose to perform a full table scan. When choosing between a NUSI and a full table scan, if the optimizer determines that there is no selective secondary, hash or join index, and that most of the rows in the table would qualify for the answer set if a NUSI were used, it would most likely choose the full table scan.

What are some of the statistics enhancements in TD14?

In 14.0, enhancements include: the SUMMARY option, the SAMPLE n PERCENT option, statistics stored in a new system table (DBC.StatsTbl), and the SHOW STATISTICS statement. In 14.10 you can also: collect statistics on single-table expressions, downgrade full stats to sample stats, avoid spool errors during collect stats operations, use automated statistics management, and collect geospatial statistics.

What is a Join Index?

Join indexes provide additional processing efficiencies:
• Eliminate base table access
• Eliminate aggregate processing
• Reduce joins
• Eliminate redistribution
• Eliminate summary processing
Three types of join indexes are commonly used in Teradata: 1) Single-table join index 2) Multi-table join index 3) Aggregate join index
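An aggregate join index, for example, pre-computes a summary the optimizer can substitute for base-table aggregation (names are hypothetical):

```sql
-- Aggregate join index: maintained automatically as Sales.Orders changes;
-- qualifying GROUP BY queries can be answered from the index alone
CREATE JOIN INDEX Sales.ji_monthly_sales AS
SELECT  customer_id,
        EXTRACT(YEAR  FROM order_date) AS yr,
        EXTRACT(MONTH FROM order_date) AS mth,
        SUM(amount) AS total_amt
FROM    Sales.Orders
GROUP BY 1, 2, 3;
```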

What is Bitmap set manipulation step (BSMS) or NUSI bit-mapping?

NUSI bit-mapping is a performance optimization option technique (rarely applied by the optimizer as of Teradata 14). It may be useful for NUSI equality access where both NUSI's are weakly selective individually, but strongly selective when combined. The technique determines common Row IDs between multiple NUSI values by a process of intersection.

What are the types of Non-Unique Secondary Indexes (NUSI) available?

A NUSI, which will improve equality access, or a VONUSI (Value-Ordered NUSI), which will improve range access.
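Both are created with CREATE INDEX; the value-ordered variant adds an ORDER BY VALUES clause (names are hypothetical; a VONUSI's ordering column must be a small numeric or DATE type):

```sql
-- Plain NUSI: helps equality predicates (WHERE last_name = ...)
CREATE INDEX (last_name) ON Sales.Customer;

-- Value-ordered NUSI: helps range predicates (WHERE order_date BETWEEN ...)
CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON Sales.Orders;
```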

Can I write to a table that has a read lock? Can I write to a table that has an access lock? Can I write to a table that has an exclusive lock?

- Read lock: No
- Access lock: Yes
- Exclusive lock: No

Is the Primary Key always the Primary Index?

No. A Primary Key is a relational data modeling term that defines, in the logical model, the columns that uniquely identify a row. A Primary Index is a physical database implementation term that defines the actual columns used to distribute and access rows in a table. The relative uniqueness of the Primary Index determines the uniformity of distribution of the rows of the table among the AMPs; the actual distribution comes from the hash value of the primary index. Primary Indexes are also used for accessing data; PI access is always a one-AMP operation. Although a primary key is always unique, it may be beneficial to choose a non-unique column (or columns) as the primary index if it will be accessed most often.

Do I need perm space if I am a database holding only views and macros?

No. Databases holding views and macros do not require Perm space.

Does a Join Index always involve more than one table?

No. There are three types of join index: single-table, multi-table, and aggregate. A single-table join index distributes the rows of a single table on the hash value of a foreign key, and an aggregate join index (AJI) aggregates one or more columns of a single table or multiple tables into a summary table.

Can I read a table with a write lock?

Not with a standard read lock, which is used automatically if another lock is not specified. Write locks prevent read, write, and exclusive locks from accessing the table. Specifying an access lock allows reading a write-locked table. Syntax for requesting an access lock: "LOCKING ROW (or TABLE tablename) FOR ACCESS"

On what types of tables should you collect statistics on the system-derived column PARTITION?

On tables that are partitioned by row or column, always collect PARTITION statistics and statistics on the row-partitioning column set. Also, be sure to refresh statistics whenever row partition demographics change significantly. The Ten Percent Rule, the usual guideline of refreshing statistics after a 10 percent change in table demographics, does not apply for row partitioning columns. Instead, you should recollect statistics whenever the demographics for a row partition change by 10 percent or more. This guideline applies to the row partitioning column set and to the system-derived PARTITION column for a partitioned table. You should collect single-column PARTITION statistics to enable the Optimizer to take advantage of the more accurate partitioning costing made possible when PARTITION statistics exist. The system does not use this fast collection method if a PARTITION column is part of a multicolumn statistics collection. Statistics on the system-derived PARTITION column provide the Optimizer with the best estimate of the number of populated partitions and the cardinality of each partition when statistics were last collected.
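The two recommended collections can be sketched as (table and column names are hypothetical):

```sql
-- Single-column stats on the system-derived PARTITION column
COLLECT STATISTICS COLUMN (PARTITION) ON Sales.Orders;

-- Plus statistics on the row-partitioning column itself
COLLECT STATISTICS COLUMN (order_date) ON Sales.Orders;
```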

What are the key advantages of PPI? What are the key advantages of ML-PP

PPI:
- Increases the available options to improve the performance of certain types of queries - specifically range-constrained queries.
- Only the rows of the qualified partitions in a query need to be accessed, avoiding full table scans. (Student Manual pp. 765 & 823)
ML-PPI:
- Allows multiple partitioning expressions instead of only one for a table or a non-compressed join index
- Allows each partition at a level to be sub-partitioned
- Allows efficient searches by using partition elimination at the various levels or combinations of levels
- Allows more flexibility in which partitioning expression to use when there are multiple choices for the partitioning expressions
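A minimal row-partitioned table sketch (names, dates, and interval are illustrative): a range predicate on order_date lets the optimizer eliminate all non-qualifying partitions.

```sql
-- Monthly row partitioning: WHERE order_date BETWEEN ... scans only
-- the qualifying month partitions instead of the whole table
CREATE TABLE Sales.Orders
( order_id   INTEGER NOT NULL
, order_date DATE    NOT NULL
, amount     DECIMAL(18,2) )
PRIMARY INDEX (order_id)
PARTITION BY RANGE_N (order_date BETWEEN DATE '2020-01-01'
                                     AND DATE '2025-12-31'
                                     EACH INTERVAL '1' MONTH);
```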

What is Priority Scheduler Facility (PSF)? What are the major elements?

PSF lets you use account codes to assign users to performance groups with defined levels of CPU usage when assigning user account priorities. (Teradata Factory Student Manual page 2284)
- $L - Low priority, default weight 5, for queries not needed immediately
- $M - Medium priority, default weight 10, used for complex and ad hoc queries
- $H - High priority, default weight 20, for tactical or OLTP queries
- $R - Rush priority, default weight 40, for critical queries

What is Resusage? Where is it stored? What type of information is available? Is data collection automatic? Is it backed up in DBC archive?

ResUsage is "a set of tables, views, and macros used to record and monitor system resource usage." ResUsage stores its data in the DBC.ResUsage tables. These tables contain information about nodes, CPU-specific information, vproc information, logical device information, channel and LAN host information, AMP worker task information, Priority Scheduler performance group information, and AMP pdisk/vdisk cylinder allocation, migration, and I/O statistics. Collection of this data is not automatic and must be started by enabling data logging. It is not backed up in the DBC archive.

If a user is assigned to role(s) and also has rights assigned to him at the user level, what are his allowable access rights? Where would you look to figure this out?

Rights at the individual level override rights at the role level. Teradata goes through the following steps to determine whether the requested action is allowed on the object: 1) Search the AccessRights table for a UserId-ObjectId pair entry for the required right. In this step, the system checks for rights at the database/user level and at the object (e.g., table, view) level. 2) If the access right is not yet found and the user has a current role, search the AccessRights table for a RoleId-ObjectId pair entry for the required right. 3) If not yet found, retrieve all roles nested within the current role from the RoleGrants table. For each nested role, search the AccessRights table for a RoleId-ObjectId pair entry for the required right. 4) If not yet found, check whether the right is a PUBLIC right.

Roles What benefit(s) does using roles provide? How many roles can you be assigned to at the same time? How do you assign multiple roles? Where would you look to see Roles defined in the database? Where would you look to see access rights assigned to Roles?

Roles are a way to save, re-apply and modify a set of access rights common to a group of users. 1) Roles exist to simplify access rights administration. This is accomplished by granting the applicable access rights to a role then granting that role to the appropriate users, rather than repeatedly granting individual rights to each user. 2) Roles create a significant reduction in disk space usage by DBC in cases where the same set of access rights is assigned to a large number of users. In the absence of roles, the DBC.AccessRights table contains one row for every Right GRANTed to each user, causing an exponential rise in table rows as users are added to a set. With the use of a role, there is a linear relationship between the number of users granted a set of access rights and the number of rows stored (Note: When granting a role to a user, a row is inserted into the DBC.RoleGrants table rather than multiple rows in the DBC.AccessRights table). 3) A peripheral benefit to reducing the access rights table size is the increased performance of queries requiring a full scan of the table, which is common for access rights. - TF v1410.7 Student Manual: Page 47-4 While there is no limit to the number of roles that a user may be assigned to at one time, the ANSI standard is that users are only assigned to one role. This is why nested roles are useful, as it is often the case that a user may be required to have access rights associated with two or more roles. - TF v1410.7 Student Manual: Page 47-17 The GRANT command is used to assign a role to a user or to another role. In order for more than one role to be active for a given user, that user's "ROLE" must be set to "ALL" via SET ROLE ALL. Alternatively, a profile may be created with the default ROLE set to ALL so that any users created with that profile automatically have all assigned ROLES active by default. 
- TF v1410.7 Student Manual: Pages 47-4 through 47-17 Roles defined in the database can be seen in the DBC.Roles table. - TF v1410.7 Student Manual: Page 47-18 Access rights assigned to roles can be seen in the DBC.UserRoleRights table. - TF v1410.7 Student Manual: Page 47-8
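The grant-and-activate flow above can be sketched as (role and user names are hypothetical):

```sql
-- Grant rights once to a role, then grant the role to users
CREATE ROLE sales_read;
GRANT SELECT ON Sales TO sales_read;
GRANT sales_read TO etl_user;

-- Activate all granted roles for the current session...
SET ROLE ALL;

-- ...or make that the user's default so it happens at logon
MODIFY USER etl_user AS DEFAULT ROLE = ALL;
```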

If you are using SAMPLE statistics what are some of the rules to consider when choosing columns to apply this method of collecting statistics to?

Sampled statistics are generally more accurate for data that is uniformly distributed. For example, columns or indexes that are unique or nearly unique are uniformly distributed. Sampling may result in fewer intervals in the histogram (it is based on number of AMPs). Sampling should not be considered for data that is highly skewed because the Optimizer needs to be fully aware of such skew. In addition to uniformly distributed data, sampling is generally more appropriate for indexes than non-indexed column(s). For indexes, the scanning techniques employed during sampling can take advantage of the hashed organization of the data to improve the accuracy of the resulting statistics. If your table is partitioned, then you should include the partitioning column in your SAMPLE statistics. Otherwise you will only sample 1 or 2 partitions. To summarize, sampled statistics are generally most appropriate for: 1. Very large tables 2. Data that is uniformly distributed 3. Indexed column(s) Source: Teradata Factory Student Manual page 1286

What are the types of Secondary Indexes?

Secondary indexes are optional; they can be created at table creation or after, and dropped at any time. They typically improve performance by allowing the user to avoid a full table scan. A secondary index is similar to a primary index in that both allow the user to locate rows quickly; it differs from a primary index in that it has no influence on the way rows are distributed among the AMPs. Secondary index access requires as few as two AMPs, up to all of them. Unique Secondary Index (USI) - can speed up access to a row that might otherwise require a full table scan, and can also be used to ensure uniqueness on a column or set of columns. Non-Unique Secondary Index (NUSI) - specified to prevent full table scans; NUSIs require all AMPs, so the optimizer may choose a full table scan instead for performance's sake. Value-Ordered NUSI - can be defined on an integer-style column and is typically used when a range is specified (BETWEEN).

Describe how Teradata implements secondary indexes

Secondary indexes can be defined in two ways: CREATE TABLE and CREATE INDEX. Secondary indexes are stored in subtables on each AMP. USI access is usually a two-AMP operation because Teradata usually distributes a USI subtable row to a different AMP than the base table row the index points to. However, if the system distributes the USI subtable row to the same AMP as the base table row it points to, then only one AMP is accessed in the two-step operation. For USI base table row access, the requested USI value is first accessed by hashing to its subtable; then the pointer to the base table row is read and used to access the stored row directly. NUSI access specifies a three-part BYNET message (similar to primary index access); however, the subtable ID in the message references the NUSI subtable rather than the base table. NUSI requests are all-AMP requests unless the NUSI is defined on the same columns as the primary index. A NUSI on the same columns as the primary index may be more efficient than accessing via the PI when there is little or no partition elimination for a query; a NUSI and PI on the same columns is only allowed when the PI does not include all of the partitioning columns. - Syntax for USI: CREATE UNIQUE INDEX (column) ON table; - Syntax for NUSI: CREATE INDEX (column) ON table;

Why would you create a single table JI with the ROWID?

Single table JI's are used to distribute the rows of a single table to align with the foreign key values of a table with which it is commonly joined. This improves query performance for frequent access of a specific join. The base table RowID with which the JI corresponds may be included in the JI as a link back to the base table for accessing base table columns not referenced in the JI. This is common enough to have a name - A single-table JI which includes the base table row ID is known as a Global Join Index.
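A global join index of this kind can be sketched as follows (names are hypothetical): the JI rows are redistributed on the join column, and ROWID preserves the link back to the base row.

```sql
-- Single-table "global" join index: Orders rows redistributed on the FK,
-- with ROWID kept so uncovered columns can still be fetched from the base table
CREATE JOIN INDEX Sales.ji_orders_by_cust AS
SELECT  customer_id, order_id, ROWID
FROM    Sales.Orders
PRIMARY INDEX (customer_id);
```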

How does Soft Referential Integrity improve query optimization?

Soft referential integrity is defined by specifying the WITH NO CHECK OPTION phrase for a REFERENCES constraint. When specified, the database does not enforce the defined referential integrity constraint, so you can still use utilities like FastLoad, MultiLoad, and Teradata Parallel Transporter on tables defined with soft referential integrity. Soft RI enables optimization techniques such as join elimination: it provides a mechanism for user-specified referential integrity constraints that are not enforced by the database. Because no constraints are checked, soft RI has none of the overhead of integrity checking, and the optimizer can sometimes produce more efficient query plans.
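A sketch of the declaration (table and constraint names are hypothetical):

```sql
-- Soft RI: documented for the optimizer (enables join elimination),
-- but never enforced at load or DML time
ALTER TABLE Sales.OrderLine
ADD CONSTRAINT fk_order FOREIGN KEY (order_id)
    REFERENCES WITH NO CHECK OPTION Sales.Orders (order_id);
```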

Is space assigned at the database/user level or the object level?

Space is assigned at the database/user level. Permanent and spool space limits can be defined for databases and users, not for individual tables.

Why is spool space reserved? What does this really mean? What are common methods for this?

Spool space is reserved because spool is the temporary storage for rows returned during the transactions users submit. Reserving this space prevents tables from consuming it. One common method is to create a database called Spool_Reserved, allocate 20-25% of system space to it, and create no tables in it. Since unused perm space is available as spool, this "reserves" that permanent space as spool.
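The reserve-database method can be sketched as (name and size are illustrative; size the PERM allocation to roughly 20-25% of total system space):

```sql
-- Empty database whose PERM allocation is never used for tables,
-- keeping that space available system-wide as spool
CREATE DATABASE Spool_Reserved FROM DBC
AS PERM = 2e12;   -- ~2 TB, e.g. 20% of a 10 TB system (hypothetical sizing)
```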

What is the maximum number of columns in a table for TD 12, TD 13 and TD 1310 systems?

TD 12: 2048 TD 13: 2048 TD 13.10: 2048

What are some of the options available when creating tables?

Table Type - specify the type for the new table: •Teradata - create a SET table; duplicate rows are not allowed. ANSI - create a MULTISET table; duplicate rows are allowed. •NoPI - create a table with no primary index. NoPI tables are always MULTISET and cannot have journals. •Queue - create a queue table. Queue tables cannot be temporary or have journals, and require a NOT NULL TIMESTAMP(6) column as the first column. Table Name - the name of the new table. In Database - the database in which to create the table. If a database is not specified, the new table is created in the default database for the current session. Note: the In Database field is not equivalent to the optional IN DATABASE clause of the SQL CREATE TABLE command. Block Size - the maximum size of the data blocks, in bytes, for the table. The actual maximum block size is the number entered rounded to the nearest multiple of the 512-byte disk sector size. The minimum size is 8192; the maximum is 127.5 KB (255 sectors). Free Space - entered as a percentage, determines the amount of space to be left on each cylinder when the table is loaded. This allows users to insert extra data rows without incurring the additional system overhead associated with rearranging data blocks. The minimum free space specification is 0%; the maximum is 75%. Temporary - create a Global Temporary table that can be instantiated by a user at a later time. FallBack - specifies that the table is created with fallback protection.
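Several of these options appear together in the DDL itself; a sketch with hypothetical names and sizes:

```sql
-- MULTISET (duplicates allowed), fallback-protected, with explicit
-- block size and cylinder free space
CREATE MULTISET TABLE Sales.Orders,
    FALLBACK,
    DATABLOCKSIZE = 65536 BYTES,   -- 128 sectors (illustrative)
    FREESPACE = 10 PERCENT
( order_id   INTEGER NOT NULL
, order_date DATE )
PRIMARY INDEX (order_id);
```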

Where does a Global Temporary Tables get its space from?

Temp space

What are the types of Primary Indexes?

Teradata Database provides the following primary index types: · Unique primary index (UPI) · Non-unique primary index (NUPI) · Non-partitioned primary index · Row-partitioned primary index (PPI) · Single-level partitioned primary index · Multilevel partitioned primary index. These can be mixed in any cross-dimensional combination, so the following complete specifications for a primary index can all be defined in Teradata Database: · Unique non-partitioned primary index · Non-unique non-partitioned primary index · Unique single-level partitioned primary index · Unique multilevel partitioned primary index · Non-unique single-level partitioned primary index · Non-unique multilevel partitioned primary index. The DBC.Indices view returns information about each indexed column from the DBC.Indexes table. Column definitions for this view include: IndexType: · P = Primary (non-partitioned table) · Q = Primary (partitioned table). UniqueFlag: · Y = Unique · N = Non-unique.

What functionality does Teradata Row Level Security provide?

Teradata Row Level Security is used "to control user access by table row and by SQL operation." (Source 2) Only DBC may use row level security unless granted. Creates constraints that restrict certain rows to certain users for security purposes.

What does ampusage data tell you? Where is it stored? Is data collection automatic? Is it backed up in DBC archive? What are the normalized CPU columns doing? Is everything recorded in ampusage?

The DBC.AMPUsage[V][X] views display information about I/O and AMP CPU usage, for each user and account. Effectively it presents resource usage stored in the DBC.Acctg table. - TF v1410.7 Student Manual: Page 44-28 The views are stored in the DBC database, and the data being accessed is stored in the DBC.Acctg table. - TF v1410.7 Student Manual: Page 44-28 Yes, data is collected automatically - TF v1410.7 Student Manual: Page 44-28 No, it is not backed up - TF v1410.7 Student Manual: Page 58-39 Keeping track of the number of requests that fall into a normalized CPU time interval, by user and account. - TF v1410.7 Student Manual: Page 49-46 Normalized CPU columns keep track of usage information using normalized CPU time. In this context, normalized indicates that the CPU time has been weighted according to specific processor speed factors on coexistence systems (i.e. systems running on mixed platforms), by user and account.
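A typical AMPUsage query, sketched against DBC.AMPUsageV (CpuTime and DiskIO columns per that view), rolls usage up by user and account:

```sql
-- CPU and I/O per user/account, summed across AMP vprocs
SELECT  UserName, AccountName,
        SUM(CpuTime) AS TotalCpu,
        SUM(DiskIO)  AS TotalIO
FROM    DBC.AMPUsageV
GROUP BY 1, 2
ORDER BY TotalCpu DESC;
```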

In the Locking Modifier, what does the NOWAIT clause do?

The NOWAIT clause aborts a request if another user already holds a lock on the desired object, rather than queuing the request to wait for the lock to be released.
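A sketch of the modifier in use (names are hypothetical):

```sql
-- Abort immediately if Sales.Orders is already locked,
-- instead of queuing behind the existing lock
LOCKING TABLE Sales.Orders FOR WRITE NOWAIT
UPDATE Sales.Orders
SET    amount = amount * 1.1
WHERE  order_id = 1;
```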

What functionality does the Number data type provide? What are its limitations?TD14

The NUMBER data type is a feature added in 14.0 that provides a variable-length numeric data type. Some of the main benefits are: - Greater efficiency in storing numeric data, because the number of bytes used to store a value can vary from 0-18 depending on the value - More flexibility in defining numeric columns, with the ability to change the scale of existing NUMBER columns in tables without modifying the data rows - Greater range than DECIMAL - Greater accuracy than FLOAT, because NUMBER has greater guaranteed precision and represents common decimals exactly - Increased compatibility with other databases (e.g., Oracle). A limitation: if you run a query that includes a NUMBER on another vendor's database and then run the same query on Teradata, the results may differ because the order in which each database evaluates the expression may differ. (Reference: Pages 23-34 and 23-35 in Student Manual)

What functionality does the SHOW STATISTICS provide?

The SHOW STATISTICS statement reports detailed information about each column or index for which statistics have been collected. This Teradata 14.0 feature can display important summary information at both the table and column level.

What does Transfer Statistics command do? What are some limitations?

The Transfer Statistics feature enables users to transfer all or selected statistics from a source table to an identical target table any time after the target table has been created, provided the data in both tables is identical. In this case there is no need to collect statistics on the target table. (Release Summary 13.0) Limitation: the columns of the target table must have identical data types and data as the source table.

What is an account string? What elements should an account string include? What are Account String Expansion characters? What client tools can use the account string as a grouping assignment?

The account string is a column up to 30 characters (128 if Teradata 14.10 extended object names feature is enabled) that is associated with each user. The account string is used to assign priority to individual sessions to manage the workload, monitor resource usage, and charge for resource or space used
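As a hedged sketch of how an account string is assigned: the string below combines a priority code with Account String Expansion (ASE) substitution codes (the specific codes shown, &D for date and &H for hour, are my assumption of the commonly documented ASE characters; verify against your release's documentation).

```sql
-- Hypothetical account string: $M priority, application tag,
-- and ASE codes expanded at logon/request time (&D = date, &H = hour)
MODIFY USER etl_user AS ACCOUNT = ('$M_ETL_&D&H');
```

With ASE codes in the account string, DBC.AMPUsage accumulates separate rows per expanded value, allowing usage to be grouped by day or hour.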

How does the database handle stale statistics?

The database can detect stale statistics by comparing the row count from random AMP sampling and the histogram. Once the comparison is complete, a decision is made whether or not the statistics are considered stale. Stale statistics is based on 10% or 10,000 row count growth. If the difference exceeds either of the two values, the Optimizer assumes that the statistics are stale and overrides them with the estimates returned by a random AMP sample. (Reference: Page 25-48 and 25-49 in Student Manual)

How can a user get an "out of space error" without using all the space in a database?

The database may report an out of space error when the perm or spool space on a single AMP is insufficient for a database operation, even though the database as a whole has space left. Tables should be evenly distributed across AMPs; if a table has many rows that all hash to the same AMP, that AMP can run out of space and raise the error. (http://developer.teradata.com/tools/articles/teradata-utility-error-handling) Other causes include under-utilized cylinders that have not been returned to the free-space percent, or one skewed VPROC holding more data than its share of the limit. (https://forums.teradata.com/forum/database/spool-space)

What is a Hash Index?

The hash index provides a space-efficient index structure that can be hash distributed to AMPs in various ways, giving the Optimizer an additional option and increasing performance. (Student Manual 30-67) A hash index: - May be created on a single table only - Is automatically updated as base table rows are inserted or updated - Automatically includes the base table PI value as part of the index
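A hedged sketch of a single-table hash index (table and column names hypothetical):

```sql
-- Cover two frequently accessed columns; BY controls how index rows
-- are distributed, ORDER BY HASH controls their ordering on each AMP
CREATE HASH INDEX orders_hi
    (customer_id, order_total)
ON orders
BY (customer_id)
ORDER BY HASH (customer_id);
```

The Optimizer can then satisfy queries on `customer_id` and `order_total` from the index instead of the base table.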

Describe the differences of how USI and NUSI are implemented

The main difference between USI and NUSI file system implementation is that row retrieval via a USI uses two AMPs while a NUSI uses all AMPs. USI subtable rows are hash distributed like normal rows and directly reference the base table row, which is typically on another AMP (one AMP access for the subtable row, one for the base table row). NUSI subtable rows, by contrast, are stored on the same AMP as the base table rows they reference. In addition, a single NUSI subtable row can contain multiple base table row IDs for that AMP.

What is the partition limit in TD 14? What was the limit in prior releases?

The partition limit in Teradata 14 is 9.2 quintillion combined partitions. Prior releases were limited to 65,535.

If a user has spool space assigned and is attached to a profile that has spool space assigned what spool space will be used?

The profile's spool space setting overrides the user's spool space.

If no account string is defined what performance group does the user run their query in? What are some challenges when assigning account strings?

If no account string is defined, the user runs their query in the $M (medium) performance group. (Reference: Page 45-14 Student Manual) The recommended account format is '$xxxWORK&S&D&H'. It begins with a performance group; the defaults are L (low), M (medium), H (high), and R (rush) within resource partition 0. There are also two-character performance groups such as M1 (or MD). WORK is a 4-character workload type starting in the 5th position of the account ID. Account String Expansion (ASE) is an optional feature that enables you to use substitution variables in the account ID portion of the user's logon string, providing more detailed utilization reports and user accounting data. The end portion can include ASE with the following substitution variables: session number (&S), date (&D), hour (&H), logon hostid (&I), logon timestamp (&L), and time (&T). ASE can be used with any utility that uses a standard Teradata Database interface, such as BTEQ, FastLoad, MultiLoad, and SQL Assistant. (Reference: info.teradata) Some challenges when assigning account strings include: - If string expansion generates a string longer than 30 characters, the system truncates all characters to the right of position 30. - Before a user can activate any of the ASE functions, there must be a matching account string in the DBC.UsersV view for that user and ASE function. - There is a minor increase in the overhead of managing how often the DBC.Acctg table is cleared, because some substitution values require more overhead than others. For example, &T (time) uses more space than &H (hour) because usage is tracked in increments smaller than an hour (minute, second).
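Assigning account strings with ASE variables might look like the following sketch; the user name and account values are hypothetical, illustrating the '$xxxWORK&S&D&H' format described above:

```sql
-- Grant the user two account strings; the first listed is the default.
-- '$M1$' / '$H1$' are performance groups, 'LOAD' / 'TACT' are 4-character
-- workload types, and &S&D&H expand to session, date, and hour at logon.
MODIFY USER etl_user AS
    ACCOUNT = ('$M1$LOAD&S&D&H', '$H1$TACT&S&D&H');
```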

How can a user get a "spool exceeded" error without reaching his max spool limit?

The spool limit is divided by the number of AMPs in the system, giving a per-AMP limit that cannot be exceeded. A "spool exceeded" error can therefore occur when data is poorly distributed or when joining on columns with large numbers of non-unique values: the spool limit on one AMP can be exceeded even though the other AMPs have space remaining. (Teradata Factory Manual)

How can one user block another?

There are four types of locks: - Exclusive locks are applied only to databases or tables, never to rows. They are the most restrictive type of lock; all other users are locked out. Exclusive locks are used rarely, most often when structural changes are being made to the database. - Write locks enable users to modify data while locking out all other users except readers not concerned about data consistency (Access lock readers). Until a Write lock is released, no new read or write locks are allowed. - Read locks are used to ensure consistency during read operations. Several users may hold concurrent read locks on the same data, during which no modification of the data is permitted. - Access locks can be specified by users who are not concerned about data consistency. An access lock allows reading data while modifications are in process. Access locks are designed for decision support on large tables that are updated only by small single-row changes. Access locks are sometimes called "stale read" locks, i.e. you may get 'stale data' that hasn't been updated.
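A hedged sketch of requesting an access ("stale read") lock so a long-running report neither blocks nor is blocked by concurrent writers (table name hypothetical):

```sql
-- Read through in-flight updates instead of queueing behind a WRITE lock;
-- the result may include uncommitted ("stale") data
LOCKING TABLE orders FOR ACCESS
SELECT order_status, COUNT(*)
FROM orders
GROUP BY order_status;
```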

What does a Unique Secondary Index (USI) do?

A Unique Secondary Index: - Speeds up access to a row that might otherwise require a full table scan - Improves data retrieval efficiency in queries that do not specify the primary index - Enforces uniqueness on a column or set of columns (e.g., a USI can enforce the uniqueness of a PK)
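A hedged sketch (table and column names hypothetical):

```sql
-- Enforce uniqueness and enable two-AMP retrieval on a non-PI column
CREATE UNIQUE INDEX (employee_number) ON employee;

-- A query on the USI column avoids a full table scan:
SELECT * FROM employee WHERE employee_number = 1042;
```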

What are some object types that can be created under a database/user? (8)

- Tables - rows and columns of data
- Views - predefined subsets of existing tables
- Macros - predefined, stored SQL statements
- Triggers - SQL statements associated with a table
- Stored Procedures - programs stored within Teradata
- User-Defined Functions - functions (C or Java programs) that provide additional SQL functionality
- Join and Hash Indexes - separate index structures stored as objects within a database
- Permanent Journals - tables used to store before and/or after images for recovery

What are the statistics enhancements in TD 14.10? How can it help you on statistics collection process? What APIs are available to you and what are their functionalities? What type of Database objects are these APIs? What privileges should you have to use the auto stats features? Is it advised to automate the whole system with auto stats feature? If not, why? Where does the system store the metadata of all the collected stats? Does auto stats feature work with all data types? Does automated statistics management work on all types of tables? Which viewpoint portlet can be used for Automated Statistics Management? What functionalities does this portlet provide? Can you manually define a statistics collection within the portlet? If so, is there a way to run the stats collection on a regular basis

What are the statistics enhancements in TD 14.10? TD 14.10 adds the AutoStats feature, which automates and provides intelligence to database administrative tasks related to Optimizer statistics collection. (Orangebook: Automated Statistics Management Teradata 14.10, p. 1)

How can it help you on the statistics collection process? AutoStats automates the collection of statistics. It helps eliminate problems such as: which columns to choose, how often to refresh the statistics, whether or not the statistics are still being used, and whether there are tools that can help organize and schedule recurring collections and monitor their results. (p. 1)

What APIs are available to you and what are their functionalities?
- Analyzer - analyzes user objects and logged query plans to identify situations where statistics management might be improved (p. 51)
- Automate - copies the necessary metadata for dictionary-stored statistics into TDStats, which in turn places them under the control of the statistics management features (p. 51)
- PrepCollect - prepares a prioritized and batched list of collections to be executed during the next scheduled run (p. 52)
- RunCollect - submits the list of COLLECT STATISTICS statements prepared by PrepCollect to the DBS (p. 52)
- Recollect - intended to be called after completing bulk load operations (or other significant update-related events) to immediately recollect all maintained statistics for a specified table (p. 52)
- DBA Control - provides users the option of controlling the actions recommended by the Analyzer and the ability to override any of the default settings stored in TDStats (p. 52)

What type of database objects are these APIs? The APIs are External Stored Procedures. (p. 51)

What privileges should you have to use the AutoStats features? A user needs the SELECT, UPDATE, INSERT, and DELETE privileges on TDStats, along with EXECUTE privileges. In order to run Collect jobs, a user also needs the STATISTICS privilege. (p. 40)

Is it advised to automate the whole system with the AutoStats feature? If not, why? Where does the system store the metadata of all the collected stats? The metadata of all collected statistics is stored in a dedicated database called TDStats. (p. 1)

Does the AutoStats feature work with all data types? Does automated statistics management work on all types of tables? No. The following are unsuitable for automated collection: - Columns with Complex Data Types (CDT), such as geospatial - Statistics on temporary tables - Statistics on Extract, Transform, Load (ETL) tables (p. 15)

Which Viewpoint portlet can be used for Automated Statistics Management? Stats Manager is the Viewpoint portlet used for Automated Statistics Management. (p. 1)

What functionalities does this portlet provide? The Stats Manager portlet provides a user interface along with job scheduling capabilities. Stats Manager calls DBS-supplied stored procedures to perform its major tasks, including tuning analysis and statistics collections. These stored procedures, along with all of the DBS AutoStats-related metadata, are stored in the dedicated database TDStats. (p. 1) The portlet also allows you to:
- View all statistics defined on a system.
- Schedule and execute collections in the background at defined time periods.
- View when statistics were last collected and when they are scheduled for collection again.
- Identify missing (new) statistics needed for accurate query optimization.
- Detect and refresh stale statistics.
- Identify and discontinue collecting unused statistics.

Can you manually define a statistics collection within the portlet? If so, is there a way to run the stats collection on a regular basis? Users can manually select a statistics collection from a list within the portlet. They can then choose to run it only once or for every collection. (p. 23)

What is an Identity column? How is it implemented? Is it sequential? When would you use this?

What is an Identity column? A table-level, system-generated number for a column value as rows are inserted into the table. How is it implemented? At the column level in a CREATE TABLE statement; the data type may be any exact numeric type (INTEGER, DECIMAL(x,0)). Is it sequential? No. When would you use this? - To guarantee row uniqueness in a table - To guarantee even row distribution for a table - To optimize and simplify a first port from other databases that use generated keys
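A hedged sketch of an identity column (table and column names hypothetical):

```sql
CREATE TABLE customer_demo (
    customer_id INTEGER GENERATED ALWAYS AS IDENTITY
        (START WITH 1 INCREMENT BY 1 MINVALUE 1 NO CYCLE),
    customer_name VARCHAR(60)
);

-- customer_id is supplied by the system; values are unique but not
-- guaranteed to be sequential, since number ranges are cached per VPROC
INSERT INTO customer_demo (customer_name) VALUES ('Acme');
```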

What happens if Table A Column A defined as integer is compared to Table B Column A defined as character? What performance implications are possibly there?

When a character type is compared to a numeric type, the character value is converted to numeric for the comparison. Source: Teradata Factory p. 12-23 (470) & SQL Functions, Operators, Expressions, and Predicates p. 607. The comparison could result in a full table scan if a numeric value is supplied for a character domain, or an unnecessary conversion if a character value is supplied for a numeric domain. Both situations reduce performance.
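A hedged illustration (names hypothetical): if `emp_code` is a CHAR column, comparing it to a numeric literal forces every row's `emp_code` to be converted before the comparison, so an index on the column cannot be used:

```sql
-- CHAR column compared to a numeric literal: each emp_code value is
-- converted to numeric, typically forcing a full table scan
SELECT * FROM employee WHERE emp_code = 1042;

-- Matching the column's domain avoids the conversion and lets an
-- index on emp_code be used
SELECT * FROM employee WHERE emp_code = '1042';
```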

Does Teradata allow duplicate rows? How do you allow duplicate rows?

Yes, but not by default. In Teradata (BTET) mode, the default when creating a table is to not allow duplicate rows (a SET table). In order to allow duplicate rows, it needs to be explicitly stated in the CREATE TABLE syntax to create a MULTISET table.
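A hedged sketch of the two table kinds (table and column names hypothetical):

```sql
-- SET (the default in Teradata mode): duplicate rows are not stored
CREATE SET TABLE employee_set (emp_id INTEGER, emp_name VARCHAR(30));

-- MULTISET: duplicate rows are allowed
CREATE MULTISET TABLE employee_ms (emp_id INTEGER, emp_name VARCHAR(30));
```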

Do I care about temp or spool space assignments on a database?

Yes. When sizing databases, it is important to estimate the amount of spool space required. Maximum spool-space needs vary with table size, use (type of application), and frequency of use. Only free cylinders are available for spool; if any AMP runs out of free space (insufficient available cylinders for spool), the following message is displayed: 2507 - Out of spool space on disk. Temporary (temp) space also matters: it is acquired automatically by the system when global temporary tables are materialized and used.

How do you transfer ownership of a database object? Are there costs to this approach?

- "GIVE database_name TO recipient_name;" (Factory Manual, p. 2378)
- The GIVE statement transfers a database or user space to a recipient you specify. It also transfers all child databases and users, as well as the tables, views, and macros owned by the transferred object. (Factory Manual, p. 2126)
Rules affecting transfer of ownership:
- Use the GIVE statement to transfer databases and users only. (You cannot use GIVE to transfer tables, views, and macros from one database to another.)
- To transfer an object, you must have the DROP DATABASE privilege on the object being transferred and the CREATE DATABASE privilege on the receiving object. Even when transferring a USER to another user, you need the CREATE DATABASE privilege on the user that will receive the transferred user; a CREATE USER privilege will not work.
- You cannot give an object to one of its children.
- During a transfer, you transfer all objects the object owns.
- Transfer of ownership affects space ownership and access right privileges: when you transfer an object, the space the object owns is also transferred.

What is a Sparse Join Index?

- When creating any of the join indexes, you can include a WHERE clause to limit the rows created in the join index to a subset of the rows in the base table or tables. (Module 30, Student Manual, p. 1574)
- A sparse join index allows you to index a portion of the table, using the WHERE clause in the CREATE JOIN INDEX statement to limit the rows indexed. (p. 1610)
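A hedged sketch of a sparse, single-table join index (table and column names hypothetical):

```sql
-- Index only the open orders, keeping the join index small
CREATE JOIN INDEX open_orders_ji AS
    SELECT order_id, customer_id, order_total
    FROM orders
    WHERE order_status = 'OPEN'
PRIMARY INDEX (customer_id);
```

The Optimizer may then use `open_orders_ji` to cover queries that are restricted to open orders.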

