SQL 2 colby_orr
What is a transaction in SQL
-Define -What does a transaction do -Simple Update = Autocommit Transaction -ACID -Nest Multiple Transactions -Save Points -Example (Code at bottom) I recently used a basic transaction with a try catch statement when inserting and updating new records into a product table and I set up a transaction on my query so that if any product would not insert it would be rolled back. BEGIN TRANSACTION [Tran1] BEGIN TRY INSERT INTO [Test].[dbo].[T1] ([Title], [AVG]) VALUES ('Tidd130', 130), ('Tidd230', 230) UPDATE [Test].[dbo].[T1] SET [Title] = N'az2' ,[AVG] = 1 WHERE [dbo].[T1].[Title] = N'az' COMMIT TRANSACTION [Tran1] END TRY BEGIN CATCH ROLLBACK TRANSACTION [Tran1] END CATCH GO Define: Transactions group a set of tasks into a single execution unit. Each transaction begins with a specific task and ends when all the tasks in the group successfully complete. If any of the tasks fails, the transaction fails. Therefore, a transaction has only two results: success or failure. Simple update = Autocommit transaction Users can group two or more Transact-SQL statements into a single transaction using the following statements: -Begin Transaction -Rollback Transaction -Commit Transaction If a transaction is successful, all of the data modifications made during the transaction are committed and become a permanent part of the database. If a transaction encounters errors and must be canceled or rolled back, then all of the data modifications are erased. transaction has four key properties that are abbreviated ACID. ACID is an acronym for for Atomic Consistent Isolated Durability. If you need a set of updates to either succeed or fail as whole then you wrap them in a transaction. If you want the two updates and one delete to stay in synch then that is what a transaction can do. BEGIN TRANSACTION; DELETE FROM HumanResources.JobCandidate WHERE JobCandidateID = 13; COMMIT; ----- BEGIN TRANSACTION [Tran1] BEGIN TRY INSERT INTO [Test].[dbo].[T1] ([Title], [AVG]) VALUES ('Tidd130', 130), ('Tidd230', 230) UPDATE [Test].[dbo].[T1] SET [Title] = N'az2' ,[AVG] = 1 WHERE [dbo].[T1].[Title] = N'az' COMMIT TRANSACTION [Tran1] END TRY BEGIN CATCH ROLLBACK TRANSACTION [Tran1] END CATCH GO --- BEGIN TRAN UPDATE authors SET au_fname = 'John' WHERE au_id = '172-32-1176' UPDATE authors SET au_fname = 'JohnY' WHERE city = 'Lawrence' IF @@ROWCOUNT = 5 COMMIT TRAN ELSE ROLLBACK TRAN --- Hopefully most of your transactions will occur in stored procedures. Let's look at the second example inside a stored procedure. SQL Server allows you to nest transactions. Basically, this feature means that a new transaction can start even though the previous one is not complete. Savepoints offer a mechanism to roll back portions of transactions. A user can set a savepoint, or marker, within a transaction. The savepoint defines a location to which a transaction can return if part of the transaction is conditionally canceled. SQL Server allows you to use savepoints via the SAVE TRAN statement, which doesn't affect the @@TRANCOUNT value. A rollback to a savepoint (not a transaction) doesn't affect the value returned by @@TRANCOUNT, either. However, the rollback must explicitly name the savepoint: using ROLLBACK TRAN without a specific name will always roll back the entire transaction. http://www.codeproject.com/Articles/4451/SQL-Server-Transactions-and-Error-Handling Atomicity. In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are. Consistency. A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started. Isolation. A transaction in process and not yet committed must remain isolated from any other transaction. Durability. Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state.
What is BCP? When is it Used?
BCP or BulkCopy is a tool used to copy huge amounts of data from tables and views. BCP does not copy the complete structures from source to destination. BULK INSERT command helps to import a data file into a database table or view in a user-specified format.
Write the command to remove all employees named John from the EMPLOYEE table.
DELETE from EMPLOYEE WHERE firstName = 'John'
Normalization
Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity. We do this by simplifying the structures of the table from general to specific. Each table should encompass a single idea. Benefits: -Reduced usage of storage space by intelligently categorizing data -It aids in better, faster, stronger searches as it entails fewer entities to scan in comparison with the earlier searches based on mixed entities. This greatly improves performance. -Data integrity is improved through database normalization as it splits all the data into individual entities yet building strong linkages with the related data so that when something is modified you only have to make that change in one place (Cascading Effect) Ultimately we are focusing on starting from the general and work toward the specific. This Ultimately helps us to avoid redundancy problems such as -Inserting -Deleting -Modification Applying Normalization by giving an example. Step 1: Create first normal form (1NF) For 1NF, ensure that the values in each column of a table are atomic; which means they are unique, containing no sets of values. In our case, Author and Subject do not comply. One method for bringing a table into 1NF is to separate the entities contained in the table into separate tables. Step 2: Define relationships Three types of relations can be established: One-to-(Zero or)-one (Example: marriage) One-to-(Zero or)-many (Example: kids) (Foreign Keys) Many-to-many (Example: facebook) We can achieve 'one-to-many' relationships with a foreign key. A foreign key is a mechanism in database management systems (DBMS) that defines relations and creates constraints between data segments. When deleting a publisher, all the related books may need to be deleted along with the reviews of those books. The authors would not be needed to be deleted. Step 3: Make second normal form (2NF) Second normal form (2NF) cuts down the tautological/superfluous data in a table by selecting it, putting it in new tables and by establishing relations amongst them. In database normalization, 2NF is about the relations between the composite key columns and non-key columns. That means the non-key columns have to depend on the whole composite key. Here, the primary key is composite to eliminate the possibility of the same person writing more than one review of the book. Reviewer URL is dependent only on the Reviewer ID which is only part of the whole composite key. Step 4: Third Normal Form (3NF) This requires that all columns depend directly on the primary key. Tables violate the 3NF when one column depends on another column which in turn depends on the primary key. Levels of Normalization 1NF -For a table to be in 1NF you need to ensure that the data is atomic, having no repeating groups. -Atomic data is a form of minimalism for data items. A data item is atomic if only one item is in each cell of a table In First Normal Form, any row must not have a column in which more than one value is saved, like separated with commas. Rather than that, we must separate such data into multiple rows. Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique. Second normal form (2NF): An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key. Third normal form (3NF): An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key. 1NF - Eliminate Repeating Groups 2NF - Eliminate Redundate Data 3NF - Eliminate Columns Not Dependent On Key
Delete vs Truncate
Delete Usage Example Simplified -Where Clause -Logging -Triggers -foreign keys -identity keys -locking -example Truncate -Very Fast execution -use when you want to reset a table identity -bypasses transactin logs -bypasses triggers -difficult to recover -Locks table (Be careful about shared concurrent users) -Foreign key constraint -I use truncate if I am working with test data so that I can clear out the data if needed -------- Delete -Condtions can be set to specify what rows you want to delete using a where clause. -DELETE is a logged operation on a per row basis. This means that the deletion of each row gets logged and physically deleted. This makes rows easier to reover if needed. -Any deletion triggers are executed on the affected table. -You are allowed to DELETE records that have foreign key constraints defined. -Record deletions don't reset identity keys. -Depending on the locking you are using, row locks are placed on deleted rows. Unaffected rows remain unlocked. -I had to delete some duplicate records recently using Delete top(2) From EmpDup Where empid=2 Example of Truncate I usually dont use a truncate command within T-SQL but recently I set up a truncate command in my ETL process. This ETL was extracing information from a excel file and I set up a Execute SQL Command transformation in the task flow and placed a truncate command in there so that every time the report ran it would truncate the table to clean it out and then put the new updated information in there. The table was a employee prospects table. Truncate -TRUNCATE statement executes very fast; however, it does so at the expense of not logging each row deleted. -TRUNCATE statement, it is much more difficult to recover, and you may loose data in the process. -TRUNCATE is also a logged operation, but in a different way. TRUNCATE logs the deallocation of the data pages in which the data exists. The deallocation of data pages means that your data rows still actually exist in the data pages, but the extents have been marked as empty for reuse. -TRUNCATE command does log the pages it removes, so it is possible to recover the pages using some advanced code. -Use When You want to "reset" a table to its empty state. All rows are removed, and identity key values reset to the initial defined values. -You need to have a super quick way of clearing out table data. I can see this occurring when you need to repeatedly import test data -Keep in mind that TRUNCATE will lock the table, so obviously don't use this command on a table being shared by many concurrent users. -You cannot TRUNCATE a table that has any foreign key constraints. You will have to remove the contraints, TRUNCATE the table, and reapply the contraints. -TRUNCATE will reset any identity columns to the default seed value. - Truncate bypasses the transaction log, and the only record of the truncation in the transaction logs is the page deallocation. -Records removed by the TRUNCATE TABLE statement cannot be restored. -TRUNCATE TABLE statement, it is all or nothing. -since TRUNCATE statements do not fire triggers. This could result in inconsistent data because ON DELETE/UPDATE triggers would not fire. TRUNCATE TABLE statements cannot be used for tables involved in replication or log shipping, since both depend on the transaction log to keep remote databases consistent. When tables require that all records be deleted and TRUNCATE TABLE cannot be used, the following statements can be used to delete the data and reset the identity value: DELETE from "table_name" DBCC CHECKIDENT("table_name", RESEED, "reseed_value")
Which SQL command is used to add a row?
INSERT
Implicit vs Explicit Cursor
Implicite is automatically created Explicite has to be declared
How do We Know if Any Query is Retrieving a Large Amount of Data or very little data?
In one way, it is quite easy to figure this out by just looking at the result set; however, this method cannot be relied upon every time as it is difficult to reach a conclusion when there are many columns and many rows. It is easy to measure how much data is retrieved from server to client side. The SQL Server Management Studio has feature that can measure client statistics.
Explain You SQL Experience
My Experience -7+ years of experience -Working with T-SQL 2008-2014 Best Practices -Properly formatting my code for readability -Inputting Notes for other developers to understand my logic -Avoid using Select * statements -Always Using Aliases when needed for readability Properly Creating and Configuring Database Objects -Creating tables and properly assigning relationships using primary and foreign keys -As well as implementing the proper indexes to maintain a high level of performance and referential integrity within the database -Using views to hide complexity of the code, for security so that we can set permissions on the views instead of the underlying table. -Creating stored procedures to handle complex code and logic to do things like calculations, dropping and recreating indexes, or used to run in a SSRS reports(give a recent example) I know how to use the data dictionary -Data dictionaries are used by designers and developers to understand the structure of the database. I can use the data dictionary to help me quickly identify important information without having to hunt and peck through the object explorer. For example finding all of the tables and views that that use a certain column name like Business Entity ID SQL Profiler is a powerful tool -Use SQL profiler to help identify and troubleshoot troublesome queries. -Things like viewing execution plans -I recently used the SQL profiler to set up a trace on a poorly performing stored procedure and doing so allowed me to set up an execution plan to discover that the creator of the SP had too wide of a net cast on their "where" clause Tuning Advisor -Tuning Advisor to help performance tune a database by adding, removing, or modifying physical design structures in your databases like creating indexes, partitioning tables, or updating statistics. Other Tools -Transactions to maintain the Atomicity, Consistency, Isolation, and Durability -Case statements to embed an if-then-else logic in the SELECT clause to help transform data into the proper output or even grouping information based on certain parameters. -Use table joins (Both inner and Outer) to combine table information and storing that in a view so that people can quickly access that information or storing the subset of information into a temp table if I will be reusing that data for further manipulation. -Temp tables and table variables to store result sets while reducing the load on a databases storage -Setting up triggers to Enforcing referential integrity, Event logging and storing information on table access, or Preventing invalid transactions -Merge statements to Conditionally insert or update rows in several target tables Setting up roles and permissions -Selecting and assigning privileges to users on various database objects (Find Code and put that below here) Deadlocks (Add later when I understand more) Normalization and Denormalization (Make sure I understand all of these and that I put all of it to practice throroughly. If I am not confident in it dont mention it).
how to debug a stored procedure?
Sql Profiler Examin the code http://www.dbafire.com/2014/09/07/troubleshoot-sql-script-like-boss-sql-server/
What does TOP Operator Do?
The TOP operator is used to specify the number of rows to be returned by a query. The TOP operator has new addition in SQL SERVER 2008 that it accepts variables as well as literal values and can be used with INSERT, UPDATE, and DELETE statements.
What is the XML Datatype?
The xml data type lets you store XML documents and fragments in a SQL Server database. An XML fragment is an XML instance that has a missing single top-level element. You can create columns and variables of the xml type and store XML instances in them. The xml data type and associated methods help integrate XML into the relational framework of SQL Server.
How do you maintain database integrity where deletions from one table will automatically cause deletions in another table?
You create a trigger that will automatically delete elements in the second table when elements from the first table are removed.
Why can't I use Outer Join in an Indexed View?
Rows can logically disappear from an indexed view based on OUTER JOIN when you insert data into a base table. This makes incrementally updating OUTER JOIN views relatively complex to implement, and the performance of the implementation would be slower than for views based on standard (INNER) JOIN.(Read More Here)
What is Policy Management?
Policy Management in SQL SERVER 2008 allows you to define and enforce policies for configuring and managing SQL Server across the enterprise. Policy-Based Management is configured in SQL Server Management Studio (SSMS). Navigate to the Object Explorer and expand the Management node and the Policy Management node; you will see the Policies, Conditions, and Facets nodes.
What is ROW_NUMBER()?
ROW_NUMBER() returns a column as an expression that contains the row's number within the result set. This is only a number used in the context of the result set; if the result changes, the ROW_NUMBER() will change.
What is RAISEERROR?
RaiseError generates an error message and initiates error processing for the session. RAISERROR can either reference a user-defined message stored in the sys.messages catalog view or build a message dynamically. The message is returned as a server error message to the calling application or to an associated CATCH block of a TRY...CATCH construct.
Optimizing a SQL Query
Ways to Optimize a Query: 1. Learn How to Create Indexes Properly 2. Only Retrieve the Data You Really Need 3. Avoid Functions on the Left Hand-Side of the Operator 4. Consider Getting Rid of Correlated Subqueries 5. Avoid Wildcard Characters at the Beginning of a LIKE Pattern https://www.vertabelo.com/blog/technical-articles/5-tips-to-optimize-your-sql-queries http://www.infoworld.com/article/2628420/database/database-7-performance-tips-for-faster-sql-queries.html
ACID
Atomicity. In a transaction involving two or more discrete pieces of information, either all of the pieces are committed or none are. Consistency. A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started. Isolation. A transaction in process and not yet committed must remain isolated from any other transaction. Durability. Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state.
What is the SQL server TCP port number?
1433
What is the Maximum Number of Index per Table?
1 Clustered Index + 249 Nonclustered Index = 250 Index. For SQL Server 2008: 1 Clustered Index + 999 Nonclustered Index = 1000 Index.
What is a Filestream?
Filestream allows you to store large objects in the file system and have these files integrated within the database. It enables SQL Server-based applications to store unstructured data such as documents, images, audios and videos in the file system. FILESTREAM basically integrates the SQL Server Database Engine with New Technology File System (NTFS); it basically stores the data in varbinary (max) data type. Using this data type, the unstructured data is stored in the NTFS file system, and the SQL Server Database Engine manages the link between the Filestream column and the actual file located in the NTFS. Using Transact-SQL statements users can insert, update, delete and select the data stored in FILESTREAM-enabled tables.
What is the Difference between a HAVING clause and a WHERE clause?
They specify a search condition for a group or an aggregate. But the difference is that HAVING can be used only with the SELECT statement. HAVING is typically used in a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause. Having Clause is basically used only with the GROUP BY function in a query, whereas WHERE Clause is applied to each row before they are part of the GROUP BY function in a query. (Read more here)
What is cardinality?
Thinking mathematically, it is the number of elements in a set. Thinking in the database world, cardinality has to do with the counts in a relationship, one-to-one, one-to-many, or many-to-many.
What are the Advantages of Using CTE?
Using CTE improves the readability and enables easy maintenance of complex queries. The query can be divided into separate, simple, and logical building blocks, which can be then used to build more complex CTEs until the final result set is generated. CTE can be defined in functions, stored procedures, triggers or even views. After a CTE is defined, it can be used as a Table or a View and can SELECT, INSERT, UPDATE or DELETE Data.
What is Use of @@ SPID in SQL Server?
A SPID is the returns sessions ID of the current user process. And using that session ID, we can find out that the last query was executed.
How to Enable/Disable Indexes?
--Disable Index ALTER INDEX [IndexName] ON TableName DISABLE GO --Enable Index ALTER INDEX [IndexName] ON TableName REBUILD GO
What is the procedure when using a delete statment
1. Create a backup table (DBA) 2. Check for approval 3. Ensure Where Clause is deleting the proper data
Isolation Levels
1. READ UNCOMMITTED: UserA will see the change made by UserB. This isolation level is called dirty reads, which means that read data is not consistent with other parts of the table or the query, and may not yet have been committed. This isolation level ensures the quickest performance, as data is read directly from the table's blocks with no further processing, verifications or any other validation. The process is quick and the data is asdirty as it can get. 2. READ COMMITTED: UserA will not see the change made by UserB. This is because in the READ COMMITTED isolation level, the rows returned by a query are the rows that were committed when the query was started. The change made by UserB was not present when the query started, and therefore will not be included in the query result. 3. REPEATABLE READ: UserA will not see the change made by UserB. This is because in the REPEATABLE READ isolation level, the rows returned by a query are the rows that were committed when the transaction was started. The change made by UserB was not present when the transaction was started, and therefore will not be included in the query result. This means that "All consistent reads within the same transaction read the snapshot established by the first read" (from MySQL documentation. See http://dev.mysql.com/doc/refman/5.1/en/innodb-consistent-read.html). 4. SERIALIZABLE: This isolation level specifies that all transactions occur in a completely isolated fashion, meaning as if all transactions in the system were executed serially, one after the other. The DBMS can execute two or more transactions at the same time only if the illusion of serial execution can be maintained. In practice, SERIALIZABLE is similar to REPEATABLE READ, but uses a different implementation for each database engine. In Oracle, the REPEATABLE READ level is not supported and SERIALIZABLE provides the highest isolation level. This level is similar to REPEATABLE READ, but InnoDB implicitly converts all plain SELECT statements to "SELECT ... LOCK IN SHARE MODE. http://highscalability.com/blog/2011/2/10/database-isolation-levels-and-their-effects-on-performance-a.html
What is CHECK Constraint?
A CHECK constraint is used to limit the values that can be placed in a column. The check constraints are used to enforce domain integrity.
What is NOT NULL Constraint?
A NOT NULL constraint enforces that the column will not accept null values. The not null constraints are used to enforce domain integrity, as the check constraints.
Keys
A PRIMARY KEY constraint is a unique identifier for a row within a database table. Every table should have a primary key constraint to uniquely identify each row, and only one primary key constraint can be created for each table. The primary key constraints are used to enforce entity integrity. A UNIQUE constraint enforces the uniqueness of the values in a set of columns; so no duplicate values are entered. The unique key constraints are used to enforce entity integrity as the primary key constraints. A FOREIGN KEY constraint prevents any actions that would destroy links between tables with the corresponding data values. A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there are no primary keys with that value. The foreign key constraints are used to enforce referential integrity.
What is PIVOT and UNPIVOT?
A Pivot Table can automatically sort, count, and total the data stored in one table or spreadsheet and create a second table displaying the summarized data. The PIVOT operator turns the values of a specified column into column names, effectively rotating a table. In simpler word UNPIVOT table is reverse of PIVOT Table, however it is not exactly true. UNPIVOTING is for sure reverse operation to PIVOTING but if during PIVOTING process data aggregated the UNPIVOT table does not return to original table.
What is a surrogate key and a Natural Key?
A natural key is a key that has contextual or business meaning (for example, in a table containing STORE, SALES and DATE we might use the DATE field as a natural key when joining with another table detailing inventory). A natural key can be system-generated, but natural keys are at least partially determined by a manual process. Some natural keys are totally manually generated. One of the most widely recognized use of a natural key is a stock ticker symbol - i.e. MSFT, APPL, and GOOGL. Natural keys serve as a great primary key when contextual meaning is important. A surrogate key is a key which does not have any contextual or business meaning. It is manufactured "artificially" and only for the purposes of data analysis. The most frequently used version of a surrogate key is an increasing sequential integer or "counter" value (i.e. 1, 2, 3). Surrogate keys can also include the current system date/time stamp, or a random alphanumeric string. https://www.sisense.com/blog/when-and-how-to-use-surrogate-keys/ (More info on how and when to use them ^^)
What are Sparse Columns?
A sparse column is another tool used to reduce the amount of physical storage used in a database. They are the ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve non-null values.
What is Aggregate Functions?
Aggregate functions perform a calculation on a set of values and return a single value. Aggregate functions ignore NULL values except COUNT function. HAVING clause is used, along with GROUP BY for filtering query using aggregate values. The following functions are aggregate functions. AVG, MIN, CHECKSUM_AGG, SUM, COUNT, STDEV, COUNT_BIG, STDEVP, GROUPING, VAR, MAX, VARP
What is an index?
An index is a physical structure containing pointers to the data. Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes; they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when there is no index available to help a query. In a table scan, the SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a terrific impact on performance.
What is the Difference between Index Seek vs. Index Scan?
An index scan means that SQL Server reads all the rows in a table, and then returns only those rows that satisfy the search criteria. When an index scan is performed, all the rows in the leaf level of the index are scanned. This essentially means that all the rows of the index are examined instead of the table directly. This is sometimes compared to a table scan, in which all the table data is read directly. However, there is usually little difference between an index scan and a table scan. An index seek, on the other hand, means that the Query Optimizer relies entirely on the index leaf data to locate rows satisfying the query condition. An index seek will be most beneficial in cases where a small percentage of rows will be returned. An index seek will only affect the rows that satisfy a query condition and the pages that contain these qualifying rows; in terms of performance, this is highly beneficial when a table has a very large number of rows.
What's the Difference between a Primary Key and a Unique Key?
Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default, the primary key creates a clustered index on the column, whereas unique key creates a non-clustered index by default. Another major difference is that primary key doesn't allow NULLs, but unique key allows one NULL only. (Read more here)
What is CHECKPOINT Process in the SQL Server?
CHECKPOINT process writes all dirty pages for the current database to disk. Dirty pages are data pages that have been entered into the buffer cache and modified, but not yet written to disk.
Explain a CTE table compared to temp tables and table variables.
CTEs... -Are unindexable (but can use existing indexes on referenced objects) - Cannot have constraints -Are essentially disposable VIEWs -Persist only until the next query is run -Can be recursive -Do not have dedicated stats (rely on stats on the underlying objects) - They are recursive #Temp Tables... -Are real materialized tables that exist in tempdb -Can be indexed -Can have constraints -Persist for the life of the current CONNECTION -Can be referenced by other queries or subprocedures -Have dedicated stats generated by the engine
What is a change data capture
Change Data Capture records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables, and makes a record available of what changed, where, and when, in simple relational 'change tables' rather than in an esoteric chopped salad of XML. Because CDC is a table-level feature, it then has to be enabled for each table to be tracked. You can run following query and check whether it is enabled for any database. USE master GO SELECT [name], database_id, is_cdc_enabled FROM sys.databases GO https://www.simple-talk.com/sql/learn-sql-server/introduction-to-change-data-capture-cdc-in-sql-server-2008/
What is Collation?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence with options for specifying case sensitivity, accent marks, Kana character types, and character width. (Read more here)
What is Database Mirroring?
Database mirroring can be used with replication to provide availability for the publication database. Database mirroring involves two copies of a single database that typically reside on different computers. At any given time, only one copy of the database is currently available to clients, which is known as the principal database. Updates made by the clients to the principal database are applied to the other copy of the database, known as the mirror database. Mirroring involves applying the transaction log from every insertion, update, or deletion made on the principal database onto the mirror database.
Tell me about your last project
Deliv 1. We worked on an agile project consisting of 10 sprints 2. Extracting information for driver demographics 3. Creating a Clean ETL Process - Extracting data from the proper data sources (Excel and XML) - Using proper transformations (Merge join, data conversion, derived column) 4. Error Handeling process 5. Loading into staging tables (What are staging tables?) 6. Loaded into our database (OLTP or OLAP?) 7. Created a stored procedure 8. Created a dashboard with: - Pie chart, bar charts - Cascading Parameters, drill down, and drill through
What is Difference in Performance between INSERT TOP (N) INTO Table and Using Top with INSERT?
INSERT TOP (N) INTO Table is faster than Using Top with INSERT but when we use INSERT TOP (N) INTO Table, the ORDER BY clause is totally ignored.
What is an Identity?
Identity (or AutoNumber) is a column that automatically generates numeric values. A start and increment value can be set, but most DBAs leave these at 1. A GUID column also generates unique keys. Updated based on the comment of Aaron Bertrand. (Blog)
What is Data Compression?
In SQL SERVE 2008, Data Compression comes in two flavors: Row Compression Page Compression Row Compression Row compression changes the format of physical storage of data. It minimize the metadata (column information, length, offsets etc) associated with each record. Numeric data types and fixed-length strings are stored in variable-length storage format, just like Varchar. (Read more here) Page Compression Page compression allows common data to be shared between rows for a given page. It uses the following techniques to compress data: Row compression. Prefix Compression. For every column in a page, duplicate prefixes are identified. These prefixes are saved in compression information headers which resides after the page header. A reference number is assigned to these prefixes and that reference number is replaced where ever those prefixes are being used. Dictionary Compression Dictionary compression searches for duplicate values throughout the page and stores them in CI. The main difference between prefix and dictionary compression is that the former is only restricted to one column while the latter is applicable to the complete page.
What is the Difference between GRANT and WITH GRANT while Giving Permissions to the User?
In case of only GRANT, the username cannot grant the same permission to other users. On the other hand, with the option WITH GRANT, the username will be able to give the permission after receiving requests from other users.
What is the Difference between VARCHAR and NVARCHAR datatypes?
In principle, they are the same and are handled in the same way by your application. The only difference is that NVARCHAR can handle unicode characters, allowing you to use multiple languages in the database (Arabian, Chinese, etc.). NVARCHAR takes twice as much space when compared to VARCHAR. Use NVARCHAR only if you are using foreign languages.
Keys
Key. A key is one or more data attributes that uniquely identify an entity. In a physical database a key would be formed of one or more table columns whose value(s) uniquely identifies a row within a relational table. Composite key. A key that is composed of two or more attributes. Natural key. A key that is formed of attributes that already exist in the real world. For example, U.S. citizens are issued a Social Security Number (SSN) that is unique to them (this isn't guaranteed to be true, but it's pretty darn close in practice). SSN could be used as a natural key, assuming privacy laws allow it, for a Person entity (assuming the scope of your organization is limited to the U.S.). Surrogate key. A key with no business meaning. Candidate key. An entity type in a logical data model will have zero or more candidate keys, also referred to simply as unique identifiers (note: some people don't believe in identifying candidate keys in LDMs, so there's no hard and fast rules). For example, if we only interact with American citizens then SSN is one candidate key for the Person entity type and the combination of name and phone number (assuming the combination is unique) is potentially a second candidate key. Both of these keys are called candidate keys because they are candidates to be chosen as the primary key, an alternate key or perhaps not even a key at all within a physical data model. Primary key. The preferred key for an entity type. Alternate key. Also known as a secondary key, is another unique identifier of a row within a table. Foreign key. One or more attributes in an entity type that represents a key, either primary or secondary, in another entity type. - See more at: http://www.agiledata.org/essays/keys.html
What do you mean by TABLESAMPLE?
TABLESAMPLE allows you to extract a sampling of rows from a table in the FROM clause. The rows retrieved are random and they are not in any order. This sampling can be based on a percentage of number of rows. You can use TABLESAMPLE when only a sampling of rows is necessary for the application instead of a full result set.
How to Rebuild the Master Database?
Master database is system database and it contains information about running server's configuration. When SQL Server 2005 is installed, it usually creates master, model, msdb, tempdb, resourcedb and the distribution system database by default. Only the Master database is the one which is absolutely a must-have database. Without the Master database, the SQL Server cannot be started. This is the reason why it is extremely important to backup the Master database. To rebuild the Master database, run Setup.exe, verify, and repair a SQL Server instance, and rebuild the system databases. This procedure is most often used to rebuild the master database for a corrupted installation of SQL Server.
Is It Possible to have Clustered Index on Separate Drive From Original Table Location?
No! It is not possible.
Does the Order of Columns in UPDATE statements Matter?
No, the order of columns in UPDATE statement does not matter for results updated. Both the below options produce the same results. Option 1: UPDATE TableName SET Col1 ='Value', Col2 ='Value2' Option 2: UPDATE TableName SET Col2 ='Value2', Col1 ='Value'
Can we Insert Data if Clustered Index is Disabled?
No, we cannot insert data if Clustered Index is disabled because Clustered Indexes are in fact original tables which are physically ordered according to one or more keys (Columns).
Inner vs outer joins?
Now that we've gone over outer joins, we can contrast those with the inner join. The difference between an inner join and an outer join is that an inner join will return only the rows that actually match based on the join predicate. An inner join involves joining two tables where a common id/key exists in both. An outer join is the joining of two tables, but where there is no match in the second (or first).
Name 3 ways to get an Accurate Count of the Number of Records in a Table?
SELECT * FROM table1 SELECT COUNT(*) FROM table1 SELECT rows FROM sysindexes WHERE id = OBJECT_ID(table1) AND indid < 2
Explain Few of the New Features of SQL Server 2008 Management Studio
SQL Server 2008 Microsoft has upgraded SSMS with many new features as well as added tons of new functionalities requested by DBAs for long time. A few of the important new features are as follows: IntelliSense for Query Editing Multi Server Query Query Editor Regions Object Explorer Enhancements Activity Monitors
What is the Maximum Size per Database for SQL Server Express?
SQL Server Express supports a maximum size of 4 GB per database, which excludes all the log files. 4 GB is not a very large size; however, if the database is properly designed and the tables are properly arranged in a separate database, this limitation can be resolved to a certain extent.
How will you Handle Error in SQL SERVER 2008?
SQL Server now supports the use of TRY...CATCH constructs for providing rich error handling. TRY...CATCH lets us build error handling at the level we need, in the way we need to by setting a region where if any error occurs, it will break out of the region and head to an error handler. The basic structure is as follows: BEGIN TRY <code> END TRY BEGIN CATCH <code> END CATCH So if any error occurs in the TRY block, then execution is diverted to the CATCH block, and the error can be resolved.
What is a Scheduled Job or What is a Scheduled Task?
Scheduled tasks let user automate processes that run on regular or predictable cycles. User can schedule administrative tasks, such as cube processing, to run during times of slow business activity. User can also determine the order in which tasks run by creating job steps within a SQL Server Agent job, e.g. back up database and update statistics of the tables. Job steps give user control over flow of execution. If one job fails, then the user can configure SQL Server Agent to continue to run the remaining tasks or to stop execution.
What are the Advantages of Using Stored Procedures?
Stored procedure can reduced network traffic and latency, boosting application performance. Stored procedure execution plans can be reused; they staying cached in SQL Server's memory, reducing server overhead. Stored procedures help promote code reuse. Stored procedures can encapsulate logic. You can change stored procedure code without affecting clients. Stored procedures provide better security to your data.
What is B-Tree?
The database server uses a B-tree structure to organize index information. B-Tree generally has following types of index pages or nodes: Root node: A root node contains node pointers to only one branch node. Branch nodes: A branch node contains pointers to leaf nodes or other branch nodes, which can be two or more. Leaf nodes: A leaf node contains index items and horizontal pointers to other leaf nodes, which can be many.
What are the elements of an ERD?
The three elements include: the entities for which someone is seeking information, the attributes of those entities, the relationships between the entities.
How to Optimize Stored Procedure Optimization?
There are many tips and tricks for the same. Here are few: Include SET NOCOUNT ON statement. Use schema name with object name. Do not use the prefix "sp_" in the stored procedure name. Use IF EXISTS (SELECT 1) instead of (SELECT *). Use the sp_executesql stored procedure instead of the EXECUTE statement. Try to avoid using SQL Server cursors whenever possible. Keep the Transaction as short as possible. Use TRY-Catch for error handling.
What are the Authentication Modes in SQL Server? How can it be Changed?
There are two authentication modes in SQL Server. Windows Mode Mixed Mode - SQL and Windows To change authentication mode in SQL Server, go to Start -> Programs- > Microsoft SQL Server and click SQL Enterprise Manager to run SQL Enterprise Manager from the Microsoft SQL Server program group. Select the server; then from the Tools menu, select SQL Server Configuration Properties and choose the Security page.
What is recompilation?
This happens because temp tables are treated just like regular tables by the SQL Server Engine. When the tables (in which underlying queries rely on) change significantly, SQL Server detects this change (using auto update statistics) and marks the dependent queries to be recompiled so the next execution can create an optimal execution plan. A similar sort of phenomenon also happens in temp tables.
When might someone denormalize their data?
Typically done for performance reasons, to reduce the number of table joins. This is not a good idea in a transactional environment as there are inherent data integrity risks or performance risks due to excessive locking to maintain data integrity.
What is a Table Called, if it has neither Cluster nor Non-cluster Index? What is it Used for?
Unindexed table or Heap. Microsoft Press Books and Book on Line (BOL) refers it as Heap. A heap is a table that does not have a clustered index and therefore, the pages are not linked by pointers. The IAM pages are the only structures that link the pages in a table together. Unindexed tables are good for fast storing of data. Many times, it is better to drop all the indexes from table and then do bulk of INSERTs and restore those indexes after that.
What is NOLOCK?
Using the NOLOCK query optimizer hint is generally considered a good practice in order to improve concurrency on a busy system. When the NOLOCK hint is included in a SELECT statement, no locks are taken on data when data is read. The result is a Dirty Read, which means that another process could be updating the data at the exact time you are reading it. There are no guarantees that your query will retrieve the most recent data. The advantage to performance is that your reading of data will not block updates from taking place, and updates will not block your reading of data. SELECT statements take Shared (Read) locks. This means that multiple SELECT statements are allowed simultaneous access, but other processes are blocked from modifying the data. The updates will queue until all the reads have completed, and reads requested after the update will wait for the updates to complete. The result to your system is delay (blocking).
What is the Difference between VARCHAR and VARCHAR(MAX) Datatypes?
VARCHAR stores variable-length character data whose range varies up to 8000 bytes; varchar(MAX) stores variable-length character data whose range may vary beyond 8000 bytes and till 2 GB. TEXT datatype is going to be deprecated in future versions, and the usage of VARCHAR(MAX) is strongly recommended instead of TEXT datatypes.
What is the difference between CHAR and VARCHAR Datatypes?
VARCHARS are variable length strings with a specified maximum length. If a string is less than the maximum length, then it is stored verbatim without any extra characters, e.g. names and emails. CHARS are fixed-length strings with a specified set length. If a string is less than the set length, then it is padded with extra characters, e.g. phone number and zip codes. For instance, for a column which is declared as VARCHAR(30) and populated with the word 'SQL Server,' only 10 bytes will be stored in it. However, if we have declared the column as CHAR(30) and populated with the word 'SQL Server,' it will still occupy 30 bytes in database.
How to Delete Duplicate Rows?
We can delete duplicate rows using CTE and ROW_NUMBER () feature of SQL Server 2005 and SQL Server 2008. e.g. WITH CTE (COl1,Col2, DuplicateCount) AS ( SELECT COl1,Col2, ROW_NUMBER() OVER(PARTITION BY COl1,Col2 ORDER BY Col1) AS DuplicateCount FROM DuplicateRcordTable ) DELETE FROM CTE WHERE DuplicateCount >1
Using a case statement to change incoming information
case when BASIC = 1 then 'YES' when BASIC = 0 then 'NO' else 'UNDEFINED' end
Snapshot
http://solutioncenter.apexsql.com/using-sql-server-database-snapshots-protect-yourself-against-accidental-data-modification/ http://blog.sqlauthority.com/2010/04/05/sql-server-2008-introduction-to-snapshot-database-restore-from-snapshot/
Explain Locking in SQL
http://www.sqlteam.com/article/introduction-to-locking-in-sql-server
SQL Performance Tuning
https://www.mssqltips.com/sql-server-tip-category/9/performance-tuning/
Indexed Views
https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/
How to troubleshoot a deadlock?
https://www.sqlskills.com/blogs/jonathan/the-accidental-dba-day-29-of-30-troubleshooting-deadlocks/
More great interview questions
https://www.tutorialspoint.com/sql/sql_interview_questions.htm
SQL training
https://www.youtube.com/watch?v=ZNObiptSMSI&list=PL08903FB7ACA1C2FB
System stored procedures that I have used
sp_ helpindex [table] - shows you index info (same info as sp_help) sp_helpconstraint [table] - shows you primary/foreign key/defaults and other constraints * sp_depends [obj] - shows dependencies of an object, for example: sp_depends [table] - shows you what stored procs, views, triggers, UDF affect this table sp_depends [sproc] - shows what tables etc are affected/used by this stored proc Elaborate just a little on these in using them in my own experience.
Merge Statement
- The MERGE statement allows you to join a data source with a target table or view, and then perform multiple actions (like insert, update, delete) against the target based on the results of that join. Merge also allows you to do this using a single statement. -you can use a MERGE statement to modify data in a target table based on data in a source table. The statement joins the target to the source by using a column common to both tables, such as a primary key. You can then insert, modify, or delete data from the target table-all in one statement-according to how the rows match up as a result of the join. -The first MERGE clause we'll look at is WHEN MATCHED. You should use this clause when you want to update or delete rows in the target table that match rows in the source table. Rows are considered matching when the joined column values are the same. -The next clause in the MERGE statement we'll review is WHEN NOT MATCHED [BY TARGET]. (The BY TARGET keywords are optional.) You should use this clause to insert new rows into the target table. The rows you insert into the table are those rows in the source table for which there are no matching rows in the target. -MERGE BookInventory bi USING BookOrder bo ON bi.TitleID = bo.TitleID WHEN MATCHED AND bi.Quantity + bo.Quantity = 0 THEN DELETE WHEN MATCHED THEN UPDATE SET bi.Quantity = bi.Quantity + bo.Quantity; 5 key pieces 1. The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations. 2. The USING clause specifies the data source being joined with the target. 3. The ON clause specifies the join conditions that determine where the target and source match. 4. The WHEN clauses (WHEN MATCHED, WHEN NOT MATCHED BY TARGET, and WHEN NOT MATCHED BY SOURCE) specify the actions to take based on the results of the ON clause and any additional search criteria specified in the WHEN clauses. 5. The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted. https://www.simple-talk.com/sql/learn-sql-server/the-merge-statement-in-sql-server-2008/ https://www.youtube.com/watch?v=jywhBCgknuY
What is a stored procedure?
-Define -Describe benefits -Give example of a complex Stored Procedure (Use the mortgage project example) A stored procedure in SQL Server is a group of one or more Transact-SQL statements Accept input parameters and return multiple values in the form of output parameters to the calling program. Contain programming statements that perform operations in the database. Return a status value to a calling program to indicate success or failure (and the reason for failure). Benefits of using a stored procedure: 1. Reduced server/client network traffic 2. Stronger security 3. Reuse of code 4. Easier maintenance 5. Improved performance ------- A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database. e.g. sp_helpdb, sp_renamedb, sp_depends etc. https://msdn.microsoft.com/en-us/library/ms190782.aspx
Levels of Normalization
1NF: Eliminate Repeating Groups Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain. 2NF: Eliminate Redundant Data If an attribute depends on only part of a multi-valued key, then remove it to a separate table. 3NF: Eliminate Columns Not Dependent On Key If attributes do not contribute to a description of the key, then remove them to a separate table. All attributes must be directly dependent on the primary key. (Read more here) BCNF: Boyce-Codd Normal Form If there are non-trivial dependencies between candidate key attributes, then separate them out into distinct tables. 4NF: Isolate Independent Multiple Relationships No table may contain two or more 1:n or n:m relationships that are not directly related. 5NF: Isolate Semantically Related Multiple Relationships There may be practical constrains on information that justify separating logically related many-to-many relationships. ONF: Optimal Normal Form A model limited to only simple (elemental) facts, as expressed in Object Role Model notation. DKNF: Domain-Key Normal Form A model free from all modification anomalies is said to be in DKNF. Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first fulfill all the criteria of a 2NF and 1NF database.
What are the Difference between Clustered and a Non-clustered Index?
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore, the table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows. (Read more here)
What is a deadlock?
A deadlock occurs when two or more sessions are waiting for each other, in such a way that none can complete. SQL Server's lock monitor has a deadlock detector that periodically checks the locks to see if there are any circular locking chains. If it finds any, it selects one of the sessions associated with a suspended thread, kills it, rolls back its transaction and releases its locks. This allows the other session to continue executing. The killed session, known as the deadlock victim, receives error 1205 The lock monitor picks the deadlock victim based, firstly, on the setting of DEADLOCK_PRIORITY for each session and, secondly (in the event of a tie) on the amount of work that it will take to roll back each of the open transactions. It's convenient that SQL Server will detect and resolve deadlocks automatically, but that doesn't mean a DBA can just ignore them. After all, the deadlock victim may well be an important business operation and its failure to run will cause considerable disruption. A DBA needs to know when a deadlock occurs in one of their SQL Server instances by alerting on 1205 errors, to capture the information, the deadlock graph, which will detail which processes were involved in the deadlock, and then set about trying to ensure that it does not occur again. The DBA will then have to read the dealock charts and decide the priority. https://www.simple-talk.com/sql/performance/sql-server-deadlocks-by-example/
What is Dirty Read?
A dirty read occurs when two operations, say, read and write occur together giving the incorrect or unedited data. Suppose, A changed a row but did not committed the changes. B reads the uncommitted data but his view of the data may be wrong so that is Dirty Read.
What is a trigger, and what are the different kinds of triggers?
A trigger is a database object that is attached to a table and is fired only when a Insert, Update, or Delete occurs. I only use them when I need to perform a certain action as a result of an INSERT, UPDATE or DELETE and ad hoc SQL (aka SQL Passthrough) is used. Examples would be show system time or email someone. Benefits: Generating some derived column values automatically Enforcing referential integrity Event logging and storing information on table access Auditing Synchronous replication of tables Imposing security authorizations Preventing invalid transactions Disadvantages: SQL triggers are invoked and executed invisible from the client applications, therefore, it is difficult to figure out what happen in the database layer. SQL triggers may increase the overhead of the database server. Types of Triggers: DDL Triggers (Create, Alter, Drop) DML Triggers (After Trigger and Instead of Trigger) CLR Triggers Logon Triggers Example Query for showing system time when a row is inserted: CREATE TABLE Source (Sou_ID int IDENTITY, Sou_Desc varchar(10)) go CREATE TRIGGER tr_Source_INSERT ON Source FOR INSERT AS PRINT GETDATE() go INSERT Source (Sou_Desc) VALUES ('Test 1') ----- A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be considered to be similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-drive and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure while triggers are implicitly executed. In addition, triggers can also execute stored procedures. Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE logic within itself; so when the trigger is fired because of data modification, it can also cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger. (Read more here) Different Kinds of Triggers: 1) DML Trigger There are two types of DML Triggers 1.Instead of Trigger Instead of Triggers are fired in place of the triggering action such as an insert, update, or delete. 2. After Trigger After triggers execute following the triggering action, such as an insert, update, or delete. 2) DDL Trigger This type of trigger is fired against Drop Table, Create Table, Alter Table or Login events. DDL Triggers are always After Triggers. http://www.dotnettricks.com/learn/sqlserver/different-types-of-sql-server-triggers (Shows different types of triggers.)
What is a Linked Server?
Linked Servers is a concept in SQL Server by which we can add other SQL Server to a Group and query both the SQL Server databases using T-SQL Statements. With a linked server, you can create very clean, easy-to-follow SQL statements that allow remote data to be retrieved, joined and combined with local data. Stored Procedures sp_addlinkedserver, sp_addlinkedsrvlogin will be used to add new Linked Server. (Read more here)
Interview Tips
Answer with: 1. Definition 2. How I have used them Make sure I am asking questions when the opportunity presents itself. Take over. Make the interview more like a conversation. Talk slower and clearly and with charismatic confidence Focus on technical points and don't over explain it.*** Use examples after technical points, only if needed. -Ask "Would you like me to give you an example?" Simplify all of these definitions into something that gets straight to the point. Efficiency covering the major/important points. I am talking to highly skilled people so don't talk to them like they are new. Talk to them with technical information that proves I know what I am talking about! Sound Enthusiastic with good positive energy! Talk with confidence, even if it is slightly wrong! If you are not cinfident in it dont talk about it.
What languages BI uses to achieve the goal?
BI uses following languages for achieve the Goal. MDX - Multidimensional Expressions: This language is used for retrieving data from SSAS cubes. It looks very similar to T-SQL, but it is very different in the areas of conceptualization and implementation. DMX - Data Mining Extensions: This is again used for SSAS, but rather than cubes it is used for data mining structures. This language is more complicated than MDX. Microsoft has provided many wizards in its BI tools, which further reduced number of experts for learning this language, which deals with data mining structures. XMLA - XML for Analysis: This is mainly used for SSAS administrative tasks. It is quite commonly used in administration tasks such as backup or restore database, copy and move database, or for learning Meta data information. Again, MS BI tools provide a lot of wizards for the same. (Read More Here)
User defined function vs stored procedure?
Benefits of Stored Procedures 1. Precompiled execution: SQL Server compiles each Stored Procedure once and then re utilizes the execution plan. This results in tremendous performance boosts when Stored Procedures are called repeatedly. 2. Reduced client/server traffic: If network bandwidth is a concern in your environment then you'll be happy to learn that Stored Procedures can reduce long SQL queries to a single line that is transmitted over the wire. 3. Efficient reuse of code and programming abstraction: Stored Procedures can be used by multiple users and client programs. If you utilize them in a planned manner then you'll find the development cycle requires less time. 4. Enhanced security controls: You can grant users permission to execute a Stored Procedure independently of underlying table permissions. Benefits of User Defined Functions 1. They allow modular programming: You can create the function once, store it in the database, and call it any number of times in your program. User Defined Functions can be modified independently of the program source code. 2. They allow faster execution: Similar to Stored Procedures, Transact-SQL User Defined Functions reduce the compilation cost of Transact-SQL code by caching the plans and reusing them for repeated executions. This means the user-defined function does not need to be reparsed and reoptimized with each use resulting in much faster execution times. CLR functions offer significant performance advantage over Transact-SQL functions for computational tasks, string manipulation, and business logic. Transact-SQL functions are better suited for data-access intensive logic. 3. They can reduce network traffic: An operation that filters data based on some complex constraint that cannot be expressed in a single scalar expression can be expressed as a function. The function can then invoked in the WHERE clause to reduce the number or rows sent to the client. UDF 1. Function Must Return a value 2. Will allow only a Select statement, it will not allow the use of DML statements 3. It will allow only input parameters, doesn't support output parameter. 4. It will not allow us to use Try Catch Blocks 5. Transactions are not allowed 6. We can use only table variable, it will not allow using temporary tables 7. Stored Procedures can't be called from a function 8. Functions can be called from a select statement 9. A UDF can be used in join clause as a result set SP 1. Stored Procedures may or may not return values 2. Can have select statements as well as DML statements such as insert, update, and delete. 3. It can have both input and output parameters 4. For exception handling we can use try catch blocks 5. Can use transactions within stored procedures 6. Can use both table variables as well as temporary table in it. 7. Stored Procedures can call functions 8. Procedures can't be called from Select/Where/Having. Execute/Exec statement can be used to call Stored Procedure 9. Procedures can't be used in a Join Clause
Clustered vs Non-Clustered Index
Clustered Index: - A clustered index determines the order in which the rows of a table are stored on disk by physically storing the rows. - The retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index. -Only one clustered index allowed -Primary Key -No Clustured index = Heap -A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit. Non-Clustered Index -A non-clustered index has no effect on which the order of the rows will be stored. -Non clustered indexes store both a value and a pointer to the actual row that holds that value. -A table can have multiple non-clustered indexes. But, a table can have only one clustered index. 4 important aspects of a good clustured index: Narrow - as narrow as possible, in terms of the number of bytes it stores Unique - to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values Static - ideally, never updated Ever-increasing - to avoid fragmentation and improve write performance A clustered index determines the order in which the rows of the table will be stored on disk - and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored. Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index. A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit. A table can have multiple non-clustered indexes. But, a table can have only one clustered index. Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don't need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index - and the clustered index actually stores the row-level data in it's leaf nodes. http://www.programmerinterview.com/index.php/database-sql/clustered-vs-non-clustered-index/ https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
What is Log Shipping?
Log shipping is the process of automating the backup of database and transaction log files on a production SQL server and then restoring them onto a standby server. All Editions (except Express Edition) supports log shipping. In log shipping, the transactional log file from one server is automatically updated into the backup database on the other server. If one server fails, the other server will have the same db and can be used this as the Disaster Recovery plan. The key feature of log shipping is that it will automatically backup transaction logs throughout the day and automatically restore them on the standby server at defined intervals.
What are Different Types of Join?
Cross Join A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. The common example is when company wants to combine each product with a pricing table to analyze each product at each price. Inner Join A join that displays only the rows that have a match in both joined tables is known as inner Join. This is the default type of join in the Query and View Designer. Outer Join A join that includes rows even if they do not have related rows in the joined table is an Outer Join. You can create three different outer join to specify the unmatched rows to be included: Left Outer Join: In Left Outer Join, all the rows in the first-named table, i.e. "left" table, which appears leftmost in the JOIN clause, are included. Unmatched rows in the right table do not appear. Right Outer Join: In Right Outer Join, all the rows in the second-named table, i.e. "right" table, which appears rightmost in the JOIN clause are included. Unmatched rows in the left table are not included. Full Outer Join: In Full Outer Join, all the rows in all joined tables are included, whether they are matched or not. Self Join This is a particular case when one table joins to itself with one or two aliases to avoid confusion. A self join can be of any type, as long as the joined tables are the same. A self join is rather unique in that it involves a relationship with only one table. The common example is when company has a hierarchal reporting structure whereby one member of staff reports to another. Self Join can be Outer Join or Inner Join. (Read more here)
Describe the differences in the first through fifth normalization forms.
First: The domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. Second: No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key. Third: Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed. Fourth: Every non-trivial multivalued dependency in the table is a dependency on a superkey. Fifth: Every non-trivial join dependency in the table is implied by the superkeys of the table.
What is a cursor, give an example, and explaint the commands needed for a basic cursor.
Cursors are database objects used to manipulate data in a set on a row-by-row basis; it acts just like a recordset in ASP and Visual Basic. We can also fetch cursor rows and perform operations on them in a loop just like using any looping mechanism found in any other programming language. Before you can use a cursor, you need to declare it. Cursor can also be defined as a set of rows together with a pointer that identifies a current row. Basic Commands to create a cursor -Declare -Open Cursor -Fetch -While Loop -Close Cursor -Deallocate (Memory) DECLARE cursor_name [INSENSITIVE] [SCROLL] CURSOR FOR select_statement [FOR {READ ONLY | UPDATE [OF column_name [,...n ] ] } ] [;] cursor_name: Itis the name of the Transact-SQL server cursor defined. It must conform to the rules for identifiers. INSENSITIVE: It is a cursor that makes a temporary copy of the data to be used by the cursor. All requests to the cursor are answered from this temporary table in tempdb; therefore, modifications made to base tables are not reflected in the data returned by fetches made to this cursor, and this cursor does not allow modifications. SCROLL: It specifies that all fetch options (FIRST, LAST, PRIOR, NEXT, RELATIVE, ABSOLUTE) are available. If SCROLL is not specified in an ISO DECLARE CURSOR, NEXT is the only fetch option supported. SCROLL cannot be specified if FAST_FORWARD is also specified. select_statement: It is a standard SELECT statement that defines the result set of the cursor. The keywords COMPUTE, COMPUTE BY, FOR BROWSE, and INTO are not allowed within select_statement of a cursor declaration. READ ONLY: It prevents updates made through this cursor. The cursor cannot be referenced in a WHERE CURRENT OF clause in an UPDATE or DELETE statement. This option overrides the default capability of a cursor to be updated. UPDATE [OF column_name [,...n]]: It defines updatable columns within the cursor. If OF column_name [,...n] is specified, only the columns listed allow modifications. If UPDATE is specified without a column list, all columns can be updated.
What is a data dictionary
Data Dictionary -Data that describes the database objects and also shows privileges in roles -These privileges are usually reserved for DBA's -SQL Server also uses Data Dictionary to verify SQL statements -I personally reference the data dictionary to find product information, instead of browsing the object explorer -2 Types (Passive = manual batch process / Active Automatically) In SQL Server the data dictionary is a set of database tables used to store information about a database's definition. The dictionary contains information about database objects such as tables, indexes, columns, datatypes, and views. The data dictionary is used by SQL server to execute queries and is automatically updated whenever objects are added, removed, or changed within the database. SQL Server uses the database dictionary to verify SQL statements. When you execute a SQL statement the DBMS (Database Management System) parses the statement and then determines whether the tables and fields you are referencing are valid. To do this quickly it references the data dictionary. Since the data dictionary contains the definition of the database, it is really good resource for you to use to gain information about the database. Data dictionaries are used by designers and developers to understand the structure of the database. You can think of the dictionary as an up-to-date reference document. http://www.essentialsql.com/what-is-a-data-dictionary/ ----------------------------------------------- In SQL Server the data dictionary is a set of database tables used to store information about a database's definition. The dictionary contains information about database objects such as tables, indexes, columns, datatypes, and views. It also shows privileges and roles. that can be used as auditing information to see who has been updating the tables. The data dictionary is used by SQL server to execute queries and is automatically updated whenever objects are added, removed, or changed within the database Usually only available to a DBA because it contains very important information that we don't want to get accidentally deleted or improperly updated. Passive Data Dictionary - has to be manually updated running a batch process. The information within a data dictionary is stored within tables and views so a simply SQL query can be used to view the information. Active Data Dictionary - Automatically updated whenever the database is accessed SQL statements are verified using the data dictionary. So when you execute a SQL statement the DBMS parses the statement and verify the tables and fields that you are referencing are valid and to do this efficiently it uses the Data Dictionary. Example - I had to find product information and instead of sorting through the object explorer one by one I just queried the information schema view.columns (Correct Syntax) How Do I create one? http://www.essentialsql.com/what-is-a-data-dictionary/ http://sqlmag.com/sql-server/building-dynamic-data-dictionaries
Describe Tuning Advisor
Database Engine Tuning Advisor is a utility that comes with SQL Server and can be used by both novice and experienced database administrators to get recommendations to improve the performance of SQL Server queries by making required physical structural changes. Based on your workload, Database Engine Tuning Advisor provides recommendations for best mix of indexes (clustered and non-clustered indexes) or indexed views, aligned or non-aligned partitions and required statistics. Before you begin using Database Engine Tuning Advisor, you first need to collect the workload (a set of SQL Server queries that you want to optimize and tune). You can use direct queries, trace files, and trace tables generated from SQL Server Profiler as workload input when tuning databases. We will demonstrate these different methods of using workload with Database Engine Tuning Advisor to tune the performance. Exaplain what is partitioning tables and updating statistics? http://www.databasejournal.com/features/mssql/getting-starting-with-database-engine-tuning-advisor-in-sql-server-part-1.html
Primary Keys, Foreign Keys, Unique Keys
Foreign keys are used to reference unique columns in another table. So, for example, a foreign key can be defined on one table A, and it can reference some unique column(s) in another table B. Why would you want a foreign key? Well, whenever it makes sense to have a relationship between columns in two different tables. It can accept null value. While unique and primary keys both enforce uniqueness on the column(s) of one table, foreign keys define a relationship between two tables. Primary key is a set of one or more fields/columns of a table that uniquely identify a record in database table. It can not accept null, duplicate values. Only one Candidate Key can be Primary Key. Unique key is a set of one or more fields/columns of a table that uniquely identify a record in database table. It is like Primary key but it can accept only one null value and it can not have duplicate values. http://www.programmerinterview.com/index.php/database-sql/differences-between-primary-and-foreign-keys/
Why use a UDF over a Stored Procedure?
Functions are computed values and cannot perform permanent environmental changes to SQL Server (i.e. no INSERT or UPDATE statements allowed). UDFs and stored procedures are both SQL Server objects that store one or more T-SQL statements in a single named, executable routine. Although you can often implement the same or similar functionality using either a UDF or a stored procedure, the code will look significantly different depending on which technique you choose. Here are the main differences between UDFs and stored procedures: A UDF must return a value-a single result set. A stored procedure can return a value-or even multiple result sets-but doesn't have to. You can use a UDF directly in a SELECT statement as well as in ORDER BY, WHERE, and FROM clauses, but you can't use a stored procedure in a SELECT statement. A UDF can't use a nondeterministic function such as GETDATE(), NEWID(), or RAND(), whereas a stored procedure can use such functions. A nondeterministic function is one that can return a different result given the same input parameters. A UDF can't change server environment variables; a stored procedure can. A UDF always stops execution of T-SQL code when an error occurs, whereas a stored procedure continues to the next instruction if you've used proper error handling code.
What is OLTP (Online Transaction Processing)?
In OLTP -(online transaction processing) systems, relational database design uses the discipline of data modeling and generally follows the Codd rules of data normalization in order to ensure absolute data integrity. Using these rules, complex information is broken down into its most simple structures (a table) where all of the individual atomic level elements relate to each other and satisfy the normalization rules.
What is a view and how/why do we use it?
In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database. You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data were coming from one single table. Views, which are kind of virtual tables, allow users to do the following: Structure data in a way that users or classes of users find natural or intuitive. Restrict access to the data such that a user can see and (sometimes) modify exactly what they need and no more. Summarize data from various tables which can be used to generate reports. 1. Views can hide complexity If you have a query that requires joining several tables, or has complex logic or calculations, you can code all that logic into a view, then select from the view just like you would a table. 2. Views can be used as a security mechanism A view can select certain columns and/or rows from a table, and permissions set on the view instead of the underlying tables. This allows surfacing only the data that a user needs to see. 3. Views can simplify supporting legacy code If you need to refactor a table that would break a lot of code, you can replace the table with a view of the same name. The view provides the exact same schema as the original table, while the actual schema has changed. This keeps the legacy code that references the table from breaking, allowing you to change the legacy code at your leisure.
Denormalization
In a relational database, denormalization is an approach to speeding up read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized. http://searchdatamanagement.techtarget.com/definition/denormalization Denormalization should not be done early, however. It is a last desperate resort that one should turn to only after exhausting all other options (like query optimization, improved indexing, and database system tuning, all of which will be discussed later in the book). Add back a certain relational column
What is the SQL CASE statement used for and give an example?
It allows you to embed an if-else like clause in the SELECT clause. SELECT Employee_Name, CASE Location WHEN 'Boston' THEN Bonus * 2 WHEN 'Austin' THEN Bonus * ,5 ELSE Bonus END "New Bonus" FROM Employee;
What are Pessimistic Lock and Optimistic Lock?
Optimistic Locking is a strategy where you read a record, take note of a version number and check that the version hasn't changed before you write the record back. If the record is dirty (i.e. different version to yours), then you abort the transaction and the user can re-start it. Pessimistic Locking is when you lock the record for your exclusive use until you have finished with it. It has much better integrity than optimistic locking but requires you to be careful with your application design to avoid Deadlocks.
How to backup and restore (What is the process)
Point in time restore?
What are the Properties and Different Types of Sub-Queries?
Properties of a Sub-Query A sub-query must be enclosed in the parenthesis. A sub-query must be put on the right hand of the comparison operator, and A sub-query cannot contain an ORDER BY clause, however sub-query can use ORDER BY when used with TOP clause. Read Comment by David Bridge A query can contain more than one sub-query. Types of Sub-query Single-row sub-query, where the sub-query returns only one row. Multiple-row sub-query, where the sub-query returns multiple rows, and Multiple column sub-query, where the sub-query returns multiple columns
What is referential integrity
Referential integrity means that the foreign key in any referencing table must always refer to a valid row in the referenced table. Referential integrity ensures that the relationship between two tables remains synchronized during updates and deletes. Referential Integrity enforces: 1. We may not add a record to the Employee Salary table unless the foreign key for that record points to an existing employee in the Employee table. 2. If a record in the Employee table is deleted, all corresponding records in the Employee Salary table must be deleted using a cascading delete. This was the example we had given earlier. 3. If the primary key for a record in the Employee table changes, all corresponding records in the Employee Salary table must be modified using what's called a cascading update.
Describe a Complex Stored Procedure that I have built
Review the one I built for Mortgage Project Refer to Ali's Code or watch the video
SQL Agent
SQL agent is an integral part of Microsoft's SQL Server. It runs as a windows service only and allows for the handling of a wide variety of tasks such as backup automation, database replication setup, job scheduling, user permissions and database monitoring. These tasks do not necessarily have to be related to SQL Server. For example, a daily backup job may be created to use a database backup to call an external program (e.g., WinZip) to compress the backup file result, and then relocate the file by invoking the MOVE command. SQL agent jobs are a series of steps that use a graphical user interface (GUI) wizard, allowing DBAs at every experience level to set up jobs comprised of a complex series of tasks. After setting up a job, the DBA can schedule an execution frequency; for example, it could be one-time only, daily, weekly or monthly.
Optimize a stored procedure
SQL profiler can be used to investigate slow running queries. You can capture and save data about each event to a file or table to analyze later. Recently I utilized SQL profiler for a slowley running procedure: -Set up event filters - RPC:Completed - SP:StmtCompleted - ShowPlan XML - Select Data Columns - DataBase Name - Duration -Ran trace - Analyzed the results and discovered a where statment that was pulling in too much information so I narrowed down the scope and this greatly sped up the Stored Procedure The RPC: Completed event fires after a stored procedure is executed as a remote procedure call. The SP:StmtCompleted event tells us when a statement within a stored procedure has completed. ShowPlan XML - This event displays the graphical execution plan of a query. http://blog.sqlauthority.com/2010/02/16/sql-server-stored-procedure-optimization-tips-best-practices/
SQL Profiler
SQL profiler can be used to investigate slow running queries. You can capture and save data about each event to a file or table to analyze later. Recently I utilized SQL profiler for a slowley running stored procedure: -Set up event filters - RPC:Completed - SP:StmtCompleted - ShowPlan XML - Select Data Columns - DataBase Name - Duration -Ran trace - Analyzed the results and discovered a "where" statement that was pulling in too much information so I narrowed down the scope and this greatly sped up the Stored Procedure The RPC: Completed event fires after a stored procedure is executed as a remote procedure call. The SP:StmtCompleted event tells us when a statement within a stored procedure has completed. ShowPlan XML - This event displays the graphical execution plan of a query. We can do the following using SQL Server Profiler -Create a trace -Watch the trace results as the trace runs -Store the trace results in a table -Start, stop, pause, and modify the trace results as necessary -Replay the trace results Use SQL Server Profiler to monitor only the events in which you are interested. https://www.simple-talk.com/sql/performance/how-to-identify-slow-running-queries-with-sql-profiler/ http://www.codeproject.com/Articles/21371/SQL-Server-Profiler-Step-by-Step
What are Different Types of Locks?
Shared Locks: Used for operations that do not change or update data (read-only operations), such as a SELECT statement. Update Locks: Used on resources that can be updated. It prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later. Exclusive Locks: Used for data-modification operations, such as INSERT, UPDATE, or DELETE. It ensures that multiple updates cannot be made to the same resource at the same time. Intent Locks: Used to establish a lock hierarchy. The types of intent locks are as follows: intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX). Schema Locks: Used when an operation dependent on the schema of a table is executing. The types of schema locks are schema modification (Sch-M) and schema stability (Sch-S). Bulk Update Locks: Used when bulk-copying data into a table and the TABLOCK hint is specified.
Having vs Where clause?
So we can see that the difference between the having and where clause in sql is that the where clause can not be used with aggregates, but the having clause can.
What are the different kinds of views?
Standard Views Combining data from one or more tables through a standard view lets you satisfy most of the benefits of using views. These include focusing on specific data and simplifying data manipulation. These benefits are described in more detail in Scenarios for Using Views. Indexed Views An indexed view is a view that has been materialized. This means it has been computed and stored. You index a view by creating a unique clustered index on it. Indexed views dramatically improve the performance of some types of queries. Indexed views work best for queries that aggregate many rows. They are not well-suited for underlying data sets that are frequently updated. For more information, see Designing Indexed Views. Partitioned Views A partitioned view joins horizontally partitioned data from a set of member tables across one or more servers. This makes the data appear as if from one table. A view that joins member tables on the same instance of SQL Server is a local partitioned view.
What is Standby Servers? Explain Types of Standby Servers.
Standby Server is a type of server that can be brought online in a situation when Primary Server goes offline and application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from primary server are moved to secondary (standby) server. This mechanism usually involves the process of moving backup from the primary server to the secondary server using T-SQL scripts. Often, database wizards are used to set up this process. Different types of standby servers are given as follows: 1) Hot Standby: Hot Standby can be achieved in the SQL Server using SQL Server 2005 Enterprise Edition and the later enterprise versions. SQL Server 2005 has introduced Mirroring of database that can be configured for automatic failover in a disaster situation. In the case of synchronous mirroring, the database is replicated to both the servers simultaneously. This is a little expensive but provides the best high availability. In this case, both primary and standby servers have same data all the time. 2) Warm Standby: In Warm Standby, automatic failover is not configured. This is usually set up using Log Shipping or asynchronous mirroring. Sometimes warm standby is lagging by a few minutes or seconds, which results into loss of few latest updates when the primary server fails and secondary server needs to come online. Sometimes a warm standby server that is lagging by a few transactions is brought back to the current state by applying the recent transaction log. 3) Cold Standby: Code Standby servers need to be switched manually, and sometimes all the backups as well as the required OS need to be applied. Cold Standby just physically replaces the previous server. (Read more here)
What are Statistics
Statistics for query optimization are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view. The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan. The query optimizer uses statistics to create query plans that improve query performance. https://msdn.microsoft.com/en-us/library/ms190397.aspx
What does SQL stand for?
Structured Query Language
What is subquery? Explain the Properties of a Subquery?
Subqueries are often referred to as sub-selects as they allow a SELECT statement to be executed arbitrarily within the body of another SQL statement. A subquery is executed by enclosing it in a set of parentheses. Subqueries are generally used to return a single row as an atomic value although they may be used to compare values against multiple rows with the IN keyword. A subquery is a SELECT statement that is nested within another T-SQL statement. A subquery SELECT statement if executed independently of the T-SQL statement, in which it is nested, will return a resultset. This implies that a subquery SELECT statement can stand alone, and it does not depend on the statement in which it is nested. A subquery SELECT statement can return any number of values and can be found in the column list of a SELECT statement, and FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A subquery can also be used as a parameter to a function call. Basically, a subquery can be used anywhere an expression can be used. (Read more here)
How do you grant or revoke permissions in SQL
Syntax GRANT privileges ON object TO user; https://www.techonthenet.com/sql_server/grant_revoke.php
Difference between ISNULL and Coalesce
The COALESCE and ISNULL T-SQL functions are used to return the first nonnull expression among the input arguments. The ISNULL() function is used to replace NULL with the specified replacement value. This function contains only two arguments. Syntax ISNULL (check_exp, change_value) Coalesce() Function The Coalesce() function returns the first non-null value among its arguments. This function doesn't limit the number of arguments, but they must all be of the same data type. Syntax COALESCE ( expression [ ,...n ] ) COALESCE() function is equivalent to the following CASE expression.CASE WHEN (exp1 IS NOT NULL) THEN exp1 WHEN (exp2 IS NOT NULL) THEN exp2 ... ELSE expN Differences Between IsNull() and Coalesce() Functions 1. The COALESCE() function is based on the ANSI SQL standard whereas ISNULL function is a Transact-SQL function. 2. An expression involving ISNULL with non-null parameters is considered to be NOT NULL, while expressions involving COALESCE with non-null parameters is considered to be NULL. 3. The ISNULL() function contains only two parameters. The COALESCE() function contains multiple parameters. If we use more than two parameters with the ISNULL function then we must use nested ISNULL functions. 4. The ISNULL() function looks at the first value and the second parameter value is automatically limited to that length but COALESCE() does not have this restriction. 5. The ISNULL() function contains various types of parameters. The COALESCE() function doesn't limit the number of arguments, but they must all be of the same data type. http://www.c-sharpcorner.com/uploadfile/rohatash/differences-between-isnull-and-coalesce-functions-in-sql/
What is a query optimizer
The SQL Server Query Optimizer is a cost-based optimizer. It analyzes a number of candidate execution plans for a given query, estimates the cost of each of these plans and selects the plan with the lowest cost of the choices considered. Indeed, given that the Query Optimizer cannot consider every possible plan for every query, it actually has to do a cost-based balancing act, considering both the cost of finding potential plans and the costs of plans themselves. Therefore, it is the SQL Server component that has the biggest impact on the performance of your databases. the Query Optimizer devises the plan and then passes it along to the Execution Engine, which will actually execute the plan and get the results from the database. ----- A query optimizer is a critical database management system (DBMS) component that analyzes Structured Query Language (SQL) queries and determines efficient execution mechanisms. A query optimizer generates one or more query plans for each query, each of which may be a mechanism used to run a query. The most efficient query plan is selected and used to run the query. Database users do not typically interact with a query optimizer, which works in the background.
What is an SQL Server Agent?
The SQL Server agent plays an important role in the day-to-day tasks of a database administrator (DBA). It is often overlooked as one of the main tools for SQL Server management. Its purpose is to ease the implementation of tasks for the DBA, with its full-function scheduling engine, which allows you to schedule your own jobs and scripts.
Explain Statistics
The SQL Server query optimizer uses distribution statistics when creating a query plan. The statistics provide information about the distribution of column values across participating rows, helping the optimizer better estimate the number of rows, or cardinality, of the query results. The statistics provide the optimizer with clues on the best ways to process the query and consequently maximize performance. For example, the statistics might lead the optimizer to choose an index seek over an index scan, a potentially more resource-intensive operation. Without statics, it would be difficult for the optimizer to determine the best processing strategy in advance. Index Scan: Since a scan touches every row in the table, whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate. Index Seek: Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.
Temp Table vs Table Variable
The advantages of using temp tables are: 1. Larger scope (session to session) 2. the data can be manipulated after creation and indexes can be added. 3. they can handle large data sets that will be changing and growing overtime because of the support use of statistics. 4. Supports transactions 5. Since the Temporary Tables are physical tables, while reading from the table, SQL Optimizer puts a read lock on the table. Advantages of using Table Variables 1. Uses less resources because it triggers less recompiles. 2. Useful for smaller data sets because it does not use statistics 3. Since it is not a part of the persisted data, they are handy for creating or storing data that ought to survive roll backs such as log entries. 4. Can be used in a UDF 5.The biggest problem with table variables is that statistics aren't maintained on the columns. This means that the query optimiser has to make a guess as to the size and distribution of the data 6.Transactions touching table variables only last for the duration of the update on the table variable, so there is less locking and logging overhead. Start With Examples Temp Table -Used in stored procedures when doing table joins -Practice queries instead of hitting the real table Temp Table -# Local -## Global -Active only on a session by session basis - Supports DDL and DML commands -Indexes can be added -Good for large datasets that will be manipulated over time -Supports Transactions Table Variable Example If I know the result set will be small and I dont have to worry about the table growing too much then I will use a table variable because it is better for perofrmance since it doesnt trigger recompiles and uses less resources. Table Variable -Lifespan is only the execution batch -Smaller Scope -Indexes can not be added after creation but and index can be implied upon creation -Useful for smaller data sets because it does not use statistics. (Research statistics) -Can be used in a user defined function (UDF) -Does not support Transactions or rollback because they are not part of the persisted database. -trigger less recompiles so it takes up less resources -Since the Table Variables are partially stored in the memory, they cannot be accessed by any other user or process that the current user. Therefore, no read lock is put on the Table Variable. Table variables require less locking resources as they are 'private' to the process that created them. Transaction rollbacks do not affect them because table variables have limited scope and are not part of the persistent database, so they are handy for creating or storing data that ought to survive roll backs such as log entries. The downside of table variables is that they are often disposed of before you can investigate their contents for debugging, or use them to try out different SQL expressions interactively. The biggest problem with table variables is that statistics aren't maintained on the columns. This means that the query optimiser has to make a guess as to the size and distribution of the data and if it gets it wrong, then you're going to see poor performance on joins: If this happens, there is little you can do other than to revert to using classic local temporary tables. http://sqlserverplanet.com/optimization/temp-table-recompiles (Understanding Recompiles) https://www.simple-talk.com/sql/t-sql-programming/temporary-tables-in-sql-server/
What is the Correct Order of the Logical Query Processing Phases?
The correct order of the Logical Query Processing Phases is as follows: 1. FROM 2. ON 3. OUTER 4. WHERE 5. GROUP BY 6. CUBE | ROLLUP 7. HAVING 8. SELECT 9. DISTINCT 10. TOP 11. ORDER BY
When is the use of UPDATE_STATISTICS command?
This command is basically used when a large amount of data is processed. If a large amount of deletions, modifications or Bulk Copy into the tables has occurred, it has to update the indexes to take these changes into account. UPDATE_STATISTICS updates the indexes on these tables accordingly.
Union VS Union All
UNION The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected. UNION ALL The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values. The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table. A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results. Union Statement: SELECT CustomerKey FROM FactInternetSales UNION SELECT CustomerKey FROM DimCustomer ORDER BY CustomerKey; Union All Statement: SELECT supplier_id FROM suppliers UNION ALL SELECT supplier_id FROM orders ORDER BY supplier_id;
What is User-defined Functions? What are the types of User-defined Functions that can be created?
User-defined Functions allow defining its own T-SQL functions that can accept zero or more parameters and return a single scalar data value or a table data type. Different Types of User-Defined Functions created are as follows: Scalar User-defined Function A scalar user-defined function returns one of the scalar data types. Text, ntext, image and timestamp data types are not supported. These are the type of user-defined functions that most developers are used to in other programming languages. Inline Table-Value User-defined Function An Inline table-value user-defined function returns a table data type and is an exceptional alternative to a view as the user-defined function can pass parameters into a T-SQL select command and in essence provide us with a parameterized, non-updateable view of the underlying tables. Multi-Statement Table-Value User-defined Function A multi-statement table-value user-defined function returns a table, and it is also an exceptional alternative to a view as the function can support multiple T-SQL statements to build the final result where the view is limited to a single SELECT statement. Also, the ability to pass parameters into a T-SQL select command or a group of them gives us the capability to in essence create a parameterized, non-updateable view of the data in the underlying tables. Within the create function command, you must define the table structure that is being returned. After creating this type of user-defined function, It can be used in the FROM clause of a T-SQL command unlike the behavior encountered while using a stored procedure which can also return record sets. (Read here for example)
How do we use views for security?
Views are virtual tables that can be a great way to optimize your database experience. Not only are views good for defining a table without using extra storage, but they also accelerate data analysis and can provide your data extra security. 1. Views can hide complexity If you have a query that requires joining several tables, or has complex logic or calculations, you can code all that logic into a view, then select from the view just like you would a table. 2. Views can be used as a security mechanism A view can select certain columns and/or rows from a table, and permissions set on the view instead of the underlying tables. This allows surfacing only the data that a user needs to see. 3. Views can simplify supporting legacy code If you need to refactor a table that would break a lot of code, you can replace the table with a view of the same name. The view provides the exact same schema as the original table, while the actual schema has changed. This keeps the legacy code that references the table from breaking, allowing you to change the legacy code at your leisure.
What does it mean to have QUOTED_IDENTIFIER ON? What are the Implications of having it OFF?
When SET QUOTED_IDENTIFIER is ON, identifiers can be delimited by double quotation marks, and literals must be delimited by single quotation marks. When SET QUOTED_IDENTIFIER is OFF, identifiers cannot be quoted and must follow all T-SQL rules for identifiers.
Can a Stored Procedure call itself or a Recursive Stored Procedure? How many levels of SP nesting is possible?
Yes. As T-SQL supports recursion, you can write stored procedures that call themselves. Recursion can be defined as a method of problem solving wherein the solution is arrived at by repetitively applying it to subsets of the problem. A common application of recursive logic is to perform numeric computations that lend themselves to repetitive evaluation by the same processing steps. Stored procedures are nested when one stored procedure calls another or executes managed code by referencing a CLR routine, type, or aggregate. You can nest stored procedures up to 32 levels. Any reference to managed code from a Transact-SQL stored procedure counts as one level against the 32-level nesting limit. Methods invoked from within managed code do not count against this limit.
How to get row count using a merge statement
You could specify an OUTPUT clause on your MERGE statement and get an output report of what's been done during MERGE. DECLARE @tableVar TABLE (MergeAction VARCHAR(20), InsertedID INT, DeletedID INT) MERGE (targetTable) AS t USING (sourceTable) AS s ON t.ID = s.ID WHEN MATCHED THEN (some statements) WHEN NOT MATCHED THEN (some statements) OUTPUT $action, inserted.ID 'inserted', deleted.ID 'deleted' INTO @tableVar ; SELECT MergeAction, COUNT(*) FROM @tableVar GROUP BY MergeAction http://stackoverflow.com/questions/1268491/sql-server-2008-merge-best-way-to-get-counts
Describe a Merge Statement and give an example
you can perform insert, update, or delete operations in a single statement using the MERGE statement. The MERGE statement allows you to join a data source with a target table or view, and then perform multiple actions against the target based on the results of that join. For example, you can use the MERGE statement to perform the following operations: Conditionally insert or update rows in a target table. -If the row exists in the target table, update one or more columns; otherwise, insert the data into a new row. Synchronize two tables. -Insert, update, or delete rows in a target table based on differences with the source data. The MERGE syntax consists of five primary clauses: 1. The MERGE clause specifies the table or view that is the target of the insert, update, or delete operations. 2. The USING clause specifies the data source being joined with the target. 3. The ON clause specifies the join conditions that determine where the target and source match. 4. The WHEN clauses (WHEN MATCHED, WHEN NOT MATCHED BY TARGET, and WHEN NOT MATCHED BY SOURCE) specify the actions to take based on the results of the ON clause and any additional search criteria specified in the WHEN clauses. 5. The OUTPUT clause returns a row for each row in the target that is inserted, updated, or deleted. It is important to understand how the source and target data are merged into a single input stream and how additional search criteria can be used to correctly filter out unneeded rows. Otherwise,you might specify the additional search criteria in a way that produces incorrect results. Depending on the WHEN clauses specified in the statement, the input row might be any one of the following: 1. A matched pair consisting of one row from the target and one from the source. This is the result of the WHEN MATCHED clause. 2. A row from the source that has no corresponding row in the target. This is the result of the WHEN NOT MATCHED BY TARGET clause. 3. A row from the target that has no corresponding row in the source. This is the result of the WHEN NOT MATCHED BY SOURCE clause.