SQL Server Architecture


What is the FORMAT function?

The FORMAT function, as the name states, returns a formatted value:
DECLARE @myMoney decimal(5,2) = 10.52;
SELECT FORMAT(@myMoney, 'C', 'en-US') AS MyMoney;
Result:
MyMoney
----------------------------
$10.52

Design Best Practices

1. Create covering indexes - indexes where all the data the query needs is present at the leaf level itself. The most selective column should come first, and the index should cover every column needed by the SELECT, WHERE, ORDER BY, and JOIN clauses (a sketch follows below). 2. Remove unwanted indexes. This is potentially a HUGE performance gain. 3. For insert-heavy tables, use identity columns. 4. Use the proper fillfactor for indexes - 80-90 in general. 5. Keep your code in source control!
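A minimal sketch of a covering index, assuming a hypothetical dbo.Orders table with CustomerID, OrderDate, and TotalDue columns:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalDue);

-- This query can be answered entirely from the index leaf level:
SELECT CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42
ORDER BY OrderDate;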

What's the best way to begin setting up governance with your SQL Server environment?

1. Make a map of your existing environments 2. Create a description of what your new environment should look like 3. Document it so it's clear and can be passed on to managers and vendors 4. Document system management procedures for most changes 5. Create system reports to show changes

How would you handle SQL injection?

1. Stored procedures with parameterized queries (not built from concatenated strings) 2. Sanitizing input 3. Escaping quotes (a parameterized sketch follows below)
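A minimal sketch of the parameterized approach (the table and parameter names are hypothetical); sp_executesql keeps user input out of the executable SQL text:

DECLARE @UserName nvarchar(50) = N'O''Brien; DROP TABLE dbo.Users; --';
DECLARE @sql nvarchar(200) =
    N'SELECT UserID, UserName FROM dbo.Users WHERE UserName = @Name;';

-- The input travels as a typed parameter and is never concatenated into the statement
EXEC sp_executesql @sql, N'@Name nvarchar(50)', @Name = @UserName;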

Describe normalization rules in simple terms.

1NF means that all columns are atomic - every row is unique - and there's nothing weird like arrays or comma-delimited values. 2NF and 3NF mean all non-key columns must represent a fact about the key, the whole key, and nothing but the key. The key word here is atomic. • Columns - make sure every column represents only one value. • Table/Row uniqueness - one row represents one thing, and that thing isn't represented elsewhere. • Columns are either part of a key or describe the row identified by the key. • Keys always represent a single expression.

How would you go about fixing a situation with conflicting or failing ETL jobs?

A lot of times companies try to fix ETL problems with short-term fixes. When batch processes overlap or time out, the typical knee-jerk reactions are:
1. Can we 'fix' this with more hardware? (expensive, and a temporary fix)
2. Can we change the product - for example from SQL Server to Oracle? (this only works with a redesign; usually the product isn't the issue)
3. Buy a turnkey solution (expensive and typically doesn't address the root issue)
4. Split systems into parts, each on isolated servers (hard to synchronize running jobs across your now-separate systems)
Instead of these knee-jerk reactions, there's a better way - both tactical and strategic - to fix this:
• Tactical: gather statistics (how long is a process likely to take?), get a baseline of long-running problem queries, and begin fixing the "top 5" issues.
• Strategic: determine processing time windows; create a task list (in Excel, for example) that includes internal/external dependencies; monitor the system and include management in monitoring reports; define threads - tasks in sequence - and processing times; implement batch system control that runs off your table of tasks and dependencies.

How does using a separate hard drive for several database objects improve performance right away?

A non-clustered index and tempdb can be created on a separate disk to improve performance.

What are hash spills?

Hash spills - portions of both tables are written to tempdb and read back; memory is many times faster than disk, so that's bad for performance. Use sys.dm_db_session_space_usage to detect them (a sketch follows below). To correct: 1. Prevent spills caused by memory grant misestimates 2. Minimize the operations that have to spill 3. Tune tempdb to handle spills
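A minimal detection sketch, assuming you simply want to see which sessions are allocating tempdb pages (spills show up as internal object allocations):

SELECT session_id,
       internal_objects_alloc_page_count,   -- work tables, spills, sorts
       user_objects_alloc_page_count        -- temp tables, table variables
FROM sys.dm_db_session_space_usage
WHERE internal_objects_alloc_page_count > 0
ORDER BY internal_objects_alloc_page_count DESC;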

What advantages does release management offer the DBA?

Release management offers advantages such as reduced risk, repeatability, more reliably successful changes, adherence to best practices, and properly architected/designed changes.

What is SQL Profiler?

SQL Profiler is a graphical tool that allows system administrators to monitor events in an instance of Microsoft SQL Server. You can capture and save data about each event to a file or SQL Server table to analyze later. For example, you can monitor a production environment to see which stored procedures are hampering performance by executing very slowly. Use SQL Profiler to monitor only the events in which you are interested. If traces are becoming too large, you can filter them based on the information you want, so that only a subset of the event data is collected. Monitoring too many events adds overhead to the server and the monitoring process and can cause the trace file or trace table to grow very large, especially when the monitoring process takes place over a long period of time.

How do you handle errors in SQL Server 2008?

SQL Server now supports the use of TRY...CATCH constructs for rich error handling. TRY...CATCH lets us build error handling at the level we need, in the way we need, by setting a region where, if any error occurs, execution breaks out of the region and heads to an error handler. The basic structure is as follows:
BEGIN TRY
    <code>
END TRY
BEGIN CATCH
    <code>
END CATCH
So if any error occurs in the TRY block, execution is diverted to the CATCH block, and the error can be handled there. (A fuller sketch follows below.)
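A minimal, self-contained sketch of the construct; the divide-by-zero is just a convenient way to force an error:

BEGIN TRY
    SELECT 1 / 0;   -- raises a divide-by-zero error
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()   AS ErrorNumber,
           ERROR_SEVERITY() AS ErrorSeverity,
           ERROR_LINE()     AS ErrorLine,
           ERROR_MESSAGE()  AS ErrorMessage;
END CATCH;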

How would you find the selectivity of an index?

Selectivity is the ratio of distinct index values to total rows - a low density means high selectivity, and a high density means low selectivity, which is bad news for indexing (a calculation sketch follows below). To identify top queries: 1. Capture all queries with a Profiler trace - don't filter it, capture everything. 2. Pull the trace into a table and run an aggregate query.
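A minimal sketch of measuring selectivity for a candidate key column (the table and column names are hypothetical):

SELECT COUNT(DISTINCT LastName) * 1.0 / COUNT(*) AS Selectivity
FROM dbo.Customers;
-- Values close to 1.0 = highly selective (good index candidate);
-- values close to 0 = low selectivity.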

What are the different lock types and when would you use them?

Shared (S) - used for SELECTs. Update (U) - a transaction typically reads data first, placing a Shared (S) lock for the read, and then places an Exclusive (X) lock to modify it. If two transactions both try to convert to X locks, you have a problem; taking an Update (U) lock for the read phase helps avoid this. Exclusive (X) - used for data modification operations (INSERT, UPDATE, DELETE).

What are some hardware rules to remember for SQL Server?

Tempdb should be on RAID 10 if at all possible; RAID 1 might be a good second choice. Definitely not RAID 5 (parity computation overhead on writes), and not RAID 0 (no redundancy on failure). Putting tempdb on a separate solid-state drive would be an excellent idea.

How can you track logins in your environment?

You can create a table in msdb to capture and monitor logins. You can map versions, editions, features, naming standards for servers and databases, and the status of your patching.

What's a good alternative to a covering index?

You can create a filtered index (for example, in the CREATE INDEX statement add WHERE ReasonID IS NOT NULL) - this is a good alternative to a covering index when the table is large and the subset of rows you care about is relatively small. (A sketch follows below.)
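A minimal sketch of a filtered index; the table name is a placeholder and ReasonID is the column from the example above:

CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ReasonID_Filtered
ON dbo.SalesOrderDetail (ReasonID)
WHERE ReasonID IS NOT NULL;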

When would you use index hints?

You CAN do this - but why? The Query Optimizer does its job really well; let it. Think: what happens when you add more rows? What happens when you have to remove the index? (Look at the Estimated Subtree Cost on the root SELECT operator to compare execution costs.)

What's with the Activity Monitor?

You can right-click on the instance and select Activity Monitor. It shows graphs and tabs for Processor Time, Waiting Tasks, Database I/O, Recent Requests, and Recent Expensive Queries.

What are best practices with udf's?

• Choose inline table-valued functions over multistatement table-valued functions whenever possible. • Even if it looks like you need a scalar function, write it as an inline table-valued function; avoid scalar functions wherever possible. • If you need a multistatement table-valued function, check to see if a stored procedure might be the appropriate solution. This might require a broader look at query structure, but it's worth taking the time to do it.
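A minimal sketch of the inline table-valued style the first bullet recommends (the schema and names are hypothetical):

CREATE FUNCTION dbo.fn_OrdersForCustomer (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
);
GO
-- Used like a table, and the optimizer can expand it inline:
SELECT o.OrderID, o.TotalDue
FROM dbo.fn_OrdersForCustomer(42) AS o;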

What different isolation levels are there? What's the default?

1. READ COMMITTED (the default) 2. READ UNCOMMITTED (careful - you can read uncommitted data here) 3. REPEATABLE READ (locks held for a longer period of time - shared locks held until the current transaction terminates) 4. SERIALIZABLE - strictest; ranges of rows are locked until the transaction terminates.
Isolation levels are really about how long shared locks on data being read are kept. They are NOT about preventing "corruption" in the database - they are about preventing readers and writers from getting in each other's way. First up: any write operation (INSERT, UPDATE) always requires an exclusive lock on that row, and exclusive locks are not compatible with anything else - so if a given row to be updated is already locked, any UPDATE operation will have to wait; there is no way around this. For reading data, SQL Server takes out shared locks, and the isolation levels are about how long those are held.
The default isolation level (READ COMMITTED) means SQL Server tries to get a shared lock on a row and, if successful, reads the contents of the row and releases that lock right away. So the lock exists only for the brief period of the row being read. Shared locks are compatible with other shared locks, so any number of readers can read the same rows at the same time. Shared locks do, however, block exclusive locks, so they mostly prevent UPDATEs of the same rows.
READ UNCOMMITTED basically takes out no locks; this means it can also read rows that are currently being updated and exclusively locked - so you might get non-committed data, data that might never actually end up in the database (if the updating transaction is rolled back). Be careful with this one!
The next level up is REPEATABLE READ, in which the shared locks, once acquired, are held until the current transaction terminates. This locks more rows for a longer period of time - reads are repeatable because the rows you have read are locked against updates "behind your back".
The ultimate level is SERIALIZABLE, in which entire ranges of rows (defined by the WHERE clause in your SELECT) are locked until the current transaction terminates. (A short sketch of setting the level follows below.)
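A minimal sketch of setting the isolation level for a session (the table name is a placeholder):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
    SELECT Quantity FROM dbo.Inventory WHERE ProductID = 42;
    -- shared locks on the rows read are now held until COMMIT/ROLLBACK
COMMIT TRANSACTION;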

What's the proper usage of views in SQL Server?

1. Flatten complex joins 2. Apply naming conventions and filter out columns/rows not of interest 3. Consolidate data across environments 4. Security. View usage ranges from abstinence to overuse. The Information Architecture Principle states that information must be made available in a usable format - that's what views DO. A view-based abstraction layer is used for ad hoc queries and reports, not for updates or as a transactional layer. So, use views for the following: • Flatten complex joins and some complex aggregate queries • Use aliases to change cryptic column names, apply consistent naming conventions, and filter - including only columns of interest • If the view selects data from a range, use a UDF that accepts a parameter instead • No hardcoded values - use calculations; think WHERE clauses • Consolidate data across environments • Security - grant access to the views rather than the base objects themselves

Fillfactor best practices

1. Use an index optimization script to reorganize indexes and update statistics (third-party open-source packages exist for this) - check on indexes that are highly fragmented. 2. Don't set a system-wide fillfactor; it'll hurt worse than it'll help. The default of 0 (effectively the same as 100) is fine. 3. Change highly fragmented indexes gradually - bump the fillfactor down 5% at a time and monitor for a week. Remember, Books Online says a fillfactor of 50% can cause READ performance to drop by 2x. Wow! (A rebuild sketch follows below.)
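A minimal sketch of applying a fillfactor to a single index (the index and table names are hypothetical):

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);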

General SQL best practices

1. Use stored procedures. They offer reusability, security, execution plan reuse, protection against SQL injection, and better readability.
2. Use fully qualified names for objects.
3. Avoid using scalar functions in SELECTs - they act like a cursor. Use inline or multistatement UDFs instead.
4. Don't mix DML and DDL on a temp table - this causes a recompile (for example, CREATE then ALTER).
5. Don't use SELECT *. Specify column names.
6. Avoid hints. They can behave differently than your original use case as the table grows.
7. Don't use table variables except for small result sets - use a temp table.
8. SET NOCOUNT ON in your sprocs - this suppresses the "rows affected" messages, which are usually unnecessary.
9. EXISTS vs IN - use IN for small tables, EXISTS for large ones.
10. Keep your transactions as short as possible, and don't allow user input mid-transaction.
11. Use databases for what they're meant for. String processing, ordering, conversion, numbering rows, etc. can easily be done in the business layer - not in the DAL.
12. Don't wrap columns in functions in WHERE clauses - it prevents index seeks.
13. Don't use EXEC('string here') - use sp_executesql instead. This helps protect against injection in dynamic SQL and is more efficient.
14. Don't ever use the sp_ prefix for your sproc names.
15. Avoid LIKE '%XXX%' in your WHERE clauses - this causes a full scan. Go for full-text indexing instead.

How would you avoid deleting data in a db?

1. A bit-flag Deleted column (a performance killer on queries) 2. Cascading logical deletes (complex to code, difficult to maintain) 3. An INSTEAD OF trigger that copies deleted rows to an archive table (best; a sketch follows below).
Some database developers choose to completely avoid deleting data. Instead, they build systems to remove the data from the user's view while retaining it for safekeeping, like dBase did. This can be done in several different ways:
• A logical-delete bit flag, or a nullable MomentDeleted column, is added to each row. The bit flag is set to zero by default; when the row is deleted it is set to 1, marking the row as logically deleted. MomentDeleted is a datetime column that is initially null; when the row is deleted, the column is updated to the date and time of the delete. This makes deleting or restoring a single row a straightforward matter of setting or clearing a bit. However, because a relational database involves multiple related tables, there's more work to it than that. All queries must check the logical-delete flag and filter out logically deleted rows. This means that a bit column (with extremely poor selectivity) is probably an important index for every query. Although SQL Server's filtered indexes are a perfect fit, it's still a performance killer.
• The cascading logical deletes method is complex to code and difficult to maintain. This is a case of complexity breeding complexity, so the authors no longer recommend this method.
• Another alternative to physically deleting rows is to archive the deleted rows in an archive or audit table. This method is best implemented by an INSTEAD OF trigger that copies the data to the alternative location and then physically deletes the rows from the production database. This method offers several advantages. Data is physically removed from the database, so there's no need to artificially modify SELECT queries or index on a bit column. Physically removing the data enables SQL Server referential integrity to remain in effect. In addition, the database is not burdened with unnecessary data. Retrieving archived data remains relatively straightforward and can easily be accomplished with a view that selects data from the archive location.
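A minimal sketch of the INSTEAD OF trigger approach; dbo.Product and dbo.Product_Archive are hypothetical, and the archive table is assumed to have the same columns plus a DeletedDate:

CREATE TRIGGER trg_Product_ArchiveDelete
ON dbo.Product
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- copy the rows being deleted into the archive table
    INSERT INTO dbo.Product_Archive (ProductID, Name, ListPrice, DeletedDate)
    SELECT d.ProductID, d.Name, d.ListPrice, GETDATE()
    FROM deleted AS d;
    -- then physically delete them from the production table
    DELETE p
    FROM dbo.Product AS p
    JOIN deleted AS d ON p.ProductID = d.ProductID;
END;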

What is a cursor?

A cursor is a database object used by applications to manipulate data in a set on a row-by-row basis, instead of the typical SQL commands that operate on all the rows in the set at one time. In order to work with a cursor, we need to perform some steps in the following order: • Declare cursor • Open cursor • Fetch row from the cursor • Process fetched row • Close cursor • Deallocate cursor (A sketch follows below.)
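A minimal sketch of those steps (dbo.Customers and its Name column are placeholders):

DECLARE @Name nvarchar(100);

DECLARE customer_cursor CURSOR FOR
    SELECT Name FROM dbo.Customers;

OPEN customer_cursor;
FETCH NEXT FROM customer_cursor INTO @Name;

WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @Name;                                -- process the fetched row
    FETCH NEXT FROM customer_cursor INTO @Name;
END;

CLOSE customer_cursor;
DEALLOCATE customer_cursor;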

What is a trigger?

A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be considered similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure, while triggers are implicitly executed. In addition, triggers can also execute stored procedures. Nested trigger: a trigger can also contain INSERT, UPDATE and DELETE logic within itself; so when the trigger is fired because of a data modification, it can also cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger.

How is the ACID property related to databases?

ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for while evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved. Atomicity is an all-or-none proposition. Consistency guarantees that a transaction never leaves your database in a half-finished state. Isolation keeps transactions separated from each other until they are finished. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.

What's Change Tracking? How does it differ from CDC?

All Change Tracking does is say to the world: "This row was changed; here's the PK." Clean and simple, no fuss, no muss. It's easy to configure and query. It's both a database property and a table property. Although Change Data Capture is limited to only the Enterprise Edition, Change Tracking is available in all the SQL Server editions, even SQL Server Express. Where change data capture shines is in gathering data for ETL from a high-traffic OLTP database to a data warehouse. Of the possible options, change data capture has the least performance hit and does a great job to provide the right set of data for the Business Intelligence ETL (extract-transform-load). When you think big-dollar BI, think CDC.

How do clustered indexes work with the b-tree?

All the data for the table is stored in the leaf pages of a clustered index. So, for clustered indexes, after you make it to the leaf level, you've got all you need right there - there's no need for further searches. A nonclustered index is a separate data structure from the base table.

What are the best features of SQL Server 2012?

AlwaysOn - this is where you can represent a group of databases as a single entity (an Availability Group). Huge improvement in availability and recovery here. We can fail over a group of databases as a unit, using one or more local or remote copies.
Contained Databases - easier to migrate a database to a new instance now. A database is rarely independent - there are linked servers, SQL Agent jobs, collation differences. Now they're more bundled and easier to migrate.
Columnstore - this is targeted at DW. A covering index that includes multiple columns, but splits them up and stores them on their own sets of pages. Not good for OUTER joins, but can improve query performance by 100% or more. Read-only.
Some T-SQL enhancements - including the window functions that Itzik Ben-Gan is thrilled about. Showplan enhancements, PowerShell (more cmdlets), big data support and Hadoop, and Power View (mashups of BI reports).

What is the Difference between Index Seek vs. Index Scan?

An index scan means that SQL Server reads all the rows in a table, and then returns only those rows that satisfy the search criteria. When an index scan is performed, all the rows in the leaf level of the index are scanned. This essentially means that all the rows of the index are examined instead of the table directly. This is sometimes compared to a table scan, in which all the table data is read directly. However, there is usually little difference between an index scan and a table scan. An index seek, on the other hand, means that the Query Optimizer relies entirely on the index leaf data to locate rows satisfying the query condition. An index seek will be most beneficial in cases where a small percentage of rows will be returned. An index seek will only affect the rows that satisfy a query condition and the pages that contain these qualifying rows; in terms of performance, this is highly beneficial when a table has a very large number of rows.

What languages does BI use to achieve the goal?

BI uses the following languages to achieve the goal. MDX - Multidimensional Expressions: this language is used for retrieving data from SSAS cubes. It looks very similar to T-SQL, but it is very different in the areas of conceptualization and implementation. DMX - Data Mining Extensions: this is again used for SSAS, but rather than cubes it is used for data mining structures. This language is more complicated than MDX, although Microsoft has provided many wizards in its BI tools that reduce the need to learn it in depth. XMLA - XML for Analysis: this is mainly used for SSAS administrative tasks. It is quite commonly used for administration tasks such as backing up or restoring a database, copying and moving a database, or retrieving metadata information. Again, the MS BI tools provide a lot of wizards for the same.

What's the Difference between a Primary Key and a Unique Key?

Both a primary key and a unique key enforce uniqueness of the column on which they are defined. But by default, the primary key creates a clustered index on the column, whereas a unique key creates a non-clustered index by default. Another major difference is that a primary key doesn't allow NULLs, but a unique key allows one NULL only.

What is CHECKPOINT Process in the SQL Server?

CHECKPOINT process writes all dirty pages for the current database to disk. Dirty pages are data pages that have been entered into the buffer cache and modified, but not yet written to disk.

What are Different Types of Collation Sensitivity?

Case sensitivity - A and a, B and b, etc. Accent sensitivity - a and á, o and ó, etc. Kana sensitivity - when the Japanese kana characters hiragana and katakana are treated differently, the collation is kana-sensitive. Width sensitivity - when a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, the collation is width-sensitive.

What is Change Data Capture (CDC) in SQL Server 2008?

Change Data Capture (CDC) records INSERTs, UPDATEs, and DELETEs applied to SQL Server tables and makes a record available of what changed, where, and when, in simple relational 'change tables' rather than in an esoteric chopped salad of XML. These change tables contain columns that reflect the column structure of the source table you have chosen to track along with the metadata needed to understand the changes that have been made.

What's CDC?

Change Data Capture - very BI-centric. It captures both DML and DDL commands. Data is stored in change tables. There are four drawbacks: cost (Enterprise Edition), T-log (about 2x the I/O), disk space, and the necessity to create an SSIS package. Big deal! Any data written to the transaction log can be asynchronously captured using CDC from the transaction log after the transaction is complete, so it doesn't affect the original transaction's performance. CDC can track any data from the T-log, including any DML insert, update, delete, and merge command, and DDL create, alter, and drop. Changes are stored in change tables - tables created by CDC with the same columns as the tracked tables plus a few extra CDC-specific columns. All the changes are captured, so CDC can return all the intermediate values or just the net changes. There are only a few drawbacks to CDC: • Cost: it requires Enterprise Edition. • T-log I/O: the transaction log experiences about twice as much I/O because CDC reads from the log. • Disk space: because CDC essentially stores copies of every transaction's data, without proper maintenance there's the potential that it can grow like a transaction log gone wild. • SSIS: you will probably need to set up an SSIS package for the data.

Give an example of a recursive query.

Common Table Expression (CTE) was introduced in SQL Server 2005 as an alternative to derived tables and an expansion of temporary result sets. A CTE is defined within the execution scope of a single SELECT, INSERT, UPDATE, or DELETE statement. Much like a derived table, a CTE is not stored as an object and lasts only for the duration of the query. CTEs can be self-referencing, enabling them to be referenced multiple times in the same query. Because it is self-referencing, it creates recursion, thus a recursive CTE. A recursive CTE is one in which an initial CTE is repeatedly executed to return subsets of data until the complete result set is returned. A recursive CTE is commonly used for querying and returning hierarchical data. An example of this is the HumanResources.Employee table, in which there are two relevant columns: EmployeeID and ManagerID. The following query shows the structure of a recursive CTE by returning a hierarchical list of employees, starting with the highest-ranking employee.
WITH DirectReports (ManagerID, EmployeeID, Title, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, Title, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, e.Title, EmployeeLevel + 1
    FROM HumanResources.Employee AS e
    INNER JOIN DirectReports AS d ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, Title, EmployeeLevel
FROM DirectReports
ORDER BY ManagerID;
GO

Linq - what is it, and sample code

Concepts are taken from Haskell and LISP. LINQ allows the programmer to query in-memory data inline, as they could with standard SQL. It features lambda expressions, recursive processing over a sequence, and lazy evaluation. Sample: var result = from b in books where b.QuantityInStock > 80 select b; LINQ is a .NET concept - you can write an efficient query language in the language of your choice (I'm looking at you, C#) - and it offers full type safety and IDE support, including IntelliSense, debugging, and refactoring. C# and OO languages are mostly stable now - classes, objects and methods are all pretty well understood. The two remaining worlds are relational databases and XML - LINQ bridges the gap.

How do data types contribute to the modeling work?

Data types are often an afterthought - in fact, they're one of the most important choices you make, both from a data integrity and a performance standpoint. Constraints help protect the quality of your data. Getting indexing right eliminates a lot of rework tuning later. Data types by grouping: • Precise - int, tinyint, etc, and decimal/money. Use decimal versus money! • Approximate - float, real. Use for scientific apps where some rounding errors are OK. • Datetime - Use more standards-oriented datetime2 vs datetime/smalldatetime. • Character data - strings like char, varchar, etc. • Binary data - varbinary, not image! • There's also xml and a host of other types

What is the Difference between DELETE and TRUNCATE Commands?

The DELETE command removes rows from a table based on the condition that we provide in a WHERE clause. TRUNCATE removes all the rows from a table, and there will be no data in the table after we run the command.
TRUNCATE
• TRUNCATE is faster and uses fewer system and transaction log resources than DELETE.
• TRUNCATE removes the data by deallocating the data pages used to store the table's data, and only the page deallocations are recorded in the transaction log.
• TRUNCATE removes all the rows from a table, but the table structure, its columns, constraints, indexes and so on remain.
• TRUNCATE resets the counter used by an identity column to the seed defined for the column.
• You cannot use TRUNCATE TABLE on a table referenced by a FOREIGN KEY constraint.
• TRUNCATE can be rolled back only when it is used inside an explicit transaction (BEGIN ... COMMIT/ROLLBACK TRANSACTION).
• TRUNCATE is a DDL command.
DELETE
• DELETE removes rows one at a time and records an entry in the transaction log for each deleted row.
• DELETE does not reset the identity of the table.
• DELETE can be used with or without a WHERE clause.
• DELETE activates triggers if defined on the table.
• DELETE can be rolled back.
• DELETE is a DML command.

What are DMV's? Give some examples.

Dynamic management views and functions (DMVs and DMFs) all live in the sys schema - sys.dm_exec_sql_text, for example. Some good ones: • sys.dm_exec_connections - retrieve info about connections. • sys.dm_exec_requests - currently executing requests; like sp_who, on steroids. Its blocking_session_id column shows blocked sessions, and it covers what sp_who2 and sp_blocker used to show. • sys.dm_tran_locks - shows locks. • sys.dm_exec_sql_text plus sys.dm_exec_query_plan and the _stats views - locate performance issues, such as the top 10 performance challenges. You can take the XML output, bring it up in a browser, save it as a .SQLPLAN file, and then load it in a query window. • sys.dm_db_missing_index_* - these show the indexes the optimizer considers missing. Remember - indexes should ALWAYS be justified. Since they impact insert/update operations, it's a crime to have unused indexes; you'll know an index is unused when user_seeks, user_scans, and user_lookups = 0. There's a great code snippet - see the appendix - on how to find the top 1000 slow-performing queries. CPU is EVERYTHING for SQL Server. (A sketch follows below.)
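A minimal sketch of combining the execution DMVs to find the top CPU consumers (this is a common pattern, not the exact snippet the appendix refers to):

SELECT TOP (10)
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_time DESC;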

How can you get confidence with your governance strategy?

Good reporting is the key. Pull the 35 most resource-consuming queries, the highest execution counts, the most I/O, the most CPU usage, significant events, and trends in disk usage.

What is Hadoop and what can it offer an enterprise?

Hadoop is a system of components for implementing the MapReduce algorithms in a scalable fashion. BI data tends to double every year. 85% of this is unstructured (video, RFID tags, etc.). And data is coming in faster, more in real time - so there's more data, moving at a greatly faster velocity. HDInsight integrates with Active Directory - Microsoft hopes to leverage this for Azure. It integrates well with Power View (data visualization) and PowerPivot (data modeling), both Excel-based. Use BIDS to create OLAP cubes.

What's HADOOP? MongoDB?

Hadoop is a system of components for implementing the MapReduce algorithms in a scalable fashion. BI data tends to double every year. 85% of this is unstructured (video, RFID tags, etc.). And data is coming in faster, more in real time - so there's more data, moving at a greatly faster velocity. HDInsight integrates with Active Directory - Microsoft hopes to leverage this for Azure. It integrates well with Power View (data visualization) and PowerPivot (data modeling), both Excel-based. Use BIDS to create OLAP cubes. MongoDB is another way. It's an operational store, not an analytical datastore. It doesn't support multi-statement transactions or joins, for example. In terms of cross-store persistence - we're talking Spring, Rails, or Django here - these are application frameworks that run on top of multiple data stores. 2-3 data stores to consolidate is about the sweet spot, no more (from a management/maintainability standpoint). This allows you to persist some data/messages into NoSQL in-memory stores, and others in a classic RDBMS for analytical reports. Remember, security is minimal at present for all these NoSQL vendors (Hadoop, Cassandra). Mongo keeps data in a key/value store - it uses this to retrieve documents. This maps very nicely to programming languages - the elimination of joins helps performance. This model is very HA, and offers easy scaling. BSON is the API.

How to Stop Log File Growing too Big?

If your transaction log file is growing too big and you want to manage its size, then instead of truncating the transaction log file, you should choose one of the options mentioned below (a sketch follows below). 1) Convert the recovery model to Simple recovery. If you change your recovery model to the Simple recovery model, you will not encounter the extraordinary growth of your log file. However, please note that if you have one long-running transaction, it will still grow your log file until the transaction is complete. 2) Start taking transaction log backups. In the Full recovery model, your transaction log will grow until you take a backup of it. You need to take T-log backups at a regular interval. This way, your log will not grow beyond some limit.
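A minimal sketch of both options (the database name and backup path are placeholders):

-- Option 1: switch to the Simple recovery model
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;

-- Option 2: stay in Full recovery and back up the log regularly
BACKUP LOG SalesDB TO DISK = N'D:\Backups\SalesDB_log.trn';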

What is Data Compression?

In SQL Server 2008, data compression comes in two flavors: • Row compression • Page compression (a sketch follows below).
Row compression changes the format of the physical storage of data. It minimizes the metadata (column information, length, offsets, etc.) associated with each record. Numeric data types and fixed-length strings are stored in a variable-length storage format, just like varchar.
Page compression allows common data to be shared between rows on a given page. It uses the following techniques to compress data: • Row compression. • Prefix compression - for every column on a page, duplicate prefixes are identified. These prefixes are saved in compression information (CI) headers, which reside after the page header. A reference number is assigned to each prefix, and that reference number replaces the prefix wherever it is used. • Dictionary compression - dictionary compression searches for duplicate values throughout the page and stores them in the CI. The main difference between prefix and dictionary compression is that the former is restricted to one column while the latter is applicable to the complete page.
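A minimal sketch of enabling page compression on a table (the table name is a placeholder):

ALTER TABLE dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- You can estimate the benefit first:
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo',
     @object_name = 'SalesHistory',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';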

What's best practice with using sprocs and EF?

In general, it's best to use ORM tools like EF for direct-to-entity or code-first approaches, and to use LINQ-to-Entities for querying. However, if you must use sprocs (for legacy databases, for example), you can. The Entity Framework allows you to use stored procedures in the Entity Data Model instead of, or in combination with, its automatic command generation. When designing your model, you can: • specify that EF should use your stored procedures for inserting, updating, or deleting entities • create model functions to call query stored procedures that return entities or results that do not match any of the entities in your model • provide functions that call actions in the database but do not return data at all, such as performing a sweeping update across rows in a table • define a stored procedure within custom functions directly in your model without it existing in the database • map modification stored procedures to entities that are based on database views, eliminating EF's need for direct table access

Indexing best practices

Indexing is a topic by itself. Beyond the standard best practices: • Use clustered indexes on PK's • Index on search columns • Use database-level fill factor • Use an index-level fill factor • Index on FK's

What is Log Shipping?

Log shipping is the process of automating the backup of the database and transaction log files on a production SQL Server and then restoring them onto a standby server. All editions (except Express Edition) support log shipping. In log shipping, the transaction log file from one server is automatically applied to the backup database on the other server. If one server fails, the other server will have the same database and can be used as part of the disaster recovery plan. The key feature of log shipping is that it will automatically back up transaction logs throughout the day and automatically restore them on the standby server at defined intervals. (Courtesy: Rhys)

What is MERGE Statement?

MERGE is a new feature that provides an efficient way to perform multiple DML operations. In previous versions of SQL Server, we had to write separate statements to INSERT, UPDATE, or DELETE data based on certain conditions, but now, using the MERGE statement, we can include the logic of such data modifications in one statement that checks when the data is matched and updates it, and when it is unmatched, inserts it. One of the most important advantages of the MERGE statement is that all the data is read and processed only once. (A sketch follows below.)
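A minimal sketch of the statement; both tables are hypothetical staging/target tables keyed on ProductID:

MERGE dbo.ProductTarget AS t
USING dbo.ProductStaging AS s
    ON t.ProductID = s.ProductID
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name, t.ListPrice = s.ListPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, Name, ListPrice)
    VALUES (s.ProductID, s.Name, s.ListPrice)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;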

What's the difference between a dumb sproc and a smart sproc?

Many different philosophies exist on the appropriate use of stored procedures, but they all come down to two basic ideas: dumb stored procedures versus smart stored procedures. Organizational policy and team coding standards might dictate in advance how stored procedures are implemented, but it is worth looking at these two approaches. Some organizations require that all T-SQL called by an application must be encapsulated within a stored procedure. This results in a high number of stored procedures, many consisting of basic Create, Retrieve, Update, and Delete (CRUD) operations. These are "dumb" stored procedures. On the other hand, some organizations implement a stored procedure only when a process calls for multiple T-SQL statements, multiple result sets, complex processing, or parameterization. These are "smart" stored procedures. The goals of both approaches are similar: maintainability, security, performance, and consistency. Depending on how an organization and its application is structured and built, either approach may be appropriate. The key is consistency. Decide upfront how to implement stored procedures and stick to it.

What are the basic functions for master, msdb, model, tempdb and resource databases?

Master - the engine; msdb - SQL Agent, SSIS, backups; tempdb - temp objects; model - template database; resource - all system objects.
The master database holds information for all the databases located on the SQL Server instance, and it is the glue that holds the engine together. Because SQL Server cannot start without a functioning master database, you must administer this database with care. The msdb database stores information regarding database backups, SQL Agent information, DTS packages, SQL Server jobs, and some replication information such as for log shipping. The tempdb holds temporary objects such as global and local temporary tables and stored procedures. The model is essentially a template database used in the creation of any new user database created in the instance. The resource database is a read-only database that contains all the system objects that are included in SQL Server. SQL Server system objects such as sys.objects are physically persisted in the resource database, but they logically appear in the sys schema of every database. The resource database does not contain user data or user metadata.

What's MongoDB and how does it work?

MongoDB is another way. It's an operational store, not an analytical datastore. It doesn't support multi-statement transactions or joins, for example. In terms of cross-store persistence - we're talking Spring, Rails, or Django here - these are application frameworks that run on top of multiple data stores. 2-3 data stores to consolidate is about the sweet spot, no more (from a management/maintainability standpoint). This allows you to persist some data/messages into NoSQL in-memory stores, others in classic RDBMS for analytical reports. Remember, security is minimal at present for all these noSQL vendors (Hadoop, Cassandra). Mongo keeps data in a key/value store - it uses this to retrieve documents. This maps very nicely to programming languages - the elimination of joins helps performance. This model is very HA, and offers easy scaling. BSON is the API.

How would you set optimistic locking? Pessimistic locking?

Optimistic locking assumes values are unchanged between the read and the update. You can handle this with: 1. Datasets (a check of old vs. new values before updating) 2. A timestamp/rowversion column - while updating, check that the old timestamp equals the current one; if not, roll back 3. Checking old/new column values explicitly. Pessimistic locking really means Shared/Exclusive locking (Update and Intent locks are more hybrid). (A sketch follows below.)
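A minimal sketch of the rowversion approach; the table, columns, and values are hypothetical, and @OriginalRowVersion stands in for the value the application read earlier:

DECLARE @ProductID int = 42,
        @NewPrice money = 19.99,
        @OriginalRowVersion binary(8) = 0x00000000000007D1;  -- read earlier

UPDATE dbo.Products
SET ListPrice = @NewPrice
WHERE ProductID = @ProductID
  AND RowVer = @OriginalRowVersion;   -- RowVer is a rowversion column

IF @@ROWCOUNT = 0
    RAISERROR('The row was changed by another user.', 16, 1);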

What are Ranking Functions?

Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic. The different ranking functions are as follows: ROW_NUMBER() OVER ([<partition_by_clause>] <order_by_clause>) returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. RANK() OVER ([<partition_by_clause>] <order_by_clause>) returns the rank of each row within the partition of a result set. DENSE_RANK() OVER ([<partition_by_clause>] <order_by_clause>) returns the rank of rows within the partition of a result set, without any gaps in the ranking. (A sketch follows below.)
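A minimal sketch comparing the three functions (dbo.Employees and its columns are placeholders):

SELECT Department,
       EmployeeName,
       Salary,
       ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RowNum,
       RANK()       OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rnk,
       DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DenseRnk
FROM dbo.Employees;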

What about Deadlocks - how would you identify and kill them?

Run the query below (against an Azure SQL Database server, from the master database) to display a recent deadlock. The data displayed is retrieved from historical tables populated every few minutes, so you may need to wait a bit before the actual deadlock is displayed.
-- sys.event_log: deadlock data is displayed as XML in additional_data
SELECT *
FROM sys.event_log
WHERE database_name LIKE 'AdventureWorks2012'
  AND event_type = 'deadlock';
I would recommend using SQL Server Management Studio to query the sys.event_log table to retrieve the deadlock details; it's a bit more user-friendly than the SQL Database portal. If you're using the portal: in the upper left corner, click your database, then select Master. Click New Query and paste in the same sys.event_log query. Select the row displaying the deadlock, copy it, and paste it into your favorite text editor such as Notepad or TextPad to inspect the deadlock XML.

What are Slowly Changing Dimensions (SCD)?

SCD is the abbreviation of slowly changing dimensions. SCD applies to cases where the attribute for a record varies over time. There are three different types of SCD. 1. SCD1: The new record replaces the original record. Only one record exists in database - current data. 2. SCD2: A new record is added into the customer dimension table. Two records exist in the database - current data and previous history data. 3. SCD3: The original data is modified to include new data. One record exists in database - new information is attached with old information in same row.

List four ways of viewing the code of a stored procedure

1. SSMS (script the object) 2. sp_helptext 3. The sys.sql_modules system view 4. SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.SprocName'))

What is the STUFF Function and How Does it Differ from the REPLACE Function?

STUFF function is used to overwrite existing characters using this syntax: STUFF (string_expression, start, length, replacement_characters), where string_expression is the string that will have characters substituted, start is the starting position, length is the number of characters in the string that are substituted, and replacement_characters are the new characters interjected into the string. REPLACE function is used to replace existing characters of all occurrences. Using the syntax REPLACE (string_expression, search_string, replacement_string), every incidence of search_string found in the string_expression will be replaced with replacement_string.
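A couple of minimal examples (the string literals are arbitrary):

SELECT STUFF('abcdef', 2, 3, 'XYZ');      -- returns 'aXYZef'
SELECT REPLACE('abcabc', 'b', 'Z');       -- returns 'aZcaZc'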

What two ways of indexing hierarchical data are there?

Short answer- depth-first or breadth-first. In a depth-first approach, rows in a subtree are stored near each other. For example, employee records are stored near their managers' record. In a breadth-first approach, rows for each level in the hierarchy are stored together. For example, managers and their direct reports are stored near each other.

What is COALESCE?

A CASE expression that tests a series of values for NULL can be simplified using Transact-SQL's (T-SQL) COALESCE function. This function accepts a number of parameters, each a value that may or may not be NULL. It cycles through the provided items in the order they appear until a non-NULL value is encountered; this becomes the return value. If all of the arguments of the function are NULL, the return value is also NULL. The query below uses COALESCE for a briefer, more pleasing query. Note that the three columns and zero are used in the function; if all of the mileage columns are NULL, the person's mileage will be reported as zero.
SELECT VehicleOwner,
       COALESCE(CarMileage, MotorcycleMileage, BicycleMileage, 0) AS Mileage
FROM VehicleMileage;
/* RESULTS
VehicleOwner Mileage
------------ -------
Bob          9500
Sam          6500
Mel          1000
Jim          8475
Kat          3875
*/

What's the CHOOSE function (new in SQL2012)

The CHOOSE function is also a logical function, but it behaves in a somewhat different manner. CHOOSE is similar to an index into an array, assuming the array is a list of arguments. The syntax for the CHOOSE function is as follows:
CHOOSE(index, val_1, val_2 [, val_n])
Index is a 1-based integer that acts as the index into the list of values, and the corresponding values are the list of values that will be searched. The following script illustrates the use of the function:
SELECT CHOOSE(3, 'Lions', 'Tigers', 'Bears') AS Chosen;
Result:
Chosen
-----------
Bears
If you supply an integer value outside the bounds of the array, a NULL value is returned.

What is the CONCAT function?

The CONCAT function combines two or more strings into a single string:
SELECT CONCAT('Patrick ', 'LeBlanc') AS Results;
Result:
Results
---------------
Patrick LeBlanc

What's the IIF function (new in SQL2012)

The IIF function is a shorthand version of the CASE statement. The syntax for the IIF function is as follows:
IIF(boolean_expression, true_value, false_value)
The function behaves exactly as it does in Microsoft Excel. If the boolean expression evaluates to true, the first value is returned; if it evaluates to false, the second value is returned. The following script illustrates the use of the function:
SELECT IIF(1=1, 'True', 'False') AS Condition;
Result:
Condition
-------------
True

How would you encrypt a view?

The WITH ENCRYPTION option is another security feature. When views or stored procedures are created, the text can normally be retrieved through the sys.sql_modules and sys.syscomments system views, so the code is available for viewing. The view may contain a WHERE condition that should be kept confidential, or there may be some other reason for encrypting the code. The WITH ENCRYPTION option encrypts the code in the system tables, hides the code from sys.sql_modules and sys.syscomments, and prevents anyone from viewing the original code. Be careful with this option: when the code is encrypted, Management Studio can no longer produce a script to alter the view, and it can't be replicated or published. (A sketch follows below.)
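A minimal sketch (the view and underlying table are placeholders):

CREATE VIEW dbo.vConfidentialOrders
WITH ENCRYPTION
AS
SELECT OrderID, CustomerID, TotalDue
FROM dbo.Orders
WHERE TotalDue > 10000;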

What's the advantage of multistatement table-valued UDF's?

The advantage of multistatement table-valued UDFs is that they encapsulate complex multi-statement logic. The primary benefit of the multistatement table-valued, user-defined function is that complex result sets may be generated within code and then easily used with a SELECT statement. This enables you to build complex logic into a query and solve problems that would otherwise be difficult to solve without a cursor. Multistatement table-valued, user-defined functions use tempdb to pass the table variable to the calling query. For many applications this is not a concern, but for high-transaction applications, it could pose a performance problem. Consider incorporating the code directly into the calling stored procedure.

What about Log Shipping - and how does it vary from AlwaysOn?

The basic idea is that the transaction log, with its record of the most recent transactions, is regularly backed up and then sent, or shipped, to another server where the log backup is applied so that the server has a fresh copy of all the data from the primary server. Log shipping doesn't require any special hardware or magic. It's relatively easy to set up and administer. Prior to SQL Server 2012, if you wanted the option to have a warm standby copy of a database that you could query from, you were mainly limited to setting up log shipping. With the introduction of AlwaysOn Availability Groups you have another fantastic high-availability option. With AlwaysOn Availability Groups you can not only have up-to-date, queryable copies of your database, but you can also take advantage of other features such as multiple failover options (automatic, planned manual, and forced manual), multiple database grouping, alternative availability modes (asynchronous-commit, synchronous-commit), and much more. Database mirroring (mirroring is deprecated in 2012 in favor of AlwaysOn) enables you to maintain a copy of your production database, potentially completely synchronized, on a separate server for failover if the production server or database fails. Like log shipping, database mirroring provides high availability at the database level, but unlike log shipping, you can configure database mirroring to provide no data loss and automatic failover.

What is B-Tree?

The database server uses a B-tree structure to organize index information. A B-tree generally has the following types of index pages or nodes: • Root node: a root node contains pointers to branch nodes. • Branch nodes: a branch node contains pointers to leaf nodes or other branch nodes, which can be two or more. • Leaf nodes: a leaf node contains index items and horizontal pointers to other leaf nodes, which can be many.

What's the default collation set?

The default is case-insensitive, accent-sensitive - for example, SQL_Latin1_General_CP1_CI_AS. With a case-insensitive collation, jsmith and JSmith resolve to the same login name, for example.

What is PATINDEX?

PATINDEX returns the starting position of the first occurrence of a pattern in a string, or zero if the pattern is not found. The following example finds the position at which the pattern 'ensure' starts in a specific row of the DocumentSummary column in the Document table.
USE AdventureWorks2012;
GO
SELECT PATINDEX('%ensure%', DocumentSummary)
FROM Production.Document
WHERE DocumentNode = 0x7B40;
GO

How would you hide information about a table from prying eyes in a view?

The front-end application or data access layer may request schema information, called metadata, along with the data when querying SQL Server. Typically, SQL Server returns schema information for the underlying tables, but the WITH VIEW_METADATA option tells SQL Server to return schema information about the view, rather than the tables referenced by the view. This prohibits someone from learning about the table's schema and is useful when the view's purpose is to hide sensitive columns.

How would you find the previous or next rows in a query?

The true value in the new OVER() arguments is their capability to access previous and next rows. The following query introduces an additional modification to the PRECEDING argument that was defined earlier:
USE AdventureWorks
GO
WITH YearlyCountryRegionSales AS
(
    SELECT [Group] AS CtryReg, YEAR(soh.OrderDate) AS OrYr, SUM(TotalDue) AS TotalDueYTD
    FROM Sales.SalesOrderHeader soh
    INNER JOIN Sales.SalesTerritory st ON soh.TerritoryID = st.TerritoryID
    GROUP BY [Group], YEAR(soh.OrderDate)
)
SELECT CtryReg, OrYr,
       TotalDueYTD AS CurrentYearTotals,
       SUM(TotalDueYTD) OVER(PARTITION BY CtryReg ORDER BY OrYr
           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) - TotalDueYTD AS PreviousYearTotals,
       SUM(TotalDueYTD) OVER(PARTITION BY CtryReg ORDER BY OrYr
           ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) - TotalDueYTD AS NextYearTotals
FROM YearlyCountryRegionSales

What kinds of problems are caused by concurrency?

There are four kinds of major problems caused by concurrency:
• Dirty reads - a "dirty read" occurs when one transaction reads a record that is part of the half-finished work of another transaction. Example: User B changes a value from "5" to "2" but has not committed; User A reads the value as "2"; User B then rolls back. User A has read data that never really existed - a dirty read has happened.
• Unrepeatable reads - if you get different values each time you read the same data within one transaction, it's an "unrepeatable read" problem. Example: User A reads the value as "5"; User B changes the value from "5" to "2" and commits; User A re-reads and now sees "2" - he is surprised. An unrepeatable read has happened.
• Phantom rows - rows matching your criteria appear or disappear between statements because another transaction inserted or deleted them. Example: User A updates all records with value "5" to "2"; User B inserts a new record with value "5"; User A then selects records with value "5" expecting none and is surprised to still find one - a phantom row has appeared.
• Lost updates - a "lost update" is a scenario where one update that is successfully written to the database is overwritten by another transaction's update. Example: User A updates all values from "5" to "2"; User B then updates all "2" values to "5"; User A has lost all his updates.

How do you optimize stored procedures?

There are many tips and tricks for this. Here are a few: • Include a SET NOCOUNT ON statement. • Use the schema name with the object name. • Do not use the prefix "sp_" in the stored procedure name. • Use IF EXISTS (SELECT 1) instead of IF EXISTS (SELECT *). • Use the sp_executesql stored procedure instead of the EXECUTE statement. • Try to avoid using SQL Server cursors whenever possible. • Keep the transaction as short as possible. • Use TRY...CATCH for error handling.

What two types of triggers are there?

There are two types of triggers. 1) DML triggers - of which there are two kinds: INSTEAD OF triggers, which fire in place of the triggering action (an insert, update, or delete), and AFTER triggers, which execute following the triggering action. 2) DDL triggers - fired against events such as DROP TABLE, CREATE TABLE, ALTER TABLE, or login events. DDL triggers are always AFTER triggers.

What is the Difference between a HAVING clause and a WHERE clause?

Both specify a search condition, but HAVING applies to a group or an aggregate. HAVING can be used only with the SELECT statement and is typically used with a GROUP BY clause; when GROUP BY is not used, HAVING behaves like a WHERE clause. In short, the HAVING clause is applied to groups produced by the GROUP BY function in a query, whereas the WHERE clause is applied to each row before the rows become part of the GROUP BY function in a query.

When is the UPDATE_STATISTICS command used?

This command is basically used after a large amount of data has been processed. If a large number of deletions, modifications, or bulk copies into the tables has occurred, the statistics need to be refreshed to take these changes into account. UPDATE STATISTICS updates the query optimization statistics for the tables and indexes accordingly.

What is WITH CHECK option?

WITH CHECK OPTION guards against the case where you create a SELECT view - intended for reports - and someone runs an insert or update through the view (and thus the underlying table) that would produce rows the view itself cannot see, i.e. rows that fail the view's WHERE clause; with the option set, such modifications are rejected. For the application, row-level security can be designed using user-access tables and stored procedures, but views can help enforce row-level security for ad hoc queries. Some developers employ views and the WITH CHECK OPTION as a way to provide row-level security - a technique called horizontally positioned views. As in the subcategory view example, they create a view for each subcategory and then give users security permission to the view that pertains to them. Although this method does achieve row-level security, it also has a high maintenance cost. Example:
INSERT INTO vComponentsProductSubCats (ProductCategoryID, ProductSubCategory)
VALUES (2, 'Bike Pedal');
Server: Msg 550, Level 16, State 1, Line 1
The attempted insert or update failed because the target view either specifies WITH CHECK OPTION or spans a view that specifies WITH CHECK OPTION.

Are stored procedures faster because the execution plans are precompiled?

This used to be true - prior to SQL Server 2000. Now all execution plans are cached - both inline SQL and sproc calls. There are still performance benefits to sprocs (especially for multi-statement work: one call over the network versus many), but a cached execution plan isn't one of them. Note the distinction often drawn between dumb sprocs (CRUD only) and smart sprocs (multiple result sets, parameterization).

• Performance: Stored procedure execution is a server-side process. This means that no matter how complex the procedure, only one statement is sent over the network. An application with embedded T-SQL code might have to make multiple round trips to the server to achieve the same result.

The Performance Myth: Many DBAs and developers, especially those who worked with SQL Server prior to SQL Server 2000, will say that stored procedures provide a performance benefit because the execution plan is cached and stored for reuse. The phrase "precompiled" might also come up. This is no longer true. As of SQL Server 2000, all execution plans are cached, regardless of whether they are the result of inline T-SQL or a stored procedure call. The name of the cache area was changed from Procedure Cache to Plan Cache with SQL Server 2000 to properly reflect what actually happens. There are some performance benefits to stored procedures, but a cached execution plan is no longer one of them.

What are Isolation Levels?

Transactions specify an isolation level that defines the degree to which one transaction must be isolated from resource or data modifications made by other transactions. Isolation levels are described in terms of which concurrency side effects, such as dirty reads or phantom reads, are allowed. Transaction isolation levels control the following:
• Whether locks are taken when data is read, and what type of locks are requested.
• How long the read locks are held.
• Whether a read operation referencing rows modified by another transaction blocks until the exclusive lock on the row is freed, retrieves the committed version of the row that existed at the time the statement or transaction started, or reads the uncommitted data modification.
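A minimal sketch of choosing an isolation level for a session (the table name is hypothetical):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- others: READ UNCOMMITTED, READ COMMITTED,
                                                   -- SNAPSHOT, SERIALIZABLE
BEGIN TRANSACTION;
    SELECT Balance FROM dbo.Account WHERE AccountID = 1;
    -- under REPEATABLE READ the shared locks are held until the transaction ends,
    -- so a second read here cannot see a different committed value
COMMIT TRANSACTION;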

Triggers Best Practices

Triggers are quite valid where their use is justified. They're great for fire-and-forget auditing, for example, without having to write procedural code around every CRUD command on every table. The problem is that they're "invisible": it's easy to view relationships and sprocs, but triggers fire invisibly and it's hard to trace their effects. Avoid nested triggers - they can be nested up to 32 levels deep; turn off the "nested triggers" server option to disallow this. Avoid recursion - either direct (a trigger fires the same trigger again; controlled by the RECURSIVE_TRIGGERS database option) or indirect (trigger A fires trigger B, which fires A again; controlled by the nested triggers server option).
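A sketch of the two settings mentioned above (the database name is illustrative):

-- Server-level option: stop triggers from firing other triggers
EXEC sp_configure 'nested triggers', 0;
RECONFIGURE;

-- Database-level option: stop a trigger from firing itself directly
ALTER DATABASE AdventureWorks2012
SET RECURSIVE_TRIGGERS OFF;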

How would you pass in an array of integers into a sproc?

Two methods to pass a list as an input parameter are described here. The first is to construct a string suitable for an IN clause; the second is to pass a list that is then turned into a table that can be included in a join (a sketch of the second method follows the example below). If a list is passed into the stored procedure as a string, you must use dynamic SQL to construct the T-SQL statement so that it can concatenate the string while maintaining correct syntax. This approach adds some complexity to the stored procedure and uses the sp_executesql system stored procedure to execute the query.

USE AdventureWorks2012
GO
CREATE PROCEDURE Sales.uspGetCurrencyRate
    @CurrencyRateIDList varchar(50)
AS
DECLARE @SQLString nvarchar(1000)
SET @SQLString = N'
SELECT CurrencyRateID, CurrencyRateDate, FromCurrencyCode,
       ToCurrencyCode, AverageRate, EndOfDayRate
FROM Sales.CurrencyRate
WHERE CurrencyRateID IN (' + @CurrencyRateIDList + ');'
EXECUTE sp_executesql @SQLString;
GO
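A sketch of the second method, splitting the list into a table and joining to it. STRING_SPLIT is available from SQL Server 2016 onward; on earlier versions a user-defined split function or a table-valued parameter serves the same purpose (the procedure name here is hypothetical):

CREATE PROCEDURE Sales.uspGetCurrencyRateBySet
    @CurrencyRateIDList varchar(50)
AS
SELECT cr.CurrencyRateID, cr.CurrencyRateDate, cr.FromCurrencyCode,
       cr.ToCurrencyCode, cr.AverageRate, cr.EndOfDayRate
FROM Sales.CurrencyRate cr
JOIN STRING_SPLIT(@CurrencyRateIDList, ',') s
    ON cr.CurrencyRateID = CAST(s.value AS int);   -- no dynamic SQL needed
GO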

What's the advantage of calling an inline table UDF versus a VIEW?

A view can't accept parameters - typically you have to filter it from outside with a WHERE clause - whereas an inline table-valued UDF takes parameters directly and can still be queried like a view. To retrieve data through dbo.ufnGetOrderTotalByProductCategory, call the function within the FROM portion of a SELECT statement:

SELECT ProductID, Name, TotalOrders
FROM dbo.ufnGetOrderTotalByProductCategory(1)
ORDER BY TotalOrders DESC;
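The function definition isn't shown above; a sketch of what an inline TVF like this might look like - a single SELECT with the parameter used directly in the WHERE clause (the join logic is an assumption about the intended result):

CREATE FUNCTION dbo.ufnGetOrderTotalByProductCategory (@ProductCategoryID int)
RETURNS TABLE
AS
RETURN
    SELECT p.ProductID, p.Name, SUM(sod.OrderQty) AS TotalOrders
    FROM Production.Product p
    JOIN Production.ProductSubcategory ps
        ON p.ProductSubcategoryID = ps.ProductSubcategoryID
    JOIN Sales.SalesOrderDetail sod
        ON sod.ProductID = p.ProductID
    WHERE ps.ProductCategoryID = @ProductCategoryID
    GROUP BY p.ProductID, p.Name;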

What is the difference between UNION and UNION ALL?

UNION combines the result sets of two queries into one; all corresponding columns in the two SELECTs must have compatible data types. With UNION, only distinct rows are returned. UNION ALL is the same as UNION except that it returns all rows: it does not eliminate duplicates, it simply pulls every row from each query that fits your criteria and combines them into one result set.
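A short sketch against AdventureWorks-style data; the first query returns each city once, the second returns every row twice:

SELECT City FROM Person.Address WHERE StateProvinceID = 79
UNION
SELECT City FROM Person.Address WHERE StateProvinceID = 79;   -- duplicates removed

SELECT City FROM Person.Address WHERE StateProvinceID = 79
UNION ALL
SELECT City FROM Person.Address WHERE StateProvinceID = 79;   -- duplicates kept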

When would you use CTE's over subqueries?

Use them when 1) you need recursion (a CTE can reference itself, which a subquery cannot), 2) the query is long or complex, or 3) the same subquery would otherwise be repeated. A CTE makes the query much more readable. A recursive sketch follows.
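A minimal sketch of a recursive CTE, assuming a hypothetical dbo.Employee(EmployeeID, ManagerID, Name) table:

WITH OrgChart AS
(
    SELECT EmployeeID, ManagerID, Name, 0 AS Level
    FROM dbo.Employee
    WHERE ManagerID IS NULL                              -- anchor member: the top of the tree
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, e.Name, oc.Level + 1
    FROM dbo.Employee e
    JOIN OrgChart oc ON e.ManagerID = oc.EmployeeID      -- recursive member: the CTE references itself
)
SELECT EmployeeID, ManagerID, Name, Level
FROM OrgChart;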

What are User-defined Functions? What types of User-defined Functions can be created?

User-defined Functions allow you to define your own T-SQL functions that accept zero or more parameters and return either a single scalar value or a table data type. The different types of user-defined functions are:

Scalar User-defined Function: returns one of the scalar data types (text, ntext, image, and timestamp are not supported). These are the kind of functions most developers are used to from other programming languages. A scalar sketch follows.

Inline Table-Valued User-defined Function: returns a table data type and is an excellent alternative to a view, because the function can pass parameters into a single T-SQL SELECT statement and in essence provide a parameterized, non-updateable view of the underlying tables.

Multi-Statement Table-Valued User-defined Function: also returns a table and is likewise an excellent alternative to a view, because the function can use multiple T-SQL statements to build the final result, whereas a view is limited to a single SELECT statement. It too accepts parameters, giving you a parameterized, non-updateable view of the data in the underlying tables. Within the CREATE FUNCTION command you must define the table structure that is returned. After creating this type of function, it can be used in the FROM clause of a T-SQL command, unlike a stored procedure, even though a stored procedure can also return record sets.
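A minimal sketch of a scalar UDF (the function name and logic are illustrative only):

CREATE FUNCTION dbo.ufnLineTotal
(
    @UnitPrice money,
    @OrderQty  smallint,
    @Discount  money
)
RETURNS money
AS
BEGIN
    -- returns a single scalar value computed from the parameters
    RETURN @UnitPrice * @OrderQty * (1 - @Discount);
END;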

How would you view query execution plan using DMV's?

Using DMVs, you can view the SQL text and the query execution plan held in the plan cache. Clicking the XML in the right-most column opens another tab with the graphical view of the selected query execution plan.

SELECT cp.cacheobjtype, cp.objtype, cp.usecounts, cp.refcounts,
       SQLText = st.text, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp;

Can a Stored Procedure call itself (i.e., can it be recursive)? How many levels of SP nesting are possible?

Yes. As T-SQL supports recursion, you can write stored procedures that call themselves. Recursion can be defined as a method of problem solving wherein the solution is arrived at by repetitively applying it to subsets of the problem. A common application of recursive logic is to perform numeric computations that lend themselves to repetitive evaluation by the same processing steps. Stored procedures are nested when one stored procedure calls another or executes managed code by referencing a CLR routine, type, or aggregate. You can nest stored procedures up to 32 levels. Any reference to managed code from a Transact-SQL stored procedure counts as one level against the 32-level nesting limit. Methods invoked from within managed code do not count against this limit.
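A minimal sketch of a recursive stored procedure (the names are illustrative); note that an EXEC parameter must be a constant or a variable, so the decremented value goes into a variable first:

CREATE PROCEDURE dbo.uspFactorial
    @n int,
    @result bigint OUTPUT
AS
BEGIN
    IF @n <= 1
        SET @result = 1;
    ELSE
    BEGIN
        DECLARE @prev bigint, @next int = @n - 1;
        EXEC dbo.uspFactorial @next, @prev OUTPUT;   -- the procedure calls itself
        SET @result = @n * @prev;
    END
END;
GO

DECLARE @answer bigint;
EXEC dbo.uspFactorial 5, @answer OUTPUT;
SELECT @answer AS Factorial;   -- 120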

Entity Framework - when would you use a sproc versus direct access for insertions?

You can map modification stored procedures to entities so that EF will automatically use them for any inserts, updates, and deletes it needs to perform in the database. You can also create functions from stored procedures, whether they query data or make changes to the database, and then call them directly in your application code. You can also combine the EDM's ability to use database views and stored procedures to completely bypass EF's direct access to database tables. Table-valued functions are another option to remember - they're a powerful alternative to views and can replace sprocs that return a single result set. The rough shape is CREATE FUNCTION ... RETURNS @myTable TABLE (...); a corrected sketch follows.
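A sketch of a multi-statement table-valued function of that shape (the function name and columns are hypothetical):

CREATE FUNCTION dbo.udf_GetRecentOrders (@CustomerID int)
RETURNS @myTable TABLE (OrderID int, OrderDate datetime)
AS
BEGIN
    INSERT INTO @myTable (OrderID, OrderDate)
    SELECT SalesOrderID, OrderDate
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = @CustomerID;

    RETURN;   -- the populated table variable is the result set
END;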

Datatype best practices

a. For a web app, use Unicode types rather than ASCII. They take twice the space (2 bytes per character), but it's worth it for international support.
b. If the application is multilingual - again, Unicode.
c. For storage, especially integer values, use proper sizing; tinyint versus int can offer great gains in storage.
d. Use fixed-length datatypes (char, etc.) if the value is consistently short (say, under 20 characters); use nvarchar for description fields that vary widely in length.

How would you group by ShipDate (by year, month) and then the ID field and shipdate?

The OVER() clause normally creates a single sort order, but it can divide the windowed data into partitions, which are similar to groups in an aggregate GROUP BY query. This is dramatically powerful because the ranking functions can restart with every partition. The next query example uses the OVER() clause to create a sort order of the query results by ShipDate, and then partitions the data by YEAR() and MONTH(). The syntax is the opposite of the logical flow: the PARTITION BY goes before the ORDER BY within the OVER() clause.

USE AdventureWorks
GO
SELECT ROW_NUMBER() OVER(
           PARTITION BY YEAR(ShipDate), MONTH(ShipDate)
           ORDER BY ShipDate
       ) AS RowNumber,
       PurchaseOrderID, ShipDate
FROM Purchasing.PurchaseOrderHeader
WHERE EmployeeID = 259
ORDER BY RowNumber;

What about Deadlocks - how would you identify and kill them?

Run the query below to display a recent deadlock from master (this example uses sys.event_log, which is available in Azure SQL Database). The data displayed is retrieved from historical tables populated every few minutes, so you may need to wait a bit before the actual deadlock appears.

--sys.event_log
--Deadlock data displayed as XML in additional_data
SELECT *
FROM sys.event_log
WHERE database_name LIKE 'AdventureWorks2012'
  AND event_type = 'deadlock';

I would recommend using SQL Server Management Studio to query the sys.event_log table to retrieve the deadlock details - it's a bit more user friendly than the SQL Database portal. If you're using the portal, here are the steps: in the upper left corner, click your database, then select master. Click New Query and paste in the same sys.event_log query. Here is where it gets messy: click the |> to select the row displaying the deadlock. When it's selected, you'll see the |> highlighted. Then press CTRL-C to copy the row and paste it into your favorite text editor, such as Notepad or TextPad.

How would you use a utility database to monitor activity?

• Create a utility database to track database file sizes.
• Over time, show the amount of free space and CPU usage.
• Capture error logs and track index fragmentation.
• And - best of all - capture process execution times.
• Track failed jobs.
• For more, see the DMVs at sqlserverperformance.wordpress.com.

You could use Visual Studio to create a report that can be deployed and subscribed to. Build the utility database on all the servers in your domain, and use SSIS and linked servers to aggregate and centralize the data. PowerShell can help you deploy it. A sketch of one such capture follows.
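A minimal sketch, assuming a single history table populated from sys.master_files on a schedule (for example, a SQL Agent job); the table name is illustrative:

CREATE TABLE dbo.FileSizeHistory
(
    CaptureDate   datetime      NOT NULL DEFAULT GETDATE(),
    DatabaseName  sysname       NOT NULL,
    LogicalName   sysname       NOT NULL,
    SizeMB        decimal(18,2) NOT NULL
);

INSERT INTO dbo.FileSizeHistory (DatabaseName, LogicalName, SizeMB)
SELECT DB_NAME(database_id), name, size * 8 / 1024.0   -- size is reported in 8 KB pages
FROM sys.master_files;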

DBCC commands of choice

• DBCC CHECKIDENT - to check the current identity value and, if needed, reseed (reset) it.
• DBCC LOG - to check the transaction log.
• DBCC CHECKDB - runs CHECKALLOC, CHECKTABLE, and CHECKCATALOG, checks the indexes, and validates the Service Broker data. Books Online recommends not relying on the REPAIR options if errors are found - it's always best to restore from a backup.
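A sketch of typical usage (the table name is hypothetical):

DBCC CHECKIDENT ('dbo.Orders', NORESEED);    -- report the current identity value only
DBCC CHECKIDENT ('dbo.Orders', RESEED, 0);   -- reset (reseed) the identity value
DBCC CHECKDB ('AdventureWorks2012');         -- full consistency check of the database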

What are Different Types of Locks?

• Shared Locks: used for operations that do not change or update data (read-only operations), such as a SELECT statement.
• Update Locks: used on resources that can be updated. They prevent a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.
• Exclusive Locks: used for data-modification operations, such as INSERT, UPDATE, or DELETE. They ensure that multiple updates cannot be made to the same resource at the same time.
• Intent Locks: used to establish a lock hierarchy. The types of intent locks are intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).
• Schema Locks: used when an operation dependent on the schema of a table is executing. The types of schema locks are schema modification (Sch-M) and schema stability (Sch-S).
• Bulk Update Locks: used when bulk-copying data into a table and the TABLOCK hint is specified.

What are the Advantages of Using Stored Procedures?

• Stored procedures can reduce network traffic and latency, boosting application performance.
• Stored procedure execution plans can be reused; they stay cached in SQL Server's memory, reducing server overhead.
• Stored procedures help promote code reuse.
• Stored procedures can encapsulate logic; you can change stored procedure code without affecting clients.
• Stored procedures provide better security for your data.

What is DW and in what 4 ways does it differ from OLTP systems?

• Subject-oriented: the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together.
• Time-variant: changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
• Non-volatile: data in the database is never overwritten or deleted; once committed, the data is static, read-only, and retained for future reporting.
• Integrated: the database contains data from most or all of an organization's operational applications, and this data is made consistent.

How do you find missing index recommendations?

• Use the dynamic management objects (DMOs) - see the sys.dm_db_missing_index_* views; a sketch follows.
• You can also review cached execution plans for missing-index warnings.
• Use the Database Engine Tuning Advisor (DTA), which can recommend nonclustered indexes; you can use the Tuning template in Profiler along with it. Some people object to the DTA, but it's very helpful automation.
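A sketch of joining the missing-index DMOs to surface suggestions, ordered by a rough impact estimate:

SELECT d.statement AS TableName,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
    ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats s
    ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact * s.user_seeks DESC;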

What are the Advantages of Using CTE?

• Using a CTE improves readability and enables easy maintenance of complex queries.
• The query can be divided into separate, simple, logical building blocks, which can then be used to build more complex CTEs until the final result set is generated.
• A CTE can be defined in functions, stored procedures, triggers, or even views.
• After a CTE is defined, it can be used like a table or a view, and you can SELECT, INSERT, UPDATE, or DELETE data through it.

