SQL

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

BISM

Business Intelligence Semantic Model

DML

Data Manipulation Language (DML)- SELECT, INSERT, UPDATE, DELETE, TRUNCATE, and MERGE

DMX

Data Mining Extensions

ETL

Extract, Transform, Load

ISO

International Organization for Standardization (ISO)

MDX

Multidimensional Expressions

OLTP

Online Transactional Processing

SEQUEL

Structured English QUEry Language

SQL

Structured Query Language

DM

data mining

DSA

data staging area

DW

data warehouse

Microsoft SQL Server supports four types of table expressions:

derived tables, common table expressions (CTEs), views, and inline table-valued functions (inline TVFs)

Types of Environments

⦁ DEV - Development ⦁ SIT - System Integration Testing ⦁ UAT - User Acceptance Testing ⦁ QA - Quality Analyses ⦁ Prod - Production

First Financial Bank, Abilene, TX - Interview - 10.18.2017

⦁ Do you have any Azure experience? - yes, in my previous project at Sompo Int. the project was all about initially migrating on-promise SQL database into Azure SQL Database and into Azure VM, however, most of the data we had to migrate into Azure SQL Database. Also, I had to update most of ETL packages so they point properly to Azure SQL Database. Creation and configuration of ETL packages on Azure SQL database is pretty same on-premise database. ⦁ How big was the data? - Overall database was huge there and there were over 50 SQL Server instances and within each server there were more than 100 different database, and each database had on average 150 tables and the size of tables ranged from 100 rows to 2 million, but on average most tables had several hundred thousand rows. ⦁ Have you monitored packages? And how? - I have really strong C# knowledge and by using Script Task I usually configured my packages to notify me about the package execution steps. Also, I used regular SSIS tools like Send Mail Task, Logging into either flat files or XML Files or directly SQL Server tables. ⦁ What do you have to pay attention when you are transferring data from Oracle? - First of all, you have to decide what connector you will choose, i.e., ADO.NET, OLE DB, Attunity Connectors or other 3rd party connector. Secondly, you have to make sure that the incoming data type is matching with your destination, if it is not matching you have to convert the data type, and to convert it there are several ways, either you can do it through Data Conversion on your SSIS package, but if you worry about the performance of your SSIS package it is recommended to do the conversion on your SQL Server. ⦁ What is a control flow? - A control flow defines a workflow of tasks to be executed, often in a particular order. Control flow is consist of Tasks and Containers (Sequence, For Loop, For Each Loop Container) ⦁ What is data Flow? - A data flow defines a flow of data from a source to a destination. It consists of Source, Transformations and Destinations ⦁ What is a data flow pipeline engine? - It is a component in ⦁ SSIS that powers the ⦁ Dataflow Task. It moves data from one place to another in memory-resident data structures called buffers. ⦁ How do you capture error logs? - Event handler and use task and system variables, Fail/Redirect/Ignore Failure Components ⦁ What do you need before you start your SSIS Development? - talk to business side people and gather all the business requirements, find out sources, check mapping documents and data dictionary ⦁ Have you created mapping documents? - yes, you can use data dictionary if available, otherwise do data mining, use system store procedures to find out info about tables and their relationships ⦁ Do you have any PowerBI experience? - yes, they use DAX ⦁ Difference between break point and check point? - Checkpoints is the points where you can record the control flow executable that have successfully run, if a package fails during execution, SSIS will reference the checkpoint file when we are trying to rerun the package. Breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package. ⦁ Difference between inner join and outer join? ⦁ What is a cartesian product? - it is cross join, it multiplies rows of two table and brings all possible combinations ⦁ What is clustered and non-clustered index? ⦁ What is the difference between drill down and drill through reports? ⦁ What is the difference between Star and Snow Flake Schemas? ⦁ How to Load data into warehouse? - Full Load and then Incremental Load ⦁ What is Data Flow Tap in error handling? - Data Flow Tap available from 2012, saves coming data into text file at the specific point of time within data flow and used for error handling and debugging ⦁ Fuzzy LookUp vs LookUp ⦁ Merge Join vs Union All Transformation

When you are using TempTables in your stored procedures and using that SP in your SSIS packages, will it work?

⦁ It will not work unless you run your Stored Procedure with 'With Result Set()' to declare the metadata of your table columns

Parent / Child Tables Relationships

A child is always that model which holds the foreign key as this indicates where it belongs to. You can examine the foreign keys to identify parent-child relationships. The child table has the foreign key which references the parent. This way, all children of the same parent will have the same value for the foreign key.

How to rename database?

⦁ GUI way ⦁ alter database SampleDB modify name = NewSampleDB ⦁ sp_renameDB 'SampleDB', 'NewSampleDB'

What is recursive stored procedure?

⦁ SQL Server supports recursive stored procedure which calls by itself. Recursive stored procedure can be defined as a method of problem solving wherein the solution is arrived repetitively. It can nest up to 32 levels.

How you can perform data validation in SSIS?

⦁ SSIS provides a wonderful control 'Data Profiling Task' to accomplish data validation task that helps us to provide the Meta information of how the data is organized.

ANSI

American National Standards Institute (ANSI)

Advantages and disadvantages of Stored Procedures?

Advantages: ⦁ Maintainability, because scripts are in one location, updates and tracking of dependencies based on schema changes becomes easier ⦁ Isolation of Business Rules, having Stored Procedures in one location means that there's no confusion of having business rules spread over potentially disparate code files in the application ⦁ Speed / Optimization, Stored procedures are cached on the server and execution plans for the process are easily reviewable without having to run the application, when same stored procedure executed again, it can use the previously cached execution plans ⦁ Utilization of Set-based Processing, the power of SQL is its ability to quickly and efficiently perform set-based processing on large amounts of data; the coding equivalent is usually iterative looping, which is generally much slower ⦁ Security, limit direct access to tables via defined roles in the database, provides an "interface" to the underlying data structure so that all implementation and even the data itself is shielded. Can prevent SQL injection attacks, database administrator can control the users who access the stored procedures ⦁ Reduce network usage between clients and servers, stored procedures perform intermediate processing on the database server reducing unnecessary data transfer across the network Disadvantages ⦁ Writing and maintaining stored procedures requires more specialized skills. ⦁ There are no debuggers available for stored procedures ⦁ Poor exception handling, may cause parameter sniffing issue, because it uses the same execution plan for all runs. ⦁ Not possible to use objects ⦁ May cause parameter sniffing

What type of constraints available in SQL Server?

NOT NULL - Ensures that a column cannot have a NULL value UNIQUE - Ensures that all values in a column are different PRIMARY KEY - A combination of a NOT NULL and UNIQUE. Uniquely identifies each row in a table FOREIGN KEY - Uniquely identifies a row/record in another table CHECK - Ensures that all values in a column satisfies a specific condition DEFAULT - Sets a default value for a column when no value is specified INDEX - Used to create and retrieve data from the database very quickly

Differences among SQL Server and SSIS Versions

SSIS 2008 vs 2012 ⦁ BIDS - Business Intelligence Data Studio (until 2008R2), SSDT - SQL Server Data Tools (from 2012) ⦁ The project deployment model was introduced in SQL Server 2012 Integration Services (SSIS) ⦁ No Undo and Redo feature in SSIS 2008 ⦁ SSIS Parameters at the package level only in 2008, task level and project level added in 2012 ⦁ DQS Transformation is available in SSIS 2012 ⦁ CDC Control Task available to support CDC in SSIS 2012 ⦁ Data Flow Tap available from 2012, saves coming data into text file at the specific point of time within data flow and used for error handling and debugging SSIS 2012 vs 2014 vs 2016 ⦁ From 2016, Incremental package deployment available which lets you deploy one or more packages to an existing or new project without deploying the whole project ⦁ From 2016, when you redirect rows in the data flow that contain errors to an error output, the output contains a numeric identifier for the column in which the error occurred, but does not display the name of the column. There are now several ways to find or display the name of the column in which the error occurred. ⦁ From 2016, new templates renamed as parts, the new reusable control flow templates released in CTP 3.0 have been renamed as control flow parts or package parts SQL Server 2008 vs 2012 ⦁ New system functions introduced, like Concat, Format, EndOfMonth, Lead, Lag, IIF ⦁ Partitions increased from 1000 to 15000 in 2012 ⦁ Order By clause can use Offset Fetch starting from 2012 ⦁ Unlimited connection size from 2012 SQL Server 2016 ⦁ In Memory OLTP Enhancement ⦁ Always Encrypted ⦁ JSON and R Supported ⦁ Query Store and Live Query Statistics ⦁ Row Level Security

What does heap mean in T-SQL?

⦁ A heap is a table without a clustered index. One or more non-clustered indexes can be created on tables stored as a heap. Data is stored in the heap without specifying an order. Usually data is initially stored in the order in which is the rows are inserted into the table, but the Database Engine can move data around in the heap to store the rows efficiently; so, the data order cannot be predicted. To guarantee the order of rows returned from a heap, you must use the ORDER BY clause. To specify the order for storage of the rows, create a clustered index on the table, so that the table is not a heap.

What is Agile Methodology?

⦁ Agile is a time boxed, iterative approach to software delivery that builds software incrementally from the start of the project, instead of trying to deliver it all at once near the end. ⦁ "Agile Development" is an umbrella term for several iterative and incremental software development methodologies. The most popular agile methodologies include Extreme Programming (XP), Scrum, Crystal, Dynamic Systems Development Method (DSDM), Lean Development, and Feature-Driven Development (FDD).

Index Seek vs. Index Scan

⦁ An index scan is where SQL server reads the whole of the index looking for matches - the time this takes is proportional to the size of the index. ⦁ An index seek is where SQL server uses the b-tree structure of the index to seek directly to matching records - time taken is only proportional to the number of matching records. ⦁ In general, an index seek is preferable to an index scan (when the number of matching records is proportionally much lower than the total number of records), as the time taken to perform an index seek is constant regardless of the total number of records in your table. ⦁ However, in certain situations an index scan can be faster than an index seek (sometimes significantly faster) - usually when the table is very small, or when a large percentage of the records match the predicate.

Difference between Coalesce and ISNULL and Case Function

⦁ COALESCE "returns the first nonnull expression among its arguments," and ISNULL "replaces NULL with the specified replacement value." ⦁ COALESCE() is literally shorthand for a CASE statement, they will perform identically

What is the difference between check point and break point in SSIS?

⦁ Checkpoints force SSIS to maintain a record of the control flow executable that have successfully run. In addition, SSIS records the current values of user-defined variables. This current context will be stored in an XML file which is defined in the package property window. After that, if a package fails during execution, SSIS will reference the checkpoint file when we are trying to rerun the package. It will first do so by retrieving the current variable values as they existed prior to package failure and, based on the last successful executable that ran, start running the package where it left off. That is, it will continue executing from the point of failure, assuming. ⦁ Breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.

What type of indexes available and their differences?

⦁ Clustered Index ⦁ Non-clustered Index ⦁ Filtered Index (create index for specific portion of the table with WHERE clause) ⦁ Composite Index ⦁ Covered Index ⦁ Column Store Index (from 2012, to speed up OLAP Environment) ⦁ In Memory OLTP (to speed up OLTP Environment)

What type of triggers available in SQL Server?

⦁ DDL Triggers ⦁ DML Triggers ⦁ CLR Triggers ⦁ Logon Triggers ⦁ For/After Trigger ⦁ Instead of Trigger

Differences between Truncate and Delete Commands?

⦁ Delete command removes the rows from a table based on the condition that we provide with a WHERE clause. Truncate will remove all the rows from a table and there will be no data in the table after we run the truncate command. ⦁ TRUNCATE cannot be rolled back while DELETE can be rolled back. ⦁ TRUNCATE Resets identity of the table while DELETE does not reset identity of the table ⦁ TRUNCATE is Partial DDL (even though it is marked as DML) Command while DELETE is pure DML Command. ⦁ Truncate is faster than delete command. Reason: When you type DELETE all the data get copied into the Rollback Table space first. then delete operation get performed. That's why when you type ROLLBACK after deleting a table, you can get back the data (The system gets it for you from the Rollback Table space). All this process take time. But when you type TRUNCATE, it removes data directly without copying it into the Rollback Table space. That's why TRUNCATE is faster. Once you Truncate you cannot get back the data.

Differences between (system, user) store procedures and (system, user) functions

⦁ Functions must return a value while stored procedures may or may not return any value. ⦁ Function can return only a single value at a time while stored procedures can return one, many or none. ⦁ Functions can have only input parameters for it whereas Procedures can have input/output parameters . ⦁ Function can be called from SQL statements while stored procedures can't. ⦁ You can join UDF while You cannot join SP ⦁ Cannot be used to change server configuration using function while can be used to change server configuration with SP ⦁ Cannot be used with XML FOR clause in function while can be used with XML FOR clause with SP

IN vs EXISTS

⦁ IN does a direct match between the column specified before the IN keyword and the values returned by the subquery. When using IN there can only be a single column specified in the select clause of the subquery ⦁ Exists doesn't check for a match, it doesn't care in the slightest what values been returned from the expression, it just checks for whether a row exists or not. Because of that, if there's no predicate in the WHERE clause of the subquery that compares rows in the subquery with rows in the outer query, EXISTS will either return true for all the rows in the outer query or it will return false for all the rows in the outer query ⦁ If there are null values then you must use EXISTS instead of IN ⦁ When you use the NOT IN predicate against a subquery that returns at least one NULL, the outer query always returns an empty set. ⦁ Recall that unlike IN, EXISTS uses two-valued predicate logic. EXISTS always returns TRUE or FALSE and never UNKNOWN.

Synchronous vs Asynchronous Transformations in SSIS?

⦁ In SSIS, transformations are available in two main categories--Synchronous and Asynchronous. During ETL design it's recommended to use all Synchronous transformation components. ⦁ Synchronous are components like the Conditional Split or Derived Column Transformation where rows flow into memory buffers in the transformation and the same buffers come out. No rows are held and characteristically these transformations perform very quickly with marginal impact to Data Flow. ⦁ Asynchronous transformation has two types, fully blocking and partial blocking. Partial blocking transformation is that transformation which creates new memory buffers for the output of the transformation than what come into the transformation, like Union All Transformation; fully blocking transformations also require a new memory buffer similar to partial blocking. Asynchronous transformations additionally cause a full block of the data like Sort and Aggregate transformations. ⦁ Synchronous Transformations: Audit, Character Map, Conditional Split, Copy Column, Data Conversion, Derived Column, Import Column, Export Column, Lookup, Multicast, OLE DB Command, Percent Sampling, Row Count, Script Component, Slowly Changing Dimension ⦁ Asynchronous Transformations (Partially blocking): Data Mining Query, Merge, Merge Join, Un/Pivot, Term Lookup, Union All ⦁ Asynchronous Transformations (Fully blocking): Aggregate, Sort, Fuzzy Grouping, Fuzzy Lookup, Row Sampling, Term Extraction

What type of logical and physical joins available in SQL Server? Logical Joins

⦁ Inner Join ⦁ Full Join ⦁ Right Join ⦁ Left Join ⦁ Cross Join ⦁ Semi Join (where in/exists) ⦁ Anti-Semi Join (where not in/exists) ⦁ Implicit Join ⦁ Explicit Join Physical Joins ⦁ Merge Join - if the two join inputs are not small but are sorted on their join column (for example, if they were obtained by scanning sorted indexes), a merge join is the fastest join operation. ⦁ Nested Loops Join - if one join input is small (fewer than 10 rows) and the other join input is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons. ⦁ Hash Join - can efficiently process large, unsorted, nonindexed inputs

T-SQL Performance Tuning

⦁ It is always better to use between, <, > when sorting the dates instead of sys functions (YEAR, DATEADD, etc.) ⦁ Get rid of wild cards ⦁ Replace cursors and while loops with window functions ⦁ Use table partitioning with indexes ⦁ Lookup ⦁ Spool ⦁ Sort ⦁ Scan vs Seek ⦁ Hash Join ⦁ Nested Loop Joins

What is SQL Server Profiler?

⦁ It's a tool for tracing, recreating, and troubleshooting problems in MS SQL Server, Microsoft's Relational Database Management System (RDBMS). The profiler lets developers and Database Administrators (DBAs) create and handle traces and replay and analyze trace results. In a nutshell, it's like a dashboard that shows the health of an instance of MS SQL Server. ⦁ While it's a robust tool, many features are being deprecated by Microsoft. This is happening because most developers and DBAs feel server-side trace is a more robust option. ⦁ It works by giving DBAs and developers a high-level view of the operation of a system. Users create traces to capture data and monitor errors and other problems. They then use the profiler to store, retrieve, and view the results of many traces graphically for purposes of troubleshooting and repair. This function all happens on the client-side, meaning it uses resources on the same machine it's monitoring.

Other Reporting Tools available in the market?

⦁ JReport ⦁ Tableau Desktop ⦁ SAP Crystal Reports ⦁ MS SSRS ⦁ MS PowerBI ⦁ Domo ⦁ Oracle BI Publisher ⦁ IBM Cognos Analytics ⦁ Pentaho Reporting ⦁ Jasper Report ⦁ MicroStrategy Report Services ⦁ QlikView

Ralph Kimball vs Bill Inmon datawarehouse modeling approaches

⦁ Kimball (mostly used): Build business process oriented small data marts which are joined to each other using common dimensions between business process. It is known as bottom-up approach. Data marts should be built on dimensional modelling approach ⦁ Inmon: One centralizes data warehouse which will act as an enterprise-wide datawarehouse and then build data mart as per need for specific department or process. It is known as top down approach. Central data warehouse to follow ER modelling approach

What are the system databases and their difference?

⦁ Master Database - The master database contains all the system level information for SQL Server; all the logins, linked servers, endpoints, and other system-wide configuration settings. The master database is also where SQL Server stores information about the other databases on this instance and the location of their files. If the master database is not present, SQL Server cannot start. ⦁ Model Database - Is used as the template for all databases created on the instance of SQL Server. Modifications made to the model database, such as database size, collation, recovery model, and other database options, are applied to any databases created afterward. ⦁ MSDB - Is used by SQL Server Agent for scheduling alerts and jobs. msdb is used by the SQL Server Agent, database mail, Service Broker, and other services. If you aren't actively working with things like jobs, alerts, log shipping, etc you can pretty safely ignore msdb... sort of. One important item is that msdb holds backup history. Using the msdb tables (you can start by taking a look at msdb.dbo.backupset), it's possible to determine when each database and filegroup was last backed up. This is very useful, especially when you've just started working at a new company or taken over the maintenance of new servers. ⦁ TempDB - Is a workspace for holding temporary objects or intermediate result sets. We come, at last, to tempdb. Tempdb is the workhorse of the system databases. It is the workspace that SQL Server uses to store the intermediate results of query processing and sorting. You know how you see those spools in your execution plans? When you see one of those, SQL Server is probably spooling the data to a temporary table in the tempdb. Outside of storing temporary results, tempdb is also used during snapshot isolation and for user created temporary tables (⦁ this includes table variables). One thing that is interesting to note about tempdb is that it is re-created every time the SQL Server service is started. Any objects that you have created in tempdb will be gone once the server restarts. If you want specific tables or stored procedures to always be available in tempdb, you will need to add them to the model database or else use a stored procedure to create them in tempdb when the SQL Server is started.A properly sized and configured tempdb is vital for effective SQL Server performance. By default, tempdb starts at 8MB in size and will continue to grow by ten percent until the drive is full. If the tempdb is sized too small, system resources will be used growing the tempdb file. This overhead can be avoided by increasing the default size of tempdb. ⦁ Resource Database (hidden) - Is a read-only database that contains system objects that are included with SQL Server. System objects are physically persisted in the Resource database, but they logically appear in the 'sys schema' of every database.

Tell me at least 10 differences between OLAP and OLTP Environments

⦁ OLTP System - Online Transaction Processing (Operational System) OLAP System - Online Analytical Processing (Data Warehouse) ⦁ Source of data OLTP: Operational data; OLTPs the original source of the data. OLAP: Consolidation data; OLAP data comes from the various OLTP Databases ⦁ Purpose of data OLTP: To control and run fundamental business tasks OLAP: To help with planning, problem solving, and decision support ⦁ What the data OLTP: Reveals a snapshot of ongoing business processes OLAP: Multi-dimensional views of various kinds of business activities ⦁ Inserts and Updates OLTP: Short and fast inserts and updates initiated by end users OLAP: Periodic long-running batch jobs refresh the data ⦁ Queries OLTP: Relatively standardized and simple queries returning relatively few records OLAP: Often complex queries involving aggregations ⦁ Processing Speed OLTP: Typically, very fast OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes ⦁ Space Requirements OLTP: Can be relatively small if historical data is archived OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP ⦁ Database Design OLTP: Highly normalized with many tables OLAP: Typically, de-normalized with fewer tables; use of star and/or snowflake schemas ⦁ Backup and Recovery OLTP: Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method are

Can we use Order By clause in CTE/DerivedTable/SubQuery/View?

⦁ ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. So, we can use "OFFSET 0 ROWS" after order by and it will work.

Other ETL Tools available in the market?

⦁ Oracle Warehouse Builder (OWB)/Data Integrator (ODI) ⦁ SAP Data Services ⦁ SAS Data Management ⦁ IBM Infosphere Information Server/Cognos Data Manager ⦁ PowerCenter Informatica ⦁ Data Migrator (IBI) ⦁ SQL Server Integration Services (SSIS) ⦁ Talend Studio for Data Integration ⦁ Sagent Data Flow ⦁ CloverETL ⦁ Centerprise Data Integrator ⦁ Pentaho Data Integration ⦁ Adeptia Integration Server ⦁ QlikView Expressor

Difference between [ReportServer] and [ReportServerTempDB]

⦁ ReportServer is the database which is used to store Metadata and Definition of Objects used by Reporting Server. ⦁ ReportServerTemp DB is created with ReportServer Database. It stores temporary data, session information, and cached reports

Where can we find URL's for Report Server and Report Manager?

⦁ Reporting Services Configuration Manager

Describe Incremental Load Types

⦁ SCD (Slowly Changing Dimension) Transformation, not recommended, because it uses OLE DB Command which results in a row by row reading. ⦁ Tradition LookUp/Merge Join Transformations ⦁ Staging Phase using Execute SQL Task (with Merge Function or user defined stored procedures) ⦁ CDC (Changed Data Capture)

SSIS Deployment Approaches?

⦁ SQL Server > Stored Packages > MSDB ⦁ File System > from physical location where SSIS project is actually saved on the computer ⦁ SSIS Package Store > Stored Packages > File System or MSDB (can access both) ⦁ SSIS Catalog > Integration Services Catalogs > SSISDB (introduced in 2012, directly stored in SQL Server DB) ⦁ The Incremental Package Deployment feature introduced in SQL Server 2017 Integration Services (SSIS) lets you deploy one or more packages to an existing or new project without deploying the whole project

What is SQL Server Agent?

⦁ SQL Server Agent is a Microsoft Windows service that allows you to automate some administrative tasks. SQL Server Agent runs jobs, monitors SQL Server, and processes alerts. The SQL Server Agent service must be running before local or multi-server administrative jobs can run automatically. ⦁ SQL Server Agent is the service installed by SQL Server that lets you automate and schedule tasks by running SQL Server Agent jobs. ⦁ You can automate and schedule the execution of SQL Server Integration Services packages by using SQL Server Agent. You can schedule packages that are deployed to the Integration Services server, and are stored in SQL Server, the SSIS Package Store, and the file system. ⦁ Using the SQL Server Agent service, you can schedule Analysis Services administrative tasks to run in the order and times that you need. Scheduled tasks help you automate processes that run on regular or predictable cycles. You can schedule administrative tasks, such as cube processing, to run during times of slow business activity. You can also determine the order in which tasks run by creating job steps within a SQL Server Agent job. For example, you can process a cube and then perform a backup of the cube. ⦁ It is a series of operations performed by SQL Server Agent sequentially. It can do a wide range of activities, including running T-SQL scripts and ActiveX scripts, Integration Services packages, Analysis Services commands and queries, or Replication tasks. Jobs can run repetitive or schedulable tasks, and they can automatically notify users of job status by generating alerts, thereby greatly simplifying SQL Server admin. A job can be edited only by its owner or members of the sysadmin role ⦁ SQL Server Agent is a Microsoft Windows service that allows you to automate some administrative tasks. SQL Server Agent runs jobs, monitors SQL Server, and processes alerts. The SQL Server Agent service must be running before local or multi-server administrative jobs can run automatically. ⦁ SQL Server Agent is the service installed by SQL Server that lets you automate and schedule tasks by running SQL Server Agent jobs. ⦁ You can automate and schedule the execution of SQL Server Integration Services packages by using SQL Server Agent. You can schedule packages that are deployed to the Integration Services server, and are stored in SQL Server, the SSIS Package Store, and the file system. ⦁ Using the SQL Server Agent service, you can schedule Analysis Services administrative tasks to run in the order and times that you need. Scheduled tasks help you automate processes that run on regular or predictable cycles. You can schedule administrative tasks, such as cube processing, to run during times of slow business activity. You can also determine the order in which tasks run by creating job steps within a SQL Server Agent job. For example, you can process a cube and then perform a backup of the cube. ⦁ It is a series of operations performed by SQL Server Agent sequentially. It can do a wide range of activities, including running T-SQL scripts and ActiveX scripts, Integration Services packages, Analysis Services commands and queries, or Replication tasks. Jobs can run repetitive or schedulable tasks, and they can automatically notify users of job status by generating alerts, thereby greatly simplifying SQL Server admin. A job can be edited only by its owner or members of the sysadmin role ⦁ Data warehouse modeling (conceptual, logical, physical) ⦁ ER Diagram for data warehouse ⦁ Data Dictionary ⦁ Mapping ⦁ Document Types ⦁ ETL Process ⦁ Approaches (Inmon-Snowflake, Kimball-Star)

If we can manage to do everything with SQL Server, why is the need for SSIS?

⦁ SSIS is an integration tool, not a database. Naturally, it's better at integrating data from heterogeneous locations. ⦁ Connections to a vast number of data sources in different servers - flat files, other RDMS systems(Oracle, TerraData), XML, excel sources and destinations, etc. ⦁ Complex error handling within dataflows - start and stop dataflows based on severity of the error. Throw an email to admins/DBAs when on error. Pick up the dataflow mid-stream after the error has been resolved. ⦁ Dataflow auditing - Easily check and redirect data that does not meet/match certain criteria - get an email when a certain number of rows in each process have no matches found. Easily collect statistics on dataflow processes ⦁ Visual/logical representation of dataflows as well as logical modularity of dataflows

Source Control Types?

⦁ TFS - Team Foundation Services ⦁ VSTS - Visual Studio Team Services ⦁ GIT

Differences between Cross Apply vs Outer Apply vs Joins

⦁ The APPLY operator introduced in SQL Server 2005, is used to join a table to a table-valued function. The Table Valued Function on the right-hand side of the APPLY operator gets called for each row from the left (also called outer table) table. Cross Apply returns only matching rows (semantically equivalent to Inner Join) Outer Apply returns matching + non-matching rows (semantically equivalent to Left Outer Join). The unmatched columns of the table valued function will be set to NULL.

What are the differences between Where and Having Clauses?

⦁ The difference is that WHERE operates on individual rows, while HAVING operates on groups. ⦁ The WHERE clause selects rows before grouping. The HAVING clause selects rows after grouping. ⦁ The WHERE clause cannot contain aggregate functions. The HAVING clause can contain aggregate functions.

SSIS Performance Tuning

First, you need to split your package into three phases: source, middle stage(transformation), destination SOURCE: ⦁ For the source choose right driver/provider (i.e., choose OLE DB, SQL Server for SQL databases instead of ADO.NET) ⦁ Extract required data; pull only the required set of data from any table or file. Avoid Select * and use specific column name in SELECT statement if possible. ⦁ If your source is flat file then change the "Fast Parse" property "True" ⦁ You should optimize your source T-SQL query (i.e., check execution plan, proper index creating, get rid of scans) MIDDLE STAGE: ⦁ Avoid asynchronous transformations (i.e., Sort, Merge, Merge Join) ⦁ We should try to minimize transformations by T-SQL queries if possible, i.e., Derived Column, Aggregate Colum, Sort Transformations can easily be achieved with T-SQL query ⦁ If you still have to use Merge Transformations, you can get rid of additional required Sort Transformation by sorting the table in SQL Server side and making IsSorted property of Merge Transformation True DESTINATION: ⦁ Make the target table a heap by dropping all the indexes then transfer the data and then recreate the indexes on the target table. ⦁ Control parallel execution of a task by configuring the MaxConcurrentExecutables and EngineThreads property. MaxConcurrentExecutables is the package level property and has a default value of -1, which means the maximum number of tasks that can be executed is equal to the total number of processors on the machine plus two; EngineThreads is a data flow task level property and has a default value of 10, which specifies the total number of threads that can be created for executing the data flow task. ⦁ Configure Data access mode option in OLEDB Destination. In the SSIS data flow task we can find the OLEDB destination, which provides a couple of options to push data into the destination table, under the Data access mode; first, the "Table or view" option, which inserts one row at a time; second, the "Table or view fast load" option, which internally uses the bulk insert statement to send data into the destination table, which always provides better performance compared to other options. Once you choose the "fast load" option it gives you more control to manage the destination table behavior during a data push operation, like Keep identity, Keep nulls, Table lock and Check constraints. ⦁ Configure Rows per Batch and Maximum Insert Commit Size in OLEDB destination. These two settings are important to control the performance of tempdb and transaction log because with the given default values of these properties it will push data into the destination table under one batch and one transaction. It will require excessive use of tembdb and transaction log, which turns into an ETL performance issue because of excessive consumption of memory and disk storage. ⦁ We should have a clear idea about when to use event logging and when to avoid. Excessive logging is an overhead on SSIS packages and affects the performance tuning to an certain extent. So, its recommended if we dynamically set the value of logging mode property. ⦁ Avoid implicit typecast. When data comes from a flat file, the flat file connection manager treats all columns as a string (DS_STR) data type, including numeric columns. As you know, SSIS uses buffer memory to store the whole set of data and applies the required transformation before pushing data into the destination table. Now, when all columns are string data types, it will require more space in the buffer, which will reduce ETL performance. ⦁ We should use Performance counters to monitor the performances of runtime and data flow pipeline engines. ⦁ We should use Checkpoint features so that it can help us in package restarting.

Tell me the ways you can schedule to run your SSIS packages:

⦁ SQL Agent ⦁ Creating special .NET application with C# or VB ⦁ Windows Task Scheduler with PowerShell Script or other command scripts

Microsoft SQL Server supports four table operators

JOIN, APPLY, PIVOT, and UNPIVOT.

Two forms of CASE expression

simple and searched

SSRS Deployment?

⦁ Report Server ⦁ Report Manager

How to upgrade SQL Server Versions?

⦁ Script out the whole database ⦁ Use upgrade wizard

How to select a random sampling of rows from a large table with lots of rows?

⦁ Select TOP 100 * from tblEmployee ORDER BY NEWID() ⦁ Select * from tblEmployee tablesample (100 rows)

EXECUTION ORDER

1. FROM, 2. WHERE, 3. GROUP BY, 4. HAVING, 5. SELECT, 6. ORDER BY

Explain the differences between Primary, Unique and Foreign Keys.

A PRIMARY Key and UNIQUE Key constraints both are similar and it enforces uniqueness of the column on which they are defined. * Primary Key - Primary key cannot have a NULL value. - Each table can have only one primary key. - By default, Primary key is clustered index and data in the database table is physically organized in the sequence of clustered index. - Primary key can be related with another table's as a * Foreign Key. - Primary key supports Auto Increment value (Identity Column). * Unique Key -Unique Constraint may have ONE NULL value. -Each table can have more than one Unique Constraint. - By default, Unique key is a unique non-clustered index. - Unique Constraint cannot be related with another table as a Foreign Key. - Unique Constraint doesn't support Auto Increment value (Identity Column). * Foreign Key - Foreign key is a field in the table which is primary key in another table. - Foreign key can accept multiple null values. - Foreign key does not automatically create an index, you can manually create an index on the foreign key. - We can have more than one foreign key in a table. - There are actual advantages to having a foreign key be supported with a clustered index, but you get only one per table. What's the advantage? If you are selecting the parent plus all child records, you want the child records next to each other. This is easy to accomplish using a clustered index. - Having a NULL foreign key is usually a bad idea, because it may cause "orphan record".

Difference between Union and Union ALL and Joins

⦁ The main difference between UNION ALL and UNION is that, while UNION only selects distinct values, UNION ALL selects all values (including duplicates). ⦁ UNION puts lines from queries after each other, while JOIN makes a cartesian product and subsets it -- completely different operations

What are Attunity Drivers/Connectors?

Attunity's high speed connectors for Oracle and Teradata have been selected by Microsoft to be included with SQL Server Integration Services (SSIS). These high-speed connectors created by Attunity provide: ⦁ The connectors deliver unparalleled throughput for extracting and loading data to and from Oracle and Teradata. Architected with Microsoft, the connectors use direct integration into internal SSIS buffering APIs, cutting through .NET and other layers, and also use the high-speed load/extract utilities offered by Oracle and Teradata. ⦁ The connectors are fully integrated into the Business Intelligence Development Studio (BIDS), part of Microsoft Visual Studio 2008, offering a user experience like that of the SSIS generic OLEDB Connector, with intuitive capabilities including configuration, metadata browsing, and error routing.

What is Linked Server in SSMS?

Configure a linked server to enable the SQL Server Database Engine to execute commands against OLE DB data sources outside of the instance of SQL Server. Typically, linked servers are configured to enable the Database Engine to execute a Transact-SQL statement that includes tables in another instance of SQL Server, or another database product such as Oracle. Many types OLE DB data sources can be configured as linked servers, including Microsoft Access and Excel. Linked servers offer the following advantages: ⦁ The ability to access data from outside of SQL Server. ⦁ The ability to issue distributed queries, updates, commands, and transactions on heterogeneous data sources across the enterprise. ⦁ The ability to address diverse data sources similarly. You can configure a linked server by using SQL Server Management Studio or by using the sp_addlinkedserver (Transact-SQL) statement. OLE DB providers vary greatly in the type and number of parameters required. For example some providers require you to provide a security context for the connection using sp_addlinkedsrvlogin (Transact-SQL). Some OLE DB providers allow SQL Server to update data on the OLE DB source. Others provide only read-only data access. For information about each OLE DB provider, consult documentation for that OLE DB provider.

DAX

Data Analysis Expressions

DCL

Data Control Language (DCL) - GRANT and REVOKE

DDL

Data Definition Language (DDL) - CREATE, ALTER, and DROP

Why does NULL = NULL evaluate to false in SQL server?

First, we have to keep in mind that T-SQL is based on three-valued predicate logic which refers to the three possible logical values that can result from a predicate: true, false, and unknown. NULL means unknown in SQL, so we cannot equate two unknown values with each other. If we would like to accept two NULL values as equal then we must Turn Off the ANSI-Standard. We can test these by running the following syntaxes at our SSMS: set ansi_nulls ON/OFF if null = null print 'true' else print 'false'

What is Registered Servers in SSMS?

SQL Server Management Studio allows you to manage large numbers of servers. Use Registered Servers to organize servers that you access frequently. With Registered Servers you can: ⦁ Register servers to preserve the connection information. ⦁ Determine if a registered server is running. ⦁ Easily connect Object Explorer and Query Editor to a registered server. ⦁ Edit or delete the registration information for a registered server. ⦁ Create groups of servers. ⦁ Provide user-friendly names for registered servers by providing a value in the Registered server name box that is different from the Server name list. ⦁ Provide detailed descriptions for registered servers. ⦁ Provide detailed descriptions of registered server groups. ⦁ Export registered server groups. ⦁ Import registered server groups.

Differences between Temp Tables and Table Variables.

There are a few differences between Temporary Tables (#tmp) and Table Variables (@tmp), although using tempdb isn't one of them, as spelt out in the MSDN link below. As a rule of thumb, for small to medium volumes of data and simple usage scenarios you should use table variables. (This is an overly broad guideline with of course lots of exceptions) Some points to consider when choosing between them: ⦁ Temporary Tables are real tables so you can do things like CREATE INDEXes, etc. If you have large amounts of data for which accessing by index will be faster than temporary tables are a good option. ⦁ Table variables can have indexes by using PRIMARY KEY or UNIQUE constraints. (If you want a non-unique index just include the primary key column as the last column in the unique constraint. If you don't have a unique column, you can use an identity column.) ⦁ SQL 2014 has non-unique indexes too. ⦁ Table variables don't participate in transactions and SELECTs are implicitly with NOLOCK. The transaction behavior can be very helpful, for instance if you want to ROLLBACK midway through a procedure then table variables populated during that transaction will still be populated! ⦁ Temp tables might result in stored procedures being recompiled, perhaps often. Table variables will not. ⦁ You can create a temp table using SELECT INTO, which can be quicker to write (good for ad-hoc querying) and may allow you to deal with changing datatypes over time, since you don't need to define your temp table structure upfront. ⦁ You can pass table variables back from functions, enabling you to encapsulate and reuse logic much easier (eg make a function to split a string into a table of values on some arbitrary delimiter). ⦁ Using Table Variables within user-defined functions enables those functions to be used more widely (see CREATE FUNCTION documentation for details). If you're writing a function you should use table variables over temp tables unless there's a compelling need otherwise. ⦁ Both table variables and temp tables are stored in tempdb. But table variables (since 2005) default to the collation of the current database versus temp tables which take the default collation of tempdb (⦁ ref). This means you should be aware of collation issues if using temp tables and your db collation is different to tempdb's, causing problems if you want to compare data in the temp table with data in your database. ⦁ Global Temp Tables (##tmp) are another type of temp table available to all sessions and users.

Can you truncate parent table which has relationship with child table?

To answer this question, we first must understand what is Truncate Command and how it works. You cannot Truncate a table which has a Foreign Key Constraints, in other words, if the parent table is being referenced by child table with Foreign Key Constraints, then you cannot Truncate the parent table. To work around is, first, you must delete the constraints, then you can truncate the table then you can re-assign the constraints. Second, you can delete the table and reseed the identity columns to ZERO.

Cascading referential integrity constraint?

Under Table Design>>RELATIONSHIPS>>INSERT AND UPDATE SPECIFIACTIONS there are three options available to choose: ⦁ No action - will throw an erroring stating that it cannot delete the rows because of FK constraint ⦁ Cascade - will delete dependent rows on the referenced tables ⦁ Set to NULL - will set all the effected values to NULL ⦁ Set to DEFAULT - will set the effected values to the DEFAULT Constraint value

Does Group By Syntax should always include aggregate functions? Why?

Yes, if Group By is used, then Select statement should include at least one aggregate function, otherwise it will be meaningless. Let's say we have Employee Table which includes EmpID and Salary for each month, if we Group By the table by EmpID, then one EmpID will end up having multiple values (multiple salaries for different months), so the systems gives Error, so we need to aggregate those multiple values into one line with aggregate functions such as MIN, MAX, AVG, SUM, COUNT. However, Aggregate Functions does not always need to be followed by Group By. (Select MIN(Salary) from tblEmployee)

What is parameter sniffing?

⦁ Every batch you execute, either ad-hoc or stored procedure, generates a query plan that is kept in the plan cache for future usage. SQL Server attempts to create the best query plan to retrieve the data, but what may seem obvious is not always the case with the plan cache. The way SQL Server choses the best plan is by cost estimation. For example, if I ask you which is best, an index seek followed by a key lookup or a table scan you may answer the first, but it depends on the number of lookups. In other words, it depends on the amount of data being retrieved. So, the best query plan takes into consideration the cardinality estimation based on input parameters and with the help of statistics. When the optimizer creates an execution plan it sniffs the parameter values. This is not an issue; in fact, it is needed to build the best plan. The problem arises when a query uses a previously generated plan optimized for a different data distribution. In most cases the database workload is homogeneous so parameter sniffing won't be a problem; but on a small number of cases this becomes a problem and the outcome can be dramatic.

What is the difference between Stored Procedures and View's

⦁ View - A View in simple terms is a subset of a table. It can be used to retrieve data from the tables, Insert, Update or Delete from the tables. The Results of using View are not permanently stored in the database. ⦁ Stored Procedure - A stored procedure is a group of SQL statements which can be stored into the database and can be shared over the network with different users. ⦁ The views can't accept input parameters and S.P. accept input parameters.

SSIS Package/Project Configuration approaches?

⦁ XML Configuration ⦁ Environmental Variables ⦁ SQL Server Configuration by creating SSIS Configuration Table ⦁ Parent Child Package Configuration ⦁ Registry Entry ⦁ SQL Server SSIS Catalog (from 2012 both package and project configuration available), by adding parameters and environments

What should you pay attention when you are using TempTables in your SSIS pacakges?

⦁ You should have to make sure that RetainSameConnection property set to True

Differences between char/varchar/ncahr/nvarchar

⦁ nchar and nvarchar can store Unicode characters. N stands for National ⦁ char and varchar cannot store Unicode characters. ⦁ char and nchar are fixed-length which will reserve storage space for number of characters you specify even if you don't use up all that space. ⦁ varchar and nvarchar are variable-length which will only use up spaces for the characters you store. It will not reserve storage like char or nchar.


Kaugnay na mga set ng pag-aaral

ENGL1101-Composition and Rhetoric - Grammar Quiz 1

View Set

Chapter 23 Management of patients with chest and lower Respiratory tract disorders

View Set

Employee and Labor Relations (40%) (part 1)

View Set

Survey of World Religion Ch. 9- 13

View Set