SAS Programming 2: Data Manipulation Techniques Review

Ace your homework & exams now with Quizwiz!

What is the difference between a DO UNTIL statement and a DO WHILE statement?

A DO UNTIL executes until a condition is true, and the condition is checked at the bottom of the DO loop. A DO UNTIL loop always executes at least one time. A DO WHILE executes while a condition is true, and the condition is checked at the top of the DO loop. A DO WHILE loop does not iterate even once if the condition is initially false.

What is the difference between iterative and conditional DO loops?

A conditional DO loop executes based on a condition, whereas an iterative DO loop executes a set number of times.

What is the process for table merging?

Any tables listed in the MERGE statement must be sorted by the same column (or columns) listed in the BY statement. The MERGE statement combines rows where the BY-column values match.

Which columns are required in an input table to create a format based on numeric ranges?

At a minimum, FMTNAME, LABEL, and START are required for a CNTLIN= table. END is also needed for ranges.

What does the IN= option in a MERGE statement do?

By default, both matches and nonmatches are written to the output table in a DATA step merge. The IN= data set option follows a table in the MERGE statement and names a variable that will be added to the PDV. The IN= variables are included in the PDV during execution, but they are not written to the output table. Each IN= variable relates to the table that the option follows. During execution, the IN= variable is assigned a value of 0 or 1. 0 means that the corresponding table did not include the BY column value for that row, and 1 means that it did include the BY-column value.

What is the difference between these functions: CAT( ), CATS( ), CATX( )?

CAT( ) concatenates strings together, does not remove leading or trailing blanks. CATS( ) concatenates strings together, removes leading or trailing blanks from each string. CATX( ) concatenates strings together, removes leading or trailing blanks from each string, and inserts the delimiter between each string.

What function returns the smallest integer that is greater than or equal to the argument?

CEIL(number)

What does the CNTLIN= option do in a PROC FORMAT step?

CNTLIN= specifies the table to read in order to create a format.

How might we simply create a variable to calculate the sum of these columns: CampingPark, CamptingTent, CampingOther?

CampTotal=sum(of Camping:);

What are the two phases of DATA step execution?

Compilation and execution

What is the proper syntax of a DROP operator within a DATA statement?

DATA output(drop=variable);

What function will extract the date from a datetime value?

DATEPART(datetime); TIMEPART(datetime) may be used to extract the time

What is the format of an iterative DO loop?

DO index-column = start TO stop <BY increment>; ...repetitive code... END; The index-column parameter names a column whose value controls the execution of the DO loop. This column is included in the table that is being created unless you drop it. The start value is a number or numeric expression that specifies the initial value of the index column. The stop value is a number or numeric expression that specifies the ending value that the index column must exceed to stop the execution of the DO loop. The increment value specifies a positive or negative number to control the incrementing of the index column. The BY keyword and the increment are optional. If they are omitted, the index column is increased by 1.

What function returns the largest integer that is less than or equal to the argument?

FLOOR(number)

True/False: Multiple variables in a KEEP operator within a DATA statement must be separated by commas.

False

True/False: Only two input tables can be specified in the MERGE statement.

False

True/False: To concatenate tables, those tables must be in the same library.

False

True/False: You cannot put an OUTPUT command in an IF operator.

False

True/False: Use a BY statement to sort the data while transposing.

False; PROC TRANSPOSE cannot sort the data. The data needs to be sorted before the PROC TRANSPOSE statement. The BY statement transposes data within groups.

True/False: Functions and CALL routines both return a value that must be used in an assignment statement or expression.

False; a function returns a value that must be used in an assignment statement or expression, but a CALL routine alters existing column values or performs other system functions

True/False: During the execution stage, an implied REINITIALIZE occurs at the bottom of the DATA sthe tep.

False; an implied OUTPUT and RETURN (not REINITIALIZE) occurs at the bottom of the DATA step

True/False: If you add the DROP= option to an output table in the DATA step, the columns that you list are not available for processing.

False; if you add the DROP= option, the columns that you list are not added to the output table. Columns that will be dropped are flagged in the PDV and are not dropped until the row is output to the designated table. Therefore, dropped columns are still available for processing in the DATA step.

True/False: During the compilation phase of the DATA step, initial values are assigned to the columns.

False; initial values are assigned to columns at the beginning of the execution phase

True/False: This expression only returns rows where "Oahu" is all lowercase: if find(Location, "Oahu", "i") > 0 then output;

False; the "I" modifier as the third argument in the FIND function makes the search case insensitive

True/False: The DATA step debugger in SAS Enterprise Guide can be used with DATA athe nd PROC steps.

False; the DATA step debugger in SAS Enterprise Guide works only with DATA steps

True/False: The FMTLIB option specifies the library to store the format in a PROC FORMAT step.

False; the FMTLIB option goes in the PROC FORMAT statement and creates a report containing information about your custom formats.

True/False: The KEEP statement names the columns to include from the input table.

False; the KEEP statement controls which columns are in the output table

True/False: When using the DATA step to go from a narrow table to a wide table, the KEEP statement is needed to hold values in the PDV across multiple iterations of the DATA step.

False; the RETAIN statement is needed when using the DATA step to go from a narrow table to a wide table. The RETAIN statement holds values in the PDV across multiple iterations of the DATA step. The KEEP statement controls the columns that make it to the final table.

True/False: If an increment value is not specified in an iterative DO loop, the default increment is 0.

False; the default increment is 1

True/False: In the FORMAT procedure, you specify the name of the format and the name of the column that will use the custom format.

False; the format name is specified in the VALUE statement of PROC FORMAT. However, PROC FORMAT has no reference to the table or column to be formatted. Those items appear in other steps, such as a FORMAT statement within PROC PRINT.

True/False: The sum statement initially sets the accumulator column to missing.

False; the sum statement ignores missing values, initially sets the accumulator column to 0, adds a numeric value to an accumulator column, and automatically retains the value of the accumulator column

True/False: A KEEP operator within a DATA statement is not followed by a (=) symbol.

False; when included in DATA or SET statements, the proper syntax is (drop= ) or (keep= )

What are the values in the temporary First.bycol and Last.bycol variables?

First.bycol is 1 for the first row within a group and 0 otherwise. Last.bycol is 1 for the last row within a group and 0 otherwise.

What function can be used to count the number of intervals that have occurred between a start and end date?

INTCK('interval',start-date,end-date <,'method'>)

What function can be used to adjust or shift date values?

INTNX(interval,start,increment <,'alignment'>)

What happens if a subsetting IF statement is not true?

If the subsetting IF expression is not true, the DATA step immediately stops processing statements for that particular iteration, likely skipping the output trigger, and the row is not written to the output table.

Which option in the PROC FORMAT statement specifies a library to store a custom format?

Library=

When using the DATA step to go from a wide tabe to a narrow table, which statement is needed for creating multiple rows from a single row?

OUTPUT

What is the purpose of PUTLOG commands?

PUTLOG statements allow us to review the processes of data compiling in the PDV in the log; options include PUTLOG _ALL_, PUTLOG COLUMN=, or COLUMN "message"

How do we create an accumulating computed column?

RETAIN column <initial-value>;

What steps are required when converting a character column named Date to a numeric SAS date column with the same name?

Rename the Date column to a new name, such as CharDate. Use the INPUT function to read the renamed CharDate character column and create a numeric column named Date. Specify an appropriate informat in the INPUT function.

What does COMPBL( ) do?

Returns a character string with all multiple blanks in the source string converted to single blanks

What does STRIP( ) do?

Returns a character string with leading and trailing blanks removed

What does COMPRESS( ) do?

Returns a character string with specified characters removed from the source string

What is the process for table concatenation?

SAS first reads all the rows from the first table listed in the SET statement and writes them to the new table. Then it reads and writes the rows from the second table, and so on.

What three character columns must exist in an input table for a custom format?

Start, which represents the raw data values to be formatted. Label, which represents the formatted labels. FmtName, which contains the name of the format that you are creating. Character formats start with a dollar sign.

What is the proper syntax of a TRANWORD function?

TRANWRD(source, target, replacement)

What temporary variables are created in the PDV when a BY statement is executed?

The BY statement creates two temporary variables in the PDV for each column listed as a BY column: First.bycol and Last.bycol.

What is the difference between PUT( ) and INPUT( )?

The INPUT function converts a character value to a numeric value using a specified informat. SAS automatically tries to convert character values to numeric values using the w. informat. The PUT function converts a numeric or character value to a character value using a specified format. SAS automatically tries to convert numeric values to character values using the BEST12. format.

What is the purpose of the LARGEST function? What is its syntax?

The LARGEST function returns the kth largest nonmissing value: LARGEST(k, value-1 <, value-2 ...>). The first argument is the value to return, and the remaining arguments are the numbers to evaluate. There is also a SMALLEST function that returns the kth smallest nonmissing value.

What is the purpose of the RAND function? What is its syntax?

The RAND function generates random numbers from a selected distribution. The first argument specifies the distribution, and the remaining arguments differ depending on the distribution. To generate a random, uniformly distributed integer, use 'INTEGER' as the first argument. The second and third arguments are the lower and upper limits.

What must be input into the ROUND function?

The ROUND function rounds the first argument to the nearest integer. The optional second argument can be provided to indicate the rounding unit.

What is the proper syntax of a SCAN function?

The SCAN function returns the nth word in a string: SCAN(string, n <, 'delimiters'>)

To filter output by the values in a computed column, should a WHERE or IF statement be used?

The WHERE statement subsets data as it is being read into the PDV, so computed columns cannot be specified in the WHERE statement. To subset a computed column, the condition must be specified in an IF statement.

What is the difference between a custom format name for a numeric format and a character format?

The format name can be up to 32 characters in length, must begin with a $ followed by a letter or underscore for character formats, and must begin with a letter or underscore for numeric formats. You do not include a period in the format name when you create the format, but you do include the period in the name when you use the format.

What do these functions have in common: ANYDIGIT( ), ANYALPHA( ), ANYPUNCT( )?

They return the first position at which a digit, alpha character, or punctuation symbol are found in a string.

For what values would 'IF last.variable;' be true?

This statement is true if Last.variable is equal to a nonzero or nonmissing value. Therefore, this statement is true when Last.variable is equal to 1.

What must we do before processing data by in groups in a DATA step?

To process data in groups, the data first must be sorted by the grouping column or columns. This can be accomplished with PROC SORT. The BY statement in the DATA step indicates how the data has been grouped. Each unique value of the BY column will be identified as a separate group.

True/False: During table, concatenation, columns that are not in all tables are also included in the output table.

True

True/False: During the compilation phase of the DATA step, the descriptor portion of the output table is created.

True

True/False: If a column is not read into the PDV (i.e.,​ if it is dropped in the SET statement), it is not available for processing in the DATA step.

True

True/False: If an explicit OUTPUT statement is used in the DATA step, it disables the implicit OUTPUT at the end of the DATA step.

True

True/False: If you add the KEEP= option to a SET statement, only the columns that you list are added to the output table.

True

True/False: Multiple VALUE statements can be used in the PROC FORMAT step.

True

True/False: The KEEP= option in the DATA statement names the columns to include in the output table.

True

True/False: The index column is incremented at the bottom of each iterative DO loop.

True

True/False: This expression rounds each value of Sales to the nearest hundredth (or two decimal places): round(Sales, .01).

True

True/False: SET is a compile-time-only statement.

True; at execution time, the SET statement is processed to read data into the PDV

True/False: During the execution stage, data is processed in the program data vector (PDV).

True; during execution, data manipulation occurs in the PDV

True/False: When concatenating tables, missing values are generated for columns that exist in one input table and not in another.

True; tables with different columns can be concatenated, but missing values are generated for columns that exist in one input table and not in another.

True/False: KEEP, LENGTH, and WHERE statements are compile-time-only.

True; the compile-time statements of KEEP, LENGTH, and WHERE are not processed at execution time. The rules of these statements are processed in the compilation phase so that their impact will be observed in the output table.

True/False: By default, all computed columns are reset to missing at the beginning of each subsequent iteration of the DATA step.

True; this is called reinitializing the PDV

What statement in a PROC FORMAT step specifies the new criteria?

VALUE format-name value-or-range-1 = 'formatted-value'

What is the default search order that is used to locate formats?

WORK.FORMATS > LIBRARY.FORMATS

Which is the correct syntax of a RENAME statement: 'set sashelp.shoes(rename=(Product=Type));' or 'set sashelp.shoes(rename(Product=Type));'?

set sashelp.shoes(rename=(Product=Type));' is correct because a (=) symbol is required after 'rename'


Related study sets

Environmental Health Quiz 6 (Ch. 6)

View Set

biology 2 - unit 1: genetics lessons 1-5

View Set

Reproductive morphology of flowering plants: A flower

View Set