IQCode's 2023 List of Top SAS Interview Questions and Answers
Understanding SAS: Statistical Analysis System
Many people wonder what SAS is and what it entails. To answer that question, it helps to first understand why SAS was developed. The Statistical Analysis System (SAS) was created to handle large amounts of data in a systematic and organized manner. SAS is primarily used for managing and analyzing data and for making strategic decisions based on that analysis.
In summary, data analysis tools like SAS exist because of the abundance of data that requires analysis, and SAS is highly effective at performing this analysis. Before turning to the interview questions, let's look at what SAS actually is.
What is SAS (Statistical Analysis System)?
SAS (Statistical Analysis System) is one of the leading analytics software tools, developed by SAS Institute. It allows users to manage, retrieve, and modify data generated from various sources and to perform statistical analysis on the collected data. SAS supports a wide range of data processing tasks, such as statistical analysis, data management, business modeling, report writing, data extraction, data transformation, quality improvement, and application development.
SAS extracts raw data from different sources, cleanses it, and stores or loads it in a database. SAS then organizes the data into tables so that patterns can be identified and analyzed. By using SAS, businesses can increase productivity and profits through advanced analytics, multivariate analysis, business intelligence, data management, and predictive analytics. SAS programmers perform a series of operations on SAS datasets to generate reliable statistical reports that businesses use to make decisions. Non-technical users can work through a graphical point-and-click interface, while the SAS language offers more advanced options.
This article covers SAS concepts from fresher to experienced levels, including reading and manipulating data, reporting, SQL queries, SAS macros, SAS programming, and more.
If you're preparing for a SAS interview, let's take a look at some interview questions for freshers.
SAS Interview Questions for Freshers
1. Why should we choose SAS over other data analytical tools?
What are the essential features of SAS?
SAS (Statistical Analysis System) has several essential features that make it a popular tool in the field of data analysis. Some of these features are:
- Flexibility: SAS is a flexible tool that allows users to handle data of different types and sizes. It can handle structured and unstructured data and can work with various file formats, including CSV, Excel, and SAS data sets.
- Power: SAS has robust analytics capabilities that enable users to perform complex analyses quickly and efficiently. It has a vast range of statistical functions that can be customized to meet specific needs.
- Scalability: SAS can handle large volumes of data and can be used in multi-user environments. It can use parallel processing to speed up computations and provide faster results.
- Integration: SAS can be integrated with other software tools such as R and Python. It can also be used with various databases, including Oracle, SQL Server, and MySQL.
- Data management: SAS has excellent data management capabilities that enable users to clean, transform, and manipulate data efficiently. It can handle missing data, perform data validation, and automate data processing tasks.
Overall, SAS is a powerful and versatile tool that can handle various aspects of data analysis, from data management to statistical modeling and machine learning.
SAS Framework Capabilities
SAS Framework offers a variety of capabilities that enable users to efficiently work with data, analytics, and reporting. Here are some of its capabilities:
- Data manipulation and transformation using the SAS DATA step and SAS procedures
- Statistical analysis and modeling using SAS procedures
- Data visualization using SAS ODS Graphics
- Reading and writing various file formats, including Microsoft Excel, CSV, and JSON
- Integration with external data sources using SAS/ACCESS
- Large-scale data processing and parallel computing using SAS Grid
- Machine learning and AI using SAS Viya
- Business intelligence and reporting using SAS Visual Analytics and SAS Enterprise BI Server
What is the Use of Retain in SAS?
The RETAIN statement is used in SAS to retain the value of a variable from one iteration of the data step to the next. It is especially useful when we need to carry forward information from one observation to another.
Here's an example:
data example;
set input;
retain total 0;
total + amount;
run;
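In this example, the RETAIN statement keeps the value of the variable TOTAL across iterations and initializes it to 0. Each time the DATA step processes an observation, the value of AMOUNT is added to the retained value of TOTAL, so TOTAL holds a running sum.
Strictly speaking, the sum statement total + amount; already retains TOTAL and initializes it to 0, so here RETAIN mainly documents the intent. RETAIN becomes essential with an ordinary assignment such as total = total + amount;. Without RETAIN, SAS resets TOTAL to missing at the start of each iteration, and the running sum would be missing for every observation.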
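A common use of RETAIN with an ordinary assignment (rather than a sum statement) is carrying the last non-missing value forward. The sketch below is a minimal illustration on a small inline dataset. The dataset and variable names (READINGS, DAY, READING, LAST_READING, FILLED_READING) are hypothetical.

```sas
/* Hypothetical inline data with gaps in READING */
data readings;
   input day reading;
   datalines;
1 10
2 .
3 .
4 25
5 .
;
run;

data filled;
   set readings;
   retain last_reading;                        /* keep the value across iterations */
   if not missing(reading) then last_reading = reading;
   filled_reading = last_reading;              /* 10, 10, 10, 25, 25 */
   drop last_reading;
run;
```

Without the RETAIN statement, LAST_READING would be reset to missing at the top of every iteration, so the gaps would not be filled.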
Understanding PDV (Program Data Vector)
PDV, or Program Data Vector, is the area of memory where SAS builds an observation during DATA step execution. It holds the current value of every variable in the step. This includes variables read from input data, variables created by assignment statements, and the automatic variables _N_ and _ERROR_. SAS processes data one observation at a time. At the end of each iteration of the DATA step, it writes the contents of the PDV (except automatic and dropped variables) to the output data set as a new observation. This is an important concept to understand when working with the SAS programming language.
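One way to look inside the PDV is PUT _ALL_, which writes every variable currently in the PDV to the log, including _N_ and _ERROR_. This minimal sketch assumes the SASHELP.CLASS sample table that ships with SAS. The BMI variable is added only for illustration.

```sas
data pdv_demo;
   set sashelp.class(obs=2);          /* read only two observations */
   bmi = weight / height**2 * 703;    /* new variable, added to the PDV at compile time */
   put _all_;                         /* log every PDV variable, including _N_ and _ERROR_ */
run;
```

The log shows _N_ and _ERROR_ for each iteration. The output dataset PDV_DEMO, however, contains only the six regular variables, because automatic variables live in the PDV without being written out.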
State the Difference between MISSOVER and TRUNCOVER in SAS
In SAS, MISSOVER and TRUNCOVER are INFILE statement options that control what happens when an INPUT statement reaches the end of a data record before it has read values for all of its variables. By default (FLOWOVER), SAS moves on to the next record to look for the remaining values.
MISSOVER option: This option tells SAS not to move to the next record. Any variables that have not yet received values are set to missing. With formatted input (for example, an informat such as 5.), if a short record ends partway through the last field, MISSOVER also sets that variable to missing instead of reading the partial value.
TRUNCOVER option: This option also keeps SAS on the current record, but if a record ends partway through a field, SAS reads the characters that are available (the partial value) instead of setting the variable to missing. This option is useful when reading variable-length records in which some lines are shorter than the field widths the INPUT statement expects.
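The difference is easiest to see with formatted input on records that are shorter than the informat width. This sketch writes five records of increasing length to a temporary file and reads them back both ways. It assumes a host where PUT writes variable-length records (the usual case on Windows and UNIX). In-stream DATALINES are padded with blanks, which would hide the difference. The fileref RAW and the dataset names are hypothetical.

```sas
filename raw temp;                  /* temporary external file */

data _null_;                        /* write records of length 1 to 5 */
   file raw;
   put '1';
   put '22';
   put '333';
   put '4444';
   put '55555';
run;

data use_missover;
   infile raw missover;
   input testnum 5.;                /* short records give TESTNUM = . */
run;

data use_truncover;
   infile raw truncover;
   input testnum 5.;                /* short records give 1, 22, 333, 4444 */
run;
```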
Overall, choose the option that matches the layout of the raw data. Use MISSOVER when short records simply lack trailing values, and TRUNCOVER when the last field itself may be shorter than its declared width.
Explanation of the SCAN Function in SAS and its Usage
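The SCAN function in SAS returns the nth word from a character string, where words are separated by delimiters. Its basic form is SCAN(string, count <, charlist>). The first argument is the string to search. The second is the position of the word to return, and a negative count counts from the right end. The optional third argument lists the delimiter characters. If charlist is omitted, SAS uses a default set of delimiters that includes the blank and several punctuation characters.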
For example, consider the following SAS code:
data names;
   input full_name & $40.;              /* & lets the value contain single embedded blanks */
   length firstname lastname $ 20;
   firstname = scan(full_name, 1, ' ');
   lastname  = scan(full_name, 2, ' ');
   datalines;
John Smith
Sarah Jones
;
run;
In this code, the dataset 'names' reads a variable 'full_name' that holds each person's first and last name separated by a space. The & modifier lets the value contain that space, whereas plain list input (input full_name $;) would stop at the first blank and keep only the first name. The SCAN function then extracts the first name and last name separately by specifying the word position and the delimiter (in this case, a space). The resulting dataset 'names' contains 'full_name' plus two new variables, 'firstname' and 'lastname', holding the extracted first and last names, respectively.
The SCAN function can also be used with a DO loop to extract multiple words from a string. Additionally, it can be used in conjunction with other string functions to manipulate and transform text data as needed.
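A minimal sketch of the DO loop approach uses a made-up sentence. COUNTW returns the number of words, SCAN extracts one word per iteration, and OUTPUT writes one observation per word. The dataset and variable names are hypothetical.

```sas
data words;
   length sentence $ 60 word $ 20;
   sentence = 'SAS makes data step programming fun';
   do i = 1 to countw(sentence, ' ');   /* COUNTW: number of blank-separated words */
      word = scan(sentence, i, ' ');    /* the i-th word */
      output;                           /* one observation per word */
   end;
   keep i word;
run;
```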
Example of an Address Stored in a Variable:
The following expression is stored in the variable "address":
9/4 Infantry Marg, Mhow City, MP 453441
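A minimal sketch of splitting this address with SCAN assumes that commas separate the street, city, and state-plus-PIN parts. The variable names STREET, CITY, STATE, and PIN are hypothetical. A negative count makes SCAN work from the right end of the string.

```sas
data address_parts;
   length address $ 60 street city $ 30 state $ 2 pin $ 6;
   address = '9/4 Infantry Marg, Mhow City, MP 453441';
   street  = scan(address, 1, ',');           /* text before the first comma */
   city    = strip(scan(address, 2, ','));    /* STRIP removes the blank after the comma */
   state   = scan(address, -2, ' ');          /* second blank-separated word from the right */
   pin     = scan(address, -1, ' ');          /* last blank-separated word */
   put street= city= state= pin=;
run;
```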
SAS Terminologies - FIRST and LAST
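In SAS, FIRST.variable and LAST.variable are temporary variables that SAS creates for each variable listed in a BY statement. FIRST.variable is 1 for the first observation in each BY group and 0 otherwise, and LAST.variable is 1 for the last observation in each BY group and 0 otherwise. They are not written to the output data set. These variables are commonly used in SAS DATA step programming to perform conditional processing or aggregation on specific records within each group.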
For example, suppose we have a data set that contains information on sales transactions for a store, with each observation representing a single transaction. We can use the FIRST and LAST variables to identify the first and last transaction for each individual customer.
Code:
proc sort data=transactions;
   by customer_id;
run;
data sales;
   set transactions;
   by customer_id;
   if first.customer_id then do;
      /* conditional processing for the customer's first transaction */
   end;
   if last.customer_id then do;
      /* conditional processing for the customer's last transaction */
   end;
run;
In the above code, PROC SORT first orders the data by `customer_id`. The `by customer_id` statement in a DATA step does not sort anything. It requires the input to be sorted (or indexed) by the BY variables already and stops with an error otherwise. The BY statement groups all transactions for a single customer together and creates `first.customer_id` and `last.customer_id`. The `if first.customer_id` and `if last.customer_id` statements then perform conditional processing on the first and last transactions for each customer, respectively.
By leveraging the FIRST and LAST variables in this way, SAS programmers can easily perform a variety of data processing tasks that would otherwise be difficult to accomplish.
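The sketch below is a self-contained version of the pattern, with a hypothetical TRANSACTIONS dataset and AMOUNT variable entered inline. It computes one total per customer by resetting the total on FIRST. and writing a row on LAST.

```sas
data transactions;
   input customer_id amount;
   datalines;
101 20
101 35
102 50
103 10
103 15
103 5
;
run;

proc sort data=transactions;
   by customer_id;
run;

data customer_totals;
   set transactions;
   by customer_id;
   if first.customer_id then total = 0;   /* reset at the start of each BY group */
   total + amount;                        /* sum statement: retained automatically */
   if last.customer_id then output;       /* one row per customer: 55, 50, 30 */
   keep customer_id total;
run;
```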
Meaning of STOP and OUTPUT Statements in SAS
In SAS, the STOP statement immediately stops processing of the current DATA step. The observation being built is not written by the implicit output at the end of the step, and SAS continues with the next DATA or PROC step in the program. STOP is typically used with conditional logic. It is required when reading a data set with the POINT= option, because that kind of direct access never reaches an end-of-file condition on its own.
The OUTPUT statement in SAS writes the current observation from the PDV to a data set immediately. When a DATA step contains an OUTPUT statement, the implicit output at the end of each iteration is turned off, so only observations written explicitly are saved. OUTPUT is often combined with IF-THEN logic to route observations to one of several data sets named in the DATA statement, or to write several observations from a single iteration.
Overall, STOP controls when a DATA step ends, while OUTPUT controls which observations are written and where.
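Here is a short sketch of both statements, assuming the SASHELP.CLASS sample table. The first step uses OUTPUT to route each observation to one of two datasets. The second uses STOP with POINT= to read every fifth observation. The dataset names are hypothetical.

```sas
data teens preteens;
   set sashelp.class;
   if age >= 13 then output teens;    /* OUTPUT writes the observation immediately */
   else output preteens;
run;

data every_fifth;
   do p = 1 to nobs by 5;                     /* observations 1, 6, 11, 16 */
      set sashelp.class point=p nobs=nobs;    /* direct access by observation number */
      output;
   end;
   stop;                                      /* needed: POINT= never hits end of file */
run;
```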
Understanding the Difference Between the "DROP=" Data Set Option in the SET Statement and the DATA Statement
When the "DROP=" data set option is attached to a data set in the SET statement, the listed variables are never read into the PDV. They are unavailable for any calculation in the DATA step, which also saves processing time when you do not need them.
When the same "DROP=" option is attached to the data set named in the DATA statement, the variables are still read into the PDV and can be used in calculations. They are simply not written to the output data set. Neither form modifies the input data set.
Therefore, the main difference lies in where the option is attached. DROP= in the SET statement affects what is read in, while DROP= in the DATA statement affects only what is written out.
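The sketch below contrasts the two placements, assuming SASHELP.CLASS (height in inches, weight in pounds). BMI is a hypothetical derived variable.

```sas
/* DROP= on the output data set: WEIGHT is read, used, then left out */
data bmi_out(drop=weight);
   set sashelp.class;
   bmi = weight / height**2 * 703;
run;

/* DROP= on the input data set: WEIGHT never enters the PDV */
data bmi_in;
   set sashelp.class(drop=weight);
   bmi = weight / height**2 * 703;   /* WEIGHT is uninitialized here, so BMI is missing */
run;
```

In BMI_IN, SAS notes that WEIGHT is uninitialized and creates it as a new variable with missing values, so BMI is missing for every observation.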
Supported Data Types in SAS
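SAS supports only two data types:
Character: This data type is used for storing text, such as names or addresses, up to 32,767 bytes per variable. Character constants are enclosed in single or double quotation marks.
Numeric: This data type is used for storing numeric values, such as age or income. Numeric variables are stored as 8-byte floating-point numbers by default.
Other kinds of values are represented with these two types:
Date/Time: Dates, times, and datetimes are stored as numeric values (days since January 1, 1960; seconds since midnight; and seconds since midnight on January 1, 1960, respectively) and displayed with formats such as DATE9. or DATETIME20.
Boolean: There is no separate logical type. A comparison such as age > 30 evaluates to the numeric value 1 (true) or 0 (false), and in a condition any value other than 0 or missing counts as true.
Binary: There is no dedicated binary type. SAS data sets hold only character and numeric columns.
Understanding these data types in SAS is important for creating effective data sets and ensuring accurate analysis.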
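The minimal sketch below shows that dates and comparison results are ordinary numeric values. The dataset and variable names are hypothetical.

```sas
data types_demo;
   length name $ 10;                 /* character variable, 10 bytes */
   name     = 'Alice';
   age      = 13;                    /* numeric */
   birthday = '15MAR2010'd;          /* date constant: a count of days since 01JAN1960 */
   is_teen  = (age >= 13);           /* comparison yields numeric 1 or 0 */
   format birthday date9.;           /* the format changes only how the value is displayed */
run;
```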
Explanation of the "+" Operator and SUM() Function
In SAS, both the "+" operator and the SUM function add numeric values, but they treat missing values differently. With the "+" operator, the result is missing if any operand is missing, and SAS writes a note to the log that missing values were generated.
The SUM function ignores missing arguments and adds the remaining values. It returns missing only when all of its arguments are missing.
Here is an example of using the "+" operator and SUM function:
data sum_demo;
   a = 5;
   b = .;
   c = 10;
   plus_result = a + b + c;      /* missing, because B is missing */
   sum_result  = sum(a, b, c);   /* 15, because SUM skips the missing B */
   put plus_result= sum_result=;
run;
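In SAS, the SUM function also accepts variable lists with the OF keyword. Adding a constant 0 is a common idiom when an all-missing row should total 0 rather than missing. A sketch with hypothetical inline data:

```sas
data sum_of_demo;
   input x1-x3;
   total_plus = x1 + x2 + x3;       /* missing if any value is missing */
   total_sum  = sum(of x1-x3);      /* OF passes the whole variable list */
   total_zero = sum(0, of x1-x3);   /* 0 instead of missing when all values are missing */
   datalines;
1 2 3
4 . 6
. . .
;
run;
```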
Explaining _N_ and _ERROR_ in SAS
In SAS, _N_ is an automatic variable that counts how many times the DATA step has iterated. It starts at 1 and increases by one at the top of each iteration. When one observation is read per iteration, it equals the observation number. It is not a sample-size statistic.
_ERROR_ is an automatic variable that SAS sets to 0 at the start of each iteration. It changes to 1 when an error occurs while processing that iteration, such as invalid data or an illegal arithmetic operation. When _ERROR_ is 1, SAS writes the contents of the PDV to the log, which helps with troubleshooting.
Neither variable is written to the output data set, but both can be used in DATA step statements.
Example:
data test;
   input x;
   if _error_ then put 'Invalid data found in iteration ' _n_=;
   datalines;
1
abc
3
;
run;
In the above example, the value abc cannot be read as a number. In the second iteration, SAS therefore sets X to missing and _ERROR_ to 1, and the PUT statement reports _N_=2.
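The automatic variable _N_ is also handy for selecting observations by position. This minimal sketch assumes SASHELP.CLASS (19 observations) and keeps every third observation with a subsetting IF. The output dataset name is hypothetical.

```sas
data every_third;
   set sashelp.class;
   if mod(_n_, 3) = 1;      /* keep iterations 1, 4, 7, ..., 19 */
run;
```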
Different Ways to Include or Exclude Specific Variables in a Dataset
There are several ways to include or exclude specific variables in a SAS dataset. Here are a few common methods:
- KEEP and DROP statements: Placed inside a DATA step, they control which variables are written to the output dataset(s).
- KEEP= and DROP= data set options: Attached to an input or output dataset name, they control which variables are read in or written out for that dataset only.
- PROC SQL: List only the columns you want in the SELECT clause.
Here's an example of keeping only certain variables with a KEEP statement:
/* create a subset of SASHELP.CLASS with only the NAME and AGE variables */
data class_subset;
   set sashelp.class;
   keep name age;
run;
And here's an example of dropping a variable with the DROP= data set option:
/* read SASHELP.CLASS without the WEIGHT variable */
data class_no_weight;
   set sashelp.class(drop=weight);
run;
Finally, here's an example of using PROC SQL to include only the NAME and AGE variables:
proc sql;
   create table class_select as
   select name, age
   from sashelp.class;
quit;
Common Mistakes in SAS Programming
When writing programs in SAS, some common mistakes that people make include:
- Not correctly specifying the data type of variables
- Forgetting to initialize variables before using them
- Using incorrect syntax when referencing functions or procedures
- Not properly ordering steps in a program
- Using inefficient code instead of utilizing built-in functions or procedures
- Forgetting to check for missing data when performing calculations
- Not checking the SAS log for errors, warnings, and notes (such as uninitialized variables or automatic type conversions)
- Using hard-coded values instead of macro variables for constant values
- Not commenting the code, making it difficult for others to understand
To avoid these mistakes, review your code carefully, test it on a small sample of data, check the log after every run, and adhere to best practices for SAS programming.
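The missing-data item deserves an example because of how SAS compares values. A numeric missing value is smaller than any number, so a condition such as score < 50 is true for a missing score. A sketch with hypothetical data:

```sas
data check_missing;
   input score;
   low_wrong = (score < 50);                          /* 1 for the missing score */
   low_right = (not missing(score) and score < 50);   /* explicit missing check */
   datalines;
40
.
70
;
run;
```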
SAS Interview Questions for Experienced
Question 17: Can you explain what SAS macros are and why we use them?
Answer: SAS macros are pieces of code that can be reused throughout SAS programs. They are used to simplify complex programs, reduce repeated code, and automate repetitive tasks. A macro is defined in the macro language between a %MACRO statement and a %MEND statement, and it is called with a percent sign followed by its name. Macros can take parameters, which are stored as macro variables and make the generated code customizable for specific applications. Overall, macros are a powerful tool that allows SAS programmers to write more efficient and maintainable code.
/* Example of a simple SAS macro */
%macro example(name);
/* Define a macro variable */
%let message = Hello, &name.;
/* Print the message */
%put &message.;
%mend example;
/* Call the macro and pass an argument */
%example(John);
Different Ways to Create Macro Variables in SAS Programming
Macro variables in SAS programming can be created using various techniques, some of which are:
- Using the %LET statement followed by the variable name and value. For example:
%let var1 = 10;
- Using the %GLOBAL statement to create a global macro variable, which has a null value until one is assigned. For example:
%global var2;
- Using the %LOCAL statement inside a macro definition to create a macro variable that exists only while that macro runs. For example:
%macro show_local;
   %local var3;
   %let var3 = 5;
%mend show_local;
- Using a %DO loop inside a macro definition to create multiple macro variables. For example:
%macro make_vars;
   %local i;
   %do i = 1 %to 5;
      %global var&i;
      %let var&i = %eval(10 + &i);   /* %EVAL stores 11, not the text 10 + 1 */
   %end;
%mend make_vars;
%make_vars
- Using a DATA step and the CALL SYMPUTX routine to create macro variables from data values. For example:
data _null_;
   set sashelp.class;
   call symputx(cats('name', _n_), name);   /* creates NAME1, NAME2, ..., one per observation */
run;
The %PUT statement does not create macro variables, but it is the usual way to display their values in the log. For example:
%put var1 = &var1;
These are some examples of how you can create macro variables in SAS programming.
How to create macro variables in SAS programming using %let and macro parameters?
In SAS programming, %let assigns a value to a macro variable. Macro parameters are values passed to a macro when it is called, and inside the macro they behave like local macro variables. Together, they can be used to build new macro variables whose names and values depend on the arguments.
To do this, use %let inside the macro to define a macro variable whose name is built from the parameters. For example:
%macro my_macro(var1, var2);
   %global microvar_&var1._&var2.;
   %let microvar_&var1._&var2. = %eval(&var1. * &var2.);
%mend;
%my_macro(2, 3);
%put microvar_2_3 = &microvar_2_3;
In this example, the macro %my_macro takes two parameters (var1 and var2). The %let statement creates a macro variable whose name is built from the values of var1 and var2 (microvar_2_3). Through %EVAL, it assigns that variable the product of the two values (6). Without %EVAL, the macro variable would hold the text 2 * 3, because macro variables store text.
Because the %GLOBAL statement places microvar_2_3 in the global symbol table, it remains available after the macro finishes and can be referenced anywhere later in the program, as the %put statement shows. Without %GLOBAL, it would be a local macro variable and would disappear when the macro ends.
SAS System Options for Debugging SAS Macros
To debug SAS macros, you can use the following SAS system options:
MCOMPILENOTE=ALL: writes a note to the log when a macro definition finishes compiling, confirming that it compiled.
MLOGIC: traces macro execution, showing when each macro starts and ends, the values of its parameters, and whether each %IF condition is true or false.
MPRINT: writes the SAS statements generated by macro execution to the log.
SYMBOLGEN: writes the value that each macro variable reference resolves to.
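A typical debugging session turns the options on, runs the macro, and reads the log. The macro below (TAKE_TOP) is a hypothetical example. With the options on, the log shows the %IF result (MLOGIC), the resolved values of &DS and &N (SYMBOLGEN), and the generated DATA step (MPRINT).

```sas
options mprint mlogic symbolgen mcompilenote=all;   /* turn on macro debugging output */

%macro take_top(ds, n);
   %if &n > 0 %then %do;
      data top;
         set &ds(obs=&n);
      run;
   %end;
%mend take_top;

%take_top(sashelp.class, 3)

options nomprint nomlogic nosymbolgen mcompilenote=none;   /* switch it off again */
```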
Difference between PROC MEANS and PROC SUMMARY
PROC MEANS and PROC SUMMARY in SAS are essentially the same procedure and can compute the same statistics, including percentiles and the median. The main difference is the default output. PROC MEANS prints its results by default, while PROC SUMMARY produces no printed output unless you add the PRINT option, so it is normally used with an OUTPUT statement to write statistics to a data set. Another difference is the default analysis variables. Without a VAR statement, PROC MEANS analyzes every numeric variable, whereas PROC SUMMARY produces only a count of observations.
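Here is a sketch of each procedure on SASHELP.CLASS, computing the same statistics. The output dataset and variable names are hypothetical.

```sas
/* PROC MEANS prints a report by default */
proc means data=sashelp.class n mean median p90;
   var height;
run;

/* PROC SUMMARY prints nothing here; results go to a data set instead */
proc summary data=sashelp.class;
   class sex;
   var height;
   output out=height_stats mean=mean_height median=median_height p90=p90_height;
run;
```

HEIGHT_STATS has three rows: one overall row (_TYPE_=0) and one row for each value of SEX. Adding the NWAY option keeps only the by-sex rows.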
Functions and Procedures in SAS
In SAS, a function is a built-in routine that takes arguments and returns a single value. It is used inside statements such as DATA step assignments, WHERE clauses, and PROC SQL expressions. Examples of SAS functions include the SUM function, which adds its arguments (for example, several variables within the same observation), and the SUBSTR function, which extracts a substring from a character value.
On the other hand, a SAS procedure is a separate PROC step that processes an entire data set to perform a task. It can be used to summarize data, create plots and charts, perform statistical analyses, and more. Examples of SAS procedures include the MEANS procedure, which calculates summary statistics, and the REG procedure, which performs linear regression analysis.
Both functions and procedures are very useful in SAS programming, as they help automate tasks and save time and effort.
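The difference in direction is easiest to see side by side. This sketch, assuming SASHELP.CLASS, uses the SUM function across two variables in each row and PROC MEANS to total a column. The dataset and variable names are hypothetical.

```sas
data row_totals;
   set sashelp.class(keep=name height weight);
   hw_sum = sum(height, weight);    /* function: across variables within one observation */
run;

proc means data=sashelp.class sum;  /* procedure: down a column across all observations */
   var height;
run;
```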
Explanation of SYMGET and SYMPUT
In SAS programming, SYMGET and SYMPUT move values between the DATA step and the macro facility. SYMGET retrieves the value of a macro variable during DATA step execution, and SYMPUT assigns a value to a macro variable.
SYMGET is a function. It takes the name of a macro variable as its argument and returns the macro variable's value as a character string, which you can assign to a DATA step variable. For example, x = symget('myvar'); stores the value of the macro variable named "myvar" in the variable X.
SYMPUT is a CALL routine rather than a function, so it is invoked with a CALL statement. It takes two arguments: the name of the macro variable and the value to assign. For example, call symput('myvar', 5); assigns the value 5 to the macro variable named "myvar". Because macro variables hold text, the numeric value is first converted to character, which can add leading blanks. The CALL SYMPUTX routine strips leading and trailing blanks and is generally preferred. A macro variable created this way can be referenced as &myvar only in steps that run after the DATA step finishes.
Both are useful for passing values between DATA steps and macro code in SAS programming.
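Here is a sketch of the usual round trip, assuming SASHELP.CLASS. CALL SYMPUTX stores values computed in one DATA step, %PUT displays them, and SYMGET reads one back in a later step. The macro variable and dataset names are hypothetical.

```sas
data _null_;
   set sashelp.class end=last;
   total_weight + weight;                       /* running total */
   if last then do;
      call symputx('n_students', _n_);          /* numeric converted to text, blanks stripped */
      call symputx('total_wt', total_weight);
   end;
run;

%put n_students=&n_students total_wt=&total_wt;

data with_share;
   set sashelp.class;
   total_wt = input(symget('total_wt'), best32.);   /* SYMGET returns character text */
   share    = weight / total_wt;
run;
```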
Importance of the TRANWRD function in SAS
The TRANWRD function in SAS replaces every occurrence of a substring within a character value with another substring. Its syntax is TRANWRD(source, target, replacement). Despite its name, it matches any sequence of characters, not only whole words, and the match is case-sensitive. It is useful for cleaning and standardizing text data, such as expanding abbreviations or correcting recurring misspellings in categorical values, and for creating new variables during data preparation. Hence, the TRANWRD function plays an important role in data preprocessing with SAS.
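A minimal sketch with hypothetical address data, expanding two abbreviations:

```sas
data cleaned;
   length address $ 60;
   input address & $60.;                           /* & allows embedded single blanks */
   address = tranwrd(address, 'St.', 'Street');    /* replace every 'St.' */
   address = tranwrd(address, 'Rd.', 'Road');
   datalines;
12 Main St.
4 Park Rd.
;
run;
```

Because TRANWRD matches substrings, a target such as 'St' (without the period) would also change text inside words like 'Stone'. Including the period or surrounding blanks in the target makes the match more specific.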
Specifying Iterations and Conditions in a DO Loop
To specify both a number of iterations and a condition in a single DO loop in SAS, combine an iterative DO clause with a WHILE or UNTIL condition. The WHILE condition is checked at the top of each iteration. The loop ends as soon as the iteration limit is reached or the condition becomes false. For example:
data _null_;
   total = 0;
   do i = 1 to 10 by 2 while (total < 10);   /* at most five iterations: i = 1, 3, 5, 7, 9 */
      total = total + i;
      put i= total=;
   end;
run;
This loop could run at most five times, with `i` taking the values 1, 3, 5, 7, and 9, but it stops early. After `i` = 7, `total` is 16, so the WHILE condition is false at the top of the next iteration. The log shows i=1 total=1, i=3 total=4, i=5 total=9, and i=7 total=16. Replacing WHILE with UNTIL checks the condition at the bottom of each iteration instead, so the body always runs at least once.
Usage of Trailing @@ in SAS
In a SAS INPUT statement, a double trailing at sign (@@) at the end of the statement holds the current input record across iterations of the DATA step. SAS keeps reading values from the same line until it runs out of data and only then moves to the next record. This makes it possible to read several observations from a single line of raw data.
Here is an example of how trailing @@ can be used:
data scores;
   input name $ score @@;
   datalines;
Ann 90 Bob 85 Cara 77
Dan 68 Eve 95
;
run;
In this example, each data line holds several name-and-score pairs. Without @@, SAS would read only the first pair from each line and create two observations. With @@, it creates five observations (Ann, Bob, Cara, Dan, and Eve).
Overall, the trailing @@ provides a convenient way to read raw data that stores multiple observations per line.
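In SAS, a single trailing @, by contrast, holds the record only until the end of the current DATA step iteration. It is typically used to read part of a line, test it, and then choose how to read the rest. Here is a sketch with hypothetical record types N and C:

```sas
data mixed;
   input type $ @;                           /* read TYPE and hold the line */
   if type = 'N' then input id value;        /* rest of the line is numeric */
   else if type = 'C' then input id code $;  /* rest of the line has a character code */
   datalines;
N 1 10
C 2 abc
N 3 30
;
run;
```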
Various Methods to Remove Duplicate Values in SAS
Duplicate values in a SAS dataset can distort counts and statistics, so it's important to remove them before proceeding with data analysis. Here are several ways to remove duplicate values in SAS:
Method 1: PROC SORT with the NODUPKEY option - This method sorts the dataset by the variable(s) listed in the BY statement and keeps only the first observation for each combination of BY values.
Method 2: PROC SQL with the DISTINCT keyword - This method uses SQL code to select distinct rows from a dataset, based on the column(s) listed in the SELECT clause.
Method 3: DATA step with BY statement and FIRST./LAST. processing - This method works on a dataset that is already sorted by the BY variable(s) and uses the FIRST. or LAST. variables to keep one observation per BY group.
Method 4: HASH object - This method loads observations into a hash object keyed on the variable(s) that define a duplicate. Because each key is stored only once, the hash object can then output the unique rows to a new dataset.
Method 5: PROC FREQ with the TABLES statement - This method counts how often each combination of values occurs. The counts (for example, written to a dataset with the OUT= option) can be used to identify duplicates, which are then removed in a separate step.
Each of these methods has its own advantages and disadvantages, and the best method to use will depend on the specific dataset and the goals of the analysis.
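Below are sketches of Methods 2 and 3, assuming SASHELP.CLASS and treating SEX as the key. The output dataset names are hypothetical.

```sas
/* Method 2: PROC SQL with DISTINCT */
proc sql;
   create table distinct_sex as
   select distinct sex
   from sashelp.class;
quit;

/* Method 3: sort, then keep the first observation in each BY group */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

data first_per_sex;
   set class_sorted;
   by sex;
   if first.sex;            /* subsetting IF: keep only FIRST.SEX = 1 */
run;
```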
Explanation of NODUP and NODUPKEY Options
In SAS, NODUP and NODUPKEY are options of the PROC SORT statement that remove duplicate observations while sorting. The main difference between these two options is as follows:
- NODUP (also called NODUPRECS) removes an observation when all of its variable values are identical to those of the observation immediately before it in the sorted output. Because only adjacent observations are compared, sort by enough variables (or use BY _ALL_) to bring identical records next to each other.
- NODUPKEY removes observations that have the same values of the BY variables as an earlier observation. Only the first observation for each combination of BY values is kept, even if other variables differ.
Both options are used together with a BY statement that defines the sort order. For example, to remove completely duplicated observations from a dataset named ‘mydata’ after sorting by ‘id’ and ‘date’, we can use the following code:
proc sort data=mydata out=mydata_nodup nodup;
   by id date;
run;
This will create a new dataset named ‘mydata_nodup’ in which an observation is deleted only if every variable matches the observation before it.
Similarly, if we want to keep one observation per value of the key variable ‘id’, we can use the NODUPKEY option as follows:
proc sort data=mydata out=mydata_nodupkey nodupkey;
   by id;
run;
This will create a new dataset named ‘mydata_nodupkey’ with only the first occurrence of each value of ‘id’ retained, and all subsequent occurrences deleted.
SAS Sorting Command
To sort data in SAS, the SORT procedure is used. The syntax for the command is as follows:
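PROC SORT DATA=dataset_name <OUT=sorted_dataset_name>;
   BY <DESCENDING> variable_name(s);
RUN;
Here, "dataset_name" refers to the name of the dataset to be sorted, and "variable_name(s)" refers to the variable(s) to sort by. Multiple variables can be listed, separated by spaces, and the DESCENDING keyword reverses the order of the variable that follows it. Without the OUT= option, the sorted data replaces the original dataset. With OUT=, the sorted result is written to a new dataset and the original is left unchanged.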
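Here is a sketch that sorts SASHELP.CLASS by SEX and then by descending HEIGHT. Because the SASHELP library is typically read-only, OUT= is needed here. The output name is hypothetical.

```sas
proc sort data=sashelp.class out=class_by_height;
   by sex descending height;   /* DESCENDING applies only to HEIGHT */
run;
```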
Explanation of INPUT and INFILE statements
In SAS programming, the INPUT statement is used to read data from a raw data file and assign values to variables. A dollar sign ($) after a variable name marks it as a character variable. The syntax for the INPUT statement is:
INPUT variable-name-1 variable-name-2 ...;
The INFILE statement is used to specify the raw data file to be read by the INPUT statement. The syntax for the INFILE statement is:
INFILE 'filename' ;
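The filename can be an absolute or relative path, or you can use a fileref defined in a FILENAME statement. INFILE statement options control how records are read. Examples include DLM= (the delimiter), DSD (treats consecutive delimiters as missing values and removes quotation marks), FIRSTOBS= (the first record to read), LRECL= (the logical record length), and MISSOVER or TRUNCOVER. Column positions and field widths, by contrast, are specified in the INPUT statement.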
Both of these statements are used together to read and process data from a raw data file in SAS programming.
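Here is a sketch that reads comma-separated values with INFILE options. It uses INFILE DATALINES so that it runs without an external file. With a real file, a quoted path or a fileref would replace DATALINES. The data and variable names are hypothetical.

```sas
data employees;
   infile datalines dlm=',' dsd truncover firstobs=2;   /* CSV layout, skip the header row */
   input emp_id name :$20. salary;                       /* : applies the informat to list input */
   datalines;
emp_id,name,salary
1,Asha,52000
2,Ravi,
3,"Lee, Kim",61000
;
run;
```

DSD treats the empty field after "Ravi" as a missing salary and keeps the quoted comma inside "Lee, Kim" as part of the name.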
Meaning of %INCLUDE and %EVAL in SAS
In SAS, the %INCLUDE statement reads SAS statements from an external file (or a fileref) and submits them as part of the current program. This is helpful when you want to reuse code that is stored in a different file.
On the other hand, %EVAL is a macro function that evaluates arithmetic and logical expressions using integer arithmetic and returns the result as text. For example, %eval(7/2) returns 3, because the fractional part is truncated. The %SYSEVALF function performs floating-point evaluation instead. %EVAL can be used inside a macro definition or in open code.
Note that, despite the percent sign, %INCLUDE is a global SAS statement rather than a macro statement, so it can be used anywhere in a program. %EVAL belongs to the macro language but, like other macro functions, does not need to be inside a macro.
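Here is a minimal sketch of %EVAL next to %SYSEVALF in open code. The macro variable names are hypothetical. The %INCLUDE line is commented out because its path is only a placeholder.

```sas
%let a = 7;
%let b = 2;
%let int_div   = %eval(&a / &b);       /* integer arithmetic: 3 */
%let real_div  = %sysevalf(&a / &b);   /* floating point: 3.5 */
%let is_bigger = %eval(&a > &b);       /* logical expression: 1 */
%put int_div=&int_div real_div=&real_div is_bigger=&is_bigger;

/* %include '/path/to/shared_code.sas';   hypothetical path; works in open code */
```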