R is widely used for data analysis tasks such as data manipulation, statistical analysis, data visualization, and machine learning, thanks to its ability to handle large datasets, perform complex statistical analyses, and create data visualizations. Prepare for your R interview with the top R interview questions curated by our experts. They will help you turn your R interview into a job offer as a Business Statistical Analyst, R programmer, or any other beginner, intermediate, or expert role. The following list covers conceptual questions for freshers and experts, including topics such as the difference between dcast() and table() and tidy data in R, giving you an edge in the data analytics market. Prepare well with these R programming interview questions and answers and ace your next interview. You can also save this resource as an R programming interview questions PDF for quick revision.
Table 1 shows the values of a specific parameter for each country and year. Provide an approach, or write a function in R, to reshape the data into the layout shown in Table 2 (the desired layout). Briefly explain your response.
Table 1 (Given input data)
Country | 2011 | 2012 | 2013 |
---|---|---|---|
Japan | 2300 | 3100 | 6800 |
China | 2700 | 3300 | 5400 |
India | 4800 | 6200 | 9500 |
Assume this data is stored in an R data frame named “my_df”.
Table 2 (Expected desired layout as output)
Country | Year | n |
---|---|---|
Japan | 2011 | 2300 |
China | 2011 | 2700 |
India | 2011 | 4800 |
Japan | 2012 | 3100 |
China | 2012 | 3300 |
India | 2012 | 6200 |
Japan | 2013 | 6800 |
China | 2013 | 5400 |
India | 2013 | 9500 |
The objective is to capture the count (n) in a separate row for each country and year.
We can use the gather() function from the tidyr package to accomplish this.
The code is as follows:
# This will load the “tidyr” package
library(tidyr)
# This will reshape the data into the desired format
gather(my_df, "Year", "n", 2:4, convert = TRUE)
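gather() function parameters: my_df is the data frame to reshape; "Year" is the name of the new key column, which holds the former column headers; "n" is the name of the new value column, which holds the cell values; 2:4 selects the columns to collapse (2011 to 2013); and convert = TRUE runs type.convert() on the key column so that Year becomes a number instead of a character string.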
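A minimal, self-contained sketch that rebuilds Table 1 as my_df and applies the gather() call above, assuming the tidyr package is installed. check.names = FALSE is needed so that data.frame() keeps the year headers as "2011" to "2013" instead of renaming them to X2011 and so on.

```r
library(tidyr)

# Rebuild Table 1; check.names = FALSE keeps "2011", "2012", "2013" as column names
my_df <- data.frame(
  Country = c("Japan", "China", "India"),
  `2011` = c(2300, 2700, 4800),
  `2012` = c(3100, 3300, 6200),
  `2013` = c(6800, 5400, 9500),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Collapse columns 2 to 4 into a key column "Year" and a value column "n";
# convert = TRUE turns the "2011"/"2012"/"2013" keys into numbers
long_df <- gather(my_df, "Year", "n", 2:4, convert = TRUE)
print(long_df)
```

In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A call such as pivot_longer(my_df, cols = 2:4, names_to = "Year", values_to = "n") yields the same rows, but ordered country by country rather than year by year, and leaves Year as character unless you convert it.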
This is one of the most frequently asked R programming interview questions for freshers and experienced professionals in recent times.
The sample data in Table 1 below has 4 columns, including a date column, “Col4”. Using the separate() function in R, provide an approach to convert the data into the layout shown in Table 2. Briefly explain.
Table 1 (Input data layout)
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
AA | 110 | 1007 | 2002-08-11 |
BB | 45 | 1009 | 1999-08-12 |
CC | 65 | 1005 | 2002-04-13 |
DD | 40 | 1013 | 2001-08-14 |
EE | 50 | 1010 | 2002-01-15 |
FF | 45 | 1010 | 2002-07-16 |
Assume this data is stored in an R data frame named “my_df”.
Table 2 (Expected desired layout as output)
Col1 | Col2 | Col3 | year | month | day |
---|---|---|---|---|---|
AA | 110 | 1007 | 2002 | 08 | 11 |
BB | 45 | 1009 | 1999 | 08 | 12 |
CC | 65 | 1005 | 2002 | 04 | 13 |
DD | 40 | 1013 | 2001 | 08 | 14 |
EE | 50 | 1010 | 2002 | 01 | 15 |
FF | 45 | 1010 | 2002 | 07 | 16 |
We can use separate() to split the date field into three columns holding the year, month and day values.
# This will load the tidyr package
library(tidyr)
# This will reshape the data into the desired format
separate(my_df, Col4, c("year", "month", "day"), sep = "-")
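separate() takes the data frame (my_df), the column to split (Col4), the names of the new columns (year, month, day) and the separator (sep = "-"). By default it removes Col4 and places the three new columns where Col4 used to be.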
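A minimal sketch that rebuilds Table 1 and runs the separate() call above, assuming tidyr is installed. It also shows why Table 2 keeps leading zeros such as "08": the new columns stay character unless convert = TRUE is passed.

```r
library(tidyr)

# Rebuild Table 1
my_df <- data.frame(
  Col1 = c("AA", "BB", "CC", "DD", "EE", "FF"),
  Col2 = c(110, 45, 65, 40, 50, 45),
  Col3 = c(1007, 1009, 1005, 1013, 1010, 1010),
  Col4 = c("2002-08-11", "1999-08-12", "2002-04-13",
           "2001-08-14", "2002-01-15", "2002-07-16"),
  stringsAsFactors = FALSE
)

# Split Col4 on "-" into three character columns; Col4 itself is dropped
split_df <- separate(my_df, Col4, into = c("year", "month", "day"), sep = "-")
print(split_df)

# With convert = TRUE the pieces become integers instead (8 rather than "08")
split_num <- separate(my_df, Col4, into = c("year", "month", "day"),
                      sep = "-", convert = TRUE)
```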
Given below are a sample input dataset and a code snippet. When we execute the code in Figure2 on the dataset in Figure1, is the output the same as the input data? Explain your response.
Figure1 (input dataset)
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
AA | 110 | 1007 | 2002-08-11 |
BB | 45 | 1009 | 1999-08-12 |
CC | 65 | 1005 | 2002-04-13 |
DD | 40 | 1013 | 2001-08-14 |
EE | 50 | 1010 | 2002-01-15 |
FF | 45 | 1010 | 2002-07-16 |
Assume this data is stored in an R data frame named “my_df”.
Figure2 (code snippet)
library(tidyr)  # provides separate(), unite() and the %>% pipe
my_df %>%
  separate(Col4, c("year", "month", "day")) %>%
  unite("Col4", month, day, year, sep = "/")
The output data will not be the same as the input.
The output will look like this:
Col1 | Col2 | Col3 | Col4 |
---|---|---|---|
AA | 110 | 1007 | 08/11/2002 |
BB | 45 | 1009 | 08/12/1999 |
CC | 65 | 1005 | 04/13/2002 |
DD | 40 | 1013 | 08/14/2001 |
EE | 50 | 1010 | 01/15/2002 |
FF | 45 | 1010 | 07/16/2002 |
The difference is in the format of Col4, the date value.
separate() splits this date column into three parts: year, month and day.
unite() joins those three parts back into a single column named Col4.
However, because unite() combines them in month, day, year order with "/" as the separator, the result is in MM/DD/YYYY format instead of the original YYYY-MM-DD format.
In effect, we convert the data from a non-tidy format to a tidy format and then back to a non-tidy format.
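A minimal sketch that runs the Figure2 pipeline on the Figure1 data, assuming tidyr is installed, and compares it with a base R alternative that parses Col4 as a Date and re-formats it. Both produce MM/DD/YYYY strings, which confirms that only the format of Col4 changes.

```r
library(tidyr)

# Rebuild the Figure1 dataset
my_df <- data.frame(
  Col1 = c("AA", "BB", "CC", "DD", "EE", "FF"),
  Col2 = c(110, 45, 65, 40, 50, 45),
  Col3 = c(1007, 1009, 1005, 1013, 1010, 1010),
  Col4 = c("2002-08-11", "1999-08-12", "2002-04-13",
           "2001-08-14", "2002-01-15", "2002-07-16"),
  stringsAsFactors = FALSE
)

# Figure2: separate() splits on any non-alphanumeric character by default,
# then unite() glues the pieces back in month/day/year order
out_df <- my_df %>%
  separate(Col4, c("year", "month", "day")) %>%
  unite("Col4", month, day, year, sep = "/")
print(out_df)

# Same strings without reshaping: parse as Date, then format as MM/DD/YYYY
base_col4 <- format(as.Date(my_df$Col4), "%m/%d/%Y")
```

Note that the resulting Col4 is a character column, not a Date, in both cases.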
This is one of the most frequently asked R programming interview questions and answers for freshers in recent times.
These are NOT the same. flights_mutate1 runs as expected, whereas
flights_mutate2 throws an error. We cannot select the derived variable “speed” before it exists: it must first be created with the mutate() function, and only then can the select() function extract specific variables from the data frame.
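The two snippets are not shown here, so the sketch below is only illustrative: it assumes the difference is the order of mutate() and select(), and uses a small hypothetical data frame, flights_toy, with the same column names as nycflights13::flights. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data using nycflights13::flights column names
flights_toy <- data.frame(
  carrier   = c("UA", "AA", "UA"),
  arr_delay = c(11, 20, -5),
  distance  = c(1400, 1089, 719),  # miles
  air_time  = c(227, 160, 116)     # minutes
)

# Works: speed is created by mutate() before select() asks for it
ok <- flights_toy %>%
  mutate(speed = distance / air_time * 60) %>%
  select(carrier, arr_delay, speed)

# Fails: select() runs first and the column "speed" does not exist yet
err <- tryCatch(
  flights_toy %>%
    select(carrier, arr_delay, speed) %>%
    mutate(speed = distance / air_time * 60),
  error = function(e) conditionMessage(e)
)
print(err)
```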
We can use the summarise() function from the dplyr package to compute the mean and variance.
We can also include n() as an additional summary to get the number of observations.
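The original snippet is not shown, so this is a hedged sketch on a hypothetical column, arr_delay, assuming dplyr is installed. It computes the mean and sample variance with summarise() and adds n(). Note that n() counts every row, including rows where arr_delay is NA; sum(!is.na(arr_delay)) counts only non-missing values.

```r
library(dplyr)

# Hypothetical toy data with one missing value
delays <- data.frame(arr_delay = c(11, 20, -5, 33, NA))

delay_stats <- delays %>%
  summarise(
    mean_delay = mean(arr_delay, na.rm = TRUE),  # mean, ignoring NA
    var_delay  = var(arr_delay, na.rm = TRUE),   # sample variance, ignoring NA
    n          = n(),                            # all rows, NA included
    n_obs      = sum(!is.na(arr_delay))          # non-missing values only
  )
print(delay_stats)
```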
Expect to come across this, one of the most important R programming interview questions for experienced professionals, in your next R interview.
n() returns the number of rows in the current group (it takes no arguments), whereas n_distinct() returns the number of distinct values in a vector. For example, take the sample “flights” dataset from the nycflights13 package.
We first remove the NA values from air_time and distance before calling summarise().
n() then counts the total number of flights (rows) in the dataset, and n_distinct() counts the number of distinct carriers (airlines), which is 16.
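A small sketch of the same pattern, assuming dplyr is installed. It uses a hypothetical toy data frame instead of nycflights13::flights, so the counts differ from the 16 carriers mentioned above.

```r
library(dplyr)

# Hypothetical toy data standing in for nycflights13::flights
flights_toy <- data.frame(
  carrier  = c("UA", "AA", "UA", "B6", "AA", "UA"),
  distance = c(1400, 1089, 719, 1576, 762, NA),
  air_time = c(227, 160, 116, 183, NA, 150)
)

counts <- flights_toy %>%
  filter(!is.na(air_time), !is.na(distance)) %>%  # drop rows with NA first
  summarise(
    flights  = n(),                  # rows remaining after the filter
    carriers = n_distinct(carrier)   # number of distinct carrier codes
  )
print(counts)
```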
Datasets come in many formats, but R's tidyverse tools prefer just one: tidy data. The tidyr package helps put data into that format. For example, look at the pollution dataset below:
In tidy data, each variable is saved in its own column, each observation is saved in its own row, and each type of observation is stored in a single table (here, the “pollution” table shown above). This layout automatically preserves observations.
library(tidyr) loads the tidyr package; if the package is not installed yet, install it first with install.packages("tidyr").
The differences are the following:
The mutate() function in the dplyr package derives new variables (columns) from existing variables and returns one value per row. To compute summaries across observations (rows), use the summarise() function instead. Below is an example:
If we take sample data from the “nycflights13” dataset and view the first few records, it looks like this:
Now we use the mutate() function to derive a new variable and the select() function to fetch selected columns from the data frame above.
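# dplyr provides mutate(), select() and %>%; nycflights13 provides flights
library(dplyr)
library(nycflights13)
flights <- as.data.frame(flights)
flights_mutate <- flights %>%
  mutate(speed = distance / air_time * 60) %>%
  select(carrier, arr_delay, speed)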
This gives the desired result below (again, only a few records of the data frame are shown). The new derived variable is “speed”, computed with the formula distance / air_time * 60; since distance is in miles and air_time is in minutes, speed is in miles per hour.
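A short sketch contrasting mutate() and summarise() on the speed calculation, assuming dplyr is installed and using a hypothetical three-row data frame rather than the full flights data. mutate() keeps the row count and adds a column; summarise() collapses the rows into one.

```r
library(dplyr)

# Hypothetical toy data with nycflights13-style columns
flights_toy <- data.frame(
  carrier  = c("UA", "AA", "UA"),
  distance = c(1400, 1089, 719),  # miles
  air_time = c(227, 160, 116)     # minutes
)

# mutate(): one new value per row, so there are still 3 rows (and 4 columns)
with_speed <- flights_toy %>% mutate(speed = distance / air_time * 60)

# summarise(): collapses all rows into a single summary row
avg_speed <- with_speed %>% summarise(mean_speed = mean(speed))

print(with_speed)
print(avg_speed)
```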
The kable() function, from the knitr package, renders a data frame as a formatted table. When we execute the two statements above from the R console, the kable() statement produces much more legible output. It is mainly used in R Markdown, where it makes documentation clearer.
Below is a snapshot of the differences when executing from the R console.
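A minimal sketch of the comparison, assuming the knitr package is installed and using a hypothetical two-row data frame. print() gives R's default console layout, while kable(format = "markdown") returns the same data as a pipe table that R Markdown renders as a formatted table.

```r
library(knitr)

# Hypothetical small data frame
speeds_df <- data.frame(carrier = c("UA", "AA"), speed = c(370.04, 408.38))

# Default console representation
print(speeds_df)

# Markdown pipe table, one character string per table line
md_table <- kable(speeds_df, format = "markdown")
print(md_table)
```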
We need to group the data by source (origin) and destination using the group_by() function and then summarise it to find the number of records in each group. That provides the desired result.
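A hedged sketch of that approach, assuming dplyr 1.0.0 or later (for the .groups argument) and a hypothetical toy data frame that uses the origin and dest column names from nycflights13::flights.

```r
library(dplyr)

# Hypothetical toy data with nycflights13-style origin/dest columns
flights_toy <- data.frame(
  origin = c("JFK", "JFK", "LGA", "JFK", "EWR"),
  dest   = c("LAX", "LAX", "ORD", "SFO", "LAX")
)

route_counts <- flights_toy %>%
  group_by(origin, dest) %>%              # one group per origin/destination pair
  summarise(n = n(), .groups = "drop")    # count rows in each group, then ungroup
print(route_counts)
```

dplyr's count(origin, dest) is a shorthand for the same group-and-count step.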
We have an untidy dataset, shown below. Describe your approach to make it tidy, in a format that you would like to analyze using R.
Country | 2011 | 2012 | 2013 |
---|---|---|---|
FR | 7000 | 6900 | 7000 |
DE | 5800 | 6000 | 6200 |
US | 15000 | 14000 | 13000 |
Note: the data above has case counts by year for every country, but it is represented in an untidy format.
We would like to convert it into the following tidy format, which can be easily analyzed in R.
Country | Year | n |
---|---|---|
FR | 2011 | 7000 |
DE | 2011 | 5800 |
US | 2011 | 15000 |
FR | 2012 | 6900 |
DE | 2012 | 6000 |
US | 2012 | 14000 |
FR | 2013 | 7000 |
DE | 2013 | 6200 |
US | 2013 | 13000 |
This is a staple of senior R language interview questions with answers, so be prepared to answer it from your hands-on experience. It is also one of the top interview questions to ask an R programmer.
We need to use the gather() function to reshape the dataset into tidy format in R so that the expected output is achieved.
The first parameter of gather() is the data frame to reshape; the second is the name of the new key column, which is “year” here since we want to show the number of cases by year and by country; the third is the name of the new value column, which is the count here; and the fourth gives the names or numeric indexes of the columns to collapse. There are different ways to achieve this, but the important aspect is to think through the approach and see how powerful packages such as tidyr can accomplish it.
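A minimal sketch of the gather() call described above, assuming tidyr is installed. The data frame name cases is hypothetical, and the key and value columns are named Year and n to match the desired table. It collapses the year columns once by index (2:4) and once by name (`2011`:`2013`) to show that both forms of the fourth parameter select the same columns.

```r
library(tidyr)

# The untidy table above; "cases" is a hypothetical name
cases <- data.frame(
  Country = c("FR", "DE", "US"),
  `2011` = c(7000, 5800, 15000),
  `2012` = c(6900, 6000, 14000),
  `2013` = c(7000, 6200, 13000),
  check.names = FALSE,   # keep the year headers as column names
  stringsAsFactors = FALSE
)

# Columns to collapse given by numeric index ...
tidy_cases <- gather(cases, "Year", "n", 2:4, convert = TRUE)

# ... or by name (backticks because the names start with digits)
tidy_by_name <- gather(cases, "Year", "n", `2011`:`2013`, convert = TRUE)

print(tidy_cases)
```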
Both code snippets yield the same output.
This is because both arrange the result by country, year, sex and age.
The 4:6 and child:elderly selections pick the same columns, one by column index and the other by column name. After reshaping, arrange() sorts the rows into the desired, organized order.
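The original snippets are not shown, so this sketch uses a hypothetical data frame whose columns 4 to 6 are child, adult and elderly, as the selection names suggest. It assumes tidyr and dplyr are installed and checks that selecting by index and by name gives identical results after arrange().

```r
library(tidyr)
library(dplyr)

# Hypothetical case counts by country, year, sex and age group
cases <- data.frame(
  country = c("FR", "FR", "DE", "DE"),
  year    = c(2011, 2011, 2011, 2011),
  sex     = c("f", "m", "f", "m"),
  child   = c(10, 12, 8, 9),
  adult   = c(50, 55, 40, 42),
  elderly = c(20, 18, 15, 14),
  stringsAsFactors = FALSE
)

# Columns 4:6 and child:elderly refer to the same three columns
by_index <- cases %>%
  gather("age", "n", 4:6) %>%
  arrange(country, year, sex, age)

by_name <- cases %>%
  gather("age", "n", child:elderly) %>%
  arrange(country, year, sex, age)

identical(by_index, by_name)
```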
This, along with other basic R interview questions for freshers, is a regular feature in R programming interviews; be ready to tackle it with the approach described.
R is a programming language that can be used for many purposes, such as statistical analysis, predictive modeling, data manipulation, and data visualization. It holds a high percentage of market share in the analytics industry. R is an open-source, cross-platform programming language; that is, it can run on several operating systems with varied software and hardware. Candidates proficient in R are generally paid more than Python and SAS programmers. Work on your R proficiency with a course on R Certification with ML.
The average salary of an R Programmer is $76,487 per year. Big companies including Facebook, Google, Twitter use R programming language.
If you are determined to ace your next interview as an R programmer, these R interview questions and answers will fast-track your career. To relieve you of the worry and burden of preparation for your upcoming interviews, we have compiled the above list of interview questions for R programming with answers prepared by industry experts. Being well versed with these commonly asked R language interview questions will be your very first step towards a promising career as an R programmer.
Candidates can opt for various options after learning R programming. A few are listed below:
Candidates who wish to build a career as an R programmer can learn more about R programming from the best programming courses for beginners.