Data Science Interview Questions [2025]

All Courses

Introduction

The article contains statistics interview questions for freshers and experienced to get started with your preparation for the interview. The basic statistics questions for the interview include probability and statistics interview questions, data cleaning and visualization topics for data analyst profiles. The intermediate and advanced sections contain data science and machine learning statistics interview questions. This guide will help you to cover most of the statistics interview questions. Statistics is an important field for data analysts, machine learning and data science professionals. Therefore, this article also covers interview questions on statistics for data science and data analytics.

Statistics Interview Questions and Answers for 2025

Beginner

1. How would you explain statistics as a field of study?

The study of statistics focuses on gathering, organising, analysing, interpreting, and presenting data.

2. How are statisticians different from data analysts?

A data analyst works in a particular business vertical while a statistician is responsible to work with data irrespective of the industry vertical.

3. What role do you expect to perform as a statistician?

To perform exploratory data analysis to understand the data, relationship and its distribution as well as provide predictions based on these relationships.

4. Where can we include statistics in machine learning?

Statistics forms the base of machine learning. The predictive modelling in machine learning takes into account the concepts from inferential statistics.

5. Can we understand data science without knowing statistics?

No, it is highly advised to know the basics of statistics before jumping to data science. Statistics is the key to understanding data and therefore, one must be aware of descriptive statistics to learn data science.

11. Explain different forms of a bar chart.

A bar chart uses rectangular vertical and horizontal bars to statistically represent the given data. Each bar's length is proportionate to the value it corresponds to. The values among various categories are compared using bar charts. With the use of two axes, bar charts illustrate the relationship. It depicts the discrete values on one axis while the categories are represented on another. There are a number of different bar charts available for visualizing the data but the 4 major categories in which we can distinguish them is vertical, horizontal, stacked, and grouped bar chart.

Vertical bar chart

The most popular type of bar chart is the vertical bar chart. A vertical bar chart is one in which the given data is displayed on the graph using vertical bars. The measure of the data is represented by these vertical rectangular bars. On the x- and y-axes, vertical lines are drawn to represent the rectangular bars. The number of the variables listed on the x-axis is represented by these rectangle-shaped bars.

Horizontal bar chart

Charts that show the given data as horizontal bars are referred to as horizontal bar charts. The measures of the provided data are displayed in these horizontal, rectangular bars. In this style, the x-axis and y-axis are labelled with the data categories. The bar chart's horizontal representation is displayed in the y-axis category.

Stacked bar chart

Each sub-bar that makes up a normal bar chart represents a level of the second categorical variable, and they are all stacked on top of one another. A 100% stacked bar chart represents the given data as the percentage of data that contributes to a total volume in a distinct category, in contrast to a stacked bar chart that directly depicts the given data.

Grouped bar chart

A grouped bar chart makes it easier to compare data from multiple categories. For levels of a single categorical variable, bars are grouped by position, with colour often designating the secondary category level within each group.

12. What is a scatter plot and how is it useful?

Scatter plot is a very important graph when it comes to understanding the relationship between two numerical variables. For example, consider the following table which provides the percentage marks scored and total attendance of ten students of a class.

Student	Attendance	Percentage
Student 1	78	84
Student 2	91	96
Student 3	66	70
Student 4	42	85
Student 5	90	92
Student 6	59	62
Student 7	83	75
Student 8	72	75
Student 9	94	96
Student 10	88	67

The percentage of students in attendance is represented on the x-axis, while the percentage of marks scored is represented on the y-axis. The scatter plot could therefore help us comprehend the relationship between the two variables. We may argue that when students attend class more frequently, they tend to perform better academically. We can also spot instances that are the exception rather than the rule, like Student 4.

13. What is a frequency distribution table? What is the purpose of it?

Frequency distribution is a series when a number of observations with similar or closely related values are put in separate bunches or groups, each group being in order of magnitude in a series. The data are simply organised into classes in a table, and the number of cases that fall into each class is noted. It displays the frequency with which various values of a single phenomenon occur. In order to estimate frequencies of the unknown population distribution from the distribution of sample data, a frequency distribution is created.

Take a survey of 50 households in a society as an example. The number of children in each family was recorded, and the results are shown in the following frequency distribution table

No. of children	Frequency
0	12
1	24
2	13
3	0
4	1

As a result, frequency in the table refers to how frequently an observation occurs. The number of observations is always equal to the sum of the frequencies. We can evaluate the data's underlying distribution and base judgements on it with the aid of frequency distribution.

Intermediate

1. What is normal distribution? Why do we need it?

Normal distribution is a symmetrical bell-shaped curve representing frequencies of different classes from the data. Some of the characteristics of normal distribution include:

The mean, median and mode of the distribution coincide.
The curve of the distribution is bell-shaped and symmetrical about the line x = mean value. This means that exactly half of the values are to the left of the centre and the other half to the right.
The total area under the curve is 1.
It is a limiting form of binomial distribution where the number of trials in indefinitely large (infinity) and the probability of success and failure is not indefinitely small.

Normal distribution is one of the most significant probability distributions in the study of statistics. This is so because a number of natural events fit the normal distribution. For instance, the normal distribution is observed for heights and weights of an age group, test scores, blood pressure, rolling a die or tossing a coin, and income of individuals. The normal distribution provides a good approximation when the sample size is large.

2. Explain the impact of mean and standard deviation on the normal distribution.

The distribution moves to either side of the horizontal axis if we adjust the mean while maintaining the same standard deviation. The graph is shifted to the right by a higher mean value and to the left by a lower mean value.

The graph reshapes when the standard deviation changes while the mean remains constant. When the standard deviation is lower, more data are seen in the centre and have thinner tails. The graph will flatten out with more points at the ends or better tails and fewer points in the middle as a result of a larger standard deviation.

3. Explain outliers and their impact on your data.

Outlier is an observation which is well separated from the rest of the data. The interpretation of an outlier takes into account the purported underlying distribution. Outliers can be dealt with primarily in two ways: first, by adapting techniques that can handle the existence of outliers in the sample, and second, by attempting to remove the outliers. We know that outliers have a significant impact on our estimation. Instead of following the sample or population, these observations have an impact on the predictions. The removal of an outlier from our sample is frequently not the best option, therefore we either employ techniques to mitigate their negative effects or use estimators that are insensitive to outliers.

4. What is skewness? Explain the different types of skewness.

Skewness is a measure of asymmetry that indicates whether the data is concentrated on one side. It allows us to get a complete understanding of the distribution of data. Based on the type, skewness is classified into three different types.

Positive skewness or right skew

Outliers at the top end of the range of values cause positive skewness. Extremely high numbers will cause the graph to skew to the right, showing that there are outliers present. The higher numbers slightly raise the mean above the median in this instance, meaning that the mean is higher than the median.

No skewness or zero skew

This is a classic instance of skewness not being present. It denotes a uniformly distributed distribution around the mean. As a result, it appears that the three values, mean, median, and mode, all coincide.

Negative skewness or left skew

Outliers near the lower end of the values cause negative skewness. Extremely low numbers will cause the graph to skew to the left, indicating that there are outliers present. In this instance, the mean is significantly smaller than the median because the lower values cause the mean to fall from the central value

5. Define the different central moments.

In probability theory and statistics, a central moment is a moment of a probability distribution of a random variable about the random variable's mean.

The zeroth central moment is the total probability i.e., equal to one.
The first central moment is the expected value or mean and equal to zero.
The second central moment is the variance.
The third central moment is skewness.
The fourth central moment is kurtosis

7. How would you measure the relationship between variables and the strength of the relationship.

Covariance and correlation coefficient reveals the relationship and the strength of relationship between the two variables.

Covariance is a measure of how two random variables in a data set will change jointly. When two variables are positively correlated and moving in the same direction, this is referred to as positive covariance. A negative covariance denotes an inverse relationship between the variables or a movement in the opposite directions. For instance, a student's performance on a particular examination improves with increased attendance, which is a positive correlation, whereas a decrease in demand caused by a rise in the price of an item is a negative correlation. When the covariance value is zero, the variables are said to be independent of one another and have no influence on one another. If the covariance value is higher than 0, it means that the variables are positively correlated and move in the same direction. The variables are negatively correlated and move in the opposite direction when the correlations have a negative value.

Covariance Value	Effect on Variables
Cov (X, Y) > 0	Positive Correlation (X & Y variables move together)
Cov (X, Y) = 0	No Correlation (X & Y are independent)
Cov (X, Y) < 0	Negative Correlation (X & Y variables move in opposite direction)

Similar information is given by the correlation coefficient and the covariance. The fact that the correlation coefficient will always retain a value between negative one and one is its benefit over covariance. A perfect positive correlation exists between the variables under study when the correlation coefficient is 1. In other words, as one moves, the other follows suit proportionally in the opposite direction. A less than perfect positive correlation is present if the correlation coefficient is less than one but still larger than zero. The correlation between the two variables is stronger as the correlation coefficient approaches one. There is no observable relationship between the variables when the correlation coefficient is zero. That means it is difficult to predict the movement of the other variable if one variable moves. The variables are perfectly negatively or inversely connected if the correlation coefficient is zero, or negative one. One variable will drop proportionally in response to an increase in the other. The variables will oscillate in opposing directions. If the correlation coefficient is more than negative one, it means that the negative correlation is not perfect. The correlation increases as it gets closer to being negative one.

Covariance Value	Effect on Variables

9. How do you measure the performance of a classification model?

There are various performance measures or metrics that can help to evaluate the performance of a classification model. However, it depends on the kind of problem we are dealing it. At times, accuracy might not be a good idea for evaluation and we need to focus on certain aspects of the results rather than the accuracy as a whole. The most common metrics used for the purpose are –

Confusion matrix

A confusion matrix is one of the evaluation techniques for machine learning models in which you compare the results of all the predicted and actual values. Confusion matrix helps us to derive several different metrics for evaluation purpose such as accuracy, precision, recall, and F1 score which are widely used across different classification use cases.

ROC AUC curve

The probability curve, the Receiver Operator Characteristic (ROC) separates the signal from the noise by plotting the True Positive Rate (TPR) versus the False Positive Positive Rate (FPR) at different threshold values. A classifier's capacity to distinguish between classes is measured by the Area Under the Curve (AUC). The performance of the model at various thresholds between positive and negative classes is improved by a higher AUC. The classifier can correctly discriminate between all Positive and Negative class points when AUC is equal to 1. The classifier would be predicting all negatives as positives and vice versa when AUC is equal to 0.

Jaccard index

Jaccard Index or also known as Jaccard similarity coefficient. If y is the actual label and ŷ is the predicted value then we can define Jaccard index as the size of the intersection by the size of the union of two labelled sets.

Consider if you have a total of 50 observations, out of which your model predicts 41 of them correctly, then the Jaccard index is given as 41 / (50 + 50 - 41) = 0.69. The Jaccard index of 0.69 defines that the model predicts on the test set with an accuracy of 69%. So, a Jaccard index ranges from 0 to 1 where an index value of 1 implies maximum accuracy.

Log loss

Log loss or logarithmic loss measures the performance of a classifier where predicted output is a probability value between 0 and 1. We can calculate the log loss using the log loss equation which measures how far each prediction is from the actual label. It is obvious that most ideal classifiers have a lower value of log loss. So the classifier with lower log loss has better accuracy.

10. What is a confusion matrix and how do you interpret it? Explain type 1 and type 2 error.

Confusion matrix is one of the evaluation methods for machine learning models that compares the outcomes of all the expected and actual values.

The figure representing confusion matrix has four different cases:

There are five instances where the predicted value and the actual value are both true. This is referred to as a True Positive case, where True denotes that the values are identical (true and true) and Positive denotes that the situation is true. Example: A diabetes test is positive for a diabetic patient.
There are four instances where both the predicted value and the actual value are false. This is referred to as a True Negative situation, where True denotes identical numbers (false and false) and Negative denotes a negative outcome. Example: A diabetes test is negative for a non-diabetic patient.
In three instances, the projected value is true, but the actual value is false. False denotes that the values are different (false and true), while Positive means that the predicted value is positive. This is referred to as a False Positive event. Example: A diabetes test is positive for a non-diabetic patient.
There are two situations where the projected value is false, whereas the actual value is true. This situation is known as a False Negative Case, where False denotes that the values (true and false) are different, and Negative denotes that the predicted value is negative. Example: A diabetes test is negative for a diabetic patient.

In this matrix, the values in green are correctly identified by the model and the values in red are wrongly identified by the model. Confusion matrix can also be used for non-binary target variables.

The occurrence of Type 1 Error, also known as a False Positive event, occurs when the expected value is positive but it is actually negative. When the actual value is positive when the predicted value is negative, this is known as a False Negative event and results in Type 2 Error. For instance, if we consider rain to be a positive event, then your device's prediction that it would rain today but it didn't actually happen is a type 1 error, while your device's prediction that it wouldn't rain today but it actually did happen is a type 2 error.

Advanced

1. Explain hypothesis. What do you understand by null and alternate hypothesis?

The process of hypothesis testing enables us to either validate the null hypothesis, which serves as the beginning point for our investigation, or to reject it in favour of the alternative hypothesis. A parametric test is a type of hypothesis test that assumes a specific shape for each distribution connected to the underlying populations. In a non-parametric test, the parametric form of the underlying population's distribution is not required to be specified. The null hypothesis is the one that needs to be tested while conducting hypothesis testing. The alternate hypothesis is the opposite argument. If the test results show that the null hypothesis cannot be verified, the alternative hypothesis will be adopted. For example, if the null hypothesis states that “The mean height of men in India is more than 5 feet 6 inches” then the alternate hypothesis will state that, “The mean height of men in India is equal to or less than 5 feet 6 inches”.

2. Explain rejection region and significance level in hypothesis theory.

The interval that causes the null hypothesis to be rejected in a hypothesis test is known as the rejection region and is measured in the sampling distribution of the statistic under examination. The rejection zone complements with the acceptance region and is connected to a probability alpha, also known as the test's significance level or type I error. It is a user-fixed parameter of the hypothesis test that establishes the likelihood of rejecting the null hypothesis.

3. What is one-tailed test and two-tailed test? Explain with the help of an example.

A one-sided or one-tailed test on a population parameter is a sort of hypothesis test in which the values for which we can reject the null hypothesis, indicated, are exclusively located in one tail of the probability distribution. For instance, if "The mean height of men in India is higher than 5 feet 6 inches" is the null hypothesis, then the alternative hypothesis would be "the mean height of men in India is equal to or less than 5 feet 6 inches." This is a one-sided test because the alternate hypothesis, i.e., equal to or less than 5 feet 6 inches, only considers one end of the distribution.

A two-sided test for a population is a hypothesis test used when comparing an estimate of a parameter to a given value versus the alternative hypothesis that the parameter is not equal to the stated value. If the null hypothesis is, for instance, "The mean height of men in India is equal to 5 feet 6 inches," then the alternative hypothesis would be, "The mean height of men in India is either less than or greater than 5 feet 6 inches but not equal." The alternate hypothesis, greater than or less than 5 feet 6 inches, deals with both extremes of the distribution, making this a two-tailed test.

4. What is p-value? Explain its significance.

The probability determined using the null hypothesis is the basis of the p-value. Consider if we are trying to reject the null hypothesis at a certain significance level, alpha. If we are not able to reject the null hypothesis at this significance level, we can reduce the significance level which might allow us to accept the null hypothesis. The p-value is the smallest value of significance level alpha, for which we can reject the null hypothesis. If the p-value is smaller than the alpha, we reject the null hypothesis otherwise we fail to reject the null hypothesis.

5. How do you calculate the confidence interval for a population mean with known and unknown variance?

If the sample size is high or the population variance is known, many statistical tests can be conveniently carried out as approximate Z-tests. The Student's t-test would be more appropriate if the population variance is unknown (and must therefore be approximated from the sample itself) and the sample size is small (n < 30). The sample size affects the t-distribution. The distribution of t-distribution approaches the z-distribution as the sample size increases. The t-statistic table becomes nearly identical to the z-statistic after the 30th row, or after 30 degrees of freedom. Therefore, even though the population variance is unknown, we may still apply the z-distribution.

Want to Know More?

Full Name*

Email*

+91

Phone Number*

United States +1

India +91

Canada +1

Australia +61

Singapore +65

New Zealand +64

Germany +49

United Arab Emirates +971

Hong Kong +852

Ireland +353

Afghanistan +93

Aland Islands +358

Albania +355

Algeria +213

AmericanSamoa +1684

Andorra +376

Angola +244

Anguilla +1264

Antarctica +672

Antigua and Barbuda +1268

Argentina +54

Armenia +374

Aruba +297

Ascension Island +247

Austria +43

Azerbaijan +994

Bahamas +1242

Bahrain +973

Bangladesh +880

Barbados +1246

Belarus +375

Belgium +32

Belize +501

Benin +229

Bermuda +1441

Bhutan +975

Bolivia +591

Bosnia and Herzegovina +387

Botswana +267

Brazil +55

British Indian Ocean Territory +246

Brunei Darussalam +673

Bulgaria +359

Burkina Faso +226

Burundi +257

Cambodia +855

Cameroon +237

Cape Verde +238

Cayman Islands +1345

Central African Republic +236

Chad +235

Chile +56

China +86

Christmas Island +61

Cocos (Keeling) Islands +61

Colombia +57

Comoros +269

Congo +242

Cook Islands +682

Costa Rica +506

Cote d'Ivoire +225

Croatia +385

Cuba +53

Cyprus +357

Czech Republic +420

Democratic Republic of the Congo +243

Denmark +45

Djibouti +253

Dominica +1767

Dominican Republic +1849

Ecuador +593

Egypt +20

El Salvador +503

Equatorial Guinea +240

Eritrea +291

Estonia +372

Eswatini +268

Ethiopia +251

Falkland Islands (Malvinas) +500

Faroe Islands +298

Fiji +679

Finland +358

France +33

French Guiana +594

French Polynesia +689

Gabon +241

Gambia +220

Georgia +995

Ghana +233

Gibraltar +350

Greece +30

Greenland +299

Grenada +1473

Guadeloupe +590

Guam +1671

Guatemala +502

Guernsey +44

Guinea +224

Guinea-Bissau +245

Guyana +592

Haiti +509

Holy See (Vatican City State) +379

Honduras +504

Hungary +36

Iceland +354

Indonesia +62

Iran +98

Iraq +964

Isle of Man +44

Israel +972

Italy +39

Jamaica +1876

Japan +81

Jersey +44

Jordan +962

Kazakhstan +77

Kenya +254

Kiribati +686

Korea, Democratic People's Republic of Korea +850

Korea, Republic of South Korea +82

Kosovo +383

Kyrgyzstan +996

Laos +856

Latvia +371

Lebanon +961

Lesotho +266

Liberia +231

Libya +218

Liechtenstein +423

Lithuania +370

Luxembourg +352

Macau +853

Madagascar +261

Malawi +265

Malaysia +60

Maldives +960

Mali +223

Malta +356

Marshall Islands +692

Martinique +596

Mauritania +222

Mauritius +230

Mayotte +262

Mexico +52

Micronesia, Federated States of Micronesia +691

Moldova +373

Monaco +377

Mongolia +976

Montenegro +382

Montserrat +1664

Morocco +212

Mozambique +258

Myanmar +95

Namibia +264

Nauru +674

Nepal +977

Netherlands +31

New Caledonia +687

Nicaragua +505

Niger +227

Nigeria +234

Niue +683

Norfolk Island +672

North Macedonia +389

Northern Mariana Islands +1670

Norway +47

Oman +968

Pakistan +92

Palau +680

Palestine +970

Papua New Guinea +675

Paraguay +595

Peru +51

Philippines +63

Pitcairn +872

Poland +48

Portugal +351

Puerto Rico +1939

Qatar +974

Reunion +262

Romania +40

Russia +7

Rwanda +250

Saint Barthelemy +590

Saint Helena, Ascension and Tristan Da Cunha +290

Saint Kitts and Nevis +1869

Saint Lucia +1758

Saint Martin +590

Saint Pierre and Miquelon +508

Saint Vincent and the Grenadines +1784

Samoa +685

San Marino +378

Sao Tome and Principe +239

Saudi Arabia +966

Senegal +221

Serbia +381

Seychelles +248

Sierra Leone +232

Sint Maarten +1721

Slovakia +421

Slovenia +386

Solomon Islands +677

Somalia +252

South Africa +27

South Georgia and the South Sandwich Islands +500

South Sudan +211

Spain +34

Sri Lanka +94

Sudan +249

Suriname +597

Svalbard and Jan Mayen +47

Sweden +46

Switzerland +41

Syrian Arab Republic +963

Taiwan +886

Tajikistan +992

Tanzania, United Republic of Tanzania +255

Thailand +66

Timor-Leste +670

Togo +228

Tokelau +690

Tonga +676

Trinidad and Tobago +1868

Tunisia +216

Turkey +90

Turkmenistan +993

Turks and Caicos Islands +1649

Tuvalu +688

Uganda +256

Ukraine +380

United Kingdom +44

Uruguay +598

Uzbekistan +998

Vanuatu +678

Venezuela, Bolivarian Republic of Venezuela +58

Vietnam +84

Virgin Islands, British +1284

Virgin Islands, U.S. +1340

Wallis and Futuna +681

Yemen +967

Zambia +260

Zimbabwe +263

By Signing up, you agree to ourTerms & Conditionsand ourPrivacy and Policy

10% OFF

Coupon Code "GIFT10"

Coupon Expires 22/12

Copy

Description

Statistics Interview Preparation Tips and Tricks

In order to aid hiring managers in determining a candidate's competency and expertise, the interview may include several questions covering a wide range of topics and problems as discussed in this interview questions and answers series.
Your knowledge will all improve as a result of your preparation for the various questions with the help of this guide which also contains basic statistics interview questions for freshers looking for a role as data analysts.
It is highly recommended to start learning probability, followed by descriptive statistics, and then inferential statistics for a smooth learning journey.
Statistics is a field that would require you to try out some hands-on of the concepts that you learn and its implications in real life.
Your profile should be able to demonstrate your expertise with data by mentioning in your portfolio certain analytical reports or findings that you have created.
Always include the correct terminologies and equations wherever possible to support your knowledge and understanding of the subject. You can join the Data Science Bootcamp training to prepare for your data science statistics interviews.

How to Prepare for a Statistics Interview?

Preparing yourselves through mock interviews is always a great place to start with. Apart from this,

you can prepare a short note to mention all the key terminologies related to statistics.
For quick reference, you can always view this article to prepare for the statistics interview questions data science.
For further reading, you can go through the books, “Statistics without Tears” from Derek Rowntree and “The Concise Encyclopaedia of Statistics” by Yadolah Dodge to gain additional in-depth knowledge of the subject.
KnowledgeHut’s Data Science course details link contains 20+ courses on data science specially curated for your requirements. You can sign up for a data science course to further enhance your skills to crack the interviews of top companies in India.

What to Expect in a Statistics Interview?

In interviews for jobs in statistics, most interviewers want to know how you approached a problem. This demonstrates to them your competence in working with data and its implications, as well as your ability for problem-solving, analytical and logical thinking.
A statistician might also be required to fulfil the role of Data Analyst. For this purpose, we have mentioned statistics interview questions for data analysts as well.
A good understanding of core concepts of statistics is expected from an individual.
Apart from core concepts, you can also be asked to determine probabilities, test hypotheses, etc.
To help with your interviews for Data Science and Machine Learning profile, we have also included some statistics interview questions for data science as well as machine learning.

The job roles that you can look for after completing this course include:

Data Scientist
Data Analyst
Statistician
Business Analyst
Machine Learning Engineer

If you are able to crack these statistics questions for data science interview then you can expect to join the top data science companies including:

Findability Sciences
LatentView Analytics
Fractal Analytics
Tiger Analytics
Convergytics Solutions
eClerx Services
Wipro
Tata Insights and Quants

Summary

In today's age of computing and huge data handling, statistics is a very interesting field that has a significant impact. Many businesses are pouring billions of dollars into learning analytics and statistics to leverage their existing data. This opens the door for the establishment of numerous jobs in this industry. These Probability and Statistics interview questions will help you brush up on the fundamentals of Statistics as you get ready for employment involving data science and machine learning. With our online Bootcamps, you can learn how to manage enormous data sets and prepare ready to accept lucrative job offers. Our Data Science certification courses will help you become knowledgeable in both fundamental and advanced subjects. To start or advance a successful data career, develop skills in a variety of programming languages and technologies, such as Python, R, MongoDB, TensorFlow, Keras, Tableau, Hadoop, Spark, and more. You should also gain expertise in data manipulation, visualisation, predictive analytics, data science, machine learning, and AI. With more than 400,000+ professionals trained by 650+ expert trainers in 100+ countries, it is the right choice to achieve high growth in your career and journey as a data scientist or ML engineer.

Recommended Courses

Learners Enrolled For

Got more questions? We've got answers.

Book Your Free Counselling Session Today.

Statistics Interview Questions and Answers for 2025

Introduction

Beginner

Intermediate

Advanced

1. How would you explain statistics as a field of study?

2. How are statisticians different from data analysts?

3. What role do you expect to perform as a statistician?

4. Where can we include statistics in machine learning?

5. Can we understand data science without knowing statistics?

6. What is the difference between inferential and descriptive statistics?

7. How is analysis different from analytics?

8. Explain the difference between information and data?

9. What is the difference between a sample and a population?

10. Explain the different types of variables and how are they broadly classified.

11. Explain different forms of a bar chart.

12. What is a scatter plot and how is it useful?

13. What is a frequency distribution table? What is the purpose of it?

14. Explain the measures of central tendency.

15. When do you use mean and median?

16. What is a categorical variable's mode when there are several values present in the majority of instances?

17. Explain the steps to find the median of the data. Also, explain what do you mean by quantiles?

18. What do you mean by univariate, bivariate and multivariate analysis?

19. What do you mean by measure of dispersion? How to measure it for a single variable?

20. What do you mean by distribution? Mention different types of distribution.

21. What is point estimator and confidence interval? Which one is preferred out of these two?

22. What do you mean by margin of error?

23. What is regression? Give examples.

24. What are the types of regression?

25. What is an outlier?

26. What is the meaning of KPI? Why is it important?

27. What is sampling bias? How to avoid it?

28. What is the Pareto principle?

29. What is the complement of an event?

30. In a simultaneous throw of a pair of dice, find the probability of getting a total more than 7.

1. What is normal distribution? Why do we need it?

2. Explain the impact of mean and standard deviation on the normal distribution.

3. Explain outliers and their impact on your data.

4. What is skewness? Explain the different types of skewness.

5. Define the different central moments.

6. Mention the different visualization graphs that you will use to understand numerical variables.

7. How would you measure the relationship between variables and the strength of the relationship.

8. Explain the purpose of central limit theorem.

9. How do you measure the performance of a classification model?

10. What is a confusion matrix and how do you interpret it? Explain type 1 and type 2 error.

11. How to do measure the performance of a regression model?

12. What is R2 score and adjusted R2 score? Which one is preferred?

13. What is F1 score?

14. Explain SST, SSE and SSR.

15. What is MAPE, MSE, RMSE?

16. Mention some of the common distance measurements.

17. How do you check for outliers in a dataset?

18. Explain precision and recall.

19. Explain the OLS assumptions for a linear regression.

20. How do you convert a normal distribution to standard normal distribution?

21. What do you mean by exploratory data analysis?

22. Mention some of the methods to convert a skewed data to approximate normal distribution.

23. What do you mean by conditional probability?

24. A bag contains 4 white, 5 red, and 6 blue balls. Three balls are drawn at random from the bag. The probability that all of them are red is?

25. What is the difference between Spearman Rank correlation and Pearson correlation?

1. Explain hypothesis. What do you understand by null and alternate hypothesis?

2. Explain rejection region and significance level in hypothesis theory.

3. What is one-tailed test and two-tailed test? Explain with the help of an example.

4. What is p-value? Explain its significance.

5. How do you calculate the confidence interval for a population mean with known and unknown variance?

6. Why is harmonic mean used in F1 score?

7. What is the purpose of AIC and BIC?

8. What is VIF score?

9. How do you handle imbalance classes?

10. Explain the bias-variance trade-off.

11. Explain cross-validation.

12. What is the difference between parametric and non-parametric models?

13. What is the law of large numbers in statistics?

14. Explain autocorrelation.

15. How to perform test for stationarity in your data?

16. Explain Bayes’ theorem with an example.

17. Explain the test procedure involved to perform the hypothesis test

18. What is chi-square distribution?

19. What do you mean by cluster analysis?

20. What is the use of ANOVA?