Data Science Interview Questions

Want to become an expert at cracking Data Science and Data Analysis interview questions?

Start by practicing the questions below. Whether a question involves multiple choice or live coding, we will give you hints as you go and tell you whether your answers are correct.

After that, take our timed public Data Science Interview Questions Test.

To use our service for testing candidates, buy a pack of candidates.


1. Petri Dish

Data Science, Correlation, Public, New

Two bacteria cultures, A and B, were set up in two different dishes, each initially covering 50% of its dish. Over 20 days, bacteria A's coverage increased to 70% and bacteria B's coverage fell to 40%:

[Figure: Petri Dish (coverage of bacteria A and B over the 20 days)]

Easy 
5min

Which of the two bacteria's growth correlates more linearly with the number of days passed?


Approximately, what is the Pearson correlation coefficient between bacteria B's coverage and the number of days passed?


If, after 20 days, bacteria A's coverage starts to correlate less with its linear trend line, what can we say about the value of its Pearson correlation coefficient?
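As a reminder of how the Pearson correlation coefficient behaves, here is a minimal sketch using NumPy's corrcoef. The three coverage series are hypothetical and invented for illustration only; they are not the data from the question, and the helper name pearson_r is an assumption.

```python
import numpy as np

days = np.arange(0, 21)  # day 0 through day 20

# Hypothetical coverage series (NOT the data from the question):
linear_growth = 50 + 1.0 * days             # 50% -> 70%, perfectly linear
curved_growth = 50 + 20 * (days / 20) ** 3  # 50% -> 70%, but strongly curved
linear_decline = 50 - 0.5 * days            # 50% -> 40%, perfectly linear


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


for name, series in [
    ("linear growth", linear_growth),
    ("curved growth", curved_growth),
    ("linear decline", linear_decline),
]:
    print(f"{name}: r = {pearson_r(days, series):.3f}")
```

In this sketch a perfectly linear increase gives a coefficient of 1, a perfectly linear decrease gives -1, and the curved series lands strictly between 0 and 1: the further the points drift from a straight line, the closer the coefficient moves toward 0.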

   


2. AB Test

Data Science, Bayes' theorem, Probability, Public, New

Your company is running a test to compare two versions of its website.

Version A of the website is shown to 60% of users, while version B is shown to the remaining 40%. The test shows that 8% of users who are presented with version A sign up for the company's services, compared with 4% of users who are presented with version B.

If a user signs up for the company's services, what is the probability, as a percentage, that the user was presented with version A of the website?

Easy 
7min
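One way to reason about this question is Bayes' theorem: P(A | signup) = P(signup | A) P(A) / [P(signup | A) P(A) + P(signup | B) P(B)]. The sketch below plugs in the figures from the question; the function name and its parameter names are hypothetical and not part of the original exercise.

```python
def probability_of_version_a(share_a, signup_rate_a, share_b, signup_rate_b):
    """Posterior probability that a user who signed up saw version A."""
    # Joint probabilities: shown a version AND signed up.
    joint_a = share_a * signup_rate_a
    joint_b = share_b * signup_rate_b
    # Bayes' theorem: divide by the total probability of signing up.
    return joint_a / (joint_a + joint_b)


posterior = probability_of_version_a(0.60, 0.08, 0.40, 0.04)
print(f"{posterior:.0%}")  # 0.048 / 0.064, which is about 75%
```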
   


3. Login Table

Data Science, Python data libraries, Public, New

A company stores login data and password hashes in two different containers:

  • A DataFrame, id_name_verified, with columns: Id, Login, Verified.
  • A two-dimensional NumPy array, id_password, in which each row contains: Id and Password.

Elements on the same row/index have the same Id.

Implement the function login_table that accepts these two containers and modifies the id_name_verified DataFrame in place, so that:

  • The Verified column is removed.
  • The passwords from the NumPy array are added to the DataFrame as the last column, named "Password".

For example, the following code snippet:

import numpy as np
import pandas as pd

id_name_verified = pd.DataFrame([[1, "JohnDoe", True], [2, "AnnFranklin", False]], columns=["Id", "Login", "Verified"])
id_password = np.array([[1, 987340123], [2, 187031122]], np.int32)
login_table(id_name_verified, id_password)
print(id_name_verified)

Should print:

   Id        Login   Password
0   1      JohnDoe  987340123
1   2  AnnFranklin  187031122
Easy 
15min
Python 3.6.5, Pandas 0.22.0, Numpy 1.13.3, Scipy 1.0.1, Scikit-learn 0.19.1  
 


Test cases:

  • Example case
  • Column Verified is removed
  • Column Password is appended
  • Various DataFrames
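A minimal sketch of one possible login_table, assuming (as the task states) that the rows of the DataFrame and of the NumPy array are already aligned by position, so no join on Id is needed. It is written against a recent pandas release rather than the 0.22.0 listed above.

```python
import numpy as np
import pandas as pd


def login_table(id_name_verified, id_password):
    """Modify id_name_verified in place; returns nothing."""
    # Drop the column in place so the caller's DataFrame object itself changes.
    id_name_verified.drop(columns="Verified", inplace=True)
    # Rows are assumed to be aligned by position (same row, same Id), so the
    # second array column can be assigned directly as the new last column.
    id_name_verified["Password"] = id_password[:, 1]


id_name_verified = pd.DataFrame(
    [[1, "JohnDoe", True], [2, "AnnFranklin", False]],
    columns=["Id", "Login", "Verified"],
)
id_password = np.array([[1, 987340123], [2, 187031122]], np.int32)
login_table(id_name_verified, id_password)
print(id_name_verified)
```

If the two containers could not be trusted to share the same row order, merging on Id would be the safer choice, but the result would then have to be written back into the original DataFrame to keep the in-place requirement.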


4. Marketing Costs

Data Science, Linear regression, Python data libraries, Public, New

Implement the desired_marketing_expenditure function, which returns the amount of money that must be invested in a new marketing campaign to sell the desired number of units.

Use the data from previous marketing campaigns to estimate the linear relationship between the amount of money invested and the number of units sold.

For example, for a desired number of 60,000 units sold and the previous campaign data in the table below, the function should return the float 250,000.

Previous campaigns

Campaign  Marketing expenditure  Units sold
#1        300,000                60,000
#2        200,000                50,000
#3        400,000                90,000
#4        300,000                80,000
#5        100,000                30,000
Hard  
30min
Python 3.6.5, Pandas 0.22.0, Numpy 1.13.3, Scipy 1.0.1, Scikit-learn 0.19.1  
 


Test cases:

  • Example case
  • Linear dependency without error
  • Linear dependency with error
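A minimal sketch of one possible approach: fit a least-squares line of units sold against marketing expenditure with NumPy's polyfit, then invert the line to find the expenditure for the desired number of units. The parameter names are assumptions, since the task text does not give the function's signature.

```python
import numpy as np


def desired_marketing_expenditure(marketing_expenditure, units_sold, desired_units_sold):
    """Expenditure at which the fitted line predicts desired_units_sold."""
    # Fit units_sold = slope * expenditure + intercept (degree-1 polynomial).
    slope, intercept = np.polyfit(marketing_expenditure, units_sold, deg=1)
    # Invert the fitted line to solve for the expenditure.
    return float((desired_units_sold - intercept) / slope)


result = desired_marketing_expenditure(
    [300000, 200000, 400000, 300000, 100000],
    [60000, 50000, 90000, 80000, 30000],
    60000,
)
print(result)  # approximately 250000.0
```

For the table above the fitted line is units = 0.2 * expenditure + 10,000, so 60,000 units corresponds to (60,000 - 10,000) / 0.2 = 250,000. Regressing expenditure on units instead gives a slightly different figure, so the direction of the fit matters.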


5. Stock Prices

Data Science, Correlation, Data aggregation, Python data libraries, Public

You are given a list of tickers and their daily closing prices for a given period.

Implement the most_corr function that, when given each ticker's daily closing prices, returns the pair of tickers whose daily percentage changes are the most highly (linearly) correlated.

Hard  
30min
Python 3.6.5, Pandas 0.22.0, Numpy 1.13.3, Scipy 1.0.1, Scikit-learn 0.19.1  
 


Test cases:

  • Example case
  • Small data set
  • Large data set
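A minimal sketch of one possible most_corr, under two assumptions that the task text does not confirm: the prices arrive as a pandas DataFrame with one column per ticker and one row per day, and "most highly correlated" means the largest Pearson coefficient rather than the largest absolute value. The sample tickers and prices are invented for illustration.

```python
import numpy as np
import pandas as pd


def most_corr(prices):
    """Return the pair of tickers with the highest correlation of daily returns."""
    # Daily percentage change; the first row is NaN because it has no previous day.
    returns = prices.pct_change().dropna(how="all")
    corr = returns.corr()  # pairwise Pearson correlation matrix
    # Keep only the strict upper triangle so that self-correlations (always 1)
    # and mirrored duplicates of each pair are ignored.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs.idxmax()  # tuple of (ticker, ticker)


# Hypothetical sample data: BBB is always half of AAA, so their daily
# percentage changes are identical.
prices = pd.DataFrame({
    "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
    "BBB": [50.0, 51.0, 50.5, 52.5, 53.5],
    "CCC": [30.0, 29.0, 31.0, 30.0, 28.0],
})
best_pair = most_corr(prices)
print(best_pair)
```

Correlating the percentage changes rather than the raw prices matters here: two prices that both trend upward can look strongly correlated even when their day-to-day moves are unrelated.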


If you feel ready, take one of our timed public Data Science Interview Questions tests:
  • Data Science Test (Easy / Hard)
  • Data Science and SQL Online Test (Easy / Hard)