# Python Data Science Online Test

The Python Data Science online test assesses the ability to analyze data with Python and data science libraries such as Pandas, NumPy, SciPy, and scikit-learn through a series of live coding questions. Candidates must also apply probability and statistics to solve data science problems.

The assessment includes work-sample tasks such as:

- Classifying data using different algorithms.
- Aggregating, grouping, sorting, and cleaning data.
- Building machine learning models.
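As a rough illustration of the cleaning, grouping, aggregating, and sorting tasks listed above, here is a minimal Pandas sketch. The `raw` table and its `team` and `score` columns are hypothetical and do not come from the test itself; filling missing scores with the median is just one possible cleaning choice.

```python
import pandas as pd

# Hypothetical raw records; the column names are made up for illustration.
raw = pd.DataFrame({
    "team": ["red", "blue", "red", None, "blue", "blue"],
    "score": [10.0, 7.0, None, 3.0, 9.0, 7.0],
})

# Cleaning: drop rows with no group key, then fill missing scores
# with the median of the remaining scores.
clean = raw.dropna(subset=["team"]).copy()
clean["score"] = clean["score"].fillna(clean["score"].median())

# Grouping and aggregating: count and mean score per team,
# then sorting so the best average comes first.
summary = (
    clean.groupby("team")["score"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)

print(summary)
```

Whether to drop or impute missing values depends on the data; the sketch only shows where each step would sit in a typical pipeline.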

A good data scientist or data analyst who works in Python should be able to use the functionality provided by Python data science libraries to extract knowledge and insights from data.

## Sample public questions

You are given a list of tickers and their daily closing prices for a given period.

Implement the *most_corr* function that, when given each ticker's daily closing prices, returns the pair of tickers that are the most highly (linearly) correlated by **daily percentage change**.
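One possible approach to the task above is sketched here. The question does not specify the input format, so the sketch assumes `prices` is a Pandas DataFrame with one column per ticker and one row per trading day. It also reads "most highly correlated" as the largest Pearson coefficient rather than the largest absolute value. The tickers `AAA`, `BBB`, and `CCC` and their prices are invented for the demonstration.

```python
from itertools import combinations

import pandas as pd


def most_corr(prices):
    """Return the pair of tickers whose daily percentage changes
    have the highest Pearson correlation.

    Assumption: `prices` is a DataFrame with one column per ticker.
    """
    # Daily percentage change; the first row has no previous day and is dropped.
    returns = prices.pct_change().dropna()

    # Pairwise Pearson correlation matrix of the daily changes.
    corr = returns.corr()

    # Compare every unordered pair of distinct tickers once.
    return max(
        combinations(prices.columns, 2),
        key=lambda pair: corr.loc[pair[0], pair[1]],
    )


# Invented demo data: BBB is always exactly half of AAA, so their
# percentage changes are identical.
prices = pd.DataFrame({
    "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
    "BBB": [50.0, 51.0, 50.5, 52.5, 53.5],
    "CCC": [30.0, 29.0, 31.0, 30.0, 28.0],
})

best = most_corr(prices)
print(best)
```

Correlating percentage changes instead of raw prices matters here: two prices that both trend upward can look strongly correlated even when their day-to-day moves are unrelated.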

As part of an application for iris enthusiasts, implement the *train_and_predict* function, which should classify three types of irises based on four features.

The *train_and_predict* function accepts three parameters:

- *train_input_features* - a two-dimensional NumPy array where each element is an array that contains: sepal length, sepal width, petal length, and petal width.
- *train_outputs* - a one-dimensional NumPy array where each element is a number representing the species of iris which is described in the same row of *train_input_features*. 0 represents Iris setosa, 1 represents Iris versicolor, and 2 represents Iris virginica.
- *prediction_features* - a two-dimensional NumPy array where each element is an array that contains: sepal length, sepal width, petal length, and petal width.

The function should train a classifier using *train_input_features* as input data and *train_outputs* as the expected result. After that, the function should use the trained classifier to predict labels for *prediction_features* and return them as an iterable (such as a list or numpy.ndarray). The nth position in the result should be the classification of the nth row of the *prediction_features* parameter.
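A minimal sketch of one way to meet this specification follows. The question leaves the choice of classifier open; k-nearest neighbors with `n_neighbors=5` is an assumption made here for illustration. The demonstration uses scikit-learn's bundled iris dataset and an arbitrary 70/30 split, which are not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_and_predict(train_input_features, train_outputs, prediction_features):
    """Train a classifier on the training data and return predicted labels
    for `prediction_features`, one label per row, in row order.
    """
    # Assumption: k-nearest neighbors; any scikit-learn classifier with
    # fit/predict could be swapped in.
    classifier = KNeighborsClassifier(n_neighbors=5)
    classifier.fit(train_input_features, train_outputs)
    return classifier.predict(prediction_features)


# Demonstration on scikit-learn's bundled iris data (labels 0, 1, 2).
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

predictions = train_and_predict(X_train, y_train, X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy: {accuracy:.2f}")
```

Because `predict` returns a NumPy array with one entry per input row, the nth element of the result corresponds to the nth row of `prediction_features`, as the task requires.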

Other questions include: Class Grades, Median Height, Cubic Approximation, Clean CSV, Birthday Cards, Free Throws, Credit Score, and Distribution Fitting.

### Skills and topics tested

- Python for Data Science
- Grouping
- NumPy
- Pandas
- Data Cleaning
- Machine Learning
- Nonlinear Regression
- Scikit-Learn
- Processing CSV
- Sorting
- Data Aggregation
- Classification
- K-Nearest Neighbors
- Cauchy Distribution
- Exponential Distribution
- Normal Distribution
- SciPy

