##### Questions

Movies,

Index Performance,

Poll,

Rectangles,

Retirees,

Authors,

Countries,

Employee Manager,

Cheapest Product,

Movies Live,

ATM Locations,

Developers,

Delete Orders,

Restaurant Menu,

Roads,

Sales,

Student Activities,

Transactions,

Projects,

Youngest Child,

Tasks,

Ingredients,

Autocomplete,

Bank Branches,

Hospital Patients,

Menu Items,

Ban Users,

SMS Messages,

Student Max Score,

Age and Earnings,

Subscribers,

Credit Wizard,

Clean CSV,

Median Height,

Free Throws,

Birthday Cards,

CTR,

Class Grades,

Bacterial Growth,

Distribution Fitting,

Department Report,

Merge Stock Index,

Credit Score,

Cubic Approximation

##### Skills

SQL

SQL is the dominant technology for accessing application data. It is increasingly becoming a performance bottleneck when it comes to scalability. Given its dominance, SQL is a crucial skill for all engineers.

Aggregation

An aggregate function is typically used in database queries to group together multiple rows to form a single value of meaningful data. A good programmer should be skilled at using data aggregation functions when interacting with databases.
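
As a minimal sketch of aggregation, the snippet below uses Python's built-in `sqlite3` module with a hypothetical `sales` table (the table name, columns, and rows are assumptions made for illustration) to collapse several rows per region into one summary row.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("south", 5)],
)

# SUM and COUNT reduce each region's rows to a single summary row.
totals = conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```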

Subqueries

Subqueries are commonly used in database interactions, making it important for a programmer to be skilled at writing them.
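
A small illustration of a subquery, assuming a hypothetical `employees` table: the inner query computes the average salary and the outer query keeps only the rows above it.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 50), ("Bob", 70), ("Cy", 90)],
)

# The parenthesized SELECT is the subquery; it runs against the same table.
above_average = conn.execute(
    "SELECT name FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY name"
).fetchall()
print(above_average)
```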

Indexes

The proper implementation and use of indexes are important for improving the performance of database queries.
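
One way to check whether a query benefits from an index is to inspect the query plan. The sketch below assumes SQLite and a hypothetical `orders` table; other database engines expose plans through different commands and formats.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the statement.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall()

# The last column of each plan row holds the human-readable detail text.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

If the index is used, its name appears in the plan text; without the index, SQLite would report a full table scan instead.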

Conditions

Conditional statements are a feature of most programming and query languages. They allow the programmer to control what computations are carried out based on a Boolean condition.

Insertion

Even though most database insert queries are simple, a good programmer should know how to handle more complicated situations like batch inserts.
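
A batch insert might look like the following sketch, which assumes SQLite and a hypothetical `users` table. `executemany` runs one parameterized statement for every row, and the `with conn:` block wraps the batch in a single transaction.

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

rows = [("Ann",), ("Bob",), ("Cy",)]

# The connection context manager commits on success and rolls back on error,
# so either all rows are inserted or none are.
with conn:
    conn.executemany("INSERT INTO users (name) VALUES (?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(inserted)
```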

Language

A programmer should use a language as a tool, always taking advantage of language-specific data types and built-in functions.

Joins

A normalized database is normally made up of multiple tables. Joins are, therefore, required to query across multiple tables.
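
For example, assuming two hypothetical tables, `authors` and `books`, a LEFT JOIN returns every author together with any matching books, including authors who have none.

```python
import sqlite3

# Hypothetical normalized schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE books (title TEXT, author_id INTEGER)")
conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO books VALUES (?, ?)", [("A1", 1), ("A2", 1)])

# LEFT JOIN keeps authors without books; their title comes back as NULL (None).
joined = conn.execute(
    "SELECT a.name, b.title FROM authors AS a "
    "LEFT JOIN books AS b ON b.author_id = a.id "
    "ORDER BY a.name, b.title"
).fetchall()
print(joined)
```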

Constraints

Constraints are used to define rules and relationships. They are applied to a dataset. A constraint may take many forms, such as x ≤ 5 in a programming language and a NOT NULL constraint in a SQL table definition.
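
The sketch below shows a NOT NULL constraint rejecting a row. It assumes SQLite and a hypothetical `products` table; the CHECK clause is an extra illustration that is not taken from the text above.

```python
import sqlite3

# Hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT NOT NULL, price REAL CHECK (price >= 0))"
)

# Inserting NULL into a NOT NULL column violates the constraint.
try:
    conn.execute("INSERT INTO products VALUES (NULL, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```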

Database schema

A database schema defines how data is stored in a database. An SQL database uses a schema to define tables consisting of rows and columns that use fixed data types to store data. Formalizing how data is stored is the first step towards building an application or service.

Bug fixing

Everyone makes mistakes. A good programmer should be able to find and fix a bug in their or someone else's code.

Delete

The DELETE statement is used to delete records in a table and is one of the four basic CRUD functions (create, read, update, and delete) required for working with any persistent storage.

Update

The UPDATE statement is used to modify the existing records in a table and is one of the most used operations for working with the database.

Views

A database view is a result set that is defined by a stored query, the results of which can also be queried. As views are a fundamental and widely used database construct, it's useful for candidates to understand how and when they should be used.
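
As an illustration, the sketch below stores a query as a view and then queries the view like a table. It assumes SQLite; the `tasks` table and `open_tasks` view are hypothetical names.

```python
import sqlite3

# Hypothetical table and view, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (title TEXT, done INTEGER)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?)",
    [("write", 0), ("test", 1), ("ship", 0)],
)

# The view stores the query, not the rows; it is re-evaluated on each use.
conn.execute("CREATE VIEW open_tasks AS SELECT title FROM tasks WHERE done = 0")

open_titles = [
    row[0] for row in conn.execute("SELECT title FROM open_tasks ORDER BY title")
]
print(open_titles)
```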

Performance tuning

The performance of an application or system is important. Both the responsiveness and the scalability of an application depend on how well it performs. Each algorithm and query can have a large positive or negative effect on the whole system.

Data Science

When we need to discover the information hidden in vast amounts of data, or make smarter decisions to deliver even better products, data scientists hold the key to the answers you need.

Linear regression

Linear regression is one of the most frequently used methods for data analysis due to its simplicity and applicability to a wide variety of problems.
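
A minimal sketch of simple (one-variable) linear regression by ordinary least squares follows. The helper name `fit_line` and the sample points are assumptions made for illustration; in practice a library such as SciPy or scikit-learn would usually be used.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```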

Poisson distribution

Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring within a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.
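
The probability mass function can be sketched directly from its definition, P(X = k) = e^(-rate) * rate^k / k!. The function name `poisson_pmf` is a hypothetical helper, not a library API.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when events occur at the given average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Probability of seeing no events in an interval with an average of 2 events.
p_zero = poisson_pmf(0, 2.0)
print(p_zero)
```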

Probability

Probability theory is the foundation of most statistical and machine-learning algorithms.

Decision tree

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences. It is usually a tool for displaying an algorithm that contains only conditional control statements and is a must-know for every data scientist.

Python data libraries

Numpy, Pandas, SciPy, and Scikit-learn are essential libraries for data manipulation and analysis that every data scientist should know how to use.

Data aggregation

Data aggregation is the process of gathering and summarizing information in a specified form. It is a common component of most statistical analysis processes.

Sorting

Every programmer should be familiar with data-sorting methods, as sorting is very common in data-analysis processes.

Binomial distribution

Binomial distribution is the discrete probability distribution of the number of successes in a sequence of independent yes/no experiments, each of which yields success with a given probability.
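
As a sketch, the probability of exactly k successes in n independent trials is C(n, k) * p^k * (1 - p)^(n - k). The helper name `binomial_pmf` is an assumption made for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 fair coin flips: 6 / 16.
two_heads = binomial_pmf(2, 4, 0.5)
print(two_heads)  # 0.375
```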

p-value

An important concept, the p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis is true.
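
To make the definition concrete, the sketch below computes a one-sided p-value for a coin-flip experiment: under the null hypothesis of a fair coin, how likely are 9 or more heads in 10 flips? The function names and the scenario are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(observed, n, p_null):
    """Probability of a count at least as large as the observed one under the null."""
    return sum(binomial_pmf(k, n, p_null) for k in range(observed, n + 1))


# 9 heads out of 10 flips, null hypothesis: the coin is fair (p = 0.5).
p_value = one_sided_p_value(9, 10, 0.5)
print(p_value)  # 11 / 1024, roughly 0.0107
```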

Curve Fitting

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points. This is basic knowledge for every data scientist.

Cauchy distribution

The standard Cauchy distribution is the distribution of the ratio of two independent, zero-mean, normally distributed random variables. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.

Exponential distribution

Exponential distribution is the probability distribution that describes the time between events in a process in which events occur continuously and independently at a constant average rate. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.

Normal distribution

Normal distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.

Union

The UNION operator is used to combine the result-set of two or more SELECT statements. It is often used when a report needs to be made based on multiple tables.
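
A short sketch, assuming SQLite and two hypothetical tables that each hold email addresses: UNION combines both result sets and removes duplicate rows (UNION ALL would keep them).

```python
import sqlite3

# Hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT)")
conn.execute("CREATE TABLE subscribers (email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("a@x.org",), ("b@x.org",)])
conn.executemany("INSERT INTO subscribers VALUES (?)", [("b@x.org",), ("c@x.org",)])

# b@x.org appears in both tables but only once in the combined result.
emails = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM customers "
        "UNION SELECT email FROM subscribers "
        "ORDER BY email"
    )
]
print(emails)
```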

k-nearest neighbors

An important Data Science algorithm, the k-nearest neighbors algorithm is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
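
For classification, the output is typically the majority label among the k nearest training examples. The following is a minimal sketch using Euclidean distance; the function name `knn_classify`, the choice of distance, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated clusters with labels "a" and "b".
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_classify(train, (0.5, 0.5)))
print(knn_classify(train, (5.5, 5.5)))
```

For regression, the same neighbor search would be followed by averaging the neighbors' values instead of voting.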

Machine learning

Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It’s important for all tasks where it’s infeasible to construct conventional algorithms, which is often the case in Data Science.

Nonlinear regression

Nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. Since many problems are not linear, nonlinear regression is important for machine learning practitioners.