Covered Skills
Testing of these skills is covered in this pre-built test because they’re closely related. On our paid plan, you can easily create your own custom multi-skill tests.
Python
Python is a widely used, high-level, general-purpose, interpreted, dynamic programming language. Having a basic familiarity with the programming language used on the job is a prerequisite for quickly getting up to speed.
Bug fixing
Everyone makes mistakes. A good programmer should be able to find and fix a bug in their or someone else's code.
Language
A programmer should use a language as a tool, always taking advantage of language-specific data types and built-in functions.
List comprehension
A list comprehension is a syntactic construct for creating a list based on existing lists. As this is a common task, every programmer should be familiar with it.
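For example, a Python list comprehension can filter and transform a list in a single expression:

```python
# Squares of the even numbers 0-9, built with one list comprehension
evens_squared = [n * n for n in range(10) if n % 2 == 0]
print(evens_squared)  # [0, 4, 16, 36, 64]
```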
Strings
The string data structure is used to represent text. It is one of the most commonly used data structures. Therefore, every programmer should be skilled at string manipulation.
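A small Python sketch of everyday string manipulation:

```python
# Splitting, transforming, and re-joining a string
sentence = "the quick brown fox"
words = sentence.split()                         # ['the', 'quick', 'brown', 'fox']
title = " ".join(w.capitalize() for w in words)  # capitalize each word
print(title)  # The Quick Brown Fox
```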
Arithmetic
Arithmetic is a fundamental branch of mathematics. An understanding of arithmetic concepts, and their application, is important for every candidate.
Exceptions
Exceptions exist in most modern programming languages, making it important for a programmer to understand them and know how to handle them.
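In Python, for instance, exceptions are handled with try/except; a minimal sketch:

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        # Handle the error instead of letting it crash the program
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(1, 0))   # None
```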
Monkey patching
Monkey patching is a technique for adding new functionality, or overriding existing functionality, at run time without creating a new type. As such, it's an important tool for developers to be familiar with.
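A minimal Python sketch (the Greeter class is made up for illustration):

```python
class Greeter:
    def greet(self):
        return "Hello"

def excited_greet(self):
    return "Hello!!!"

# Monkey patch: replace the method on the existing class at run time
Greeter.greet = excited_greet
print(Greeter().greet())  # Hello!!!
```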
SQL
SQL is the dominant technology for accessing application data, and database queries are often the performance bottleneck as a system scales. Given its dominance, SQL is a crucial skill for all engineers.
Conditions
Conditional statements are a feature of most programming and query languages. They allow the programmer to control what computations are carried out based on a Boolean condition.
Select
The SELECT statement is used to retrieve data from a database. It is the most commonly used SQL command.
Dictionary
A dictionary (or associative array) is a data type composed of a collection of key-value pairs, where each possible key appears at most once in the collection. It is used when we need to access items by their keys.
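For example, in Python:

```python
# Basic dictionary usage: insert, look up, and supply a default
ages = {"alice": 30, "bob": 25}
ages["carol"] = 35           # insert a new key-value pair
print(ages["alice"])         # 30
print(ages.get("dave", 0))   # 0 -- default for a missing key
```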
Linked list
A linked list is a linear collection of data elements where each element points to the next. It is a data structure consisting of a collection of nodes which together represent a sequence. It is usually used for advanced scenarios where we need fast access to the next element, or when we need to remove an element from anywhere in the collection.
Aggregation
An aggregate function is typically used in database queries to group together multiple rows to form a single value of meaningful data. A good programmer should be skilled at using data aggregation functions when interacting with databases.
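A sketch using Python's built-in sqlite3 module (the table and values are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (total REAL)")
con.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,), (30.0,)])

# Aggregate functions collapse many rows into a single value
count, avg = con.execute("SELECT COUNT(*), AVG(total) FROM orders").fetchone()
print(count, avg)  # 3 20.0
```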
Subqueries
Subqueries are commonly used in database interactions, making it important for a programmer to be skilled at writing them.
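A sketch using sqlite3 (the products table is hypothetical): a subquery in the WHERE clause finds products priced above the average.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (name TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (?, ?)",
                [("apple", 10.0), ("pear", 20.0), ("fig", 30.0)])

# The inner SELECT runs first and feeds its result to the outer WHERE
above_avg = con.execute("""
    SELECT name FROM products
    WHERE price > (SELECT AVG(price) FROM products)
""").fetchall()
print(above_avg)  # [('fig',)]
```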
Ordering
Knowing how to order data is a common task for every programmer.
Left join
LEFT JOIN is one of the ways to merge rows from two tables. It returns all rows from the left table together with any matching rows from the right table; left-table rows without a match are still returned, with NULLs for the right table's columns.
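A sketch using sqlite3 (the customers/orders tables are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0);
""")

# Grace has no orders, but LEFT JOIN still returns her row (total is NULL)
rows = con.execute("""
    SELECT c.name, o.total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 99.0), ('Grace', None)]
```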
Union
The UNION operator is used to combine the result-set of two or more SELECT statements. It is often used when a report needs to be made based on multiple tables.
Group by
The GROUP BY statement groups rows by some attribute into summary rows. It is a common command when making various reports.
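A sketch using sqlite3 (hypothetical sales data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 10.0), ("east", 20.0), ("west", 5.0)])

# GROUP BY produces one summary row per region
totals = con.execute("""
    SELECT region, SUM(amount) FROM sales
    GROUP BY region ORDER BY region
""").fetchall()
print(totals)  # [('east', 30.0), ('west', 5.0)]
```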
Insert
Even though most database insert queries are simple, a good programmer should know how to handle more complicated situations like batch inserts.
Joins
A normalized database is normally made up of multiple tables. Joins are, therefore, required to query across multiple tables.
Serialization
Familiarity with serializing data to and from formats such as XML and JSON is important, as serialization is commonly used for interprocess communication.
XML
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The design goals of XML emphasize simplicity, generality, and usability across the Internet. This is one of the most used formats for exchanging data over the web.
Regex
A regular expression (regex) is a special text string for describing a search pattern. It is a common way for extracting data from text.
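For example, in Python (the log line is made up):

```python
import re

# Extract key=value pairs from a hypothetical log line
log = "user=alice id=42 status=ok"
pairs = dict(re.findall(r"(\w+)=(\w+)", log))
print(pairs)  # {'user': 'alice', 'id': '42', 'status': 'ok'}
```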
JSON
JSON is an open-standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs. It's the most common data format used for asynchronous browser/server communication.
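A minimal round trip in Python:

```python
import json

# Serialize a dict to a JSON string, then parse it back
payload = json.dumps({"name": "widget", "price": 9.99})
data = json.loads(payload)
print(data["price"])  # 9.99
```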
Sorting
Every programmer should be familiar with data-sorting methods, as sorting is very common in data-analysis processes.
Inheritance
In object-oriented programming, inheritance is the mechanism of basing a class upon another class, retaining similar implementation. Inheritance allows programmers to reuse code and is a must know topic for every programmer who works with OOP languages.
OOP
Object-oriented programming is a paradigm based on encapsulating logic and data into objects, which may then contain fields and procedures. Many of the most widely used programming languages are based on OOP, making it a very important concept in modern programming.
SQL CASE
The CASE statement is SQL's conditional expression. It evaluates conditions in order and returns the value of the first one that matches.
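A sketch using sqlite3 (hypothetical scores table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, points INTEGER)")
con.executemany("INSERT INTO scores VALUES (?, ?)", [("ann", 85), ("bob", 40)])

# CASE picks a value per row based on the first matching condition
grades = con.execute("""
    SELECT name,
           CASE WHEN points >= 50 THEN 'pass' ELSE 'fail' END
    FROM scores ORDER BY name
""").fetchall()
print(grades)  # [('ann', 'pass'), ('bob', 'fail')]
```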
Algorithmic thinking
When designing and/or analyzing an algorithm or data structure, it is important to consider the performance and structure of an implementation. Algorithmic thinking is one of the key traits of a good programmer, especially one working on complex or performance-critical code.
Tuples
A tuple is an ordered, immutable collection. It is a common collection type in many programming languages.
Stream
A stream is a sequence of data elements made available over time. It is particularly useful for tasks that may benefit from being asynchronous, including tasks such as I/O processing or reading from a file, and as such is important for developers to understand.
Queue
A queue is a collection of items maintained in a sequence, modified by adding entities at one end and removing them from the other. It is the collection to use when first-in, first-out (FIFO) behavior is needed.
Named tuple
Named Tuple is a tuple where each value has a preassigned name. It allows accessing values not just by index, but also by name. Among other things, it can increase the readability and maintainability of the code.
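For example, using Python's collections.namedtuple:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(x=3, y=4)

# Values are accessible by name or by index
print(p.x, p[1])  # 3 4
```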
Lists
Lists are collections that act as dynamic arrays. They combine the flexibility of dynamically sized arrays with the simple indexed access of ordinary arrays, and they perform well in most scenarios.
Iteration
Iteration is the act of repeating a process, or cycling through a collection. Iteration is one of the fundamental flow control tools available to developers.
Python Data Science
The Python programming language and its libraries contain a lot of functionality that's useful to data scientists. Powerful libraries like Numpy, Pandas, and Scipy are valuable tools for data scientists who use Python.
Grouping
Grouping is the process of separating items into different groups. Developers and data scientists often need to group data so they can examine them separately.
NumPy
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. NumPy is an essential library for any data scientist who works with Python.
Pandas
Pandas is a library for the Python programming language that’s used for data manipulation and analysis. It is an essential library for any data scientist who works with Python.
General Data Science
When you need to discover the information hidden in vast amounts of data, or make smarter decisions to deliver even better products, data scientists hold the key to the answers you need.
Linear regression
Linear regression is one of the most frequently used methods for data analysis due to its simplicity and applicability to a wide variety of problems.
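A minimal sketch of simple linear regression in plain Python, using the closed-form least-squares formulas (the data points are made up):

```python
# Fit y = slope * x + intercept by ordinary least squares
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 2x + 1, so the fit should recover that
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
print(slope, intercept)  # 2.0 1.0
```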
Cauchy distribution
The Cauchy distribution is the distribution of the ratio of two independent standard normal random variables. As a classic example of a heavy-tailed distribution with no defined mean or variance, it is important for all Data Scientists to be familiar with.
Exponential distribution
Exponential distribution is the probability distribution that describes the time between events in a process in which events occur continuously and independently at a constant average rate. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.
Normal distribution
Normal distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.
SciPy
SciPy is a Python library used for scientific and technical computing. Every data scientist who uses Python as a programming language should know how to use it for tasks such as optimization, linear algebra, integration, etc.
Data cleaning
Data cleaning or data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records. Data scientists should be familiar with it to avoid incorrect records that can affect analysis.
Data aggregation
Data aggregation is the process of gathering and summarizing information in a specified form. It is a common component of most statistical analysis processes.
Curve Fitting
Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points. This is basic knowledge of every data scientist.
Performance tuning
The performance of an application or system matters: its responsiveness and scalability both depend on how performant it is. Each algorithm and query can have a large positive or negative effect on the whole system.
Machine learning
Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It’s important for all tasks where it’s infeasible to construct conventional algorithms, which is often the case in Data Science.
Nonlinear regression
Nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. Since many problems are not linear, nonlinear regression is important for machine learning practitioners.
Scikit-learn
Scikit-learn (or sklearn) is a machine learning library for the Python programming language. Every data scientist who works with Python and tasks such as classification, regression, and clustering algorithms should know how to use it.
Probability
Probability theory is the foundation of most statistical and machine-learning algorithms.
Probability distributions
A probability distribution is a function that describes the likelihood of obtaining the possible values that a random variable can assume. They describe what we can expect from random trials.
Classification
Classification is the problem of identifying to which set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. As one of the common tasks in machine learning, it’s important for all data scientists.
Decision tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences. It is usually a tool for displaying an algorithm that contains only conditional control statements and is a must-know for every data scientist.
Outliers
An outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.
Set
A set is a collection of distinct objects. It's one of the most used types of collection, alongside arrays, lists, and maps. There are many different types of set, each with multiple specific optimizations and use cases. It is, therefore, one of the most important collections for a developer to be familiar with.
Right join
RIGHT JOIN is one of the ways to merge rows from two tables. It returns all rows from the right table together with any matching rows from the left table; right-table rows without a match are still returned, with NULLs for the left table's columns.
CTE
A CTE (Common Table Expression) is a temporary result set that can be referenced within another SELECT, INSERT, UPDATE, or DELETE statement. Recursive CTEs can reference themselves, which enables developers to work with hierarchical data.
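A sketch of a recursive CTE using sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A recursive CTE that generates the numbers 1 through 5
rows = con.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5
    )
    SELECT n FROM counter
""").fetchall()
print([r[0] for r in rows])  # [1, 2, 3, 4, 5]
```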
Graphs
Many real-life situations are best modeled by graphs. Therefore, an in-depth knowledge of graph data structures is important for a good programmer.
Dynamic Objects
Dynamic objects expose members such as properties and methods at run time, instead of at compile time. This enables you to create objects to work with structures that do not match a static type or format.
Multithreading
Multithreading allows a process to make more efficient use of modern hardware by allowing code to execute concurrently. It can drastically improve the performance of an application; however, it can be tricky to get right, making this an important topic for any programmer.
Synchronization
When using multithreading, developers need to know how to make one thread wait for another to finish its task before continuing with its work.
Random
Random number generators are used to generate random numbers and/or symbols. There is a wide variety of random number generators, each with specific use cases, so it's important for all developers to know and understand when to use each type.
Bayes' theorem
Bayes' theorem describes the probability of an event based on conditions related to the event. It is the central idea behind Bayesian inference, an important and increasingly popular technique in statistics.
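A worked Python sketch with made-up numbers for a diagnostic test:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01             # prior: 1% of people have the disease (assumed)
p_pos_given_disease = 0.99   # test sensitivity (assumed)
p_pos_given_healthy = 0.05   # false positive rate (assumed)

# Total probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.167 -- low despite an accurate test
```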
Tree
A tree is a hierarchical structure defined recursively starting with the root node, where each node is a data structure consisting of a value, together with a list of references to other nodes (the "children"). A lot of problems can be solved efficiently with trees, which makes them important for developers.
Decision boundary
In a binary classification problem with two classes, a decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class. The classifier will classify all the points on one side of the decision boundary as belonging to one class and all those on the other side as belonging to the other class.
ROC
A receiver operating characteristic curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate against the false positive rate at all possible decision boundaries. It is useful for selecting possibly optimal models and to discard suboptimal ones prior to specifying decision boundaries.
Poisson distribution
Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring within a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event. As one of the most widely used distributions, it is important for all Data Scientists to be familiar with it.
Correlation
Correlation is any statistical relationship, whether causal or not, between two random variables or two sets of data. As one of the fundamentals of Data Science, correlation is an important concept for all Data Scientists to be familiar with.
Multicollinearity
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. As such, it’s important for all data scientists to check for collinear variables when looking at individual predictor variables in multiple regression models.
Method overriding
Method overriding, in object-oriented programming, is a language feature that allows a subclass to provide a specific implementation of a method that is already provided by one of its parent classes.
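A minimal Python sketch (the Animal/Dog classes are made up for illustration):

```python
class Animal:
    def speak(self):
        return "..."

class Dog(Animal):
    def speak(self):  # overrides Animal.speak with a specific implementation
        return "Woof"

print(Animal().speak(), Dog().speak())  # ... Woof
```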
Processing CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. Processing CSV files is a common task when working with tabular data.
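For example, with Python's csv module (io.StringIO stands in for a real file here):

```python
import csv
import io

# Parse a small in-memory CSV into dicts keyed by the header row
raw = "name,qty\napple,3\npear,5\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0])  # {'name': 'apple', 'qty': '3'}
```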
k-nearest neighbors
An important Data Science algorithm, the k-nearest neighbors algorithm is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.
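A minimal sketch of k-NN classification in plain Python (the 2-D points and labels are made up):

```python
from collections import Counter

def knn_classify(point, data, k=3):
    """Classify `point` by majority vote among its k nearest labeled neighbors.
    `data` is a list of ((x, y), label) pairs; squared Euclidean distance."""
    nearest = sorted(
        data,
        key=lambda d: (d[0][0] - point[0]) ** 2 + (d[0][1] - point[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify((0.5, 0.5), data))  # a
```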
Binomial distribution
Binomial distribution is the discrete probability distribution of the number of successes in a sequence of independent yes/no experiments, each of which yields success with a given probability.
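The probability mass function can be written directly from the definition (requires Python 3.8+ for math.comb):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 1 head in 2 fair coin flips
print(binomial_pmf(1, 2, 0.5))  # 0.5
```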
p-value
An important concept, p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, when the null hypothesis is true.
Integer division
Integer division is division in which the fractional part (remainder) is discarded. Knowing this is important for optimal implementation of some algorithms and for avoiding common bugs.
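In Python, for example:

```python
# // discards the fractional part; % gives the remainder
quotient, remainder = divmod(7, 2)
print(quotient, remainder)  # 3 1

# Pitfall: Python's // floors toward negative infinity, not toward zero
negative_quotient = -7 // 2
print(negative_quotient)    # -4
```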
Recommended Job Roles
These are the job roles that we recommend for the General and Python Data Science, Python, and SQL online test.
Data Analyst