SQL (Structured Query Language) is a domain-specific language used to manage data stored in a relational database management system. It was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. The language was designed to manage and retrieve data stored in IBM’s original relational database management system, System R. SQL helped introduce the concept of accessing many records with one command and eliminated the need to specify how to reach a record. …
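To make "many records with one command" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `employees` table, its columns, and its rows are hypothetical and exist only for illustration. SQLite is used because it ships with Python, not because it resembles System R.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ada", "Engineering", 120),
        ("Grace", "Engineering", 130),
        ("Edgar", "Research", 110),
        ("Linus", "Engineering", 90),
    ],
)

# One declarative command returns every matching record. The query states
# what is wanted, not how to locate each row.
rows = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE department = ? AND salary >= ? ORDER BY name",
    ("Engineering", 100),
).fetchall()

print(rows)
conn.close()
```

The `SELECT` statement never says which rows to visit or in what order to scan them. The database engine works that out on its own.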

When building a classification model, you may get great results (a high accuracy score) only to realize that your model is assigning every observation to a single class. This is often caused by class imbalance. Class imbalance is a problem in machine learning where the number of observations in one class significantly outnumbers the number in another class. To illustrate what class imbalance looks like and how it works, let’s say that you have a two-class dataset that includes 50 diabetes patients and 5000 non-diabetes patients. In this example, the classification model will tend to classify patients as…
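The 50-versus-5000 example can be checked with a few lines of Python. This is only a sketch: the "model" here is a hypothetical stand-in that always predicts the majority class, which is the failure mode described above taken to its extreme.

```python
# Labels from the example above: 1 = diabetes, 0 = non-diabetes.
y_true = [1] * 50 + [0] * 5000

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of all predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall for the diabetes class: share of actual diabetes patients detected.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.4f}")
print(f"recall (diabetes) = {recall:.4f}")
```

Accuracy comes out at roughly 99% even though not a single diabetes patient is identified, which is why accuracy alone can be misleading on imbalanced data.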

When building a machine learning model, it is important to make sure that your model is neither over-fitting nor under-fitting. While under-fitting is usually the result of a model that is too simple or has too little data to learn from, over-fitting can arise in a range of different scenarios. The objective in machine learning is to build a model that performs well on both the training data and new, unseen data on which it makes predictions.

- Under-fitting — when a statistical model does not adequately capture the underlying structure of the data and, therefore, does not include some parameters that would appear in a…
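One way to see both failure modes is to compare training error with error on fresh data. The sketch below is an illustration under stated assumptions, not a recipe: the data come from a hypothetical relationship y = 2x + noise, the under-fit model is a constant, the over-fit model simply memorizes the training set, and a least-squares line serves as a reasonable fit.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples from a hypothetical relationship y = 2x + noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30)
test_x, test_y = make_data(200)

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def underfit(x):
    # Ignores x entirely and always predicts the training mean.
    return mean_y

def overfit(x):
    # Memorizes the training set: returns the y of the closest training x.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Ordinary least-squares line fitted on the training data.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x

def linear(x):
    return slope * x + intercept

models = {"under-fit": underfit, "over-fit": overfit, "linear": linear}
results = {
    name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
    for name, m in models.items()
}

for name, (train_err, test_err) in results.items():
    print(f"{name:>9}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The memorizing model reaches zero training error by construction, yet its error on new data is not zero. The constant model is poor on both sets. A gap between training and test error is the usual symptom of over-fitting, while high error everywhere points to under-fitting.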

While I’ve talked a lot about the different types of machine learning algorithms, I’d like to spend some time giving an overview of statistical learning theory for those who may be confused. Statistical learning theory is a framework for machine learning that draws from statistics and functional analysis. It deals with finding a predictive function based on the data presented. The main idea in statistical learning theory is to build a model that can draw conclusions from data and make predictions.

With statistical learning theory, there are two main types of variables:

- Dependent Variable — a variable (y) whose values…

ANOVA (Analysis of Variance) provides a statistical test of whether two or more population means are equal. Assume we want to determine whether multiple groups differ from one another in a measurement. For example, let’s say we want to determine whether the number of Uber riders differs by season in New York City. You could use t-tests to determine this, but with four seasons that would require 6 pairwise tests (n(n-1)/2 with n = 4). The more tests you conduct, the greater the risk that you come to a false conclusion. To counteract this issue, you can use an ANOVA test. …
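The arithmetic behind the multiple-testing concern can be sketched as follows. Two assumptions are mine rather than the article's: a significance level of 0.05 per test, and independence between tests. Pairwise t-tests on shared groups are not truly independent, so treat the result as a rough guide.

```python
from math import comb

def pairwise_tests(groups):
    """Number of pairwise comparisons among the groups: n(n-1)/2."""
    return comb(groups, 2)

def familywise_error(tests, alpha=0.05):
    """Chance of at least one false positive across the tests,
    assuming independent tests that each use significance level alpha."""
    return 1 - (1 - alpha) ** tests

n_tests = pairwise_tests(4)        # four seasons
fwer = familywise_error(n_tests)   # alpha = 0.05 is an assumed level

print(n_tests, round(fwer, 4))
```

Under these assumptions, six tests at the 5% level carry roughly a 26% chance of at least one false positive, compared with 5% for a single test.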

**Introduction:**

In hypothesis testing, the goal is to decide whether the data provide enough evidence to reject a statement (the null hypothesis). For example, you might want to test whether a store’s marketing campaign is effective. In order to do this, you would compare statistics, such as the average number of purchases per day, before and after the campaign.

In some cases, however, researchers will reject the null hypothesis when it is actually true, or fail to reject it when it is actually false. Data scientists refer to these errors as Type I (False Positive) and Type II (False Negative) errors.
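The four possible outcomes of a test can be laid out as a small decision table. The function below is a hypothetical helper written for illustration. In practice, whether the null hypothesis is really true is unknown, which is exactly why these errors cannot be ruled out.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Label the outcome of a hypothesis test given the (normally unknown) truth."""
    if null_rejected and null_is_true:
        return "Type I error (false positive)"
    if not null_rejected and not null_is_true:
        return "Type II error (false negative)"
    return "correct decision"

# Enumerate all four combinations of truth and decision.
for truth in (True, False):
    for rejected in (True, False):
        outcome = classify_outcome(truth, rejected)
        print(f"null true = {truth}, rejected = {rejected}: {outcome}")
```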

**Type I Errors:**

When conducting hypothesis tests, there is always a chance…

Machine learning is an application of AI that gives systems the ability to learn automatically from data and experience without being explicitly programmed.

Machine learning is commonly divided into two broad types:

- Supervised Learning
- Unsupervised Learning

Supervised Learning is the task of learning the mapping function from the input variable (x) to the output variable (y). It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
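As a minimal sketch of learning a mapping from x to y, the example below trains a nearest-centroid classifier on a handful of labeled points. The technique and the data are my choices for illustration, not something the article prescribes. The labels play the role of the teacher.

```python
# Hypothetical labeled training data: (x, y) pairs, with y supplied by the "teacher".
training_data = [
    (1.0, "small"), (1.5, "small"), (2.0, "small"),
    (8.0, "large"), (8.5, "large"), (9.0, "large"),
]

def fit(data):
    """Learn one centroid (mean x) per label from the labeled examples."""
    grouped = {}
    for x, y in data:
        grouped.setdefault(y, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in grouped.items()}

def predict(centroids, x):
    """Map a new x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit(training_data)
print(centroids)
print(predict(centroids, 2.5), predict(centroids, 7.0))
```

After training, the learned mapping can label inputs it has never seen, such as 2.5 and 7.0 here.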

Supervised Learning can be used for regression and classification problems:

**Regression:** when the…

The one-sample z-test is one of the most basic types of hypothesis testing. It is performed when the population standard deviation is known and there is a hypothesized value for the population mean. The one-sample z-test is used to test whether the mean of a population is greater than, less than, or different from a certain value. It is best suited for determining whether a given sample could plausibly have come from a certain population.
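For orientation, the test statistic is z = (sample mean − hypothesized mean) / (σ / √n). A minimal sketch using only the standard library follows. The numbers (a sample mean of 105 from 36 observations, a hypothesized mean of 100, a known standard deviation of 15) are made up for illustration, and the two-sided p-value is one of several possible choices of alternative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test.

    mu0 is the hypothesized population mean and sigma is the known
    population standard deviation.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
z, p = one_sample_z_test(sample_mean=105, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Here the standard error is 15 / 6 = 2.5, so the sample mean sits two standard errors above the hypothesized value. For a one-sided alternative, you would use a single tail of the normal distribution instead of doubling it.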

**Step 1: State Your Hypothesis**

- The Alternative Hypothesis → This reflects the theory that you are testing in the hypothesis test. For example, if you want to determine whether the soccer players that did an…