In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry (also known as multivariate data), it is said to have several attributes or features.

Learning problems fall into a few categories:

- supervised learning, in which the data comes with additional attributes that we want to predict (see the scikit-learn supervised learning page). This problem can be either:

  - classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem is handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, where there is a limited number of categories and each of the n samples provided must be labeled with the correct category or class.
  - regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem is the prediction of the length of a salmon as a function of its age and weight.

- unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data (clustering), to determine the distribution of data within the input space (density estimation), or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization (see the scikit-learn unsupervised learning page).
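To make the vocabulary above concrete, here is a minimal sketch in plain Python (deliberately without scikit-learn). The toy data, the 1-nearest-neighbour rule, and the helper names ``sq_dist``, ``predict_class``, ``predict_value`` and ``two_means`` are all assumptions made for illustration; they are not part of the scikit-learn API.

```python
from statistics import mean

# Toy data (made up for illustration): 4 samples, 2 features each.
X = [(1.0, 1.2), (0.8, 1.0), (4.0, 4.2), (4.2, 3.9)]
y_class = ["a", "a", "b", "b"]  # discrete targets   -> classification
y_reg = [2.2, 1.8, 8.2, 8.1]    # continuous targets -> regression


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def nearest_index(x):
    """Index of the training sample closest to x."""
    return min(range(len(X)), key=lambda i: sq_dist(X[i], x))


def predict_class(x):
    """Supervised, discrete output: copy the label of the nearest sample."""
    return y_class[nearest_index(x)]


def predict_value(x):
    """Supervised, continuous output: copy the target of the nearest sample."""
    return y_reg[nearest_index(x)]


def two_means(points, n_iter=10):
    """Unsupervised: split points into two clusters, using no targets at all."""
    centers = [points[0], points[-1]]  # naive initialisation
    labels = []
    for _ in range(n_iter):
        # Assign each point to the closer of the two centers.
        labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(mean(col) for col in zip(*members))
    return labels
```

The two supervised functions differ only in the kind of target they return, while ``two_means`` never looks at ``y_class`` or ``y_reg``. In practice one would reach for the corresponding scikit-learn estimators rather than hand-written helpers like these.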
MIT License
Definition: Machine learning is the scientific study of algorithms and statistical models that computer systems use to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence.
When applying machine learning to real-world data, there are a lot of steps involved in the process -- starting with collecting the data and ending with generating predictions.
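As a rough sketch of what those steps can look like with scikit-learn, the example below goes from loading data to generating predictions. The choice of the bundled iris dataset, a k-nearest-neighbours classifier, and the specific split parameters are assumptions picked for brevity, not a recommended workflow.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Collect the data (here: a small dataset bundled with scikit-learn).
X, y = load_iris(return_X_y=True)

# 2. Hold out part of the data to evaluate on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Fit a model on the training portion.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 4. Generate predictions and measure how often they are correct.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on held-out samples: {accuracy:.2f}")
```

Real projects usually add steps this sketch leaves out, such as data cleaning, feature preprocessing, and model selection.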
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license.
The project was started in 2007 by David Cournapeau as a Google Summer of Code project, and since then many volunteers have contributed. See the `About us <https://scikit-learn.org/dev/about.html#authors>`__ page for a list of core contributors.
It is currently maintained by a team of volunteers.
Website: https://scikit-learn.org
Python (>= 3.6)
NumPy (>= 1.13.3)
SciPy (>= 0.19.1)
joblib (>= 0.11)
Scikit-learn 0.20 was the last version to support Python 2.7 and Python 3.4.
scikit-learn 0.23 and later require Python 3.6 or newer.
Scikit-learn plotting capabilities (i.e., functions that start with ``plot_`` and classes that end with "Display") require Matplotlib (>= 2.1.1). Running the examples also requires Matplotlib >= 2.1.1. A few examples require scikit-image >= 0.13 or pandas >= 0.18.0, and some require seaborn >= 0.9.0.
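One way to check an environment against the minimum versions listed above is sketched below, using only the standard library. The helper names ``as_tuple`` and ``check`` are hypothetical, the comparison assumes plain dotted numeric versions (a suffix such as ``rc1`` cuts the parse short), and ``importlib.metadata`` itself needs Python 3.8 or newer.

```python
import sys
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {"numpy": "1.13.3", "scipy": "0.19.1", "joblib": "0.11"}


def as_tuple(version):
    """Turn '1.13.3' into (1, 13, 3); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check(minimums=MINIMUMS):
    """Return {name: bool} saying whether each requirement is satisfied."""
    report = {"python": sys.version_info >= (3, 6)}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed at all.
            report[name] = False
            continue
        report[name] = as_tuple(installed) >= as_tuple(minimum)
    return report


if __name__ == "__main__":
    for name, ok in check().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

For a fuller report on an environment where scikit-learn is already installed, ``sklearn.show_versions()`` prints the versions of its dependencies.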
If you already have a working installation of numpy and scipy, the easiest way to install scikit-learn is using ``pip``::

    pip install -U scikit-learn

or ``conda``::

    conda install scikit-learn

The documentation includes more detailed `installation instructions <https://scikit-learn.org/stable/install.html>`_.