.. include:: _contributors.rst

.. currentmodule:: sklearn

============
Version 0.20
============

.. warning::

    Version 0.20 is the last version of scikit-learn to support Python 2.7 and Python 3.4.
    Scikit-learn 0.21 will require Python 3.5 or higher.

.. include:: changelog_legend.inc

.. _changes_0_20_4:

Version 0.20.4
==============

**July 30, 2019**

This is a bug-fix release with fixes applied to version 0.20.3.

Changelog
---------

The bundled version of joblib was upgraded from 0.13.0 to 0.13.2.

:mod:`sklearn.cluster`
......................

- |Fix| Fixed a bug in :class:`cluster.KMeans` where KMeans++ initialisation
  could rarely result in an IndexError. :issue:`11756` by `Joel Nothman`_.

:mod:`sklearn.compose`
......................

- |Fix| Fixed an issue in :class:`compose.ColumnTransformer` where using
  DataFrames whose column order differs between ``fit`` and
  ``transform`` could lead to silently passing incorrect columns to the
  ``remainder`` transformer.
  :pr:`14237` by :user:`Andreas Schuderer <schuderer>`.

:mod:`sklearn.cross_decomposition`
..................................

- |Fix| Fixed a bug in :class:`cross_decomposition.CCA` improving numerical
  stability when `Y` is close to zero. :pr:`13903` by `Thomas Fan`_.

:mod:`sklearn.model_selection`
..............................

- |Fix| Fixed a bug where :class:`model_selection.StratifiedKFold`
  shuffled each class's samples with the same ``random_state``,
  making ``shuffle=True`` ineffective.
  :issue:`13124` by :user:`Hanmin Qin <qinhanmin2014>`.

:mod:`sklearn.neighbors`
........................

- |Fix| Fixed a bug in :class:`neighbors.KernelDensity` which could not be
  restored from a pickle if ``sample_weight`` had been used.
  :issue:`13772` by :user:`Aditya Vyas <aditya1702>`.

.. _changes_0_20_3:

Version 0.20.3
==============

**March 1, 2019**

This is a bug-fix release with some minor documentation improvements and
enhancements to features released in 0.20.0.

Changelog
---------

:mod:`sklearn.cluster`
......................

- |Fix| Fixed a bug in :class:`cluster.KMeans` where computation was
  single-threaded when `n_jobs > 1` or `n_jobs = -1`.
  :issue:`12949` by :user:`Prabakaran Kumaresshan <nixphix>`.

:mod:`sklearn.compose`
......................

- |Fix| Fixed a bug in :class:`compose.ColumnTransformer` to handle
  negative indexes in the columns list of the transformers.
  :issue:`12946` by :user:`Pierre Tallotte <pierretallotte>`.

:mod:`sklearn.covariance`
.........................

- |Fix| Fixed a regression in :func:`covariance.graphical_lasso` so that
  the case `n_features=2` is handled correctly. :issue:`13276` by
  :user:`Aurélien Bellet <bellet>`.

:mod:`sklearn.decomposition`
............................

- |Fix| Fixed a bug in :func:`decomposition.sparse_encode` where computation
  was single-threaded when `n_jobs > 1` or `n_jobs = -1`.
  :issue:`13005` by :user:`Prabakaran Kumaresshan <nixphix>`.

:mod:`sklearn.datasets`
.......................

- |Efficiency| :func:`sklearn.datasets.fetch_openml` now loads data by
  streaming, avoiding high memory usage. :issue:`13312` by `Joris Van den
  Bossche`_.

:mod:`sklearn.feature_extraction`
.................................

- |Fix| Fixed a bug in :class:`feature_extraction.text.CountVectorizer` which
  would result in the sparse feature matrix having conflicting `indptr` and
  `indices` precisions under very large vocabularies. :issue:`11295` by
  :user:`Gabriel Vacaliuc <gvacaliuc>`.

:mod:`sklearn.impute`
.....................

- |Fix| Added support for non-numeric data in
  :class:`sklearn.impute.MissingIndicator`, which previously rejected such
  data even though :class:`sklearn.impute.SimpleImputer` supported it for some
  imputation strategies.
  :issue:`13046` by :user:`Guillaume Lemaitre <glemaitre>`.

:mod:`sklearn.linear_model`
...........................

- |Fix| Fixed a bug in :class:`linear_model.MultiTaskElasticNet` and
  :class:`linear_model.MultiTaskLasso` which broke when
  ``warm_start=True``. :issue:`12360` by :user:`Aakanksha Joshi <joaak>`.

:mod:`sklearn.preprocessing`
............................

- |Fix| Fixed a bug in :class:`preprocessing.KBinsDiscretizer` where
  ``strategy='kmeans'`` failed with an error during transformation due to
  unsorted bin edges.
  :issue:`13134` by :user:`Sandro Casagrande <SandroCasagrande>`.

- |Fix| Fixed a bug in :class:`preprocessing.OneHotEncoder` where the
  deprecation of ``categorical_features`` was handled incorrectly in
  combination with ``handle_unknown='ignore'``.
  :issue:`12881` by `Joris Van den Bossche`_.

- |Fix| Bins whose widths are too small (i.e., <= 1e-8) are removed
  with a warning in :class:`preprocessing.KBinsDiscretizer`.
  :issue:`13165` by :user:`Hanmin Qin <qinhanmin2014>`.

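The narrow-bin rule in the last :class:`preprocessing.KBinsDiscretizer` entry can be pictured with a small standard-library sketch. The helper name ``drop_narrow_bins`` and the plain-list representation of bin edges are assumptions made for illustration; this is not scikit-learn's implementation, and only the ``1e-8`` threshold comes from the entry above.

```python
def drop_narrow_bins(bin_edges, min_width=1e-8):
    """Return sorted bin edges with bins of width <= min_width removed.

    Hypothetical helper: the first edge is always kept, and each later edge
    is kept only if it lies more than ``min_width`` above the last kept edge.
    """
    kept = [bin_edges[0]]
    for edge in bin_edges[1:]:
        if edge - kept[-1] > min_width:
            kept.append(edge)
    return kept


# The second and third edges are only 1e-10 apart, so that bin is dropped.
edges = [0.0, 1.0, 1.0 + 1e-10, 2.0, 3.0]
cleaned = drop_narrow_bins(edges)
print(cleaned)
```

In this sketch four bins shrink to three; the library additionally emits a warning when that happens.
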
:mod:`sklearn.svm`
..................

- |Fix| Fixed a bug in :class:`svm.SVC`, :class:`svm.NuSVC`, :class:`svm.SVR`,
  :class:`svm.NuSVR` and :class:`svm.OneClassSVM` where the ``scale`` option
  of parameter ``gamma`` was erroneously defined as
  ``1 / (n_features * X.std())``. It is now defined as
  ``1 / (n_features * X.var())``.
  :issue:`13221` by :user:`Hanmin Qin <qinhanmin2014>`.

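To make the difference between the two ``gamma='scale'`` formulas concrete, here is a minimal standard-library sketch. The function names are hypothetical, and the variance is taken over all entries of ``X`` (population variance), which is assumed to mirror what ``X.var()`` computes on a NumPy array.

```python
from math import sqrt
from statistics import pvariance


def gamma_scale(X):
    """Corrected rule: 1 / (n_features * X.var())."""
    n_features = len(X[0])
    flat = [value for row in X for value in row]
    return 1.0 / (n_features * pvariance(flat))


def gamma_scale_old(X):
    """Earlier, erroneous rule: 1 / (n_features * X.std())."""
    n_features = len(X[0])
    flat = [value for row in X for value in row]
    return 1.0 / (n_features * sqrt(pvariance(flat)))


# Two samples with two features each; the variance of all entries is 5.
X = [[0.0, 2.0], [4.0, 6.0]]
print(gamma_scale(X), gamma_scale_old(X))
```

The two rules agree only when the data has unit variance, so models fit on unscaled data with ``gamma='scale'`` may change after this fix.
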
Code and Documentation Contributors
-----------------------------------

With thanks to:

Adrin Jalali, Agamemnon Krasoulis, Albert Thomas, Andreas Mueller, Aurélien
Bellet, bertrandhaut, Bharat Raghunathan, Dowon, Emmanuel Arias, Fibinse
Xavier, Finn O'Shea, Gabriel Vacaliuc, Gael Varoquaux, Guillaume Lemaitre,
Hanmin Qin, joaak, Joel Nothman, Joris Van den Bossche, Jérémie Méhault, kms15,
Kossori Aruku, Lakshya KD, maikia, Manuel López-Ibáñez, Marco Gorelli,
MarcoGorelli, mferrari3, Mickaël Schoentgen, Nicolas Hug, pavlos kallis, Pierre
Glaser, pierretallotte, Prabakaran Kumaresshan, Reshama Shaikh, Rohit Kapoor,
Roman Yurchak, SandroCasagrande, Tashay Green, Thomas Fan, Vishaal Kapoor,
Zhuyi Xue, Zijie (ZJ) Poh

.. _changes_0_20_2:

Version 0.20.2
==============

**December 20, 2018**

This is a bug-fix release with some minor documentation improvements and
enhancements to features released in 0.20.0.

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :mod:`sklearn.neighbors` when ``metric=='jaccard'`` (bug fix)
- use of ``'seuclidean'`` or ``'mahalanobis'`` metrics in some cases (bug fix)

Changelog
---------

:mod:`sklearn.compose`
......................

- |Fix| Fixed an issue in :func:`compose.make_column_transformer` which raised
  an unexpected error when ``columns`` was a pandas Index or pandas Series.
  :issue:`12704` by :user:`Hanmin Qin <qinhanmin2014>`.

:mod:`sklearn.metrics`
......................

- |Fix| Fixed a bug in :func:`metrics.pairwise_distances` and
  :func:`metrics.pairwise_distances_chunked` where parameters ``V`` of
  ``"seuclidean"`` and ``VI`` of ``"mahalanobis"`` metrics were computed after
  the data was split into chunks instead of being pre-computed on the whole
  data.
  :issue:`12701` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

:mod:`sklearn.neighbors`
........................

- |Fix| Fixed the `sklearn.neighbors.DistanceMetric` jaccard distance
  function to return 0 when two all-zero vectors are compared.
  :issue:`12685` by :user:`Thomas Fan <thomasjpfan>`.

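The all-zero edge case in the Jaccard fix above can be sketched in plain Python. This is an illustrative re-implementation of the usual boolean Jaccard distance, not the library's code; the function name is hypothetical.

```python
def jaccard_distance(u, v):
    """Boolean Jaccard distance between two equal-length vectors.

    Entries are treated as True when non-zero. When neither vector has a
    non-zero entry the ratio would be 0/0, so 0.0 is returned by convention.
    """
    nonzero_either = sum(1 for a, b in zip(u, v) if a != 0 or b != 0)
    if nonzero_either == 0:
        return 0.0
    differing = sum(1 for a, b in zip(u, v) if (a != 0) != (b != 0))
    return differing / nonzero_either


print(jaccard_distance([0, 0, 0], [0, 0, 0]))
print(jaccard_distance([1, 0, 1], [1, 1, 0]))
```

Returning 0 for two all-zero vectors keeps the distance of a point to itself at zero, which neighbors-based estimators rely on.
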
:mod:`sklearn.utils`
....................

- |Fix| Calling :func:`utils.check_array` on `pandas.Series` with categorical
  data, which raised an error in 0.20.0, now returns the expected output again.
  :issue:`12699` by `Joris Van den Bossche`_.

Code and Documentation Contributors
-----------------------------------

With thanks to:

adanhawth, Adrin Jalali, Albert Thomas, Andreas Mueller, Dan Stine, Feda Curic,
Hanmin Qin, Jan S, jeremiedbb, Joel Nothman, Joris Van den Bossche,
josephsalmon, Katrin Leinweber, Loic Esteve, Muhammad Hassaan Rafique, Nicolas
Hug, Olivier Grisel, Paul Paczuski, Reshama Shaikh, Sam Waterbury, Shivam
Kotwalia, Thomas Fan

.. _changes_0_20_1:

Version 0.20.1
==============

**November 21, 2018**

This is a bug-fix release with some minor documentation improvements and
enhancements to features released in 0.20.0. Note that we also include some
API changes in this release, so you might get some extra warnings after
updating from 0.20.0 to 0.20.1.

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :class:`decomposition.IncrementalPCA` (bug fix)

Changelog
---------

:mod:`sklearn.cluster`
......................

- |Efficiency| :class:`cluster.MeanShift` no longer attempts nested
  parallelism, as the overhead would hurt performance significantly when
  ``n_jobs > 1``.
  :issue:`12159` by :user:`Olivier Grisel <ogrisel>`.

- |Fix| Fixed a bug in :class:`cluster.DBSCAN` with precomputed sparse neighbors
  graph, which would add explicit zeros on the diagonal even when already
  present. :issue:`12105` by `Tom Dupre la Tour`_.

:mod:`sklearn.compose`
......................

- |Fix| Fixed an issue in :class:`compose.ColumnTransformer` when stacking
  columns with types not convertible to a numeric.
  :issue:`11912` by :user:`Adrin Jalali <adrinjalali>`.

- |API| :class:`compose.ColumnTransformer` now applies the ``sparse_threshold``
  even if all transformation results are sparse. :issue:`12304` by `Andreas
  Müller`_.

- |API| :func:`compose.make_column_transformer` now expects
  ``(transformer, columns)`` instead of ``(columns, transformer)`` to keep
  consistent with :class:`compose.ColumnTransformer`.
  :issue:`12339` by :user:`Adrin Jalali <adrinjalali>`.

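As an illustration of the ``(transformer, columns)`` ordering described in the last entry, the sketch below builds a small column transformer on a NumPy array. The toy data and the choice of transformers are assumptions for the example; it requires NumPy and scikit-learn to be installed.

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data: two numeric columns followed by one categorical code column.
X = np.array([
    [1.0, 10.0, 0],
    [2.0, 20.0, 1],
    [3.0, 30.0, 2],
    [4.0, 40.0, 1],
])

# Each tuple is (transformer, columns): the transformer comes first.
# sparse_threshold=0 requests a dense output array.
preprocess = make_column_transformer(
    (StandardScaler(), [0, 1]),
    (OneHotEncoder(), [2]),
    sparse_threshold=0,
)

Xt = preprocess.fit_transform(X)
print(Xt.shape)
```

The output has the two scaled columns followed by one indicator column per distinct category.
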
:mod:`sklearn.datasets`
.......................

- |Fix| Fixed :func:`datasets.fetch_openml` to correctly use the local cache.
  :issue:`12246` by :user:`Jan N. van Rijn <janvanrijn>`.

- |Fix| Fixed :func:`datasets.fetch_openml` to correctly handle ignore
  attributes and row id attributes.
  :issue:`12330` by :user:`Jan N. van Rijn <janvanrijn>`.

- |Fix| Fixed integer overflow in :func:`datasets.make_classification`
  for values of ``n_informative`` parameter larger than 64.
  :issue:`10811` by :user:`Roman Feldbauer <VarIr>`.

- |Fix| Fixed olivetti faces dataset ``DESCR`` attribute to point to the right
  location in :func:`datasets.fetch_olivetti_faces`. :issue:`12441` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed :func:`datasets.fetch_openml` to retry downloading when reading
  from local cache fails. :issue:`12517` by :user:`Thomas Fan <thomasjpfan>`.

:mod:`sklearn.decomposition`
............................

- |Fix| Fixed a regression in :class:`decomposition.IncrementalPCA` where
  0.20.0 raised an error if the number of samples in the final batch for
  fitting IncrementalPCA was smaller than ``n_components``.
  :issue:`12234` by :user:`Ming Li <minggli>`.

:mod:`sklearn.ensemble`
.......................

- |Fix| Fixed a bug mostly affecting :class:`ensemble.RandomForestClassifier`
  where ``class_weight='balanced_subsample'`` failed with more than 32 classes.
  :issue:`12165` by `Joel Nothman`_.

- |Fix| Fixed a bug affecting :class:`ensemble.BaggingClassifier`,
  :class:`ensemble.BaggingRegressor` and :class:`ensemble.IsolationForest`,
  where ``max_features`` was sometimes rounded down to zero.
  :issue:`12388` by :user:`Connor Tann <Connossor>`.

:mod:`sklearn.feature_extraction`
.................................

- |Fix| Fixed a regression in v0.20.0 where
  :class:`feature_extraction.text.CountVectorizer` and other text vectorizers
  could error during stop words validation with custom preprocessors
  or tokenizers. :issue:`12393` by `Roman Yurchak`_.

:mod:`sklearn.linear_model`
...........................

- |Fix| :class:`linear_model.SGDClassifier` and variants
  with ``early_stopping=True`` would not use a consistent validation
  split in the multiclass case and this would cause a crash when using
  those estimators as part of parallel parameter search or cross-validation.
  :issue:`12122` by :user:`Olivier Grisel <ogrisel>`.

- |Fix| Fixed a bug affecting :class:`linear_model.SGDClassifier` in the
  multiclass case. Each one-versus-all step was run in a
  :class:`joblib.Parallel` call and mutated a common parameter, causing a
  segmentation fault if called within a backend using processes and not
  threads. We now use ``require='sharedmem'`` at the :class:`joblib.Parallel`
  instance creation. :issue:`12518` by
  :user:`Pierre Glaser <pierreglaser>` and :user:`Olivier Grisel <ogrisel>`.

:mod:`sklearn.metrics`
......................

- |Fix| Fixed a bug in `metrics.pairwise.pairwise_distances_argmin_min`
  which returned the square root of the distance when the metric parameter was
  set to "euclidean". :issue:`12481` by
  :user:`Jérémie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in `metrics.pairwise.pairwise_distances_chunked`
  which didn't ensure the diagonal is zero for euclidean distances.
  :issue:`12612` by :user:`Andreas Müller <amueller>`.

- |API| `metrics.calinski_harabaz_score` has been renamed to
  :func:`metrics.calinski_harabasz_score`; the old name will be removed in
  version 0.23.
  :issue:`12211` by :user:`Lisa Thomas <LisaThomas9>`,
  :user:`Mark Hannel <markhannel>` and :user:`Melissa Ferrari <mferrari3>`.

:mod:`sklearn.mixture`
......................

- |Fix| Ensure that the ``fit_predict`` method of
  :class:`mixture.GaussianMixture` and :class:`mixture.BayesianGaussianMixture`
  always yields assignments consistent with ``fit`` followed by ``predict``
  even if the convergence criterion is too loose or not met. :issue:`12451`
  by :user:`Olivier Grisel <ogrisel>`.

:mod:`sklearn.neighbors`
........................

- |Fix| Forced the parallelism backend to :code:`threading` for
  :class:`neighbors.KDTree` and :class:`neighbors.BallTree` in Python 2.7 to
  avoid pickling errors caused by the serialization of their methods.
  :issue:`12171` by :user:`Thomas Moreau <tomMoral>`.

:mod:`sklearn.preprocessing`
............................

- |Fix| Fixed a bug in :class:`preprocessing.OrdinalEncoder` when passing
  manually specified categories. :issue:`12365` by `Joris Van den Bossche`_.

- |Fix| Fixed a bug in :class:`preprocessing.KBinsDiscretizer` where the
  ``transform`` method mutated the ``_encoder`` attribute. The ``transform``
  method is now thread safe. :issue:`12514` by
  :user:`Hanmin Qin <qinhanmin2014>`.

- |Fix| Fixed a bug in :class:`preprocessing.PowerTransformer` where the
  Yeo-Johnson transform was incorrect for lambda parameters outside of `[0, 2]`.
  :issue:`12522` by :user:`Nicolas Hug <NicolasHug>`.

- |Fix| Fixed a bug in :class:`preprocessing.OneHotEncoder` where transform
  failed when set to ignore unknown numpy strings of different lengths.
  :issue:`12471` by :user:`Gabriel Marzinotto <GMarzinotto>`.

- |API| The default value of the :code:`method` argument in
  :func:`preprocessing.power_transform` will be changed from :code:`box-cox`
  to :code:`yeo-johnson` to match :class:`preprocessing.PowerTransformer`
  in version 0.23. A FutureWarning is raised when the default value is used.
  :issue:`12317` by :user:`Eric Chang <chang>`.

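For readers unfamiliar with the Yeo-Johnson transform mentioned in the :class:`preprocessing.PowerTransformer` fix, here is a minimal scalar sketch of its textbook piecewise definition. It is not scikit-learn's code; the function and argument names are chosen for illustration.

```python
import math


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform of a single value for any real ``lmbda``."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    # Negative inputs use the mirrored branch with exponent 2 - lmbda.
    if lmbda == 2:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


# lmbda = 3 lies outside [0, 2], the range affected by the fix above.
print(yeo_johnson(-1.0, 3.0), yeo_johnson(1.0, 3.0))
```

The only special cases are ``lmbda == 0`` for non-negative inputs and ``lmbda == 2`` for negative inputs; every other value, inside or outside ``[0, 2]``, goes through the power branches.
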
:mod:`sklearn.utils`
....................

- |Fix| Use float64 for mean accumulator to avoid floating point
  precision issues in :class:`preprocessing.StandardScaler` and
  :class:`decomposition.IncrementalPCA` when using float32 datasets.
  :issue:`12338` by :user:`bauks <bauks>`.

- |Fix| Calling :func:`utils.check_array` on `pandas.Series`, which
  raised an error in 0.20.0, now returns the expected output again.
  :issue:`12625` by `Andreas Müller`_.

Miscellaneous
.............

- |Fix| When using site joblib by setting the environment variable
  `SKLEARN_SITE_JOBLIB`, added compatibility with joblib 0.11 in addition
  to 0.12+. :issue:`12350` by `Joel Nothman`_ and `Roman Yurchak`_.

- |Fix| Make sure to avoid raising ``FutureWarning`` when calling
  ``np.vstack`` with numpy 1.16 and later (use list comprehensions
  instead of generator expressions in many locations of the scikit-learn
  code base). :issue:`12467` by :user:`Olivier Grisel <ogrisel>`.

- |API| Removed all mentions of ``sklearn.externals.joblib``, and deprecated
  joblib methods exposed in ``sklearn.utils``, except for
  :func:`utils.parallel_backend` and :func:`utils.register_parallel_backend`,
  which allow users to configure parallel computation in scikit-learn.
  Other functionalities are part of the
  `joblib <https://joblib.readthedocs.io/>`_ package and should be used
  directly after installing it.
  The goal of this change is to prepare for
  unvendoring joblib in a future version of scikit-learn.
  :issue:`12345` by :user:`Thomas Moreau <tomMoral>`.

Code and Documentation Contributors
-----------------------------------

With thanks to:

^__^, Adrin Jalali, Andrea Navarrete, Andreas Mueller,
bauks, BenjaStudio, Cheuk Ting Ho, Connossor,
Corey Levinson, Dan Stine, daten-kieker, Denis Kataev,
Dillon Gardner, Dmitry Vukolov, Dougal J. Sutherland, Edward J Brown,
Eric Chang, Federico Caselli, Gabriel Marzinotto, Gael Varoquaux,
GauravAhlawat, Gustavo De Mari Pereira, Hanmin Qin, haroldfox,
JackLangerman, Jacopo Notarstefano, janvanrijn, jdethurens,
jeremiedbb, Joel Nothman, Joris Van den Bossche, Koen,
Kushal Chauhan, Lee Yi Jie Joel, Lily Xiong, mail-liam,
Mark Hannel, melsyt, Ming Li, Nicholas Smith,
Nicolas Hug, Nikolay Shebanov, Oleksandr Pavlyk, Olivier Grisel,
Peter Hausamann, Pierre Glaser, Pulkit Maloo, Quentin Batista,
Radostin Stoyanov, Ramil Nugmanov, Rebekah Kim, Reshama Shaikh,
Rohan Singh, Roman Feldbauer, Roman Yurchak, Roopam Sharma,
Sam Waterbury, Scott Lowe, Sebastian Raschka, Stephen Tierney,
SylvainLan, TakingItCasual, Thomas Fan, Thomas Moreau,
Tom Dupré la Tour, Tulio Casagrande, Utkarsh Upadhyay, Xing Han Lu,
Yaroslav Halchenko, Zach Miller


.. _changes_0_20:

Version 0.20.0
==============

**September 25, 2018**

This release packs in a mountain of bug fixes, features and enhancements for
the Scikit-learn library, and improvements to the documentation and examples.
Thanks to our contributors!

This release is dedicated to the memory of Raghav Rajagopalan.

Highlights
----------

We have tried to improve our support for common data-science use-cases
including missing values, categorical variables, heterogeneous data, and
features/targets with unusual distributions.
Missing values in features, represented by NaNs, are now accepted in
column-wise preprocessing such as scalers. Each feature is fitted disregarding
NaNs, and data containing NaNs can be transformed. The new :mod:`sklearn.impute`
module provides estimators for learning despite missing data.

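The idea of fitting a scaler while disregarding NaNs, then transforming data that still contains them, can be sketched with the standard library alone. The two helper functions below are hypothetical and only mimic the behaviour described above for a single feature; they are not the scalers' actual implementation.

```python
import math


def fit_scaler_ignoring_nan(column):
    """Return (mean, std) of a column computed over its non-NaN values."""
    observed = [v for v in column if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    variance = sum((v - mean) ** 2 for v in observed) / len(observed)
    return mean, math.sqrt(variance)


def transform_keeping_nan(column, mean, std):
    """Standardize every value; NaN entries propagate and stay NaN."""
    return [(v - mean) / std for v in column]


column = [1.0, float("nan"), 3.0]
mean, std = fit_scaler_ignoring_nan(column)
scaled = transform_keeping_nan(column, mean, std)
print(mean, std, scaled)
```

The missing entry is left in place so that a later step, such as an imputer, can decide how to fill it.
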
:class:`~compose.ColumnTransformer` handles the case where different features
or columns of a pandas.DataFrame need different preprocessing.
String or pandas Categorical columns can now be encoded with
:class:`~preprocessing.OneHotEncoder` or
:class:`~preprocessing.OrdinalEncoder`.

:class:`~compose.TransformedTargetRegressor` helps when the regression target
needs to be transformed to be modeled. :class:`~preprocessing.PowerTransformer`
and :class:`~preprocessing.KBinsDiscretizer` join
:class:`~preprocessing.QuantileTransformer` as non-linear transformations.

Beyond this, we have added :term:`sample_weight` support to several estimators
(including :class:`~cluster.KMeans`, :class:`~linear_model.BayesianRidge` and
:class:`~neighbors.KernelDensity`) and improved stopping criteria in others
(including :class:`~neural_network.MLPRegressor`,
:class:`~ensemble.GradientBoostingRegressor` and
:class:`~linear_model.SGDRegressor`).

This release is also the first to be accompanied by a :ref:`glossary` developed
by `Joel Nothman`_. The glossary is a reference resource to help users and
contributors become familiar with the terminology and conventions used in
Scikit-learn.

Sorry if your contribution didn't make it into the highlights. There's a lot
here...

Changed models
--------------

The following estimators and functions, when fit with the same data and
parameters, may produce different models from the previous version. This often
occurs due to changes in the modelling logic (bug fixes or enhancements), or in
random sampling procedures.

- :class:`cluster.MeanShift` (bug fix)
- :class:`decomposition.IncrementalPCA` in Python 2 (bug fix)
- :class:`decomposition.SparsePCA` (bug fix)
- :class:`ensemble.GradientBoostingClassifier` (bug fix affecting feature importances)
- :class:`isotonic.IsotonicRegression` (bug fix)
- :class:`linear_model.ARDRegression` (bug fix)
- :class:`linear_model.LogisticRegressionCV` (bug fix)
- :class:`linear_model.OrthogonalMatchingPursuit` (bug fix)
- :class:`linear_model.PassiveAggressiveClassifier` (bug fix)
- :class:`linear_model.PassiveAggressiveRegressor` (bug fix)
- :class:`linear_model.Perceptron` (bug fix)
- :class:`linear_model.SGDClassifier` (bug fix)
- :class:`linear_model.SGDRegressor` (bug fix)
- :func:`metrics.roc_auc_score` (bug fix)
- :func:`metrics.roc_curve` (bug fix)
- `neural_network.BaseMultilayerPerceptron` (bug fix)
- :class:`neural_network.MLPClassifier` (bug fix)
- :class:`neural_network.MLPRegressor` (bug fix)
- The v0.19.0 release notes failed to mention a backwards incompatibility with
  :class:`model_selection.StratifiedKFold` when ``shuffle=True`` due to
  :issue:`7823`.

Details are listed in the changelog below.

(While we are trying to better inform users by providing this information, we
cannot assure that this list is complete.)

Known Major Bugs
----------------

* :issue:`11924`: :class:`linear_model.LogisticRegressionCV` with
  `solver='lbfgs'` and `multi_class='multinomial'` may be non-deterministic or
  otherwise broken on macOS. This appears to be the case on Travis CI servers,
  but has not been confirmed on personal MacBooks! This issue has been present
  in previous releases.

* :issue:`9354`: :func:`metrics.pairwise.euclidean_distances` (which is used
  several times throughout the library) gives results with poor precision,
  which particularly affects its use with 32-bit float inputs. This became
  more problematic in versions 0.18 and 0.19 when some algorithms were changed
  to avoid casting 32-bit data into 64-bit.

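One common way a Euclidean distance loses precision is when it is computed through the expansion ``||x||^2 - 2 x.y + ||y||^2`` instead of from the differences directly. The sketch below assumes that formulation and uses Python's 64-bit floats with large magnitudes to provoke the cancellation; with 32-bit inputs the same effect appears at much smaller magnitudes. Both function names are hypothetical.

```python
import math


def euclidean_direct(x, y):
    """Distance computed from element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_expanded(x, y):
    """Distance computed from the expanded quadratic form."""
    xx = sum(a * a for a in x)
    yy = sum(b * b for b in y)
    xy = sum(a * b for a, b in zip(x, y))
    # Rounding can push the difference slightly below zero, so clip it.
    return math.sqrt(max(xx - 2.0 * xy + yy, 0.0))


# The two points differ by exactly 1, but their squares exceed 2**53.
x = [1e8 + 1.0]
y = [1e8]
print(euclidean_direct(x, y), euclidean_expanded(x, y))
```

The direct form recovers the true distance of 1, while the expanded form does not, because the squared terms cannot be represented exactly before they are subtracted.
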
Changelog
---------

Support for Python 3.3 has been officially dropped.


:mod:`sklearn.cluster`
......................

- |MajorFeature| :class:`cluster.AgglomerativeClustering` now supports Single
  Linkage clustering via ``linkage='single'``. :issue:`9372` by :user:`Leland
  McInnes <lmcinnes>` and :user:`Steve Astels <sastels>`.

- |Feature| :class:`cluster.KMeans` and :class:`cluster.MiniBatchKMeans` now
  support sample weights via the new ``sample_weight`` parameter of ``fit``.
  :issue:`10933` by :user:`Johannes Hansen <jnhansen>`.

- |Efficiency| :class:`cluster.KMeans`, :class:`cluster.MiniBatchKMeans` and
  :func:`cluster.k_means` passed with ``algorithm='full'`` now enforce
  row-major ordering, improving runtime.
  :issue:`10471` by :user:`Gaurav Dhingra <gxyd>`.

- |Efficiency| :class:`cluster.DBSCAN` is now parallelized according to
  ``n_jobs`` regardless of ``algorithm``.
  :issue:`8003` by :user:`Joël Billaud <recamshak>`.

- |Enhancement| :class:`cluster.KMeans` now gives a warning if the number of
  distinct clusters found is smaller than ``n_clusters``. This may occur when
  the number of distinct points in the data set is actually smaller than the
  number of clusters requested.
  :issue:`10059` by :user:`Christian Braune <christianbraune79>`.

- |Fix| Fixed a bug where the ``fit`` method of
  :class:`cluster.AffinityPropagation` stored cluster
  centers as 3d array instead of 2d array in case of non-convergence. For the
  same class, fixed undefined and arbitrary behavior in case of training data
  where all samples had equal similarity.
  :issue:`9612` by :user:`Jonatan Samoocha <jsamoocha>`.

- |Fix| Fixed a bug in :func:`cluster.spectral_clustering` where the
  normalization of the spectrum was using a division instead of a
  multiplication. :issue:`8129`
  by :user:`Jan Margeta <jmargeta>`, :user:`Guillaume Lemaitre <glemaitre>`,
  and :user:`Devansh D. <devanshdalal>`.

- |Fix| Fixed a bug in `cluster.k_means_elkan` where the returned
  ``iteration`` was 1 less than the correct value. Also added the missing
  ``n_iter_`` attribute in the docstring of :class:`cluster.KMeans`.
  :issue:`11353` by :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |Fix| Fixed a bug in :func:`cluster.mean_shift` where the assigned labels
  were not deterministic if there were multiple clusters with the same
  intensities.
  :issue:`11901` by :user:`Adrin Jalali <adrinjalali>`.

- |API| Deprecated the unused ``pooling_func`` parameter in
  :class:`cluster.AgglomerativeClustering`.
  :issue:`9875` by :user:`Kumar Ashutosh <thechargedneutron>`.


:mod:`sklearn.compose`
......................

- New module.

- |MajorFeature| Added :class:`compose.ColumnTransformer`, which allows
  applying different transformers to different columns of arrays or pandas
  DataFrames. :issue:`9012` by `Andreas Müller`_ and `Joris Van den Bossche`_,
  and :issue:`11315` by :user:`Thomas Fan <thomasjpfan>`.

- |MajorFeature| Added the :class:`compose.TransformedTargetRegressor` which
  transforms the target y before fitting a regression model. The predictions
  are mapped back to the original space via an inverse transform. :issue:`9041`
  by `Andreas Müller`_ and :user:`Guillaume Lemaitre <glemaitre>`.


:mod:`sklearn.covariance`
.........................

- |Efficiency| Runtime improvements to :class:`covariance.GraphicalLasso`.
  :issue:`9858` by :user:`Steven Brown <stevendbrown>`.

- |API| `covariance.graph_lasso`,
  `covariance.GraphLasso` and `covariance.GraphLassoCV` have been
  renamed to :func:`covariance.graphical_lasso`,
  :class:`covariance.GraphicalLasso` and :class:`covariance.GraphicalLassoCV`
  respectively; the old names will be removed in version 0.22.
  :issue:`9993` by :user:`Artiem Krinitsyn <artiemq>`.


:mod:`sklearn.datasets`
|
||
|
.......................
|
||
|
|
||
|
- |MajorFeature| Added :func:`datasets.fetch_openml` to fetch datasets from
|
||
|
`OpenML <https://openml.org>`_. OpenML is a free, open data sharing platform
|
||
|
and will be used instead of mldata as it provides better service availability.
|
||
|
:issue:`9908` by `Andreas Müller`_ and :user:`Jan N. van Rijn <janvanrijn>`.
|
||
|
|
||
|
- |Feature| In :func:`datasets.make_blobs`, one can now pass a list to the
|
||
|
``n_samples`` parameter to indicate the number of samples to generate per
|
||
|
cluster. :issue:`8617` by :user:`Maskani Filali Mohamed <maskani-moh>` and
|
||
|
:user:`Konstantinos Katrioplas <kkatrio>`.
|
||
|
|
||
|
- |Feature| Add ``filename`` attribute to :mod:`sklearn.datasets` that have a CSV file.
|
||
|
:issue:`9101` by :user:`alex-33 <alex-33>`
|
||
|
and :user:`Maskani Filali Mohamed <maskani-moh>`.
|
||
|
|
||
|
- |Feature| ``return_X_y`` parameter has been added to several dataset loaders.
|
||
|
:issue:`10774` by :user:`Chris Catalfo <ccatalfo>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in `datasets.load_boston` which had a wrong data
|
||
|
point. :issue:`10795` by :user:`Takeshi Yoshizawa <tarcusx>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`datasets.load_iris` which had two wrong data points.
|
||
|
:issue:`11082` by :user:`Sadhana Srinivasan <rotuna>`
|
||
|
and :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`datasets.fetch_kddcup99`, where data were not
|
||
|
properly shuffled. :issue:`9731` by `Nicolas Goix`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`datasets.make_circles`, where no odd number of
|
||
|
data points could be generated. :issue:`10045` by :user:`Christian Braune
|
||
|
<christianbraune79>`.
|
||
|
|
||
|
- |API| Deprecated `sklearn.datasets.fetch_mldata` to be removed in
|
||
|
version 0.22. mldata.org is no longer operational. Until removal it will
|
||
|
remain possible to load cached datasets. :issue:`11466` by `Joel Nothman`_.
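
As a rough illustration of the ``n_samples`` list and ``return_X_y`` entries above, the following minimal sketch uses only bundled or generated data, so it does not need network access. The cluster sizes and ``random_state`` are arbitrary values chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris, make_blobs

# Passing a list to n_samples requests one cluster per entry,
# with that many points in each cluster.
X, y = make_blobs(n_samples=[10, 20, 30], n_features=2, random_state=0)
counts = np.bincount(y)  # number of generated points per cluster label

# return_X_y=True returns the (data, target) pair instead of a Bunch object.
X_iris, y_iris = load_iris(return_X_y=True)

print(counts, X_iris.shape, y_iris.shape)
```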
|
||
|
|
||
|
:mod:`sklearn.decomposition`
|
||
|
............................
|
||
|
|
||
|
- |Feature| :func:`decomposition.dict_learning` functions and models now
|
||
|
support positivity constraints. This applies to the dictionary and sparse
|
||
|
code. :issue:`6374` by :user:`John Kirkham <jakirkham>`.
|
||
|
|
||
|
- |Feature| |Fix| :class:`decomposition.SparsePCA` now exposes
|
||
|
``normalize_components``. When set to True, the train and test data are
|
||
|
centered with the train mean respectively during the fit phase and the
|
||
|
transform phase. This fixes the behavior of SparsePCA. When set to False,
|
||
|
which is the default, the previous abnormal behaviour still holds. The False
|
||
|
value is for backward compatibility and should not be used. :issue:`11585`
|
||
|
by :user:`Ivan Panico <FollowKenny>`.
|
||
|
|
||
|
- |Efficiency| Efficiency improvements in :func:`decomposition.dict_learning`.
|
||
|
:issue:`11420` and others by :user:`John Kirkham <jakirkham>`.
|
||
|
|
||
|
- |Fix| Fix for uninformative error in :class:`decomposition.IncrementalPCA`:
|
||
|
now an error is raised if the number of components is larger than the
|
||
|
chosen batch size. The ``n_components=None`` case was adapted accordingly.
|
||
|
:issue:`6452` by :user:`Wally Gauze <wallygauze>`.
|
||
|
|
||
|
- |Fix| Fixed a bug where the ``partial_fit`` method of
|
||
|
:class:`decomposition.IncrementalPCA` used integer division instead of float
|
||
|
division on Python 2.
|
||
|
:issue:`9492` by :user:`James Bourbeau <jrbourbeau>`.
|
||
|
|
||
|
- |Fix| In :class:`decomposition.PCA`, selecting an ``n_components`` value greater
|
||
|
than the number of samples now raises an error. Similarly, the
|
||
|
``n_components=None`` case now selects the minimum of ``n_samples`` and
|
||
|
``n_features``.
|
||
|
:issue:`8484` by :user:`Wally Gauze <wallygauze>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`decomposition.PCA` where users would get an
  unexpected error with large datasets when ``n_components='mle'`` on Python 3
  versions.
|
||
|
:issue:`9886` by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Fix| Fixed an underflow in calculating KL-divergence for
|
||
|
:class:`decomposition.NMF`. :issue:`10142` by `Tom Dupre la Tour`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`decomposition.SparseCoder` when running OMP
|
||
|
sparse coding in parallel using read-only memory-mapped data structures.
|
||
|
:issue:`5956` by :user:`Vighnesh Birodkar <vighneshbirodkar>` and
|
||
|
:user:`Olivier Grisel <ogrisel>`.
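
The :class:`decomposition.PCA` entries above can be checked with a small sketch. It assumes a dataset small enough that the default solver performs a full SVD; the array shape and the value ``8`` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(5, 10)  # 5 samples, 10 features

# With n_components=None the fitted model keeps min(n_samples, n_features).
pca = PCA(n_components=None).fit(X)
n_kept = pca.n_components_

# Requesting more components than there are samples is rejected.
try:
    PCA(n_components=8).fit(X)
    raised = False
except ValueError:
    raised = True

print(n_kept, raised)
```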
|
||
|
|
||
|
|
||
|
:mod:`sklearn.discriminant_analysis`
|
||
|
....................................
|
||
|
|
||
|
- |Efficiency| Memory usage improvement for `_class_means` and
|
||
|
`_class_cov` in :mod:`sklearn.discriminant_analysis`. :issue:`10898` by
|
||
|
:user:`Nanxin Chen <bobchennan>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.dummy`
|
||
|
....................
|
||
|
|
||
|
- |Feature| :class:`dummy.DummyRegressor` now has a ``return_std`` option in its
|
||
|
``predict`` method. The returned standard deviations will be zeros.
|
||
|
|
||
|
- |Feature| :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor` now
|
||
|
only require X to be an object with finite length or shape. :issue:`9832` by
|
||
|
:user:`Vrishank Bhardwaj <vrishank97>`.
|
||
|
|
||
|
- |Feature| :class:`dummy.DummyClassifier` and :class:`dummy.DummyRegressor`
|
||
|
can now be scored without supplying test samples.
|
||
|
:issue:`11951` by :user:`Rüdiger Busche <JarnoRFB>`.
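
A minimal sketch of the two dummy-estimator features listed above, using toy arrays made up for the example: ``return_std`` in ``DummyRegressor.predict`` and scoring without test samples by passing ``None`` as ``X``.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

X = np.zeros((4, 1))  # the feature values are ignored by dummy estimators
y_reg = np.array([1.0, 2.0, 3.0, 4.0])
y_clf = np.array([0, 0, 0, 1])

# return_std=True also returns standard deviations, which are all zero.
reg = DummyRegressor(strategy="mean").fit(X, y_reg)
y_pred, y_std = reg.predict(X, return_std=True)

# score accepts X=None because the prediction does not depend on X.
clf = DummyClassifier(strategy="most_frequent").fit(X, y_clf)
accuracy = clf.score(None, y_clf)

print(y_pred, y_std, accuracy)
```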
|
||
|
|
||
|
|
||
|
:mod:`sklearn.ensemble`
|
||
|
.......................
|
||
|
|
||
|
- |Feature| :class:`ensemble.BaggingRegressor` and
|
||
|
:class:`ensemble.BaggingClassifier` can now be fit with missing/non-finite
|
||
|
values in X and/or multi-output Y to support wrapping pipelines that perform
|
||
|
their own imputation. :issue:`9707` by :user:`Jimmy Wan <jimmywan>`.
|
||
|
|
||
|
- |Feature| :class:`ensemble.GradientBoostingClassifier` and
|
||
|
:class:`ensemble.GradientBoostingRegressor` now support early stopping
|
||
|
via ``n_iter_no_change``, ``validation_fraction`` and ``tol``. :issue:`7071`
|
||
|
by `Raghav RV`_.
|
||
|
|
||
|
- |Feature| Added ``named_estimators_`` attribute in
|
||
|
:class:`ensemble.VotingClassifier` to access fitted estimators.
|
||
|
:issue:`9157` by :user:`Herilalaina Rakotoarison <herilalaina>`.
|
||
|
|
||
|
- |Fix| Fixed a bug when fitting :class:`ensemble.GradientBoostingClassifier` or
|
||
|
:class:`ensemble.GradientBoostingRegressor` with ``warm_start=True`` which
|
||
|
previously raised a segmentation fault due to a non-conversion of CSC matrix
|
||
|
into CSR format expected by ``decision_function``. Similarly, Fortran-ordered
|
||
|
arrays are converted to C-ordered arrays in the dense case. :issue:`9991` by
|
||
|
:user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`ensemble.GradientBoostingRegressor`
|
||
|
and :class:`ensemble.GradientBoostingClassifier` to have
|
||
|
feature importances summed and then normalized, rather than normalizing on a
|
||
|
per-tree basis. The previous behavior over-weighted the Gini importance of
|
||
|
features that appear in later stages. This issue only affected feature
|
||
|
importances. :issue:`11176` by :user:`Gil Forsyth <gforsyth>`.
|
||
|
|
||
|
- |API| The default value of the ``n_estimators`` parameter of
|
||
|
:class:`ensemble.RandomForestClassifier`, :class:`ensemble.RandomForestRegressor`,
|
||
|
:class:`ensemble.ExtraTreesClassifier`, :class:`ensemble.ExtraTreesRegressor`,
|
||
|
and :class:`ensemble.RandomTreesEmbedding` will change from 10 in version 0.20
|
||
|
to 100 in 0.22. A FutureWarning is raised when the default value is used.
|
||
|
:issue:`11542` by :user:`Anna Ayzenshtat <annaayzenshtat>`.
|
||
|
|
||
|
- |API| In classes derived from `ensemble.BaseBagging`, the attribute
  ``estimators_samples_`` now returns a list of arrays containing the indices
  selected for each bootstrap instead of a list of arrays containing the mask
  of the samples selected for each bootstrap. Indices can represent repeated
  samples, whereas a mask cannot.
|
||
|
:issue:`9524` by :user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in `ensemble.BaseBagging` where one could not
  deterministically reproduce the ``fit`` result using the object attributes
  when ``random_state`` is set. :issue:`9723` by
  :user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
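
The early-stopping entry for gradient boosting above could be exercised roughly as follows. This is only a sketch: the synthetic dataset and all parameter values are assumptions picked for the example, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out 20% of the training data and stop once the validation score
# has not improved by at least tol for 5 consecutive iterations.
gbc = GradientBoostingClassifier(
    n_estimators=500,
    n_iter_no_change=5,
    validation_fraction=0.2,
    tol=1e-4,
    random_state=0,
)
gbc.fit(X, y)

# n_estimators_ is the number of boosting stages actually fitted.
n_fitted = gbc.n_estimators_
print(n_fitted)
```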
|
||
|
|
||
|
:mod:`sklearn.feature_extraction`
|
||
|
.................................
|
||
|
|
||
|
- |Feature| Enable the call to `get_feature_names` in unfitted
|
||
|
:class:`feature_extraction.text.CountVectorizer` initialized with a
|
||
|
vocabulary. :issue:`10908` by :user:`Mohamed Maskani <maskani-moh>`.
|
||
|
|
||
|
- |Enhancement| ``idf_`` can now be set on a
|
||
|
:class:`feature_extraction.text.TfidfTransformer`.
|
||
|
:issue:`10899` by :user:`Sergey Melderis <serega>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`feature_extraction.image.extract_patches_2d` which
|
||
|
would throw an exception if ``max_patches`` was greater than or equal to the
|
||
|
number of all possible patches rather than simply returning the number of
|
||
|
possible patches. :issue:`10101` by :user:`Varun Agrawal <varunagrawal>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`feature_extraction.text.CountVectorizer`,
|
||
|
:class:`feature_extraction.text.TfidfVectorizer`,
|
||
|
:class:`feature_extraction.text.HashingVectorizer` to support 64 bit sparse
|
||
|
array indexing necessary to process large datasets with more than 2·10⁹ tokens
|
||
|
(words or n-grams). :issue:`9147` by :user:`Claes-Fredrik Mannby <mannby>`
|
||
|
and `Roman Yurchak`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`feature_extraction.text.TfidfVectorizer` which
|
||
|
was ignoring the parameter ``dtype``. In addition,
|
||
|
:class:`feature_extraction.text.TfidfTransformer` will preserve ``dtype``
|
||
|
for floating-point input and raise a warning if the requested ``dtype`` is integer.
|
||
|
:issue:`10441` by :user:`Mayur Kulkarni <maykulkarni>` and
|
||
|
:user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.feature_selection`
|
||
|
................................
|
||
|
|
||
|
- |Feature| Added select K best features functionality to
|
||
|
:class:`feature_selection.SelectFromModel`.
|
||
|
:issue:`6689` by :user:`Nihar Sheth <nsheth12>` and
|
||
|
:user:`Quazi Rahman <qmaruf>`.
|
||
|
|
||
|
- |Feature| Added ``min_features_to_select`` parameter to
|
||
|
:class:`feature_selection.RFECV` to bound evaluated features counts.
|
||
|
:issue:`11293` by :user:`Brent Yi <brentyi>`.
|
||
|
|
||
|
- |Feature| :class:`feature_selection.RFECV`'s fit method now supports
|
||
|
:term:`groups`. :issue:`9656` by :user:`Adam Greenhall <adamgreenhall>`.
|
||
|
|
||
|
- |Fix| Fixed computation of ``n_features_to_compute`` for edge case with tied
|
||
|
CV scores in :class:`feature_selection.RFECV`.
|
||
|
:issue:`9222` by :user:`Nick Hoh <nickypie>`.
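
A hedged sketch of the "select K best features" behaviour of :class:`feature_selection.SelectFromModel` mentioned above. The parameter names ``max_features`` and ``threshold`` come from the scikit-learn API reference rather than from the entry itself, and the random forest is just one possible importance-producing estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# threshold=-np.inf disables the importance threshold, so selection is
# driven only by max_features: keep the 2 most important features.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold=-np.inf,
    max_features=2,
)
X_top = selector.fit_transform(X, y)
print(X_top.shape, selector.get_support())
```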
|
||
|
|
||
|
:mod:`sklearn.gaussian_process`
|
||
|
...............................
|
||
|
|
||
|
- |Efficiency| In :class:`gaussian_process.GaussianProcessRegressor`, method
|
||
|
``predict`` is faster when using ``return_std=True``, in particular when
|
||
|
called several times in a row. :issue:`9234` by :user:`andrewww <andrewww>`
|
||
|
and :user:`Minghui Liu <minghui-liu>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.impute`
|
||
|
.....................
|
||
|
|
||
|
- New module, adopting ``preprocessing.Imputer`` as
|
||
|
:class:`impute.SimpleImputer` with minor changes (see under preprocessing
|
||
|
below).
|
||
|
|
||
|
- |MajorFeature| Added :class:`impute.MissingIndicator` which generates a
|
||
|
binary indicator for missing values. :issue:`8075` by :user:`Maniteja Nandana
|
||
|
<maniteja123>` and :user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
|
||
|
- |Feature| The :class:`impute.SimpleImputer` has a new strategy,
|
||
|
``'constant'``, to complete missing values with a fixed one, given by the
|
||
|
``fill_value`` parameter. This strategy supports numeric and non-numeric
|
||
|
data, and so does the ``'most_frequent'`` strategy now. :issue:`11211` by
|
||
|
:user:`Jeremie du Boisberranger <jeremiedbb>`.
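
To make the two :mod:`sklearn.impute` additions above concrete, here is a small sketch on a made-up 2x2 array. The fill value ``0`` is an arbitrary choice.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer

X = np.array([[1.0, np.nan],
              [np.nan, 3.0]])

# strategy='constant' replaces every missing entry with fill_value.
filled = SimpleImputer(strategy="constant", fill_value=0).fit_transform(X)

# MissingIndicator returns a boolean mask for the features that
# contained missing values during fit.
mask = MissingIndicator().fit_transform(X)

print(filled)
print(mask)
```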
|
||
|
|
||
|
|
||
|
:mod:`sklearn.isotonic`
|
||
|
.......................
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`isotonic.IsotonicRegression` which incorrectly
|
||
|
combined weights when fitting a model to data involving points with
|
||
|
identical X values.
|
||
|
:issue:`9484` by :user:`Dallas Card <dallascard>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.linear_model`
|
||
|
...........................
|
||
|
|
||
|
- |Feature| :class:`linear_model.SGDClassifier`,
|
||
|
:class:`linear_model.SGDRegressor`,
|
||
|
:class:`linear_model.PassiveAggressiveClassifier`,
|
||
|
:class:`linear_model.PassiveAggressiveRegressor` and
|
||
|
:class:`linear_model.Perceptron` now expose ``early_stopping``,
|
||
|
``validation_fraction`` and ``n_iter_no_change`` parameters, to stop
|
||
|
optimization monitoring the score on a validation set. A new learning rate
|
||
|
``"adaptive"`` strategy divides the learning rate by 5 each time
|
||
|
``n_iter_no_change`` consecutive epochs fail to improve the model.
|
||
|
:issue:`9043` by `Tom Dupre la Tour`_.
|
||
|
|
||
|
- |Feature| Add `sample_weight` parameter to the fit method of
|
||
|
:class:`linear_model.BayesianRidge` for weighted linear regression.
|
||
|
:issue:`10112` by :user:`Peter St. John <pstjohn>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in `logistic.logistic_regression_path` to ensure
|
||
|
that the returned coefficients are correct when ``multi_class='multinomial'``.
|
||
|
Previously, some of the coefficients would override each other, leading to
|
||
|
incorrect results in :class:`linear_model.LogisticRegressionCV`.
|
||
|
:issue:`11724` by :user:`Nicolas Hug <NicolasHug>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.LogisticRegression` where when using
|
||
|
the parameter ``multi_class='multinomial'``, the ``predict_proba`` method was
|
||
|
returning incorrect probabilities in the case of binary outcomes.
|
||
|
:issue:`9939` by :user:`Roger Westover <rwolst>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.LogisticRegressionCV` where the
|
||
|
``score`` method always computes accuracy, not the metric given by
|
||
|
the ``scoring`` parameter.
|
||
|
:issue:`10998` by :user:`Thomas Fan <thomasjpfan>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.LogisticRegressionCV` where the
|
||
|
'ovr' strategy was always used to compute cross-validation scores in the
|
||
|
multiclass setting, even if ``'multinomial'`` was set.
|
||
|
:issue:`8720` by :user:`William de Vazelhes <wdevazelhes>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.OrthogonalMatchingPursuit` that was
|
||
|
broken when setting ``normalize=False``.
|
||
|
:issue:`10071` by `Alexandre Gramfort`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.ARDRegression` which caused
|
||
|
incorrectly updated estimates for the standard deviation and the
|
||
|
coefficients. :issue:`10153` by :user:`Jörg Döpfert <jdoepfert>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.ARDRegression` and
|
||
|
:class:`linear_model.BayesianRidge` which caused NaN predictions when fitted
|
||
|
with a constant target.
|
||
|
:issue:`10095` by :user:`Jörg Döpfert <jdoepfert>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.RidgeClassifierCV` where
|
||
|
the parameter ``store_cv_values`` was not implemented though
|
||
|
it was documented in ``cv_values`` as a way to set up the storage
|
||
|
of cross-validation values for different alphas. :issue:`10297` by
|
||
|
:user:`Mabel Villalba-Jiménez <mabelvj>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.ElasticNet` which caused the input
|
||
|
to be overridden when using parameter ``copy_X=True`` and
|
||
|
``check_input=False``. :issue:`10581` by :user:`Yacine Mazari <ymazari>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`sklearn.linear_model.Lasso`
|
||
|
where the coefficient had wrong shape when ``fit_intercept=False``.
|
||
|
:issue:`10687` by :user:`Martin Hahn <martin-hahn>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.LogisticRegression` when using
  ``multi_class='multinomial'`` with binary output and ``warm_start=True``.
  :issue:`10836` by :user:`Aishwarya Srinivasan <aishgrt1>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.RidgeCV` where using integer
|
||
|
``alphas`` raised an error.
|
||
|
:issue:`10397` by :user:`Mabel Villalba-Jiménez <mabelvj>`.
|
||
|
|
||
|
- |Fix| Fixed condition triggering gap computation in
|
||
|
:class:`linear_model.Lasso` and :class:`linear_model.ElasticNet` when working
|
||
|
with sparse matrices. :issue:`10992` by `Alexandre Gramfort`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`linear_model.SGDClassifier`,
|
||
|
:class:`linear_model.SGDRegressor`,
|
||
|
:class:`linear_model.PassiveAggressiveClassifier`,
|
||
|
:class:`linear_model.PassiveAggressiveRegressor` and
|
||
|
:class:`linear_model.Perceptron`, where the stopping criterion was stopping
|
||
|
the algorithm before convergence. A parameter ``n_iter_no_change`` was added
|
||
|
and set by default to 5. Previous behavior is equivalent to setting the
|
||
|
parameter to 1. :issue:`9043` by `Tom Dupre la Tour`_.
|
||
|
|
||
|
- |Fix| Fixed a bug where liblinear and libsvm-based estimators would segfault
|
||
|
if passed a scipy.sparse matrix with 64-bit indices. They now raise a
|
||
|
ValueError.
|
||
|
:issue:`11327` by :user:`Karan Dhingra <kdhingra307>` and `Joel Nothman`_.
|
||
|
|
||
|
- |API| The default values of the ``solver`` and ``multi_class`` parameters of
|
||
|
:class:`linear_model.LogisticRegression` will change respectively from
|
||
|
``'liblinear'`` and ``'ovr'`` in version 0.20 to ``'lbfgs'`` and
|
||
|
``'auto'`` in version 0.22. A FutureWarning is raised when the default
|
||
|
values are used. :issue:`11905` by `Tom Dupre la Tour`_ and `Joel Nothman`_.
|
||
|
|
||
|
- |API| Deprecate ``positive=True`` option in :class:`linear_model.Lars` as
|
||
|
the underlying implementation is broken. Use :class:`linear_model.Lasso`
|
||
|
instead. :issue:`9837` by `Alexandre Gramfort`_.
|
||
|
|
||
|
- |API| ``n_iter_`` may vary from previous releases in
|
||
|
:class:`linear_model.LogisticRegression` with ``solver='lbfgs'`` and
|
||
|
:class:`linear_model.HuberRegressor`. For Scipy <= 1.0.0, the optimizer could
|
||
|
perform more than the requested maximum number of iterations. Now both
|
||
|
estimators will report at most ``max_iter`` iterations even if more were
|
||
|
performed. :issue:`10723` by `Joel Nothman`_.
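
The early-stopping parameters and the ``"adaptive"`` learning rate described for the SGD-based estimators above might be combined as in this sketch. The dataset and every numeric value are assumptions for illustration; ``eta0`` is set explicitly because the adaptive schedule needs a positive initial learning rate.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Monitor accuracy on a 10% validation split and stop after 5 epochs
# without improvement; 'adaptive' shrinks the learning rate along the way.
sgd = SGDClassifier(
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=5,
    learning_rate="adaptive",
    eta0=0.01,
    max_iter=1000,
    tol=1e-3,
    random_state=0,
)
sgd.fit(X, y)

# n_iter_ reports how many epochs were actually run.
print(sgd.n_iter_)
```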
|
||
|
|
||
|
|
||
|
:mod:`sklearn.manifold`
|
||
|
.......................
|
||
|
|
||
|
- |Efficiency| Speed improvements for both 'exact' and 'barnes_hut' methods in
|
||
|
:class:`manifold.TSNE`. :issue:`10593` and :issue:`10610` by
|
||
|
`Tom Dupre la Tour`_.
|
||
|
|
||
|
- |Feature| Support sparse input in :meth:`manifold.Isomap.fit`.
|
||
|
:issue:`8554` by :user:`Leland McInnes <lmcinnes>`.
|
||
|
|
||
|
- |Feature| `manifold.t_sne.trustworthiness` accepts metrics other than
|
||
|
Euclidean. :issue:`9775` by :user:`William de Vazelhes <wdevazelhes>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`manifold.spectral_embedding` where the
|
||
|
normalization of the spectrum was using a division instead of a
|
||
|
multiplication. :issue:`8129` by :user:`Jan Margeta <jmargeta>`,
|
||
|
:user:`Guillaume Lemaitre <glemaitre>`, and :user:`Devansh D.
|
||
|
<devanshdalal>`.
|
||
|
|
||
|
- |API| |Feature| Deprecate ``precomputed`` parameter in function
|
||
|
`manifold.t_sne.trustworthiness`. Instead, the new parameter ``metric``
|
||
|
should be used with any compatible metric including 'precomputed', in which
|
||
|
case the input matrix ``X`` should be a matrix of pairwise distances or
|
||
|
squared distances. :issue:`9775` by :user:`William de Vazelhes
|
||
|
<wdevazelhes>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.metrics`
|
||
|
......................
|
||
|
|
||
|
- |MajorFeature| Added the :func:`metrics.davies_bouldin_score` metric for
|
||
|
evaluation of clustering models without a ground truth. :issue:`10827` by
|
||
|
:user:`Luis Osa <logc>`.
|
||
|
|
||
|
- |MajorFeature| Added the :func:`metrics.balanced_accuracy_score` metric and
|
||
|
a corresponding ``'balanced_accuracy'`` scorer for binary and multiclass
|
||
|
classification. :issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia
|
||
|
<dalmia>`, and :issue:`10587` by `Joel Nothman`_.
|
||
|
|
||
|
- |Feature| Partial AUC is available via ``max_fpr`` parameter in
|
||
|
:func:`metrics.roc_auc_score`. :issue:`3840` by
|
||
|
:user:`Alexander Niederbühl <Alexander-N>`.
|
||
|
|
||
|
- |Feature| A scorer based on :func:`metrics.brier_score_loss` is also
|
||
|
available. :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Feature| Added control over the normalization in
|
||
|
:func:`metrics.normalized_mutual_info_score` and
|
||
|
:func:`metrics.adjusted_mutual_info_score` via the ``average_method``
|
||
|
parameter. In version 0.22, the default normalizer for each will become
|
||
|
the *arithmetic* mean of the entropies of each clustering. :issue:`11124` by
|
||
|
:user:`Arya McCarthy <aryamccarthy>`.
|
||
|
|
||
|
- |Feature| Added ``output_dict`` parameter in :func:`metrics.classification_report`
|
||
|
to return classification statistics as dictionary.
|
||
|
:issue:`11160` by :user:`Dan Barkhorn <danielbarkhorn>`.
|
||
|
|
||
|
- |Feature| :func:`metrics.classification_report` now reports all applicable averages on
|
||
|
the given data, including micro, macro and weighted average as well as samples
|
||
|
average for multilabel data. :issue:`11679` by :user:`Alexander Pacha <apacha>`.
|
||
|
|
||
|
- |Feature| :func:`metrics.average_precision_score` now supports binary
|
||
|
``y_true`` other than ``{0, 1}`` or ``{-1, 1}`` through ``pos_label``
|
||
|
parameter. :issue:`9980` by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Feature| :func:`metrics.label_ranking_average_precision_score` now supports
|
||
|
``sample_weight``.
|
||
|
:issue:`10845` by :user:`Jose Perez-Parras Toledano <jopepato>`.
|
||
|
|
||
|
- |Feature| Add ``dense_output`` parameter to :func:`metrics.pairwise.linear_kernel`.
|
||
|
When ``False`` and both inputs are sparse, a sparse matrix is returned.
|
||
|
:issue:`10999` by :user:`Taylor G Smith <tgsmith61591>`.
|
||
|
|
||
|
- |Efficiency| :func:`metrics.silhouette_score` and
|
||
|
:func:`metrics.silhouette_samples` are more memory efficient and run
|
||
|
faster. This avoids some reported freezes and MemoryErrors.
|
||
|
:issue:`11135` by `Joel Nothman`_.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`metrics.precision_recall_fscore_support`
|
||
|
when truncated `range(n_labels)` is passed as value for `labels`.
|
||
|
:issue:`10377` by :user:`Gaurav Dhingra <gxyd>`.
|
||
|
|
||
|
- |Fix| Fixed a bug due to floating point error in
|
||
|
:func:`metrics.roc_auc_score` with non-integer sample weights. :issue:`9786`
|
||
|
by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Fix| Fixed a bug where :func:`metrics.roc_curve` sometimes starts on y-axis
|
||
|
instead of (0, 0), which is inconsistent with the documentation and other
|
||
|
implementations. Note that this will not influence the result from
|
||
|
:func:`metrics.roc_auc_score`. :issue:`10093` by :user:`alexryndin
|
||
|
<alexryndin>` and :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Fix| Fixed a bug to avoid integer overflow: the product is now cast to a 64-bit integer in
|
||
|
:func:`metrics.mutual_info_score`.
|
||
|
:issue:`9772` by :user:`Kumar Ashutosh <thechargedneutron>`.
|
||
|
|
||
|
- |Fix| Fixed a bug where :func:`metrics.average_precision_score` will sometimes return
|
||
|
``nan`` when ``sample_weight`` contains 0.
|
||
|
:issue:`9980` by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :func:`metrics.fowlkes_mallows_score` to avoid integer
|
||
|
overflow. The return value of `contingency_matrix` is now cast to `int64`, and the
product of square roots is computed rather than the square root of the product.
|
||
|
:issue:`9515` by :user:`Alan Liddell <aliddell>` and
|
||
|
:user:`Manh Dao <manhdao>`.
|
||
|
|
||
|
- |API| Deprecate ``reorder`` parameter in :func:`metrics.auc` as it's no
|
||
|
longer required for :func:`metrics.roc_auc_score`. Moreover using
|
||
|
``reorder=True`` can hide bugs due to floating point error in the input.
|
||
|
:issue:`9851` by :user:`Hanmin Qin <qinhanmin2014>`.
|
||
|
|
||
|
- |API| In :func:`metrics.normalized_mutual_info_score` and
|
||
|
:func:`metrics.adjusted_mutual_info_score`, warn that
|
||
|
``average_method`` will have a new default value. In version 0.22, the
|
||
|
default normalizer for each will become the *arithmetic* mean of the
|
||
|
entropies of each clustering. Currently,
|
||
|
:func:`metrics.normalized_mutual_info_score` uses the default of
|
||
|
``average_method='geometric'``, and
|
||
|
:func:`metrics.adjusted_mutual_info_score` uses the default of
|
||
|
``average_method='max'`` to match their behaviors in version 0.19.
|
||
|
:issue:`11124` by :user:`Arya McCarthy <aryamccarthy>`.
|
||
|
|
||
|
- |API| The ``batch_size`` parameter to :func:`metrics.pairwise_distances_argmin_min`
|
||
|
and :func:`metrics.pairwise_distances_argmin` is deprecated to be removed in
|
||
|
v0.22. It no longer has any effect, as batch size is determined by global
|
||
|
``working_memory`` config. See :ref:`working_memory`. :issue:`10280` by `Joel
|
||
|
Nothman`_ and :user:`Aman Dalmia <dalmia>`.
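
A brief sketch of three of the metric additions listed above: ``balanced_accuracy_score``, the ``output_dict`` option of ``classification_report``, and partial AUC through ``max_fpr``. The labels and scores are toy values invented for the example.

```python
from sklearn.metrics import (
    balanced_accuracy_score,
    classification_report,
    roc_auc_score,
)

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]
y_score = [0.1, 0.3, 0.6, 0.7, 0.9]

# Balanced accuracy is the mean of per-class recall: (2/3 + 1) / 2.
bal_acc = balanced_accuracy_score(y_true, y_pred)

# output_dict=True returns nested dictionaries instead of a text table.
report = classification_report(y_true, y_pred, output_dict=True)

# max_fpr restricts the AUC to the region with false positive rate <= 0.5.
partial_auc = roc_auc_score(y_true, y_score, max_fpr=0.5)

print(bal_acc, report["macro avg"]["recall"], partial_auc)
```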
|
||
|
|
||
|
|
||
|
:mod:`sklearn.mixture`
|
||
|
......................
|
||
|
|
||
|
- |Feature| Added function :term:`fit_predict` to :class:`mixture.GaussianMixture`
|
||
|
and :class:`mixture.BayesianGaussianMixture`, which is essentially equivalent to
|
||
|
calling :term:`fit` and :term:`predict`. :issue:`10336` by :user:`Shu Haoran
|
||
|
<haoranShu>` and :user:`Andrew Peng <Andrew-peng>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in `mixture.BaseMixture` where the reported `n_iter_` was
|
||
|
missing an iteration. It affected :class:`mixture.GaussianMixture` and
|
||
|
:class:`mixture.BayesianGaussianMixture`. :issue:`10740` by :user:`Erich
|
||
|
Schubert <kno10>` and :user:`Guillaume Lemaitre <glemaitre>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in `mixture.BaseMixture` and its subclasses
|
||
|
:class:`mixture.GaussianMixture` and :class:`mixture.BayesianGaussianMixture`
|
||
|
where the ``lower_bound_`` was not the max lower bound across all
|
||
|
initializations (when ``n_init > 1``), but just the lower bound of the last
|
||
|
initialization. :issue:`10869` by :user:`Aurélien Géron <ageron>`.
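
For the ``fit_predict`` addition above, a minimal sketch on two well-separated synthetic blobs, assuming the method is called on both mixture classes of this module. The data and component count are made up for the example.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture, GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-5.0, 1.0, size=(50, 2)),
    rng.normal(5.0, 1.0, size=(50, 2)),
])

# fit_predict fits the model and returns the component label of each sample.
gm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
bgm_labels = BayesianGaussianMixture(n_components=2, random_state=0).fit_predict(X)

print(np.bincount(gm_labels), np.bincount(bgm_labels))
```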
|
||
|
|
||
|
|
||
|
:mod:`sklearn.model_selection`
|
||
|
..............................
|
||
|
|
||
|
- |Feature| Add `return_estimator` parameter in
|
||
|
:func:`model_selection.cross_validate` to return estimators fitted on each
|
||
|
split. :issue:`9686` by :user:`Aurélien Bellet <bellet>`.
|
||
|
|
||
|
- |Feature| New ``refit_time_`` attribute will be stored in
|
||
|
:class:`model_selection.GridSearchCV` and
|
||
|
:class:`model_selection.RandomizedSearchCV` if ``refit`` is set to ``True``.
|
||
|
This will allow measuring the complete time it takes to perform
|
||
|
hyperparameter optimization and refitting the best model on the whole
|
||
|
dataset. :issue:`11310` by :user:`Matthias Feurer <mfeurer>`.
|
||
|
|
||
|
- |Feature| Expose `error_score` parameter in
|
||
|
:func:`model_selection.cross_validate`,
|
||
|
:func:`model_selection.cross_val_score`,
|
||
|
:func:`model_selection.learning_curve` and
|
||
|
:func:`model_selection.validation_curve` to control the behavior triggered
|
||
|
when an error occurs in `model_selection._fit_and_score`.
|
||
|
:issue:`11576` by :user:`Samuel O. Ronsin <samronsin>`.
|
||
|
|
||
|
- |Feature| `BaseSearchCV` now has an experimental, private interface to
|
||
|
support customized parameter search strategies, through its ``_run_search``
|
||
|
method. See the implementations in :class:`model_selection.GridSearchCV` and
|
||
|
:class:`model_selection.RandomizedSearchCV` and please provide feedback if
|
||
|
you use this. Note that we do not assure the stability of this API beyond
|
||
|
version 0.20. :issue:`9599` by `Joel Nothman`_.
|
||
|
|
||
|
- |Enhancement| Add improved error message in
|
||
|
:func:`model_selection.cross_val_score` when multiple metrics are passed in
|
||
|
``scoring`` keyword. :issue:`11006` by :user:`Ming Li <minggli>`.
|
||
|
|
||
|
- |API| The default number of cross-validation folds ``cv`` and the default
|
||
|
number of splits ``n_splits`` in the :class:`model_selection.KFold`-like
|
||
|
splitters will change from 3 to 5 in 0.22 as 3-fold has a lot of variance.
|
||
|
:issue:`11557` by :user:`Alexandre Boucaud <aboucaud>`.
|
||
|
|
||
|
- |API| The default of ``iid`` parameter of :class:`model_selection.GridSearchCV`
|
||
|
and :class:`model_selection.RandomizedSearchCV` will change from ``True`` to
|
||
|
``False`` in version 0.22 to correspond to the standard definition of
|
||
|
cross-validation, and the parameter will be removed in version 0.24
|
||
|
altogether. This parameter is of greatest practical significance where the
|
||
|
sizes of different test sets in cross-validation were very unequal, i.e. in
|
||
|
group-based CV strategies. :issue:`9085` by :user:`Laurent Direr <ldirer>`
|
||
|
and `Andreas Müller`_.
|
||
|
|
||
|
- |API| The default value of the ``error_score`` parameter in
|
||
|
:class:`model_selection.GridSearchCV` and
|
||
|
:class:`model_selection.RandomizedSearchCV` will change to ``np.nan`` in
|
||
|
version 0.22. :issue:`10677` by :user:`Kirill Zhdanovich <Zhdanovich>`.
|
||
|
|
||
|
- |API| Changed ValueError exception raised in
|
||
|
:class:`model_selection.ParameterSampler` to a UserWarning for case where the
|
||
|
class is instantiated with a greater value of ``n_iter`` than the total space
|
||
|
of parameters in the parameter grid. ``n_iter`` now acts as an upper bound on
|
||
|
iterations. :issue:`10982` by :user:`Juliet Lawton <julietcl>`.
|
||
|
|
||
|
- |API| Invalid input for :class:`model_selection.ParameterGrid` now
|
||
|
raises TypeError.
|
||
|
:issue:`10928` by :user:`Solutus Immensus <solutusimmensus>`.
|
||
|
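
The ``return_estimator`` and ``refit_time_`` entries above can be tried out with a sketch like the following. The estimator, parameter grid and ``cv=5`` are arbitrary choices; ``cv`` is passed explicitly so the example does not depend on the changing default.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)

# return_estimator=True adds the estimator fitted on each split
# under the 'estimator' key of the result dictionary.
cv_results = cross_validate(est, X, y, cv=5, return_estimator=True)

# With refit=True, refit_time_ stores the seconds spent refitting the
# best configuration on the whole dataset.
search = GridSearchCV(est, {"C": [0.1, 1.0, 10.0]}, cv=5, refit=True)
search.fit(X, y)

print(len(cv_results["estimator"]), search.refit_time_)
```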
|
||
|
|
||
|
:mod:`sklearn.multioutput`
|
||
|
..........................
|
||
|
|
||
|
- |MajorFeature| Added :class:`multioutput.RegressorChain` for multi-target
|
||
|
regression. :issue:`9257` by :user:`Kumar Ashutosh <thechargedneutron>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.naive_bayes`
|
||
|
..........................
|
||
|
|
||
|
- |MajorFeature| Added :class:`naive_bayes.ComplementNB`, which implements the
|
||
|
Complement Naive Bayes classifier described in Rennie et al. (2003).
|
||
|
:issue:`8190` by :user:`Michael A. Alcorn <airalcorn2>`.
|
||
|
|
||
|
- |Feature| Add `var_smoothing` parameter in :class:`naive_bayes.GaussianNB`
|
||
|
to give precise control over the variance calculation.
|
||
|
:issue:`9681` by :user:`Dmitry Mottl <Mottl>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`naive_bayes.GaussianNB` which incorrectly
|
||
|
raised an error for a prior list which summed to 1.
|
||
|
:issue:`10005` by :user:`Gaurav Dhingra <gxyd>`.
|
||
|
|
||
|
- |Fix| Fixed a bug in :class:`naive_bayes.MultinomialNB` which did not accept
|
||
|
vector valued pseudocounts (alpha).
|
||
|
:issue:`10346` by :user:`Tobias Madsen <TobiasMadsen>`.
|
||
|
|
||
|
|
||
|
:mod:`sklearn.neighbors`
........................

- |Efficiency| :class:`neighbors.RadiusNeighborsRegressor` and
  :class:`neighbors.RadiusNeighborsClassifier` are now
  parallelized according to ``n_jobs`` regardless of ``algorithm``.
  :issue:`10887` by :user:`Joël Billaud <recamshak>`.

- |Efficiency| :mod:`sklearn.neighbors` query methods are now more
  memory efficient when ``algorithm='brute'``.
  :issue:`11136` by `Joel Nothman`_ and :user:`Aman Dalmia <dalmia>`.

- |Feature| Add ``sample_weight`` parameter to the fit method of
  :class:`neighbors.KernelDensity` to enable weighting in kernel density
  estimation.
  :issue:`4394` by :user:`Samuel O. Ronsin <samronsin>`.

- |Feature| Novelty detection with :class:`neighbors.LocalOutlierFactor`:
  Add a ``novelty`` parameter to :class:`neighbors.LocalOutlierFactor`. When
  ``novelty`` is set to True, :class:`neighbors.LocalOutlierFactor` can then
  be used for novelty detection, i.e. predict on new unseen data. Available
  prediction methods are ``predict``, ``decision_function`` and
  ``score_samples``. By default, ``novelty`` is set to ``False``, and only
  the ``fit_predict`` method is available.
  By :user:`Albert Thomas <albertcthomas>`.

- |Fix| Fixed a bug in :class:`neighbors.NearestNeighbors` where fitting a
  NearestNeighbors model fails when a) the distance metric used is a
  callable and b) the input to the NearestNeighbors model is sparse.
  :issue:`9579` by :user:`Thomas Kober <tttthomasssss>`.

- |Fix| Fixed a bug so ``predict`` in
  :class:`neighbors.RadiusNeighborsRegressor` can handle an empty neighbor set
  when using non-uniform weights. Also raises a new warning when no neighbors
  are found for samples. :issue:`9655` by :user:`Andreas Bjerre-Nielsen
  <abjer>`.

- |Fix| |Efficiency| Fixed a bug in ``KDTree`` construction that results in
  faster construction and querying times.
  :issue:`11556` by :user:`Jake VanderPlas <jakevdp>`.

- |Fix| Fixed a bug in :class:`neighbors.KDTree` and :class:`neighbors.BallTree` where
  pickled tree objects would change their type to the super class `BinaryTree`.
  :issue:`11774` by :user:`Nicolas Hug <NicolasHug>`.


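As an illustration of the ``novelty`` option of
:class:`neighbors.LocalOutlierFactor` listed above, here is a minimal sketch.
The synthetic data, the ``n_neighbors`` value and the variable names are
assumptions chosen for the example, not part of the changelog.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data: a Gaussian blob for training, two new points to score.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))
X_new = np.array([[0.0, 0.1], [6.0, 6.0]])  # one typical point, one far away

# novelty=True makes predict, decision_function and score_samples
# available for data that was not seen during fit.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
labels = lof.predict(X_new)            # +1 for inliers, -1 for outliers
scores = lof.decision_function(X_new)  # negative values flag outliers
```

With the default ``novelty=False``, only ``fit_predict`` on the training data
is available.
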
:mod:`sklearn.neural_network`
.............................

- |Feature| Add `n_iter_no_change` parameter in
  `neural_network.BaseMultilayerPerceptron`,
  :class:`neural_network.MLPRegressor`, and
  :class:`neural_network.MLPClassifier` to give control over the
  maximum number of epochs allowed without meeting the ``tol`` improvement.
  :issue:`9456` by :user:`Nicholas Nadeau <nnadeau>`.

- |Fix| Fixed a bug in `neural_network.BaseMultilayerPerceptron`,
  :class:`neural_network.MLPRegressor`, and
  :class:`neural_network.MLPClassifier`: the new ``n_iter_no_change``
  parameter is now set to 10 instead of the previously hardcoded 2.
  :issue:`9456` by :user:`Nicholas Nadeau <nnadeau>`.

- |Fix| Fixed a bug in :class:`neural_network.MLPRegressor` where fitting
  quit unexpectedly early due to local minima or fluctuations.
  :issue:`9456` by :user:`Nicholas Nadeau <nnadeau>`.


:mod:`sklearn.pipeline`
.......................

- |Feature| The ``predict`` method of :class:`pipeline.Pipeline` now passes
  keyword arguments on to the pipeline's last estimator, enabling the use,
  with caution, of parameters such as ``return_std`` in a pipeline.
  :issue:`9304` by :user:`Breno Freitas <brenolf>`.

- |API| :class:`pipeline.FeatureUnion` now supports ``'drop'`` as a transformer
  to drop features. :issue:`11144` by :user:`Thomas Fan <thomasjpfan>`.


:mod:`sklearn.preprocessing`
............................

- |MajorFeature| Expanded :class:`preprocessing.OneHotEncoder` to allow
  encoding categorical string features as a numeric array using a one-hot (or
  dummy) encoding scheme, and added :class:`preprocessing.OrdinalEncoder` to
  convert to ordinal integers. These two classes now handle encoding of all
  feature types (including string-valued features) and derive the
  categories based on the unique values in the features instead of the maximum
  value in the features. :issue:`9151` and :issue:`10521` by :user:`Vighnesh
  Birodkar <vighneshbirodkar>` and `Joris Van den Bossche`_.

- |MajorFeature| Added :class:`preprocessing.KBinsDiscretizer` for turning
  continuous features into categorical or one-hot encoded
  features. :issue:`7668`, :issue:`9647`, :issue:`10195`,
  :issue:`10192`, :issue:`11272`, :issue:`11467` and :issue:`11505`
  by :user:`Henry Lin <hlin117>`, `Hanmin Qin`_,
  `Tom Dupre la Tour`_ and :user:`Giovanni Giuseppe Costa <ggc87>`.

- |MajorFeature| Added :class:`preprocessing.PowerTransformer`, which
  implements the Yeo-Johnson and Box-Cox power transformations. Power
  transformations try to find a set of feature-wise parametric transformations
  to approximately map data to a Gaussian distribution centered at zero and
  with unit variance. This is useful as a variance-stabilizing transformation
  in situations where normality and homoscedasticity are desirable.
  :issue:`10210` by :user:`Eric Chang <chang>` and :user:`Maniteja
  Nandana <maniteja123>`, and :issue:`11520` by :user:`Nicolas Hug
  <nicolashug>`.

- |MajorFeature| NaN values are ignored and handled in the following
  preprocessing methods:
  :class:`preprocessing.MaxAbsScaler`,
  :class:`preprocessing.MinMaxScaler`,
  :class:`preprocessing.RobustScaler`,
  :class:`preprocessing.StandardScaler`,
  :class:`preprocessing.PowerTransformer`,
  :class:`preprocessing.QuantileTransformer` classes and
  :func:`preprocessing.maxabs_scale`,
  :func:`preprocessing.minmax_scale`,
  :func:`preprocessing.robust_scale`,
  :func:`preprocessing.scale`,
  :func:`preprocessing.power_transform`,
  :func:`preprocessing.quantile_transform` functions respectively addressed in
  issues :issue:`11011`, :issue:`11005`, :issue:`11308`, :issue:`11206`,
  :issue:`11306`, and :issue:`10437`.
  By :user:`Lucija Gregov <LucijaGregov>` and
  :user:`Guillaume Lemaitre <glemaitre>`.

- |Feature| :class:`preprocessing.PolynomialFeatures` now supports sparse
  input. :issue:`10452` by :user:`Aman Dalmia <dalmia>` and `Joel Nothman`_.

- |Feature| :class:`preprocessing.RobustScaler` and
  :func:`preprocessing.robust_scale` can be fitted using sparse matrices.
  :issue:`11308` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Feature| :class:`preprocessing.OneHotEncoder` now supports the
  `get_feature_names` method to obtain the transformed feature names.
  :issue:`10181` by :user:`Nirvan Anjirbag <Nirvan101>` and
  `Joris Van den Bossche`_.

- |Feature| A parameter ``check_inverse`` was added to
  :class:`preprocessing.FunctionTransformer` to ensure that ``func`` and
  ``inverse_func`` are the inverse of each other.
  :issue:`9399` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Feature| The ``transform`` method of :class:`sklearn.preprocessing.MultiLabelBinarizer`
  now ignores any unknown classes. A warning is raised listing the unknown
  classes that were found and ignored.
  :issue:`10913` by :user:`Rodrigo Agundez <rragundez>`.

- |Fix| Fixed bugs in :class:`preprocessing.LabelEncoder` which would
  sometimes throw errors when ``transform`` or ``inverse_transform`` was called
  with empty arrays. :issue:`10458` by :user:`Mayur Kulkarni <maykulkarni>`.

- |Fix| Fix ValueError in :class:`preprocessing.LabelEncoder` when using
  ``inverse_transform`` on unseen labels. :issue:`9816` by :user:`Charlie Newey
  <newey01c>`.

- |Fix| Fix bug in :class:`preprocessing.OneHotEncoder` which discarded the
  ``dtype`` when returning a sparse matrix output.
  :issue:`11042` by :user:`Daniel Morales <DanielMorales9>`.

- |Fix| Fix ``fit`` and ``partial_fit`` in
  :class:`preprocessing.StandardScaler` in the rare case when ``with_mean=False``
  and `with_std=False`, which crashed when ``fit`` was called more than once and
  gave inconsistent results for ``mean_`` depending on whether the input was a
  sparse or a dense matrix. ``mean_`` will be set to ``None`` with both sparse
  and dense inputs. ``n_samples_seen_`` will also be reported for both input types.
  :issue:`11235` by :user:`Guillaume Lemaitre <glemaitre>`.

- |API| Deprecate ``n_values`` and ``categorical_features`` parameters and
  ``active_features_``, ``feature_indices_`` and ``n_values_`` attributes
  of :class:`preprocessing.OneHotEncoder`. The ``n_values`` parameter can be
  replaced with the new ``categories`` parameter, and the attributes with the
  new ``categories_`` attribute. Selecting the categorical features with
  the ``categorical_features`` parameter is now better supported using the
  :class:`compose.ColumnTransformer`.
  :issue:`10521` by `Joris Van den Bossche`_.

- |API| Deprecate `preprocessing.Imputer` and move
  the corresponding module to :class:`impute.SimpleImputer`.
  :issue:`9726` by :user:`Kumar Ashutosh
  <thechargedneutron>`.

- |API| The ``axis`` parameter that was in
  `preprocessing.Imputer` is no longer present in
  :class:`impute.SimpleImputer`. The behavior is equivalent
  to ``axis=0`` (impute along columns). Row-wise
  imputation can be performed with FunctionTransformer
  (e.g., ``FunctionTransformer(lambda X:
  SimpleImputer().fit_transform(X.T).T)``). :issue:`10829`
  by :user:`Guillaume Lemaitre <glemaitre>` and
  :user:`Gilberto Olimpio <gilbertoolimpio>`.

- |API| The NaN marker for the missing values has been changed
  between the `preprocessing.Imputer` and the
  `impute.SimpleImputer`.
  ``missing_values='NaN'`` should now be
  ``missing_values=np.nan``. :issue:`11211` by
  :user:`Jeremie du Boisberranger <jeremiedbb>`.

- |API| In :class:`preprocessing.FunctionTransformer`, the default of
  ``validate`` will change from ``True`` to ``False`` in 0.22.
  :issue:`10655` by :user:`Guillaume Lemaitre <glemaitre>`.


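The migration from `preprocessing.Imputer` to :class:`impute.SimpleImputer`
described in the entries above can be sketched as follows. The toy array and
variable names are assumptions for illustration. The row-wise variant reuses
the ``FunctionTransformer`` pattern quoted in the changelog entry.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data with one missing value per row.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 6.0, np.nan]])

# Column-wise mean imputation (equivalent to the old axis=0).
# The missing-value marker is np.nan, not the string 'NaN'.
col_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
col_imputed = col_imputer.fit_transform(X)

# Row-wise imputation: transpose, impute along columns, transpose back.
row_imputer = FunctionTransformer(
    lambda X: SimpleImputer(strategy="mean").fit_transform(X.T).T
)
row_imputed = row_imputer.fit_transform(X)
```

Here the column-wise result fills each gap with its column mean, while the
row-wise result fills each gap with the mean of the remaining values in the
same row.
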
:mod:`sklearn.svm`
..................

- |Fix| Fixed a bug in :class:`svm.SVC` where, when the ``kernel`` argument
  was unicode in Python 2, the ``predict_proba`` method raised an
  unexpected TypeError given dense inputs.
  :issue:`10412` by :user:`Jiongyan Zhang <qmick>`.

- |API| Deprecate ``random_state`` parameter in :class:`svm.OneClassSVM` as
  the underlying implementation is not random.
  :issue:`9497` by :user:`Albert Thomas <albertcthomas>`.

- |API| The default value of the ``gamma`` parameter of :class:`svm.SVC`,
  :class:`~svm.NuSVC`, :class:`~svm.SVR`, :class:`~svm.NuSVR` and
  :class:`~svm.OneClassSVM` will change from ``'auto'`` to ``'scale'`` in
  version 0.22 to account better for unscaled features. :issue:`8361` by
  :user:`Gaurav Dhingra <gxyd>` and :user:`Ting Neo <neokt>`.


:mod:`sklearn.tree`
...................

- |Enhancement| Although private (and hence not assured API stability),
  `tree._criterion.ClassificationCriterion` and
  `tree._criterion.RegressionCriterion` may now be cimported and
  extended. :issue:`10325` by :user:`Camil Staps <camilstaps>`.

- |Fix| Fixed a bug in `tree.BaseDecisionTree` with `splitter="best"`
  where split threshold could become infinite when values in X were
  near infinite. :issue:`10536` by :user:`Jonathan Ohayon <Johayon>`.

- |Fix| Fixed a bug in `tree.MAE` to ensure sample weights are being
  used during the calculation of tree MAE impurity. Previous behaviour could
  cause suboptimal splits to be chosen since the impurity calculation
  considered all samples to be of equal weight.
  :issue:`11464` by :user:`John Stott <JohnStott>`.


:mod:`sklearn.utils`
....................

- |Feature| :func:`utils.check_array` and :func:`utils.check_X_y` now have an
  ``accept_large_sparse`` parameter to control whether scipy.sparse matrices
  with 64-bit indices should be rejected.
  :issue:`11327` by :user:`Karan Dhingra <kdhingra307>` and `Joel Nothman`_.

- |Efficiency| |Fix| Avoid copying the data in :func:`utils.check_array` when
  the input data is a memmap (and ``copy=False``). :issue:`10663` by
  :user:`Arthur Mensch <arthurmensch>` and :user:`Loïc Estève <lesteve>`.

- |API| :func:`utils.check_array` now yields a ``FutureWarning`` indicating
  that arrays of bytes/strings will be interpreted as decimal numbers
  beginning in version 0.22. :issue:`10229` by :user:`Ryan Lee <rtlee9>`.


Multiple modules
................

- |Feature| |API| More consistent outlier detection API:
  Add a ``score_samples`` method in :class:`svm.OneClassSVM`,
  :class:`ensemble.IsolationForest`, :class:`neighbors.LocalOutlierFactor` and
  :class:`covariance.EllipticEnvelope`. It gives access to the raw score
  functions from the original papers. A new ``offset_`` attribute links the
  ``score_samples`` and ``decision_function`` methods.
  The ``contamination`` parameter of :class:`ensemble.IsolationForest` and
  :class:`neighbors.LocalOutlierFactor` ``decision_function`` methods is used
  to define this ``offset_`` such that outliers (resp. inliers) have negative (resp.
  positive) ``decision_function`` values. By default, ``contamination`` is
  kept unchanged at 0.1 for a deprecation period. In 0.22, it will be set to "auto",
  thus using method-specific score offsets.
  In :class:`covariance.EllipticEnvelope` ``decision_function`` method, the
  ``raw_values`` parameter is deprecated as the shifted Mahalanobis distance
  will always be returned in 0.22. :issue:`9015` by `Nicolas Goix`_.

- |Feature| |API| A ``behaviour`` parameter has been introduced in :class:`ensemble.IsolationForest`
  to ensure backward compatibility.
  In the old behaviour, the ``decision_function`` is independent of the ``contamination``
  parameter. A threshold attribute depending on the ``contamination`` parameter is thus
  used.
  In the new behaviour the ``decision_function`` is dependent on the ``contamination``
  parameter, in such a way that 0 becomes its natural threshold to detect outliers.
  Setting behaviour to "old" is deprecated and will not be possible in version 0.22.
  In addition, the behaviour parameter will be removed in 0.24.
  :issue:`11553` by `Nicolas Goix`_.

- |API| Added convergence warning to :class:`svm.LinearSVC` and
  :class:`linear_model.LogisticRegression` when ``verbose`` is set to 0.
  :issue:`10881` by :user:`Alexandre Sevin <AlexandreSev>`.

- |API| Changed warning type from :class:`UserWarning` to
  :class:`exceptions.ConvergenceWarning` for failing convergence in
  `linear_model.logistic_regression_path`,
  :class:`linear_model.RANSACRegressor`, :func:`linear_model.ridge_regression`,
  :class:`gaussian_process.GaussianProcessRegressor`,
  :class:`gaussian_process.GaussianProcessClassifier`,
  :func:`decomposition.fastica`, :class:`cross_decomposition.PLSCanonical`,
  :class:`cluster.AffinityPropagation`, and :class:`cluster.Birch`.
  :issue:`10306` by :user:`Jonathan Siebert <jotasi>`.


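The link between ``score_samples``, ``decision_function`` and ``offset_``
mentioned in the outlier detection entry above can be checked numerically.
This is a minimal sketch using :class:`ensemble.IsolationForest` on made-up
data. It assumes a scikit-learn release in which ``decision_function`` is
computed as ``score_samples`` shifted by ``offset_``.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data for illustration only.
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 2))

iso = IsolationForest(n_estimators=50, random_state=0).fit(X)
raw = iso.score_samples(X)          # raw anomaly scores
shifted = iso.decision_function(X)  # scores shifted so that 0 is the threshold

# Check that decision_function equals score_samples minus offset_.
relation_holds = np.allclose(shifted, raw - iso.offset_)
```

Negative ``decision_function`` values then correspond to samples flagged as
outliers.
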
Miscellaneous
.............

- |MajorFeature| A new configuration parameter, ``working_memory``, was added
  to control memory consumption limits in chunked operations, such as the new
  :func:`metrics.pairwise_distances_chunked`. See :ref:`working_memory`.
  :issue:`10280` by `Joel Nothman`_ and :user:`Aman Dalmia <dalmia>`.

- |Feature| The version of :mod:`joblib` bundled with Scikit-learn is now 0.12.
  This uses a new default multiprocessing implementation, named `loky
  <https://github.com/tomMoral/loky>`_. While this may incur some memory and
  communication overhead, it should provide greater cross-platform stability
  than relying on Python standard library multiprocessing. :issue:`11741` by
  the Joblib developers, especially :user:`Thomas Moreau <tomMoral>` and
  `Olivier Grisel`_.

- |Feature| An environment variable to use the site joblib instead of the
  vendored one was added (:ref:`environment_variable`). The main API of joblib
  is now exposed in :mod:`sklearn.utils`.
  :issue:`11166` by `Gael Varoquaux`_.

- |Feature| Add almost complete PyPy 3 support. Known unsupported
  functionalities are :func:`datasets.load_svmlight_file`,
  :class:`feature_extraction.FeatureHasher` and
  :class:`feature_extraction.text.HashingVectorizer`. For running on PyPy,
  PyPy3-v5.10+, Numpy 1.14.0+, and scipy 1.1.0+ are required.
  :issue:`11010` by :user:`Ronan Lamy <rlamy>` and `Roman Yurchak`_.

- |Feature| A utility method :func:`sklearn.show_versions()` was added to
  print out information relevant for debugging. It includes the user system,
  the Python executable, the version of the main libraries and BLAS binding
  information. :issue:`11596` by :user:`Alexandre Boucaud <aboucaud>`.

- |Fix| Fixed a bug when setting parameters on a meta-estimator, involving both
  a wrapped estimator and its parameter. :issue:`9999` by :user:`Marcus Voss
  <marcus-voss>` and `Joel Nothman`_.

- |Fix| Fixed a bug where calling :func:`sklearn.base.clone` was not thread
  safe and could result in a "pop from empty list" error. :issue:`9569`
  by `Andreas Müller`_.

- |API| The default value of ``n_jobs`` is changed from ``1`` to ``None`` in
  all related functions and classes. ``n_jobs=None`` means ``unset``. It will
  generally be interpreted as ``n_jobs=1``, unless the current
  ``joblib.Parallel`` backend context specifies otherwise (see
  :term:`Glossary <n_jobs>` for additional information). Note that this change
  happens immediately (i.e., without a deprecation cycle).
  :issue:`11741` by `Olivier Grisel`_.

- |Fix| Fixed a bug in validation helpers where passing a Dask DataFrame results
  in an error. :issue:`12462` by :user:`Zachariah Miller <zwmiller>`.

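To illustrate the ``working_memory`` setting listed above, the sketch below
caps the size of temporary arrays while computing pairwise distances in
chunks. The data shape and the 1 MiB limit are assumptions picked so that the
full distance matrix does not fit in a single chunk.

```python
import numpy as np
import sklearn
from sklearn.metrics import pairwise_distances_chunked

# Hypothetical data: the full 500 x 500 float64 distance matrix is about 2 MB.
rng = np.random.RandomState(0)
X = rng.rand(500, 10)

# working_memory is given in MiB; each yielded chunk is a horizontal slice
# of the distance matrix sized to stay within that budget.
with sklearn.config_context(working_memory=1):
    chunks = list(pairwise_distances_chunked(X))

n_rows = sum(chunk.shape[0] for chunk in chunks)
```

The same limit can be set globally with ``sklearn.set_config``.
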
Changes to estimator checks
---------------------------

These changes mostly affect library developers.

- Checks for transformers now apply if the estimator implements
  :term:`transform`, regardless of whether it inherits from
  :class:`sklearn.base.TransformerMixin`. :issue:`10474` by `Joel Nothman`_.

- Classifiers are now checked for consistency between :term:`decision_function`
  and categorical predictions.
  :issue:`10500` by :user:`Narine Kokhlikyan <NarineK>`.

- Allow tests in :func:`utils.estimator_checks.check_estimator` to test functions
  that accept pairwise data.
  :issue:`9701` by :user:`Kyle Johnson <gkjohns>`.

- Allow :func:`utils.estimator_checks.check_estimator` to check that there are no
  private settings apart from parameters during estimator initialization.
  :issue:`9378` by :user:`Herilalaina Rakotoarison <herilalaina>`.

- The set of checks in :func:`utils.estimator_checks.check_estimator` now includes a
  ``check_set_params`` test which checks that ``set_params`` is equivalent to
  passing parameters in ``__init__`` and warns if it encounters parameter
  validation. :issue:`7738` by :user:`Alvin Chiang <absolutelyNoWarranty>`.

- Add invariance tests for clustering metrics. :issue:`8102` by :user:`Ankita
  Sinha <anki08>` and :user:`Guillaume Lemaitre <glemaitre>`.

- Add ``check_methods_subset_invariance`` to
  :func:`~utils.estimator_checks.check_estimator`, which checks that
  estimator methods are invariant if applied to a data subset.
  :issue:`10428` by :user:`Jonathan Ohayon <Johayon>`.

- Add tests in :func:`utils.estimator_checks.check_estimator` to check that an
  estimator can handle read-only memmap input data. :issue:`10663` by
  :user:`Arthur Mensch <arthurmensch>` and :user:`Loïc Estève <lesteve>`.

- ``check_sample_weights_pandas_series`` now uses 8 rather than 6 samples
  to accommodate the default number of clusters in :class:`cluster.KMeans`.
  :issue:`10933` by :user:`Johannes Hansen <jnhansen>`.

- Estimators are now checked for whether ``sample_weight=None`` equates to
  ``sample_weight=np.ones(...)``.
  :issue:`11558` by :user:`Sergul Aydore <sergulaydore>`.


Code and Documentation Contributors
-----------------------------------

Thanks to everyone who has contributed to the maintenance and improvement of the
project since version 0.19, including:

211217613, Aarshay Jain, absolutelyNoWarranty, Adam Greenhall, Adam Kleczewski,
Adam Richie-Halford, adelr, AdityaDaflapurkar, Adrin Jalali, Aidan Fitzgerald,
aishgrt1, Akash Shivram, Alan Liddell, Alan Yee, Albert Thomas, Alexander
Lenail, Alexander-N, Alexandre Boucaud, Alexandre Gramfort, Alexandre Sevin,
Alex Egg, Alvaro Perez-Diaz, Amanda, Aman Dalmia, Andreas Bjerre-Nielsen,
Andreas Mueller, Andrew Peng, Angus Williams, Aniruddha Dave, annaayzenshtat,
Anthony Gitter, Antonio Quinonez, Anubhav Marwaha, Arik Pamnani, Arthur Ozga,
Artiem K, Arunava, Arya McCarthy, Attractadore, Aurélien Bellet, Aurélien
Geron, Ayush Gupta, Balakumaran Manoharan, Bangda Sun, Barry Hart, Bastian
Venthur, Ben Lawson, Benn Roth, Breno Freitas, Brent Yi, brett koonce, Caio
Oliveira, Camil Staps, cclauss, Chady Kamar, Charlie Brummitt, Charlie Newey,
chris, Chris, Chris Catalfo, Chris Foster, Chris Holdgraf, Christian Braune,
Christian Hirsch, Christian Hogan, Christopher Jenness, Clement Joudet, cnx,
cwitte, Dallas Card, Dan Barkhorn, Daniel, Daniel Ferreira, Daniel Gomez,
Daniel Klevebring, Danielle Shwed, Daniel Mohns, Danil Baibak, Darius Morawiec,
David Beach, David Burns, David Kirkby, David Nicholson, David Pickup, Derek,
Didi Bar-Zev, diegodlh, Dillon Gardner, Dillon Niederhut, dilutedsauce,
dlovell, Dmitry Mottl, Dmitry Petrov, Dor Cohen, Douglas Duhaime, Ekaterina
Tuzova, Eric Chang, Eric Dean Sanchez, Erich Schubert, Eunji, Fang-Chieh Chou,
FarahSaeed, felix, Félix Raimundo, fenx, filipj8, FrankHui, Franz Wompner,
Freija Descamps, frsi, Gabriele Calvo, Gael Varoquaux, Gaurav Dhingra, Georgi
Peev, Gil Forsyth, Giovanni Giuseppe Costa, gkevinyen5418, goncalo-rodrigues,
Gryllos Prokopis, Guillaume Lemaitre, Guillaume "Vermeille" Sanchez, Gustavo De
Mari Pereira, hakaa1, Hanmin Qin, Henry Lin, Hong, Honghe, Hossein Pourbozorg,
Hristo, Hunan Rostomyan, iampat, Ivan PANICO, Jaewon Chung, Jake VanderPlas,
jakirkham, James Bourbeau, James Malcolm, Jamie Cox, Jan Koch, Jan Margeta, Jan
Schlüter, janvanrijn, Jason Wolosonovich, JC Liu, Jeb Bearer, jeremiedbb, Jimmy
Wan, Jinkun Wang, Jiongyan Zhang, jjabl, jkleint, Joan Massich, Joël Billaud,
Joel Nothman, Johannes Hansen, JohnStott, Jonatan Samoocha, Jonathan Ohayon,
Jörg Döpfert, Joris Van den Bossche, Jose Perez-Parras Toledano, josephsalmon,
jotasi, jschendel, Julian Kuhlmann, Julien Chaumond, julietcl, Justin Shenk,
Karl F, Kasper Primdal Lauritzen, Katrin Leinweber, Kirill, ksemb, Kuai Yu,
Kumar Ashutosh, Kyeongpil Kang, Kye Taylor, kyledrogo, Leland McInnes, Léo DS,
Liam Geron, Liutong Zhou, Lizao Li, lkjcalc, Loic Esteve, louib, Luciano Viola,
Lucija Gregov, Luis Osa, Luis Pedro Coelho, Luke M Craig, Luke Persola, Mabel,
Mabel Villalba, Maniteja Nandana, MarkIwanchyshyn, Mark Roth, Markus Müller,
MarsGuy, Martin Gubri, martin-hahn, martin-kokos, mathurinm, Matthias Feurer,
Max Copeland, Mayur Kulkarni, Meghann Agarwal, Melanie Goetz, Michael A.
Alcorn, Minghui Liu, Ming Li, Minh Le, Mohamed Ali Jamaoui, Mohamed Maskani,
Mohammad Shahebaz, Muayyad Alsadi, Nabarun Pal, Nagarjuna Kumar, Naoya Kanai,
Narendran Santhanam, NarineK, Nathaniel Saul, Nathan Suh, Nicholas Nadeau,
P.Eng., AVS, Nick Hoh, Nicolas Goix, Nicolas Hug, Nicolau Werneck,
nielsenmarkus11, Nihar Sheth, Nikita Titov, Nilesh Kevlani, Nirvan Anjirbag,
notmatthancock, nzw, Oleksandr Pavlyk, oliblum90, Oliver Rausch, Olivier
Grisel, Oren Milman, Osaid Rehman Nasir, pasbi, Patrick Fernandes, Patrick
Olden, Paul Paczuski, Pedro Morales, Peter, Peter St. John, pierreablin,
pietruh, Pinaki Nath Chowdhury, Piotr Szymański, Pradeep Reddy Raamana, Pravar
D Mahajan, pravarmahajan, QingYing Chen, Raghav RV, Rajendra arora,
RAKOTOARISON Herilalaina, Rameshwar Bhaskaran, RankyLau, Rasul Kerimov,
Reiichiro Nakano, Rob, Roman Kosobrodov, Roman Yurchak, Ronan Lamy, rragundez,
Rüdiger Busche, Ryan, Sachin Kelkar, Sagnik Bhattacharya, Sailesh Choyal, Sam
Radhakrishnan, Sam Steingold, Samuel Bell, Samuel O. Ronsin, Saqib Nizam
Shamsi, SATISH J, Saurabh Gupta, Scott Gigante, Sebastian Flennerhag, Sebastian
Raschka, Sebastien Dubois, Sébastien Lerique, Sebastin Santy, Sergey Feldman,
Sergey Melderis, Sergul Aydore, Shahebaz, Shalil Awaley, Shangwu Yao, Sharad
Vijalapuram, Sharan Yalburgi, shenhanc78, Shivam Rastogi, Shu Haoran, siftikha,
Sinclert Pérez, SolutusImmensus, Somya Anand, srajan paliwal, Sriharsha Hatwar,
Sri Krishna, Stefan van der Walt, Stephen McDowell, Steven Brown, syonekura,
Taehoon Lee, Takanori Hayashi, tarcusx, Taylor G Smith, theriley106, Thomas,
Thomas Fan, Thomas Heavey, Tobias Madsen, tobycheese, Tom Augspurger, Tom Dupré
la Tour, Tommy, Trevor Stephens, Trishnendu Ghorai, Tulio Casagrande,
twosigmajab, Umar Farouk Umar, Urvang Patel, Utkarsh Upadhyay, Vadim
Markovtsev, Varun Agrawal, Vathsala Achar, Vilhelm von Ehrenheim, Vinayak
Mehta, Vinit, Vinod Kumar L, Viraj Mavani, Viraj Navkal, Vivek Kumar, Vlad
Niculae, vqean3, Vrishank Bhardwaj, vufg, wallygauze, Warut Vijitbenjaronk,
wdevazelhes, Wenhao Zhang, Wes Barnett, Will, William de Vazelhes, Will
Rosenfeld, Xin Xiong, Yiming (Paul) Li, ymazari, Yufeng, Zach Griffith, Zé
Vinícius, Zhenqing Hu, Zhiqing Xiao, Zijie (ZJ) Poh