.. include:: _contributors.rst
.. currentmodule:: sklearn
============
Version 0.18
============
.. warning::
Scikit-learn 0.18 is the last major release of scikit-learn to support Python 2.6.
Later versions of scikit-learn will require Python 2.7 or above.
.. _changes_0_18_2:
Version 0.18.2
==============
**June 20, 2017**
Changelog
---------
- Fixes for compatibility with NumPy 1.13.0: :issue:`7946` :issue:`8355` by
`Loic Esteve`_.
- Minor compatibility changes in the examples :issue:`9010` :issue:`8040`
:issue:`9149`.
Code Contributors
-----------------
Aman Dalmia, Loic Esteve, Nate Guerin, Sergei Lebedev
.. _changes_0_18_1:
Version 0.18.1
==============
**November 11, 2016**
Changelog
---------
Enhancements
............
- Improved ``sample_without_replacement`` speed by utilizing
numpy.random.permutation for most cases. As a result,
samples may differ in this release for a fixed random state.
Affected estimators:
- :class:`ensemble.BaggingClassifier`
- :class:`ensemble.BaggingRegressor`
- :class:`linear_model.RANSACRegressor`
- :class:`model_selection.RandomizedSearchCV`
- :class:`random_projection.SparseRandomProjection`
This also affects the :func:`datasets.make_classification`
function.
Bug fixes
.........
- Fix issue where ``min_grad_norm`` and ``n_iter_without_progress``
parameters were not being utilised by :class:`manifold.TSNE`.
:issue:`6497` by :user:`Sebastian Säger <ssaeger>`
- Fix a bug in the SVM decision values when ``decision_function_shape``
is ``ovr`` in :class:`svm.SVC`.
:class:`svm.SVC`'s ``decision_function`` was incorrect from versions
0.17.0 through 0.18.0.
:issue:`7724` by `Bing Tian Dai`_
- The ``explained_variance_ratio`` attribute of
:class:`discriminant_analysis.LinearDiscriminantAnalysis` now has the same
length whether calculated with the SVD or the Eigen solver. :issue:`7632`
by :user:`JPFrancoia <JPFrancoia>`
- Fixes issue in :ref:`univariate_feature_selection` where score
functions were not accepting multi-label targets. :issue:`7676`
by :user:`Mohammed Affan <affanv14>`
- Fixed setting parameters when calling ``fit`` multiple times on
:class:`feature_selection.SelectFromModel`. :issue:`7756` by `Andreas Müller`_
- Fixes issue in ``partial_fit`` method of
:class:`multiclass.OneVsRestClassifier` when number of classes used in
``partial_fit`` was less than the total number of classes in the
data. :issue:`7786` by `Srivatsan Ramesh`_
- Fixes issue in :class:`calibration.CalibratedClassifierCV` where
the class probabilities for a sample did not sum to 1, and
``CalibratedClassifierCV`` now handles the case where the training set
has fewer classes than the full data. :issue:`7799` by
`Srivatsan Ramesh`_
- Fix a bug where :class:`sklearn.feature_selection.SelectFdr` did not
exactly implement the Benjamini-Hochberg procedure. It formerly may have
selected fewer features than it should have.
:issue:`7490` by :user:`Peng Meng <mpjlu>`.
- :class:`sklearn.manifold.LocallyLinearEmbedding` now correctly handles
integer inputs. :issue:`6282` by `Jake Vanderplas`_.
- The ``min_weight_fraction_leaf`` parameter of tree-based classifiers and
regressors now assumes uniform sample weights by default if the
``sample_weight`` argument is not passed to the ``fit`` function.
Previously, the parameter was silently ignored. :issue:`7301`
by :user:`Nelson Liu <nelson-liu>`.
- Fixed a numerical issue with :class:`linear_model.RidgeCV` on centered data
when ``n_features > n_samples``. :issue:`6178` by `Bertrand Thirion`_
- Tree splitting criterion classes' cloning/pickling is now memory safe.
:issue:`7680` by :user:`Ibraim Ganiev <olologin>`.
- Fixed a bug where :class:`decomposition.NMF` set its ``n_iter_``
attribute in ``transform()``. :issue:`7553` by :user:`Ekaterina
Krivich <kiote>`.
- :class:`sklearn.linear_model.LogisticRegressionCV` now correctly handles
string labels. :issue:`5874` by `Raghav RV`_.
- Fixed a bug where :func:`sklearn.model_selection.train_test_split` raised
an error when ``stratify`` is a list of string labels. :issue:`7593` by
`Raghav RV`_.
- Fixed a bug where :class:`sklearn.model_selection.GridSearchCV` and
:class:`sklearn.model_selection.RandomizedSearchCV` were not pickleable
because of a pickling bug in ``np.ma.MaskedArray``. :issue:`7594` by
`Raghav RV`_.
- All cross-validation utilities in :mod:`sklearn.model_selection` now
permit one-time cross-validation splitters for the ``cv`` parameter.
Non-deterministic cross-validation splitters (where multiple calls to
``split`` produce dissimilar splits) can also be used as the ``cv`` parameter.
:class:`sklearn.model_selection.GridSearchCV` will cross-validate each
parameter setting on the split produced by the first call to the
cross-validation splitter's ``split`` method. :issue:`7660` by `Raghav RV`_.
- Fix bug where :meth:`preprocessing.MultiLabelBinarizer.fit_transform`
returned an invalid CSR matrix.
:issue:`7750` by :user:`CJ Carey <perimosocordiae>`.
- Fixed a bug where :func:`metrics.pairwise.cosine_distances` could return a
small negative distance. :issue:`7732` by :user:`Artsion <asanakoy>`.
API changes summary
-------------------
Trees and forests
- The ``min_weight_fraction_leaf`` parameter of tree-based classifiers and
regressors now assumes uniform sample weights by default if the
``sample_weight`` argument is not passed to the ``fit`` function.
Previously, the parameter was silently ignored. :issue:`7301` by :user:`Nelson
Liu <nelson-liu>`.
- Tree splitting criterion classes' cloning/pickling is now memory safe.
:issue:`7680` by :user:`Ibraim Ganiev <olologin>`.
Linear, kernelized and related models
- The length of ``explained_variance_ratio`` of
:class:`discriminant_analysis.LinearDiscriminantAnalysis`
has changed for both the Eigen and SVD solvers. The attribute now has a
length of ``min(n_components, n_classes - 1)``. :issue:`7632`
by :user:`JPFrancoia <JPFrancoia>`
- Fixed a numerical issue with :class:`linear_model.RidgeCV` on centered data
when ``n_features > n_samples``. :issue:`6178` by `Bertrand Thirion`_
.. _changes_0_18:
Version 0.18
============
**September 28, 2016**
.. _model_selection_changes:
Model Selection Enhancements and API Changes
--------------------------------------------
- **The model_selection module**
The new module :mod:`sklearn.model_selection`, which groups together the
functionalities of the former `sklearn.cross_validation`,
`sklearn.grid_search` and `sklearn.learning_curve`, introduces new
possibilities such as nested cross-validation and better manipulation of
parameter searches with Pandas.
Many things will stay the same, but there are some key differences. Read
below to learn more about the changes.
- **Data-independent CV splitters enabling nested cross-validation**
The new cross-validation splitters, defined in
:mod:`sklearn.model_selection`, are no longer initialized with any
data-dependent parameters such as ``y``. Instead they expose a
`split` method that takes in the data and yields a generator for the
different splits.
This change makes it possible to use the cross-validation splitters to
perform nested cross-validation, facilitated by
:class:`model_selection.GridSearchCV` and
:class:`model_selection.RandomizedSearchCV` utilities.
- **The enhanced cv_results_ attribute**
The new ``cv_results_`` attribute (of :class:`model_selection.GridSearchCV`
and :class:`model_selection.RandomizedSearchCV`) introduced in lieu of the
``grid_scores_`` attribute is a dict of 1D arrays with elements in each
array corresponding to the parameter settings (i.e. search candidates).
The ``cv_results_`` dict can be easily imported into ``pandas`` as a
``DataFrame`` for exploring the search results.
The ``cv_results_`` arrays include scores for each cross-validation split
(with keys such as ``'split0_test_score'``), as well as their mean
(``'mean_test_score'``) and standard deviation (``'std_test_score'``).
The ranks for the search candidates (based on their mean
cross-validation score) are available at ``cv_results_['rank_test_score']``.
The values for each parameter are stored separately as numpy
masked object arrays. The value for a search candidate is masked if
the corresponding parameter is not applicable. Additionally, a list of all
the parameter dicts is stored at ``cv_results_['params']``. A sketch
combining the new splitters, nested cross-validation and ``cv_results_``
appears at the end of this section.
- **Parameters n_folds and n_iter renamed to n_splits**
Some parameter names have changed:
The ``n_folds`` parameter in new :class:`model_selection.KFold`,
:class:`model_selection.GroupKFold` (see below for the name change),
and :class:`model_selection.StratifiedKFold` is now renamed to
``n_splits``. The ``n_iter`` parameter in
:class:`model_selection.ShuffleSplit`, the new class
:class:`model_selection.GroupShuffleSplit` and
:class:`model_selection.StratifiedShuffleSplit` is now renamed to
``n_splits``.
- **Rename of splitter classes which accept group labels along with data**
The cross-validation splitters ``LabelKFold``,
``LabelShuffleSplit``, ``LeaveOneLabelOut`` and ``LeavePLabelOut`` have
been renamed to :class:`model_selection.GroupKFold`,
:class:`model_selection.GroupShuffleSplit`,
:class:`model_selection.LeaveOneGroupOut` and
:class:`model_selection.LeavePGroupsOut` respectively.
Note the change from singular to plural form in
:class:`model_selection.LeavePGroupsOut`.
- **Fit parameter labels renamed to groups**
The ``labels`` parameter in the `split` method of the newly renamed
splitters :class:`model_selection.GroupKFold`,
:class:`model_selection.LeaveOneGroupOut`,
:class:`model_selection.LeavePGroupsOut`,
:class:`model_selection.GroupShuffleSplit` is renamed to ``groups``
following the new nomenclature of their class names.
- **Parameter n_labels renamed to n_groups**
The parameter ``n_labels`` in the newly renamed
:class:`model_selection.LeavePGroupsOut` is changed to ``n_groups``.
- **Training scores and timing information**
``cv_results_`` also includes the training scores for each
cross-validation split (with keys such as ``'split0_train_score'``), as
well as their mean (``'mean_train_score'``) and standard deviation
(``'std_train_score'``). To avoid the cost of evaluating training score,
set ``return_train_score=False``.
Additionally, the mean and standard deviation of the times taken to split,
train and score the model across all the cross-validation splits are
available at the keys ``'mean_time'`` and ``'std_time'`` respectively.
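
For illustration, here is a minimal sketch (on a small synthetic dataset, with
arbitrary parameter values) combining the new ``split``-based splitter API,
nested cross-validation and the ``cv_results_`` attribute loaded into a pandas
``DataFrame``::

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=100, random_state=0)

    # Splitters are now data-independent; the data is passed to split() instead.
    inner_cv = KFold(n_splits=3)
    outer_cv = KFold(n_splits=3)

    # Nested cross-validation: the grid search uses the inner splitter and is
    # itself evaluated with the outer splitter.
    grid = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]}, cv=inner_cv)
    nested_scores = cross_val_score(grid, X, y, cv=outer_cv)

    # cv_results_ is a dict of 1D arrays and converts directly to a DataFrame.
    grid.fit(X, y)
    results = pd.DataFrame(grid.cv_results_)
    print(results[['param_C', 'mean_test_score', 'std_test_score',
                   'rank_test_score']])
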
Changelog
---------
New features
............
Classifiers and Regressors
- The Gaussian Process module has been reimplemented and now offers classification
and regression estimators through :class:`gaussian_process.GaussianProcessClassifier`
and :class:`gaussian_process.GaussianProcessRegressor`. Among other things, the new
implementation supports kernel engineering, gradient-based hyperparameter optimization and
sampling of functions from the GP prior and posterior. Extensive documentation and
examples are provided. By `Jan Hendrik Metzen`_.
- Added new supervised learning algorithm: :ref:`Multi-layer Perceptron <multilayer_perceptron>`
:issue:`3204` by :user:`Issam H. Laradji <IssamLaradji>`
- Added :class:`linear_model.HuberRegressor`, a linear model robust to outliers.
:issue:`5291` by `Manoj Kumar`_.
- Added the :class:`multioutput.MultiOutputRegressor` meta-estimator. It
converts single output regressors to multi-output regressors by fitting
one regressor per output. By :user:`Tim Head <betatim>`.
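
As a rough sketch of the new :class:`multioutput.MultiOutputRegressor`
meta-estimator described above (the base estimator and synthetic data are
chosen arbitrarily for illustration)::

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.multioutput import MultiOutputRegressor

    # Synthetic problem with three output columns.
    X, y = make_regression(n_samples=200, n_features=10, n_targets=3,
                           random_state=0)

    # GradientBoostingRegressor handles a single output, so the meta-estimator
    # fits one regressor per output column.
    model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
    model.fit(X, y)
    print(model.predict(X[:2]).shape)  # (2, 3): one prediction per output
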
Other estimators
- New :class:`mixture.GaussianMixture` and :class:`mixture.BayesianGaussianMixture`
replace former mixture models, employing faster inference
for sounder results. :issue:`7295` by :user:`Wei Xue <xuewei4d>` and
:user:`Thierry Guillemot <tguillemot>`.
- The class `decomposition.RandomizedPCA` is now factored into :class:`decomposition.PCA`
and is available by passing the parameter ``svd_solver='randomized'``.
The default number of iterations ``n_iter`` for ``'randomized'`` has changed to 4. The old
behavior of PCA is recovered with ``svd_solver='full'``. An additional solver
uses ``arpack`` and performs a truncated (non-randomized) SVD. By default,
the best solver is selected depending on the size of the input and the
number of components requested. :issue:`5299` by :user:`Giorgio Patrini <giorgiop>`.
- Added two functions for mutual information estimation:
:func:`feature_selection.mutual_info_classif` and
:func:`feature_selection.mutual_info_regression`. These functions can be
used in :class:`feature_selection.SelectKBest` and
:class:`feature_selection.SelectPercentile` as score functions.
By :user:`Andrea Bravi <AndreaBravi>` and :user:`Nikolay Mayorov <nmayorov>`.
- Added the :class:`ensemble.IsolationForest` class for anomaly detection based on
random forests. By `Nicolas Goix`_.
- Added ``algorithm="elkan"`` to :class:`cluster.KMeans` implementing
Elkan's fast K-Means algorithm. By `Andreas Müller`_.
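
As a rough illustration of two of the additions above (the randomized PCA
solver and the new mutual information score functions), using the digits
dataset and arbitrary values for ``n_components`` and ``k``::

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_digits(return_X_y=True)

    # The former RandomizedPCA is now reached through the svd_solver parameter.
    pca = PCA(n_components=10, svd_solver='randomized', random_state=0)
    X_reduced = pca.fit_transform(X)

    # The mutual information estimator can be plugged into SelectKBest.
    selector = SelectKBest(score_func=mutual_info_classif, k=20)
    X_selected = selector.fit_transform(X, y)
    print(X_reduced.shape, X_selected.shape)
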
Model selection and evaluation
- Added :func:`metrics.fowlkes_mallows_score`, the Fowlkes-Mallows
Index, which measures the similarity of two clusterings of a set of points.
By :user:`Arnaud Fouchet <afouchet>` and :user:`Thierry Guillemot <tguillemot>`.
- Added `metrics.calinski_harabaz_score`, which computes the Calinski
and Harabaz score to evaluate the resulting clustering of a set of points.
By :user:`Arnaud Fouchet <afouchet>` and :user:`Thierry Guillemot <tguillemot>`.
- Added new cross-validation splitter
:class:`model_selection.TimeSeriesSplit` to handle time series data.
:issue:`6586` by :user:`YenChen Lin <yenchenlin>`
- The cross-validation iterators are replaced by cross-validation splitters
available from :mod:`sklearn.model_selection`, allowing for nested
cross-validation. See :ref:`model_selection_changes` for more information.
:issue:`4294` by `Raghav RV`_.
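
A small sketch of the new :class:`model_selection.TimeSeriesSplit` splitter on
a toy array of six ordered observations; training indices always precede the
test indices::

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(12).reshape(6, 2)  # six observations in time order

    tscv = TimeSeriesSplit(n_splits=3)
    for train_idx, test_idx in tscv.split(X):
        # Training windows grow over time and never look into the future.
        print("train:", train_idx, "test:", test_idx)
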
Enhancements
............
Trees and ensembles
- Added a new splitting criterion for :class:`tree.DecisionTreeRegressor`,
the mean absolute error. This criterion can also be used in
:class:`ensemble.ExtraTreesRegressor`,
:class:`ensemble.RandomForestRegressor`, and the gradient boosting
estimators. :issue:`6667` by :user:`Nelson Liu <nelson-liu>`.
- Added weighted impurity-based early stopping criterion for decision tree
growth. :issue:`6954` by :user:`Nelson Liu <nelson-liu>`
- The random forest, extra trees and decision tree estimators now have a
``decision_path`` method which returns the decision path of samples in
the tree. By `Arnaud Joly`_.
- A new example has been added unveiling the decision tree structure.
By `Arnaud Joly`_.
- Random forest, extra trees, decision tree and gradient boosting estimators
now accept the parameters ``min_samples_split`` and ``min_samples_leaf``
provided as a percentage of the training samples. By :user:`yelite <yelite>` and `Arnaud Joly`_.
- Gradient boosting estimators accept the parameter ``criterion`` to specify
the splitting criterion used when building decision trees.
:issue:`6667` by :user:`Nelson Liu <nelson-liu>`.
- The memory footprint is reduced (sometimes greatly) for
`ensemble.bagging.BaseBagging` and classes that inherit from it,
i.e., :class:`ensemble.BaggingClassifier`,
:class:`ensemble.BaggingRegressor`, and :class:`ensemble.IsolationForest`,
by dynamically generating attribute ``estimators_samples_`` only when it is
needed. By :user:`David Staub <staubda>`.
- Added ``n_jobs`` and ``sample_weight`` parameters for
:class:`ensemble.VotingClassifier` to fit underlying estimators in parallel.
:issue:`5805` by :user:`Ibraim Ganiev <olologin>`.
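
A brief sketch combining several of the tree enhancements listed above (the
mean absolute error criterion, a fractional ``min_samples_split`` and the new
``decision_path`` method), on a synthetic regression task with arbitrary
parameter values::

    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=100, n_features=4, random_state=0)

    # 'mae' selects splits by mean absolute error; a float min_samples_split
    # is interpreted as a fraction of the training samples.
    tree = DecisionTreeRegressor(criterion='mae', min_samples_split=0.05,
                                 random_state=0)
    tree.fit(X, y)

    # decision_path returns a sparse indicator of the nodes each sample visits.
    node_indicator = tree.decision_path(X[:5])
    print(node_indicator.shape)  # (5, n_nodes)
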
Linear, kernelized and related models
- In :class:`linear_model.LogisticRegression`, the SAG solver is now
available in the multinomial case. :issue:`5251` by `Tom Dupre la Tour`_.
- :class:`linear_model.RANSACRegressor`, :class:`svm.LinearSVC` and
:class:`svm.LinearSVR` now support ``sample_weight``.
By :user:`Imaculate <Imaculate>`.
- Add parameter ``loss`` to :class:`linear_model.RANSACRegressor` to measure the
error on the samples for every trial. By `Manoj Kumar`_.
- Prediction of out-of-sample events with Isotonic Regression
(:class:`isotonic.IsotonicRegression`) is now much faster (over 1000x in tests with synthetic
data). By :user:`Jonathan Arfa <jarfa>`.
- Isotonic regression (:class:`isotonic.IsotonicRegression`) now uses a better algorithm to avoid
`O(n^2)` behavior in pathological cases, and is also generally faster
(:issue:`6691`). By `Antony Lee`_.
- :class:`naive_bayes.GaussianNB` now accepts data-independent class-priors
through the parameter ``priors``. By :user:`Guillaume Lemaitre <glemaitre>`.
- :class:`linear_model.ElasticNet` and :class:`linear_model.Lasso`
now work with ``np.float32`` input data without converting it
into ``np.float64``. This reduces memory
consumption. :issue:`6913` by :user:`YenChen Lin <yenchenlin>`.
- :class:`semi_supervised.LabelPropagation` and :class:`semi_supervised.LabelSpreading`
now accept arbitrary kernel functions in addition to strings ``knn`` and ``rbf``.
:issue:`5762` by :user:`Utkarsh Upadhyay <musically-ut>`.
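
For instance, the new ``priors`` parameter of :class:`naive_bayes.GaussianNB`
mentioned above can be sketched as follows (the prior values are arbitrary and
only for illustration)::

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=100, n_classes=2, random_state=0)

    # Fix the class priors instead of estimating them from the data.
    clf = GaussianNB(priors=[0.25, 0.75])
    clf.fit(X, y)
    print(clf.class_prior_)  # reflects the supplied priors, not the data
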
Decomposition, manifold learning and clustering
- Added ``inverse_transform`` function to :class:`decomposition.NMF` to compute
data matrix of original shape. By :user:`Anish Shah <AnishShah>`.
- :class:`cluster.KMeans` and :class:`cluster.MiniBatchKMeans` now work
with ``np.float32`` and ``np.float64`` input data without converting it.
This reduces memory consumption when using ``np.float32``.
:issue:`6846` by :user:`Sebastian Säger <ssaeger>` and
:user:`YenChen Lin <yenchenlin>`.
Preprocessing and feature selection
- :class:`preprocessing.RobustScaler` now accepts ``quantile_range`` parameter.
:issue:`5929` by :user:`Konstantin Podshumok <podshumok>`.
- :class:`feature_extraction.FeatureHasher` now accepts string values.
:issue:`6173` by :user:`Ryad Zenine <ryadzenine>` and
:user:`Devashish Deshpande <dsquareindia>`.
- Keyword arguments can now be supplied to ``func`` in
:class:`preprocessing.FunctionTransformer` by means of the ``kw_args``
parameter. By `Brian McFee`_.
- :class:`feature_selection.SelectKBest` and :class:`feature_selection.SelectPercentile`
now accept score functions that take X, y as input and return only the scores.
By :user:`Nikolay Mayorov <nmayorov>`.
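
A quick sketch of two of the preprocessing additions above (the helper
function and parameter values are arbitrary): forwarding keyword arguments to
``func`` via ``kw_args`` in :class:`preprocessing.FunctionTransformer`, and the
new ``quantile_range`` parameter of :class:`preprocessing.RobustScaler`::

    import numpy as np
    from sklearn.preprocessing import FunctionTransformer, RobustScaler

    def clip(X, low, high):
        return np.clip(X, low, high)

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 10000.0]])

    # kw_args forwards extra keyword arguments to the wrapped function.
    clipper = FunctionTransformer(clip, kw_args={'low': 0.0, 'high': 500.0})
    X_clipped = clipper.fit_transform(X)

    # quantile_range controls which quantiles define the scale (default 25-75).
    scaler = RobustScaler(quantile_range=(10.0, 90.0))
    print(scaler.fit_transform(X_clipped))
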
Model evaluation and meta-estimators
- :class:`multiclass.OneVsOneClassifier` and :class:`multiclass.OneVsRestClassifier`
now support ``partial_fit``. By :user:`Asish Panda <kaichogami>` and
:user:`Philipp Dowling <phdowling>`.
- Added support for substituting or disabling :class:`pipeline.Pipeline`
and :class:`pipeline.FeatureUnion` components using the ``set_params``
interface that powers `sklearn.grid_search`.
See :ref:`sphx_glr_auto_examples_compose_plot_compare_reduction.py`.
By `Joel Nothman`_ and :user:`Robert McGibbon <rmcgibbo>`.
- The new ``cv_results_`` attribute of :class:`model_selection.GridSearchCV`
(and :class:`model_selection.RandomizedSearchCV`) can be easily imported
into pandas as a ``DataFrame``. Ref :ref:`model_selection_changes` for
more information. :issue:`6697` by `Raghav RV`_.
- Generalization of :func:`model_selection.cross_val_predict`:
method names such as `predict_proba` can now be used in the cross-validation
framework instead of the default `predict`, as sketched after this list.
By :user:`Ori Ziv <zivori>` and :user:`Sears Merritt <merritts>`.
- The training scores and the time taken for training followed by scoring for
each search candidate are now available in the ``cv_results_`` dict.
See :ref:`model_selection_changes` for more information.
:issue:`7325` by :user:`Eugene Chen <eyc88>` and `Raghav RV`_.
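
The following sketch (on a toy dataset, with arbitrary estimators) illustrates
disabling a :class:`pipeline.Pipeline` step through ``set_params`` and
requesting ``predict_proba`` from :func:`model_selection.cross_val_predict`, as
noted in the entries above::

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=100, random_state=0)

    pipe = Pipeline([('reduce_dim', PCA(n_components=5)),
                     ('clf', LogisticRegression())])

    # A step can now be replaced or disabled through set_params;
    # None means "skip this step".
    pipe.set_params(reduce_dim=None)
    pipe.fit(X, y)

    # cross_val_predict can return probabilities instead of labels.
    proba = cross_val_predict(LogisticRegression(), X, y, cv=3,
                              method='predict_proba')
    print(proba.shape)  # (n_samples, n_classes)
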
Metrics
- Added a ``labels`` parameter to :func:`metrics.log_loss` to explicitly provide
the labels when the number of classes in ``y_true`` and ``y_pred`` differ.
:issue:`7239` by :user:`Hong Guangguo <hongguangguo>` with help from
:user:`Mads Jensen <indianajensen>` and :user:`Nelson Liu <nelson-liu>`.
- Support sparse contingency matrices in cluster evaluation
(`metrics.cluster.supervised`) to scale to a large number of
clusters.
:issue:`7419` by :user:`Gregory Stupp <stuppie>` and `Joel Nothman`_.
- Add ``sample_weight`` parameter to :func:`metrics.matthews_corrcoef`.
By :user:`Jatin Shah <jatinshah>` and `Raghav RV`_.
- Speed up :func:`metrics.silhouette_score` by using vectorized operations.
By `Manoj Kumar`_.
- Add ``sample_weight`` parameter to :func:`metrics.confusion_matrix`.
By :user:`Bernardo Stein <DanielSidhion>`.
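
A minimal sketch of two of the metric additions above, with toy labels and
weights chosen purely for illustration::

    from sklearn.metrics import confusion_matrix, log_loss

    y_true = [0, 1, 1]
    y_proba = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]

    # labels makes the class set explicit when y_true does not contain every
    # class that the predicted probabilities cover.
    loss = log_loss(y_true, y_proba, labels=[0, 1, 2])

    # confusion_matrix now accepts per-sample weights.
    cm = confusion_matrix([0, 1, 1], [0, 0, 1], sample_weight=[1.0, 2.0, 0.5])
    print(loss)
    print(cm)
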
Miscellaneous
- Added ``n_jobs`` parameter to :class:`feature_selection.RFECV` to compute
the score on the test folds in parallel. By `Manoj Kumar`_.
- The codebase no longer contains C/C++ files generated by Cython: they are
generated during the build. Distribution packages will still contain generated
C/C++ files. By :user:`Arthur Mensch <arthurmensch>`.
- Reduce the memory usage for 32-bit float input arrays of
`utils.sparse_func.mean_variance_axis` and
`utils.sparse_func.incr_mean_variance_axis` by supporting cython
fused types. By :user:`YenChen Lin <yenchenlin>`.
- `ignore_warnings` now accepts a ``category`` argument to ignore only
warnings of a specified type. By :user:`Thierry Guillemot <tguillemot>`.
- Added the parameter ``return_X_y``, which makes the loader return a
``(data, target)`` tuple, to :func:`datasets.load_iris` (:issue:`7049`),
:func:`datasets.load_breast_cancer` (:issue:`7152`),
:func:`datasets.load_digits`, :func:`datasets.load_diabetes`,
:func:`datasets.load_linnerud` and `datasets.load_boston`
(:issue:`7154`); a short usage sketch follows this list. By
:user:`Manvendra Singh<manu-chroma>`.
- Simplified the ``clone`` function and deprecated support for estimators
that modify parameters in ``__init__``. :issue:`5540` by `Andreas Müller`_.
- When unpickling a scikit-learn estimator in a different version than the one
the estimator was trained with, a ``UserWarning`` is raised, see :ref:`the documentation
on model persistence <persistence_limitations>` for more details. (:issue:`7248`)
By `Andreas Müller`_.
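
A one-line usage sketch of the ``return_X_y`` option mentioned in the dataset
loader entry above::

    from sklearn.datasets import load_iris

    # The loaders previously returned a Bunch object; return_X_y gives the
    # (data, target) tuple directly.
    X, y = load_iris(return_X_y=True)
    print(X.shape, y.shape)  # (150, 4) (150,)
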
Bug fixes
.........
Trees and ensembles
- Random forest, extra trees, decision trees and gradient boosting
no longer accept ``min_samples_split=1``, as at least 2 samples
are required to split a decision tree node. By `Arnaud Joly`_.
- :class:`ensemble.VotingClassifier` now raises ``NotFittedError`` if ``predict``,
``transform`` or ``predict_proba`` are called on the non-fitted estimator.
By `Sebastian Raschka`_.
- Fix bug where :class:`ensemble.AdaBoostClassifier` and
:class:`ensemble.AdaBoostRegressor` would perform poorly if the
``random_state`` was fixed
(:issue:`7411`). By `Joel Nothman`_.
- Fix bug in ensembles with randomization where the ensemble would not
set ``random_state`` on base estimators in a pipeline or similar nesting.
(:issue:`7411`). Note, results for :class:`ensemble.BaggingClassifier`,
:class:`ensemble.BaggingRegressor`, :class:`ensemble.AdaBoostClassifier`
and :class:`ensemble.AdaBoostRegressor` will now differ from previous
versions. By `Joel Nothman`_.
Linear, kernelized and related models
- Fixed incorrect gradient computation for ``loss='squared_epsilon_insensitive'`` in
:class:`linear_model.SGDClassifier` and :class:`linear_model.SGDRegressor`
(:issue:`6764`). By :user:`Wenhua Yang <geekoala>`.
- Fix bug in :class:`linear_model.LogisticRegressionCV` where
``solver='liblinear'`` did not accept ``class_weight='balanced'``.
(:issue:`6817`). By `Tom Dupre la Tour`_.
- Fix bug in :class:`neighbors.RadiusNeighborsClassifier` where an error
occurred when there were outliers being labelled and a weight function
specified (:issue:`6902`). By
`LeonieBorne <https://github.com/LeonieBorne>`_.
- Fix the sparse decision function of :class:`linear_model.ElasticNet` to match
its dense output in the multioutput case.
Decomposition, manifold learning and clustering
- `decomposition.RandomizedPCA` default number of `iterated_power` is 4 instead of 3.
:issue:`5141` by :user:`Giorgio Patrini <giorgiop>`.
- :func:`utils.extmath.randomized_svd` performs 4 power iterations by default, instead of 0.
In practice this is enough for obtaining a good approximation of the
true eigenvalues/vectors in the presence of noise. When `n_components` is
small (``< .1 * min(X.shape)``) `n_iter` is set to 7, unless the user specifies
a higher number. This improves precision with few components.
:issue:`5299` by :user:`Giorgio Patrini<giorgiop>`.
- Whiten/non-whiten inconsistency between components of :class:`decomposition.PCA`
and `decomposition.RandomizedPCA` (now factored into PCA, see the
New features) is fixed. `components_` are stored with no whitening.
:issue:`5299` by :user:`Giorgio Patrini <giorgiop>`.
- Fixed bug in :func:`manifold.spectral_embedding` where diagonal of unnormalized
Laplacian matrix was incorrectly set to 1. :issue:`4995` by :user:`Peter Fischer <yanlend>`.
- Fixed incorrect initialization of `utils.arpack.eigsh` on all
occurrences. Affects `cluster.bicluster.SpectralBiclustering`,
:class:`decomposition.KernelPCA`, :class:`manifold.LocallyLinearEmbedding`,
and :class:`manifold.SpectralEmbedding` (:issue:`5012`). By
:user:`Peter Fischer <yanlend>`.
- Attribute ``explained_variance_ratio_`` calculated with the SVD solver
of :class:`discriminant_analysis.LinearDiscriminantAnalysis` now returns
correct results. By :user:`JPFrancoia <JPFrancoia>`.
Preprocessing and feature selection
- `preprocessing.data._transform_selected` now always passes a copy
of ``X`` to transform function when ``copy=True`` (:issue:`7194`). By `Caio
Oliveira <https://github.com/caioaao>`_.
Model evaluation and meta-estimators
- :class:`model_selection.StratifiedKFold` now raises an error if the number
of labels for any individual class is less than ``n_folds``.
:issue:`6182` by :user:`Devashish Deshpande <dsquareindia>`.
- Fixed bug in :class:`model_selection.StratifiedShuffleSplit`
where train and test samples could overlap in some edge cases,
see :issue:`6121` for
more details. By `Loic Esteve`_.
- Fix in :class:`sklearn.model_selection.StratifiedShuffleSplit` to
return splits of size ``train_size`` and ``test_size`` in all cases
(:issue:`6472`). By `Andreas Müller`_.
- Cross-validation of :class:`multiclass.OneVsOneClassifier` and
:class:`multiclass.OneVsRestClassifier` now works with precomputed kernels.
:issue:`7350` by :user:`Russell Smith <rsmith54>`.
- Fix incomplete ``predict_proba`` method delegation from
:class:`model_selection.GridSearchCV` to
:class:`linear_model.SGDClassifier` (:issue:`7159`)
by `Yichuan Liu <https://github.com/yl565>`_.
Metrics
- Fix bug in :func:`metrics.silhouette_score` in which clusters of
size 1 were incorrectly scored. They should get a score of 0.
By `Joel Nothman`_.
- Fix bug in :func:`metrics.silhouette_samples` so that it now works with
arbitrary labels, not just those ranging from 0 to n_clusters - 1.
- Fix bug where expected and adjusted mutual information were incorrect if
cluster contingency cells exceeded ``2**16``. By `Joel Nothman`_.
- :func:`metrics.pairwise_distances` now converts arrays to
boolean arrays when required in ``scipy.spatial.distance``.
:issue:`5460` by `Tom Dupre la Tour`_.
- Fix sparse input support in :func:`metrics.silhouette_score` as well as
in the example ``examples/text/document_clustering.py``. By :user:`YenChen Lin <yenchenlin>`.
- :func:`metrics.roc_curve` and :func:`metrics.precision_recall_curve` no
longer round ``y_score`` values when creating ROC curves; this was causing
problems for users with very small differences in scores (:issue:`7353`).
Miscellaneous
- `model_selection.tests._search._check_param_grid` now works correctly with all types
that extend/implement `Sequence` (except strings), including ``range`` (Python 3.x) and ``xrange``
(Python 2.x). :issue:`7323` by Viacheslav Kovalevskyi.
- :func:`utils.extmath.randomized_range_finder` is more numerically stable when many
power iterations are requested, since it applies LU normalization by default.
If ``n_iter < 2``, numerical issues are unlikely, so no normalization is applied.
Other normalization options are available: ``'none'``, ``'LU'`` and ``'QR'``.
:issue:`5141` by :user:`Giorgio Patrini <giorgiop>`.
- Fix a bug where some formats of ``scipy.sparse`` matrix, and estimators
with them as parameters, could not be passed to :func:`base.clone`.
By `Loic Esteve`_.
- :func:`datasets.load_svmlight_file` is now able to read long int QID values.
:issue:`7101` by :user:`Ibraim Ganiev <olologin>`.
API changes summary
-------------------
Linear, kernelized and related models
- ``residual_metric`` has been deprecated in :class:`linear_model.RANSACRegressor`.
Use ``loss`` instead. By `Manoj Kumar`_.
- Access to public attributes ``.X_`` and ``.y_`` has been deprecated in
:class:`isotonic.IsotonicRegression`. By :user:`Jonathan Arfa <jarfa>`.
Decomposition, manifold learning and clustering
- The old `mixture.DPGMM` is deprecated in favor of the new
:class:`mixture.BayesianGaussianMixture` (with the parameter
``weight_concentration_prior_type='dirichlet_process'``).
The new class solves the computational
problems of the old class and computes the Gaussian mixture with a
Dirichlet process prior faster than before.
:issue:`7295` by :user:`Wei Xue <xuewei4d>` and :user:`Thierry Guillemot <tguillemot>`.
- The old `mixture.VBGMM` is deprecated in favor of the new
:class:`mixture.BayesianGaussianMixture` (with the parameter
``weight_concentration_prior_type='dirichlet_distribution'``).
The new class solves the computational
problems of the old class and computes the Variational Bayesian Gaussian
mixture faster than before.
:issue:`6651` by :user:`Wei Xue <xuewei4d>` and :user:`Thierry Guillemot <tguillemot>`.
- The old `mixture.GMM` is deprecated in favor of the new
:class:`mixture.GaussianMixture`. The new class computes the Gaussian mixture
faster than before, and some of the computational problems have been solved.
:issue:`6666` by :user:`Wei Xue <xuewei4d>` and :user:`Thierry Guillemot <tguillemot>`.
Model evaluation and meta-estimators
- The `sklearn.cross_validation`, `sklearn.grid_search` and
`sklearn.learning_curve` modules have been deprecated and the classes and
functions have been reorganized into the :mod:`sklearn.model_selection`
module. Ref :ref:`model_selection_changes` for more information.
:issue:`4294` by `Raghav RV`_.
- The ``grid_scores_`` attribute of :class:`model_selection.GridSearchCV`
and :class:`model_selection.RandomizedSearchCV` is deprecated in favor of
the attribute ``cv_results_``.
Ref :ref:`model_selection_changes` for more information.
:issue:`6697` by `Raghav RV`_.
- The parameters ``n_iter`` or ``n_folds`` in old CV splitters are replaced
by the new parameter ``n_splits``, which provides a consistent
and unambiguous interface to represent the number of train-test splits.
:issue:`7187` by :user:`YenChen Lin <yenchenlin>`.
- The ``classes`` parameter was renamed to ``labels`` in
:func:`metrics.hamming_loss`. :issue:`7260` by :user:`Sebastián Vanrell <srvanrell>`.
- The splitter classes ``LabelKFold``, ``LabelShuffleSplit``,
``LeaveOneLabelOut`` and ``LeavePLabelOut`` are renamed to
:class:`model_selection.GroupKFold`,
:class:`model_selection.GroupShuffleSplit`,
:class:`model_selection.LeaveOneGroupOut`
and :class:`model_selection.LeavePGroupsOut` respectively.
Also the parameter ``labels`` in the `split` method of the newly
renamed splitters :class:`model_selection.LeaveOneGroupOut` and
:class:`model_selection.LeavePGroupsOut` is renamed to
``groups``. Additionally in :class:`model_selection.LeavePGroupsOut`,
the parameter ``n_labels`` is renamed to ``n_groups``.
:issue:`6660` by `Raghav RV`_.
- Error and loss names for ``scoring`` parameters are now prefixed by
``'neg_'``, such as ``neg_mean_squared_error``. The unprefixed versions
are deprecated and will be removed in version 0.20.
:issue:`7261` by :user:`Tim Head <betatim>`.
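
A short sketch of two of the API changes above (the dataset, estimator and
group assignment are arbitrary): the ``'neg_'``-prefixed scorers and the
renamed group-aware splitters with their ``groups`` argument::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GroupKFold, cross_val_score

    X, y = make_regression(n_samples=12, random_state=0)
    groups = np.repeat([0, 1, 2, 3], 3)  # four groups of three samples each

    # Error metrics are now negated so that higher is always better.
    gkf = GroupKFold(n_splits=4)
    scores = cross_val_score(Ridge(), X, y, groups=groups, cv=gkf,
                             scoring='neg_mean_squared_error')
    print(scores)  # negative MSE values, one per group-wise split
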
Code Contributors
-----------------
Aditya Joshi, Alejandro, Alexander Fabisch, Alexander Loginov, Alexander
Minyushkin, Alexander Rudy, Alexandre Abadie, Alexandre Abraham, Alexandre
Gramfort, Alexandre Saint, alexfields, Alvaro Ulloa, alyssaq, Amlan Kar,
Andreas Mueller, andrew giessel, Andrew Jackson, Andrew McCulloh, Andrew
Murray, Anish Shah, Arafat, Archit Sharma, Ariel Rokem, Arnaud Joly, Arnaud
Rachez, Arthur Mensch, Ash Hoover, asnt, b0noI, Behzad Tabibian, Bernardo,
Bernhard Kratzwald, Bhargav Mangipudi, blakeflei, Boyuan Deng, Brandon Carter,
Brett Naul, Brian McFee, Caio Oliveira, Camilo Lamus, Carol Willing, Cass,
CeShine Lee, Charles Truong, Chyi-Kwei Yau, CJ Carey, codevig, Colin Ni, Dan
Shiebler, Daniel, Daniel Hnyk, David Ellis, David Nicholson, David Staub, David
Thaler, David Warshaw, Davide Lasagna, Deborah, definitelyuncertain, Didi
Bar-Zev, djipey, dsquareindia, edwinENSAE, Elias Kuthe, Elvis DOHMATOB, Ethan
White, Fabian Pedregosa, Fabio Ticconi, fisache, Florian Wilhelm, Francis,
Francis O'Donovan, Gael Varoquaux, Ganiev Ibraim, ghg, Gilles Louppe, Giorgio
Patrini, Giovanni Cherubin, Giovanni Lanzani, Glenn Qian, Gordon
Mohr, govin-vatsan, Graham Clenaghan, Greg Reda, Greg Stupp, Guillaume
Lemaitre, Gustav Mörtberg, halwai, Harizo Rajaona, Harry Mavroforakis,
hashcode55, hdmetor, Henry Lin, Hobson Lane, Hugo Bowne-Anderson,
Igor Andriushchenko, Imaculate, Inki Hwang, Isaac Sijaranamual,
Ishank Gulati, Issam Laradji, Iver Jordal, jackmartin, Jacob Schreiber, Jake
Vanderplas, James Fiedler, James Routley, Jan Zikes, Janna Brettingen, jarfa, Jason
Laska, jblackburne, jeff levesque, Jeffrey Blackburne, Jeffrey04, Jeremy Hintz,
jeremynixon, Jeroen, Jessica Yung, Jill-Jênn Vie, Jimmy Jia, Jiyuan Qian, Joel
Nothman, johannah, John, John Boersma, John Kirkham, John Moeller,
jonathan.striebel, joncrall, Jordi, Joseph Munoz, Joshua Cook, JPFrancoia,
jrfiedler, JulianKahnert, juliathebrave, kaichogami, KamalakerDadi, Kenneth
Lyons, Kevin Wang, kingjr, kjell, Konstantin Podshumok, Kornel Kielczewski,
Krishna Kalyan, krishnakalyan3, Kvle Putnam, Kyle Jackson, Lars Buitinck,
ldavid, LeiG, LeightonZhang, Leland McInnes, Liang-Chi Hsieh, Lilian Besson,
lizsz, Loic Esteve, Louis Tiao, Léonie Borne, Mads Jensen, Maniteja Nandana,
Manoj Kumar, Manvendra Singh, Marco, Mario Krell, Mark Bao, Mark Szepieniec,
Martin Madsen, MartinBpr, MaryanMorel, Massil, Matheus, Mathieu Blondel,
Mathieu Dubois, Matteo, Matthias Ekman, Max Moroz, Michael Scherer, michiaki
ariga, Mikhail Korobov, Moussa Taifi, mrandrewandrade, Mridul Seth, nadya-p,
Naoya Kanai, Nate George, Nelle Varoquaux, Nelson Liu, Nick James,
NickleDave, Nico, Nicolas Goix, Nikolay Mayorov, ningchi, nlathia,
okbalefthanded, Okhlopkov, Olivier Grisel, Panos Louridas, Paul Strickland,
Perrine Letellier, pestrickland, Peter Fischer, Pieter, Ping-Yao, Chang,
practicalswift, Preston Parry, Qimu Zheng, Rachit Kansal, Raghav RV,
Ralf Gommers, Ramana.S, Rammig, Randy Olson, Rob Alexander, Robert Lutz,
Robin Schucker, Rohan Jain, Ruifeng Zheng, Ryan Yu, Rémy Léone, saihttam,
Saiwing Yeung, Sam Shleifer, Samuel St-Jean, Sartaj Singh, Sasank Chilamkurthy,
saurabh.bansod, Scott Andrews, Scott Lowe, seales, Sebastian Raschka, Sebastian
Saeger, Sebastián Vanrell, Sergei Lebedev, shagun Sodhani, shanmuga cv,
Shashank Shekhar, shawpan, shengxiduan, Shota, shuckle16, Skipper Seabold,
sklearn-ci, SmedbergM, srvanrell, Sébastien Lerique, Taranjeet, themrmax,
Thierry, Thierry Guillemot, Thomas, Thomas Hallock, Thomas Moreau, Tim Head,
tKammy, toastedcornflakes, Tom, TomDLT, Toshihiro Kamishima, tracer0tong, Trent
Hauck, trevorstephens, Tue Vo, Varun, Varun Jewalikar, Viacheslav, Vighnesh
Birodkar, Vikram, Villu Ruusmann, Vinayak Mehta, walter, waterponey, Wenhua
Yang, Wenjian Huang, Will Welch, wyseguy7, xyguo, yanlend, Yaroslav Halchenko,
yelite, Yen, YenChenLin, Yichuan Liu, Yoav Ram, Yoshiki, Zheng RuiFeng, zivori, Óscar Nájera