.. _user-guide-proximities:

Proximity Counts
----------------

Proximity counts are counts of the number of times that two samples share a leaf node. When a test set is present, the proximity counts of each sample in the test set with each sample in the training set can be computed::

    >>> from sklearn import datasets
    >>> from sklearn.model_selection import train_test_split
    >>> from quantile_forest import RandomForestQuantileRegressor
    >>> X, y = datasets.load_diabetes(return_X_y=True)
    >>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    >>> qrf = RandomForestQuantileRegressor().fit(X_train, y_train)
    >>> proximities = qrf.proximity_counts(X_test)  # proximity counts for test data

For each test sample, the method outputs a list of tuples of the training index and proximity count, listed in descending order by proximity count. For example, a test sample with an output of [(1, 5), (0, 3), (3, 1)], means that the test sample shared 5, 3, and 1 leaf nodes with the training samples that were (zero-)indexed as 1, 0, and 3 during model fitting, respectively.

The maximum number of proximity counts output per test sample can be limited by specifying `max_proximities`::

    >>> proximities = qrf.proximity_counts(X_test, max_proximities=10)
    >>> all([len(prox) <= 10 for prox in proximities])
    True

Out-of-bag (OOB) proximity counts can be returned by specifying `oob_score=True`::

    >>> proximities = qrf.proximity_counts(X_train, oob_score=True)