Getting Started
===============

Installation and Dependencies
-----------------------------

You can install PySliceKit using pip:

.. code-block:: bash

    pip install pyslicekit

PySliceKit relies on standard data science libraries:

* ``pandas`` (>= 1.0.0)
* ``numpy`` (>= 1.18.0)
* ``scikit-learn`` (>= 0.22.0)
* ``scipy`` (>= 1.4.0)
* ``matplotlib`` (>= 3.2.0)

Supported Metrics
-----------------

You must pass a valid string to the ``metric`` parameter. PySliceKit automatically understands whether higher or lower is better, and automatically selects the correct statistical test for the task type.

.. list-table::
   :header-rows: 1

   * - Metric string
     - Task
     - Direction
     - Test used
   * - ``accuracy``
     - Classification
     - higher is better
     - Z-test / Fisher
   * - ``f1``, ``f1_macro``, ``f1_weighted``
     - Classification
     - higher is better
     - Z-test / Fisher
   * - ``precision``, ``recall``
     - Classification
     - higher is better
     - Z-test / Fisher
   * - ``mae``, ``rmse``, ``mse``
     - Regression
     - lower is better
     - Bootstrap CI
   * - ``r2``
     - Regression
     - higher is better
     - Bootstrap CI

What it Returns
---------------

The ``pyslicekit.evaluate()`` function returns a list of ``SliceResult`` objects, sorted by absolute gap (worst performing segments first). You can loop through them or extract the exact properties you need:

.. code-block:: python

    for result in results[:5]:  # top 5 worst
        print(f"Segment: {result.label}")
        print(f"Gap: {result.gap:.3f}")
        print(f"Significant: {result.is_significant}")

Complete Minimal Example
------------------------

.. code-block:: python

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import pyslicekit

    # 1. Load your data and train a model
    cancer = load_breast_cancer(as_frame=True)
    df = cancer.frame
    X = df.drop(columns=['target'])
    y = df['target']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

    model = LogisticRegression(max_iter=5000)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # 2. Evaluate!
    results = pyslicekit.evaluate(
        model=model,
        df=X_test,
        y_true=y_test,
        y_pred=y_pred,
        slice_cols=["mean radius", "mean texture"],
        metric="f1",
        render_visuals=True,
        top_n=15
    )
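
Once ``results`` is available, a common next step is to keep only the segments that are both statistically significant and large enough to matter. The sketch below shows one way to do that. It runs without PySliceKit installed by using ``DemoSliceResult``, a hypothetical stand-in that exposes only the three attributes documented above (``label``, ``gap``, ``is_significant``). The helper ``significant_slices``, its ``min_abs_gap`` parameter, and the sample labels and values are assumptions made for illustration and are not part of the PySliceKit API.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List


    @dataclass
    class DemoSliceResult:
        """Hypothetical stand-in with only the attributes documented above."""
        label: str
        gap: float
        is_significant: bool


    def significant_slices(results: List[DemoSliceResult],
                           min_abs_gap: float = 0.0) -> List[DemoSliceResult]:
        """Keep significant results with |gap| >= min_abs_gap, largest |gap| first."""
        kept = [r for r in results
                if r.is_significant and abs(r.gap) >= min_abs_gap]
        return sorted(kept, key=lambda r: abs(r.gap), reverse=True)


    # Made-up values for illustration only; not real PySliceKit output.
    demo = [
        DemoSliceResult("segment A", -0.12, True),
        DemoSliceResult("segment B", -0.03, False),
        DemoSliceResult("segment C", 0.05, True),
    ]

    for r in significant_slices(demo, min_abs_gap=0.04):
        print(f"{r.label}: gap={r.gap:+.3f}")

Because the helper only reads ``gap`` and ``is_significant``, the same filter should also work on the list returned by ``pyslicekit.evaluate()``, assuming its ``SliceResult`` objects expose those attributes as described in the section above.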