Getting Started

Installation and Dependencies

You can install PySliceKit using pip:

pip install pyslicekit

PySliceKit relies on standard data science libraries:

  • pandas (>= 1.0.0)

  • numpy (>= 1.18.0)

  • scikit-learn (>= 0.22.0)

  • scipy (>= 1.4.0)

  • matplotlib (>= 3.2.0)
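To confirm that an environment meets these minimums before installing, a small check along the following lines may help. This is a sketch that uses only the standard library; the helper names parse_version and check_dependencies are illustrative and are not part of PySliceKit. The version parser is deliberately simple and only compares the first three numeric components.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "pandas": "1.0.0",
    "numpy": "1.18.0",
    "scikit-learn": "0.22.0",
    "scipy": "1.4.0",
    "matplotlib": "3.2.0",
}


def parse_version(text):
    """Turn '1.18.0' into (1, 18, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.4' compares equal to '1.4.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for every package that is missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < parse_version(minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_dependencies().items():
        print(f"{name}: {problem}")
```

An empty result means every listed dependency is present at or above its minimum version.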

Supported Metrics

Pass one of the metric strings listed below to the metric parameter. For each one, PySliceKit infers whether higher or lower values are better and selects a statistical test that matches the task type.

Metric string               Task            Direction          Test used
accuracy                    Classification  higher is better   Z-test / Fisher
f1, f1_macro, f1_weighted   Classification  higher is better   Z-test / Fisher
precision, recall           Classification  higher is better   Z-test / Fisher
mae, rmse, mse              Regression      lower is better    Bootstrap CI
r2                          Regression      higher is better   Bootstrap CI
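To make the direction column concrete, the sketch below encodes the supported metrics as a plain lookup and uses it to decide whether a slice scores worse than the overall model. It only illustrates the idea and does not show PySliceKit's internals: METRIC_INFO, describe_metric, and slice_is_worse are hypothetical names, and the scores in the usage lines are made up.

```python
# Hypothetical lookup mirroring the supported metrics listed above.
# Each entry is (task, higher_is_better, test_used).
METRIC_INFO = {
    "accuracy": ("classification", True, "Z-test / Fisher"),
    "f1": ("classification", True, "Z-test / Fisher"),
    "f1_macro": ("classification", True, "Z-test / Fisher"),
    "f1_weighted": ("classification", True, "Z-test / Fisher"),
    "precision": ("classification", True, "Z-test / Fisher"),
    "recall": ("classification", True, "Z-test / Fisher"),
    "mae": ("regression", False, "Bootstrap CI"),
    "rmse": ("regression", False, "Bootstrap CI"),
    "mse": ("regression", False, "Bootstrap CI"),
    "r2": ("regression", True, "Bootstrap CI"),
}


def describe_metric(metric):
    """Return (task, higher_is_better, test_used) or raise on an unknown string."""
    try:
        return METRIC_INFO[metric]
    except KeyError:
        valid = ", ".join(sorted(METRIC_INFO))
        raise ValueError(f"Unknown metric {metric!r}; expected one of: {valid}") from None


def slice_is_worse(metric, slice_score, overall_score):
    """True when the slice performs worse than the overall score for this metric."""
    _, higher_is_better, _ = describe_metric(metric)
    if higher_is_better:
        return slice_score < overall_score
    return slice_score > overall_score


# Illustrative scores only: a lower F1 is worse, a lower MAE is better.
print(slice_is_worse("f1", 0.70, 0.85))
print(slice_is_worse("mae", 0.70, 0.85))
```

Keeping the direction next to the metric name means the same numeric difference can be read correctly for both error metrics and score metrics.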

What it Returns

The pyslicekit.evaluate() function returns a list of SliceResult objects sorted by absolute gap in descending order, so the worst-performing segments come first.

With the returned list stored in a variable such as results, you can loop over it or read the properties you need:

for result in results[:5]:  # top 5 worst
    print(f"Segment: {result.label}")
    print(f"Gap: {result.gap:.3f}")
    print(f"Significant: {result.is_significant}")
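The ordering and filtering described above can be tried without running a model. The sketch below uses a stand-in dataclass with the three attributes shown in the loop (label, gap, is_significant). FakeSliceResult, its sample labels, and its values are invented for illustration; the real SliceResult class may carry more fields.

```python
from dataclasses import dataclass


@dataclass
class FakeSliceResult:
    """Stand-in for SliceResult with only the attributes used above."""
    label: str
    gap: float
    is_significant: bool


# Made-up values, purely for illustration.
results = [
    FakeSliceResult("segment A", -0.02, False),
    FakeSliceResult("segment B", -0.15, True),
    FakeSliceResult("segment C", 0.08, True),
]

# Sort by absolute gap, largest first, which is the order described above.
ranked = sorted(results, key=lambda r: abs(r.gap), reverse=True)

# Keep only the segments flagged as statistically significant.
significant = [r for r in ranked if r.is_significant]

for result in significant:
    print(f"{result.label}: gap={result.gap:.3f}")
```

Because the list returned by evaluate() is already sorted, only the filtering step should be needed in practice; the explicit sort is shown to make the ordering rule visible.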

Complete Minimal Example

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pyslicekit

# 1. Load your data and train a model
cancer = load_breast_cancer(as_frame=True)
df = cancer.frame
X = df.drop(columns=['target'])
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 2. Evaluate the predictions across slices of the chosen columns
results = pyslicekit.evaluate(
    model=model,
    df=X_test,
    y_true=y_test,
    y_pred=y_pred,
    slice_cols=["mean radius", "mean texture"],
    metric="f1",
    render_visuals=True,
    top_n=15
)