# scikit-learn Cheatsheet

scikit-learn is a popular open-source machine learning library for Python. It provides tools for building and evaluating machine learning models, covering classification, regression, clustering, and more. This cheatsheet is a quick reference for some of scikit-learn's most commonly used features, with code for loading data, preprocessing, model selection, and more. It also lists resources for further learning.

## Loading Data
from sklearn.datasets import load_digits
digits = load_digits()  # 1,797 handwritten digit images, 8x8 pixels flattened to 64 features
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=42)  # hold out 25% for testing
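A quick sanity check after splitting is to look at the array shapes. The sketch below repeats the split above and adds `stratify=digits.target`, an optional argument that is not in the original snippet; it is included here on the assumption that you want the ten digit classes represented in similar proportions in both splits.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# stratify (an addition to the snippet above) keeps class proportions
# similar in the training and test splits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    random_state=42,
    stratify=digits.target,
)

print(digits.data.shape)  # (1797, 64)
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```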
## Preprocessing
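from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and variance from the training data only
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics on the test data
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
y_train_encoded = encoder.fit_transform(y_train)  # map labels to integers 0..n_classes-1
y_test_encoded = encoder.transform(y_test)  # the digits targets are already 0-9, so this leaves them unchanged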
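To see what the two transformers do, it can help to run them on a tiny example. The sketch below uses made-up toy values (the array contents and the "cat"/"dog" labels are hypothetical, not from the digits dataset) to show that `StandardScaler` centers and rescales each column, and that `LabelEncoder` maps string labels to integers and back.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data: hypothetical values for illustration only
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y = np.array(["cat", "dog", "cat"])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
col_means = X_scaled.mean(axis=0)  # approximately 0 for every column
col_stds = X_scaled.std(axis=0)    # approximately 1 for every column
print(col_means, col_stds)

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
print(encoder.classes_)  # ['cat' 'dog']
print(y_encoded)         # [0 1 0]

# inverse_transform recovers the original labels
y_decoded = encoder.inverse_transform(y_encoded)
print(y_decoded)
```

`LabelEncoder` is most useful when the targets are strings or non-contiguous integers, as in this toy example.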
## Model Selection
from sklearn.svm import SVC
clf = SVC(kernel='linear', C=1.0, random_state=42)  # C sets the regularization strength (smaller C = stronger)
clf.fit(X_train_scaled, y_train_encoded)
from sklearn.metrics import accuracy_score
y_pred = clf.predict(X_test_scaled)
print(accuracy_score(y_test_encoded, y_pred))  # fraction of test samples classified correctly
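Accuracy is a single number and hides which classes get confused with each other. As a possible follow-up, the sketch below rebuilds the same kind of linear SVC from scratch and prints a confusion matrix and a per-class report; `confusion_matrix` and `classification_report` are not used in the original snippet and are shown here as one option among several metrics.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the scaler on the training data only, then train the classifier
scaler = StandardScaler()
clf = SVC(kernel="linear", C=1.0)
clf.fit(scaler.fit_transform(X_train), y_train)
y_pred = clf.predict(scaler.transform(X_test))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1 score per digit class
print(classification_report(y_test, y_pred))
```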
## Cross-Validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X_train_scaled, y_train_encoded, cv=5)  # one accuracy score per fold
print(scores.mean(), scores.std())
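One caveat with cross-validating on data that was scaled beforehand is that the scaler has already seen every fold. A common way around this, sketched below as a suggestion rather than as part of the original cheatsheet, is to wrap the scaler and the classifier in a pipeline so that scaling is refit inside each fold.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# The pipeline refits StandardScaler on the training part of every fold,
# so the held-out part never influences the scaling statistics
pipe = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))

scores = cross_val_score(pipe, X, y, cv=5)
print(scores)
print(scores.mean(), scores.std())
```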
## Grid Search
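from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1.0, 10.0], 'kernel': ['linear', 'rbf']}  # 3 x 2 = 6 candidate settings
grid_search = GridSearchCV(clf, param_grid, cv=5)  # each candidate is scored with 5-fold cross-validation
grid_search.fit(X_train_scaled, y_train_encoded)
print(grid_search.best_params_)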
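After fitting, the search object exposes the winning settings and a refitted model. The sketch below reruns the same grid from scratch and reads `best_params_`, `best_score_`, and `best_estimator_`; the held-out evaluation at the end is an addition for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train_scaled, y_train)

print(grid_search.best_params_)  # best combination found on the grid
print(grid_search.best_score_)   # its mean cross-validated accuracy

# With the default refit=True, best_estimator_ is retrained on all of the
# training data using the best parameters
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test_scaled, y_test)
print(test_accuracy)
```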
## Other Useful Features
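from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=3, random_state=42)  # shallow tree; note this rebinds clf from the SVC above
clf.fit(X_train_scaled, y_train_encoded)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
plot_tree(clf)  # requires matplotlib
plt.show()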
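When a plot is not convenient, for example on a headless machine, a fitted tree can also be dumped as text. The sketch below uses `export_text` from `sklearn.tree`, which is not in the original snippet; the `pixel_<i>` feature names are made up here for readability.

```python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_digits(return_X_y=True)

clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X, y)

# Hypothetical feature names, one per flattened pixel
feature_names = [f"pixel_{i}" for i in range(X.shape[1])]

# One line per split or leaf, indented by depth
rules = export_text(clf, feature_names=feature_names)
print(rules)
```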
import joblib
joblib.dump(clf, 'model.joblib')  # save the fitted model to disk
clf = joblib.load('model.joblib')  # load it back; only load files from sources you trust
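A simple way to check that persistence worked is to compare predictions before and after the round trip. The sketch below writes to a temporary directory rather than the working directory; that choice, and the variable names, are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(clf, path)         # serialize the fitted model
    restored = joblib.load(path)   # deserialize it into a new object

# The restored model should give identical predictions
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

Saved models are generally tied to the scikit-learn version that produced them, so it is worth recording that version alongside the file.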
## Resources

- [scikit-learn documentation](https://scikit-learn.org/stable/documentation.html)
- [scikit-learn tutorials](https://scikit-learn.org/stable/tutorial/index.html)
- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html)