EricCRX/books-tabular-dataset
Viewer • Updated • 330 • 155 • 1
Task: Predict Is_Textbook (Yes/No) from physical book attributes.
Dataset: EricCRX/books-tabular-dataset
Training: AutoGluon Tabular with presets="best_quality" on the augmented split, evaluated on the original split.
CatBoost/T18CatBoost/T18Best overall (leaf): CatBoost/T18 (CatBoostModel)
{
"iterations": 10000,
"learning_rate": 0.13950397023933195,
"random_seed": 0,
"allow_writing_files": false,
"eval_metric": "Accuracy",
"depth": 7,
"l2_leaf_reg": 3.4522538315365296
}
pip install autogluon.tabular datasets huggingface_hub cloudpickle
from huggingface_hub import hf_hub_download
from autogluon.tabular import TabularPredictor
# Download (native dir zip) from 0408happyfeet/books-tabular-autogluon
zip_path = hf_hub_download(repo_id="0408happyfeet/books-tabular-autogluon", repo_type="model", filename="autogluon_predictor_dir.zip")
# Unzip and load:
# unzip autogluon_predictor_dir.zip -d ./model_dir
# predictor = TabularPredictor.load("./model_dir")
# Or quick (pickle):
pkl_path = hf_hub_download(repo_id="0408happyfeet/books-tabular-autogluon", repo_type="model", filename="autogluon_predictor.pkl")
import cloudpickle
predictor = cloudpickle.load(open(pkl_path, "rb"))
import pandas as pd
X = pd.DataFrame([{
"Length_cm": 24.0,
"Width_cm": 16.0,
"Thickness_cm": 3.0,
"Pages": 400,
"Hardcover": "Yes",
"Cover_Color": "Blue"
}])
pred = predictor.predict(X)
proba = predictor.predict_proba(X)
print(pred.iloc[0], proba)
Educational AutoML exercise for classical ML on a peer’s Homework 1 dataset.
Source dataset: EricCRX/books-tabular-dataset (classmate). Splits used:
augmented (~300 rows) for training/HPO with an 80/20 holdout.original (30 rows) held out as an external test set.Features: Length_cm, Width_cm, Thickness_cm, Pages, Hardcover, Cover_Color.
Target: Is_Textbook ∈ {Yes, No}.
AutoGluon Tabular defaults (type inference, category handling, missing-value processing). No manual feature engineering.
best_quality).hyperparameter_tune_kwargs='auto' (AutoGluon scheduler/strategy).time_limit=1200 seconds.best_quality with automatic HPO.Units: fraction (0–1), where 1.0 = 100%.
MIT (matching the source dataset’s license).