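I want to do multiclass classification of packaging size in Python. With five explanatory variables, I want to try every combination and find the set of variables that gives the best accuracy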

June 25, 2024

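There are several ways to try different combinations of explanatory variables when classifying packaging size into multiple classes. Here is one example that tries every combination.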

First, import the required libraries.

python
from itertools import combinations
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

Next, define a function that generates every non-empty combination of the features.

python
def get_combinations(features):
    all_combinations = []
    for r in range(1, len(features) + 1):
        all_combinations.extend(combinations(features, r))
    return all_combinations
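
As a quick sanity check, the function can be called on five feature indices. With n features there are 2^n - 1 non-empty subsets, so five features should give 31 combinations. The sketch below repeats the definition so that it runs on its own.

```python
from itertools import combinations


def get_combinations(features):
    """Return every non-empty combination of the given features."""
    all_combinations = []
    for r in range(1, len(features) + 1):
        all_combinations.extend(combinations(features, r))
    return all_combinations


combos = get_combinations(range(5))
print(len(combos))   # 31
print(combos[:6])    # [(0,), (1,), (2,), (3,), (4,), (0, 1)]
print(combos[-1])    # (0, 1, 2, 3, 4)
```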

Load the data and split it into explanatory variables and the target variable. Here we assume the data is already stored in X and y.

python
# X: explanatory variables, y: target variable
# e.g. X = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], ...], y = [0, 1, 2, ...]
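
If no real packaging data is at hand yet, a synthetic dataset can stand in for X and y while trying out the rest of the steps. The sketch below uses scikit-learn's `make_classification`. The sample count, the three classes, and the other parameter values are assumptions chosen purely for illustration and are not part of the original setup.

```python
from sklearn.datasets import make_classification

# Synthetic stand-in for the real data (assumption): 300 samples,
# 5 explanatory variables, 3 packaging-size classes labelled 0, 1, 2.
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    random_state=42,
)

print(X.shape)  # (300, 5)
print(y[:10])   # first ten class labels
```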

Next, split the data into a training set and a test set.

python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

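Standardize the features. The scaler is fitted on the training set only, and the same transformation is then applied to the test set.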

python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Get all the combinations.

python
features = range(X_train_scaled.shape[1])  # column indices of the explanatory variables (5 in this example)
all_combinations = get_combinations(features)

Initialize the variables that hold the best combination and its accuracy.

python
best_combination = None
best_accuracy = 0.0

For every combination, train a model and evaluate its accuracy.

python
for combination in all_combinations:
    # Select the columns that belong to this combination
    cols = list(combination)
    X_train_selected = X_train_scaled[:, cols]
    X_test_selected = X_test_scaled[:, cols]

    # Initialize and train a KNN model (default n_neighbors=5)
    knn = KNeighborsClassifier()
    knn.fit(X_train_selected, y_train)

    # Evaluate on the test set
    y_pred = knn.predict(X_test_selected)
    accuracy = accuracy_score(y_test, y_pred)

    # Update the best combination if this accuracy beats the best so far
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_combination = combination

Print the best combination and its accuracy.

python
print("Best combination:", best_combination)
print("Best accuracy:", best_accuracy)
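
The best combination is reported as a tuple of column indices. To read it more easily, the indices can be mapped back to variable names. In the sketch below, both the column names and the value of `best_combination` are hypothetical placeholders, not results from the script above.

```python
# Hypothetical column names, in the same order as the columns of X
feature_names = ["width", "height", "depth", "weight", "item_count"]

# Placeholder value for illustration only, not an actual result
best_combination = (0, 2, 3)

# Translate column indices into readable names
best_names = [feature_names[idx] for idx in best_combination]
print("Best features:", best_names)  # Best features: ['width', 'depth', 'weight']
```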

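This script tries every combination and finds the one with the highest accuracy. Note, however, that the amount of computation grows exponentially with the number of explanatory variables: n variables give 2^n - 1 non-empty combinations, which is 31 for five variables and 1,023 for ten.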
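
One caveat: choosing the combination by test-set accuracy and then reporting that same accuracy tends to give an optimistic estimate, because the test set has taken part in the selection. A common alternative, sketched below, is to choose the combination by cross-validation on the training set and to use the test set only once at the end. This is a different technique from the script above, not a rewrite of it. The synthetic data, the 5-fold setting, and the stratified split are assumptions made for illustration.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the packaging data (assumption)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3,
    n_redundant=0, n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

n_features = X_train.shape[1]
results = []
for r in range(1, n_features + 1):
    for combo in combinations(range(n_features), r):
        cols = list(combo)
        # The pipeline refits the scaler inside each fold,
        # so no validation data leaks into the scaling step.
        model = make_pipeline(StandardScaler(), KNeighborsClassifier())
        scores = cross_val_score(
            model, X_train[:, cols], y_train, cv=5, scoring="accuracy"
        )
        results.append((scores.mean(), combo))

# Pick the combination with the highest mean cross-validated accuracy
best_cv_accuracy, best_combination = max(results, key=lambda item: item[0])

# Refit on the full training set and evaluate once on the held-out test set
best_cols = list(best_combination)
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier())
final_model.fit(X_train[:, best_cols], y_train)
test_accuracy = final_model.score(X_test[:, best_cols], y_test)

print("Best combination:", best_combination)
print("Mean CV accuracy:", best_cv_accuracy)
print("Test accuracy:", test_accuracy)
```

With five variables this fits 31 combinations times 5 folds, which is still quick for KNN; the cost grows with both the number of combinations and the number of folds.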


Posted by ぼっち