The product name abbreviation (商品名略称) is a categorical variable.

June 8, 2024

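Understood. To treat 商品名略称 as a categorical variable, it needs to be encoded into numeric form. Common ways to encode a categorical variable include one-hot encoding and label encoding. scikit-learn's random forest expects numeric input, and one-hot encoding is a common choice for it.

Below is an example that handles 商品名略称 with one-hot encoding.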

python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# X (feature DataFrame) and y (target) are assumed to be defined earlier.

# One-hot encode the categorical column
X_encoded = pd.get_dummies(X, columns=["商品名略称"])

# Split into training and test data
X_train_encoded, X_test_encoded, y_train_encoded, y_test_encoded = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Define and train the random forest model
model_encoded = RandomForestRegressor(random_state=42)
model_encoded.fit(X_train_encoded, y_train_encoded)

# Predict on the test data
y_pred_encoded = model_encoded.predict(X_test_encoded)

# Evaluate
mse_encoded = mean_squared_error(y_test_encoded, y_pred_encoded)
mae_encoded = mean_absolute_error(y_test_encoded, y_pred_encoded)
r2_encoded = r2_score(y_test_encoded, y_pred_encoded)

print("Mean Squared Error (One-Hot Encoding):", mse_encoded)
print("Mean Absolute Error (One-Hot Encoding):", mae_encoded)
print("Coefficient of Determination (R^2) (One-Hot Encoding):", r2_encoded)

This code one-hot encodes 商品名略称, trains a random forest model on the result, and evaluates its performance.
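`pd.get_dummies` builds one column per distinct value, which can get large when a column has many distinct values. As a hedged alternative, here is a minimal sketch that keeps the one-hot columns in a sparse matrix using scikit-learn's `OneHotEncoder` inside a `ColumnTransformer`. The toy data, the `単価` column, and the names `X_toy`, `y_toy`, and `X_new` are hypothetical and only there for illustration. Whether this is enough for the real data is not something the sketch can confirm.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; the values are made up for illustration.
X_toy = pd.DataFrame({
    "商品名略称": ["A", "B", "A", "C", "B", "C"],
    "単価": [100, 250, 110, 400, 240, 390],
})
y_toy = [10, 30, 12, 55, 28, 52]

# OneHotEncoder returns a SciPy sparse matrix by default, so only non-zero
# entries are stored. handle_unknown="ignore" encodes a category that was
# not seen during fit as all zeros instead of raising an error.
# sparse_threshold=1.0 keeps the combined output sparse even for this tiny
# example, where the default threshold would return a dense array.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["商品名略称"])],
    remainder="passthrough",
    sparse_threshold=1.0,
)

# RandomForestRegressor accepts sparse input, so no densifying is needed.
pipeline = make_pipeline(
    preprocess,
    RandomForestRegressor(n_estimators=10, random_state=42),
)
pipeline.fit(X_toy, y_toy)

# "D" never appeared in training: its one-hot columns are all zero.
X_new = pd.DataFrame({"商品名略称": ["D"], "単価": [300]})
encoded_new = preprocess.transform(X_new)
prediction = pipeline.predict(X_new)

print(encoded_new.toarray())
print(prediction)
```

Note that an unseen product is encoded as all zeros, so the model falls back on the remaining numeric columns for that row.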


Posted by ぼっち

MemoryError: Unable to allocate 49.8 GiB for an array with shape (112733, 59267) and data type float64
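The size in this message can be checked by hand: a dense float64 array stores 8 bytes per element, so the allocation is rows × columns × 8 bytes. A quick sketch of that arithmetic, using only the shape and dtype reported in the error:

```python
# Shape and dtype are taken from the MemoryError message.
rows, cols = 112_733, 59_267
bytes_per_value = 8  # float64 uses 8 bytes per element

total_bytes = rows * cols * bytes_per_value
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
# prints: 53,450,773,688 bytes = 49.8 GiB
```

The 59,267 columns are what drive the size. A column count that large is what one would expect if a column with tens of thousands of distinct values were expanded into dense one-hot columns.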

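It would be better to include categorical variables too, right? For unseen products, I think unit price, shipping cost, and cost price alone are not enough as explanatory variables.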
