ValueError                                Traceback (most recent call last)
Cell In[154], line 26
     23 vectorizer = CountVectorizer()
     25 # Vectorize the text data using CountVectorizer
---> 26 X_product_name_vectorized = vectorizer.fit_transform(X_product_name)
     27 X_category_name_vectorized = vectorizer.fit_transform(X_category_name)
     28 X_description_vectorized = vectorizer.fit_transform(X_description)

File /home/share/temp/venv/lib/python3.8/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1145     estimator._validate_params()
   1147 with config_context(
   1148     skip_parameter_validation=(
   1149         prefer_skip_nested_validation or global_skip_validation
   1150     )
   1151 ):
-> 1152     return fit_method(estimator, *args, **kwargs)

File /home/share/temp/venv/lib/python3.8/site-packages/sklearn/feature_extraction/text.py:1389, in CountVectorizer.fit_transform(self, raw_documents, y)
   1381 warnings.warn(
   1382     "Upper case characters found in"
   1383     " vocabulary while 'lowercase'"
   1384     " is True. These entries will not"
   1385     " be matched with any documents"
   1386 )
…
   1297 )
   1299 if indptr[-1] > np.iinfo(np.int32).max:  # = 2**31 - 1
   1300     if _IS_32BIT:

ValueError: empty vocabulary; perhaps the documents only contain stop words

June 8, 2024

This error occurs when the text data is empty or when every word in it is a stop word. In either case CountVectorizer ends up with an empty vocabulary, so there are no words left to vectorize.
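The two causes described here can be reproduced with a short sketch. The helper name try_vectorize and the sample strings are made up for illustration and are not taken from the original notebook; the only library call assumed is scikit-learn's CountVectorizer with its stop_words parameter.

```python
from sklearn.feature_extraction.text import CountVectorizer


def try_vectorize(documents, **vectorizer_kwargs):
    """Return the vocabulary size, or the error message if fitting fails.

    Hypothetical helper for illustration only.
    """
    vectorizer = CountVectorizer(**vectorizer_kwargs)
    try:
        vectorizer.fit_transform(documents)
    except ValueError as exc:
        return str(exc)
    return len(vectorizer.vocabulary_)


# Empty or whitespace-only strings produce no tokens at all.
print(try_vectorize(["", "  "]))

# Every token is an English stop word, so nothing survives filtering.
print(try_vectorize(["the and of", "is a to"], stop_words="english"))

# Ordinary text (made-up sample data) fits without error.
print(try_vectorize(["red cotton shirt", "blue denim jeans"]))
```

The first two calls return the "empty vocabulary" message instead of raising, which makes it easy to test each text column separately and see which one triggers the error.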

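Check your data to confirm that the text was loaded correctly and that it actually contains meaningful content. The text may contain very few words, or every word may be a stop word. Preprocessing the data appropriately can resolve the problem in some cases.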
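One way to run that check is sketched below, assuming each column is available as a plain list of values that may include None or NaN. The helper names (to_text, summarize_tokens, safe_fit_transform) and the sample data are hypothetical. The check relies on CountVectorizer.build_analyzer(), which returns the same tokenization pipeline that fit_transform uses, and it assumes the default min_df and max_df, so any surviving token ends up in the vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer


def to_text(value):
    """Map None and float NaN to an empty string; cast everything else to str."""
    if value is None or (isinstance(value, float) and value != value):
        return ""
    return str(value)


def summarize_tokens(documents, **vectorizer_kwargs):
    """Count how many documents yield at least one token under these settings."""
    analyzer = CountVectorizer(**vectorizer_kwargs).build_analyzer()
    cleaned = [to_text(doc) for doc in documents]
    with_tokens = sum(1 for doc in cleaned if analyzer(doc))
    return {"total": len(cleaned), "with_tokens": with_tokens}


def safe_fit_transform(documents, **vectorizer_kwargs):
    """Vectorize only if some document has tokens; otherwise return (None, None)."""
    cleaned = [to_text(doc) for doc in documents]
    if summarize_tokens(cleaned, **vectorizer_kwargs)["with_tokens"] == 0:
        return None, None
    # A fresh vectorizer per column keeps each column's vocabulary separate.
    vectorizer = CountVectorizer(**vectorizer_kwargs)
    return vectorizer, vectorizer.fit_transform(cleaned)


# Made-up sample columns: the first has no usable text, the second does.
product_names = ["", None, "  "]
descriptions = ["Soft cotton shirt", None, "Blue denim jeans"]

print(summarize_tokens(product_names))
print(summarize_tokens(descriptions))

vectorizer, matrix = safe_fit_transform(descriptions)
print(matrix.shape)
```

If with_tokens is 0 for a column, the problem lies in how that column was loaded or preprocessed rather than in CountVectorizer itself, and that column can be skipped or fixed before fitting.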

Posted by ぼっち