Error: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

June 8, 2024

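The utf-8 codec raised a decode error, which suggests the file is not actually encoded as utf-8. A first byte of 0x82 is typical of Japanese text saved as Shift_JIS (cp932), for example a CSV exported from Excel. One way to handle this, shown in the view below, is to detect the file's encoding and read the file with that encoding instead of assuming utf-8.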
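To see where the message comes from, here is a small reproduction. The sample text is a made-up assumption, not data from the original file: it encodes one hiragana character as cp932 and then tries to decode it as utf-8.

```python
# Hypothetical sample: the hiragana "あ" encoded as cp932 (Shift_JIS) is 0x82 0xA0.
raw = "あ".encode("cp932")

try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as e:
    # In utf-8, 0x82 is a continuation byte, so it cannot start a character.
    error_message = str(e)

print(error_message)

# Decoding with the encoding the bytes were written in works as expected.
decoded = raw.decode("cp932")
print(decoded)
```

The first byte alone is enough to trigger the error, which is why the message reports position 0.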

python
import io
import os

import chardet
import pandas as pd
from django.http import HttpResponse
from django.shortcuts import render

def concat_csv(request):
    if request.method == 'POST' and request.FILES.getlist('files'):
        files = request.FILES.getlist('files')
        data_list = []

        for file in files:
            # Skip anything that is not a CSV before reading it
            if not file.name.endswith('.csv'):
                print(f"Unsupported file format: {file.name}")
                continue

            try:
                # Read the raw bytes; chardet.detect() needs bytes, not decoded text
                raw = file.read()
                # Guess the encoding from the first part of the file
                result = chardet.detect(raw[:10000])

                # Fall back to utf-8 when chardet cannot decide
                detected_encoding = result['encoding'] or 'utf-8'
                confidence = result['confidence']

                print(f"Detected encoding: {detected_encoding}, Confidence: {confidence}")

                # Parse the CSV from the in-memory bytes with the detected encoding
                df = pd.read_csv(io.BytesIO(raw), encoding=detected_encoding)
                data_list.append(df)
            except UnicodeDecodeError as e:
                print(f"UnicodeDecodeError: {e}")
                print("Failed to read CSV file even with detected encoding.")
            except pd.errors.EmptyDataError:
                print("CSV file is empty.")
            except Exception as e:
                print(f"Error: {e}")

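        # Run pd.concat only when at least one file was read
        if data_list:
            df = pd.concat(data_list, axis=0)
            # Post-process the DataFrame from here (e.g. save it as CSV, return it in the response)

            # Output path (set this to a suitable location)
            output_path = "Concat_data.csv"

            # Save as cp932; errors="ignore" silently drops characters cp932 cannot represent
            df.to_csv(output_path, index=False, encoding="cp932", errors="ignore")

            # Return a response that prompts the browser to download the file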
            with open(output_path, 'rb') as file:
                response = HttpResponse(file.read(), content_type='application/force-download')
                response['Content-Disposition'] = f'attachment; filename={os.path.basename(output_path)}'
                return response

    return render(request, 'tools/concat_csv.html')

With this change, the file is no longer assumed to be utf-8: its encoding is detected with chardet, and the file is read using the detected encoding.
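Detection is a guess, so it can help to have an explicit fallback as well. The following is a minimal sketch using only the standard library, not part of the view above; the helper name `decode_with_fallback`, the candidate list, and the sample CSV are all assumptions made for illustration.

```python
import csv
import io

def decode_with_fallback(raw, encodings=("utf-8-sig", "cp932")):
    """Return (text, encoding) using the first candidate that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode the data")

# Hypothetical sample data: a small CSV saved as cp932
raw = "名前,値\nりんご,1\n".encode("cp932")

text, used = decode_with_fallback(raw)
rows = list(csv.reader(io.StringIO(text)))
print(used, rows)
```

The order of candidates matters: utf-8 is strict and rejects most non-utf-8 input, while cp932 can decode many byte sequences without raising and may return garbled text for a wrong guess, so the stricter codec goes first. With pandas, the chosen encoding could then be passed on as `pd.read_csv(io.BytesIO(raw), encoding=used)`.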


Posted by ぼっち
