Show me a simple Python source code example for TensorFlow

June 8, 2024

Below is sample Python code that implements a simple question-answering system with TensorFlow. The example uses an encoder-decoder model built from Keras LSTM layers.
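
Before the full listing, a rough picture of how an encoder-decoder model is usually fed during training may help. The sketch below uses plain Python and made-up token ids. The `PAD`, `START`, and `END` values and the helper names `pad_post` and `teacher_forcing_pair` are illustrative assumptions, not part of TensorFlow. The decoder input is the answer without its last token, and the target is the same answer shifted one position to the left.

```python
# Illustrative token ids; real ids would come from a tokenizer.
PAD, START, END = 0, 1, 2


def pad_post(sequence, length, pad_value=PAD):
    """Right-pad a list of token ids with pad_value up to the given length."""
    return sequence + [pad_value] * (length - len(sequence))


def teacher_forcing_pair(answer_ids, length):
    """Return (decoder_input, target) for one answer.

    The answer is wrapped in START/END, padded, and then split so that
    target[t] is the token the decoder should predict after seeing
    decoder_input[0..t].
    """
    full = pad_post([START] + answer_ids + [END], length)
    return full[:-1], full[1:]


# Hypothetical ids for the words "my name is chatgpt".
decoder_input, target = teacher_forcing_pair([5, 6, 7, 8], length=8)
print(decoder_input)  # [1, 5, 6, 7, 8, 2, 0]
print(target)         # [5, 6, 7, 8, 2, 0, 0]
```

This is the same slicing that the training call in the listing applies to the padded answer array (`[:, :-1]` for the decoder input, `[:, 1:]` for the target).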

python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

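# Dataset of question/answer pairs
questions = ["What is your name?", "How old are you?", "What is your favorite color?"]
answers = ["My name is ChatGPT.", "I am 5 years old.", "My favorite color is blue."]
# Wrap each answer in marker words so the decoder has a first input token and a
# stop signal ("startseq" and "endseq" are arbitrary names chosen for this example)
answers = ["startseq " + answer + " endseq" for answer in answers]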

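# Prepare the Tokenizer (by default it lowercases text and strips punctuation)
tokenizer = Tokenizer()
tokenizer.fit_on_texts(questions + answers)

# Convert the texts to sequences of integer word indices
question_sequences = tokenizer.texts_to_sequences(questions)
answer_sequences = tokenizer.texts_to_sequences(answers)

# Pad with trailing zeros so every sequence has the same length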
maxlen = max(max(len(seq) for seq in question_sequences), max(len(seq) for seq in answer_sequences))
question_sequences = pad_sequences(question_sequences, padding='post', maxlen=maxlen)
answer_sequences = pad_sequences(answer_sequences, padding='post', maxlen=maxlen)

# Build the model
vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding
embedding_dim = 128
units = 256

# Encoder: embeds the question and keeps only the final LSTM states
encoder_inputs = tf.keras.layers.Input(shape=(maxlen,))
encoder_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True)(encoder_inputs)
encoder_outputs, state_h, state_c = tf.keras.layers.LSTM(units, return_state=True)(encoder_embedding)
encoder_states = [state_h, state_c]

# Decoder: starts from the encoder states and predicts one token per time step.
# The time dimension is left as None because training feeds maxlen - 1 steps.
decoder_inputs = tf.keras.layers.Input(shape=(None,))
decoder_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True)(decoder_inputs)
decoder_lstm = tf.keras.layers.LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)
decoder_dense = tf.keras.layers.Dense(vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model
model = tf.keras.models.Model([encoder_inputs, decoder_inputs], decoder_outputs)

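# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model: the decoder input is the answer without its last token,
# and the target is the same answer shifted one step ahead
model.fit([question_sequences, answer_sequences[:, :-1]], answer_sequences[:, 1:], epochs=100)

# Save the model (recent Keras versions require a .keras or .h5 file extension)
model.save("qa_model.keras")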

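This code uses Tokenizer to convert the text data into integer sequences and then builds an encoder-decoder model. Training takes the question sequences and the answer sequences as input: the encoder reads the question, while the decoder receives the answer and learns to predict its next token at each step.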
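
The listing stops after training and saving, so it never actually produces an answer. One common way to generate text from a model like this is greedy decoding: begin with a start token, repeatedly pick the most probable next token, and stop at an end token. The sketch below shows only that loop in plain Python. It assumes the training answers were wrapped in start and end marker tokens. `next_token_probs` is a hypothetical stand-in for a call into the trained network (for the model above it might wrap `model.predict` and read the last time step), and all token ids are made up.

```python
def greedy_decode(next_token_probs, question_ids, start_id, end_id, max_steps):
    """Generate token ids one at a time, always taking the most probable one.

    next_token_probs(question_ids, decoded_ids) is assumed to return a list of
    probabilities, one per vocabulary id, for the token that follows decoded_ids.
    """
    decoded = [start_id]
    for _ in range(max_steps):
        probs = next_token_probs(question_ids, decoded)
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if next_id == end_id:
            break
        decoded.append(next_id)
    return decoded[1:]  # drop the start token


def toy_next_token_probs(question_ids, decoded_ids):
    """Stand-in for a trained model: replays a fixed answer, then the end token."""
    scripted = [5, 6, 7, 8, 2]  # hypothetical ids for "my name is chatgpt" + end
    step = len(decoded_ids) - 1
    probs = [0.0] * 10
    probs[scripted[min(step, len(scripted) - 1)]] = 1.0
    return probs


answer_ids = greedy_decode(toy_next_token_probs, [3, 4], start_id=1, end_id=2, max_steps=10)
print(answer_ids)  # [5, 6, 7, 8]
```

With real ids, the result would be mapped back to words through the tokenizer's `index_word` dictionary.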

Posted by ぼっち
