Multimodal Chatbot (Text + Image + Voice)

Interact with a chatbot using text, image, or voice inputs

Description:

This is a multimodal chatbot that can handle text, image, and voice inputs.

  • You can ask questions or provide text, and the assistant will respond.
  • You can also upload an image, and the assistant will process it and answer questions about the image.
  • Voice input is supported: You can upload or record an audio file, and it will be transcribed to text and sent to the assistant.
  • Enter your OpenAI API key to start interacting with the model.
  • You can use the 'Clear History' button to remove the conversation history.
  • "o1" is for image chat and "o3-mini" is for text chat.

Reasoning Effort:

The reasoning effort controls how complex or detailed the assistant's answers should be.

  • Low: Provides quick, concise answers with minimal reasoning or details.
  • Medium: Offers a balanced response with a reasonable level of detail and thought.
  • High: Produces more detailed, analytical, or thoughtful responses, requiring deeper reasoning.
Reasoning Effort
Select Model