Multimodal Chatbot (Text + Image + Voice + PDF + Quiz)

Interact with a chatbot using text, image, voice, or PDF inputs

Description:

This is a multimodal chatbot that can handle text, image, voice, PDF inputs, and generate quizzes from PDFs.

You can ask questions or provide text, and the assistant will respond.
You can upload an image, and the assistant will process it and answer questions about the image.
Voice input is supported: You can upload or record an audio file, and it will be transcribed to text and sent to the assistant.
PDF support: Upload a PDF and ask questions about its content.
PDF Quiz: Upload a PDF and specify how many MCQ questions you want generated based on the content.
Enter your OpenAI API key to start interacting with the model.
You can use the 'Clear History' button to remove the conversation history.
"o1" is for image, voice, PDF and text chat and "o3-mini" is for text, PDF and voice chat only.

Reasoning Effort:

The reasoning effort controls how complex or detailed the assistant's answers should be.

Low: Provides quick, concise answers with minimal reasoning or details.
Medium: Offers a balanced response with a reasonable level of detail and thought.
High: Produces more detailed, analytical, or thoughtful responses, requiring deeper reasoning.

Enter OpenAI API Key

Choose Input Type

Text Image Voice PDF PDF(QUIZ)

Enter Text Question

Upload an Image

Upload or Record Audio

Upload your PDF

Number of Quiz Questions

1 20

Quiz Mode

Reasoning Effort

Select Model

Chatbot

·

Built with Gradio logo

·