C.O.O.P.E.R: Conversational Office Organizer and Personal Entertainment Robot

Welcome to the DeskBuddy/C.O.O.P.E.R. project, my passion and ongoing adventure in AI and robotics. C.O.O.P.E.R. is not just a chatbot; he's an intelligent companion with context memory, emotional awareness, and advanced question-answering capabilities, all running smoothly on local hardware.

C.O.O.P.E.R. stands for Conversational Office Organizer and Personal Entertainment Robot. I must mention that C.O.O.P.E.R. is still in its alpha stage. The intricacies of this project mean that it might not run seamlessly on different machines just yet. But don't worry, I'm working tirelessly to make it more accessible and user-friendly. One of my goals is to replace the physical robot requirement with a graphical model: a 3D face that will bring C.O.O.P.E.R. to life on any screen.

The heart of C.O.O.P.E.R. lies in its ability to use small local models or GPT models via the OpenAI API, bolstered by custom solutions like my version of the Larynx Docker server for speech synthesis and the Whisper ASR model for speech recognition. Both are essential for the bot's interactive capabilities, and I'm excited to share these custom scripts with the community soon.
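To give a flavor of how these two pieces fit together, here's a minimal sketch of a speech loop. Since my custom scripts aren't published yet, this assumes a stock Larynx Docker server on its default port 5002 and the open-source `whisper` package; the file names and ports are illustrative, not C.O.O.P.E.R.'s actual wiring.

```python
# Minimal speech-loop sketch: Whisper for ASR, a Larynx HTTP server for TTS.
# Assumes a stock Larynx Docker server on localhost:5002 and the
# openai-whisper package; C.O.O.P.E.R.'s custom scripts may differ.
import requests
import whisper

asr_model = whisper.load_model("base")  # a small model that fits local hardware


def transcribe(wav_path: str) -> str:
    """Turn a recorded WAV file into text with Whisper."""
    result = asr_model.transcribe(wav_path)
    return result["text"].strip()


def speak(text: str, out_path: str = "reply.wav") -> str:
    """Ask the Larynx server to synthesize `text` and save the WAV."""
    resp = requests.get(
        "http://localhost:5002/api/tts",
        params={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path


if __name__ == "__main__":
    heard = transcribe("question.wav")  # hypothetical recording
    print("User said:", heard)
    speak(f"You said: {heard}")
```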

A major triumph has been the improvement in lip-syncing, making C.O.O.P.E.R.'s responses more lifelike. Additionally, an online mode has been introduced alongside the offline mode, significantly boosting C.O.O.P.E.R.'s versatility and intelligence by integrating the gpt-3.5-turbo model via the OpenAI API.
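For the curious, the online mode boils down to routing each user turn through the chat completions endpoint. The sketch below assumes the `openai` 0.x Python package (the `ChatCompletion` interface that was current when this mode was added) and an `OPENAI_API_KEY` in the environment; the system prompt is made up for illustration.

```python
# Online-mode sketch: route a user turn through gpt-3.5-turbo.
# Assumes the openai 0.x Python package (openai.ChatCompletion) and an
# OPENAI_API_KEY in the environment; the project's actual wiring may differ.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# running message history doubles as lightweight conversational context
history = [
    {"role": "system", "content": "You are C.O.O.P.E.R., a friendly desk robot."}
]


def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    reply = response["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply


print(chat("What's on my desk today?"))
```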

Since March 2023, DeskBuddy has had multimodal capabilities, most notably Visual Question Answering (VQA), letting it process and respond to queries that combine visual and textual elements. At the heart of this feature is a vision transformer model that ties visual data processing into the chatbot's language understanding. Using a webcam as an IP cam, DeskBuddy grabs frames, analyzes them through the vision transformer, and generates relevant, accurate responses. VQA doesn't just enhance interaction with the user; it's a step toward a more holistic, sensory-aware AI companion.
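In rough outline, the VQA pipeline looks like the sketch below. I haven't named the exact vision transformer here, so this uses the public ViLT VQA checkpoint from Hugging Face as a stand-in, and the stream URL is hypothetical.

```python
# VQA sketch: grab a frame from the webcam-as-IP-cam stream and answer a
# question about it. Uses the public ViLT VQA checkpoint as a stand-in for
# the project's vision transformer; the stream URL is hypothetical.
import cv2
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")


def grab_frame(stream_url: str) -> Image.Image:
    """Read one frame from the IP-cam stream and convert it to a PIL image."""
    cap = cv2.VideoCapture(stream_url)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from the stream")
    return Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))


def answer(question: str, image: Image.Image) -> str:
    """One VQA pass: encode image + question, return the top answer label."""
    encoding = processor(image, question, return_tensors="pt")
    logits = model(**encoding).logits
    return model.config.id2label[logits.argmax(-1).item()]


frame = grab_frame("http://192.168.1.50:8080/video")  # hypothetical URL
print(answer("What object is on the desk?", frame))
```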

As of now, C.O.O.P.E.R. can interact through its expressive robot form, recognize and respond to speech, and manage basic conversational context. The project roadmap is ambitious, with plans to enhance hardware utilization, improve context understanding, and integrate various AI models for a more nuanced interaction experience.
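To make "basic conversational context" concrete, here's one plausible shape for it: a rolling window of recent turns, trimmed to a fixed budget so the prompt stays small enough for local models. This is a hypothetical sketch, not C.O.O.P.E.R.'s actual code.

```python
# One plausible shape for "basic conversational context": a rolling window
# of recent turns with a fixed turn budget. Hypothetical sketch, not the
# project's actual implementation.
from collections import deque


class ContextMemory:
    def __init__(self, max_turns: int = 8):
        # each entry is a (speaker, text) pair; old turns fall off the left
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def as_prompt(self) -> str:
        """Flatten the remembered turns into a prompt prefix."""
        return "\n".join(f"{who}: {what}" for who, what in self.turns)


memory = ContextMemory(max_turns=4)
memory.add("user", "My name is Sam.")
memory.add("cooper", "Nice to meet you, Sam!")
memory.add("user", "What's my name?")
print(memory.as_prompt())
```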

Stay tuned for more updates and videos of C.O.O.P.E.R. in action at the GitHub repository. This journey with C.O.O.P.E.R. is just beginning, and I can't wait to see where it leads!