Multimodal AI in Action: Shaping Sustainable Solutions with Gemini
This week, Google introduced Gemini to the world: its latest development in AI, a family of large language models set to enhance practically all of Google's products. Gemini is built from the ground up to comprehend, process and operate across distinct types of data: not solely text, but also code, audio, images and video.
One of the Google products directly benefiting from the release of Gemini is Bard, the company's user-facing language model. Gemini is rolling out to Bard in two phases: as of now, Bard is enhanced with a specifically tuned version of Gemini Pro (in English) for more advanced reasoning, planning and understanding. Nothing too spectacular. Early next year, however, a completely new version of Bard, Bard Advanced, will give the public first-hand access to Google's most advanced models and capabilities with Gemini Ultra. This is where Google aims to set itself apart from OpenAI's ChatGPT through multimodality.
Present Perspective — Emphasising Sustainability through Conversations
For today's user-facing large language models, there are already serious opportunities to amplify the impact on sustainability initiatives, for example by emphasising the need for sustainable solutions and proposing them in response to user prompts: using advanced cognitive skills to advocate sustainability in every answer and decision, creating a better world one conversation at a time. If you are interested in reading more on this specific topic, check out this post on the potential of leveraging ChatGPT for sustainable intelligence:
Future Perspective — Unleashing Sustainable Intelligence
For future user-facing large language models like Bard Advanced powered by Gemini Ultra, the promise is even greater: by letting users interact with and learn from audio, images and video much as we currently interact with text, these models could significantly increase both the impact and the feasibility of sustainable initiatives.
Take, for instance, a complex field like the energy transition, which involves intricate infrastructure modifications, complicated technological advancements, financial considerations, and geopolitical factors affecting the move towards sustainable power sources. In this domain, an AI with multimodal capabilities would be able to:
- Analyse satellite imagery to identify suitable locations for renewable energy installations [image].
- Assess these sites in terms of their potential impact on geopolitical factors by extracting information from fragments of news broadcasts about the area [video + audio].
- Provide an executive summary advocating for the most appropriate and sustainable option [text], or communicate these complex concepts through multimedia formats such as interactive explanations [image + video + audio + text].
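To make the workflow above concrete, the three steps could map onto a single multimodal request that mixes media parts with a text instruction. The sketch below is purely illustrative: the `MultimodalRequest` builder, the helper `build_site_assessment`, the site name and the file paths are all hypothetical and not tied to any real Gemini SDK; it only assembles the request, it does not send anything to a model.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalRequest:
    """A hypothetical Gemini-style request mixing text and media parts."""
    parts: list = field(default_factory=list)

    def add_text(self, text):
        self.parts.append({"type": "text", "data": text})
        return self

    def add_media(self, mime_type, path):
        # A real API call would attach the file bytes; here we only
        # record the reference (the paths are placeholders).
        self.parts.append({"type": mime_type, "data": path})
        return self

def build_site_assessment(site_name, satellite_image, news_clips):
    """Assemble the three analysis steps into one multimodal request."""
    req = MultimodalRequest()
    # Step 1: satellite imagery for siting analysis [image]
    req.add_media("image/png", satellite_image)
    # Step 2: news broadcasts covering the area [video + audio]
    for clip in news_clips:
        req.add_media("video/mp4", clip)
    # Step 3: ask for an executive summary [text]
    req.add_text(
        f"Assess {site_name} as a renewable-energy site: suitability "
        "from the satellite image, geopolitical risk from the news "
        "clips, and an executive summary recommending the most "
        "sustainable option."
    )
    return req

request = build_site_assessment(
    "North Sea wind corridor",
    "satellite/north_sea.png",
    ["news/report_a.mp4", "news/report_b.mp4"],
)
print(len(request.parts))  # image + 2 videos + 1 text prompt = 4
```

The point of the sketch is the shape of the interaction: one prompt that interleaves image, video and text, rather than three separate single-modality tools stitched together by hand.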
Gemini as a Catalyst for Sustainable Solutions
Overall, the integration of multimodal comprehension into user-facing AI language models holds great promise. Google's Gemini marks a significant first step in this direction, expanding the potential for supporting sustainable initiatives and resolving complex environmental issues. It not only enables more engaging, personalised, and impactful communication on sustainability, but with its analytical capabilities, it also brings us closer to actually solving the world's most complex challenges. Gemini could be the way to stimulate awareness and action towards a greener future.