Multimodal AI in Action: Shaping Sustainable Solutions with Gemini

This week, Google introduced Gemini, its latest family of large AI models, intended to enhance practically all of Google's products. Gemini is built from the ground up to be multimodal: it can understand and operate across different types of data, including text, code, audio, images and video.

One Google product that benefits directly from the release of Gemini is Bard, the company's user-facing chatbot. Gemini is rolling out to Bard in two phases. As of now, Bard uses a specifically tuned version of Gemini Pro in English for more advanced reasoning, planning and understanding, which is a modest improvement. Early next year, however, a completely new version, Bard Advanced, will give the public first-hand access to Google's most advanced models and capabilities through Gemini Ultra. Here, Google aims to set itself apart from OpenAI's ChatGPT with its multimodal capabilities.

Present Perspective — Emphasising Sustainability through Conversations

Present user-facing large language models already offer serious opportunities to amplify the impact of sustainability initiatives. They can emphasise the need for sustainable solutions and propose them in response to the user's prompts, advocating sustainability in every answer and decision to create a better world, one conversation at a time. If you are interested in reading more on this specific topic, check out the post on the potential of leveraging ChatGPT for sustainable intelligence. A typical exchange today looks like this:

Prompt: What is the best location to build a wind turbine park?

Response: Wind turbine parks should be located in areas with strong, consistent winds, near power lines, with minimal ecological impact, and with the support of local communities. Other factors like terrain height, mountain proximity, and water availability also influence site selection.
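The criteria in this response can be turned into a simple weighted score for comparing candidate sites. The sketch below is only an illustration under stated assumptions: the `Site` fields, the 0-to-1 scales and the weights in `WEIGHTS` are hypothetical and do not come from Gemini, Bard or any real siting methodology.

```python
from dataclasses import dataclass

# Hypothetical weights for the criteria named in the response above.
# These values are assumptions for illustration, not real siting guidance.
WEIGHTS = {
    "wind": 0.4,       # strength and consistency of wind
    "grid": 0.2,       # proximity to power lines
    "ecology": 0.2,    # higher value means lower ecological impact
    "community": 0.2,  # support of local communities
}


@dataclass
class Site:
    """A candidate location, with each criterion normalised to the range 0..1."""
    name: str
    wind: float
    grid: float
    ecology: float
    community: float


def score(site: Site) -> float:
    """Return the weighted sum of the site's criteria."""
    return sum(weight * getattr(site, key) for key, weight in WEIGHTS.items())


def rank(sites: list[Site]) -> list[Site]:
    """Return the sites ordered from highest to lowest score."""
    return sorted(sites, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Site("Coastal ridge", wind=0.9, grid=0.5, ecology=0.5, community=0.5),
        Site("Inland plain", wind=0.5, grid=0.9, ecology=0.9, community=0.9),
    ]
    for site in rank(candidates):
        print(f"{site.name}: {score(site):.2f}")
```

With these assumed weights, a site with weaker wind can still rank first if it does well on grid access, ecological impact and community support, which is the kind of trade-off the response describes.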

Future Perspective — Unleashing Sustainable Intelligence

Future user-facing large language models like Bard Advanced, powered by Gemini Ultra, are even more promising. Letting users interact with and learn from audio, images and video in much the same way as we currently interact with text could significantly increase both the impact and the feasibility of sustainable initiatives.

Take, for instance, a complex field like the energy transition, where intricate infrastructure modifications, complex technological advancements, financial considerations and geopolitical factors all affect the move towards sustainable power sources. In this domain, an AI with multimodal capabilities would be able to:

Analyse satellite imagery to identify suitable locations for renewable energy installations [image].
Assess these sites in terms of their potential impact on geopolitical factors by extracting information from fragments of news broadcasts about the area [video + audio].
Provide an executive summary advocating for the most appropriate and sustainable option [text], or communicate these complex concepts through multimedia formats [image + video + audio + text], for example through interactive explanations.

Prompt: What is the best location to build a wind turbine park?

Response: Based on a combination of weather data analysis and satellite image assessments, the prime location for building a wind turbine park at this moment is marked in the geospatial image. The selected site offers optimal wind conditions, minimal obstructions and reduced environmental impact, paving the way for a sustainable and efficient wind energy project. Further relevant information about the current geopolitical situation of this location is visualised in the infographic. Additionally, a list of organisations that would be helpful to involve in building a wind turbine park at this specific location is provided.

Gemini as a Catalyst for Sustainable Solutions

Overall, the integration of multimodal comprehension into user-facing AI language models holds great promise. Google's Gemini marks a significant first step in this direction, expanding the potential for supporting sustainable initiatives and resolving complex environmental issues. It not only enables more engaging, personalised and impactful communication on sustainability; with its analytical capabilities, it could also help accelerate solutions to some of the world's most complex challenges. Gemini could be a way to stimulate awareness and action towards a greener future.
