Visual Genome: A Brief Intro

After decades of hype-cycle convolutions and revolutions… augmented reality (AR) finally appears poised to become a ubiquitous part of our daily lives, though several hurdles remain. On the software side, to put it simply, AR is still a series of one-off applications rather than a platform that is practically accessible for daily use. As a software company, Heavy Projects endeavors to help identify, clarify, and ultimately solve this platform hurdle.

Specifically, we propose that widespread adoption will arise from an image-based platform that dynamically displays contextually aware imagery in AR. In other words, much in the way that Pandora utilizes its Music Genome Project and artificial intelligence (AI) recommendation algorithms to provide each listener with a custom soundscape, we are interested in the possibility of an “imagescape” built from a user's visual preferences. Think of it as a programmatic visual tuning of your environment, but one that also permits serendipity: the possibility of being surprised by what you see in AR. The paper “Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations” (Stanford, 2017) provides an interesting point of departure for such an exploration. While there have been several decades of research in computer vision (CV) and image recognition, the recent evolution of deep learning provides new and exciting ways to contextualize and understand visual scenes through models that can detect and classify objects, describe their attributes, and recognize the relationships between them.
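To make that idea concrete, the sketch below shows one way a Visual Genome–style scene description might be represented in code: objects, their attributes, and the relationships that connect them. This is only an illustrative Python sketch; the class names and the example scene are ours, not the dataset's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative scene-graph structures in the spirit of Visual Genome:
# objects carry attributes, and relationships connect pairs of objects.
# These class names are hypothetical, not the dataset's actual schema.

@dataclass
class SceneObject:
    name: str                                            # e.g. "wall", "bicycle"
    attributes: List[str] = field(default_factory=list)  # e.g. ["brick"]

@dataclass
class Relationship:
    subject: SceneObject
    predicate: str                                       # e.g. "leaning against"
    obj: SceneObject

# Example scene: "a red bicycle leaning against a brick wall"
wall = SceneObject("wall", ["brick"])
bike = SceneObject("bicycle", ["red"])
scene = [Relationship(bike, "leaning against", wall)]

for rel in scene:
    print(f"{' '.join(rel.subject.attributes)} {rel.subject.name} "
          f"{rel.predicate} {' '.join(rel.obj.attributes)} {rel.obj.name}")
```

Even a toy structure like this hints at why scene graphs matter for AR: once a scene is described as objects and relationships rather than raw pixels, content can be matched to context.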

However, what we propose takes this methodology one step further, using that contextual visual understanding to provide users with an augmented reality user experience (AR UX) that is contextual and situational. We imagine an opt-in AR landscape that dynamically pulls content from the cloud in a way that adds meaning and depth to our physical habitats and allows for creative expression within the convergence of AI and mixed reality (MR) technologies. The user defines and frames the content through their own creations and their selection of visual channels, which the visual genome both supplies and enhances by adding new, related, and sometimes unexpected and surprising content. For example, a person's city commute could be augmented by an "urban art" channel that superimposes, on the physical walls along their route, digital artworks from artists they know and enjoy, while adding related artists whose work they've never seen before. Or a beach-goer's summer sky could be augmented by a “solar system” channel, so that Jupiter fills the horizon as if it were only as far away as the moon. These visual "alternative facts" would be playful ways to experience your daily environment. How novel would it be for users of this future toy to just sit back and enjoy the physical world with no digital content at all?
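As a rough illustration of how such channels might work behind the scenes, the hypothetical sketch below matches the concepts a vision model detects in the user's surroundings against the channels that user has opted into, then points the client at the content feeds to pull from. The channel names, trigger tags, and URLs are placeholders of our own, not a real service or API.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical channel-matching step: decide which opted-in "visual channels"
# should supply AR content, given the concepts a vision model detected in the
# user's surroundings. All names, tags, and URLs here are placeholders.

@dataclass
class Channel:
    name: str
    triggers: Set[str]      # scene concepts that activate this channel
    content_feed: str       # cloud endpoint the AR client would pull from

def active_channels(detected: List[str], subscribed: List[Channel]) -> List[Channel]:
    """Return the subscribed channels whose triggers overlap the detected scene."""
    return [ch for ch in subscribed if ch.triggers & set(detected)]

subscribed = [
    Channel("urban art", {"wall", "alley", "building"}, "https://example.com/urban-art"),
    Channel("solar system", {"sky", "horizon", "beach"}, "https://example.com/planets"),
]

# Concepts a scene-understanding model might report on a city commute
detected = ["wall", "sidewalk", "bicycle"]

for ch in active_channels(detected, subscribed):
    print(f"Pull overlay content from the '{ch.name}' channel: {ch.content_feed}")
```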

We imagine an AR overlay that is a timely and useful augmentation, one that's an extension of ourselves, but one that also provides fresh and meaningful content that can extend our mental horizons and help us see our surroundings in new and interesting ways. We imagine a platform that can deepen our experiences while pushing us outside of our pre-constructed digital echo chambers, rather than reinforcing them. In short, a hybrid reality that effectively synthesizes the digital with the physical in contextually appropriate and thoughtful ways.