SG-API: Open-Source 3D Exploratory Search Engine for the Next-Generation Learner — Ideation Stage

華士頓 Austin Hua
Dec 20, 2022


“Remember to look up at the stars and not down at your feet. Try to make sense of what you see and wonder about what makes the universe exist.”
― Stephen Hawking

Meet the 3D search engine API that began as “Stargazer”, our project submission to the world’s largest hackathon, NASA’s Space Apps Challenge.

Stargazer — Next-Generation Astronomer

There is nothing more diverse than the night sky. Whether viewed from Earth or from space, the cosmos unites humanity in exploring what lies beyond. Through its research on Earth and the cosmos, NASA produces terabytes of new data every day, and sifting through and categorizing this fascinating yet chaotic flood of data is no easy task.
Meet Stargazer. Stargazer is an open-source 3D search engine that sorts through free, openly accessible NASA data (videos, pictures, and audio recordings) and uses AI to group similar items together.

View: Stargazer 1.0 submission to NASA Space Apps Challenge 2022

As is the case with the vast majority of Space Apps Challenge entries, Stargazer is a work in progress.
The following is what we set out to achieve.

Stargazer 1.0 Demonstration

You are in a 3D learning space — be it in VR, AR, your laptop, or your phone.

Look straight ahead. You will see your current search query, “nebula”, along with a cluster of hundreds of nebula images: some from distant galaxies billions of light-years away, some much closer, some red, some purple, and countless other variations.

Now look around you. In every direction, you will see your search history.

Look to your left and you will see a past search query, “Radio-loud, Radio-quiet, and Type 2 quasars”, along with exactly what you searched for: images of these three types of quasars, automatically gathered into three distinct sub-clusters. Look closer at the sub-cluster of Radio-loud quasar images and you will notice that it actually contains two “sub-subclusters”: a supermassive black hole sub-subcluster (where most of the data is) and an ultramassive black hole sub-subcluster (where a few outliers are).
Look up and you will see another search query, “rocky planets”. Some of these rocky planets are recently discovered exoplanets such as LHS 3844b, while others are in our Solar System, such as Earth and Mars. Next to them, from your “gas giant” search query, you will see a group of gas giants such as Saturn along with Hot Jupiter exoplanets such as HAT-P-12b.
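The post does not say how the nested grouping in the quasar example is computed. One minimal approach assumes k-means, one of the clustering methods listed in SG-API's technical overview, and runs it recursively: once on a query's results to get subclusters, then again inside each subcluster to get sub-subclusters. The 2D points are hypothetical stand-ins for image embeddings. `kmeans`, `hierarchical`, and `farthest_point_init` are illustrative names. Farthest-point seeding replaces random seeding so that the result is deterministic.

```python
from __future__ import annotations

import math

Point = tuple  # a feature vector, e.g. an image embedding reduced to a few dimensions


def farthest_point_init(points: list[Point], k: int) -> list[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the point
    that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: list[Point], k: int, max_iters: int = 100) -> list[list[Point]]:
    """Lloyd's k-means: returns up to k non-empty clusters of the original points."""
    if not points:
        return []
    k = min(k, len(points))
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:  # assignment step: each point joins its nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its members' mean (keep it if empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return [members for members in clusters if members]


def hierarchical(points: list[Point], ks: list[int]):
    """Split into ks[0] clusters, split each of those into ks[1], and so on."""
    if not ks:
        return points
    return [hierarchical(cluster, ks[1:]) for cluster in kmeans(points, ks[0])]


# Hypothetical 2D embeddings for the quasar query: three well-separated groups,
# with two outliers inside the radio-loud group.
radio_loud = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5), (3, 3), (3, 3.5)]
radio_quiet = [(20, 0), (20, 0.5), (20.5, 0)]
type_2 = [(0, 20), (0.5, 20), (0, 20.5)]
tree = hierarchical(radio_loud + radio_quiet + type_2, [3, 2])
```

Here `tree[0]` is the radio-loud subcluster. It is split into a main group of four points and a sub-subcluster of the two outliers. A real system would also need to choose the number of clusters at each level, for example with a silhouette score, rather than hard-coding `[3, 2]`.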

If you click on any of your queries, you will find a detailed academic explanation of the reasoning behind how each subcluster is formed. For example, clicking on the “rocky planets” query opens a message box explaining that rocky planets exist both within our Solar System (Mercury, Venus, Earth, and Mars) and beyond it, as rocky exoplanets such as LHS 3844b.
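The post does not describe how these explanations are generated. This hypothetical sketch shows one simple ingredient, not SG-API's actual method: finding the words that are over-represented in one subcluster's captions compared with its sibling subclusters. A term shared by every subcluster, such as “rocky” below, scores low. Terms that set a subcluster apart score high. `distinguishing_terms` is an illustrative name and the captions are made up.

```python
from __future__ import annotations

from collections import Counter


def distinguishing_terms(subclusters: dict[str, list[str]], top_n: int = 2) -> dict[str, list[str]]:
    """For each subcluster of captions, return the top_n terms whose share of that
    subcluster's words most exceeds their share of the sibling subclusters' words."""
    counts = {
        name: Counter(word for text in texts for word in text.lower().split())
        for name, texts in subclusters.items()
    }
    explanation = {}
    for name, own in counts.items():
        rest = Counter()
        for other, other_counts in counts.items():
            if other != name:
                rest.update(other_counts)
        own_total = sum(own.values()) or 1
        rest_total = sum(rest.values()) or 1
        # Positive score: the term is relatively more frequent in this subcluster.
        scores = {term: own[term] / own_total - rest[term] / rest_total for term in own}
        # Highest score first; ties broken alphabetically for a stable result.
        explanation[name] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return explanation


# Hypothetical captions for the two subclusters of the "rocky planets" query.
rocky_planets = {
    "solar system": ["earth rocky planet solar system", "mars rocky planet solar system"],
    "exoplanets": ["lhs 3844b rocky exoplanet", "rocky exoplanet orbiting red dwarf"],
}
explanation = distinguishing_terms(rocky_planets)
```

Keywords like these could be one input to a fuller, prose-style explanation of the kind described above.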

Stargazer started out as a submission to “The Art in our Worlds” challenge, just one of 23 challenges in the 2022 Space Apps Challenge. We designed Stargazer with deep consideration of three key elements: creative learning, exploratory search, and immersion.

  • The Art of Creative Learning: Web-like structures connecting search results (“clusters” and “subclusters”) resemble human neural networks. They are meant to make learning more efficient while leaving room for creativity and customization.
  • The Art of Exploratory Search: Most search engines present results as aligned lines of text. Stargazer instead uses colorful nodes, draggable clusters, and topic links to provide a more exciting and intuitive search experience.
  • The Art of Immersion: Stargazer creates a virtual world of information in what we call “learning spaces”. Our next step is to bring these spaces into the physical world with HCI devices such as VR and AR headsets, making the search experience even more immersive.

With Stargazer, your search history and queries become your very own custom-tailored 3D learning environment. This provides a beautiful, powerful educational mechanism for curious young astronomers to learn all kinds of new things about the universe.

We are very honored to announce that our project has been awarded the title of Global Nominee in NASA’s 2022 Space Apps Challenge.

SG-API — Introduction

Our plan now is to expand on our Stargazer design with a new framework called SG-API. “SG” — originally the abbreviation for “Stargazer” — has now been repurposed as “SpatialGraph API”.
SG-API is a general-purpose API for building 3D search engines. It combines a range of visualization tools with ML-powered clustering methods that group similar data together.

The idea for SG-API comes from our work on Stargazer; SG-API applies Stargazer’s principles of creative learning, exploratory search, and immersion.

SG-API — Technical Overview

  • Search query: BERT-like embeddings with semantic search
  • Data cleaning: Texthero, Scikit-Learn, and other relevant Python packages depending on the data modality
  • Clustering: K-means clustering and other relevant methods depending on the data modality
  • Frontend: 3D force graphs, with the frontend server hosted on Azure Static Web Apps.
  • Backend: a Flask server, hosted on Microsoft Azure App Service (which provides a built-in CI/CD pipeline from GitHub) or another platform of choice. The server loads a trained ML model packaged with Python’s pickle module and uses it to power the system’s ML-based search.
  • Display formats: VR (A-Frame), AR (AR.js), 2D (HTML Canvas), and 3D (WebGL/three.js)
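This minimal sketch makes the search-query step concrete by treating semantic search as ranking documents by the cosine similarity between a query vector and each document vector. The real pipeline would use BERT-like embeddings. To keep the example dependency-free, a toy bag-of-words `embed` function stands in for the encoder. That substitution is an assumption for illustration only, and swapping in a sentence-embedding model would change only `embed`. The captions and IDs are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a BERT-like encoder: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(query: str, documents: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query, best match first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical captions for NASA media items.
docs = {
    "img-001": "Crab Nebula supernova remnant",
    "img-002": "Radio-loud quasar jet",
    "img-003": "Planetary nebula NGC 6543, the Cat's Eye Nebula",
}
results = semantic_search("nebula", docs, top_k=2)
```

In practice, document vectors would be computed once and stored in an index. This sketch instead re-embeds every document for every query.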

SG-API — Proposed Key Features

While Stargazer focuses on three key features, (1), (2), and (3), SG-API may also add key features (4) and (5).

(1) A 3D search engine displaying free and open data
(2) Large data clusters, where each cluster represents an individual query
(3) Smaller sub-clusters within each cluster that are grouped by AI based on similar characteristics
(4) Arrangements of search history based on AI-calculated query similarity
(5) Explainable AI that discusses the logic behind (5.1) how each sub-cluster is formed, (5.2) the meaning of the distance calculations between each query/large cluster, and (5.3) how the AI understands the meaning of your query

Note that implementing (3) and (4) calls for a high level of customizability, which will depend on the developer’s intended application scenario.
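For feature (4), one way to arrange search history starts by turning each pairwise query similarity into a target distance. This sketch assumes the mapping `1 - similarity`. It then places the queries in 3D so that their distances match those targets as closely as possible, which is a form of metric multidimensional scaling. The stress is minimized by plain gradient descent. `layout_3d` is an illustrative name, and the similarity values are made up; in SG-API they might come from the cosine similarity of query embeddings. This also gives (5.2) a simple reading: the distance between two large clusters approximates how dissimilar their queries are.

```python
from __future__ import annotations

import math
import random


def layout_3d(similarity: list[list[float]], steps: int = 4000,
              lr: float = 0.08, seed: int = 0) -> list[list[float]]:
    """Place n queries in 3D so that the distance between queries i and j
    approximates 1 - similarity[i][j], by gradient descent on the stress
    sum over pairs i < j of (||x_i - x_j|| - d_ij)^2."""
    n = len(similarity)
    rng = random.Random(seed)  # seeded so the layout is reproducible
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                target = 1.0 - similarity[i][j]
                diff = [a - b for a, b in zip(pos[i], pos[j])]
                dist = math.sqrt(sum(d * d for d in diff)) or 1e-9  # avoid dividing by 0
                # Gradient of (dist - target)^2 w.r.t. x_i is 2 (dist - target) * diff / dist;
                # x_j receives the same term with the opposite sign.
                coeff = 2.0 * (dist - target) / dist
                for axis in range(3):
                    grad[i][axis] += coeff * diff[axis]
                    grad[j][axis] -= coeff * diff[axis]
        for i in range(n):
            for axis in range(3):
                pos[i][axis] -= lr * grad[i][axis]
    return pos


# Hypothetical similarities between four past queries (illustrative values only).
queries = ["rocky planets", "gas giants", "nebula", "quasars"]
similarity = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.3, 0.2],
    [0.3, 0.3, 1.0, 0.5],
    [0.2, 0.2, 0.5, 1.0],
]
positions = layout_3d(similarity)
```

The resulting coordinates could seed the initial positions of the large clusters in a 3D force graph, which would then keep refining the layout interactively.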

We are open to new ideas and suggestions for features to add. Please share your ideas with the community on our Discord server (link below)!

SG-API — Potential Applications

In addition to providing a powerful educational tool for young astronomers, we believe SG-API could in the future be applied in many fields beyond astronomy, for both educational and professional purposes.

  • Biologist: A biologist wants to strengthen her research paper on the taxonomy of the emperor penguin (Aptenodytes forsteri). With the search query “emperor penguin”, she finds images of emperor penguins along with a sub-cluster of what appear to be noticeably smaller emperor penguins. Clicking on her query results, she learns through Explainable AI that this sub-cluster of “smaller emperor penguins” is actually a sub-cluster of king penguins (Aptenodytes patagonicus). This is an entirely separate species that shares the emperor penguin’s genus and much of its coloring.
  • Doctor: A doctor searching medical data on their patients enters the query “patient diabetes with demographic segmentation”, and 9 age segments are formed: 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, and 90–99. The segmentation mechanism knows that the doctor happens to have no patients with diabetes who are under 10 or aged 100 or older. If the doctor’s patients are evenly distributed across these age segments, the subcluster sizes can be compared directly. The doctor might then find a clear trend, for example that the smallest subcluster of patients with diabetes is the 10–19 segment and the largest is the 90–99 segment.
  • Linguist: An English teacher queries the word “effervescent”. Because SG-API calculates node distances from similarity, an SG-API-powered app can act as an interactive 3D thesaurus. The thesaurus accounts for the fact that a word can have multiple distinct, even unrelated, meanings. Two subclusters appear: one for the “fizzy” sense of “effervescent” and one for its “vivacious” sense. The “fizzy” subcluster shows words such as “bubbly” very close to the “fizzy” sense and words such as “frothing” much farther away. The “vivacious” subcluster shows words such as “lively” very close to “vivacious” and words such as “vital” a bit farther away.
  • Job Seeker: A student looking for an internship wants to find local tech events to explore opportunities and meet potential big-tech employers. They search “FAANG events in October 2024” and find 5 sub-clusters, one each for Facebook, Apple, Amazon, Netflix, and Google. Google hosts many events during that month. Within the Google subcluster, the student finds one sub-subcluster of Computer Vision events, another of Google Developer Student Club (GDSC) events, and a third of Autonomous Vehicle technology events.
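The segmentation in the doctor scenario can be sketched as rule-based bucketing into ten-year age segments. The post does not say how SG-API would form these segments, so this is a hypothetical stand-in, and the ages are invented for illustration. Raw counts are comparable across segments only because the scenario assumes the doctor's patients are evenly spread across ages. Without that assumption, each count would need to be divided by the number of patients in that segment.

```python
from __future__ import annotations

from collections import Counter


def age_segment(age: int) -> str:
    """Map an age to its ten-year segment label, e.g. 47 -> '40-49'."""
    start = (age // 10) * 10
    return f"{start}-{start + 9}"


def segment_counts(ages: list[int]) -> dict[str, int]:
    """Count patients per ten-year segment, ordered from youngest to oldest.
    Segments with no patients (here 0-9 and 100+) simply do not appear."""
    counts = Counter(age_segment(age) for age in ages)
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))


# Hypothetical ages of the doctor's patients with diabetes.
ages = [12, 24, 27, 31, 38, 42, 45, 49, 51, 56, 58, 61, 64, 67,
        70, 73, 76, 79, 82, 85, 88, 91, 93, 95, 97, 99]
counts = segment_counts(ages)
smallest = min(counts, key=counts.get)
largest = max(counts, key=counts.get)
```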

We plan to allow for a high level of customizability and filters for SG-API.

  • Data modality adjustments: unlike the NASA open data we worked with on Stargazer, which contains only videos, photos, and audio files, SG-API may be used for much broader applications. It could manage many more modalities: text documents, research papers in PDF form, hospital patient data, words with their mapped similarities, event information, and more. To handle such diverse data formats simply and reliably, SG-API will need sophisticated NLP methods.
  • Frontend display settings: adjustable display settings may help minimize performance demands on less powerful devices.
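A registry of per-modality adapters is one way to support many modalities behind a single search and clustering pipeline. This hypothetical sketch assumes each adapter turns an item into text that a shared text-embedding model could then process. `Item`, `ADAPTERS`, `register_adapter`, and `to_text` are illustrative names, not part of any published SG-API interface. Multimodal encoders that embed images or audio directly would be an alternative design.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Item:
    modality: str                                  # e.g. "image", "patient_record"
    payload: dict = field(default_factory=dict)    # raw fields for that modality


# Hypothetical registry mapping a modality name to its text adapter.
ADAPTERS: dict[str, Callable[[Item], str]] = {}


def register_adapter(modality: str):
    """Decorator that registers a function as the text adapter for one modality."""
    def decorator(fn: Callable[[Item], str]) -> Callable[[Item], str]:
        ADAPTERS[modality] = fn
        return fn
    return decorator


@register_adapter("image")
def image_to_text(item: Item) -> str:
    return f"{item.payload['title']} {item.payload.get('description', '')}".strip()


@register_adapter("patient_record")
def record_to_text(item: Item) -> str:
    return f"diagnosis: {item.payload['diagnosis']}; age: {item.payload['age']}"


def to_text(item: Item) -> str:
    """Dispatch an item to its modality's adapter; fail clearly if none exists."""
    adapter = ADAPTERS.get(item.modality)
    if adapter is None:
        raise ValueError(f"no adapter registered for modality {item.modality!r}")
    return adapter(item)
```

With this design, adding a new modality such as PDF research papers means registering one more adapter, and the rest of the pipeline stays unchanged.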

SG-API — User Feedback

To support continuous improvement, we also invite users to tell us how satisfied they are with SG-API and how well it performs for their specific use case. Some questions we might ask:

  1. Were the subclusters logically organized?
  2. Were the subclusters directly relevant to you?
  3. If the clustering seemed unclear, did the Explainable AI give you a significantly deeper understanding of how the subclusters (or sub-subclusters) were formed, how the distances between clusters were calculated, and, more generally, how the AI understood your query?

SG-API — Team & Community

Team & Contributors: Cindy Lin, Seth Harding, Mark Chen, Alex Riviest, Ryan Landay, Astrid Chou

Our team members have worked at Google, Microsoft, and TSMC, and we are currently founding the startups DXDRData-X and DragonNote.

Our 3D search engine API, SG-API, could become a powerful educational tool for the future of exploratory learning in 3D virtual environments such as the metaverse.

Interested in joining our community or even contributing as a team member? Please hop on our Discord server today!


華士頓 Austin Hua

National Taiwan University CSIE. Professional focus in AI and the Chinese language.