SQuery: semantic database search

Introduction

One of the ultimate goals in information technology is to make machines understand human language and answer questions the way humans do. The biggest obstacle lies in today’s information retrieval methods: how can we extract the most relevant information from billions of web pages painlessly and efficiently? We are used to traditional keyword search, in which you must know at least one keyword related to the content or answer you are looking for. In many situations, however, we don’t have the exact keywords, so we need another means of obtaining information. We need an entirely new type of information retrieval method: semantic search.

What is semantic search?

Semantic search aims to understand users’ intentions and expand search queries into related concepts. When you search for a movie to watch, Google can give you the title and related information, because it recognizes that you are looking for movies, not simply web pages containing the word “movie”. Most websites’ built-in search engines, however, can only perform plain keyword matching. More sophisticated searches, like finding a gift for someone special or identifying a job that matches your skills, require visiting multiple websites, which are likely to return both relevant and useless answers. SQuery was born from this growing need for a more intelligent answer engine.

SQuery works like a natural-language version of a SQL database. For example, the question “Which movie is acted by Leonardo DiCaprio and directed by James Cameron?” is equivalent to the SQL statement “SELECT title FROM movie WHERE actor = ‘Leonardo DiCaprio’ AND director = ‘James Cameron’”. SQuery removes the formidable barrier of SQL, which shuts out people without a programming background.
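To make the SQL side of that comparison concrete, here is a minimal sketch that runs the same statement against a tiny in-memory SQLite table. The table layout (one flat movie table with title, actor and director columns) is taken from the query above; the sample rows and the find_titles helper are assumptions made for illustration, not SQuery's actual schema or code.

```python
import sqlite3

def find_titles(conn, actor, director):
    """Run the SQL equivalent of the natural-language question."""
    rows = conn.execute(
        "SELECT title FROM movie WHERE actor = ? AND director = ?",
        (actor, director),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data, just enough to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, actor TEXT, director TEXT)")
conn.executemany(
    "INSERT INTO movie VALUES (?, ?, ?)",
    [
        ("Titanic", "Leonardo DiCaprio", "James Cameron"),
        ("The Departed", "Leonardo DiCaprio", "Martin Scorsese"),
        ("Avatar", "Sam Worthington", "James Cameron"),
    ],
)

titles = find_titles(conn, "Leonardo DiCaprio", "James Cameron")
print(titles)
```

Even in this small form, the person asking has to know the table name, the column names and the SQL syntax, which is exactly the knowledge a natural-language front end is meant to spare them.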

Please visit the SQuery demo site. The site contains millions of names of cast members, directors, movies and characters. You can ask questions like:

  • Which movie is starred by Leonardo DiCaprio and directed by James Cameron?
  • Who played James Bond in GoldenEye?
  • Which actor acted in Titanic and The Departed?

*Our site is just a functioning prototype. Kindly refrain from asking questions outside her scope of knowledge, or she’ll become very embarrassed.

From text to entity

HTML was designed only to display information. The context and relations of that information are completely discarded by web spiders. As a result, search engines struggle to understand the meaning of certain web pages. Today’s web has begun shifting from pages to entities. An entity is organized information with both relations and knowledge domains. For example, a user entity is a subclass of the person entity, which may contain information such as name, age, sex and address. Entities can be understood and processed by computers.
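The person and user example can be sketched as a pair of Python dataclasses. The field names come from the paragraph above; the username field and the sample values are assumptions added for illustration, and this is not meant to describe how SQuery stores entities internally.

```python
from dataclasses import dataclass, asdict

@dataclass
class Person:
    """A person entity with the attributes named in the text."""
    name: str
    age: int
    sex: str
    address: str

@dataclass
class User(Person):
    """A user is a subclass of person; 'username' is a hypothetical extra field."""
    username: str = ""

user = User(name="Alice", age=30, sex="F",
            address="1 Example Street", username="alice30")

# Unlike a block of rendered HTML, the entity exposes its structure directly.
print(isinstance(user, Person))
print(asdict(user))
```

Because the fields and the subclass relation are explicit, a program can ask for "the address of this person" without guessing where on a page that value might be displayed.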

When using SQuery, users simply type in an English sentence that describes the thing they are looking for. Then, this human-readable sentence is translated into commands that machines can understand. Once results are obtained, users can filter results by adding more specific conditions.
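As a rough illustration of that flow, the toy sketch below turns one narrow family of questions into a dictionary of field conditions and then narrows it with an extra condition. It relies on two hand-written regular expressions and is an assumption made purely for illustration; SQuery's real translation step is not described here and is certainly more capable. The names PATTERNS, parse_question and refine are hypothetical.

```python
import re

# Hypothetical patterns: each maps a phrase in the question to a field name.
PATTERNS = {
    "actor": re.compile(r"(?:acted|starred) by (.+?)(?= and |\?|$)"),
    "director": re.compile(r"directed by (.+?)(?= and |\?|$)"),
}

def parse_question(question):
    """Translate a question into machine-readable field conditions."""
    conditions = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(question)
        if match:
            conditions[field] = match.group(1).strip()
    return conditions

def refine(conditions, **extra):
    """Return a narrower set of conditions without mutating the original."""
    return {**conditions, **extra}

question = ("Which movie is acted by Leonardo DiCaprio "
            "and directed by James Cameron?")
conditions = parse_question(question)
narrowed = refine(conditions, year=1997)  # 'year' is an illustrative extra filter
print(conditions)
print(narrowed)
```

A pattern-based approach like this breaks as soon as the wording changes or a name contains the word "and", which is why a real system needs genuine language understanding rather than a handful of templates.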

The next step

SQuery has many advantages, but perhaps the biggest is its ability to combine results from multiple databases with a single query. With traditional SQL, users have to know both the database and the table name, which is impractical when searching across thousands of databases. SQuery, on the other hand, can search all of its indexed databases, regardless of their original schema design. This greatly improves the user experience. Our demo site contains four databases: movies, actors, actresses and directors. SQuery also supports additional features, including result grouping, stemming, synonym expansion and spellcheck.
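The idea of one query spanning databases with different schemas can be sketched as follows. This is a deliberately naive stand-in that scans every field of every record; the four database names match the demo site, but the records, field names and the search_all function are assumptions made for illustration, and a real engine would use an index rather than a linear scan.

```python
def search_all(databases, text):
    """Return (database name, record) pairs whose any field contains the text."""
    needle = text.lower()
    hits = []
    for db_name, records in databases.items():
        for record in records:
            # Look at every value, so no table or column name is needed.
            if any(needle in str(value).lower() for value in record.values()):
                hits.append((db_name, record))
    return hits

# Hypothetical records; note that each database uses its own field names.
databases = {
    "movies": [{"title": "Titanic", "director": "James Cameron"}],
    "actors": [{"name": "Leonardo DiCaprio", "known_for": "Titanic"}],
    "actresses": [{"name": "Kate Winslet", "known_for": "Titanic"}],
    "directors": [{"full_name": "James Cameron"}],
}

hits = search_all(databases, "james cameron")
for db_name, record in hits:
    print(db_name, record)
```

The caller never names a table or a column, which is the property the paragraph above describes; features such as stemming or synonym expansion would be applied to the search text before matching.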

For more information on semantic search and our progress, please follow our newest postings. Email contact: guoyangrui@gmail.com
