The AVA-X team is developing a completely new approach to content and information search. Tobias Bolliger talks about Heliospan.
Tobias, what are you actually working on?
Tobias Bolliger: Among other things, we are currently working on an innovative way to make sense of large volumes of documents and unstructured data using artificial intelligence. We call it Heliospan.
What is Heliospan?
Heliospan combines a set of state-of-the-art technologies for data mining, text analysis and search. The search engine is built around machine intelligence: it searches by understanding the meaning of both the user’s query and the document pool, an approach called semantic search. This sets it apart from lexical search, where the engine looks for literal matches of the query words without understanding what the query means.
By mining and analysing text data, Heliospan can discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on machine learning and statistical approaches that can be generally applied to arbitrary text data in any natural language with no or minimal human effort.
We are competing with the big players on the market. As a small company from Switzerland, our resources for marketing and sales are limited (we prefer to put our resources into research and development 😉). Even though we are convinced of the quality and innovation of our products and can compete with the best, the community does not yet know and trust us enough.
What are you proud of?
With Sentinel, we have already developed a market-leading facial recognition system that is used by investigative authorities. With Heliospan, we now want to provide a solution that revolutionizes the handling of large amounts of unstructured data. We are particularly proud that 100% of Heliospan’s research and development takes place in Winterthur. We attach particular importance to the fact that our solutions are developed from scratch and without any dependencies on third-party products. Therefore we can also guarantee full compliance with GDPR and other data protection regulations.
For whom is Heliospan useful?
Heliospan delivers the most benefit wherever large volumes of unstructured data are present. This is now the case in almost all industries (finance, media, government and many more). Areas in which Heliospan has already proven itself are:
- An online media publisher such as www.inside-it.ch uses Heliospan to improve the customer experience and to automate the tagging and clustering of content.
- A collection of several thousand health-related articles was made semantically searchable with the help of Heliospan.
Contact Tobias if you want to have a chat about Heliospan.
Are you familiar with semantic search, topic modelling and word association mining? If not, please keep reading.
Semantic Search Engine
The core and most important module of Heliospan is its state-of-the-art semantic search and explore engine. It enables users to browse and link words, paragraphs and documents based on their meaning, rather than by classical string comparison or an N-gram vector space model. Heliospan uses deep learning to build vector representations not only of words, but also of paragraphs and whole documents. In the scientific literature these techniques are known as Word2Vec and Doc2Vec. Training the document vectors at the same time as the word embeddings allows Heliospan to connect meaning from single words to paragraphs to documents.
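The difference between literal matching and matching in a shared vector space can be sketched in a few lines. This is a minimal illustration and not Heliospan's implementation: the three-dimensional vectors below are hand-written assumptions, whereas a trained model would learn vectors with far more dimensions from the corpus. All names (`WORD_VECTORS`, `DOC_VECTORS`, `lexical_search`, `semantic_search`) are hypothetical.

```python
import math

# Hypothetical 3-dimensional vectors, hand-written for illustration only.
# A trained model would learn them (with far more dimensions) from the corpus.
WORD_VECTORS = {
    "doctor": [0.9, 0.1, 0.0],
    "bank": [0.0, 0.9, 0.1],
    "football": [0.05, 0.1, 0.9],
}

DOCUMENTS = {
    "health_article": "the physician examined the patient",
    "finance_article": "the bank raised interest rates",
    "sports_article": "the football match ended in a draw",
}

# Assumed document vectors, living in the same space as the word vectors.
DOC_VECTORS = {
    "health_article": [0.8, 0.2, 0.1],
    "finance_article": [0.1, 0.9, 0.2],
    "sports_article": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lexical_search(query, documents):
    """Return the documents that contain the query word literally."""
    return [name for name, text in documents.items() if query in text.split()]


def semantic_search(query, word_vectors, doc_vectors):
    """Rank all documents by cosine similarity to the query word's vector."""
    q = word_vectors[query]
    scored = [(name, cosine(q, vec)) for name, vec in doc_vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(lexical_search("doctor", DOCUMENTS))
    for name, score in semantic_search("doctor", WORD_VECTORS, DOC_VECTORS):
        print(f"{name}: {score:.2f}")
```

For the query "doctor", the literal match finds nothing because no document contains that exact word, while the vector comparison ranks `health_article` first because its vector points in nearly the same direction as the query vector.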
Topic Modelling
Managing the explosion of electronic documents requires new technologies that automatically organise, search, index and browse large collections of unstructured text. Heliospan can cluster a large set of unstructured documents into topics and return, for each cluster, the most relevant words of that topic. These words can be seen as automatically discovered tags: a summary of the topic in just a few words. Displayed as word clouds, they can serve as an entry point for the user to browse and navigate through a document archive. Another use case is to use topic modelling to get a quick overview of what is trending in the news, on Twitter, or in any other streaming data source.
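The step of returning the most relevant words for each cluster can be sketched as follows. This is a simplified illustration under stated assumptions, not Heliospan's method: the documents are assumed to be already grouped into clusters, and relevance is approximated with a TF-IDF-style score that treats each cluster as one large document. The function name `top_words_per_cluster` and the sample data are hypothetical.

```python
import math
from collections import Counter


def top_words_per_cluster(clusters, k=3):
    """Return the k most distinctive words for each cluster.

    `clusters` maps a cluster name to the list of document texts in it.
    """
    # Word counts per cluster (each cluster is treated as one big document).
    counts = {
        name: Counter(word for doc in docs for word in doc.lower().split())
        for name, docs in clusters.items()
    }
    n_clusters = len(counts)
    # Number of clusters in which each word occurs at least once.
    cluster_freq = Counter(word for c in counts.values() for word in c)

    labels = {}
    for name, c in counts.items():
        # Words that occur in every cluster get a score of zero.
        scores = {
            word: freq * math.log(n_clusters / cluster_freq[word])
            for word, freq in c.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[name] = ranked[:k]
    return labels


# Hypothetical, already clustered sample documents.
CLUSTERS = {
    "cluster_a": [
        "stocks fell as the bank raised rates",
        "the bank reported record profit",
        "investors sold stocks",
    ],
    "cluster_b": [
        "the team won the match",
        "the coach praised the team",
        "fans celebrated the match",
    ],
}

if __name__ == "__main__":
    for name, words in top_words_per_cluster(CLUSTERS, k=2).items():
        print(name, words)
```

Because "the" occurs in both clusters, it scores zero and never appears as a tag, even though it is the most frequent word in the second cluster. With `k=2`, the sample yields "bank" and "stocks" for the first cluster and "match" and "team" for the second.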
Word Association Mining
Two words have a syntagmatic relation when they tend to appear together in the same context. Words with a strong syntagmatic relation co-occur frequently while each having a relatively low individual frequency. There are many applications where syntagmatic relation mining is important. In retrieval, for example, words that have strong syntagmatic relations with the original query words can be used to expand the search query and improve the retrieval results. Another application is opinion summarization: we can extract the top K words syntagmatically related to “iPhone 11” from a corpus of customer reviews in order to summarise the users’ feedback. Knowledge of word associations can also help in clustering words that are related to each other.
The techniques used to mine word associations can generally be classified into two categories. The first is hypothesis testing, where statistical tests are used to determine whether the co-occurrence of two words happened by chance or reflects an actual correlation. The second category is information-theoretic and is based on measures such as mutual information. Heliospan uses this second, information-theoretic approach.
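The information-theoretic category can be illustrated with a small mutual information calculation. This is a minimal sketch, assuming that each text segment (here, one review) is reduced to the presence or absence of a word, and it makes no claim about how Heliospan computes its scores. The function names and the sample reviews are hypothetical.

```python
import math


def mutual_information(segments, a, b):
    """Mutual information (in bits) between the presence of words a and b.

    Each segment contributes one observation of two binary variables:
    "a occurs in the segment" and "b occurs in the segment".
    """
    n = len(segments)
    token_sets = [set(s.lower().split()) for s in segments]
    n_a = sum(a in t for t in token_sets)
    n_b = sum(b in t for t in token_sets)
    n_ab = sum(a in t and b in t for t in token_sets)

    # Counts for the four presence/absence combinations.
    joint = {
        (1, 1): n_ab,
        (1, 0): n_a - n_ab,
        (0, 1): n_b - n_ab,
        (0, 0): n - n_a - n_b + n_ab,
    }
    marginal_a = {1: n_a, 0: n - n_a}
    marginal_b = {1: n_b, 0: n - n_b}

    mi = 0.0
    for (u, v), count in joint.items():
        if count == 0:
            continue  # a zero-probability cell contributes nothing
        p_uv = count / n
        p_u = marginal_a[u] / n
        p_v = marginal_b[v] / n
        mi += p_uv * math.log2(p_uv / (p_u * p_v))
    return mi


def top_associations(segments, word, k=3):
    """Rank all other words in the corpus by mutual information with `word`."""
    vocabulary = {w for s in segments for w in s.lower().split()} - {word}
    scored = [(w, mutual_information(segments, word, w)) for w in vocabulary]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical customer reviews, one segment per review.
REVIEWS = [
    "battery life is great",
    "battery life is short",
    "screen is bright",
    "camera is great",
]

if __name__ == "__main__":
    print(mutual_information(REVIEWS, "battery", "life"))
    print(mutual_information(REVIEWS, "battery", "is"))
    print(top_associations(REVIEWS, "battery", k=3))
```

In this sample, "battery" and "life" always occur together and score 1 bit, while "is" occurs in every review and scores 0, which reflects the point that strongly associated words co-occur often without being frequent everywhere. Note that mutual information is also positive for words that systematically avoid each other, so a practical system would typically check the direction of the association as well.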
Named Entity Recognition
Named entity recognition (NER) is the ability to identify the names of things, such as people, companies or locations, in a text. NER is an important area of research in machine learning and natural language processing (NLP) because it can be used to answer many real-world questions, such as: “Does a tweet contain the name of a person?”, “Does the tweet also give that person’s current location?”, “Which companies were mentioned in a news article?” or “Were specific products mentioned in complaints or reviews?” NER is integrated in Heliospan and is the subject of active research. Note that, at this moment, only the English language is supported.