NEAR/X, NEXT/X and other ways of searching using keywords

Our News API already provides an extensive set of filters that let users retrieve exactly the articles they need. We have now published an update to the API that gives users even more power when searching with keywords. Do you want to find keywords that appear close to each other in the text? Do you want to use full Boolean algebra to specify what you are looking for? If so, this blog post will show you how.

Search modes. What are they and how to use them?

Depending on your use case, you might have different needs when searching using keywords. In some cases you might want to search the way you are used to on Google - you have a bunch of words and you simply want the results that are most related to those keywords. In other cases, you have an exact set of keywords that you want to find mentioned, possibly with restrictions on how close to each other they appear in the text.

To make these different use cases possible we have introduced search modes. There are three search modes available: simple, phrase and exact. We will now describe each of them together with some examples where they are appropriate.

Phrase search mode

In many cases, you need to search for a phrase in the text. An example would be "Star Wars". You want to find articles that mention the phrase exactly as you have specified it - one word after the other. In this case you should use phrase search mode, which is also the default.

{
	"keyword": "Star Wars",
	"keywordSearchMode": "phrase"
}

If you send the JSON above as the request body to the http://eventregistry.org/api/v1/article/getArticles endpoint, you will receive articles that mention Star Wars as a phrase somewhere in the text. Since phrase search is used by default, you can even omit the keywordSearchMode parameter in this case.
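As a minimal sketch, this is roughly how such a request could be prepared in Python using only the standard library. The helper name build_request is hypothetical, and passing the key in the body under "apiKey" is an assumption, so check your account documentation before relying on it. The request is only built here, not sent.

```python
import json
import urllib.request

ENDPOINT = "http://eventregistry.org/api/v1/article/getArticles"


def build_request(keyword, mode="phrase", api_key=None):
    """Build (but do not send) a POST request for a keyword search."""
    body = {"keyword": keyword, "keywordSearchMode": mode}
    if api_key is not None:
        # Assumption: the key is passed in the body under "apiKey".
        body["apiKey"] = api_key
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Star Wars")
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request).read()
```

Keeping the construction separate from the sending step makes it easy to inspect the body before any call is made.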

Exact search mode

In some cases we would like to express a more complicated keyword search condition in a single query. For those cases, you can use the exact search mode, where full Boolean algebra can be used. An example would be:

{
	"keyword": "Apple iPhone OR Microsoft Store",
	"keywordSearchMode": "exact"
}

In this case, your results would include articles that mention either the phrase Apple iPhone or the phrase Microsoft Store.

In exact search mode, all consecutive words that are not AND, OR, NEXT, NEAR or NOT are treated as parts of a single phrase. If you want to search for articles that may mention Apple and iPhone in different parts of the article, the query can be modified like this:

{
	"keyword": "Apple AND iPhone OR Microsoft Store",
	"keywordSearchMode": "exact"
}

By using the AND, we are now requesting that the articles should mention both Apple and iPhone, but not necessarily as a phrase.
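To make the phrase rule concrete, here is a small local sketch of how a query could be split into phrases and operators. It only illustrates the rule described above and is not the parser used by the API. The function name split_exact_query is hypothetical, and treating only upper-case words as operators is an assumption.

```python
import re

# Operator words named in the post; NEAR and NEXT may carry a /X suffix.
# Assumption: operators are recognized only when written in upper case.
OPERATOR = re.compile(r"^(AND|OR|NOT|(NEAR|NEXT)(/\d+)?)$")


def split_exact_query(query):
    """Split a query into phrases and operators, left to right."""
    parts, phrase = [], []
    # Pad parentheses with spaces so they become separate tokens.
    for token in query.replace("(", " ( ").replace(")", " ) ").split():
        if OPERATOR.match(token) or token in ("(", ")"):
            if phrase:
                # An operator ends the phrase collected so far.
                parts.append(" ".join(phrase))
                phrase = []
            parts.append(token)
        else:
            phrase.append(token)
    if phrase:
        parts.append(" ".join(phrase))
    return parts


print(split_exact_query("Apple iPhone OR Microsoft Store"))
print(split_exact_query("Apple AND iPhone OR Microsoft Store"))
```

The first query yields two phrases joined by OR, while in the second one the added AND breaks Apple iPhone into two separate single-word terms.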

Operators NEXT/X and NEAR/X

The best new feature of the exact search mode is the ability to use two additional operators - NEXT/X and NEAR/X, where X is a number of words that you choose.

In many use cases, we want to find articles where two keywords are mentioned close together, possibly in the same sentence. Closeness often implies that the words are related to each other. If we were interested in learning what Siemens is doing in terms of sustainability, ecology or renewable energy, we could specify the keyword parameter as:

"Siemens NEAR/15 (sustainability OR ecology OR renewable energy)"

In this case, the resulting articles would mention Siemens and at least one of the three terms at most 15 words before or after Siemens. Instead of 15 you can of course use any other number.

Alternatively, if the order of words is important, you can use the NEXT/X operator. If NEXT/15 were used in the previous example, the only returned articles would be those where Siemens is mentioned first and any of the three terms is then mentioned at most 15 words later.
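The difference between the two operators can be illustrated with a short local sketch that works on a list of words. It handles only single-word terms and assumes that distance is the difference between word positions; how the API tokenizes text and counts distance is not specified here, so treat this as an illustration of the idea only. The function names are hypothetical.

```python
def positions(tokens, word):
    """Indexes at which `word` occurs (case-insensitive)."""
    return [i for i, t in enumerate(tokens) if t.lower() == word.lower()]


def near(tokens, a, b, x):
    """True if `b` occurs at most `x` words before or after `a`."""
    return any(
        0 < abs(j - i) <= x
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


def next_(tokens, a, b, x):
    """True if `b` occurs at most `x` words after `a`."""
    return any(
        0 < j - i <= x
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )


after = "Siemens announced a new sustainability program".split()
before = "The sustainability report praised Siemens".split()

print(near(after, "Siemens", "sustainability", 15))    # keyword follows Siemens
print(next_(after, "Siemens", "sustainability", 15))
print(near(before, "Siemens", "sustainability", 15))   # keyword precedes Siemens
print(next_(before, "Siemens", "sustainability", 15))
```

Both checks succeed on the first sentence, but only near succeeds on the second one, because there sustainability comes before Siemens.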

Operator precedence

Because you can use multiple operators in a single search, it is important to understand which operators take precedence over others. The precedence of operators, from highest to lowest, is:

NEAR/X, NEXT/X > NOT > AND > OR

Specifying the keyword parameter like this:

"Donald Trump NEAR/10 tariff OR recession AND China NOT Mexico"

is therefore equivalent to this query:

"(Donald Trump NEAR/10 tariff) OR (recession AND (China NOT Mexico))"

To force a different precedence, group items using parentheses. More useful results for the above query would likely be obtained by specifying it like this:

"Donald Trump NEAR/10 (tariff OR recession) AND China NOT Mexico"

In this case, the parenthesized group is evaluated first: an article matches when tariff or recession appears within 10 words of the phrase Donald Trump, and the article also mentions China but not Mexico.
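If you are unsure how a query without parentheses will be grouped, the precedence rule can be checked locally. The sketch below makes the implied grouping explicit using precedence climbing. It is only an illustration under stated assumptions: the function name parenthesize is hypothetical, every operator (including NOT) is treated as a binary operator as in the examples above, operators of equal precedence are grouped left to right, and queries that already contain parentheses are not handled.

```python
import re

# Higher number binds tighter: NEAR/X, NEXT/X > NOT > AND > OR.
PRECEDENCE = {"OR": 1, "AND": 2, "NOT": 3, "NEAR": 4, "NEXT": 4}
SPLIT = re.compile(r"\s+(AND|OR|NOT|NEAR/\d+|NEXT/\d+)\s+")


def parenthesize(query):
    """Return the query with the implied grouping made explicit."""
    # Splitting on a capturing group yields phrase, operator, phrase, ...
    tokens = SPLIT.split(query.strip())
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            prec = PRECEDENCE[op.split("/")[0]]
            if prec < min_prec:
                break
            pos += 1
            # Operators of equal precedence group left to right.
            right = parse(prec + 1)
            left = f"({left} {op} {right})"
        return left

    result = parse(1)
    # Drop the redundant outermost pair of parentheses.
    return result[1:-1] if result.startswith("(") else result


print(parenthesize("Donald Trump NEAR/10 tariff OR recession AND China NOT Mexico"))
# (Donald Trump NEAR/10 tariff) OR (recession AND (China NOT Mexico))
```

The printed grouping matches the equivalent query given above for the same input.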

Simple search mode

In some use cases you have a more "Google-like" search task, where you have a bunch of keywords and you want to find the results that best match the query. In this case, you don't require that all the keywords are mentioned in the text or that they appear in any particular order - you simply want results that mention as many of those keywords as possible, as many times as possible.

An example of a search using simple search mode would be:

{
	"keyword": "tesla self driving car number of fatal accidents",
	"keywordSearchMode": "simple"
}

When using the simple search mode, make sure that you sort results by relevance (rel) so that the results that best match the provided keywords come first. If you want to find recent content, simply add the dateStart parameter to limit the results to the latest news.
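Putting these recommendations together, a request body for simple search could be assembled like this. This is a sketch rather than a definitive recipe: the helper name simple_search_body is hypothetical, the name of the sorting parameter ("articlesSortBy") and the YYYY-MM-DD date format are assumptions, and only the rel value and the dateStart parameter come from the text above.

```python
import json


def simple_search_body(keywords, date_start=None):
    """Build a request body for a relevance-sorted simple search."""
    body = {
        "keyword": keywords,
        "keywordSearchMode": "simple",
        # Assumption: the sort parameter is called "articlesSortBy".
        "articlesSortBy": "rel",
    }
    if date_start is not None:
        # Assumption: dates are given as YYYY-MM-DD strings.
        body["dateStart"] = date_start
    return body


body = simple_search_body(
    "tesla self driving car number of fatal accidents",
    date_start="2024-01-01",
)
print(json.dumps(body, indent=2))
```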