sql

Check multiple columns for the same WHERE IN set of values

Post describes a way of how to check if any of the values of several columns is in the given set of values in your SQL query

Denis Savenko

May 6, 2018 • 2 min read

Recently, I faced a very simple task which in the end does not seem so obvious to me. I was told to rewrite an existing report, in which SQL query amongst others, was filtering by values of a column which had to be in a given set of values. And I needed to add another column for the same check.

Ok, let's go step by step to make it easier. Imagine a situation when you need to retrieve all rows from table1 where values of column1 are in a given set of values. I bet you have already written a query in your head and it looks something like this:

select * from table1
where column1 in (1, 2, 3);

But what if your task was a bit harder - and you need to fetch all rows where the value of any column1 or column2 is in the given set.

The obvious answer is to use OR:

select * from table1
where column1 in (1, 2, 3)
or column2 in (1, 2, 3);

Looks ugly, doesn't it? And is this query efficient? Especially, when this action is only one of many you need to do with your data (so your query is a bit more complex)? I doubt it.

If you think this example is something unnatural and simulated, let's consider this - imagine, you have a trading platform and for each order you track identifiers of a buyer and a seller. But these two are of the same type. I mean, both the buyer and the seller are organizations, for example. And you need to show your users all their orders - the orders where the buyer OR the seller is their organization.

So, you have the table order with the columns buyer_id and seller_id and the query to get a user's orders would be:

select * from order
where 
    buyer_id in (select organization_id from employee where employee_id = :employee_id)
    or seller_id in (select organization_id from employee where employee_id = :employee_id);

And it looks even worse. If you think the same way, let's sort out how to improve it together. To do so, we need to remember about SET operators available in Oracle. Indeed, they are a great feature, but a bit underused to my mind. To rewrite our query, I will be using the intersect operator and the query transforms to:

select * from order o
where 
    exists 
    (
        (
            select o.buyer_id from dual
            union all
            select o.seller_id from dual
        ) intersect (
            select organization_id from employee where employee_id = :employee_id
        )
    );

Better? For sure! Why? Look:

This query is more efficient in terms of performance (using OR in a complex query in an anti-pattern since it's hard for the optimizer to cope with such a query).
I do not repeat the filtering subquery anymore.
Need another column to be checked (i.e. reseller_id)? Not a problem - just add it into the first checking subquery.
It is much easier to integrate such a subquery into a complex query (imagine you have many filters, joins and so on).
This approach is suitable for a dynamically constructed report (using dynamic SQL) - this was my case, actually.

Note: Mind nullable columns - you may need to add and <column_name> is not null in your where clauses in the subqueries if you don't want to intersect on null.

I also tried to suggest the same in a question on StackOverflow.

Hope it'll help somebody else when they face the same task.

Sign up for more like this.