Check multiple columns for the same WHERE IN set of values
Recently, I faced a very simple task which in the end does not seem so obvious to me. I was told to rewrite an existing report, in which SQL query amongst others, was filtering by values of a column which had to be in a given set of values. And I needed to add another column for the same check.
Ok, let's go step by step to make it easier. Imagine a situation when you need to retrieve all rows from table1
where values of column1
are in a given set of values. I bet you have already written a query in your head and it looks something like this:
select * from table1
where column1 in (1, 2, 3);
But what if your task was a bit harder - and you need to fetch all rows where the value of any column1
or column2
is in the given set.
The obvious answer is to use OR
:
select * from table1
where column1 in (1, 2, 3)
or column2 in (1, 2, 3);
Looks ugly, doesn't it? And is this query efficient? Especially, when this action is only one of many you need to do with your data (so your query is a bit more complex)? I doubt it.
If you think this example is something unnatural and simulated, let's consider this - imagine, you have a trading platform and for each order you track identifiers of a buyer and a seller. But these two are of the same type. I mean, both the buyer and the seller are organizations, for example. And you need to show your users all their orders - the orders where the buyer OR the seller is their organization.
So, you have the table order
with the columns buyer_id
and seller_id
and the query to get a user's orders would be:
select * from order
where
buyer_id in (select organization_id from employee where employee_id = :employee_id)
or seller_id in (select organization_id from employee where employee_id = :employee_id);
And it looks even worse. If you think the same way, let's sort out how to improve it together. To do so, we need to remember about SET operators available in Oracle. Indeed, they are a great feature, but a bit underused to my mind. To rewrite our query, I will be using the intersect
operator and the query transforms to:
select * from order o
where
exists
(
(
select o.buyer_id from dual
union all
select o.seller_id from dual
) intersect (
select organization_id from employee where employee_id = :employee_id
)
);
Better? For sure! Why? Look:
- This query is more efficient in terms of performance (using
OR
in a complex query in an anti-pattern since it's hard for the optimizer to cope with such a query). - I do not repeat the filtering subquery anymore.
- Need another column to be checked (i.e.
reseller_id
)? Not a problem - just add it into the first checking subquery. - It is much easier to integrate such a subquery into a complex query (imagine you have many filters, joins and so on).
- This approach is suitable for a dynamically constructed report (using dynamic SQL) - this was my case, actually.
Note: Mind nullable columns - you may need to add
and <column_name> is not null
in yourwhere
clauses in the subqueries if you don't want to intersect onnull
.
I also tried to suggest the same in a question on StackOverflow.
Hope it'll help somebody else when they face the same task.