Advertisements
The COLLECT aggregate function was introduced in Oracle 10g and enables us to aggregate data into a collection in SQL. Oracle 11g Release 2 includes two enhancements to this function: the ability to order and de-duplicate data as it is aggregated. In this article, we will examine these new features and demonstrate a way to replicate their functionality in 10g and 11g Release 1.
a brief reminder
Before we demonstrate the COLLECT function's 11.2 new features, we will remind ourselves of its original 10g functionality. We will begin by querying the EMP table and aggregating the employee names into a collection for each department, as follows.
SQL> break on deptno skip 1
SQL> SELECT deptno
2 , COLLECT(ename) AS emps
3 FROM emp
4 GROUP BY
5 deptno;
DEPTNO EMPS
------- ------------------------------------------------------------------------------------
10 SYSTPd/UcOeePQOrgQKjAAjgR1A==('CLARK', 'MILLER', 'KING')
20 SYSTPd/UcOeePQOrgQKjAAjgR1A==('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
30 SYSTPd/UcOeePQOrgQKjAAjgR1A==('ALLEN', 'JAMES', 'TURNER', 'BLAKE', 'MARTIN', 'WARD')
3 rows selected.
Using COLLECT, we have aggregated a single collection per department as required but the nested table type is system-generated and not very usable. By using our own collection type, we can make this much more usable and flexible. First, we will create a general collection type, as follows.
SQL> CREATE TYPE varchar2_ntt AS TABLE OF VARCHAR2(4000);
2 /
Type created.
We will now repeat our query but this time CAST the system-generated collection type to our own user-defined type, as follows.
SQL> SELECT deptno
2 , CAST(COLLECT(ename) AS varchar2_ntt) AS emps
3 FROM emp
4 GROUP BY
5 deptno;
DEPTNO EMPS
---------- -------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'MILLER', 'KING')
20 VARCHAR2_NTT('SMITH', 'FORD', 'ADAMS', 'SCOTT', 'JONES')
30 VARCHAR2_NTT('ALLEN', 'JAMES', 'TURNER', 'BLAKE', 'MARTIN', 'WARD')
3 rows selected.
Having cast the collection in this way, it is now available to use throughout our application (i.e. wherever a parameter, variable or column of VARCHAR2_NTT is required). This summarises, in general, how we use the COLLECT function.
This is a short reminder of the COLLECT function's 10g usage. Readers who are not familiar with the function are advised to read this oracle-developer.net article for further discussion and detail on the concepts described above.
new feature: ordering collection elements
Sometimes we might wish to populate our collection with elements in a specific order. From 11g Release 2, COLLECT officially supports the ordering of elements during aggregation. Note that we use the term "officially supports" because this has been a hidden (and therefore unsupported) feature of COLLECT since its introduction in 10g Release 1. It is only with the release of 11g Release 2 that this option has been documented.
With this in mind, in the following example we will collect the employee names in alphabetical order by using the new extended syntax.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(ename ORDER BY ename)
4 AS varchar2_ntt) AS ordered_emps
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO ORDERED_EMPS
---------- -------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('ADAMS', 'FORD', 'JONES', 'SCOTT', 'SMITH')
30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'JAMES', 'MARTIN', 'TURNER', 'WARD')
3 rows selected.
We can see that the collection elements are now in ENAME order. Of course, we don't need to order the collection according to the element we are collecting. We can order our set using other columns or expressions. In the following example, we will collect and order the employees according to their hiredates.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(ename ORDER BY hiredate)
4 AS varchar2_ntt) AS ordered_emps
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO ORDERED_EMPS
---------- ----------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('SMITH', 'JONES', 'FORD', 'SCOTT', 'ADAMS')
30 VARCHAR2_NTT('ALLEN', 'WARD', 'BLAKE', 'TURNER', 'MARTIN', 'JAMES')
3 rows selected.
We can verify these results using a simple query of the EMP table, as follows.
SQL> break on deptno skip 1
SQL> SELECT deptno
2 , ename
3 , hiredate
4 FROM emp
5 ORDER BY
6 deptno
7 , hiredate;
DEPTNO ENAME HIREDATE
---------- ---------- -----------
10 CLARK 09/06/1981
KING 17/11/1981
MILLER 23/01/1982
20 SMITH 17/12/1980
JONES 02/04/1981
FORD 03/12/1981
SCOTT 09/12/1982
ADAMS 12/01/1983
30 ALLEN 20/02/1981
WARD 22/02/1981
BLAKE 01/05/1981
TURNER 08/09/1981
MARTIN 28/09/1981
JAMES 03/12/1981
14 rows selected.
new feature: distinct collection elements
In addition to ordered elements, we can also request distinct collection elements in 11.2 (note that this option is also available unsupported in earlier releases). Before we demonstrate this extension, we will populate non-unique collections of jobs from our EMP table for reference, as follows.
SQL> SELECT deptno
2 , CAST(COLLECT(job) AS varchar2_ntt) AS jobs
3 FROM emp
4 GROUP BY
5 deptno;
DEPTNO JOBS
------- --------------------------------------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'CLERK', 'PRESIDENT')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'CLERK', 'SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN')
3 rows selected.
We can see repeating elements in two of our collections, which we can de-duplicate with the COLLECT extended syntax as follows.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(DISTINCT job)
4 AS varchar2_ntt) AS distinct_jobs
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------
10 VARCHAR2_NTT('CLERK', 'MANAGER', 'PRESIDENT')
20 VARCHAR2_NTT('ANALYST', 'CLERK', 'MANAGER')
30 VARCHAR2_NTT('CLERK', 'MANAGER', 'SALESMAN')
3 rows selected.
As an alternative, we can use the UNIQUE keyword (this is synonymous with DISTINCT in the same way as SELECT UNIQUE is synonymous with SELECT DISTINCT in SQL).
SQL> SELECT deptno
2 , CAST(
3 COLLECT(UNIQUE job)
4 AS varchar2_ntt) AS distinct_jobs
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO DISTINCT_JOBS
---------- --------------------------------------------------
10 VARCHAR2_NTT('CLERK', 'MANAGER', 'PRESIDENT')
20 VARCHAR2_NTT('ANALYST', 'CLERK', 'MANAGER')
30 VARCHAR2_NTT('CLERK', 'MANAGER', 'SALESMAN')
3 rows selected.
As an aside (and to continue the SQL comparison), the non-unique alternative is ALL, which we can use in our COLLECT syntax, as follows.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(ALL job)
4 AS varchar2_ntt) AS distinct_jobs
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO DISTINCT_JOBS
---------- -----------------------------------------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'CLERK', 'PRESIDENT')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'CLERK', 'SALESMAN', 'MANAGER', 'SALESMAN', 'SALESMAN')
3 rows selected.
Collecting all elements is, of course, the default behaviour and is implicit in the same way that SELECT ALL is implied in SQL.
combining both new features
The syntax diagram in the documentation suggests that both extensions can be combined to build ordered, distinct collections for each group. However, in version 11.2.0.1, this doesn't appear to work. In the following query, we will attempt to collect the distinct jobs in each department and then order them.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(
4 DISTINCT job
5 ORDER BY job
6 ) AS varchar2_ntt) AS distinct_ordered_jobs
7 FROM emp
8 GROUP BY
9 deptno;
DEPTNO DISTINCT_ORDERED_JOBS
---------- --------------------------------------------------------------------------------
10 VARCHAR2_NTT('CLERK', 'MANAGER', 'PRESIDENT')
20 VARCHAR2_NTT('ANALYST', 'ANALYST', 'CLERK', 'CLERK', 'MANAGER')
30 VARCHAR2_NTT('CLERK', 'MANAGER', 'SALESMAN', 'SALESMAN', 'SALESMAN', 'SALESMAN')
3 rows selected.
We can see duplicate elements in our ordered collections so something is clearly not working as of 11.2.0.1.
using complex types
Our examples so far have used a simple scalar collection type. We can also use COLLECT to aggregate collections of more complex object types and perform the new operations on them. We will see an example of this below. First we will create an object type based on a couple of the attributes of the EMP table, as follows.
SQL> CREATE TYPE empsal_ot AS OBJECT
2 ( ename VARCHAR2(30)
3 , sal NUMBER
4 );
5 /
Type created.
We also need a collection type of this object, which we create as follows.
SQL> CREATE TYPE empsal_ntt AS TABLE OF empsal_ot;
2 /
Type created.
As with our previous examples, it is simple to COLLECT data of this type, as the following query demonstrates.
SQL> col empsals format a50 wrap
SQL> break on deptno skip 1
SQL> SELECT deptno
2 , CAST(
3 COLLECT(
4 empsal_ot(ename, sal)
5 ) AS empsal_ntt) AS empsals
6 FROM emp
7 GROUP BY
8 deptno;
DEPTNO EMPSALS(ENAME, SAL)
---------- --------------------------------------------------
10 EMPSAL_NTT(EMPSAL_OT('CLARK', 2450), EMPSAL_OT('MI
LLER', 1300), EMPSAL_OT('KING', 5000))
20 EMPSAL_NTT(EMPSAL_OT('SMITH', 800), EMPSAL_OT('FOR
D', 3000), EMPSAL_OT('ADAMS', 1100), EMPSAL_OT('SC
OTT', 3000), EMPSAL_OT('JONES', 2975))
30 EMPSAL_NTT(EMPSAL_OT('ALLEN', 1600), EMPSAL_OT('JA
MES', 950), EMPSAL_OT('TURNER', 1500), EMPSAL_OT('
BLAKE', 2850), EMPSAL_OT('MARTIN', 1250), EMPSAL_O
T('WARD', 1250))
3 rows selected.
If we wish to order each collection by one of the input columns (in our case, either the ENAME or SAL column from EMP), we can use the new ordering extension to COLLECT, as below.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(
4 empsal_ot(ename, sal) ORDER BY sal
5 ) AS empsal_ntt) AS empsals
6 FROM emp
7 GROUP BY
8 deptno;
DEPTNO EMPSALS(ENAME, SAL)
---------- --------------------------------------------------
10 EMPSAL_NTT(EMPSAL_OT('MILLER', 1300), EMPSAL_OT('C
LARK', 2450), EMPSAL_OT('KING', 5000))
20 EMPSAL_NTT(EMPSAL_OT('SMITH', 800), EMPSAL_OT('ADA
MS', 1100), EMPSAL_OT('JONES', 2975), EMPSAL_OT('S
COTT', 3000), EMPSAL_OT('FORD', 3000))
30 EMPSAL_NTT(EMPSAL_OT('JAMES', 950), EMPSAL_OT('WAR
D', 1250), EMPSAL_OT('MARTIN', 1250), EMPSAL_OT('T
URNER', 1500), EMPSAL_OT('ALLEN', 1600), EMPSAL_OT
('BLAKE', 2850))
3 rows selected.
We can see that the elements of each collection are ordered according to salary. However, we can't order the collection by each entire object instance it contains, as the following query demonstrates.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(
4 empsal_ot(ename, sal)
5 ORDER BY empsal_ot(ename, sal)
6 ) AS empsal_ntt) AS empsals
7 FROM emp
8 GROUP BY
9 deptno;
ORDER BY empsal_ot(ename, sal)
*
ERROR at line 5:
ORA-22950: cannot ORDER objects without MAP or ORDER method
Oracle tells us that we need a MAP or ORDER method to be able to sort instances of the object type. The same is also true if we wish to use the new distinct option, as follows.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(
4 DISTINCT empsal_ot(ename, sal)
5 ) AS empsal_ntt) AS empsals
6 FROM emp
7 GROUP BY
8 deptno;
DISTINCT empsal_ot(ename, sal)
*
ERROR at line 4:
ORA-22950: cannot ORDER objects without MAP or ORDER method
This makes sense as we cannot expect Oracle to be able to order or compare multi-attribute instances of EMPSAL_OT unless we define the sort criteria. A MAP method (or alternatively an ORDER method) is required to tell Oracle how to sort or compare data of this type. We will therefore remove our example types and start again, as follows.
SQL> DROP TYPE empsal_ntt;
Type dropped.
SQL> DROP TYPE empsal_ot FORCE;
Type dropped.
SQL> CREATE TYPE empsal_ot AS OBJECT
2 ( ename VARCHAR2(30)
3 , sal NUMBER
4 , MAP MEMBER FUNCTION sort_key RETURN VARCHAR2
5 );
6 /
Type created.
Note the MAP method definition in our new EMPSAL_OT type specification. It returns a simple scalar value to represent a comparison key for the data held in the type. We will define the sort key in the type body below.
SQL> CREATE TYPE BODY empsal_ot AS
2 MAP MEMBER FUNCTION sort_key RETURN VARCHAR2 IS
3 BEGIN
4 RETURN ename || TO_CHAR(sal,'fm0000');
5 END;
6 END;
7 /
Type body created.
For the purposes of this example, we have defined our sort key to be a concatenation of the type attributes. To complete the setup, we will re-create our collection type, as follows.
SQL> CREATE TYPE empsal_ntt AS TABLE OF empsal_ot;
2 /
Type created.
Now that Oracle has a method to sort and compare instances of EMPSAL_OT, we can retry our failed queries with the 11.2 COLLECT new features, starting with the ordered option, as follows.
SQL> SELECT CAST(
2 COLLECT(
3 DISTINCT empsal_ot(ename, sal)
4 ) AS empsal_ntt) AS empsals
5 FROM emp;
EMPSALS(ENAME, SAL)
------------------------------------------------------------
EMPSAL_NTT(EMPSAL_OT('ADAMS', 1100), EMPSAL_OT('ALLEN', 1600
), EMPSAL_OT('BLAKE', 2850), EMPSAL_OT('CLARK', 2450), EMPSA
L_OT('FORD', 3000), EMPSAL_OT('JAMES', 950), EMPSAL_OT('JONE
S', 2975), EMPSAL_OT('KING', 5000), EMPSAL_OT('MARTIN', 1250
), EMPSAL_OT('MILLER', 1300), EMPSAL_OT('SCOTT', 3000), EMPS
AL_OT('SMITH', 800), EMPSAL_OT('TURNER', 1500), EMPSAL_OT('W
ARD', 1250))
1 row selected.
To retry the distinct option, we will first add a duplicate row for 'ALLEN' into EMP and then repeat our query, as follows.
SQL> INSERT INTO emp (empno, ename, sal)
2 VALUES (9999, 'ALLEN', 1600);
1 row created.
SQL> SELECT CAST(
2 COLLECT(
3 DISTINCT empsal_ot(ename, sal)
4 ) AS empsal_ntt) AS empsals
5 FROM emp;
EMPSALS(ENAME, SAL)
------------------------------------------------------------
EMPSAL_NTT(EMPSAL_OT('ADAMS', 1100), EMPSAL_OT('ALLEN', 1600
), EMPSAL_OT('BLAKE', 2850), EMPSAL_OT('CLARK', 2450), EMPSA
L_OT('FORD', 3000), EMPSAL_OT('JAMES', 950), EMPSAL_OT('JONE
S', 2975), EMPSAL_OT('KING', 5000), EMPSAL_OT('MARTIN', 1250
), EMPSAL_OT('MILLER', 1300), EMPSAL_OT('SCOTT', 3000), EMPS
AL_OT('SMITH', 800), EMPSAL_OT('TURNER', 1500), EMPSAL_OT('W
ARD', 1250))
1 row selected.
SQL> ROLLBACK;
Rollback complete.
We can see that our collection has been de-duplicated and an object instance for 'ALLEN' appears just once in our collection.
emulating collect new features before 11g release 2
As noted earlier, the ordering or de-duplication of elements during a COLLECT function call has been available (but unsupported) since the function was first released. Both extensions can be emulated quite simply in all versions since 10g Release 1. We will see some examples below that use supported methods only.
ordering collection elements before 11.2
To order collection elements in a deterministic way, we can write a simple function to sort the collection once it has been populated, as follows.
SQL> CREATE FUNCTION sort_collection (
2 p_collection IN varchar2_ntt
3 ) RETURN varchar2_ntt IS
4 v_collection varchar2_ntt;
5 BEGIN
6 SELECT column_value
7 BULK COLLECT INTO v_collection
8 FROM TABLE(p_collection)
9 ORDER BY
10 column_value;
11 RETURN v_collection;
12 END sort_collection;
13 /
Function created.
Our function is very simple. It uses the TABLE() operator to cast our collection to a rowsource, which we then sort using a SQL ORDER BY clause and finally BULK COLLECT into a new collection. Using the function is also very simple. In the following example, we will populate and order a collection of employee names using COLLECT and our SORT_COLLECTION function.
SQL> SELECT deptno
2 , SORT_COLLECTION(
3 CAST(COLLECT(ename) AS varchar2_ntt)
4 ) AS ordered_emps
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO ORDERED_EMPS
---------- --------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('ADAMS', 'FORD', 'JONES', 'SCOTT', 'SMITH')
30 VARCHAR2_NTT('ALLEN', 'BLAKE', 'JAMES', 'MARTIN', 'TURNER', 'WARD')
3 rows selected.
Of course, this method doesn't completely emulate the COLLECT ordering feature. Remember that the COLLECT function can order a collection using columns/expressions that are not being collected. To achieve the same prior to 11.2, we can use the MULTISET function. In the following example, we will populate collections of enames ordered by their hire dates. First, we will see the correct order using the new COLLECT feature below.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(ename ORDER BY hiredate)
4 AS varchar2_ntt) AS ordered_emps
5 FROM emp
6 GROUP BY
7 deptno;
DEPTNO ORDERED_EMPS
---------- ---------------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('SMITH', 'JONES', 'FORD', 'SCOTT', 'ADAMS')
30 VARCHAR2_NTT('ALLEN', 'WARD', 'BLAKE', 'TURNER', 'MARTIN', 'JAMES')
3 rows selected.
We can emulate this with MULTISET but it requires slightly more work, as the following query demonstrates.
SQL> SELECT e.deptno
2 , CAST(
3 MULTISET(
4 SELECT e2.ename
5 FROM emp e2
6 WHERE e2.deptno = e.deptno
7 ORDER BY
8 e2.hiredate
9 ) AS varchar2_ntt) AS ordered_emps
10 FROM emp e
11 GROUP BY
12 e.deptno
13 ORDER BY
14 e.deptno;
DEPTNO ORDERED_EMPS
---------- --------------------------------------------------------------------
10 VARCHAR2_NTT('CLARK', 'KING', 'MILLER')
20 VARCHAR2_NTT('SMITH', 'JONES', 'FORD', 'SCOTT', 'ADAMS')
30 VARCHAR2_NTT('ALLEN', 'WARD', 'BLAKE', 'TURNER', 'MARTIN', 'JAMES')
3 rows selected.
The ORDER BY clause in the MULTISET subquery enables us to order the inputs to each collection and we can see that they are correctly ordered according to hire dates. However, with this method we have two EMP accesses which will naturally lead to a greater workload. The Cost Based Optimiser is smart enough, however, to avoid executing the MULTISET subquery for every EMP record in the outer query block (to verify this, repeat the query with statistics level set to ALL, query DBMS_XPLAN.DISPLAY_CURSOR with the format option "IOSTATS" and look at the number of Starts for the subquery, which will be equivalent to the number of distinct departments).
To measure the additional workload that this technique generates, we will compare the COLLECT and MULTISET methods with a larger dataset. Using the ALL_OBJECTS view, we will populate collections of OBJECT_NAME by OWNER, with the collection elements ordered by OBJECT_ID. We will compare their performance using Autotrace statistics and a wall-clock (having queried ALL_OBJECTS several times beforehand to eliminate physical I/O interference).
First, we will use the new COLLECT method, as follows.
SQL> set timing on
SQL> set autotrace traceonly statistics
SQL> SELECT owner
2 , object_type
3 , CAST(
4 COLLECT(object_name ORDER BY object_id)
5 AS varchar2_ntt) AS object_names
6 FROM all_objects
7 GROUP BY
8 owner
9 , object_type;
272 rows selected.
Elapsed: 00:00:14.92
Statistics
----------------------------------------------------------
222 recursive calls
0 db block gets
95805 consistent gets
0 physical reads
0 redo size
1882434 bytes sent via SQL*Net to client
57167 bytes received via SQL*Net from client
548 SQL*Net roundtrips to/from client
1464 sorts (memory)
0 sorts (disk)
272 rows processed
We will now repeat the query using MULTISET, as follows.
SQL> SELECT a.owner
2 , a.object_type
3 , CAST(
4 MULTISET(
5 SELECT b.object_name
6 FROM all_objects b
7 WHERE a.owner = b.owner
8 AND a.object_type = b.object_type
9 ORDER BY
10 b.object_id
11 ) AS varchar2_ntt) AS object_names
12 FROM all_objects a
13 GROUP BY
14 a.owner
15 , a.object_type;
272 rows selected.
Elapsed: 00:00:20.35
Statistics
----------------------------------------------------------
427 recursive calls
0 db block gets
308412 consistent gets
140 physical reads
0 redo size
1883832 bytes sent via SQL*Net to client
57375 bytes received via SQL*Net from client
550 SQL*Net roundtrips to/from client
3198 sorts (memory)
0 sorts (disk)
272 rows processed
As expected, the MULTISET method uses significantly more resources (particularly LIOs and sorts) to answer the same query, which naturally leads to a longer elapsed time (almost 40% longer in the example above).
de-duplicating collections before 11.2
To emulate the new distinct extension to COLLECT, we can use the MULTISET functions and operators introduced in 10g. Note that these will only work with collections of nested table types (i.e. not VARRAYs), but they are simple to use. Furthermore, we need a MAP or ORDER method to use these with complex collections, as demonstrated by our earlier examples. For simplicity, however, we will use our simple scalar collection type for the following examples.
First, we can use the SET function to de-duplicate a collection, as below.
SQL> SELECT deptno
2 , SET(
3 CAST(
4 COLLECT(job)
5 AS varchar2_ntt)) AS distinct_jobs
6 FROM emp
7 GROUP BY
8 deptno;
DEPTNO DISTINCT_JOBS
---------- -------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'CLERK', 'PRESIDENT')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'CLERK', 'MANAGER')
3 rows selected.
Note the placement of the SET function. It must be outside the CAST call, because SET will not work with Oracle's system-generated types that are created by our use of the COLLECT function. We will demonstrate this below.
SQL> SELECT deptno
2 , CAST(
3 SET(COLLECT(job))
4 AS varchar2_ntt) AS distinct_jobs
5 FROM emp
6 GROUP BY
7 deptno;
SET(COLLECT(job))
*
ERROR at line 3:
ORA-00932: inconsistent datatypes: expected UDT got -
An alternative MULTISET method we can use is the MULTISET UNION DISTINCT operator. This is used to perform a union of two collections, as the following query demonstrates.
SQL> SELECT deptno
2 , CAST(
3 COLLECT(job)
4 AS varchar2_ntt
5 ) MULTISET UNION DISTINCT varchar2_ntt() AS distinct_jobs
6 FROM emp
7 GROUP BY
8 deptno;
DEPTNO DISTINCT_JOBS
---------- -------------------------------------------------
10 VARCHAR2_NTT('MANAGER', 'CLERK', 'PRESIDENT')
20 VARCHAR2_NTT('CLERK', 'ANALYST', 'MANAGER')
30 VARCHAR2_NTT('SALESMAN', 'CLERK', 'MANAGER')
3 rows selected.
We have performed a distinct union on our collection by simply appending an empty collection of the same type with the MULTISET UNION DISTINCT operator. This is the semantic equivalent of a SQL UNION.
Using the ALL_OBJECTS dataset and a wall-clock, we will compare the performance of the two MULTISET techniques and the 11.2 COLLECT function below. We will begin by enabling Autotrace to suppress the resultset and use the new COLLECT function, as follows.
SQL> set timing on
SQL> set autotrace traceonly statistics
SQL> SELECT owner
2 , object_type
3 , CAST(
4 COLLECT(DISTINCT object_name)
5 AS varchar2_ntt) AS object_names
6 FROM all_objects
7 GROUP BY
8 owner
9 , object_type;
272 rows selected.
Elapsed: 00:00:12.35
Statistics
----------------------------------------------------------
222 recursive calls
0 db block gets
95811 consistent gets
0 physical reads
0 redo size
1883131 bytes sent via SQL*Net to client
57375 bytes received via SQL*Net from client
550 SQL*Net roundtrips to/from client
1464 sorts (memory)
0 sorts (disk)
272 rows processed
The COLLECT-based query completes in approximately 12 seconds, with all sorts being performed in memory. We can compare this to the SET-based query, as follows.
SQL> SELECT owner
2 , object_type
3 , SET(
4 CAST(
5 COLLECT(object_name)
6 AS varchar2_ntt)) AS object_names
7 FROM all_objects
8 GROUP BY
9 owner
10 , object_type;
272 rows selected.
Elapsed: 00:01:09.42
Statistics
----------------------------------------------------------
222 recursive calls
0 db block gets
95811 consistent gets
0 physical reads
0 redo size
1876306 bytes sent via SQL*Net to client
57375 bytes received via SQL*Net from client
550 SQL*Net roundtrips to/from client
1464 sorts (memory)
0 sorts (disk)
272 rows processed
The SET query takes significantly longer to complete (almost 70 seconds compared with 12 seconds for the COLLECT query) but with the same resource profile being reported by Autotrace.
Finally, we will test the performance of the MULTISET UNION DISTINCT method below.
SQL> SELECT owner
2 , object_type
3 , CAST(
4 COLLECT(object_name)
5 AS varchar2_ntt
6 ) MULTISET UNION DISTINCT varchar2_ntt() AS object_names
7 FROM all_objects
8 GROUP BY
9 owner
10 , object_type;
ERROR:
ORA-22813: operand value exceeds system limits
no rows selected
Elapsed: 00:00:07.51
Statistics
----------------------------------------------------------
228 recursive calls
0 db block gets
94179 consistent gets
0 physical reads
0 redo size
26650 bytes sent via SQL*Net to client
799 bytes received via SQL*Net from client
5 SQL*Net roundtrips to/from client
1464 sorts (memory)
0 sorts (disk)
0 rows processed
After just 7 seconds and the same resource profile as the previous queries, our MULTISET UNION DISTINCT method gives up with an ORA-22813 exception due to collection size (note that this was tested on an 11.1.0.7 database). We can therefore conclude that for collection de-duplication, the new COLLECT function is more efficient than the pre-11g Release 2 alternatives (and that the alternatives will not necessarily be appropriate when working with larger collections).
acknowledgements
Thanks to Jonathan Heller for suggesting the use of the SET function to emulate the new distinct feature in versions prior to 11g Release 2.