Monday, April 26, 2010

Find and delete duplicate rows by Analytic Function

The intuitive approach is to create a temp table holding MIN(ROWID) for each group having COUNT(*) > 1, then join it back to the target table to do the delete.
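That classic approach can also be sketched as a single statement, without a temp table (a sketch only; it assumes, as in the demo below, that deptno and job define a duplicate):

```sql
-- Sketch of the classic (non-analytic) approach: keep the row with
-- MIN(ROWID) in each (deptno, job) group and delete the rest.
DELETE FROM scott.emp
 WHERE ROWID NOT IN
       (SELECT MIN(ROWID)
          FROM scott.emp
         GROUP BY deptno, job);
```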

You can identify the duplicate rows with an analytic function:
SELECT rid, deptno, job, rn
  FROM
  (SELECT /*+ parallel(a) */
        ROWID rid, deptno, job,
        ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
   FROM scott.emp a
  )
WHERE rn <> 1;

You can also find the duplicate groups with GROUP BY and HAVING COUNT(*) > 1:

SELECT /*+ parallel(a,8) */
 MAX(ROWID) rid, deptno, job, COUNT(*)
FROM scott.emp a
GROUP BY deptno, job
HAVING COUNT(*) > 1;

To delete them:
DELETE FROM scott.emp
WHERE ROWID IN
 (
  SELECT rid
    FROM (SELECT /*+ parallel(a) */
                 ROWID rid, deptno, job,
                 ROW_NUMBER () OVER (PARTITION BY deptno, job ORDER BY empno) rn
            FROM scott.emp a)
  WHERE rn <> 1
);

Monday, April 19, 2010

Reclaim deleted LOB data storage

I was helping a client purge obsolete data and reclaim space in an OLAP database.
We got stuck on a BLOB column segment; it took me a couple of hours to find the solution.

Here is a demo of the solution.

Create a table with a BLOB column:
drop table t2 purge;

CREATE TABLE t2
(
 n1 NUMBER(10),
 d1 date,
 b1 BLOB,
 CONSTRAINT t2_PK PRIMARY KEY (n1)
  USING INDEX TABLESPACE index_auto
) 
TABLESPACE data_auto;
Display the large objects (LOBs) contained in the user's tables:
select b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
from user_lobs b;

SELECT SEGMENT_NAME, segment_type, TABLESPACE_NAME, BYTES, BLOCKS, EXTENTS
FROM user_SEGMENTS
WHERE segment_type like '%LOB%'
ORDER BY SEGMENT_NAME;

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

Test: shrink a BASICFILE LOB segment only

truncate table t2;

declare
    l_blob blob;
    l_size number := 32700;
begin
  for i in 1 .. 15
  loop
    insert into t2(n1,b1) values (i, empty_blob() ) returning b1 into l_blob;
    dbms_lob.writeappend( l_blob, l_size, utl_raw.cast_to_raw(rpad('*',l_size,'*')));
  end loop;
  commit;
end;
/

delete t2;
commit;

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

     BYTES     BLOCKS    EXTENTS
---------- ---------- ----------
    720896         88         11

ALTER TABLE t2 MODIFY LOB (b1) (SHRINK SPACE);

SELECT b.TABLE_NAME, b.COLUMN_NAME, b.SEGMENT_NAME, b.TABLESPACE_NAME, b.INDEX_NAME
 ,s.bytes, s.blocks, s.extents
FROM user_lobs b, user_segments s
WHERE b.table_name = 'T2'
and b.column_name = 'B1'
and s.segment_type like 'LOB%'
and s.segment_name = b.segment_name;

     BYTES     BLOCKS    EXTENTS
---------- ---------- ----------
     65536          8          1

Note

The shrink command generates roughly the same amount of redo (and archived log) as the size of the LOB storage space being reclaimed.
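One way to check the redo cost yourself (a sketch; it assumes you have SELECT privilege on the v$ views) is to snapshot the session's 'redo size' statistic before and after the shrink and compare the two values:

```sql
-- Redo generated by the current session so far.
-- Run once before and once after ALTER TABLE ... SHRINK SPACE.
SELECT sn.name, ms.value
  FROM v$mystat ms, v$statname sn
 WHERE ms.statistic# = sn.statistic#
   AND sn.name = 'redo size';
```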

Reference


Shrink a table and all of its dependent segments (including BASICFILE LOB segments):
ALTER TABLE t2 ENABLE ROW MOVEMENT;
ALTER TABLE t2 SHRINK SPACE CASCADE;

Shrink a BASICFILE LOB segment only:
ALTER TABLE t2 MODIFY LOB (b1) (SHRINK SPACE);

Monday, April 12, 2010

PL/SQL Development Workflow

Here are my notes taken from the book "Oracle PL/SQL Best Practices".

Four steps for preparing a program, especially for a PL/SQL transactional database API.

Validate program requirements
  1. Ask lots of questions.
  2. What users ask for is not always the easiest way to solve a problem.
  3. Consider other approaches, including business processes and programming algorithms.

Implement the header of the program

  1. Choose a good name that accurately represents the purpose of the program.
  2. Define the inputs and outputs.
  3. Decide whether to overload sub-procedures.

Define the test cases

  1. Verify that it works.
  2. How will I know when I am done with this program?

Build test code

Testing for correctness:
  • Have you tested with good data and all the different possibilities of bad data?
  • Does the code do the right thing ... and nothing more?
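As a sketch of the "implement the header first" step above, a package spec with a descriptive name and explicit inputs and outputs might look like this (all names here are hypothetical):

```sql
CREATE OR REPLACE PACKAGE emp_api AS
  -- The name states the purpose; the parameter list defines the
  -- contract before any implementation is written.
  PROCEDURE raise_salary (
    p_empno   IN  scott.emp.empno%TYPE,
    p_pct     IN  NUMBER,
    p_new_sal OUT scott.emp.sal%TYPE
  );
END emp_api;
/
```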

Tuesday, April 06, 2010

Why Transactional Database API approach?

We like the transactional database API approach for the following reasons:

DRY: do not repeat yourself

*. It makes software components modular; I am totally into modular programming.
*. Software modules must carry out a very specific task (and be very efficient at carrying it out).
*. The same APIs are available to all applications, in any language, that access the database. No duplication of effort.

Orthogonality

*. Each software module should be loosely coupled (to limit dependencies).
  -- SQL scattered willy-nilly around Java/C#/JavaScript code is highly coupled.
  -- A schema change, such as adding a table or column, then needs to be made in the database only, one place.
  -- If B calls A and A's internals change, there is no need to touch B.
*. It makes testing simpler: functional and load tests are easy to set up.
*. It is easy to deploy.
E.g. a PL/SQL-only installation, with no need to touch the Java/C#/JavaScript mid tier and UI.
*. It defines a clear interface contract.

*. It removes the need for triggers as all inserts, updates and deletes are wrapped in APIs. Instead of writing triggers you simply add the code into the API. I loathe triggers.

*. It clearly separates all database access code (APIs are king).

*. To understand the consequences of database refactorings, it is important to be able to see how the database is used by the application. If SQL is scattered willy-nilly around the code base, this is very hard to do. As a result, it is important to have a clear database access layer that shows where the database is being used and how. To do this we suggest the database API approach.
*. The underlying structure of the database is hidden from the users, so I can make structural changes without the client applications being changed. The API implementation can be altered and tuned without affecting the client application.

Control and responsibility, DevOps

*. It prevents people who do not understand SQL from writing stupid queries.
  -- All SQL is written by database developers or DBAs, reducing the likelihood of dodgy queries.
*. SDLC: 80% is maintenance, i.e. dealing with change.
  -- Changing the database schema.
  -- Migrating the data in the database.
  -- Online data fixes.
  -- Changing the database access code / data-processing logic.
*. Troubleshooting and firefighting
  -- Database developers and DBAs can easily find and fix the SQL without bothering the Java programmers.
*. Tuning SQL without bothering the C# programmers.

*. Having a clear database layer (APIs are king) has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier for developers who are often not particularly skilled with SQL. For the database expert it provides a clear section of the code to look at to see how the database is being used. This helps in preparing indexes and database optimization, and in looking at the SQL to see how it could be reformulated to perform better. It gives the database expert a better understanding of how the database is used.

Professionalism = high efficiency + high quality

*. (ORM) Anything that generates SQL on the fly worries me, not just Java. I want to be able to cut and paste the SQL, not try to capture or trace it during a run.


*. More database features and functions
  -- Partitioning.
*. Less code, fewer bugs, easier to maintain.

*. It eliminates SQL parsing in the host language. Parsing consumes client CPU, server CPU, and latches; PL/SQL keeps SQL cursors cached and open.
*. It eliminates data round trips, data type conversions, and context switches.
*. It tightly couples the data model and the data-processing design; the database world favors big up-front design.

*. It maximizes data sharing and reuse.
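The cursor-caching point can be sketched as follows: inside PL/SQL, a static SQL statement is parsed once and its cursor is kept cached across executions, where a host language would typically re-parse or re-prepare (a sketch against the scott.emp demo table):

```sql
BEGIN
  FOR i IN 1 .. 1000 LOOP
    -- Static SQL with a PL/SQL bind variable: parsed once,
    -- cursor kept open and cached across loop iterations.
    UPDATE scott.emp SET sal = sal WHERE empno = i;
  END LOOP;
  COMMIT;
END;
/
```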

Suggestion

*. Business logic, validation, and lots of IF statements can be placed and refined in Java, with advanced language features such as OO.

This list goes on and on.

Our concept is "build the data API in the database, and you call the data API".
The data API encapsulates a transaction in a bit of code. Here we agree: no SQL in the client application, just calls to stored procedures, well tuned and developed explicitly to do that one thing.

Over time, different UI technologies have been layered on top of the same database API.

It is all about APIs. The applications do not do table-level (select/insert/update/delete) work; the apps do not even really know about tables.

On top of the database schema we have an API that does the data work
(generally, functions or ref cursors to retrieve data, and procedures to change data).

In the application we call this API but keep our own application logic as well
(only application logic is in the application; data logic is, well, right where it belongs, next to the data, waiting for the next great programming paradigm to come along).
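A minimal sketch of such an API package, following the "functions/ref cursors to retrieve, procedures to change" convention (all names are hypothetical, and emp_seq is an assumed sequence):

```sql
CREATE OR REPLACE PACKAGE dept_api AS
  -- Retrieval: the client receives a ref cursor and never touches tables.
  FUNCTION get_emps (p_deptno IN scott.emp.deptno%TYPE)
    RETURN SYS_REFCURSOR;

  -- Change: the transaction logic lives next to the data.
  PROCEDURE hire_emp (
    p_ename  IN scott.emp.ename%TYPE,
    p_deptno IN scott.emp.deptno%TYPE
  );
END dept_api;
/

CREATE OR REPLACE PACKAGE BODY dept_api AS
  FUNCTION get_emps (p_deptno IN scott.emp.deptno%TYPE)
    RETURN SYS_REFCURSOR
  IS
    l_cur SYS_REFCURSOR;
  BEGIN
    OPEN l_cur FOR
      SELECT empno, ename, job, sal
        FROM scott.emp
       WHERE deptno = p_deptno;
    RETURN l_cur;
  END get_emps;

  PROCEDURE hire_emp (
    p_ename  IN scott.emp.ename%TYPE,
    p_deptno IN scott.emp.deptno%TYPE
  ) IS
  BEGIN
    INSERT INTO scott.emp (empno, ename, deptno)
    VALUES (emp_seq.NEXTVAL, p_ename, p_deptno);  -- emp_seq is assumed
  END hire_emp;
END dept_api;
/
```

Any client (Java, C#, PHP, ...) then only ever calls dept_api.get_emps and dept_api.hire_emp; the tables behind them can change freely.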

The fact that our UI is in Java isn't really relevant. You could pretty much see how you would use this package from C#, Java/JSP, Swing, VB, Pro*C, Forms, PowerBuilder, Perl, PHP, Python, a mobile phone, <whatever the heck you want>.

Reference


http://stackoverflow.com/questions/1588149/orm-for-oracle-pl-sql?lq=1

ORM is flawed
Performance Anti-Patterns in Database-Driven Applications,
http://www.infoq.com/articles/Anti-Patterns-Alois-Reitbauer
  • Misuse of O/R Mappers
  • Load More Data Than Needed
  • Inadequate Usage of Resources
  • One Bunch of Everything