Even if you have storage appliances and a lot of terabytes, thinks like backup, compression, deduplication, storage relocation, encryption, availability could be a nightmare.
One standard option that you have with the Oracle WebCenter Content is to store data to the database. And the Oracle Database allows you leverage features like compression, deduplication, encryption and seamless backup.
But with a huge volume, the challenge is passed to the DBA to keep the WebCenter Content Database up and running.
One solution is the use of DB partitions for your content repository, but what are the implications of this? Can I fit this with my business requirements?
Well, yes. It’s up to you how you will manage that, you just need a good plan. During you “storage brainstorm plan” take in your mind what you need, such as storage petabytes of documents? You need everything on-line? There’s a way to logically separate the “good content” from the “legacy content”?
The first thing that comes to my mind is to use the creation date of the document, but you need to remember that this document could receive a lot of revisions and maybe you can consider the revision creation date. Your plan can have also complex rules like per Document Type or per a custom metadata like department or an hybrid per date, per DocType and an specific virtual folder.
Extrapolation the use, you can have your repository distributed in different servers, different disks, different disk types (Such as ssds, sas, sata, tape,…), separated accordingly your business requirements, separating the “hot” content from the legacy and easily matching your compliance requirements.
If you think to use by revision, the simple way is to consider the dId, that is the sequential unique id for every content created using the WebCenter Content or the dLastModified that is the date field of the FileStorage table that contains the date of inclusion of the content to the DB Table using SecureFiles.
Using the scenario of partitioned repository using an hierarchical separation by date, we will transform the FileStorage table in an partitioned table using “Partition by Range” of the dLastModified column (You can use the dId or a join with other tables for other metadata such as dDocType, Security, etc…).
The test scenario bellow covers:
- Previous existent data on the JDBC Storage to be migrated to the new partitioned JDBC Storage
- Partition by Date
- Automatically generation of new partitions based on a pre-defined interval (Available only with Oracle Database 11g+)
- Deduplication and Compression for legacy data
- Oracle WebCenter Content 11g PS5 (Could present some customizations that do not affect the test scenario)
FileStorage Providers |
Enable the metadata StorageRule in the UI and upload some documents using this rule.
For this test case you can run using the schema owner or an dba user. We will use the schema owner TESTS_OCS.
I can’t forgot to tell that this is just a test and you should do a proper backup of your environment.
When you use the schema owner, you need some privileges, using the dba user grant the privileges needed:
REM Grant privileges required for online redefinition.
GRANT EXECUTE ON DBMS_REDEFINITION TO TESTS_OCS;
GRANT ALTER ANY TABLE TO TESTS_OCS;
GRANT DROP ANY TABLE TO TESTS_OCS;
GRANT LOCK ANY TABLE TO TESTS_OCS;
GRANT CREATE ANY TABLE TO TESTS_OCS;
GRANT SELECT ANY TABLE TO TESTS_OCS;
REM Privileges required to perform cloning of dependent objects.
GRANT CREATE ANY TRIGGER TO TESTS_OCS;
GRANT CREATE ANY INDEX TO TESTS_OCS;
In our test scenario we will separate the content as Legacy, Day1, Day2, Day3 and Future. This last one will partitioned automatically using 3 tablespaces in a round robin mode. In a real scenario the partition rule could be per month, per year or any rule that you choose.
Table spaces for the test scenario:
CREATE TABLESPACE TESTS_OCS_PART_LEGACY DATAFILE 'tests_ocs_part_legacy.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY1 DATAFILE 'tests_ocs_part_day1.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY2 DATAFILE 'tests_ocs_part_day2.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_DAY3 DATAFILE 'tests_ocs_part_day3.dat' SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_A 'tests_ocs_part_round_robin_a.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_B 'tests_ocs_part_round_robin_b.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
CREATE TABLESPACE TESTS_OCS_PART_ROUND_ROBIN_C 'tests_ocs_part_round_robin_c.dat' DATAFILE SIZE 500K AUTOEXTEND ON NEXT 500K MAXSIZE UNLIMITED;
Before start, gather optimizer statistics on the actual FileStorage table:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage', cascade => TRUE);
Now check if is possible execute the redefinition process:
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('TESTS_OCS', 'FileStorage',DBMS_REDEFINITION.CONS_USE_PK);
If no errors messages, you are good to go.
Create a Partitioned Interim FileStorage table.
You need to create a new table with the partition information to act as an interim table:
CREATE TABLE FILESTORAGE_Part
(
DID NUMBER(*,0) NOT NULL ENABLE,
DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE,
DLASTMODIFIED TIMESTAMP (6),
DFILESIZE NUMBER(*,0),
DISDELETED VARCHAR2(1 CHAR),
BFILEDATA BLOB
)
LOB (BFILEDATA) STORE AS SECUREFILE
(
ENABLE STORAGE IN ROW
NOCACHE LOGGING
KEEP_DUPLICATES
NOCOMPRESS
)
PARTITION BY RANGE (DLASTMODIFIED)
INTERVAL (NUMTODSINTERVAL(1,'DAY'))
STORE IN (TESTS_OCS_PART_ROUND_ROBIN_A, TESTS_OCS_PART_ROUND_ROBIN_B, TESTS_OCS_PART_ROUND_ROBIN_C)
(
PARTITION FILESTORAGE_PART_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_LEGACY
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_LEGACY
RETENTION NONE
DEDUPLICATE
COMPRESS HIGH
),
PARTITION FILESTORAGE_PART_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY1
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY1
RETENTION AUTO
KEEP_DUPLICATES
COMPRESS
),
PARTITION FILESTORAGE_PART_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY2
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY2
RETENTION AUTO
KEEP_DUPLICATES
NOCOMPRESS
),
PARTITION FILESTORAGE_PART_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM'))
TABLESPACE TESTS_OCS_PART_DAY3
LOB (BFILEDATA) STORE AS SECUREFILE
( TABLESPACE TESTS_OCS_PART_DAY3
RETENTION AUTO
KEEP_DUPLICATES
NOCOMPRESS
)
);
After the creation you should see your partitions defined.
Note that only the fixed range partitions have been created, none of the interval partition have been created.
Start the redefinition process:
BEGIN
DBMS_REDEFINITION.START_REDEF_TABLE(
uname => 'TESTS_OCS'
,orig_table => 'FileStorage'
,int_table => 'FileStorage_PART'
,col_mapping => NULL
,options_flag => DBMS_REDEFINITION.CONS_USE_PK
);
END;
This operation can take some time to complete, depending how many contents that you have and on the size of the table.
Using the DBA user you can check the progress with this command:
SELECT * FROM v$sesstat WHERE sid = 1;
Copy dependent objects:
DECLARE
redefinition_errors PLS_INTEGER := 0;
BEGIN
DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
uname => 'TESTS_OCS'
,orig_table => 'FileStorage'
,int_table => 'FileStorage_PART'
,copy_indexes => DBMS_REDEFINITION.CONS_ORIG_PARAMS
,copy_triggers => TRUE
,copy_constraints => TRUE
,copy_privileges => TRUE
,ignore_errors => TRUE
,num_errors => redefinition_errors
,copy_statistics => FALSE
,copy_mvlog => FALSE
);
IF (redefinition_errors > 0) THEN
DBMS_OUTPUT.PUT_LINE('>>> FileStorage to FileStorage_PART temp copy Errors: ' || TO_CHAR(redefinition_errors));
END IF;
END;
With the DBA user, verify that there's no errors:
SELECT object_name, base_table_name, ddl_txt FROM DBA_REDEFINITION_ERRORS;
*Note that will show 2 lines related to the constrains, this is expected.
Synchronize the interim table FileStorage_PART:
BEGIN
DBMS_REDEFINITION.SYNC_INTERIM_TABLE(
uname => 'TESTS_OCS',
orig_table => 'FileStorage',
int_table => 'FileStorage_PART');
END;
Gather statistics on the new table:
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'FileStorage_PART', cascade => TRUE);
Complete the redefinition:
BEGIN
DBMS_REDEFINITION.FINISH_REDEF_TABLE(
uname => 'TESTS_OCS',
orig_table => 'FileStorage',
int_table => 'FileStorage_PART');
END;
During the execution the FileStorage table is locked in exclusive mode until finish the operation.
After the last command the FileStorage table is partitioned.
If you have contents out of the range partition, you should see the new partitions created automatically, not generating an error if you “forgot” to create all the future ranges. You will see something like:
You now can drop the FileStorage_PART table:
DROP TABLE FileStorage_PART PURGE;
To check the FileStorage table is valid and is partitioned, use the command:
SELECT num_rows,partitioned
FROM user_tables
WHERE table_name = 'FILESTORAGE';
You can list the contents of the FileStorage table in a specific partition, per example:
SELECT *
FROM FileStorage
PARTITION (FILESTORAGE_PART_LEGACY)
Some useful commands that you can use to check the partitions, note that you need to run using a DBA user:
SELECT *
FROM DBA_TAB_PARTITIONS
WHERE table_name = 'FILESTORAGE';
SELECT *
FROM DBA_TABLESPACES
WHERE tablespace_name like 'TESTS_OCS%';
After the redefinition process complete you have a new FileStorage table storing all content that has the Storage rule pointed to the JDBC Storage and partitioned using the rule set during the creation of the temporary interim FileStorage_PART table.
At this point you can test the WebCenter Content downloading the documents (Original and Renditions). Note that the content could be already in the cache area, take a look in the weblayout directory to see if a file with the same id is there, then click on the web rendition of your test file and see if have created the file and you can open, this means that is all working.
The redefinition process can be repeated many times, this allow you test what the better layout, over and over again.
Now some interesting maintenance actions related to the partitions:
- Make an tablespace read only.
- No issues viewing, the WebCenter Content do not alter the revisions
- When try to delete an content that is part of an read only tablespace, an error will occurs and the document will not be deleted
- The only way to prevent errors today is creating an custom component that checks the partitions and if you have an document in an “Read Only” repository, execute the deletion process of the metadata and mark the document to be deleted on the next db maintenance, like a new redefinition.
- Take an tablespace off-line for archiving purposes or any other reason.
- When you try open an document that is included in this tablespace will receive an error that was unable to retrieve the content, but the others online tablespaces are not affected.
- Same behavior when deleting documents.
- Again, an custom component is the solution. If you have an document “out of range”, the component can show an message that the repository for that document is offline. This can be extended to a option to the user to request to put online again.
- Moving some legacy content to an offline repository (table) using the Exchange option to move the content from one partition to a empty nonpartitioned table like FileStorage_LEGACY. Note that this option will remove the registers from the FileStorage and will not be able to open the stored content. You always need to keep in mind the indexes and constrains.
- An redefinition separating the original content (vault) from the renditions and separate by date ate the same time. This could be an option for DAM environments that want to have an special place for the renditions and put the original files in a storage with less performance.
- The process will be the same, you just need to change the script of the interim table to use composite partitioning. Will be something like:
CREATE TABLE FILESTORAGE_RenditionPart
(
DID NUMBER(*,0) NOT NULL ENABLE,
DRENDITIONID VARCHAR2(30 CHAR) NOT NULL ENABLE,
DLASTMODIFIED TIMESTAMP (6),
DFILESIZE NUMBER(*,0),
DISDELETED VARCHAR2(1 CHAR),
BFILEDATA BLOB
)
LOB (BFILEDATA) STORE AS SECUREFILE
(
ENABLE STORAGE IN ROW
NOCACHE LOGGING
KEEP_DUPLICATES
NOCOMPRESS
)
PARTITION BY LIST (DRENDITIONID)
SUBPARTITION BY RANGE (DLASTMODIFIED)
(
PARTITION Vault VALUES ('primaryFile')
( SUBPARTITION FILESTORAGE_VAULT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_VAULT_FUTURE VALUES LESS THAN (MAXVALUE)
)
,PARTITION WebLayout VALUES ('webViewableFile')
( SUBPARTITION FILESTORAGE_WEBLAYOUT_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_WEBLAYOUT_FUTURE VALUES LESS THAN (MAXVALUE)
)
,PARTITION Special VALUES ('Special')
( SUBPARTITION FILESTORAGE_SPECIAL_LEGACY VALUES LESS THAN (TO_DATE('05-APR-2012 12.00.00 AM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY1 VALUES LESS THAN (TO_DATE('06-APR-2012 07.25.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY2 VALUES LESS THAN (TO_DATE('06-APR-2012 07.55.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_DAY3 VALUES LESS THAN (TO_DATE('06-APR-2012 07.58.00 PM', 'DD-MON-YYYY HH.MI.SS AM')) LOB (BFILEDATA) STORE AS SECUREFILE
, SUBPARTITION FILESTORAGE_SPECIAL_FUTURE VALUES LESS THAN (MAXVALUE)
)
)ENABLE ROW MOVEMENT;
The next post related to partitioned repository will come with an sample component to handle the possible exceptions when you need to take off line an tablespace/partition or move to another place.
Also, we can include some integration to the Retention Management and Records Management.
Another subject related to partitioning is the ability to create an FileStore Provider pointed to a different database, raising the level of the distributed storage vs. performance.
Let us know if this is important to you or you have an use case not listed, leave a comment.
Cross-posted on the WebCenter A-Team Blog