Even if you have storage appliances and a lot of terabytes, thinks like backup, compression, deduplication, storage relocation, encryption, availability could be a nightmare.
One standard option that you have with the Oracle WebCenter Content is to store data to the database. And the Oracle Database allows you leverage features like compression, deduplication, encryption and seamless backup.
But with a huge volume, the challenge is passed to the DBA to keep the WebCenter Content Database up and running.
One solution is the use of DB partitions for your content repository, but what are the implications of this? Can I fit this with my business requirements?
Well, yes. It’s up to you how you will manage that, you just need a good plan. During you “storage brainstorm plan” take in your mind what you need, such as storage petabytes of documents? You need everything on-line? There’s a way to logically separate the “good content” from the “legacy content”?
The first thing that comes to my mind is to use the creation date of the document, but you need to remember that this document could receive a lot of revisions and maybe you can consider the revision creation date. Your plan can have also complex rules like per Document Type or per a custom metadata like department or an hybrid per date, per DocType and an specific virtual folder.
Extrapolation the use, you can have your repository distributed in different servers, different disks, different disk types (Such as ssds, sas, sata, tape,…), separated accordingly your business requirements, separating the “hot” content from the legacy and easily matching your compliance requirements.
If you think to use by revision, the simple way is to consider the dId, that is the sequential unique id for every content created using the WebCenter Content or the dLastModified that is the date field of the FileStorage table that contains the date of inclusion of the content to the DB Table using SecureFiles.
Using the scenario of partitioned repository using an hierarchical separation by date, we will transform the FileStorage table in an partitioned table using “Partition by Range” of the dLastModified column (You can use the dId or a join with other tables for other metadata such as dDocType, Security, etc…).
The test scenario bellow covers:
- Previous existent data on the JDBC Storage to be migrated to the new partitioned JDBC Storage
- Partition by Date
- Automatically generation of new partitions based on a pre-defined interval (Available only with Oracle Database 11g+)
- Deduplication and Compression for legacy data
- Oracle WebCenter Content 11g PS5 (Could present some customizations that do not affect the test scenario)
Enable the metadata StorageRule in the UI and upload some documents using this rule.
For this test case you can run using the schema owner or an dba user. We will use the schema owner TESTS_OCS.
I can’t forgot to tell that this is just a test and you should do a proper backup of your environment.
When you use the schema owner, you need some privileges, using the dba user grant the privileges needed:
In our test scenario we will separate the content as Legacy, Day1, Day2, Day3 and Future. This last one will partitioned automatically using 3 tablespaces in a round robin mode. In a real scenario the partition rule could be per month, per year or any rule that you choose.
Table spaces for the test scenario:
Before start, gather optimizer statistics on the actual FileStorage table:
Now check if is possible execute the redefinition process:
If no errors messages, you are good to go.
Create a Partitioned Interim FileStorage table.
You need to create a new table with the partition information to act as an interim table:
After the creation you should see your partitions defined.
Note that only the fixed range partitions have been created, none of the interval partition have been created.
Start the redefinition process:
This operation can take some time to complete, depending how many contents that you have and on the size of the table.
Using the DBA user you can check the progress with this command:
Copy dependent objects:
With the DBA user, verify that there's no errors:
*Note that will show 2 lines related to the constrains, this is expected.
Synchronize the interim table FileStorage_PART:
Gather statistics on the new table:
Complete the redefinition:
During the execution the FileStorage table is locked in exclusive mode until finish the operation.
After the last command the FileStorage table is partitioned.
If you have contents out of the range partition, you should see the new partitions created automatically, not generating an error if you “forgot” to create all the future ranges. You will see something like:
You now can drop the FileStorage_PART table:
To check the FileStorage table is valid and is partitioned, use the command:
You can list the contents of the FileStorage table in a specific partition, per example:
Some useful commands that you can use to check the partitions, note that you need to run using a DBA user:
After the redefinition process complete you have a new FileStorage table storing all content that has the Storage rule pointed to the JDBC Storage and partitioned using the rule set during the creation of the temporary interim FileStorage_PART table.
At this point you can test the WebCenter Content downloading the documents (Original and Renditions). Note that the content could be already in the cache area, take a look in the weblayout directory to see if a file with the same id is there, then click on the web rendition of your test file and see if have created the file and you can open, this means that is all working.
The redefinition process can be repeated many times, this allow you test what the better layout, over and over again.
Now some interesting maintenance actions related to the partitions:
- Make an tablespace read only.
- No issues viewing, the WebCenter Content do not alter the revisions
- When try to delete an content that is part of an read only tablespace, an error will occurs and the document will not be deleted
- The only way to prevent errors today is creating an custom component that checks the partitions and if you have an document in an “Read Only” repository, execute the deletion process of the metadata and mark the document to be deleted on the next db maintenance, like a new redefinition.
- When you try open an document that is included in this tablespace will receive an error that was unable to retrieve the content, but the others online tablespaces are not affected.
- Same behavior when deleting documents.
- Again, an custom component is the solution. If you have an document “out of range”, the component can show an message that the repository for that document is offline. This can be extended to a option to the user to request to put online again.
- The process will be the same, you just need to change the script of the interim table to use composite partitioning. Will be something like:
The next post related to partitioned repository will come with an sample component to handle the possible exceptions when you need to take off line an tablespace/partition or move to another place.
Also, we can include some integration to the Retention Management and Records Management.
Another subject related to partitioning is the ability to create an FileStore Provider pointed to a different database, raising the level of the distributed storage vs. performance.
Let us know if this is important to you or you have an use case not listed, leave a comment.
Cross-posted on the WebCenter A-Team Blog