This document was written and massaged during a four-month Oracle SQL and system-tuning assignment. All proprietary and personal information has been removed. The application required precise security, so the extra-cost Oracle Label Security (OLS) package was licensed. The self-tuning was amazing: tuning produced a 33% performance improvement in the OLS software.
Sections with notes and/or questions were resolved as issues were addressed and the document was updated. A document named tuning notes results was also produced and will be linked here shortly.
The Oracle RDBMS was developed on the DEC VAX/VMS hardware/software platform. The database was subsequently ported to Data General AOS and AOS/VS. These platforms ran proprietary operating systems as well as UNIX-derivatives such as Ultrix-32 and System V.
Oracle was written in C and assembly language and is considered to be a UNIX application. It became available running on Windows hosts as a 16-bit version in 1984. The product is available today in both 32-bit and 64-bit versions on all industry platforms: Windows and UNIX/Linux.
The Oracle Optimal Flexible Architecture (OFA) defines the database footprint on the host machine. Considering how each file in the system operates (read-only, read-write, sequential read, write, etc.), an optimal configuration would require over twenty disk controllers and storage devices. Scaling back incrementally, consolidating types of files into groups with compatible operations, a DBA might acquire two two-port controllers and four drives, and configure accordingly.
Under Windows, or any single-disk system, sizing devices on singular controllers, and using multiplexed files (control, redo, other) is largely a visual exercise. Even on UNIX systems the hardware resources are not always available. Individual disk devices (slices or partitions) for /u01, /u02 and /u03 may not be possible or feasible.
Oracle 11g is a very significant database release. The Oracle RDBMS runs on so many platforms that changes must be implemented across many platforms, sometimes written in port-specific languages, and as a result, major changes or platform-unique enhancements are slow to arrive.
The file structures associated with log files have been enhanced with a command-line interface.
The Automatic Diagnostic Repository (ADR) Command Interpreter (ADRCI) is new in 11g. This tool is based on the new software footprint in 11g, and takes advantage of the XML logs.
Oracle’s bullet points on this product are as follows:
- View diagnostic data within the Automatic Diagnostic Repository (ADR).
- View Health Monitor reports.
- Package incident and problem information into a zip file for transmission to Oracle Support.
The Automatic Diagnostic Repository Command Interface (ADRCI) facility allows the capture of associated files from different product homes on the host into an incident that may be recorded with MyOracleSupport (MOS).
This government system is not going to be running Oracle Configuration Manager (OCM) and is not going to be connecting with Oracle support directly; but, the ability to package all messages as an “incident” is available. These files may be provided to MOS manually with an SR (Service Request).
Once properly installed, taking the physical implementation into consideration, in Oracle 11gR2 the act of tuning the database means configuring the system global area (SGA) and the various program global areas (PGAs). Features of the operating system should have been considered for the initial installation.
Once these areas are properly configured, sized and locked into memory (RAM) there is not a whole lot that can be done to further tune the database. The database works as fast as it is allowed under all circumstances.
Oracle was originally delivered with a rule-based optimizer (RBO), which remained the only optimizer through Oracle7. This optimizer required the programmer to know about the database and data, including row counts. In addition to understanding the data model, the join order, and the contents of the tables, the programmer was forced to put SQL hints in the code in order to tell the optimizer how to retrieve data.
Oracle says the RBO no longer exists in 11g. There are plenty of posts on migrating from 9i with RBO to 11g using cost-based optimizer (CBO). This is one of two places where Oracle recommends stored outlines, to maintain a working plan through a code migration. This is done for plan stability for a short time, through the migration, not as a permanent fix to all plans.
In earlier versions, hinted SQL code forced the RBO. Oracle 11g offers only the CBO. Hints are still available and used by Oracle, so they are also acceptable for developers’ use.
The optimizer affects the SQL code, not so much the database configuration.
Because Oracle Database 11g forces the use of the CBO, all tables in the database need to have statistics, including all of the dictionary tables. Statistics are also gathered on the dynamic performance tables (fixed objects) such as V$SQL, due to the new default value for optimizer_mode.
The RBO was the original Oracle optimizer, used exclusively through Oracle7. The RBO does not support any Oracle features introduced since 1994 (bitmap indexes, table partitions, function-based indexes, and others). The RBO was de-supported in Oracle Database 10g, so in Oracle Database 11g we must use the CBO.
The CBO has been progressively improved, and continues to be improved in 11gR2. Improvements are not a subject of this document. The CBO is the default in 11gR2.
The optimizer will no longer use the RBO when a table has no statistics. The CBO will use dynamic sampling to get statistics for these tables.
The Oracle Scheduler manages a job which runs during maintenance windows daily and on weekends.
Manually updating the statistics is not required. Statistics are collected by this scheduled job when a change greater than 20% is calculated. We want the CBO to determine data access and perform accordingly.
If a data load changes the execution plan, do we really need the changed plan to be implemented now? If not, waiting for the system to update the statistics should be satisfactory.
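If a changed plan really is needed right away, statistics can be gathered manually instead of waiting for the maintenance-window job. A minimal sketch; the schema and table names are placeholders:

```sql
-- Refresh statistics for one table immediately after a bulk load,
-- rather than waiting for the scheduled maintenance job.
-- APP_OWNER and ORDERS are placeholder names.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APP_OWNER',
    tabname          => 'ORDERS',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);   -- also refresh dependent index statistics
END;
/
```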
The default value for OPTIMIZER_MODE is ALL_ROWS. The other possible values are FIRST_ROWS_n (n = 1, 10, 100, or 1000) and FIRST_ROWS. The CHOOSE and RULE values are no longer supported.
We manually set the mode to FIRST_ROWS_100.
Both STATSPACK and the Automatic Workload Repository (AWR) require statistics to be present.
Statistics are automatically collected for an index at creation. No compute statistics clause is required as part of the create index statement.
Several new optimizer methods and V$SQL_* views are also available. (See the product documentation for the latest changes.)
The RBO was de-supported in Oracle Database 10g, so in Oracle Database 11g you must use the CBO, per Oracle documentation since 2009.
The following parameters may be adjusted from default, as required:
- hash_area_size (if not using pga_aggregate_target)
- sort_area_size (if not using pga_aggregate_target)
My experience with 10g/11g and these values says “do not bother”. Messing with these settings always seemed to make things worse than finding and correcting the underlying problem in the first place.
Working under the premise that the database is properly configured, sized with the SGA locked in memory, the database will run as fast as possible, and “wait” when told to “wait”. With external factors like disk subsystems and network connections accounted for, there is no further tuning of the database recommended, or available. Any additional tuning will take place in the code by either changing the code, modifying the structure or the access method (indices, perhaps).
The Oracle Wait Interface (OWI) wait state views are the best way to quickly identify our performance bottlenecks. We are interested in those “wait states” and will concentrate our tuning on them.
OWI has existed in the Oracle RDBMS since Oracle7. It was written by Oracle and resides in the database kernel code. The OWI has grown from fewer than 100 wait events in Oracle7 to nearly 1000 in 11g.
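A quick way to see where an instance has been waiting is to query the OWI views directly. A sketch, assuming the 10g-and-later columns (wait_class, time_waited_micro):

```sql
-- Top ten non-idle wait events since instance startup.
SELECT *
FROM  (SELECT event,
              total_waits,
              time_waited_micro / 1e6 AS seconds_waited
       FROM   v$system_event
       WHERE  wait_class <> 'Idle'
       ORDER  BY time_waited_micro DESC)
WHERE  ROWNUM <= 10;
```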
Limitations have been identified in several areas, as follows:
- CPU Statistics
- End-to-End Visibility
- Historical Data
Limitations are becoming fewer with each Oracle version. Licensing remains an issue for customers without Enterprise Edition and/or the various management and performance tuning packs. We do not suffer those particular limitations.
Prior to the OWI we used trace events (event 10046 in 9i and earlier, or dbms_monitor in 10g and later) and “tkprof” to format the trace file contents. There were other tools prior to that… barely notable.
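For reference, a sketch of the 10g-and-later approach; the SID and serial# values are placeholders:

```sql
-- Enable extended SQL trace, including WAIT lines, for one session.
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 456, waits => TRUE, binds => FALSE);
-- ... run the workload of interest ...
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 456);
-- Then format the trace file at the OS prompt, e.g.:
--   tkprof <instance>_ora_<pid>.trc out.txt sort=exeela,fchela
```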
Currently available 3rd-party tools, like the Quest products Toad and SQL Optimizer, still use these methods and tools. They do not consider the OWI information in their analysis or solutions. They are using the old tuning technique of maintaining buffer cache hit ratios.
Tom Kyte, Oracle employee, guru developer (asktom.oracle.com) and author has the following to say about ratios:
"Ratios STINK (I have stronger words but this is a public forum after all).
There is one ratio I use -- soft parse ratio (the ratio of soft to hard parses). It should be near 100 for most systems.
All other ratios -- forget about them. There is no magic "good number". It is like the stupidest ratio of them all - cache hit. I have a theory that systems with high cache hit ratios, over 95, 96% - are among the most poorly tuned systems. They are experiencing excessive LIO's due to massive nested loop joins. Yet, their DBA's sit there and say "well, my cache hit is 99% so all is well in the world".
I have no ratios for you. If you have a query that you need to execute and the best we can do is 1,000,000 LIO's -- then SO BE IT, that is that. However, if that query could be executing without doing 1,000,000 LIO's then we need to fix it."
Oracle DBAs have been using hit ratios for 25 years to tune their databases. Strikes me as silly, which is why we are undertaking this OWI “wait event” approach.
Note: get “oracle’s trace analyzer” from MOS, and compare functionality with tkprof.
Note: confirmed: Oracle Grid Control (the renamed thick-client OEM) may format this information. This tool is still available with a service request. The 12c (cloud) version is available for download from OTN. Reviewing its features did not reveal where tuning was done.
Wait contention may be detected between instance-wide wait events and block-level waits, as identified in AWR and ASH reports (i.e., custom performance reports).
There are 500+ performance metrics captured by the AWR.
“The AWR and ASH are arguably the most exciting performance optimization tools in Oracle’s history. They can provide the foundation for the use of artificial intelligence techniques to be applied to Oracle performance monitoring and optimization. As Oracle evolves, DBAs expect that the AWR and ASH will largely automate the tedious and time-consuming task of Oracle tuning.”
— Burleson, aka, Scary Guy
Do we use Stored Outlines?
Stored outlines are collections of SQL hints associated with SQL statements which force the optimizer to use a particular execution plan. With Stored Outlines there was no easy way to determine when an existing plan was no longer the best plan. This has been replaced by a newer component which does not appear to have the same limitations.
The best execution plan today may not be the best plan tomorrow. These just did not work well for most installations. They were primarily used for DB migrations from the RBO to the CBO environments to provide stable execution until code could be revised… as part of the migration.
“Oracle highly recommends the use of SQL plan baselines instead of the stored outlines.”
When you take over the decisions the optimizer makes, and force a particular execution plan regardless, it isn’t safe. It may conceal a problem, for a while. It may even be an escape at some point; but, eventually, a new plan is going to be required. If you’re forcing it to be a pre-determined, stored plan, it doesn’t ever get that option to tell you that your plan is out of date.
So, stored outlines don’t work. And Oracle recommends against them…
How about using SQL Plan Management (SPM)?
This is a similar facility to Stored Outlines with the issues resolved. In addition to the stored execution plan, a profile is created which tells the optimizer “never take a plan worse than this… but, if a better one exists, go ahead.”
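A sketch of capturing a known-good plan as a baseline; the SQL_ID shown is a placeholder:

```sql
-- Load the plan(s) for one statement from the cursor cache into SPM.
DECLARE
  n PLS_INTEGER;
BEGIN
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => 'abcd1234efgh5');
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || n);
END;
/
-- Captured baselines may be reviewed in DBA_SQL_PLAN_BASELINES.
```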
There are space restrictions on the #1 & #2-Clone VMs.
Instances are not running in archive log mode due to space limitations. Backups may only be run “cold”, and tablespace point-in-time recovery cannot be accomplished without archived redo logs. Archive log mode should be turned on for performance tuning, and the online redo logs resized, if required, to switch at ten- to twenty-minute intervals under load.
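Enabling archive log mode requires a clean restart into MOUNT; a sketch from SQL*Plus:

```sql
-- Enable ARCHIVELOG mode (requires an instance restart).
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- Confirm the new mode and archive destination:
ARCHIVE LOG LIST
```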
Backups should be implemented and scheduled. A test restore will eventually be accomplished, likely with the building (re-building) of a DB clone from one host to another, using RMAN.
The time on the #1 host is set to Standard Time (UTC+0000), consistent with other environments.
The current database time zone setting on this host is “America/Anguilla” which is GMT -4. Information on UTC and San Diego follows:
Standard time zone:        UTC/GMT -8 hours
Daylight saving time:      +1 hour
Current time zone offset:  UTC/GMT -7 hours
Time zone abbreviation:    PDT – Pacific Daylight Time
While this is not a tuning or performance issue, it makes following transactions in trace files and logs very difficult. The databases on #1-Clone and local Windows machine are set to Pacific.
Note: time_drift_detected (set time); see the VKTM trace files (referenced from the alert log). This only appears on #1, not #2-Clone or the local Windows DBs, so it is probably associated with the time zone setting.
Not a big issue; but, not having it consistent makes matching job queues, trace and log files, etc. to the wall-clock time where an action (such as a test session) takes place difficult.
As it turned out, this was a “big issue”, as maintenance was taking place in prime time.
The decision has been made to use “GMT” as the time zone for the server and the database. #1 and #2-Clone were found to be wrong and it did cause problems.
The server OS must be set to use GMT. It was using UTC. Oracle handles daylight saving time when using UTC. Please see Linux documentation for setting OS time zones. When using Virtual Machines (VMs) please be sure that the VM host time zone and time are also set correctly.
From SQL*Plus, please execute the following commands:
select dbms_scheduler.stime from dual ;
> check result; if not "GMT", execute the additional command from SQL*Plus:
alter database set time_zone = 'GMT' ;
> note the response that the command was executed successfully
select dbms_scheduler.stime from dual ;
> note that it is now properly set to GMT.
The list of acceptable time zone abbreviations may be found in the following location:
This list is not needed for this operation; however, you may want to confirm valid values.
Oracle Automatic Database Diagnostic Monitor (ADDM) and Oracle Automatic Workload Repository (AWR) both use the same database snapshots. The interval of collection is the same for both; however, the retention period may be different, as described on this page.
Active session history (ASH) is snapped once each second and the data are held in the System Global Area (SGA) in a circular buffer. Most if not all ASH data are also stored in the AWR.
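A sketch of reading recent wait activity directly from the ASH buffer:

```sql
-- Top wait events over roughly the last 30 minutes, by sample count.
SELECT   event, COUNT(*) AS samples
FROM     v$active_session_history
WHERE    sample_time > SYSDATE - 30/1440
AND      event IS NOT NULL        -- a NULL event means the session was on CPU
GROUP BY event
ORDER BY samples DESC;
```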
The default values for the ADDM and AWR snapshots are 1) an interval of 60 minutes and 2) a retention of 7 days.
ADDM retention is set from Oracle EM (Database Control) on the Advisor Central page by clicking on the “change default parameters” button and following instructions. Do not set this value to less than eight days.
AWR is configured to use an eight day System Moving Window as a baseline. This allows comparing performance between today and a week ago.
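The window size is adjusted from SQL*Plus; a sketch (note the AWR retention must be at least as large as the window):

```sql
-- Set the default moving-window baseline to eight days.
EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_BASELINE_WINDOW_SIZE(window_size => 8);
```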
The AWR interval and retention are set from SQL*Plus with the following command:
execute dbms_workload_repository.modify_snapshot_settings (interval => 60, retention => 11520) ;
Both values are in minutes: a one-hour interval and eight days of retention.
Confirm what you set from SQL*Plus with the following command:
select * from dba_hist_wr_control;
We were using asynchronous I/O in the initial AMM configuration; however, due to performance problems, it was taken out. We elected not to re-introduce this because it depends on how the Oracle binaries are linked in the system.
In order to use async I/O in Oracle, make sure the following parameters are set:
- disk_asynch_io = TRUE
- filesystemio_options = SETALL
These parameters are set with the following commands:
alter system set disk_asynch_io='TRUE' scope=spfile sid='*';
alter system set filesystemio_options='SETALL' scope=spfile sid='*';
Confirm from SQL*Plus, as follows:
SQL> show parameter disk_asynch_io
SQL> show parameter filesystemio_options
The filesystemio_options can have the following values:
asynch: This value enables asynchronous I/O on file system files.
directio: This value enables direct I/O on file system files.
setall: This value enables both asynchronous and direct I/O on file system files.
none: This value disables both asynchronous and direct I/O on file system files.
Note: already set in spfile provided. Syntax provided here for example only.
To verify whether $ORACLE_HOME/bin/oracle was linked with async I/O, use the following Linux commands (example taken from #1):
$ ldd $ORACLE_HOME/bin/oracle | grep libaio
libaio.so.1 => /lib64/libaio.so.1 (0x000000304c800000)
$ nm $ORACLE_HOME/bin/oracle | grep io_getevent
The Oracle Alert Log on the #1 instance advises that Linux large pages are not being used. Linux HugePages (or large pages) are not compatible with AMM and will not be used.
The following warning and recommendation should be ignored:
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 16 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION (by Oracle):
Total Shared Global Region size is 1762 MB. For optimal performance, prior to the next instance restart increase the number of unused Large Pages by at least 881 2048 KB Large Pages (1762 MB) system wide to get 100% of the Shared Global Region allocated with Large pages.
This was a recommendation made by Oracle on #1 while configured to use Automatic Shared Memory Management (ASMM). We are using Automatic Memory Management (AMM), introduced in 11g, in order to benefit from automatic tuning of both SGA and PGA.
A large number of reputable web pages confuse these two acronyms, some seeming to use them interchangeably. They are not interchangeable.
Additionally confusing is the acronym for Automatic Storage Management (ASM).
An interesting aside, AMM is not currently compatible with ASM. Expect that to change in a future release, as these two techniques must be made to work together.
In 11g and beyond, Automatic Memory Management (AMM) is to be used to dynamically control the SGA & PGA operation. Linux HugePages are not compatible with AMM and will not be used.
This determination was based on the two RAM configurations of 8 GB and 12 GB both being very small and not requiring extended amounts of memory. A larger-RAM system may benefit from ASMM and HugePages; but, manual tuning of the PGA does not scale. It needs to be automated and automatic.
The AMM memory_max_target parameter specifies the maximum size that memory_target may take.
Oracle’s recommendation on memory_max_target:
“For the MEMORY_MAX_TARGET initialization parameter, decide on a maximum amount of memory that you would want to allocate to the database for the foreseeable future.
That is, determine the maximum value for the sum of the SGA and instance PGA sizes.”
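A sketch of moving to AMM; the sizes shown are placeholders, and a restart is required for memory_max_target to take effect:

```sql
ALTER SYSTEM SET memory_max_target = 4G SCOPE = SPFILE;
ALTER SYSTEM SET memory_target     = 3G SCOPE = SPFILE;
-- Zero the per-area targets so Oracle manages SGA and PGA together;
-- non-zero values would act as minimums for the respective areas.
ALTER SYSTEM SET sga_target           = 0 SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 0 SCOPE = SPFILE;
-- Restart the instance, then confirm:
--   SHOW PARAMETER memory_target
```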
Note: The traditional buffer pool settings are not ignored if set to non-zero numbers. Instead, these values are used as minimums for the respective pool. The SGA will always maintain that minimum size, if set.
Note: 11g new parameter RESULT_CACHE_MAX_SIZE
“The result cache is comprised of the SQL query result cache and PL/SQL function result cache, which share the same memory structures. Results of queries and query fragments can now be cached in memory in the SQL query result cache. The database can then use those cached results to answer future executions of those queries and query fragments. Similarly, the PL/SQL Function Result can also be cached.”
To use it globally, set RESULT_CACHE_MODE=FORCE (the default is MANUAL).
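With the default MANUAL mode, individual statements opt in via a hint; a sketch (the table and column names are placeholders):

```sql
SELECT /*+ RESULT_CACHE */ department_id, COUNT(*)
FROM   employees
GROUP  BY department_id;
-- Cache effectiveness may be checked in V$RESULT_CACHE_STATISTICS.
```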
Excellent URL on AMM:
Linux Huge Pages are not being used at this time on <customer instance>.
Starting with the first patch set for Oracle Database 11g Release 2 (11.2.0.2), Oracle Database patch sets are full installations of the Oracle Database software.
Oracle recommends that you perform an out-of-place patch set upgrade. It requires much less downtime and is safer as it does not require patching an ORACLE_HOME that is already being used in production. For an out-of-place patch set upgrade, install the patch set into a new, separate Oracle home location. After you install the patch upgrade, you then migrate the Oracle Database from the older Oracle home. The patch set upgrade is now the same process to upgrade from one version to another.
With the specification of ORACLE_BASE (C:\oracle\product\11.2.0) and ORACLE_HOME (C:\oracle\product\11.2.0\OraDB1), multiple 11g libraries are not easily installed within the same Oracle base using a different Oracle home.
Due to an installation error (PRCT-1011 : Failed to run “getcrshome”) on both Windows and Linux, there is no convenient way to customize this structure. My recommendation would be to create the Oracle base as “\oracle\product”, and the Oracle home as “\oracle\product\11.2.0.2\<name>”.
There are two customizations in that string, reiterated as follows:
- full product version
- custom name (i.e., not Oracle default) for home name, in our case, “<removed>”
Because of the getcrshome failure, the custom name is not possible. The custom path can be done on Windows; not yet tried on Linux.
The SYSMAN encryption key must be backed up.
On Windows, this key is located in the following folder:
On Linux, it is in the following location and file:
The emkey is an encryption key that is used to encrypt and decrypt sensitive data in Enterprise Manager such as host passwords, database passwords and others. By default, the emkey is stored in the ORACLE_HOME/sysman/config/emkey.ora file. The location of this file can be changed.
If the emkey.ora file is lost or corrupted, all the encrypted data in the Management Repository becomes unusable. Maintain a backup copy of this file on another system.
During startup, the Oracle Management Service checks the status of the emkey. If the emkey has been properly configured, it uses it for encrypting and decrypting data.
Scheduled jobs for the following:
AutoTask enables Oracle Database to automatically schedule Automatic Maintenance Tasks to run during the established maintenance windows.
The following tasks may be automatically scheduled by AutoTask:
- Optimizer statistics gathering
- SQL Tuning Advisor
- Automatic Segment Advisor
- space advisor (may be same as Segment Advisor)
- memory tuning
- SGA tuning
- SQL Execution Memory
- UNDO MGMT (may be same as auto tune of undo retention)
SQL Plan Management (SPM)
- We are seeking methods for correcting poorly operating SQL, if not handled by concentrating on wait states this will be the facility that does it.
Automatic SQL Tuning Advisor – enabled by default
- runs in the default maintenance windows and may consume resources during them, because it actually executes the query being tuned
- automatically runs on selected high-load SQL statements from the Automatic Workload Repository (AWR) that qualify as tuning candidates
other tools, as required:
- SQL Profiles
- Remote Diagnostic Agent (RDA) – not installed
3rd-party tools used, or that we attempted to use, follow:
- Toad for Oracle 11.5 – useful; but, lacking
- SQL Tracker – extremely useful
- Quest SQL Optimizer – has potential; but, does not use OWI “waits”
- Spotlight on Oracle – pretty, marginally useful
- Benchmark Factory (BMF) – issues
The Oracle error (ORA-39082) on importing the 11.2.0.x Linux schemas into Windows 11.2.0.x is not a significant bottleneck. Since performance tuning will be done on the #1 instances, the error in Windows may be overlooked.
The Quest tools are not using the “wait” information available in the Oracle database. They require trace level 4 to be set, and WAIT information is only provided at level 8 or above. While they will accept the higher-level traces, they do nothing with the information, which implies they are not using the latest available techniques to tune. “Hit ratios” are dead. WAITs are the only way to go.
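For completeness, those trace levels map to event 10046; a sketch of setting the wait-inclusive level for the current session:

```sql
-- Level 8 adds WAIT lines to the trace; level 4 adds only binds;
-- level 12 captures both.
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';
-- ... run the statements to be analyzed ...
ALTER SESSION SET EVENTS '10046 trace name context off';
```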
In order to execute BMF as originally envisioned, the following two tasks need to be completed:
- post-logon trigger to assign trace levels
  - assuming connections in the JBoss pool stay connected in the pool, even while not in use
- load trace files into BMF
  - currently requires manual transfer
  - if successful, requires a SAMBA share from Linux
We are using the JBoss (5.1.0) application server which acts as a transaction manager allowing inserts, updates and deletes in the Oracle database. Each transaction must be committed or rolled back across all resources.
We also use the iBATIS data mapper framework which makes it easier to use a relational database with an object-oriented application. In iBATIS, one session (EJB transaction) may have only one database connection. Select statements against a spatial index succeed; however, inserts, updates and deletes fail with the following Oracle error stack:
ORA-29875: failed in the execution of the ODCIINDEXINSERT routine
ORA-29400: data cartridge error
ORA-14450: attempt to access a transactional temp table already in use
ORA-06512: at “MDSYS.SDO_INDEX_METHOD_10I”, line 623 (not confirmed)
ORA-06512: at “MDSYS.SDO_INDEX_METHOD_10I”, line 227 (not confirmed)
This problem manifests itself when the code is executing in an XA transaction running under JBoss.
Code executed in stand-alone context, with or without XA, works: index updated. (answer?)
When spatial data are changed before an exception is raised in a distributed transaction the entire transaction is aborted; however, the temporary tables used by spatial do not get rolled back properly, leaving the DB in an inconsistent state.
The problem appears to be in Oracle Spatial. Oracle does not support using temporary tables within XA (distributed) transactions, and Oracle Spatial uses temporary tables.
This was identified as a breaking change between Oracle 9i and Oracle 10g.
Searches for these three error messages offer limited suggestions, as follows:
- drop the index, insert the data, create the index
- do not use hybrid spatial indexes (Use quad-tree index instead)
- do not use an r-tree index on data with more than two dimensions (Oracle 8.1.6)
- use two connections (one non-XA, one XA)
Further research points out the following:
- A non-XA driver/resource can participate in a distributed transaction if it is the last resource to commit
Other offered solutions:
- Check to see if the routine has been coded correctly.
- emulate 2PC with non XA drivers (example in WebLogic)
- The Oracle thin driver does not support 2PC protocol when two resources are involved
- ‘Required’ tx attribute forces operations involving connections from two different sources to share the same tx context which is possible only by using 2PC
- Check Oracle thin XA driver
The following information was forwarded to me in notes – and has not been acted on… DBR.
XA data source should be never used with ADF/Fusion applications because it closes all cursors upon commit. http://andrejusb.blogspot.com/2012/05/dont-use-oracles-driver-thin-xa-to.html
That was the significance of the above link; it is out of context here, as it applies to WebLogic.
non-XA JDBC drivers participate in XA transactions anyway by configuring a NonXaDataSource
(2) The app was using what I call an “emulated XA resource” for JDBC to get a non-XA JDBC driver to participate in a global tx. Unlike JDBC XA drivers, this is not completely transactionally safe in the event of a crash. (The “emulated XA resource” flag is actually called “enabletwophasecommit” or something like that in 8.1, the doc details its caveats). Side note: WL 9.0 has a JDBC “LLR” capability which is transactionally safe, and can be significantly faster than XA – it is limited to one “LLR” connection per transaction.
Wait Events – TX, Transaction
<link broken and removed 6/2020>
Ignore the bit about bypassing XA transactions.
One of our guys more familiar with JBoss, etc. has switched over to TX transactions and the problem disappears. Unfortunately, this doesn’t really help us although it does point the finger at the XA transaction.
ACID Atomicity, Consistency, Isolation, and Durability (database transaction properties)
JBoss 5 does not support multiple local-tx-datasources in a WAR, because the old method was broken and did not guarantee ACID. In the past this configuration was possible but produced errors, so it is now blocked entirely.
Ok, so the solution pointed to is to use XA. Luckily the underlying database is Oracle 10g, so we are able to switch to the Oracle XA drivers. I have looked up how to properly set up XA data sources with Oracle; unfortunately, the documentation and examples include some hints which appear to be old or no longer needed.
******************** End of Document ******************