Check out the $INFORMIXDIR/etc/*cr files, this will list the various products and versions. The $INFORMIXDIR/etc/.snfile will also give you the installation order
command: onstat -u INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 00:18:03 -- 147744 KbytesIf the words 'On-Line' do not appear in the returned line then there could be a problem. Check the log directly if the engine isn't running, otherwise use onstat -m
If the engine remains in 'Fast Recovery' for a period and there is no disk activity onstat -D restart the instance. The instance should go Online.
command: onstat -l or onstat -m
One of the possible causes is continuous logging has stopped or failed.
Command: ps -ef | egrep ontape informix 795 1 0 09:32:31 0 0:50 /infx/bin/ontape -c informix 12899 6863 1 12:59:34 2 0:00 egrep ontape
This indicates that continuous logging is running on one instance but not necessarily on the correct instance. Once the ontape has been running for a reasonable length of time it should be accumulating CPU time. If it isn't then the program might be waiting for user intervention.
If you are running in multi-instance environment then you need to check you are looking at the correct instance
Command: onstat -u
This identifies the ‘monitor’ threads running in the instance, onstat -u
INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 14:54:57 -- 195680 Kbytes Userthreads address flags sessid user tty wait tout locks nreads nwrites 40024010 ---P--D 0 informix - 0 0 0 189 570 40024444 ---P--F 0 informix - 0 0 0 0 0 40024878 ---P--F 0 informix - 0 0 0 0 0 40024cac ---P--F 0 informix - 0 0 0 0 0 4002d760 Y--P--- 2405 eric - 60d55a98 0 1 178 244 4002db94 ---P--M 2313 admin - 0 0 0 0 0 <Monitor> 4002dfc8 Y--P--- 2474 eric - 4021fd98 0 3 9 3 . . . 400348dc Y--P--- 2466 eric - 404bb4dc 0 3 23 46 40035144 Y--P--- 2476 eric - 404a38ac 0 1 18 15 51 active, 128 total, 71 maximum concurrent
If a monitor thread is found, indicated by the M in the flags column, then check to see if it is the correct thread.
Command: onstat -g ses <sessid>
INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 14:56:54 -- 195680 Kbytes session RSAM total used id user tty pid hostname threads memory memory 2313 admin - 9795 eric 1 4251648 4217944 tid name rstcb flags curstk status 2532 ontape 4002db94 ---P--M 1660 sleeping(secs: 4) ....
An ontape thread is running within this instance. If the system is not being archived then some form of logging is in place. A netnorm condition means the program is waiting user intervention.
Author: Carsten Haese
create sequence tmpseq; select tmpseq.nextval, sometable.id from sometable order by sometable.id; drop sequence tmpseq;
command: onstat -u
onstat -l. Look for a user that has significantly more read and/or writes than anybody else.
Userthreads address flags sessid user tty wait tout locks nreads nwrites 40024010 ---P--D 0 informix - 0 0 0 568 116 40024444 ---P--F 0 informix - 0 0 0 0 0 40024878 ---P--F 0 informix - 0 0 0 0 0 40024cac ---P--F 0 informix - 0 0 0 0 0 400250e0 ---P--F 0 informix - 0 0 0 0 0 40025514 ---P--F 0 informix - 0 0 0 0 0 40025948 ---P--F 0 informix - 0 0 0 0 0 40025d7c ---P--F 0 informix - 0 0 0 0 0 400261b0 ---P--F 0 informix - 0 0 0 0 0 400265e4 ---P--B 9 informix - 0 0 0 0 1 40026a18 ---P--D 0 informix - 0 0 0 0 0 400276b4 Y--P--- 627 eric - 402597e0 0 3 0 3 40027ae8 Y--P--- 649 eric 2 404ac790 0 1 2028 578 40027280 Y--P--- 668 informix 9 4049dcc4 0 1 0 14 400276b4 Y--P--- 625 eric - 402597e0 0 3 0 3 40027ae8 Y--P--- 649 informix 2 404ac790 0 1 2 0 16 active, 128 total, 24 maximum concurrent
This can have a number of causes.
Command: ps -ef USER PID PPID C STIME TTY TIME CMD root 1 0 2 09 May - 39:39 /etc/init root 1699 1 0 09 May - 0:00 /etc/uprintfd root 2852 1 0 09 May - 14:49 /etc/syncd 60 root 3431 3933 0 09 May - 0:04 /etc/syslogd .. .. informix 16193 24690 0 12:36:35 pts/9 0:00 dbaccess db1 informix 16757 1 4 01 Jun - 10:04 oninit -s root 16965 5750 0 10:30:00 - 0:00 telnetd informix 17275 16757 0 01 Jun - 1:04 oninit -s root 18102 5750 0 17:26:00 - 0:00 telnetd informix 18808 16757 0 01 Jun - 0:22 oninit -s informix 19587 13186 0 12:16:22 pts/2 0:00 -ksh informix 20092 15739 0 10:57:06 pts/5 0:00 -ksh informix 20417 19136 0 17:26:31 pts/3 0:00 -ksh informix 20598 16757 0 01 Jun - 0:25 oninit -s informix 20851 25015 5 12:44:05 pts/1 0:00 ps -ef informix 21062 16965 0 10:30:00 pts/7 0:00 -ksh informix 21623 16757 0 01 Jun - 0:20 oninit -s informix 24462 1 0 12:16:34 pts/2 12:23 dbaccess eric <Lost Process> informix 24690 13425 0 11:52:33 pts/9 0:00 -ksh informix 25015 18102 0 17:26:00 pts/1 0:00 -ksh
The most likely candidates are programs that have a PID of 1. The caveat are jobs being run by cron, and nohupped jobs from previous sessions. If the ‘possible’ runaway is attached to a terminal then it is more likely to be the trouble maker.
For pre Version 6 engines each individual user process spawns a process (sqlexec or sqlturbo). The same rules apply.
Command: onstat -g sql <session id>onstat -sql.
INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 02:53:23 -- 43040 Kbytes Sess SQL Current Iso Lock SQL ISAM F.E. Id Stmt type Database Lvl Mode ERR ERR Vers 32303 - eric DR Wait 5 0 0 5.01 Current SQL statement : SELECT col1, col2 FROM eric_tab WHERE col2 = 9 AND col1 IN ("AE","AI","AL","ASH","AA","AR","AO","DA","DU","DF","OJW","PE", "PO","PI","PU","PNR","WH")
Copy the current SQL statement into dbaccess and run it with set explain on;
Don't run onstat -g sql on a very busy server it will never return and you will not be able to kill the session it starts. Because of the very dynamic nature of this particular table, and the fact that rows are constantly being dropped off the beginning of it and appended to the end, your query will chase its tail until the server is shutdown.
Author: Jonathan Leffler
If the performance nose-dives at checkpoint time, seriously consider reducing LRU_MIN_DIRTY and LRU_MAX_DIRTY. Don't try to go below LRU_MIN_DIRTY 1, LRU_MAX_DIRTY 2. The default values (of 30/20 in 4.00 and 60/50 are, in my not so very humble opinion, rubbish for any system with more than about 1 MB of shared memory. Most databases are read a lot more than they are modified.
Author: Art Kagel
I agree that additional cleaners are of limited use during normal LRU writes but they are needed for fastest checkpointing. Since at checkpoint time each chunk is assigned to another cleaner thread until the threads are exhausted, and since that one thread, as has already been pointed out, is only scheduling the actual I/Os with either the AIO VPs or KAIO VPs, that thread will block on each I/O that it schedules and single thread writes to your disks. You must have multiple cleaners, even with a single CPU VP, since the other cleaners can wake and schedule the I/Os that they are responsible for while the first cleaner is blocked waiting for I/O service…My point is that the single cleaner thread has to wait for the issued I/O to complete. It does relinquish the CPU VP to other threads so that other work is not blocked
So I recommend 1.5 * min(#chunks,#LRUs) as the number of cleaners keep the 2-2.5 * NUMCPUs idea for multiple CPUs in mind and constrained by a maximum useful value of about 64 for multiple CPU systems.
Within SQL there is no random number routines.
Author: Jonathan Leffler
@(#)$Id: random.spl,v 1.2 1997/12/08 19:31:44 johnl Exp $ Simple emulation of SRAND and RAND in SPL Using random number generator suggested by C standard (ISO 9899:1990) CREATE PROCEDURE sp_setseed(n INTEGER) DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1; LET seed = n; END PROCEDURE; CREATE PROCEDURE sp_random() RETURNING INTEGER; DEFINE GLOBAL seed DECIMAL(10) DEFAULT 1; DEFINE d DECIMAL(20,0); LET d = (seed * 1103515245) + 12345; -- MOD function does not handle 20-digit values LET seed = d - 4294967296 * TRUNC(d / 4294967296); RETURN MOD(TRUNC(seed / 65536), 32768); END PROCEDURE;
The following Random Number generator will generate a C based UDR
Command: onstat -k
Locks address wtlist owner lklist type tblsnum rowid key#/bsiz 30258128 0 400276b4 0 HDR+X 100002 208 0 302583e8 0 40027280 0 HDR+S 100002 20d 0 30258414 0 40028350 30258548 HDR+X a0004f 4e50f 0 3025851c 0 40028350 0 S 100002 20d 0 30258548 0 40028350 3025851c HDR+IX a0004f 0 0 5 active, 50000 total, 32768 hash buckets
The owner of a lock can be traced to a particular user using the onstat -u command and looking for the owner address detailed in the onstat -k. This gives the username. The following SQL submitted against the sysmaster database will provide summary information.
select dbsname database, username, hostname, tabname, type from syslocks, syssessions where sid = $owner and syslock.sid = sysessions.sid
Command: onstat -k
Locks address wtlist owner lklist type tblsnum rowid key#/bsiz 30258128 0 400276b4 0 HDR+X 100002 208 0 302583e8 0 40027280 0 HDR+S 100002 20d 0 30258414 0 40028350 30258548 HDR+X a0004f 4e50f 0 3025851c 0 40028350 0 S 100002 20d 0 30258548 0 40028350 3025851c HDR+IX a0004f 0 0 5 active, 50000 total, 32768 hash buckets
An individual table tblsnum can be converted into a database tablename using the following SQL.
select tabname from systables where hex(partnum) = $tblsnum;
Command: onstat -g seg
INFORMIX-OnLine Version 10.10.UD2 -- On-Line -- Up 01:48:44 -- 147744 Kbytes Segment Summary: (resident segments are not locked) id key addr size ovhd class blk blk used free 40965 1384007681 30000000 69369856 1860 R 8464 4 40966 1384007682 40000000 81920000 1844 V 723 9277 4106 1384007683 50000000 1105920 612 M 132 3 4107 1384007684 60000000 1105920 612 M 20 115 4108 1384007684 70000000 1105920 612 M 0 135
When Informix requests and is allocated memory it assumes that if the memory was needed once it will be needed again and does not release it back to the operating system. However, the instance can be forced to release memory to the operating system if it is no longer in use.
Command: onmode -F
INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 01:52:38 -- 147744 Kbytes Segment Summary: (resident segments are not locked) id key addr size ovhd class blk blk used free 40965 1384007681 30000000 69369856 1860 R 8464 4 40966 1384007682 40000000 81920000 1844 V 723 9277 4106 1384007683 50000000 1105920 612 M 132 3 4107 1384007684 60000000 1105920 612 M 20 115
If a segment can not be released to the operating system then the instance will attempt to release as much memory as possible from within the block.
Large ready queues and the addition of new segments might mean someone is running a large query with a high PDQPRIORITY.
Other considerations include
sar -g is a good indicator of whether more memory may be needed. Use the ps -elf command to show the number of cycles used by the page daemon. A high number of cycles, combined with high values for pgfree/s and pgscan/s indicates a memory shortage.
Alternatively, use vmstat. Ignore the page in and out columns [pi/po] and instead look at the scan rate [sr] and the anticipated short-term memory shortfall [de] columns. The scan rate should be less than 200. The de column is slightly odd in the fact that it can peak very high, due to the pre-emptive paging algorithms in the OS, but should fall back very quickly. On SVR4 you have serious memory problems if you are swapping
Do not confuse swapping with paging, paging is how the memory management sub-system looks after itself, swapping is a cry for help from an under-resourced machine. The only time swap in required is at the start of a process when the kernel determines whether the swappable image of the process can fit into the available swap space. It doesn't use the swap it just checks to see if it's available.
Any driver that uses Kernel Memory Allocator (KMA) resources, but does not specifically release them before exiting can create a memory leak. A memory leak causes the amount of memory allocated by KMA to increase over time. Thus, if the alloc field increases steadily over time, they may be a memory leak. Another indication of a memory leak is failed requests. If this occurs, then it is likely that a memory leak has caused KMA to be unable to reserve and allocate memory.
Detecting memory leaks on a live system can be very difficult. However, if luxury of a test system is available then it is relatively straightforward to monitor the engine's memory usage and determine whether it is slowly increasing over time.
Unless the machine has a very large amount of memory or the query is very simple the engine will nearly always have to go to disk to get the data/index pages. [Remember later engines can keep some user-definable tables in memory]. Therefore, the onstat -u for the running sessions should show disk reads or writes. Once the data is in memory it is processed from within the buffers and buffer activity for the session needs to be checked. For this, the thread id for the session is required.
Command: onstat -g tpf <thread id>
INFORMIX-OnLine Version 10.00.UD2 -- On-Line -- Up 00:39:56 -- 147744 Kbytes Thread profiles tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 0 0 0 0 0 582 0 0 0 0 0 0 0 5473 529 14088 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6521 0 0 0 49 0 0 0 0 14 0 0 450 233 0 172 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 <Active thread> 1348 2627 0 0 0 3 16259 0 1 0 1 0 0 50925 41 0 16 4
If the bfr column is not increasing then the thread is probably not doing any work i.e. it's stopped. On most UNIX based systems this number can easily increase at over 5000 per second on multi-table queries. It obviously varies between machines but it is a good rule of thumb. Remember a rapid rate of climb only indicates the machine is working hard and not the SQL is efficient.
While the session controls the user requests the actual work is done by a sqlexec thread (typically).
Command: onstat -g ses <ses_id>
INFORMIX-OnLine Version 7.14.UD2 -- On-Line -- Up 00:03:28 -- 147744 Kbytes session #RSAM total used id user tty pid hostname threads memory memory 1294 informix - -1 ERIC 1 57344 33860 tid name rstcb flags curstk status 1322 sqlexec 400276b4 Y--P--- 1292 running Memory pools count 1 name class addr totalsize freesize #allocfrag #freefrag 1294 V 403ac010 57344 23484 146 11 name free used name free used overhead 0 96 scb 0 80 opentable 0 3428 filetable 0 636 log 0 4168 temprec 0 3132 ralloc 23328 0 gentcb 0 192 ostcb 0 2260 net 0 7636 sqscb 0 9772 rdahead 0 172 hashfiletab 0 276 osenv 0 396 sqtcb 156 1464 fragman 0 152 Sess SQL Current Iso Lock SQL ISAM F.E. Id Stmt Type Database Lvl Mode ERR ERR Vers 1294 - eric CR Not Wait 0 0 5.01 Last parsed SQL statement : SELECT AQ.part8, AQ.part7, AQ.part6, AQ.part5, AQ.part4, AQ.part3, AQ.part2, AQ.part1, AP.constrname, AP.tabid, AP.constrtype FROM 'informix'.sysconstraints AP, informix.sysindexes AQ WHERE AQ.idxname = AP.idxname AND AP.tabid = 183 AND AP.constrtype = 'P'
systabinfo.serialv