Golden gate interview questions

 Golden gate:

===========
GoldenGate Software was founded in 1995 and named after the famous Golden Gate Bridge.
GoldenGate was founded by Eric Fish and Todd Davidson.

GoldenGate is a data replication tool, also called a change data capture (CDC) tool.

Golden gate features:
====================

Real-time data replication is one of the basic features of GoldenGate.
It provides high availability by acting as a live standby for a primary database.
It provides zero-downtime upgrades and migrations.
It provides live reporting for operational databases.

From GoldenGate 12.3 onwards, a microservices architecture has been introduced.


Golden gate Processes:
=====================
Manager process
Extract Process(capture process)
Replicat Process (apply process)
Trail files
Data Pump
Server Collector


Extract Process:
===============
In classic mode it reads the data from the online redo logs of the database and typically does not require a connection to the database.
The extract process regularly checkpoints its read and write positions, typically to a file.

Trail  Files:-
============

A GoldenGate trail file contains the data capture information of transactions that occurred at the source database system. This information is stored in a binary format within the trail file and includes details such as transaction type, table and column names, and the data values before and after the transaction.

DataPump:
========
This is an additional process alongside the primary extract process; it takes responsibility for sending the local trail files to the target server over TCP/IP.


Server Collector:
===============
The Server Collector process runs on the target server; it receives the data stream sent by the extract/data pump process and writes it to the remote trail file.

It is a background process that is especially useful when multiple extract processes from multiple sources (for example, RAC) send data, as it consolidates the data changes into a single stream and writes them to a remote trail file. The Replicat process can then read the data changes from this remote trail file and apply them to the target database.

Replicat Process:
================
This process reads the remote trail file and applies the changes to the target database in the form of DML and DDL statements.

Replicat uses two methods to apply the change data from the remote trail:
online mode and batch mode.
In online mode the Replicat process applies each change as an individual SQL statement against the target database as soon as it is read from the trail.
In batch mode (BATCHSQL), similar operations are grouped together and applied as array operations before being committed, which reduces per-statement overhead.

The second method is useful for loading OLTP data into a DWH database.

Batch processing can be a better option for scenarios where real-time replication is not required and the volume of data changes is high, or when the data changes should be applied during off-peak hours to minimize the impact on the target database.

Classic Replicat:
In Classic Replicat, each change record read from the remote trail is converted into an equivalent SQL statement, which the Replicat process executes against the target database through a normal database session, just like any other client.

When the statement is executed on the target database, the modified blocks are changed in the buffer cache and the change vectors are recorded in the redo log buffer; LGWR writes the redo log buffer to the online redo logs, and DBWR later writes the modified blocks from the buffer cache to the datafiles.


Integrated Replicat:
In Integrated Replicat, the Replicat process does not execute the SQL statements itself; it constructs logical change records (LCRs) and passes them to an inbound server (apply process) inside the target database, which applies the changes. Because the apply work happens inside the database and can be parallelized automatically, this can improve performance and reduce overhead compared to Classic Replicat.
Coordinated Replicat:


Coordinated Replicat is a single Replicat group that internally runs multiple apply threads. A coordinator thread reads the trail and distributes the workload across the apply threads based on the THREAD/THREADRANGE options in the MAP statements, and it coordinates (barriers) any transaction that spans threads so that the changes are committed in the correct order.

In general, coordinated Replicat keeps the target consistent even though changes are applied by several threads in parallel: all changes are applied in a consistent order, even in the case of failures or restarts.

REPLICAT rep1
USERID gguser, PASSWORD ggpassword
MAP source_schema.*, TARGET target_schema.*, THREADRANGE(1-5);
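The coordinated behaviour itself is enabled when the Replicat group is added in GGSCI. A minimal sketch, assuming the group name rep1 and a remote trail at ./dirdat/rt (both placeholders):

GGSCI> ADD REPLICAT rep1, COORDINATED MAXTHREADS 5, EXTTRAIL ./dirdat/rt

MAXTHREADS here is just an illustrative upper limit on the number of apply threads the coordinator may use.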


Batch processing parameters definitions:
++++++++++++++++++++++++++++++++++++++++

REPLICAT <replicat_name>
TARGETDB <target_db>
ASSUMETARGETDEFS
-- batch processing parameters:
-- BATCHSQL BATCHESPERQUEUE <n>, OPSPERBATCH <n>, BYTESPERQUEUE <bytes>
-- MAXTRANSOPS <number_of_operations_per_commit>
MAP <source_schema>.<source_table>, TARGET <target_schema>.<target_table>;


Manager Process:
================
The Manager process runs on both the source and target servers. It controls starting and stopping of the other processes, monitors them, and generates events and reports.


Extract: also called the capture process, it is used to capture the data from the source database. It is of 2 types, classic and integrated; integrated extract is supported from Oracle Database 11.2.0.3/11.2.0.4 onwards.

Replicat: also called the apply process, it is used to apply the data from the remote trail files to the target database. It is of 2 types, classic and integrated; integrated Replicat was introduced in GG 12.1 and supports 12c databases.

Coordinated Replicat has been introduced from 12.2 onwards.
Parallel Replicat has been introduced from 12.3 onwards.
Parallel non-integrated Replicat can be configured for DB 12.1.
Parallel integrated Replicat requires DB 12.2 and GG 12.3.



what is the classic capture or extract method?

In classic capture mode, the Extract process reads the online redo logs (or archive logs) directly from the file system, filters out the changes for the tables being replicated, and writes them to the local trail. It does not use the database log mining server and is the original capture method.

what is integrated extract?

Integrated extract works together with the log mining server of the source database: it connects to the database, and the log mining server mines the redo logs and passes only committed changes to the Extract process as logical change records (LCRs).

It is about 5 times faster than the classic method, because in the classic method Extract itself has to read the redo logs, filter the data, and generate the local trails after identifying the committed data.

note: integrated extract is available only when the source is an Oracle database.

How do you install GG?

It all starts with verification of the source (version, type and platform) and the target (version, type, platform).
Next, decide the space for trail files by analysing the archive generation of the source DB for a week, e.g. total archive generation of the week plus 20%.
At the source set enable_goldengate_replication=true.
Enable force logging at the source.
Enable supplemental logging.
Create the GG user (C##user) in the CDB.
Create a separate tablespace for the GG user.
Grant DBA to the GG user.
exec dbms_goldengate_auth.grant_admin_privilege for the GG user.
Install the GG software.
Log in to GGSCI and create the directories.
Create the credential store.
Create the manager process.
Create the extract process.
Create the pump process.
Create the extract params.
Create the pump params.
Start the extract, followed by the pump.
Go to the target side.
Perform the GG installation, create directories, create the credential store, manager and replicat process (see the command sketch below).
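A minimal source-side command sketch for the steps above; the user name, password, tablespace and file locations are assumptions, not fixed values:

SQL> alter system set enable_goldengate_replication=true scope=both;
SQL> alter database force logging;
SQL> alter database add supplemental log data;
SQL> create tablespace ggtbs datafile '/u01/app/oracle/oradata/ggtbs01.dbf' size 2g;
SQL> create user c##gguser identified by "ggpwd" default tablespace ggtbs container=all;
SQL> grant dba to c##gguser container=all;
SQL> exec dbms_goldengate_auth.grant_admin_privilege('C##GGUSER', container=>'ALL');

GGSCI> create subdirs
GGSCI> add credentialstore
GGSCI> alter credentialstore add user c##gguser@ggsrc, password ggpwd, alias ggsrcuser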

what is supplemental logging?

It is a feature of the Oracle database which adds additional logging for a table; this is helpful for replication and recovery. For GG it is a mandatory step because GG relies on this redo information for replication.

SL additionally logs key information (primary key, unique key, foreign key, composite key) and column changes for LOB columns, which helps Replicat locate the correct rows on the target and reduces the replication latency.

We can define SL at the DB level using the command below:
SQL> alter database add supplemental log data (all) columns;

SL is mainly needed for DML UPDATE/DELETE operations; an INSERT already contains all the column values in the redo, so no supplemental logging is needed for it.

How to enable SL at table level?

we have 2 ways 

sql> e.g.:
ALTER TABLE customers ADD SUPPLEMENTAL LOG GROUP customers_lg (customer_id, email) ALWAYS;
ALTER TABLE employees ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;


another way
ggsci> add trandata <pdbname>.<schemaname>.<tablename>
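Supplemental logging can also be enabled for all tables of a schema at once; the PDB and schema names below are placeholders:
ggsci> add schematrandata <pdbname>.<schemaname>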


How to check the objects enabled for SL?
Query DBA_LOG_GROUPS (and DBA_LOG_GROUP_COLUMNS): look at OWNER, TABLE_NAME and LOG_GROUP_TYPE to see whether PK, UK, FK or ALL column logging is enabled for a table.

what is the command to use min supplemental logging?
ALTER TABLE employees ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

Here we are enabling logging only to primary key of that table 

what is the command to use full supplemental logging?
ALTER TABLE employees ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

What is a credential store?
The credential store in GoldenGate avoids exposing database credentials on the GGSCI command line or in parameter files; the credentials are stored encrypted and referenced through an alias.

How do you create a credential store?
ggsci> add credentialstore
ggsci> alter credentialstore add user c##gguser@ggsrc, password <password>, alias ggsrcuser
ggsci> info credentialstore
ggsci> dblogin useralias ggsrcuser

This creates a directory called dircrd and the credential store files inside it.



How to create an extract process?

ggsci> add extract extinit46, integrated tranlog, begin now
GGSCI> REGISTER EXTRACT extinit46, DATABASE CONTAINER (pdb1)
The trail files will be created under dirdat.


How to add a trail file?

GGSCI> ADD EXTTRAIL /u01/app/oracle/gg/dirdat/it, EXTRACT extinit46
The trail files are named with the two-character prefix followed by a sequence number: it000000000, it000000001, it000000002, ...

How to add a data pump process?
We have two ways to add a data pump process.
1st way:
======
You can create an extract process specifying the parameter EXTTRAILSOURCE.
This parameter makes the pump read the data from the trail files generated by the primary extract process.
GGSCI> ADD EXTRACT dpinit46, EXTTRAILSOURCE /u01/app/oracle/gg/dirdat/it

2nd way:

GGSCI> ADD EXTDATAPUMP dpinit46, SOURCEDEFS /u01/app/oracle/gg/dirdef/it.def, RMTTRAIL /u01/app/oracle/gg/dirdat/rt

This method bypasses LogMiner as it reads directly from the transaction logs, but for CREATE and INSERT operations it still needs LogMiner.
It is always a good practice to go with method 1.
What is the use of PASSTHRU?
PASSTHRU is an option provided in the data pump parameter file. With PASSTHRU the pump does not log in to the database or look up table definitions, and it performs no filtering, mapping or transformation; it simply passes the trail records through to the target unchanged, which reduces processing overhead and the latency between source and target. The trade-off is that the pump cannot filter or transform the data in this mode.
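A minimal pump parameter file sketch using PASSTHRU; the host name, port and trail locations are assumptions for illustration:

EXTRACT dpinit46
PASSTHRU
RMTHOST targethost, MGRPORT 7809
RMTTRAIL /u01/app/oracle/gg/dirdat/rt
TABLE pdb1.ggsrc.*;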


How to create a remote trail?
ggsci> add rmttrail /u01/app/oracle/gg/dirdat/rt, extract dpinit46

How to write an extract parameter file?

A parameter file is always read at the start of the GoldenGate process.
It is located in the dirprm directory.

ggsci> edit params extinit46

EXTRACT extinit46
SETENV (ORACLE_HOME = "/u01/app/oracle/db_1")
SETENV (ORACLE_SID = "PROD")
USERIDALIAS ggsrcuser
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 100) -- limits the log mining SGA usage
EXTTRAIL /u01/app/oracle/gg/dirdat/it
TABLE pdb1.ggsrc.dept;
TABLE pdb1.ggsrc.emp;



How to create a replicat parameter file?

ggsci> edit params repinit46

REPLICAT repinit46
SETENV (ORACLE_HOME = "/u01/app/oracle/db_1")
SETENV (ORACLE_SID = "PROD")
USERIDALIAS ggtarget
DBOPTIONS INTEGRATEDPARAMS (PARALLELISM 6)
ASSUMETARGETDEFS
MAP pdb1.ggsrc.dept, TARGET pdb1.ggtrg.dept;
MAP pdb1.ggsrc.emp, TARGET pdb1.ggtrg.emp;
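Before the parameter file can be used, the Replicat group itself has to be added in GGSCI. A minimal sketch, assuming an integrated Replicat reading the remote trail created earlier:

ggsci> dblogin useralias ggtarget
GGSCI> ADD REPLICAT repinit46, INTEGRATED, EXTTRAIL /u01/app/oracle/gg/dirdat/rt
GGSCI> START REPLICAT repinit46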


Parallel Replicat is a feature of Oracle GoldenGate that allows a single Replicat process to use multiple parallel threads to apply changes from the source database to the target database. The main goal of using parallel Replicat is to improve the overall performance of the data replication process.

Parallel Replicat uses multiple threads to process transactions in parallel. The Replicat process creates a pool of threads, which can be configured to run on one or more servers in the network. Each thread in the pool processes a subset of the transactions, applying them to the target database in parallel with the other threads. This results in a faster and more efficient replication process, especially for high-volume databases with large data sets.

REPLICAT pr
USERIDALIAS gg_user
ASSUMETARGETDEFS
MAP_PARALLELISM 4
APPLY_PARALLELISM 8
BATCHSQL
MAXTRANSOPS 1000
DISCARDFILE ./dirrpt/pr.dsc, APPEND, MEGABYTES 100
MAP sales.*, TARGET sales.*;
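The group itself must be added as a parallel Replicat for these parameters to apply; a minimal sketch with an assumed trail location:

GGSCI> ADD REPLICAT pr, PARALLEL, EXTTRAIL ./dirdat/rt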

Issue:
During a parallel Replicat run, I have seen a conflict where one thread runs slow, making the other threads wait for that transaction to complete.


I have configured the Oracle GoldenGate integrated capture process using the default values. As the data load increases, I see that Extract starts lagging behind by an hour (or more) and database performance degrades. How will you resolve this performance issue?
Ans: When operating in integrated capture mode, you must make sure that you have assigned sufficient memory to STREAMS_POOL_SIZE. An undersized STREAMS_POOL_SIZE, or limiting the streams pool to a very small amount of memory, can cause trouble.
The best practice is to allocate STREAMS_POOL_SIZE at the instance level and cap the maximum SGA at the GG process level as below:

SQL> alter system set STREAMS_POOL_SIZE=3G scope=both;
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 2048, PARALLELISM 4)

 Why would you segregate the tables in a replication configuration? How would you do it?
Ans: In OGG you can configure replication at the schema level or at the table level using the TABLE parameter of Extract and the MAP parameter of Replicat.

For replicating the entire database you can list all the schemas of the database in the Extract/Replicat parameter files.
Depending on the amount of redo generation you can split the tables of a schema across multiple Extracts and Replicats to improve the performance of data replication. Alternatively, you can also group a set of tables in the configuration by application functionality.

Alternatively, you may need to move tables that have long-running transactions into a separate Extract process to eliminate lag on the other tables.
Let’s say that you have a schema named SCOTT with 100 tables.

Out of these hundred tables, 50 tables are heavily utilized by the application.
To improve the overall replication performance you create 3 Extracts and 3 Replicats as follows:

Ext_1/Rep_1 –> 25 tables
Ext_2/Rep_2 –> 25 tables
Ext_3/Rep_3 –> 50 tables
Ext_1/Rep_1 and Ext_2/Rep_2 contain 25 tables each which are heavily utilized or generate more redo.
Ext_3/Rep_3 contains the other 50 tables which are least used (a parameter-file sketch follows below).
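A minimal sketch of how such a split might look in the Extract parameter files; the group names, trail locations and table names are purely illustrative:

EXTRACT ext_1
USERIDALIAS ggsrcuser
EXTTRAIL ./dirdat/e1
TABLE scott.orders;
TABLE scott.order_items;
-- ... remaining heavily used tables for this group

EXTRACT ext_3
USERIDALIAS ggsrcuser
EXTTRAIL ./dirdat/e3
TABLE scott.lookup_codes;
-- ... remaining rarely used tables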

How can we report on long-running transactions?
Ans: The WARNLONGTRANS parameter can be specified with a threshold time that a transaction can stay open before Extract writes a warning message to the GoldenGate error log (ggserr.log).
Example: WARNLONGTRANS 1h, CHECKINTERVAL 10m

The RESTARTCOLLISION and HANDLECOLLISIONS parameters are both used to specify how Replicat should handle collisions, which occur when multiple transactions target the same row or set of rows in a target table. However, there is a difference in how these parameters are used:

RESTARTCOLLISION specifies whether Replicat should restart after a collision occurs; by default it is set to YES. When a collision occurs, Replicat will stop processing transactions and wait for a user to resolve the conflict manually. If the RESTARTCOLLISION parameter is set to YES, Replicat will automatically restart processing after the conflict has been resolved.

HANDLECOLLISIONS specifies the action that Replicat should take when a collision occurs. This parameter has several options, including:

ABEND: This option causes Replicat to stop processing immediately when a collision is detected. This is the default option.

IGNORE: This option causes Replicat to skip the transaction that caused the collision and continue processing.

IGNOREALL: This option causes Replicat to skip all transactions that involve the rows that caused the collision, and continue processing.

DISCARD: This option causes Replicat to write the transaction that caused the collision to a discard file, and continue processing. The transaction will not be applied to the target database.

In summary, RESTARTCOLLISION determines whether Replicat should automatically restart processing after a collision is resolved, while HANDLECOLLISIONS determines the action Replicat should take when a collision is detected.
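A minimal sketch of where HANDLECOLLISIONS is typically placed in a Replicat parameter file (often only during an initial load, and removed afterwards); the schema and table names are placeholders:

MAP src_schema.orders, TARGET tgt_schema.orders, HANDLECOLLISIONS;
-- or, to enable it for all following MAP statements:
HANDLECOLLISIONS
MAP src_schema.*, TARGET tgt_schema.*;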




How do you view the data which has been extracted from the redo logs?
Ans: The logdump utility is used to open the trail files and look at the actual records that have been extracted from the redo or the archive log files.

I have a table called ‘TEST’ on the source and target with the same name, structure, and data type but in different column order. How can you set up replication for this table?
Ans: OGG by default assumes that the sources and target tables are identical. A table is said to be identical if and only if the table structure, data type, and column order are the same on both the source and the target.
If the tables are not identical you must use the ‘SOURCEDEFS’ parameter pointing to a source table definitions file (generated with the DEFGEN utility) and the ‘COLMAP’ parameter to map the columns from source to target.
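A minimal Replicat-side sketch of this mapping; the definitions file path and the column names are assumptions for illustration:

SOURCEDEFS ./dirdef/test.def
MAP scott.test, TARGET scott.test,
COLMAP (USEDEFAULTS, full_name = emp_name, created_dt = create_date);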

What is the best practice to delete the extract trail files in OGG?
Ans: Use the Manager process to purge the trail files after they have been consumed by the downstream (pump/Replicat) processes:

PURGEOLDEXTRACTS /u01/app/oracle/dirdat/et*, USECHECKPOINTS, MINKEEPHOURS 2

What parameters can be used to configure Oracle GoldenGate to extract data from Archived Redo log files only?
Ans: Use the TRANLOGOPTIONS ARCHIVEDLOGONLY option in the parameter file.


In older releases, the PURGEOLDEXTRACTS parameter could also be placed in the Extract parameter file and the PURGEOLDUSEQUENCETABLE parameter in the Replicat parameter file; the <minutes> option specified the number of minutes after which records should be purged.

Note that these process-level forms are no longer recommended in current versions of Oracle GoldenGate; managing trail purging centrally through the Manager parameter file with PURGEOLDEXTRACTS, as shown above, is the preferred approach.



Oracle GoldenGate maintains checkpoints to keep track of the last transaction that was successfully processed by each process. These checkpoints are used to ensure data integrity and consistency during the replication process.

By default, GoldenGate creates a checkpoint file in the dirchk directory for each Extract and Replicat process. The checkpoint file is named after the process group, with a .cpe extension for an Extract and .cpr for a Replicat; for example, the checkpoint file for an Extract group named EXT1 would be dirchk/EXT1.cpe.

You do not need to create any tables to use checkpoints in GoldenGate, as GoldenGate manages the checkpoints automatically. However, you may need to configure the checkpoint settings in your parameter files to ensure that checkpoints are created and maintained as needed for your specific replication requirements. The following checkpoint-related parameters are available in GoldenGate parameter files:

CHECKPOINTTABLE: Specifies the name of the checkpoint table to use instead of the default checkpoint files. This parameter is optional, and not commonly used.
CHECKPOINTINTERVAL: Specifies the interval, in minutes, at which checkpoints should be taken. The default is 15 minutes.
MINKEEPCKPTFILES: Specifies the minimum number of checkpoint files to keep in the dirchk directory before they are deleted. The default is 1.
MAXKEEPCKPTFILES: Specifies the maximum number of checkpoint files to keep in the dirchk directory before they are deleted. The default is 10.

eg:

CHECKPOINTTABLE <checkpoint_table_name>
CHECKPOINTINTERVAL <interval_in_minutes>
MINKEEPCKPTFILES <min_files>
MAXKEEPCKPTFILES <max_files>

By default, GoldenGate uses checkpoint files to store this information. However, a Replicat can additionally record its checkpoints in a checkpoint table in the target database. Rather than hand-writing the DDL, the usual way is to let GoldenGate create the table and then reference it when the Replicat group is added (the schema, table and group names below are placeholders):

GGSCI> dblogin useralias ggtarget
GGSCI> ADD CHECKPOINTTABLE ggadmin.gg_checkpoint
GGSCI> ADD REPLICAT repinit46, EXTTRAIL /u01/app/oracle/gg/dirdat/rt, CHECKPOINTTABLE ggadmin.gg_checkpoint

The table name can also be set once for all Replicats with the CHECKPOINTTABLE parameter in the GLOBALS file.



Accidently deleted trail file:
=========================
You can reposition the GoldenGate process with the ALTER command, based on:

a timestamp
a sequence number
SCN
CSN - commit sequence number
SCN is a database-level identifier used for consistency and recovery, while CSN is a logical ordering used by GoldenGate for transaction processing.


The positions and checkpoints described above can be seen with:
GGSCI> INFO EXTRACT <group_name>, SHOWCH
GGSCI> VIEW REPORT <group_name>



Steps:
stop the process
ALTER EXTRACT <group_name>, BEGIN NOW
-- or reposition to an earlier point in time:
ALTER EXTRACT <group_name>, BEGIN <old timestamp>

If some of the trail files are missing or have been deleted, you can restore the missing files from a backup location and then copy them to the local trail directory. Once the missing trail files have been restored, you can then use the ALTER EXTRACT BEGIN NOW command to start the Extract process and resume capturing changes from the current log file.

It's important to note that when using the BEGIN NOW option, the Extract process will start reading from the current position in the redo log or archive log. This means that if there were any changes made to the database while the Extract process was stopped, those changes will not be captured by the Extract process.
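A minimal repositioning sketch; the group names, timestamp and CSN value are assumptions for illustration only:

GGSCI> ALTER EXTRACT extinit46, BEGIN 2024-01-15 09:00
GGSCI> START EXTRACT extinit46
GGSCI> START REPLICAT repinit46, ATCSN 8986473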

ET rollover:
If a transaction is very long and spans multiple log files, the commit record for that transaction may not be written to the same log file as the rest of the transaction. This can cause issues for GoldenGate, as it may not be able to correctly identify the end of the transaction and capture all of the changes made by the transaction.



info all: displays the status and lag of the Manager, Extract and Replicat processes.

info extract <name> / info replicat <name>: displays the detailed status of a specific process (add DETAIL for more information, or SHOWCH to see its checkpoints).

lag extract <name> / lag replicat <name>: displays the lag statistics for a specific Extract or Replicat process.

stats extract <name> / stats replicat <name>: displays the operation statistics (inserts, updates, deletes) for a specific process.

info exttrail * / info rmttrail *: displays the status of the local and remote trail files.

view report <group_name>: displays the process report file, including the configuration used and the runtime statistics.

view params <group_name>: displays the current parameter file for a specific Extract or Replicat process.

send <process_name>, status: sends a status request to a specific running GoldenGate process.










