Main disadvantage: Data stored in S3 can only be read (SELECT only); INSERT, UPDATE, and DELETE operations are not possible. However, this test assumes that the data does not need to be modified (e.g., historical or metadata), and the focus is purely on performance and cost.
For the test, I used Aurora PostgreSQL (db.t4g.medium, Engine version 17.4). The dataset is NYC Taxi Trip data — January 2023 (Parquet format).
Ensure that the aws_s3 and postgres_fdw extensions are available (create them if missing).
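If they are not installed yet, creating them is typically just (assuming sufficient privileges on the database):
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE; -- CASCADE also installs aws_commons
CREATE EXTENSION IF NOT EXISTS postgres_fdw;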
In CloudShell, download the data:
curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-01.parquet
Then upload it to S3:
aws s3 cp . s3://aurora-fdw-test-gm-01/data/parquet/ --recursive --exclude "*" --include "yellow_tripdata_2023-*.parquet"
There are, of course, additional setup steps that are required but not covered in detail here—such as S3 bucket creation, security group rules, IAM permissions and roles, and CloudShell-to-Aurora access configuration. The focus remains on the benchmark test itself.
The aws_s3 extension in Aurora PostgreSQL primarily supports CSV, text, and binary formats—Parquet is not directly supported. Therefore, Parquet must be converted to CSV first. Yes, I could have used pre-existing CSV files, but performing this extra step is valuable—learning something new is never a waste of time.
Athena can be used to convert Parquet to CSV. First, create an external table over the Parquet data in Athena:
CREATE EXTERNAL TABLE nyc_taxi_parquet_2023 (
VendorID BIGINT,
tpep_pickup_datetime TIMESTAMP,
tpep_dropoff_datetime TIMESTAMP,
passenger_count DOUBLE,
trip_distance DOUBLE,
PULocationID BIGINT,
DOLocationID BIGINT,
fare_amount DOUBLE,
tip_amount DOUBLE,
total_amount DOUBLE
)
STORED AS PARQUET
LOCATION 's3://aurora-fdw-test-gm-01/data/parquet/';
Then convert Parquet to CSV using CTAS –
CREATE TABLE nyc_taxi_csv
WITH (
format = 'TEXTFILE',
field_delimiter = ',',
external_location = 's3://aurora-fdw-test-gm-01/data/csv-uncompressed/',
write_compression = 'NONE'
)
AS SELECT * FROM nyc_taxi_parquet_2023;
The result is a file in data/csv-uncompressed – 20251130_103431_00106_crcai_2c48e1bd-0609-4f86-a067-91c8bd1e6903 (a rather long, system-generated name).
Now that we have our CSV file, we can begin benchmarking.
For this test, I created a new database and user in Aurora and connected via CloudShell using psql. With Aurora Serverless the Query Editor would also be an option, but it is not available here, since the Query Editor currently supports only Aurora Serverless.
Let's create our database table in Aurora:
CREATE TABLE nyc_taxi_data (
vendorid BIGINT,
tpep_pickup_datetime TIMESTAMP,
tpep_dropoff_datetime TIMESTAMP,
passenger_count DOUBLE PRECISION,
trip_distance DOUBLE PRECISION,
pulocationid BIGINT,
dolocationid BIGINT,
fare_amount DOUBLE PRECISION,
tip_amount DOUBLE PRECISION,
total_amount DOUBLE PRECISION
);
and import the CSV data –
SELECT aws_s3.table_import_from_s3(
'nyc_taxi_data',
'',
'(format csv, header false, null ''\N'')',
aws_commons.create_s3_uri(
'aurora-fdw-test-gm-01',
'data/csv-uncompressed/20251130_103431_00106_crcai_2c48e1bd-0609-4f86-a067-91c8bd1e6903',
'eu-central-1'
)
);
There are 3,066,766 rows in nyc_taxi_data; the table size is 721 MB.
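These figures can be verified directly in psql, for example:
SELECT COUNT(*) FROM nyc_taxi_data;
SELECT pg_size_pretty(pg_total_relation_size('nyc_taxi_data'));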
The next step is to create a foreign server for the S3 access path (note that postgres_fdw itself connects to the Aurora endpoint; the S3 data is read via the aws_s3 function shown below) –
CREATE SERVER s3_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
host '<Aurora endpoint>',
port '5432',
dbname 'nyc_tripdata_db'
);
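Before the foreign server can be used, postgres_fdw also needs a user mapping; a minimal sketch (user name and password are placeholders for whatever credentials were created earlier):
CREATE USER MAPPING FOR CURRENT_USER
SERVER s3_server
OPTIONS (user 'benchmark_user', password '<password>'); -- placeholder credentials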
To get the data from S3, we need a function that queries the S3 CSV data directly. Here is the code.
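Since the full implementation is only linked, here is a rough sketch of what such a function could look like, assuming it loads the CSV from S3 into a cache table via aws_s3.table_import_from_s3 on the first call and then returns the rows (the cache table name s3_nyc_taxi_cache is borrowed from the cleanup step later in this post; the linked code may differ):
CREATE OR REPLACE FUNCTION query_s3_nyc_taxi()
RETURNS SETOF nyc_taxi_data
LANGUAGE plpgsql
AS $$
BEGIN
    -- On the first call, pull the CSV from S3 into a local cache table
    IF to_regclass('s3_nyc_taxi_cache') IS NULL THEN
        CREATE TABLE s3_nyc_taxi_cache (LIKE nyc_taxi_data);
        PERFORM aws_s3.table_import_from_s3(
            's3_nyc_taxi_cache',
            '',
            '(format csv, header false, null ''\N'')',
            aws_commons.create_s3_uri(
                'aurora-fdw-test-gm-01',
                'data/csv-uncompressed/20251130_103431_00106_crcai_2c48e1bd-0609-4f86-a067-91c8bd1e6903',
                'eu-central-1'
            )
        );
    END IF;
    RETURN QUERY SELECT * FROM s3_nyc_taxi_cache;
END;
$$;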
Now the data in the database and in S3 can be fetched like this –
SELECT COUNT(*) as local_count FROM nyc_taxi_data;
SELECT COUNT(*) as s3_count FROM query_s3_nyc_taxi();
Our benchmark infrastructure includes a results table (benchmark_results), a timing function (benchmark_query), a system monitoring function (capture_system_stats), and a workload test function (run_workload_benchmark). Here is the code.
The benchmark_query function measures and compares query execution times between the different data access methods (Aurora direct vs. S3 import).
The capture_system_stats function monitors Aurora resource usage (CPU, memory, I/O) during operations to capture the true cost, including the I/O spikes caused by S3 imports.
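The full definitions are in the linked code; as a rough sketch (column names inferred from the result queries below, the real definitions may differ), benchmark_results and benchmark_query could look roughly like this:
CREATE TABLE benchmark_results (
    test_name         TEXT,
    query_type        TEXT,
    data_source       TEXT,          -- 'local' or 's3'
    execution_time_ms NUMERIC,
    test_timestamp    TIMESTAMP DEFAULT now()
);
CREATE OR REPLACE FUNCTION benchmark_query(
    p_test_name   TEXT,
    p_query_type  TEXT,
    p_data_source TEXT,
    p_sql         TEXT
) RETURNS NUMERIC
LANGUAGE plpgsql
AS $$
DECLARE
    v_start      TIMESTAMPTZ;
    v_elapsed_ms NUMERIC;
BEGIN
    v_start := clock_timestamp();
    EXECUTE p_sql;                  -- run the statement under test
    v_elapsed_ms := ROUND(EXTRACT(EPOCH FROM clock_timestamp() - v_start) * 1000);
    INSERT INTO benchmark_results (test_name, query_type, data_source, execution_time_ms)
    VALUES (p_test_name, p_query_type, p_data_source, v_elapsed_ms);
    RETURN v_elapsed_ms;
END;
$$;
-- Example call: SELECT benchmark_query('Simple Count', 'aggregation', 'local', 'SELECT COUNT(*) FROM nyc_taxi_data');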
With all prerequisites in place, the benchmarking can now begin. The following tests will be run (various SQL statements):
Test 1: Simple COUNT queries
Test 2: Simple SELECT with LIMIT
Test 3: Simple aggregations
Test 4: Date range filtering
Test 5: Numeric filtering
Test 6: Multiple condition filtering
Test 7: GROUP BY with aggregations
Test 8: Complex aggregation with date grouping
Test 9: Hourly analysis
Test 10: Complex analytical query
Test 11: Window functions
Test 12: First run (cold cache) - force fresh load
Test 13: Second run (warm cache)
Test 14: Third run (warm cache)
Test 15: Complex query with cache
Test 16: Workload Simulation Tests
SQL statements are here.
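As an illustration of what these tests look like (the exact statements are in the linked SQL), Test 1 and Test 7 on the local table can be as simple as the following; the S3 variants replace nyc_taxi_data with query_s3_nyc_taxi():
-- Test 1: simple COUNT
SELECT COUNT(*) FROM nyc_taxi_data;
-- Test 7: GROUP BY with aggregations
SELECT vendorid,
       COUNT(*) AS trips,
       ROUND(AVG(total_amount)::NUMERIC, 2) AS avg_total_amount
FROM nyc_taxi_data
GROUP BY vendorid
ORDER BY vendorid;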
And the results – Overall performance comparison
SELECT
query_type,
data_source,
COUNT(*) as test_count,
ROUND(AVG(execution_time_ms)) as avg_time_ms,
ROUND(MIN(execution_time_ms)) as min_time_ms,
ROUND(MAX(execution_time_ms)) as max_time_ms,
ROUND(STDDEV(execution_time_ms)) as stddev_ms
FROM benchmark_results
WHERE data_source IN ('local', 's3')
GROUP BY query_type, data_source
ORDER BY query_type, data_source;
| query_type | data_source | test_count | avg_time_ms | min_time_ms | max_time_ms | stddev_ms |
|---|---|---|---|---|---|---|
| aggregation | local | 5 | 452 | 0 | 1279 | 543 |
| aggregation | s3 | 5 | 1530 | 1148 | 2203 | 422 |
| cache_test | s3 | 4 | 3190 | 1166 | 8744 | 3709 |
| complex | local | 2 | 640 | 1 | 1279 | 904 |
| complex | s3 | 2 | 1603 | 924 | 2281 | 960 |
| workload | local | 5 | 0 | 0 | 0 | 0 |
| workload | s3 | 5 | 1297 | 1246 | 1359 | 49 |
| filter | local | 3 | 219 | 214 | 227 | 7 |
| filter | s3 | 3 | 1317 | 1290 | 1346 | 28 |
| select | local | 1 | 0 | 0 | 0 | |
| select | s3 | 1 | 243 | 243 | 243 | |
Let's check the performance ratio analysis:
WITH performance_comparison AS (
SELECT
test_name,
query_type,
MAX(CASE WHEN data_source = 'local' THEN execution_time_ms END) as local_time,
MAX(CASE WHEN data_source = 's3' THEN execution_time_ms END) as s3_time
FROM benchmark_results
WHERE data_source IN ('local', 's3')
GROUP BY test_name, query_type
)
SELECT
test_name,
query_type,
local_time,
s3_time,
ROUND(s3_time::NUMERIC / NULLIF(local_time, 0)::NUMERIC, 2) as s3_vs_local_ratio,
CASE
WHEN local_time IS NULL THEN 'S3 Only'
WHEN s3_time IS NULL THEN 'Local Only'
WHEN s3_time < local_time THEN 'S3 Faster'
WHEN s3_time > local_time * 2 THEN 'Local Much Faster'
WHEN s3_time > local_time THEN 'Local Faster'
ELSE 'Similar Performance'
END as performance_verdict
FROM performance_comparison
ORDER BY s3_vs_local_ratio NULLS LAST;
| test_name | query_type | local_time | s3_time | s3_vs_local_ratio | performance_verdict |
|---|---|---|---|---|---|
| Hourly Analysis | aggregation | 1279 | 2203 | 1.72 | Local Faster |
| Complex Analysis | complex | 1279 | 2281 | 1.78 | Local Faster |
| Daily Aggregation | aggregation | 698 | 1628 | 2.33 | Local Much Faster |
| Group By Vendor | aggregation | 281 | 1456 | 5.18 | Local Much Faster |
| Complex Filter | filter | 227 | 1314 | 5.79 | Local Much Faster |
| Fare Filter | filter | 214 | 1290 | 6.03 | Local Much Faster |
| Date Filter | filter | 217 | 1346 | 6.20 | Local Much Faster |
| Window Functions | complex | 1 | 924 | 924.00 | Local Much Faster |
| Simple AVG | aggregation | 1 | 1216 | 1216.00 | Local Much Faster |
| Select Limit 1000 | select | 0 | 243 | | Local Much Faster |
| Simple Count | aggregation | 0 | 1148 | | Local Much Faster |
| Workload Test 5 | workload | 0 | 1359 | | Local Much Faster |
| Cache Test – Cold | cache_test | | 8744 | | S3 Only |
| Cache Test – Warm 1 | cache_test | | 1166 | | S3 Only |
| Cache Test – Warm 2 | cache_test | | 1206 | | S3 Only |
| Workload Test 1 | workload | 0 | 1246 | | Local Much Faster |
| Workload Test 2 | workload | 0 | 1248 | | Local Much Faster |
| Workload Test 3 | workload | 0 | 1305 | | Local Much Faster |
| Workload Test 4 | workload | 0 | 1326 | | Local Much Faster |
| Cache Complex Query | cache_test | | 1645 | | S3 Only |
Cache effectiveness analysis:
SELECT
test_name,
execution_time_ms,
LAG(execution_time_ms) OVER (ORDER BY test_timestamp) as previous_time,
CASE
WHEN LAG(execution_time_ms) OVER (ORDER BY test_timestamp) IS NOT NULL
THEN ROUND(100.0 * (1 - execution_time_ms::NUMERIC / LAG(execution_time_ms) OVER (ORDER BY test_timestamp)), 2)
ELSE NULL
END as improvement_percent
FROM benchmark_results
WHERE test_name LIKE 'Cache Test%'
ORDER BY test_timestamp;
| test_name | execution_time_ms | previous_time | improvement_percent |
|---|---|---|---|
| Cache Test – Cold | 8744 | | |
| Cache Test – Warm 1 | 1166 | 8744 | 86.67 |
| Cache Test – Warm 2 | 1206 | 1166 | -3.43 |
Now let’s try with a larger dataset.
The table size was increased (simply using INSERT INTO ... SELECT * FROM ...), and then exported to S3 as a CSV file.
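A rough sketch of this step (the target key data/csv-large/nyc_taxi_large.csv is only an illustrative name): double the table with INSERT INTO ... SELECT until it reaches the desired size, then export it with aws_s3.query_export_to_s3.
-- Grow the table; repeat until the desired size is reached
INSERT INTO nyc_taxi_data SELECT * FROM nyc_taxi_data;
-- Export the enlarged table to S3 as CSV (illustrative key name)
SELECT * FROM aws_s3.query_export_to_s3(
    'SELECT * FROM nyc_taxi_data',
    aws_commons.create_s3_uri('aurora-fdw-test-gm-01', 'data/csv-large/nyc_taxi_large.csv', 'eu-central-1'),
    options := 'format csv'
);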
Before running the next test, it is important to reset statistics and truncate the relevant tables.
SELECT pg_stat_reset();
SELECT pg_stat_reset_shared('bgwriter');
DROP TABLE IF EXISTS s3_nyc_taxi_cache;
TRUNCATE TABLE benchmark_results;
The results look like this:
| query_type | data_source | tests | avg_ms | min_ms | max_ms |
|---|---|---|---|---|---|
| aggregation | local | 3 | 3612 | 1 | 10688 |
| aggregation | s3 | 3 | 62543 | 13327 | 155211 |
| cache | s3 | 2 | 47056 | 37977 | 56134 |
| filter | local | 1 | 7678 | 7678 | 7678 |
| filter | s3 | 1 | 18805 | 18805 | 18805 |
And the performance ratio analysis for the larger dataset:
| test_name | query_type | local_time | s3_time | s3_vs_local_ratio | performance_verdict |
|---|---|---|---|---|---|
| Date Filter | filter | 10304 | 16306 | 1.58 | Local Faster |
| Daily Aggregation | aggregation | 7192 | 16167 | 2.25 | Local Much Faster |
| Fare Filter | filter | 7276 | 17033 | 2.34 | Local Much Faster |
| Hourly Analysis | aggregation | 6914 | 19007 | 2.75 | Local Much Faster |
| Complex Analysis | complex | 6259 | 18754 | 3.00 | Local Much Faster |
| Group By Vendor | aggregation | 4544 | 17737 | 3.90 | Local Much Faster |
| Complex Filter | filter | 3166 | 18048 | 5.70 | Local Much Faster |
| Select Limit 1000 | select | 42 | 1008 | 24.00 | Local Much Faster |
| Window Functions | complex | 235 | 8461 | 36.00 | Local Much Faster |
| Workload Test 1 | workload | 3 | 8948 | 2982.67 | Local Much Faster |
| Workload Test 5 | workload | 0 | 10640 | | Local Much Faster |
| Cache Test – Warm 1 | cache_test | | 12611 | | S3 Only |
| Workload Test 2 | workload | 0 | 8824 | | Local Much Faster |
| Simple Count | aggregation | 0 | 34849 | | Local Much Faster |
| Cache Complex Query | cache_test | | 9818 | | S3 Only |
| Cache Test – Warm 2 | cache_test | | 8427 | | S3 Only |
| Cache Test – Cold | cache_test | | 56295 | | S3 Only |
| Simple AVG | aggregation | 0 | 17452 | | Local Much Faster |
| Workload Test 4 | workload | 0 | 8815 | | Local Much Faster |
| Workload Test 3 | workload | 0 | 9099 | | Local Much Faster |
No surprise — accessing data from S3 is slower, sometimes significantly slower, compared to querying data stored directly in an Aurora table.
But what about the cost aspect?
Let’s compare storage costs for the second dataset:
Aurora table: 1.3TB (1,330 GB)
S3 CSV files: 800MB (0.8 GB)
Monthly Costs for these data volumes:
| Service | Storage Cost | I/O Cost | Total Monthly Cost | Annual Cost |
|---|---|---|---|---|
| S3 Standard | $0.018 | $0.009 | $0.027 | $0.32 |
| Aurora Standard | $133.00 | $0.20 | $133.20 | $1,598 |
| Aurora I/O-Optimized | $299.25 | $0.00 | $299.25 | $3,591 |
Annual Savings:
Aurora Standard → S3: Save $1,598/year
Aurora I/O-Optimized → S3: Save $3,591/year
If the table grows significantly (e.g., 10 TB of historical data), the savings become substantial — especially in cases with multiple customers or several databases sharing similar structures and data volumes.
In general, S3 storage is approximately 4.4× cheaper than Aurora for the same data volume (roughly $0.023 per GB-month for S3 Standard versus $0.10 per GB-month for Aurora standard storage).
However, as observed in this test, the actual data stored in S3 (CSV or Parquet) is often much smaller than the equivalent data stored in an Aurora table — making the real cost advantage even greater.
Conclusion:
S3 Offloading is Financially Attractive When:
Dataset > 100GB
Data is accessed infrequently (archives, historical data)
Read-heavy analytical workloads
Cost optimization is a higher priority than query response time
Perfect Use Cases:
Daily/weekly reports (infrequent access)
Monthly analytics (batch processing)
ETL processes (load → transform → export)
Compliance reports (quarterly/annual)
Data validation (periodic checks)
Key Challenges & Considerations for Aurora + S3 Temporary Loading Approach
Data Management Challenges:
Export scheduling – Automate daily/weekly data moves to S3
Data consistency – Ensure Aurora and S3 data sync properly
Cleanup automation – Drop temp tables after processing
Storage lifecycle – Manage S3 storage classes (Standard → IA → Glacier)
Performance Considerations:
Import time – 30-60 minutes for large datasets (plan accordingly)
Aurora I/O spikes – Temporary high I/O during S3 imports
Workload access – Handle conflicts if multiple processes need data
Query optimization – Temp tables need proper indexing (see the sketch below)
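For example (a minimal sketch reusing the cache table name from this test), the imported working table can be indexed for the report queries and dropped once processing is finished:
-- Hypothetical post-import housekeeping for the working table
CREATE INDEX idx_cache_pickup ON s3_nyc_taxi_cache (tpep_pickup_datetime);
ANALYZE s3_nyc_taxi_cache;
-- ... run the reports ...
DROP TABLE IF EXISTS s3_nyc_taxi_cache;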
Operational Overhead:
Monitoring – Track import success/failures, processing times
Error handling – Retry logic for failed S3 imports
Alerting – Notify when reports fail or take too long
Version control – Manage function updates and deployments
Security & Compliance:
IAM permissions – Aurora → S3 access rights
Data encryption – S3 encryption at rest and in transit
Audit trails – Log data access and movements
Retention policies – How long to keep S3 data
Cost Monitoring:
S3 request costs – PUT/GET operations add up
Aurora I/O costs – Temporary spikes during imports
Data transfer costs – Aurora → S3 bandwidth charges
Storage optimization – Compress data, use efficient formats
Architecture Decisions:
Hot vs Cold data – What stays in Aurora vs moves to S3
Partitioning strategy – How to organize S3 data (by date, etc.)
Backup strategy – S3 as backup or primary archive
Disaster recovery – How to restore from S3 if needed
Bottom line: Great cost savings, but requires solid DevOps practices and monitoring to manage the complexity.
Happy clouding!