-
Installing MySQL 9.7 LTS Community Edition on CentOS
Historically installing MySQL on a RedHat Compatible Linux server was as simple as yum install mysql-server.
Today’s MySQL Oracle Linux, Red Hat Enterprise Linux, CentOS, and Fedora 9.7 instructions are not accurate mixing in 8.
-
MySQL 9.7 – Thank you for your contributions!
On April 21st, 2026, we released MySQL 9.7.0, the latest Long-Term Support release. As always, we are grateful to the MySQL community for helping improve MySQL with bug reports, patches, pull requests, and continued feedback. Community contributions help make MySQL better for everyone, and we are happy to recognize the contributors whose work was included […]
-
A More Predictable MySQL Release Model: Calendar Versions, LTS, and Innovation
Understanding the New Cadence: Quarterly CPUs, Targeted CSPUs, and Transitioning to Calendar Versioning MySQL is updating its release model to make releases easier to understand, plan for, and follow: The goal is not simply to change the number on a release. The goal is to give users, DBAs, developers, Linux distributions, cloud platforms, and ecosystem […]
-
Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement
Managing data retention policies is one of the most common operational tasks in MySQL.
Applications continuously generate transactional, audit, logging, telemetry, and event data. Over time, these tables can grow to billions of rows, causing:
Larger backups
Longer recovery times
Reduced buffer pool efficiency
Slower index maintenance
Increased storage costs
Degraded query performance
To address these problems, organizations typically implement retention policies based on dates or timestamps. Examples include deleting events older than 90 days or purging session data older than 30 days and so forth. The deleted data can then eventually be archived somewhere else, like in another DBMS or on external files.
One of the most widely used tools for implementing these policies in MySQL ecosystems is pt-archiver, part of the Percona Toolkit.
This article provides a review of what pt-archiver is and how to use it, but in particular it focuses on the fact this tool is not partitioning aware, and this can make the deletion phase more costly. The article shows how to extend pt-archiver with a Perl plugin to make it aware of partitioning.
What is pt-archiver?
pt-archiver is a command-line utility from Percona Toolkit designed to:
Archive rows from MySQL tables
Purge rows from MySQL tables
Move data between tables into the local database or a remote one
Export rows into files
In a few words: implementing retention policies safely.
The tool processes rows incrementally in chunks, avoiding massive transactions and reducing impact on production systems.
Example:pt-archiver \
--source h=localhost,D=mydb,t=events \
--where "created_at < '2026-05-01'" \
--purge \
--limit 1000 \
--commit-eachThis command:
Scans rows matching the WHERE condition
Processes them in chunks of 1000 rows
Commits every chunk
Deletes matching rows from the source table
pt-archiver provides several advantages compared to ad-hoc DELETE statements.
Instead of running:DELETE FROM events
WHERE created_at < '2026-05-01';which may:
Lock rows for a long time
Generate massive undo/redo logs
Create replication lag
Exhaust transaction logs
pt-archiver processes rows incrementally to make the process overhead less impactful for the database performance.
pt-archiver implementation permits flexible archival strategies
Rows can be copied to another table on a remote host, exported to files or removed completely
More details: ps://docs.percona.com/percona-toolkit/pt-archiver.html
Example: Copy rows to a remote archive table
The following example archives rows older than 90 days from a local table into an archive table hosted on a remote MySQL server:pt-archiver \
--source h=localhost,D=sales,t=orders,u=archiver,p=secret \
--dest h=archive-server,D=archive,t=orders_archive,u=archiver,p=secret \
--where "created_at < '2026-05-01'" \
--limit 1000 \
--commit-each \
--progress 10000 \
--statisticsIn this example:
–source defines the source table
–dest defines the remote archive destination
–where selects rows eligible for archival
–limit controls batch size
–commit-each commits every batch independently to reduce transaction overhead
–-progress reports progress every 10,000 rows
If rows should be removed from the source table after being copied, add –purge
Example: Export rows to a file
The following example exports rows older than one year into a text file:pt-archiver \
--source h=localhost,D=sales,t=orders,u=archiver,p=secret \
--where "created_at < NOW() - INTERVAL 1 YEAR" \
--file '/tmp/orders_archive_%Y-%m-%d.txt' \
--output-format csv \
--limit 1000 \
--commit-each \
--progress 10000 \
--statisticsIn this example:
–file specifies the output file
–-output-format csv exports rows in CSV format
Date placeholders in the filename are expanded automatically
Rows can optionally be deleted from the source table by adding –purge
This allows pt-archiver to be used both for data retention and for offline archival workflows.
The Hidden Cost of DELETE Statements
Although pt-archiver is much safer than massive DELETE operations, it still fundamentally relies on DELETE statements.
This is a critical point.
Even when there are proper indexes, the rows are processed in chunks, and transactions are small; the large-scale DELETE operations remain expensive.
Deleting rows is expensive in InnoDB because it involves:
Locating rows via indexes
Modifying clustered indexes
Modifying secondary indexes
Generating undo logs
Generating redo logs
Purge thread processing
Replication event generation
Page fragmentation
When deleting billions of rows, the overhead becomes enormous.
Indexes help for sure, but only partially.
Consider:DELETE FROM events
WHERE created_at < '2024-01-01';If created_at is indexed, MySQL can efficiently locate rows.
However, locating rows efficiently is only part of the cost. The actual delete operations still require all those things we mentioned above.
At considerable scale, this becomes expensive.
Why RANGE Partitioning is Superior for Retention Policies
For time-based retention policies, partitioning is often dramatically more efficient. In particular, RANGE partitioning is very useful for these cases.
Example:CREATE TABLE events (
id BIGINT NOT NULL,
created_at DATETIME NOT NULL,
payload JSON,
PRIMARY KEY(id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01'))
);With partitioning, dropping old data becomes:ALTER TABLE events DROP PARTITION p202604;This operation is dramatically faster than running a DELETE.
Dropping a partition:
Removes an entire physical partition
Avoids row-by-row DELETE
Avoids undo generation for each row
Avoids secondary index maintenance per row
Minimizes redo generation
Is nearly metadata-only
This can remove millions or billions of rows in a matter of seconds without the same large cost of DELETE.
The Problem: pt-archiver is Not Partition-Aware
Unfortunately, pt-archiver does not automatically understand partitioning strategies.
Even if the table is partitioned or the retention policy perfectly matches partition boundaries, pt-archiver still executes DELETE statements.
Example:pt-archiver \
--where "created_at < NOW() - INTERVAL 90 DAY" \
--purgeInternally, this still produces DELETE … instead of ALTER TABLE … DROP PARTITION …
This means organizations may lose the major operational benefits of partitioning, or they need to implement custom scripts for managing the selection of rows to copy using pt-archiver and then use DROP PARTITION separately from the tool. That is doable, and to be honest, not too complicated, but why not make pt-archiver aware of partitioning for some specific use cases?
Extending pt-archiver with Pulg-ins
Fortunately, pt-archiver supports Perl plug-ins.
A plug-in can do plenty of things. Like: inspect runtime conditions, interact with MySQL, override behaviors, and execute custom logic
This gives us an opportunity to implement partition-aware retention handling.
The plug-in can:
Inspect partition definitions
Analyze the WHERE condition
Determine which partitions are fully expired
Execute ALTER TABLE DROP PARTITION
Prevent row-by-row DELETE processing
This approach combines the scheduling/orchestration power of pt-archiver with the efficiency of partition pruning.
Plug-in Design
Our plug-in will:
Connect using the pt-archiver DB handle
Inspect INFORMATION_SCHEMA.PARTITIONS
Identify partitions older than the retention cutoff
Issue DROP PARTITION statements
Log actions
Skip DELETE processing
Assumptions:
The table is RANGE partitioned
Partitions are DATETIME based using the TO_DAYS() function to define ranges
Partition naming convention contains dates
Retention policy aligns with partition boundaries; if the plugin cannot determine a specific boundary, pt-archiver does nothing
Full Perl Plug-in for pt-archiver
package pt_archiver_partition_drop;
use strict;
use warnings;
sub new {
my ($class, %args) = @_;
my $self = {
dbh => $args{dbh},
db => $args{db},
tbl => $args{tbl},
statistics => {},
};
bless $self, $class;
return $self;
}
sub statistics {
my ($self) = @_;
return $self->{statistics};
}
sub before_begin {
my ($self) = @_;
my $dbh = $self->{dbh} or die "Missing dbh from pt-archiver\n";
my $db = $self->{db} or die "Missing db from pt-archiver plugin args\n";
my $tbl = $self->{tbl} or die "Missing tbl from pt-archiver plugin args\n";
my $where = _get_cmdline_option('where');
my $dryrun = $ENV{PT_PARTITION_DROP_DRY_RUN} ? 1 : 0;
die "Missing --where from original command line\n" unless $where;
print "PLUGIN before_begin called\n";
print "DB=$db TABLE=$tbl\n";
print "WHERE=$where\n";
print "PLUGIN_DRY_RUN=$dryrun\n";
my ($column, $cutoff_date) = _parse_where($where);
my $partitions = _get_partitions($dbh, $db, $tbl);
if (!@$partitions) {
print "Table `$db`.`$tbl` is not partitioned. Refusing DELETE.\n";
exit(0);
}
my $partition_expr = $partitions->[0]->{expression};
die "Missing PARTITION_EXPRESSION\n"
unless defined $partition_expr && length $partition_expr;
print "Partition expression: $partition_expr\n";
my $cutoff_value = _evaluate_cutoff(
$dbh,
$partition_expr,
$column,
$cutoff_date,
);
print "Cutoff date: $cutoff_date\n";
print "Cutoff boundary value: $cutoff_value\n";
my $matched;
for my $p (@$partitions) {
next if !defined $p->{description};
next if uc($p->{description}) eq 'MAXVALUE';
if ($p->{description} == $cutoff_value) {
$matched = $p;
last;
}
}
if (!$matched) {
print "No exact partition boundary matches cutoff $cutoff_value. Refusing DELETE.\n";
exit(0);
}
print "Matched boundary partition: $matched->{name}, position $matched->{position}\n";
my @drop;
for my $p (@$partitions) {
next if !defined $p->{description};
next if uc($p->{description}) eq 'MAXVALUE';
if ($p->{position} <= $matched->{position}) {
push @drop, $p->{name};
print "Eligible for DROP: $p->{name}, boundary $p->{description}\n";
}
}
if (!@drop) {
print "No partitions eligible for DROP. Refusing DELETE.\n";
exit(0);
}
my $sql = sprintf(
"ALTER TABLE %s.%s DROP PARTITION %s",
_quote_ident($db),
_quote_ident($tbl),
join(", ", map { _quote_ident($_) } @drop),
);
print "SQL: $sql\n";
if ($dryrun) {
print "PT_PARTITION_DROP_DRY_RUN enabled. Not executing DROP PARTITION.\n";
}
else {
$dbh->do($sql);
print "Dropped partitions: " . join(", ", @drop) . "\n";
}
$self->{statistics}->{partitions_dropped} = scalar @drop;
exit(0);
}
sub _parse_where {
my ($where) = @_;
$where =~ s/^\s+|\s+$//g;
die "Only WHERE format supported: created_at < 'YYYY-MM-DD'\n"
unless $where =~ /^`?([A-Za-z0-9_]+)`?\s*<\s*'(\d{4}-\d{2}-\d{2})'\s*$/;
return ($1, $2);
}
sub _evaluate_cutoff {
my ($dbh, $partition_expr, $column, $cutoff_date) = @_;
my $expr = $partition_expr;
$expr =~ s/`//g;
die "Partition expression does not reference column `$column`: $partition_expr\n"
unless $expr =~ /\b\Q$column\E\b/i;
$expr =~ s/\b\Q$column\E\b/'$cutoff_date'/ig;
die "Unsafe generated expression: $expr\n"
unless $expr =~ /^[A-Za-z0-9_\s\(\)\+\-\*\/,\.'":]+$/;
my $sql = "SELECT $expr";
print "Boundary evaluation SQL: $sql\n";
my ($value) = $dbh->selectrow_array($sql);
die "Cannot evaluate cutoff expression: $sql\n"
unless defined $value;
return $value;
}
sub _get_partitions {
my ($dbh, $db, $tbl) = @_;
my $sql = q{
SELECT
PARTITION_NAME,
PARTITION_DESCRIPTION,
PARTITION_EXPRESSION,
PARTITION_ORDINAL_POSITION
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = ?
AND TABLE_NAME = ?
AND PARTITION_NAME IS NOT NULL
ORDER BY PARTITION_ORDINAL_POSITION
};
my $sth = $dbh->prepare($sql);
$sth->execute($db, $tbl);
my @partitions;
while (my $row = $sth->fetchrow_hashref()) {
push @partitions, {
name => $row->{PARTITION_NAME},
description => $row->{PARTITION_DESCRIPTION},
expression => $row->{PARTITION_EXPRESSION},
position => $row->{PARTITION_ORDINAL_POSITION},
};
}
return \@partitions;
}
sub _get_cmdline_option {
my ($name) = @_;
my $opt = "--$name";
for (my $i = 0; $i < @ARGV; $i++) {
if ($ARGV[$i] eq $opt && defined $ARGV[$i + 1]) {
return $ARGV[$i + 1];
}
if ($ARGV[$i] =~ /^\Q$opt\E=(.*)$/) {
return $1;
}
}
if (open my $fh, '<', "/proc/$$/cmdline") {
local $/;
my $raw = <$fh>;
close $fh;
my @cmd = split /\0/, $raw;
for (my $i = 0; $i < @cmd; $i++) {
if ($cmd[$i] eq $opt && defined $cmd[$i + 1]) {
return $cmd[$i + 1];
}
if ($cmd[$i] =~ /^\Q$opt\E=(.*)$/) {
return $1;
}
}
}
return undef;
}
sub _quote_ident {
my ($ident) = @_;
die "Invalid identifier: $ident\n"
unless defined $ident && $ident =~ /^[A-Za-z0-9_]+$/;
return "`$ident`";
}
1;Create the file named pt_archiver_partition_drop.pm into the /usr/local/share/perl5 path.
Also set the environment variable PERL5LIB to let pt-archiver where to find the Perl packageexport PERL5LIB=/usr/local/share/perl5
Example Usage
First, create the partitioned table events and insert some fake data.DROP TABLE IF EXISTS events;
CREATE TABLE events (
id BIGINT NOT NULL,
created_at DATETIME NOT NULL,
payload JSON DEFAULT NULL,
PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
PARTITION p202604 VALUES LESS THAN (TO_DAYS('2026-05-01')),
PARTITION p202605 VALUES LESS THAN (TO_DAYS('2026-06-01')),
PARTITION p202606 VALUES LESS THAN (TO_DAYS('2026-07-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
INSERT INTO events (id, created_at, payload) VALUES
-- p202604
(1, '2026-04-01 08:00:00', JSON_OBJECT('event', 'login', 'user', 'alice')),
(2, '2026-04-03 09:15:00', JSON_OBJECT('event', 'view', 'page', 'home')),
(3, '2026-04-05 10:30:00', JSON_OBJECT('event', 'click', 'button', 'signup')),
(4, '2026-04-08 11:45:00', JSON_OBJECT('event', 'search', 'term', 'mysql')),
(5, '2026-04-10 12:00:00', JSON_OBJECT('event', 'purchase', 'amount', 100)),
(6, '2026-04-14 13:20:00', JSON_OBJECT('event', 'logout', 'user', 'alice')),
(7, '2026-04-18 14:35:00', JSON_OBJECT('event', 'download', 'file', 'report.pdf')),
(8, '2026-04-22 15:50:00', JSON_OBJECT('event', 'upload', 'file', 'image.png')),
(9, '2026-04-26 16:05:00', JSON_OBJECT('event', 'click', 'button', 'buy')),
(10, '2026-04-30 23:59:59', JSON_OBJECT('event', 'month_end')),
-- p202605
(11, '2026-05-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'bob')),
(12, '2026-05-03 08:10:00', JSON_OBJECT('event', 'view', 'page', 'pricing')),
(13, '2026-05-06 09:20:00', JSON_OBJECT('event', 'search', 'term', 'percona')),
(14, '2026-05-09 10:30:00', JSON_OBJECT('event', 'purchase', 'amount', 250)),
(15, '2026-05-12 11:40:00', JSON_OBJECT('event', 'logout', 'user', 'bob')),
(16, '2026-05-16 12:50:00', JSON_OBJECT('event', 'download', 'file', 'backup.sql')),
(17, '2026-05-20 13:00:00', JSON_OBJECT('event', 'upload', 'file', 'data.csv')),
(18, '2026-05-24 14:10:00', JSON_OBJECT('event', 'click', 'button', 'subscribe')),
(19, '2026-05-28 15:20:00', JSON_OBJECT('event', 'view', 'page', 'docs')),
(20, '2026-05-31 23:59:59', JSON_OBJECT('event', 'month_end')),
-- p202606
(21, '2026-06-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'carol')),
(22, '2026-06-03 08:05:00', JSON_OBJECT('event', 'search', 'term', 'partitioning')),
(23, '2026-06-06 09:15:00', JSON_OBJECT('event', 'view', 'page', 'dashboard')),
(24, '2026-06-09 10:25:00', JSON_OBJECT('event', 'purchase', 'amount', 500)),
(25, '2026-06-12 11:35:00', JSON_OBJECT('event', 'logout', 'user', 'carol')),
(26, '2026-06-16 12:45:00', JSON_OBJECT('event', 'login', 'user', 'dave')),
(27, '2026-06-20 13:55:00', JSON_OBJECT('event', 'download', 'file', 'archive.zip')),
(28, '2026-06-24 14:05:00', JSON_OBJECT('event', 'upload', 'file', 'video.mp4')),
(29, '2026-06-28 15:15:00', JSON_OBJECT('event', 'click', 'button', 'checkout')),
(30, '2026-06-30 23:59:59', JSON_OBJECT('event', 'month_end')),
-- pmax
(31, '2026-07-01 00:00:00', JSON_OBJECT('event', 'login', 'user', 'eve')),
(32, '2026-07-05 08:30:00', JSON_OBJECT('event', 'view', 'page', 'future')),
(33, '2026-07-10 09:45:00', JSON_OBJECT('event', 'search', 'term', 'maxvalue')),
(34, '2026-08-01 10:00:00', JSON_OBJECT('event', 'purchase', 'amount', 750)),
(35, '2026-09-01 11:15:00', JSON_OBJECT('event', 'retained_future'));
Now you can run the following command to delete all rows before the 1st of May, which, by the way, matches the entire first partition in the table.pt-archiver \
--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
--where "created_at < '2026-05-01'" \
--purge
Notice the Perl plugin must be indicated with the m option in the DSN string.
In practice:
pt-archiver initializes
The plug-in runs
Partitions are dropped
No DELETE statements are executed
Here is what you get from the execution of the above command:PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at < '2026-05-01'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-05-01')
Cutoff date: 2026-05-01
Cutoff boundary value: 740102
Matched boundary partition: p202604, position 1
Eligible for DROP: p202604, boundary 740102
SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`
Dropped partitions: p202604You can simply verify the table has been managed correctly:
SELECT * FROM mydb.events;
SHOW CREATE TABLE mydb.events;
Now TRUNCATE the table and recreate the data and try now to specify the where conditions that match a RANGE that is not the first in the list of the boundaries.pt-archiver \
--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
--where "created_at < '2026-06-01'" \
--purgeYou should get:PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at < '2026-06-01'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-06-01')
Cutoff date: 2026-06-01
Cutoff boundary value: 740133
Matched boundary partition: p202605, position 2
Eligible for DROP: p202604, boundary 740102
Eligible for DROP: p202605, boundary 740133
SQL: ALTER TABLE `mydb`.`events` DROP PARTITION `p202604`, `p202605`
Dropped partitions: p202604, p202605In this case, two partitions have been identified and dropped.
Truncate the table and recreate the data again. Try now to provide a WHERE condition that does not match any of the boundaries in the RANGE.pt-archiver \
--source h=localhost,D=mydb,t=events,m=pt_archiver_partition_drop \
--where "created_at < '2026-04-25'" \
--purge
You get the following:PLUGIN before_begin called
DB=mydb TABLE=events
WHERE=created_at < '2026-04-25'
PLUGIN_DRY_RUN=0
Partition expression: to_days(`created_at`)
Boundary evaluation SQL: SELECT to_days('2026-04-25')
Cutoff date: 2026-04-25
Cutoff boundary value: 740096
No exact partition boundary matches cutoff 740096. Refusing DELETE.As expected, the tool now refuses to execute anything if it doesn’t find an exact match.
Operational Benefits
This approach provides major advantages.
Dropping partitions is vastly faster than deleting rows, and minimal binary logging is needed, compared to billions of row deletes. There is no massive transactional overhead for managing undo logs and purging. You get then a better InnoDB Buffer Pool stability because of less page churn.
In the end, retention jobs are completed quickly and consistently in a predictable way and at the minimal cost.
Important Caveats
Partition Boundaries Must Match Retention Policy
If partitions contain mixed retention windows, DROP PARTITION may remove too much data. For this reason, ensure correct partition design.
Recommended:
daily partitions
weekly partitions
monthly partitions
aligned with business retention requirements.
Metadata Locks
ALTER TABLE DROP PARTITION still acquires metadata locks.
Test carefully in production.
Backup Awareness
Ensure dropped partitions are no longer needed before removal or use pt-archiver to also copy the data into a remote server or dump the data into a CSV file before running the DROP PARTITION.
Possible Enhancements
The plug-in can be extended further.
Potential improvements:
Support for daily partitions
Support for UNIX timestamp partitions
Dry-run reporting
Automatic partition creation
Push Slack notifications
Export Prometheus metrics
Safety checks for replicas
GTID-aware orchestration
Integration with pt-online-schema-change workflows
These are just some ideas I had meanwhile doing my tests. What you can do by implementing a Perl plugin is only limited by your imagination and your real needs.
Conclusion
pt-archiver remains an excellent tool for implementing retention policies and archival workflows.
However, DELETE-based purging becomes increasingly expensive at scale, even with proper indexing and chunked processing.
For large time-series or historical datasets, RANGE partitioning is often a dramatically superior strategy.
The challenge is that pt-archiver does not natively leverage partition-level operations.
Fortunately, its Perl plug-in architecture allows advanced users to extend its behavior and implement partition-aware cleanup logic.
By combining:
pt-archiver orchestration
MySQL RANGE partitioning
Custom Perl plug-ins
Organizations can achieve:
Faster retention enforcement
Lower operational overhead
Smaller replication impact
Dramatically improved scalability
For large MySQL deployments, this hybrid approach can turn multi-hour purge operations into near-instant metadata operations.
The use case presented in this article is limited to a specific scenario, but you can reuse it or customize it if you have a different kind of RANGE partitioning, for example, not using TO_DAYS().
Take this as just an example of how you can extend pt-archiver. What you can do for real is driven by your needs and/or only limited by your imagination.
More info about extending pt-archiver:
https://docs.percona.com/percona-toolkit/pt-archiver.html#extending
The post Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement appeared first on Percona.
-
Using GenAI directly in the database. A practical example using MySQL 8.0
If you have a typical MySQL production setup using MySQL 8.0 (EOL) with replication, you can take advantage of VillageSQL extensions to generate AI responses directly with your source data with no impact on your production setup or existing application software.
|