There was an interesting discussion not too long back on how to do fast
check-pointing for your server.
The basic procedure with check-pointing for backups is shown in the
System Administrator's Guide. For larger sites a checkpoint can take tens of
minutes, sometimes hours, during which the database is locked. That becomes
inconvenient if you have limited windows for backup because people work in
different time-zones.
As an aside on the -z option, which zips a checkpoint as it is written: it is
worth measuring, on your own server hardware, the CPU overhead of zipping
against the cost of writing the uncompressed checkpoint to disk. In some
circumstances it will be worth check-pointing and zipping in one pass, and in
others zipping offline afterwards.
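One way to make that comparison is to time both approaches on a scratch copy of the server root. The sketch below is only an illustration: elapsed_seconds is a hypothetical helper, the paths in the usage comments are placeholders, and it assumes a date command that supports +%s.

```shell
# elapsed_seconds CMD [ARGS...]
# Runs a command, discards its output, and prints the whole seconds it took.
# (Hypothetical helper, not part of Perforce.)
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example usage against a scratch copy of the server root (placeholder path),
# never the live server, since -jc also rotates the journal:
#   elapsed_seconds p4d -r /p4/scratch-root -z -jc        # checkpoint and zip in one pass
#   elapsed_seconds p4d -r /p4/scratch-root -jc           # checkpoint only, then ...
#   elapsed_seconds gzip /p4/scratch-root/checkpoint.N    # ... zip offline (N = new counter)
```

If the one-pass time is close to the checkpoint-only time, zipping as you go is probably cheap enough on that hardware. If it is much larger, zipping offline keeps the checkpoint window shorter.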
With thanks to Chris Bartz, who posted the procedure to the Perforce User
mailing list in such a well-documented fashion:
Okay, here are the gory details. I can't take credit for inventing it;
Perforce tech support gave me most of the details and I'm pretty sure others
are doing very similar things. To bootstrap the process you need to create
an offline database. This is done by:
1) use "p4 counter journal" to get journal counter value. The checkpoint
name will be checkpoint.<journal counter+1>.
2) "p4 admin checkpoint" (or "p4d -jc" if you prefer)
3) Optional. Zip and backup the truncated journal file
4) Delete old offline database db.* files
5) Build offline database with "p4d -r <offlineDir> -jr <checkpoint>"
6) Zip and backup checkpoint
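The bootstrap steps above could be scripted roughly as follows. This is a minimal sketch and is not taken from the mailing list post. The function name weekly_rebuild_plan and the example paths are hypothetical, it assumes the checkpoint is written to the server root under the default name, and gzip stands in for whatever "zip and backup" tooling you use. It only prints the commands so they can be reviewed before anything is run.

```shell
# weekly_rebuild_plan COUNTER ROOT OFFLINE_DIR
# Prints (does not run) the commands for rebuilding the offline database.
# COUNTER is the value reported by "p4 counter journal" before checkpointing.
# ROOT is assumed to be where the live server writes checkpoint.<n>.
weekly_rebuild_plan() {
    counter=$1
    root=$2
    offline_dir=$3
    # Step 1: the new checkpoint is named after the journal counter + 1.
    ckp="$root/checkpoint.$((counter + 1))"

    echo "p4 admin checkpoint"              # step 2: checkpoint the live server
    # Step 3 (optional zip and backup of the truncated journal) is omitted here.
    echo "rm -f $offline_dir/db.*"          # step 4: delete old offline db.* files
    echo "p4d -r $offline_dir -jr $ckp"     # step 5: build the offline database
    echo "gzip $ckp"                        # step 6: zip (backup copy not shown)
}

# Example with placeholder values; on a real server the first argument
# would come from: p4 counter journal
weekly_rebuild_plan 41 /p4/root /p4/offline
```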
We do the above steps once a week so that we start each week with a fresh
offline database. We currently keep all the journals between rebuilding the
offline database so we could recover from a real checkpoint plus journal
files if there was some problem with the offline database.
Rebuilding and keeping all the journals in between isn't really required but
when I set it up I wasn't 100% confident in the whole process. If I were
making other changes to the process I would probably go with once a month
rebuilds and maybe not keep all the journals.
The offline checkpoint is done daily with:
1) use "p4 counter journal" to get value of journal counter
2) Truncate journal file with "p4d -r <root> -jj <journal filename>". This
creates a file <journal filename>.jnl.<journal counter> and starts a new
journal file
3) Read truncated journal into offline database with "p4d -r <offline root>
-jr <journal filename>.jnl.<journal counter>"
4) Optional. Zip and backup journal file
5) Checkpoint offline database with "p4d -r <offline root> -jd
<checkpoint>.<journal counter + 1>". The journal counter + 1 is used so that
it has the same name as Perforce would give it if we checkpointed the live
database.
6) Optional. Zip and backup checkpoint
7) Optional. Delete old checkpoints and journals (we keep all journals
between rebuilding the offline database and 3 checkpoints).
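The daily steps above could be sketched in the same print-only style. Again, this is an illustration and not part of the original post. The function name daily_offline_plan and the example paths are hypothetical, and the journal and checkpoint names are assumed to be absolute paths. The optional zip, backup and clean-up steps are left out.

```shell
# daily_offline_plan COUNTER ROOT OFFLINE_ROOT JOURNAL_NAME CHECKPOINT_NAME
# Prints (does not run) the commands for the daily offline checkpoint.
# COUNTER is the value reported by "p4 counter journal" (step 1).
daily_offline_plan() {
    counter=$1
    root=$2
    offline_root=$3
    journal_name=$4
    checkpoint_name=$5

    # Step 2: truncate the live journal; this produces <journal>.jnl.<counter>.
    echo "p4d -r $root -jj $journal_name"
    # Step 3: replay the truncated journal into the offline database.
    echo "p4d -r $offline_root -jr $journal_name.jnl.$counter"
    # Step 5: checkpoint the offline database, numbered counter + 1.
    echo "p4d -r $offline_root -jd $checkpoint_name.$((counter + 1))"
}

# Example with placeholder values:
daily_offline_plan 41 /p4/root /p4/offline /p4/backup/journal /p4/backup/checkpoint
```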
When this is done we have a checkpoint and journal file that should be
exactly the same as if we did the "p4 admin checkpoint" on the live
database. There is essentially zero downtime (except the weekly rebuild).
The offline database could be on another machine and the checkpoint could
be done there if disk space or processing power were an issue.
The depot files are backed up after this process is done. We do not shut
Perforce down for that backup. You really don't need to; what Perforce does
to handle this is simple and does work.
Another note on the final remark: imagine the metadata (as restored from the
checkpoint + journal) has information about 9 revisions of a file, yet the RCS
format archive file actually contains 10 revisions, because the depot backup
happened a little after the checkpoint (so the journal is a little out of
date). The server will carry on fine. Obviously the opposite is not true
(metadata has 10 revisions and the archive file only 9). In both cases, if
there is some inconsistency you will potentially lose some work, but the most
recent activity will be stored in people's workspaces (and they may remember
any changelists they have recently submitted).
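The two cases can be summarised in a small sketch. The helper restore_outlook is hypothetical and purely illustrative: it takes the number of revisions the restored metadata knows about and the number actually present in the archive file, and reports which situation applies.

```shell
# restore_outlook METADATA_REVS ARCHIVE_REVS
# Classifies one file after a restore. (Hypothetical helper, for illustration.)
restore_outlook() {
    meta=$1
    archive=$2
    if [ "$meta" -eq "$archive" ]; then
        echo "consistent"
    elif [ "$meta" -lt "$archive" ]; then
        echo "server carries on; $((archive - meta)) archived revision(s) unknown to the metadata"
    else
        echo "problem; $((meta - archive)) revision(s) in the metadata missing from the archive"
    fi
}

restore_outlook 9 10    # the example from the text
restore_outlook 10 9    # the opposite case
```

On a real server, "p4 verify -q //..." is the usual way to find revisions whose archive content is missing or damaged after a restore.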
Thus your disaster recovery scenario needs to include what happens when you
get your server back online and what people need to do (e.g. Tech Note 2 -
Working Disconnected). Sally Page of Symbian gave an excellent presentation
to the UK User Group on Symbian's DR experience and lessons learnt.