Skipping or Selectively Running Tests with Maven

Table of Contents

Skip Unit or Integration Tests

Skip the execution of unit tests:

$ mvn compile -DskipTests

This will still compile the unit tests, but not run them. To skip both compilation and execution, use -Dmaven.test.skip. Note that this setting is also honored by the Failsafe plugin.

Skip executing integration tests:

$ mvn verify -DskipITs

Execute a Single Unit or Integration Test Class

Execute a single unit test class with Surefire:

$ mvn test -Dtest=MyUnitTest

In a multi-module project, we need to add -DfailIfNoTests=false, otherwise a module will fail where a class with that name is not available.

Also note that we do not execute integration tests because we instruct Maven to stop after the test phase. To run a single unit test class and skip all integration tests, add -DskipITs.

Execute a single integration test class with Failsafe:

$ mvn verify -Dit.test=MyIntegrationTest

As you can guess by now, this will run the single integration test by the name of MyIntegrationTest, and will also run all unit tests. To prevent this, you can add -DskipTests. Again, don’t forget to set -DfailIfNoTests if you run Maven on a multi-module project.

Execute a Single Unit or Integration Test Within a Test Class

Both the Surefire and Failsafe Maven plugins allow you to go crazy in their syntax for filtering which individual test classes or methods should be run. Specifying the parameters -Dtest and -Dit.test will override all defined include all defined includes and excludes from your pom.xml. Maven will create an include that looks like **/${test}.java.

We have seen how we can run single test classes by specifying the class name after -Dtest or -Dit.test. The next most simple format (since version 2.7.3) takes the form of ClassName#methodName, so if you have a test class named OrderTest.java and a unit test method testEmptyOrderShouldThrow, you can run this test method by itself, excluding all other test by typing:

$ mvn test -Dtest=OrderTest#testEmptyOrderShouldThrow

Complexer syntax with both glob and regex patterns is possible since version 2.19. The documentation gives this example:

$ mvn test '-Dtest=???Test, !Unstable*, pkg/**/Ci*leTest.java,
> *Test#test*One+testTwo?????, #fast*+slowTest'

Let’s break this down. As you can see, we specify multiple patterns in a single parameter (to prevent shell expansion, they are put inside single quotes):

  • ???Test will run any test that has a name that starts with any three characters and ends with Test (? is a glob character meaning any single character).
  • !Unstable* will not run any test that starts with Unstable (the initial ! negates the pattern).
  • pkg/**/Ci*leTest.java will run any test that has pkg somewhere in its package name, followed by any number of other child package names, while the class name should start with Ci followed by zero or more wildcard characters and ending with leTest.java (pkg/CileTest.java matches, as does foo/bar/pkg/baz/CiqwertyleTest.java).
  • *Test#test*One+testTwo????? will run any test whose class name ends in Test and whose method name matches test + any amount of characters + One or testTwo + any five characters. So this will run two test methods in every test class called *Test.
  • #fast*+slowTest will run the methods fast* and slowTest where ever there are found.

We can even have more fun by throwing in regular expressions:

$ mvn test '-Dtest=Basic*, !%regex[.*.Unstable.*],
> !%regex[.*.MyTest.class#one.*|two.*], %regex[#fast.*|slow.*]'
  • !%regex[.*.Unstable.*] will not run any tests that contain Unstable somewhere in their path.
  • !%regex[.*.MyTest.class#one.*|two.*] will run methods named one and two in a class called MyTest. We see that when using globs, we can use forward slashes to delimit packages. However, when using regex, we use dots.
  • %regex[#fast.*|slow.*] will run the test methods fast and slow wherever they are found. Note that (at least in version 3.0.0-M4), the trailing .* wildcard pattern is not needed.

Further Reading

You can read more about Surefire and Failsafe from the comfort of your terminal:

$ mvn surefire:help -Ddetail
$ mvn failsafe:help -Ddetail

You can also read the Surefire and Failsafe online documentation:

Grokking Time Zones in Oracle’s DATE and TIMESTAMP

Table of Contents

Intro

When I was tasked with putting some data in BigQuery, Google Cloud’s data warehouse implementation, I had to look at the data we currently store in our local Oracle database. I found that we were using DATE for fields that store “moments in time”. Normally, I would expect Unix timestamps here, or some other data type that contains the time zone.

Naturally, I complained about this, but people were quick to point out that it doesn’t matter, because you should relate the DATE to the time zone of the database. Oracle, it turns out, is tracking two time zones: the time zone of the database itself:

SELECT dbtimezone
  FROM dual;
-- Europe/Amsterdam

and the time zone of the client session:

SELECT sessiontimezone
  FROM dual;
-- Europe/Moscow

(See here how to update these time zones.)

I was warned that Oracle might change the query results based on either of these time zones, meaning when you insert a data type related to dates or time, you might get back an altered representation based on the time zone of the database or the session of the client.

This did not put my mind at ease at all. So, let’s take a look at what some of the common types relating to date and time are in Oracle, and when we should be careful with the query results.

DATEs

A DATE is a calendar date with a wallclock time without a time zone. If you put 2020-02-20 09:11:43 in the database, you will always retrieve the same exact datetime, no matter whether you’re in Amsterdam or New York. (It is the equivalent of a java.time.LocalDateTime, if that’s your thing.)

SELECT sessiontimezone
  FROM dual;
-- Europe/Amsterdam

SELECT def_time
  FROM users
 WHERE username = 'foo@example.com'
-- 2020-02-20 09:04:50

ALTER SESSION SET time_zone = 'UTC';

SELECT def_time
  FROM users
 WHERE username = 'foo@example.com'
-- 2020-02-20 09:04:50

Plain TIMESTAMPs

A TIMESTAMP (without a time zone) is the same beast as a DATE, except that you can specify times up to a fraction of a second (a DATE needs to be rounded to the nearest second).

TIMESTAMP WITH LOCAL TIME ZONE

Say, you store the value 2020-02-20 09:11:43 +01:00. A client with a sessiontimezone of Europe/Amsterdam in the winter will retrieve this exact value whether or not they retrieve it from an Amsterdam, Sydney or New York database. Another client with a session time zone that is set to America/New_York, however, will get the value 2020-02-20 03:11:43 -05:00.

So: LOCAL means it adjusts the retrieved value to your session time zone.

ALTER SESSION SET time_zone = 'Europe/Moscow';  -- offset +03:00

SELECT to_char("ttz", 'yyyy-MM-dd HH24:MI:SS')
FROM (-- timestamp with local, New York time zone
      SELECT CAST(
              to_timestamp_tz('2020-02-20 09:11:43 -05:00',
                              'yyyy-MM-dd HH24:MI:SS TZH:TZM')
              AS TIMESTAMP WITH LOCAL TIME ZONE) AS "ttz"
      FROM dual);
-- 2020-02-20 17:11:43

See also this tutorial.

TIMESTAMP WITH TIME ZONE

This data type stores the time zone together with the datetime and displays it independently from the session time zone of the client.

If we store a value 2020-02-20 09:11:43 +01:00 in Amsterdam and copy this to a database in New York, this is the value that is returned to clients in both Amsterdam and New York regardless of their session time zone.

Conclusion

DATEs and plain TIMESTAMPs don’t store a time zone and queries will always return the exact same value that was inserted regardless of dbtimezone or sessiontimezone. This makes them safe choices if you somehow don’t care about time zones.

TIMESTAMP WITH LOCAL TIME ZONE and TIMESTAMP WITH TIME ZONE do store time zones. A TIMESTAMP WITH LOCAL TIME ZONE will be represented differently depending on your session time zone. A TIMESTAMP WITH TIME ZONE will not be represented differently; it will just include an offset.

You might now always want to use TIMESTAMP WITH TIME ZONE, but apparently, they have problems of their own.

Falling back to DATEs, however, is problematic because of Daylight Saving Time (DST) in the fall. When you store the current datetime in the fall, and you go through the dance of setting all clocks back from 3:00 to 2:00, you have no way of knowing at what exact point in time the row was inserted. This is bad if you rely on this for anything crucial, such as forensics (“so the hacker copied the entire database at 2:30, but was that before or after we reset our clocks?”).

Using Oracle DATEs in Java

As for me, having to deal with DATEs without an explicit time zone, this means that the application inserting/retrieving these datetimes will need to decide how to interpret them. In my Java application, I will map them to java.time.LocalDateTime, add a time zone and convert them to java.time.Instants:

Instant defTimeInstant = user.getDefTime()
        .atZone(ZoneId.of("Europe/Moscow"))
        .toInstant();

user.getDefTime() returns a LocalDateTime indicating when the user was created. We can give a LocalDateTime a time zone by invoking atZone(ZoneId) on it, returing a ZonedDateTime. Once we have converted the datetime to an actual point on the timeline (no longer being some kind of abstract datetime without a timezone), we can convert it to an Instant by invoking toInstant() on the ZonedDateTime.

Sources

Oracle documentation on datetime data types.

Oracle SQL*Plus Cheatsheet

Table of Contents

Intro

Last week, I suddenly had to work with an Oracle database again. I normally use Intellij’s DataGrip to connect to databases. I tried it this time, and I found I could not connect to the schema I wanted: the schema just turned up empty. Of course, everybody will recommend you use Oracle’s SQL Developer with any Oracle database you have to touch. So, after trying brew search sqldeveloper (yes, I’m on a Mac at work), coming up empty, reading this cask request and feeling the anticipation of endless frustration grow inside me, I went to Oracle’s web site to see if I could download the program. I can, except that they want me to turn in a DNA sample first:

Oracle's create-new-account page
Oracle’s create-new-account page

Of course, faking those kind of details is not impossible, but the hassle of going through something like that just for a lousy program so I can just run a couple of lousy queries puts me off.

Luckily, I had a Docker container with Oracle XE lying around. It includes SQL*Plus, the “venerable” command-line tool provided by Oracle to query and administer Oracle databases. Being a Arch Linux user with a predilection for anything with a command-line interface, it was not too bad, but getting the output formatted so that I could actually make sense of it required more effort than most users are willing to put up with.

So how do you find your way around when you have SQL*Plus and no clue how it works? How do you find the schema’s you want, the tables you need to query? How can you check your privileges?

This post aims to be a cheatsheet so you can find your way around. Keep in mind that the results you’re getting depend on the privileges your user has. Getting an empty result does not mean that the database object does not exist. Rather, it means you’re a lesser god.

Getting Help With sqlplus

sqlplus does not have a man page, but provides help when you pass -h/-help:

$ sqlplus -h

Connecting to an Oracle Database Using SQL*Plus

The basic syntax to connect as user alice with password qwerty to a database FOO which is located on db.domain.tld and listens on port 1521 (default port) is:

$ sqlplus alice/qwerty@db.domain.tld:1521/FOO

Show the Connected User

SHOW USER;

The SHOW command lets you look at the current state of your SQLPlus environment: SHOW ALL shows all settings.

Show All Schema’s

SELECT username
  FROM dba_users u;

Return only non-empty schema’s (excluding most users who never created any object in the database):

SELECT username
  FROM dba_users u
 WHERE EXISTS (-- filter users without database objects
               SELECT 1
               FROM dba_objects o
               WHERE o.owner = u.username);

Excluding Oracle’s built-in schema’s:

SELECT username
  FROM dba_users
 WHERE default_tablespace NOT IN ('SYSTEM', 'SYSAUX');

Source

Show All Tables/Views in a Particular Schema

SELECT table_name
  FROM all_tables
 WHERE owner = 'MYSCHEMA';

Related: find all views:

SELECT view_name, text
  FROM all_views
 WHERE owner = 'MYSCHEMA';

Describe a Table

DESCRIBE table;

Show DDL of a Table

SELECT dbms_metadata.get_ddl('VIEW', 'USERS', 'MYSCHEMA')
  FROM dual;

where the first argument is the type of object (e.g. 'TABLE', 'VIEW', 'TRIGGER'), the second is the name of the object, and the third the schema where the object is defined.

Show the Privileges of Your User

SELECT * FROM USER_SYS_PRIVS;
SELECT * FROM USER_TAB_PRIVS;
SELECT * FROM USER_ROLE_PRIVS;

Source

Get More Decent Output From a Command

If you want SQL*Plus to truncate the value of a column:

SET WRAP OFF;

otherwise it will allow the column to wrap to the next line (default ON).

Suppress all headings, page breaks, titles, the initial blank line and other formatting information:

SET PAGESIZE 0;

Source

Now, if only Oracle would support \G for vertical output…

Get and Alter Database Timezones

SELECT dbtimezone FROM dual;
ALTER DATABASE SET dbtimezone = 'Europe/Amsterdam';
SELECT sessiontimezone FROM dual;
ALTER SESSION SET sessiontimezone = 'Europe/Amsterdam';

Select DATE Rows in the Last Hour

Table t has a column c_date of type DATE:

SELECT *
  FROM table t
 WHERE c_date > sysdate - 1/24;

This works because you can subtract a fraction from a DATE type where the fraction is interpreted as a fraction of a day (1 is an entire day, 0.5 is 12 hours, etc.).

Working Around Oracle Not Having a LIMIT

Yes, Oracle does not have a LIMIT keyword, so this idiom will quickly become ingrained in your muscle memory:

SELECT *
FROM (-- actual query
      SELECT firstName, lastName
      FROM users)
WHERE ROWNUM <= 10;

For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.

Oracle documentation

You could do without the subquery, plainly writing:

SELECT firstName, lastName
FROM users
WHERE ROWNUM <= 10;

but if you add a ORDER BY clause, Oracle will first select the first ten rows, and only then apply the ORDER BY clause, which might not be what you want. So that’s why it’s best to always use the first idiom above.

MariaDB/MySQL Password Exposure in Bash

Table of Contents

How to Provide a Password to mysql?

Recently, I was writing a small Bash program with a GUI (using zenity) that would prompt the user for their MariaDB credentials and then retrieve some information. Passing username and password to the mysql program looks like:

$ mysql -u user -h host -ppassword mydatabase -e 'select 1;'

(Note that there cannot be a space between -p and the beginning of the password.)

Now, this obviously works fine, except for two problems:

  1. We cannot reuse the connection to the database, so we have to make a new connection every time we want to execute a query.
  2. The password of user user is given in the command itself. This means that everyone on the same machine can list the processes and see the command-line arguments passed to mysql.

In this post, I want to talk about the second problem.

$ pgrep -af mysql
8046 mysql -u user -h 192.168.123.5 -px xxxxxxxxxxxxxxxxxxxxxxx mydatabase
$ ls -l /proc/8046/cmdline
-r--r--r-- 1 neftas neftas 0 16 feb 10:33 /proc/8046/cmdline
$ cat -et /proc/8046/cmdline
mysql^@-u^@stefan^@-h^@192.168.123.5^@-px^@xxxxxxxxxxxxxxxxxxxxxxx^@mydatabase^@

As we can see, mysql replaces the provided password with xes, but at the same time warns in man mysql that

Specifying a password on the command line should be considered insecure. You can use an option file to avoid giving the password on the command line.

The option file man mysql is talking about is either one of the configuration files located at $HOME/.my.cnf or /etc/my.cnf, or a custom one specified on the command line. So, one option would be to put the credentials in $HOME/.my.cnf before executing the program, stopping this issue cold in its tracks.

But let’s say that we want to dynamically connect to a database. Asking the user to provide credentials and then write them to a file would provide the same issues as passing it directly to the mysql command. The only difference would be that we now pass the sensitive information to different command:

printf '[client]\nhost=%s\nusername=%s\npassword=%s\n' \
    host username password > creds.tmp
mysql --defaults-extra-file=creds.tmp mydatabase

The Environment Solution

Although not mentioned in my version of man mysql (10.4 from 29 March 2019), the safest way, it turns out, is to set an environment variable named MYSQL_PWD. Why is this safer?

$ ls -l /proc/8046/environ
-r-------- 1 neftas neftas 0 16 feb 11:00 /proc/8046/environ

At least on Linux, environment variables are only readable by the user that started the process. This means other users (beware of root) cannot look at them.

To summarize:

declare -x MYSQL_PWD
IFS='|' read -r username MYSQL_PWD < <(zenity --password --username)
mysql -u "$username" -h host mydatabase

declare -x declares a variable that is going to be exported on assignment. This eliminates the need to export MYSQL_PWD explicitly somewhere down the road: when MYSQL_PWD gets a value assigned, it will export that value.

In the second line, I use process substitution (see Process Substitution in man bash) to provide a pipe-separated string to read that will split the string and assign the values to username and MYSQL_PWD.

Now that MYSQL_PWD is set, we no longer have to worry about providing the password directly to the mysql command.

Versions Used In This Post

This post was written using the following versions:

$ uname -a
Linux cirrus7 5.5.3-arch1-1 #1 SMP PREEMPT Tue, 11 Feb 2020 15:35:41 +0000 x86_64 GNU/Linux
$ mysql --version
mysql  Ver 15.1 Distrib 10.4.12-MariaDB, for Linux (x86_64) using readline 5.1
$ zenity --version
3.32.0

What’s the Difference Between a Login and a Nonlogin Shell?

Table of Contents

This is addressed nicely in the book Unix Power Tools (Oreilly):

When you first log in to a Unix system from a terminal, the system normally starts a login shell. A login shell is typcally the top-level shell in the “tree” of processes that starts with the init process. Many characteristics of processes are passed from parent to child process down this “tree” — especially environment variables, such as the search path. The changes you make in a login shell will affect all the other processes that the top-level shell starts — including any subshells.

So, a login shell is where you do general setup that’s done only the first time you log in — initialize your terminal, set environment variables, and so on. […]

So you could think about a login shell as a shell that is started at startup by the init process (or systemd nowadays). Or as a shell that logs you into the system by your providing a username and a password. A nonlogin shell, by contrast, is a shell that is invoked without logging anybody in.

Is My Current Shell a Login Shell?

There are two ways to check if your current shell is a login shell: First, you can check the output of echo $0: if it starts with a dash (like -bash), it’s a login shell. Be aware, however, that you can start a login shell with bash --login, and echo $0 will output just bash without the leading dash, so this is not a surefire way of find out if you are running a login shell.

Secondly, the Unix StackOverflow offers this way of finding out:

$ shopt -q login_shell && echo login || echo nonlogin

(-q supresses the output of the shopt command.)

The Difference Between Login and Nonlogin That Actually Matters

Practically speaking, the difference between a login shell and a nonlogin shell is in the configuration files that Bash reads when it starts up. In particular, according to man bash:

[…] it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.

You can observe this behavior by putting echo commands in /etc/profile, ~/.bash_profile, ~/.bash_login and ~/.profile. Upon invoking bash --login you should see:

echo from /etc/profile
echo from ~/.bash_profile
$

If the shell is a nonlogin shell, Bash reads and executes commands from ~/.bashrc. Since we are starting a nonlogin shell from within a login shell, it will inherit the environment. Sometimes, this will lead to confusion when we inadvertantly get a login shell, and find out that our configuration from ~/.bashrc is not loaded. This is why many people put something like the following in their .bash_profile:

[[ -r ~/.bashrc ]] && source ~/.bashrc

This test whether .bashrc is readable and then sources it.

Why You Sometimes Want a Login Shell

When you switch users using su you will take the environment of the calling user with you. To prevent this, you should use su - which is short for su --login. This acts like a clean login for a new user, so the environment will not be cluttered with values from the calling user. Just as before, a login shell will read /etc/profile and the .bash_profile of the user you are switching to, but not its .bashrc. This post on StackOverflow shows why you might want to prefer to start with a clean environment (spoiler: your $PATH might be “poisened”).

Conclusion

In this article we saw that the main difference between a login and a nonlogin shell are the configuration files that are read upon startup. We then looked at what the benefits are of a login shell over a nonlogin shell.

Encrypt Device With Veracrypt From the Command Line

Table of Contents

You have a drive that you want to encrypt and use in Linux and other OSes. Then Veracrypt, the successor of Truecrypt, is a good choice. The prerequisite for this tutorial is that you already have created a partition on a drive. See my previous blog post on how to accomplish that. Creating a volume on a partition with data on it will permanently destroy that data, so make sure you are encrypting the correct partition (fdisk -l is your friend).

Encrypt a volume interactively from the command line using Veracrypt…

(The # sign at the beginning of the code examples indicates that the command should be executed as root. You can either use su - or sudo to accomplish this.)

# veracrypt -t --quick -c /dev/sdXX

-t is short for --text (meaning you don’t want the GUI) and should always be used first after the command name. The --quick option is explained in the docs:

If unchecked, each sector of the new volume will be formatted. This means that the new volume will be entirely filled with random data. Quick format is much faster but may be less secure because until the whole volume has been filled with files, it may be possible to tell how much data it contains (if the space was not filled with random data beforehand). If you are not sure whether to enable or disable Quick Format, we recommend that you leave this option unchecked. Note that Quick Format can only be enabled when encrypting partitions/devices.

So, using --quick is less secure, but not specifying it could take (a lot) longer, especially on traditional hard drives (we’re talking hours for 500GB).

Finally, the -c or --create command allows us to specify on which partition we want to create a veracrypt volume. Make sure you change the /dev/sdXX from the example above to the appropriate output of fdisk -l (for example, /dev/sdc1).

This command will interactively guide us to create a new volume:

Volume type:
 1) Normal
 2) Hidden
Select [1]: 1

Encryption Algorithm:
 1) AES
 2) Serpent
 3) Twofish
 4) Camellia
 5) Kuznyechik
 6) AES(Twofish)
 7) AES(Twofish(Serpent))
 8) Camellia(Kuznyechik)
 9) Camellia(Serpent)
 10) Kuznyechik(AES)
 11) Kuznyechik(Serpent(Camellia))
 12) Kuznyechik(Twofish)
 13) Serpent(AES)
 14) Serpent(Twofish(AES))
 15) Twofish(Serpent)
Select [1]: 1

Hash algorithm:
 1) SHA-512
 2) Whirlpool
 3) SHA-256
 4) Streebog
Select [1]: 1

Filesystem:
 1) None
 2) FAT
 3) Linux Ext2
 4) Linux Ext3
 5) Linux Ext4
 6) NTFS
 7) exFAT
Select [2]: 6

Enter password:
WARNING: Short passwords are easy to crack using brute force techniques!

We recommend choosing a password consisting of 20 or more characters. Are you sure you want to use a short password? (y=Yes/n=No) [No]: y

Re-enter password:

Enter PIM:

Enter keyfile path [none]:

Please type at least 320 randomly chosen characters and then press Enter:
Characters remaining: 4

Done: 100.000%  Speed: 61.8 GB/s  Left: 0 s

The VeraCrypt volume has been successfully created.

The volume is now created in the partition and is ready to be mounted.

… Or do it all in a one-liner

# veracrypt --text --quick                      \
        --non-interactive                       \
        --create /dev/sdXX                      \
        --volume-type=normal                    \
        --encryption=AES                        \
        --hash=SHA-512                          \
        --filesystem=NTFS                       \
        --password='Un$@f3'

Use --stdin to read the password from the standard in, instead of supplying it directly to the command, which is considered unsecure.

Mounting the volume

# mkdir /tmp/vera
# veracrypt -t /dev/sdXX /tmp/vera

Unmouting the volume

# veracrypt -d /tmp/vera

More info

$ veracrypt -t -h

-h is short for --help and should be self-explanatory.

Make less Options Permanent (or: the Missing .lessrc)

Table of Contents

The missing $HOME/.lessrc

I often wondered how I could make certain options for less permanent, like -I, for example, which will make search case insensitive. In GNU/Linux, preferences are often stored in rc files. For vim we have .vimrc, for Bash .bashrc, etc:

$ find "$HOME" -maxdepth 1 -name '*rc'
./.vimrc
./.idlerc
./.xinitrc
./.lynxrc
./.old_netrc
./.inputrc
./.bashrc
./.rtorrent.rc
./.sqliterc
./.xdvirc

Environment variable LESS

So, it would make sense to expect a .lessrc. But there is none. Instead, we define a environment variable LESS. My .bashrc:

export LESS="IFRSX"

Breakdown:

  • -I: ignore case when searching
  • -F: quit immediately when the entire file fits in one screen (in effect, mimic cat’s behavior)
  • -R: enable colored output (for example, when piping to less from diff --color=always)
  • -S: truncate long lines instead of wrapping them to the next line
  • -X: don’t clear screen on exit

See man 1 less for all options.

Make a Backup With rsync

Table of Contents

We want to make a backup of data, for example to an external hard drive.

The basic command

Assuming we are in someone’s home directory and we want to copy three source directories Music, Documents and Movies to a destination directory /mnt/external-hdd:

$ rsync -a Music Documents Movies /mnt/external-hdd

A word on slashes

Notice that we omit the trailing forward slash / on the source directories. This means the destination will look like:

/mnt/external-hdd
|-- Music
|   |-- a.mp3
|-- Documents
|   |-- b.txt
|-- Movies
|   |-- c.mp4

If we were to add trailing forward slashes, the upper-level source directories would not be copied and the result would look like:

/mnt/external-hdd
|-- a.mp3
|-- b.txt
|-- c.mp4

Read more about slashes in rsync.

The rsync -a command broken down

rsync -a is equal to rsync --archive and is a convenience command. According to the man page, it equals rsync -rlptgoD.

  • -r or --recurse: recursively copy data
  • -l or --links: copy symlinks as symlinks
  • -p or --perms: preserve permissions
  • -t or --times: preserve modifications times
  • -g or --group: preserve the group
  • -o or --owner: preserve owner
  • -D is the same as --devices --specials:
    • --devices: preserve device files
    • --specials: preserve special files

[…] a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file

Wikipedia

Device files in Linux are usually found under the /dev directory.

See the overall progress of rsync

By default, rsync will show the progress of the individual files that are being copied. If you want the overall progress, you have to add some flags:

$ rsync -a --info=progress2 --no-i-r src dst

--info=progress2 shows the total transfer progress. (To see all available options for --info, execute rsync --info=help). --no-i-r is short for --no-inc-recursive and disables incremental recursion, forcing rsync to do a complete scan of of all directories before starting the file transfer. This is needed to get an accurate progress report, otherwise rsync doesn’t know how much work is left.

Human-readable output can be obtained by passing the -h or --human-readable option.

For a discussion of these options, see also this StackOverflow post.

Partition and Format Drive With NTFS

Table of Contents

Say we bought an external hard drive to back up some stuff from a crashed computer. We can use a Live USB to get at the data and put the data on the external hard drive. Because the data needs to be accessible by Windows, we are going to use format the drive with NTFS.

Create partition

Connect the external hard disk to your computer. Use sudo fdisk -l to find the device name. Output should look something like this:

Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: CT2000MX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x117d68c1

Device     Boot Start        End    Sectors  Size Id Type
/dev/sdb1        2048 3907029167 3907027120  1.8T 83 Linux

As can been seen above, the name of the device is /dev/sdb. We use to name to run fdisk:

$ sudo fdisk /dev/sdb

Notice how we use the name of the device, and not the name of the partition (so /dev/sdb without any numbers attached at the end).

After entering the command above, an interactive menu will be facing you. Type a letter and press Enter to confirm. Changes will only be applied when you type w, so if you make a mistake, just stay calm and press q and you will exit fdisk with your pending changes discarded.

  • Delete all your existing partitions by pressing d. Depending on the amount of partitions, you might have to repeat this several times. If you want to check the current partition table, press p.
  • After all old partitions are deleted, add a new partition by pressing n. If you just want to create a single partition on your drive, accept all the defaults by pressing Enter on each prompt. This will leave you with a single partition that will take up all space on the drive.
  • Back in the main menu, type t to change the partition type. Press L to see all partitions types. Here we are going to choose 7 (HPFS/NTFS/exFAT). “The partition type […] is a byte value intended to specify the file system the partition contains and/or to flag special access methods used to access these partitions” (source). Linux does not care about the partition type, but Windows does, so we have to change it.
  • Press w to write your changes to the disk and exit fdisk.

Format partition with NTFS

Now we create the actual NTFS file system on the drive:

$ sudo mkfs.ntfs -Q -L label /dev/sdX1

(If you don’t have mkfs.ntfs installed, use your distro’s package manager to install it (on Arch Linux it’s in a package called ntfs-3g)).

Breakdown:

  • -Q is the same as --quick, -f or --fast. This will perfom a quick format, meaning that it will skip both zeroing of the volume and and bad sector checking. So obviously, leave this option out if you want the volume to be zeroed or you want error checking. Depending on the size of your partition, this might take quite a while.
  • -L is the same as --label: it’s the identifier you’ll see in Windows Explorer when your drive is connected.
  • dev/sdX1: change the X to the actual letter of your drive we found earlier in this tutorial. You always format a partition, not a drive, so make sure that you put the correct number of the partition you want formatted at the end.

Create Arch Linux Live USB

Table of Contents

Download image and PGP signature

From https://www.archlinux.org/download/. The preferred option is to download the image using bittorrent, so as to unburden the Arch servers.

Verify downloaded image

$ gpg --keyserver pgp.mit.edu                       \
        --keyserver-options auto-key-retrieve       \
        --verify archlinux-version-x86_64.iso.sig

If this disk is being created on an Arch Linux system, you could also invoke:

$ pacman-key -v archlinux-version-x86_64.iso.sig

The -v switch is short for --verify.

Insert USB drive and check the device name

$ sudo fdisk -l

The -l is short for --list, and will display the device names and partition tables.

Output will look like:

Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: CT2000MX500SSD1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x117d68c1

Device     Boot Start        End    Sectors  Size Id Type
/dev/sdb1        2048 3907029167 3907027120  1.8T 83 Linux

Check the output and find the device name of the USB (for instance /dev/sdc). Make sure this device is not mounted, otherwise the next command will fail. Also make sure you note the device name, and not a partition (indicated by a numeral at the end: /dev/sdc1, for example).

Copy Arch Linux image to USB drive

$ sudo dd if=archlinux-2019-01-01-x86_64.iso    \
        of=/dev/sdX                             \
        bs=64K                                  \
        oflag=sync                              \
        status=progess

Breakdown:

  • if indicates the input file (the .iso of the live Linux distro).
  • of, likewise, points to the output file, which is a device in this case. Note that /dev/sdX needs to be replaced with the device name we found in the previous step.
  • bs=64K indicates the block size, which means that dd will read and write up to 64K bytes at a time. The default is 512 bytes. It really depends what the optimal block size is, but several sources indicate that 64K is a good bet on somewhat modern to modern hardware.
  • oflag stands for “output flag”. The sync flag will make sure that all data is written to the USB stick when the dd command exits, so it will be safe to remove the USB stick.
  • status=progress indicates the level of information that is printed during file transfer. progress shows periodic transfer statistics.

Notice that the device does not need to be partitioned or empty before this operation. When dd writes to a device rather than a partition, all data on the drive – including partitions – will be erased anyway.