Use pandoc with Pygments to highlight source code

I am someone who has JavaScript disabled by default in his browser (I use uMatrix in Firefox for that). Only when I trust a site and I need to use functionality that truly depends on JavaScript will I turn it on. This hopefully protects me from most of the known and unknown bad stuff out there on the internet. It also makes me appreciate people who go through the trouble of making their webpages work without JavaScript.

Until recently, I used a JavaScript plugin on this blog to format source code. This bothered me: using JavaScript merely to display some source code seems like overkill, and it forces readers to enable JavaScript in their browsers just to see the code formatted nicely. I wanted to do better than that.

The way I normally write my blog posts is this: I start with a Markdown article and then use pandoc to convert it to HTML, which I then copy and paste into WordPress (if there is a better way to do this, please contact me). I noticed pandoc provides a switch, --filter, with which you can specify an executable that transforms the pandoc output. The only problem is, you have to write that filter yourself. Luckily, I found a GitHub gist that had already figured out how to write one. Here is some Haskell for you:

import Text.Pandoc.Definition
import Text.Pandoc.JSON (toJSONFilter)
import Data.Char (toLower)
import System.Process (readProcess)
import System.IO.Unsafe (unsafePerformIO)

main :: IO ()
main = toJSONFilter highlight

-- Replace every code block with raw HTML produced by pygmentize;
-- leave all other blocks untouched.
highlight :: Block -> Block
highlight (CodeBlock (_, options, _) code) = RawBlock (Format "html") (pygments code options)
highlight x = x

-- The first class on a code block is taken to be the language name;
-- a second class switches on inline line numbers.
pygments :: String -> [String] -> String
pygments code options
         | length options == 1 = unsafePerformIO $ readProcess "pygmentize" ["-l", map toLower (head options), "-f", "html"] code
         | length options == 2 = unsafePerformIO $ readProcess "pygmentize" ["-l", map toLower (head options), "-O", "linenos=inline", "-f", "html"] code
         | otherwise = "<div class=\"highlight\"><pre>" ++ code ++ "</pre></div>"

Note that this program invokes another program, pygmentize, to do the actual highlighting (pygmentize is the command-line tool of the Pygments project). So, install pygmentize with your favorite package manager, install Haskell if you have not done so already, and then compile pygments.hs with:

$ ghc -dynamic pygments.hs
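
If your distribution does not package pygmentize, installing Pygments with pip should also give you the executable:

$ pip install Pygments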

That’s it! Putting it all together, to create a blog post, I can now do:

$ pandoc -F pygments -f markdown -t html5 -o blogpost.html blogpost.md

I added some CSS that makes use of the Pygments classes and voilà: you can now view this blog without having to worry about a JavaScript cryptocurrency miner hijacking your CPU. You’re welcome.
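
If you want to do the same, a matching stylesheet can be generated with pygmentize itself (shown here for the default style; pick any Pygments style you like):

$ pygmentize -S default -f html -a .highlight > pygments.css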

Remove all files except a few in Bash

$ ls -1
153390909910_first
15339090991_second
15339090992_third
15339090993_fourth
15339090994_fifth
15339090995_sixth
15339090996_seventh
15339090997_eighth
15339090998_nineth
15339090999_tenth
15339091628_do_not_delete
root
root.sql

We want to delete all files that start with a timestamp (seconds since the epoch), except the newest file (15339091628_do_not_delete) and the files root and root.sql. The easiest way to do this is to enable the shell option extglob (“extended globbing”), which lets us use patterns to include or exclude files from operations:

$ shopt -s extglob
$ rm !(*do_not_delete|root*)

The last command tells Bash to remove all files except the ones that match either of the patterns (everything ending in do_not_delete and everything starting with root). The patterns are delimited by a pipe character |.

Other patterns that are supported by extglob include:

?(pattern-list)
      Matches zero or one occurrence of the given patterns

*(pattern-list)
      Matches zero or more occurrences of the given patterns

+(pattern-list)
      Matches one or more occurrences of the given patterns

@(pattern-list)
      Matches one of the given patterns

!(pattern-list)
      Matches anything except one of the given patterns
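
For example, with extglob enabled, the timestamped files from the listing above can also be matched with +() (a sketch; try the pattern with ls before handing it to rm):

$ ls -d +([0-9])_*                  # names starting with one or more digits and an underscore
$ rm -- +([0-9])_!(*do_not_delete)  # the same, but keeping the do_not_delete file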

To disable the extended globbing again:

$ shopt -u extglob

References

To read about all the options that extglob gives you, refer to man bash (search for Pathname Expansion). Searching for shopt in the same manual page will turn up all shell options. To see which shell options are currently enabled for your shell, type shopt -p at the prompt.

Bash’s magic space

What does the “magic space” do?

Given the following:

$ find -wholename '*/path/to/file' -print -quit
$ man rm
$ rm -fv !-2:2

In the last line, it would be nice to get some feedback confirming that we are indeed about to delete the second argument of the command from two lines back. If you enable Bash’s so-called “magic space”, history expansion takes place right away when you type a space after !-2:2:

$ rm -fv '*/path/to/file'

How do I enable the magic space?

Put the following in your ~/.inputrc:

$if Bash
    Space: magic-space
$endif

Start a new session, or use bind -f ~/.inputrc to put the changes into effect immediately.

Other ways to achieve the same effect

You could also enable the shell option histverify (shopt -s histverify), which performs the history expansion and gives you another opportunity to modify the command before executing it. This requires you to press enter, though.
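
To make histverify permanent, it can simply go into your ~/.bashrc:

# in ~/.bashrc
shopt -s histverify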

Import contacts (vCards) into Nextcloud

TL;DR

Export your contacts from Google in vCard version 3 format, split the contacts file and use cadaver to upload all files individually to your address book.

The struggle

Last week, I did a fresh install of LineageOS 14.1 on my OnePlus X and decided not to install any GApps. I have been slowly moving away from Google services and, having found replacements in the form of open-source apps or web interfaces, I felt confident I would be able to use my phone without the Google Play Store or Play Services. (F-Droid is now my sole source of apps.)

To tackle the problem of storing contacts and a calendar that could be synced, I installed a Nextcloud instance on a Raspberry Pi 3. Having installed DAVdroid, I got my phone to sync contacts with Nextcloud, but not all of them: it would stop synchronizing after some 120 contacts, while I had more than 400.

I decided to try a different approach, so I exported the contacts on my phone in vCard format and tried to upload them to Nextcloud using the aptly named Contacts app. However, this also failed unexpectedly: I’m running Nextcloud version 12.0.3 and version 2.0.1 of the Contacts app, and it refuses to accept vCard version 2.1 (HTTP response code 415: Unsupported Media Type). This, naturally, is exactly the version Android 6 uses to export contacts.

After some searching, I found out that if you go to contacts.google.com, you can download your contacts in vCard version 3 format. Problem fixed? Well, not so fast: importing 400+ contacts into Nextcloud through the web interface on a Raspberry Pi 3 with an SD card for storage takes a long time. In fact, it never finished over the course of a couple of hours (!), so I needed yet another approach.

Fortunately, you can talk to your Nextcloud instance over the WebDAV protocol with tools such as cadaver:

$ cadaver https://192.168.1.14/nextcloud/remote.php/dav

Storing your credentials in a .netrc file in your home directory will enable cadaver to verify your identity without prompting, making it suitable for scripting:

machine 192.168.1.14
login foo
password correcthorsebatterystaple
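
Since the password is stored in plain text, it is a good idea (though nothing cadaver enforces) to make the file readable only by you:

$ chmod 600 ~/.netrc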

cadaver allows you to traverse the directories of the remote file system over WebDAV. To upload a single local contacts file (from your working machine) to the remote Raspberry Pi, you could tell it to:

dav:/nextcloud/remote.php/dav/> cd addressbooks/users/{username}/{addressbookname}
dav:/nextcloud/remote.php/dav/addressbooks/users/foo/Contacts/> put /home/foo/all.vcf all.vcf

I had a single vcf file with 400+ contacts in it, but after uploading it this way, only a single contact was displayed. Apparently, Nextcloud’s Contacts app assumes that a vcf file contains only a single contact. New challenge: we need to split this single vcf file containing multiple contacts into separate files that we can then upload to Nextcloud.

To split the contacts, we can use awk:

BEGIN {
    # every contact ends with END:VCARD; a regular expression as record
    # separator requires gawk
    RS = "END:VCARD\r?\n"
    FS = "\n"
}
{
    # generate a random file name for each contact (requires pwgen)
    command = "echo -n $(pwgen 20 1).vcf"
    command | getline filename
    close(command)
    print $0 "END:VCARD" > filename
}

This splits the contacts on the record separator END:VCARD and generates a random filename to store each individual contact in. (I also wrote a Java program to do the same thing, which is faster when splitting large files.)
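
Assuming you save the script above as split.awk (a name I am making up here) and run it from an empty working directory, the invocation is:

$ awk -f split.awk all.vcf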

Obviously, it would be convenient if we could now upload all these files in one go. cadaver does provide the mput command for this, but I did not get it to work with wildcards. So instead, I created a file with put commands:

for file in *.vcf; do
    echo "put $(pwd)/$file addressbooks/users/foo/Contacts/$file" >> commands
done

And then provided this as input to cadaver:

$ cadaver http://192.168.1.14/nextcloud/remote.php/dav <<< $(cat commands)

This may take a while (it took around an hour for 400+ contacts), but at least you get to see the progress as each request is made and processed. And voilà, all the contacts are displayed correctly in Nextcloud.

Always-on VPN and captive portals

I was attending a meetup that was taking place in a bar in Utrecht. The first thing you want to do is to make a connection to the internet and get started. The location used a captive portal, however. You know: you have the name of the wireless network (SSID) and the password, but when you try to open any web page, you are directed towards a login page where you have to accept the terms and conditions of whoever is operating the network.

But what if you use an always-on VPN? You cannot connect to the network, because your MAC and IP address are not whitelisted yet by the operator. And you cannot get to the login page, because you do not allow any traffic outside your VPN.

The captive portal page.

UFW

I use ufw (uncomplicated firewall) as my firewall of choice, mainly because the alternative, iptables, always looked too complicated and ufw served its purpose. The rules I have for ufw are currently:

# ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), deny (outgoing), disabled (routed)
New profiles: skip

To                         Action      From
--                         ------      ----
Anywhere on wlp1s0         ALLOW IN    192.168.178.0/24

Anywhere                   ALLOW OUT   Anywhere on tun0-unrooted
192.168.178.0/24           ALLOW OUT   Anywhere on wlp1s0
1194                       ALLOW OUT   Anywhere on wlp1s0
Anywhere (v6)              ALLOW OUT   Anywhere (v6) on tun0-unrooted
1194 (v6)                  ALLOW OUT   Anywhere (v6) on wlp1s0

By default, I deny all incoming and outgoing connections. I only allow incoming connections from hosts on the same network. As for outgoing connections, I only allow them to other hosts on the LAN (to any port) and, everywhere else, only over port 1194 (the OpenVPN port). (wlp1s0 is the name of my wireless interface, tun0-unrooted that of the VPN tunnel. These rules were inspired by this Arch Linux article.)
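
For reference, rules like these can be created with commands along the following lines (a sketch using the interface and network names from above, not a literal reconstruction of the listing):

# ufw default deny incoming
# ufw default deny outgoing
# ufw allow in on wlp1s0 from 192.168.178.0/24
# ufw allow out on tun0-unrooted to any
# ufw allow out on wlp1s0 to 192.168.178.0/24
# ufw allow out on wlp1s0 to any port 1194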

So we need to allow some traffic outside of the VPN tunnel to accept the terms and conditions and to register our machine with the captive portal. The best thing to do would be to allow a single, trusted application to access this portal, one that is used exclusively for this task. If you were to let your regular browser bypass the VPN, it would send all kinds of traffic over the untrusted network for the rest of the world to freely sniff around in (think add-ons, other browser tabs, automatic updates). So we need a dedicated web browser for this task. I’m on Linux and use Firefox as my default browser, so a different browser, GNOME Web, is a good choice for this purpose. (GNOME Web was previously known as Epiphany, and is still available under that name in a lot of distributions.)

First, we need to determine what kind of traffic we want to allow. The application will need to have outbound access to ports 80 (HTTP) and 443 (HTTPS) for web traffic, and it will also need to be able to resolve domain names using DNS, so port 53 should also be opened.

UFW Profiles

However, it’s not that easy to allow one particular application access to the internet if you use UFW. When you look at the man page for UFW, you see you can specify “apps”. Apps (or application profiles) are basically just text files in INI format that live in the /etc/ufw/applications.d/ directory.

To list all (predefined) profiles:

# ufw app list

To create a profile for our purposes, we put the following in a file in that directory; I called mine ufw-webbrowser:

[Epiphany]
title=Epiphany
description=Epiphany web browser
ports=80/tcp|443/tcp|53

The “ports” field is clarified in the man page:

The ‘ports’ field may specify a ‘|’-separated list of ports/protocols where the protocol is optional. A comma-separated list or a range (specified with ‘start:end’) may also be used to specify multiple ports, in which case the protocol is required.

In our case we allow TCP traffic over ports 80 (HTTP) and 443 (HTTPS) and both UDP and TCP traffic over 53 (DNS). We can now use this profile:

# ufw insert 1 allow out to any app Epiphany
Rule added
Rule added (v6)
# ufw status verbose
[...snip...]
To                         Action      From
--                         ------      ----
Anywhere on wlp1s0         ALLOW IN    192.168.178.0/24

80/tcp (Epiphany)          ALLOW OUT   Anywhere
443/tcp (Epiphany)         ALLOW OUT   Anywhere
53 (Epiphany)              ALLOW OUT   Anywhere
Anywhere                   ALLOW OUT   Anywhere on tun0-unrooted
192.168.178.0/24           ALLOW OUT   Anywhere on wlp1s0
1194                       ALLOW OUT   Anywhere on wlp1s0
80/tcp (Epiphany (v6))     ALLOW OUT   Anywhere (v6)
443/tcp (Epiphany (v6))    ALLOW OUT   Anywhere (v6)
53 (Epiphany (v6))         ALLOW OUT   Anywhere (v6)
Anywhere (v6)              ALLOW OUT   Anywhere (v6) on tun0-unrooted
1194 (v6)                  ALLOW OUT   Anywhere (v6) on wlp1s0
# ufw reload  # don't forget to reload firewall after making changes!
Firewall reloaded

(Note that we use insert 1 to make sure the rule is placed in the first position. With UFW, the first rule matched wins.)

Now we can use our dedicated browser to go to the captive portal page and accept the terms and conditions.

Checking traffic with wireshark

You need to be careful, however, not to use any other applications during this time. If you launch Firefox, for example, it can also use the opened ports to communicate with the outside world. I like to run Wireshark to see what communication is actually taking place during this window.

Use Wireshark to see what packets are sent in the open.
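
If you prefer the command line, tshark (Wireshark’s terminal counterpart) can do the same; a rough sketch that captures everything on the wireless interface except the VPN traffic itself:

# tshark -i wlp1s0 -f "not port 1194"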

When you are registered with the WiFi provider and are done with the captive portal, you should disable the profile again in UFW. We can do this by specifying the rule we added earlier, prepending delete:

# ufw delete allow out to any app Epiphany

Another way to delete rules from UFW is to first run ufw status numbered and then ufw delete <number>. However, since we added six rules, this may take a while. Also, if you can’t remember the exact rule that was used, you can run ufw show added to list all added rules and their syntax.

Better solutions: beyond UFW

Now, we see that UFW isn’t exactly ideal for dealing with an always-on VPN and captive portals. What if you have an email application (or something else) running in the background while all those ports are allowed to bypass the VPN tunnel? Also, you have to enable and disable the application profile every time you encounter a captive portal. It would be better if we could allow only a single, named, demarcated application to bypass the VPN.

One solution I’ve read about makes use of what I like to call “the Android way”: every installed application is a user with its own home directory. This means that applications don’t have access to each other’s files, but more importantly, it gives us the opportunity to allow only a specific user to access the internet outside of the VPN. This way, we could create a user epiphany that runs GNOME Web to access the captive portal.
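
With iptables, that idea could be expressed roughly as follows: a sketch only, assuming a dedicated user epiphany and the interface name used earlier, and leaving out the rules that force everything else through the tunnel.

# iptables -A OUTPUT -o wlp1s0 -m owner --uid-owner epiphany -j ACCEPT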

AFWall+, an open-source Android application, uses this method to implement a pretty effective firewall. It also uses iptables as a back-end. I might have to finally bite the bullet and learn iptables after all…

How to use Groovy’s CliBuilder

I write quite a few scripts at work, either for myself or for my team. I used to write these scripts in Bash, but since we use Windows at work, it made sense to switch to a more general scripting language. Because we’re a Java shop, Groovy seemed like a natural choice. Now, it’s often very convenient to have named command-line arguments (e.g. tool --output out.ext --input in.ext) as opposed to positional arguments (e.g. tool out.ext in.ext): named arguments can be given in any order, and you can leave out optional arguments and fall back on their defaults. Groovy’s CliBuilder makes this especially easy.

A sample program

weather.groovy

@Grab(group='commons-cli', module='commons-cli', version='1.4')

class Main {
    static main(args) {
        CommandLineInterface cli = CommandLineInterface.INSTANCE
        cli.parse(args)
    }
}

enum CommandLineInterface {
    INSTANCE

    CliBuilder cliBuilder

    CommandLineInterface() {
        cliBuilder = new CliBuilder(
                usage: 'weather [<options>]',
                header: 'Options:',
                footer: 'And here we put footer text.'
        )
        // set the amount of columns the usage message will be wide
        cliBuilder.width = 80  // default is 74
        cliBuilder.with {
            h longOpt: 'help', 'Print this help text and exit.'
            n(longOpt: 'max-count', args: 1, argName: 'number',
                    'Limit the number of days shown')
            '1' longOpt: 'one', "Show today's forecast"
            '2' longOpt: 'two', "Show today's and tomorrow's forecast"
            _(longOpt: 'url', args: 1, argName: 'URL',
                    'Use given URL to query for weather.')
            D(args: 2, valueSeparator: '=', argName: 'property=value',
                    'Use value for given property.')
        }
    }

    void parse(args) {
        OptionAccessor options = cliBuilder.parse(args)

        if (!options) {
            System.err << 'Error while parsing command-line options.\n'
            System.exit 1
        }

        if (options.h) {
            cliBuilder.usage()
            System.exit 0
        }

        if (options.n) {
            // handle user's input
        }
        if (options.'1') {
            // show weather for one day
        }
        if (options.url) {
            URI uri = new URI(options.url)
        }
        if (options.Ds) {
            assert options.Ds.class == ArrayList.class
        }
    }
}

When invoked with groovy weather.groovy --help, the program will output:

usage: weather [<options>]
Options:
 -1,--one                   Show today's forecast
 -2,--two                   Show today's and tomorrow's forecast
 -D <property=value>        Use value for given property.
 -h,--help                  Print this help text and exit.
 -n,--max-count <number>    Limit the number of days shown
    --url <URL>             Use given URL to query for weather.
And here we put footer text.

You can specify short and long names: cliBuilder.n specifies that the program will accept an argument -n. To also specify a long name, we add longOpt: 'max-count'. To use only a long name, we replace the short name with _: cliBuilder._(longOpt: 'wow-such-long'). Note that cliBuilder._ can be reused several times (in contrast to ordinary short names).

The number of arguments an option takes is indicated with the args parameter. If an option takes more than one, a valueSeparator must also be given, which is used to split up the argument. I found that the number assigned to args is not interpreted very strictly: when the program is called with the argument -Dmyprop, the following assertion holds.

assert options.Ds == ['myprop']

Note how we append an s to the option name (options.Ds) to get a list of properties and values. A normal invocation groovy weather.groovy -Dpancakes=yesplease would result in:

assert options.Ds == ['pancakes', 'yesplease']

When required: true is set, the program will complain if the argument is not provided. For example, if we specify cliBuilder.q(required: true) but fail to provide the -q argument on the command line, the program exits with error: Missing required option: q.
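
On the command line that looks something like this (with the hypothetical -q option from above added to the builder; CliBuilder also prints the usage text after the error):

$ groovy weather.groovy
error: Missing required option: q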

One thing to remember is that, if you want to use digits as argument names, you have to stringify them first by putting quotation marks around them: cliBuilder.'1'.

More on CliBuilder

Check the Groovy documentation on CliBuilder for more.

Dependencies

Groovy’s CliBuilder depends on commons-cli:

build.gradle

dependencies {
    compile group: 'commons-cli', name: 'commons-cli', version: '1.4'
}