Git has a very convenient feature called filters. It lets you pipe the content of a given file (selected by an entry in .gitattributes) through an external filter command.

The hook points are file checkout (the so-called "smudge" filter), check-in (the "clean" filter), and diff.

Here are two useful cases for using it:

Formatting files on checkin

If your language has a formatter that simply reads stdin and writes the formatted file to stdout, like Go's gofmt, adding it is very straightforward:

git config --global filter.gofmt.clean "gofmt"
git config --global filter.gofmt.smudge "gofmt" # or "cat" if you don't want to filter on checkout

Then, in either ~/.config/git/attributes or the repository's own .gitattributes:

*.go filter=gofmt

Do remember that in a repo where others don't run the formatter regularly, you might get a bunch of spurious formatting modifications during daily work, especially with the filter enabled on checkout.
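If you want to see what a clean filter does before wiring gofmt into your global config, you can try it in a throwaway repository. Below is a rough sketch, not a recipe: the filter name upcase and the function demo_clean_filter are made up for this demo, and tr stands in for a real formatter so that nothing beyond git is needed.

```shell
#!/bin/sh
# Sketch: watch a clean filter rewrite content on its way into the index.
# "upcase" is a hypothetical filter name; tr stands in for a formatter such as gofmt.
demo_clean_filter() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    git config filter.upcase.clean "tr a-z A-Z"   # runs on check-in
    git config filter.upcase.smudge cat            # checkout leaves content untouched
    echo '*.txt filter=upcase' > .gitattributes
    echo 'hello' > note.txt
    git add note.txt
    git show :note.txt                             # the blob as stored in the index
  )
  rm -rf "$repo"
}
```

Calling demo_clean_filter should print HELLO: the working copy still says hello, but the staged blob went through the clean command.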

Decoding formats that do not diff well

Say you have a configuration management repo. You want to keep some secrets there, so you decide to put them in, say, GPG-encrypted files. But diffs and logs look shit now, what to do?

You need to define a new diff "driver". Here is how it looks in the config:

[diff "gpg"]
	textconv = /path/to/binary/converting/to/text
	command = /path/to/binary/generating/diffs

textconv is used to extract "human-readable" info from the file. It can be plain decoding, it can be, say, extracting EXIF info from an image; it can be anything that returns text.

Without command, that text is also what git commands use to diff the files.

So a first approximation of a solution would be just:

[diff "gpg"]
	textconv = gpg --batch -d 2>/dev/null

and in .gitattributes

*.gpg diff=gpg

2>/dev/null is there to get rid of the key info that GPG prints on stderr.

And voilà! Now you should be able to see the decoded content. Another use would be a textconv of jq . to get pretty-printed JSON files.
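The EXIF case mentioned earlier follows the same pattern. A minimal sketch, assuming exiftool is installed; the driver name exif is just a label picked for this example, and the cachetextconv setting is optional:

```shell
# Assumption: exiftool is on PATH; "exif" is an arbitrary driver name chosen here.
git config --global diff.exif.textconv exiftool
# Optional: cache the converted text so repeated diffs and logs don't re-run the converter.
git config --global diff.exif.cachetextconv true
```

Paired with a line like *.jpg diff=exif in .gitattributes, image changes should then show up as changes in their metadata.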

The astute reader might have noticed that we're losing some info here. The diff will show changes in the file content, but not changes in who the file is encrypted for.

For that we need the command option of the diff driver. That command is called with seven arguments (path, old file, old hash, old mode, new file, new hash, new mode), which will look roughly like this:

hiera-gpg/common/common.gpg /tmp/R5Bydy_common.gpg <old_file_hash> 100644 /tmp/ZrXWey_common.gpg <new_file_hash> 100644
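To check what your own git version actually passes, a driver that only prints its arguments is handy. This is a sketch: the function name describe_diff_args is invented for illustration, and the script quietly does nothing when it is not given seven arguments.

```shell
#!/bin/sh
# Sketch of a diff driver that only reports what git hands to it.
describe_diff_args() {
  if [ "$#" -ne 7 ]; then
    echo "expected 7 arguments, got $#" >&2
    return 1
  fi
  printf 'path:     %s\n' "$1"
  printf 'old file: %s (hash %s, mode %s)\n' "$2" "$3" "$4"
  printf 'new file: %s (hash %s, mode %s)\n' "$5" "$6" "$7"
}

# Act as a driver only when git actually supplied the seven arguments.
if [ "$#" -eq 7 ]; then describe_diff_args "$@"; fi
```

Point the command option of a scratch diff driver at this script and run git diff on a matching file to see the temporary file names, hashes and modes for yourself.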

Those files contain the raw content, before the textconv filter is applied, so you'd need to implement the decoding yourself first. Let's try that with a very dumb JSON differ. This will be our differ:

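#!/bin/bash
# $2 is the old file, $5 the new one. Git treats a non-zero exit from the
# driver as fatal by default, so diff's "files differ" status (1) is mapped to success.
diff -u <(jq -S . "$2") <(jq -S . "$5") || [ $? -eq 1 ]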

We will make a new diff driver:

[diff "json"]
	textconv = jq .
	command = /path/dumbdiffer

and in .gitattributes

*.json diff=json

and we should be getting something like this:

--- /dev/fd/63	2021-06-13 18:00:28.907481716 +0200
+++ /dev/fd/62	2021-06-13 18:00:28.907481716 +0200
@@ -1,6 +1,5 @@
 {
   "annotations": {
-    "enable": false,
     "list": [
       {
         "builtIn": 1,

(the /dev/fd paths are just a quirk of our very simple approach)
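Going back to the GPG case, the same trick can show who a file is encrypted for. What follows is only a sketch under assumptions: that gpg can decrypt without prompting, and that gpg --list-packets reports recipients on lines of the form ":pubkey enc packet: ... keyid <ID>". The names recipient_keyids and gpg_diff are made up for this example.

```shell
#!/bin/sh
# Sketch of a GPG diff driver: recipient key IDs first, then decrypted content.
# Assumptions: gpg decrypts without prompting, and `gpg --list-packets`
# prints recipients as ":pubkey enc packet: ... keyid <ID>" lines.

# Filter: read `gpg --list-packets` output on stdin, print recipient key IDs.
recipient_keyids() {
  sed -n 's/^:pubkey enc packet:.* keyid \([0-9A-Fa-f]*\).*/\1/p' | sort -u
}

# Hypothetical driver body; git passes the old file as $2 and the new file as $5.
gpg_diff() {
  tmp=$(mktemp -d) || return 1
  gpg --batch --list-packets "$2" 2>/dev/null | recipient_keyids > "$tmp/old.keys"
  gpg --batch --list-packets "$5" 2>/dev/null | recipient_keyids > "$tmp/new.keys"
  gpg --batch -d "$2" 2>/dev/null > "$tmp/old.txt"
  gpg --batch -d "$5" 2>/dev/null > "$tmp/new.txt"
  echo "recipients of $1:"
  diff -u "$tmp/old.keys" "$tmp/new.keys"
  echo "content of $1:"
  diff -u "$tmp/old.txt" "$tmp/new.txt"
  rm -rf "$tmp"
  return 0   # report success to git even when the files differ
}

# Act as a driver only when git supplied its seven arguments.
if [ "$#" -eq 7 ]; then gpg_diff "$@"; fi
```

Set it as the command of the gpg diff driver to try it. Note that the decrypted text briefly lands in a private temporary directory; the process substitution used by the JSON differ avoids that if you are fine with requiring bash.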