{"id":10940,"date":"2023-11-03T12:27:38","date_gmt":"2023-11-03T11:27:38","guid":{"rendered":"https:\/\/monodes.com\/predaelli\/?p=10940"},"modified":"2023-11-03T12:27:42","modified_gmt":"2023-11-03T11:27:42","slug":"versioning-data-in-postgres-testing-a-git-like-approach-specfy","status":"publish","type":"post","link":"https:\/\/monodes.com\/predaelli\/2023\/11\/03\/versioning-data-in-postgres-testing-a-git-like-approach-specfy\/","title":{"rendered":"Versioning data in Postgres? Testing a git like approach &#8211; Specfy"},"content":{"rendered":"\n<p><em><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres\">Versioning data in Postgres? Testing a git like approach &#8211; Specfy<\/a><\/em> is fashinating but I think that most of the time these two proposed alternatives fit most of the needs:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>In-Table versioning, the WordPress way of doing thing. Add a a column <code class=\"\" data-line=\"\">version<\/code> (or modify date) and SELECT the maximum version. It&#8217;s simple and doesn&#8217;t require maintaining multiple schema or resources. However it has massive drawback in term of performance, and query simplicity. The table will inevitably grow, and SELECT needs to have an order by which can make joining\/grouping\/aggregating harder or slower.<\/li>\n\n\n\n<li>Copy table versioning: the most simple and efficient alternative. Create a quasi equivalent copy of the table your are versioning, migration is almost 1:1. However you still need to add metadata fields, disable or rename primary key to allow the same <code class=\"\" data-line=\"\">id<\/code> to be inserted multiple times. And you obviously need one table per versionned table.<\/li>\n<\/ol>\n\n\n\n<!--more-->\n\n\n\n<!--nextpage-->\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<h1 class=\"wp-block-heading\">Versioning data in Postgres? Testing a git like approach<\/h1>\n\n\n\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/www.linkedin.com\/in\/bodinsamuel\/\" target=\"_blank\">Samuel Bodin<\/a><\/p>\n\n\n\n<p>Oct 20, 2023<\/p>\n\n\n\n<p>Postgres is probably the software I love the most. Battle tested, resilient and scalable, it has been perfectly crafted in the last 30 years into one the most critical piece of technology out there. However one could complain it&#8217;s not the fastest software to introduce new feature. And when something is missing you might end up being stuck, adding plugins is not easy and often impossible in the cloud, new version migration can be tricky or very slow. Moreover, a much-anticipated feature might take years to see the light of day, like table partitioning. And a <strong>notable absence in PostgreSQL<\/strong> is: data versioning.<\/p>\n\n\n\n<p>Over the years, I had to add versioning in Postgres more than 10 times, and every time in a slightly different way. When I started prototyping Specfy I knew it would my 11th. Keeping <strong>history of different tables is critical<\/strong> to this project, so I needed a simple way to do it at scale with different tables, evolving schema and with as little maintenance as possible.<\/p>\n\n\n\n<p>I listed alternatives and ready to use solution, but all of them had drawbacks (listed at the end) so I continued my search. I got curious about <strong>storing everything in Git.<\/strong> It has out of the box versioning, branching, conflict resolution, scale, etc. After a few days of digging, I realised it was a bit overkill and not the best for structured content like JSON, but it motivated me to try something new, a generic git-like versioning <strong>but in Postgres<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-how-git-internal-works\">\u00a7<\/a>How Git internal works?<\/h2>\n\n\n\n<p>I think it&#8217;s important to understand how git works internally before jumping on the solution. We won&#8217;t replicate all git&#8217;s pattern in Postgres but try to achieve the same way to store and version objects. Please note that the following section is a simplification to make it easier to understand.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-creating-a-file\">\u00a7<\/a>Creating a file<\/h3>\n\n\n\n<p>A git commit flow usually looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">git add myfile.txt\ngit commit -m &quot;initial commit&quot;\ngit push\n<\/code><\/pre>\n\n\n\n<p>Those are high level commands that hides the complexity of git. It&#8217;s actually possible to write a commit using more low-level commands, like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\"># Get the hash of the file&#039;s content\n$hash=${git hash-object -w myfile.txt}\n\n# Add this hash with the file to object storage\ngit update-index --add --cacheinfo 100644 \\\n  $hash myfile.txt\n\n# Write this file to a tree\n# A tree is representation of a folder and its files\n#  each folder has a tree hash\n# For the sake of simplicity we are still using some plumbing\n$treeHash=${git write-tree}\n\n# Commit the tree\ngit commit-tree $treeHash -m &quot;initial commit&quot;\n\n# Update the ref to point the HEAD to the new tree\ngit update-ref refs\/heads\/main $treeHash\ngit push\n<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/git_hash_explanation_create.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>Basically at each step you get an hash: a file has an hash, a tree has an hash, a commit has an hash, a branch is a reference to hash, a tag is a reference to hash, a stash has an hash, etc&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">git log --pretty=raw\n\ncommit 7db0d9cefc38d263af8be1fa4b2f84531c151a60\ntree 93d8ccbc3f275695ad86ca6126f0154b3973be42\nparent 2f4eef4ad79ffda62f9b940d5a95fab7d17c0bd2\nauthor Samuel Bodin &lt;1637651+bodinsamuel@users.noreply.github.com&gt; 1697460399 +0200\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-updating-a-file\">\u00a7<\/a>Updating a file<\/h3>\n\n\n\n<p>Now that we understood the internals we can use the plumbing to update our file.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">echo &quot;foobar&quot; &gt; myfile.txt\ngit add myfile.txt\ngit commit -m &quot;fix: something&quot;\ngit push\n<\/code><\/pre>\n\n\n\n<p>After our &#8220;initial commit&#8221; we now have a <code class=\"\" data-line=\"\">.git<\/code> with one object and one commit. When we update the file, add it and commit it, we create a new object, a new tree and a new commit. Then we push it to the current branch. The new commit will become the new HEAD.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/git_hash_explanation_update.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>As you can see a new commit is a full copy of the file tree, and all previous objects are still there. So when you switch to a branch, checkout a commit or a file, you are getting all the information inside a single commit hash.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-git-is-not-storing-patches\">\u00a7<\/a>Git is not storing patches<\/h3>\n\n\n\n<p>After 10 years of using Git, I realised something, it is not storing patches. Naively when you read a git diff or a pull request, you might think like me, that git is storing a diff between A and B and just displays it when needed. But in reality it stores the full file in the object storage. So <strong>each modification is just a new file<\/strong> (or object), described by an hash, added to a file tree and linked to a commit hash.<\/p>\n\n\n\n<p>That also means at any given time, git has a copy of all your files since the beginning of the project. If you have rewrote a file 999 times, there is 999 times copy of this file in the object storage. That&#8217;s why it can be slow to clone a git repository.<\/p>\n\n\n\n<p>All of this combined, it has many advantages:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>You can follow history by just following an object hash and its parent hash, like a linked list<\/li>\n\n\n\n<li>You can navigate history instantly without having to recompute any diff<\/li>\n\n\n\n<li>You can switch to a branch by changing the HEAD hash<\/li>\n\n\n\n<li>You can rebase by changing a single parent hash<\/li>\n\n\n\n<li>You can diff two versions years appart because you have access to the full files<\/li>\n\n\n\n<li>You can revert to any version by changing a single hash<\/li>\n\n\n\n<li>You can rewrite history by changing a single parent hash<\/li>\n<\/ol>\n\n\n\n<p>Using <code class=\"\" data-line=\"\">git clone --depth 1<\/code> allows you to fasten your clone by not downloading all the history<\/p>\n\n\n\n<p>&#8220;Storing everything at each update&#8221; sounds counter-intuitive and naive, but it is actually the most efficient solution in this scenario. Especially considering storage is cheap and computation is costly and slow.<br \/><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-implementation\">\u00a7<\/a>Implementation<\/h2>\n\n\n\n<p>Following all of this discovery, I came up with this naive and simple implementation in Postgres. I didn&#8217;t use git directly because it&#8217;s hard to maintain a git layer, especially in a short-lived cloud based environment, and I didn&#8217;t need all the features. Calling it a git-like is definitely far fetched, but the main idea is:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>everything is stored in a global object storage<\/strong><\/li>\n\n\n\n<li><strong>identified by a unique hash<\/strong><\/li>\n\n\n\n<li><strong>each update contains the whole object<\/strong><\/li>\n\n\n\n<li><strong>immutable<\/strong><\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-blob-storage\">\u00a7<\/a>Blob storage<\/h3>\n\n\n\n<p>A &#8220;blob&#8221; (or binary large object) is basically a any data object in binary (e.g: text, image, video, etc.). In this blog post, I will use this name to refer to versionned objects. I didn&#8217;t use the name &#8220;objects&#8221; to avoid any confusion, because it&#8217;s a very generic term used in many languages and especially in Javascript.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/table.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>To store all our blobs, we have one table identified by an unique random id <code class=\"\" data-line=\"\">blob_id<\/code> (my equivalent of a git <code class=\"\" data-line=\"\">hash<\/code>). I didn&#8217;t used a content hash for the id because I didn&#8217;t really care for integrity at the time but it could definitely be a fingerprint instead of a random string.<\/p>\n\n\n\n<p>The <code class=\"\" data-line=\"\">blob_id<\/code> is used has a Foreign Key in every other tables to determine their HEAD, so each row has it&#8217;s own HEAD.<\/p>\n\n\n\n<p>The <code class=\"\" data-line=\"\">content<\/code> column will store any type of content, we don&#8217;t care what is inside it&#8217;s only relevant to the associated table. Remember we are aiming to be very generic. It is a <code class=\"\" data-line=\"\">json<\/code> type; we could have stored the content in a <code class=\"\" data-line=\"\">text<\/code> field but it was handy to have access to the data. At some scale it might definitly be interesting to use a <code class=\"\" data-line=\"\">blob<\/code> field.<\/p>\n\n\n\n<p>The <code class=\"\" data-line=\"\">author_id<\/code> and <code class=\"\" data-line=\"\">created_at<\/code> capture by who and when the change was made.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-how-it-works\">\u00a7<\/a>How it works?<\/h2>\n\n\n\n<p>Let&#8217;s explain all of this with a more down to earth example. Let&#8217;s say we are creating a blog. We have a <code class=\"\" data-line=\"\">blobs<\/code> table that will contain our blobs, and a <code class=\"\" data-line=\"\">posts<\/code> table will contain our blog posts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-creating-a-row\">\u00a7<\/a>Creating a row<\/h3>\n\n\n\n<p>When we create a new row, we create a blob, that gives us a <code class=\"\" data-line=\"\">blob_id<\/code>. We can now create a new row in the appropriate table and reference the <code class=\"\" data-line=\"\">blob_id<\/code> to log where the HEAD is.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">INSERT INTO blobs (blob_id, parent_id, content)\n  VALUES (&quot;blob_a&quot;, null, &quot;{title: &#039;Foobar&#039;}&quot;);\n\nINSERT INTO posts (id, blob_id, title)\n  VALUES (&quot;post_a&quot;, &quot;blob_a&quot;, &quot;Foobar&quot;);\n<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/create_blob.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-updating-a-row\">\u00a7<\/a>Updating a row<\/h3>\n\n\n\n<p>We now have a row <code class=\"\" data-line=\"\">post_a<\/code> versionned by <code class=\"\" data-line=\"\">blob_a<\/code>. We want to update the title, and thus create a new version. To do that, we create a new blob referencing its parent with the new values, and update <code class=\"\" data-line=\"\">post_a<\/code>.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">INSERT INTO blobs (blob_id, parent_id, content)\n  VALUES (&quot;blob_b&quot;, &quot;blob_a&quot;, &quot;{title: &#039;Hello World&#039;}&quot;);\n\nUPDATE posts SET blob_id  = &#039;blob_b&#039;, title = &#039;Hello world&#039; WHERE id = &#039;post_a&#039;;\n<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/update_blob.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-reverting-a-row\">\u00a7<\/a>Reverting a row<\/h3>\n\n\n\n<p>Good thing with this system, it&#8217;s rather simple to go back in time and revert to a previous blob. Depending on the level of transparency you want to achieve, we can revert by simply using the hash (#1) or by copying the previous blob (#2).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/revert_blob.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-deleting-a-row\">\u00a7<\/a>Deleting a row<\/h3>\n\n\n\n<p>Deleting a versionned object can be done in many ways depending on what you are looking for.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-option-1-soft-delete\">\u00a7<\/a>Option #1 &#8211; Soft delete<\/h4>\n\n\n\n<p>The classic way. Add a <code class=\"\" data-line=\"\">deleted_at<\/code> field in the <code class=\"\" data-line=\"\">posts<\/code> table and update this field independently. It&#8217;s quick and dirty but you don&#8217;t keep history.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">UPDATE posts SET deleted_at = &#039;2023-10-20&#039; WHERE id = &#039;post_a&#039;;\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-option-2-soft-versionned-delete\">\u00a7<\/a>Option #2 &#8211; Soft versionned delete<\/h4>\n\n\n\n<p>Add a <code class=\"\" data-line=\"\">deleted_at<\/code> field in the <code class=\"\" data-line=\"\">posts<\/code> table and treat delete as a regular update, so you create new blob with the field.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">INSERT INTO blobs (blob_id, parent_id, content)\n  VALUES (&quot;blob_c&quot;, &quot;blob_b&quot;, &quot;{title: &#039;Hello World&#039;, deleted_at: &#039;2023-10-20&#039;}&quot;);\n\nUPDATE posts SET blob_id  = &#039;blob_c&#039;, deleted_at = &#039;2023-10-20&#039; WHERE id = &#039;post_a&#039;;\n<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-option-3-hard-versionned-delete\">\u00a7<\/a>Option #3 &#8211; Hard versionned delete<\/h4>\n\n\n\n<p>Add a <code class=\"\" data-line=\"\">deleted<\/code> boolean field in the <code class=\"\" data-line=\"\">blobs<\/code> table and delete the original row. This keep the <code class=\"\" data-line=\"\">posts<\/code> table clean while still keeping history for transparency or revert.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><\/li>\n<\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">INSERT INTO blobs (blob_id, parent_id, content, deleted)\n  VALUES (&quot;blob_c&quot;, &quot;blob_b&quot;, &quot;{title: &#039;Hello World&#039;}&quot;, true);\n\nDELETE posts WHERE id  = &#039;post_a&#039;;\n<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/hard_delete_blob.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-option-4-full-hard-delete\">\u00a7<\/a>Option #4 &#8211; Full hard delete<\/h4>\n\n\n\n<p>The whole point of this system is to keep history so it&#8217;s a bit counter intuitive to hard delete everything, except to be compliant with GDPR (and local alternatives). For this case I would recommend adding another column to identify the organisation_id or user_id so that global deletion doesn&#8217;t require listing all object ids.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">DELETE blobs WHERE content-&gt;&gt;&#039;id&#039; = &#039;post_a&#039; ;\nDELETE posts WHERE id = &#039;post_a&#039;;\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-listing-versions\">\u00a7<\/a>Listing versions<\/h3>\n\n\n\n<p>If you want to build a way for your user to list versions, revert or diff in the past it&#8217;s rather straigth forward. You might want to optimise the query by creating an index with fields your regularly filter with. On my side, I choosed to duplicate the columns with the classic <code class=\"\" data-line=\"\">type + type_id<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">SELECT *\nFROM blobs\nWHERE content-&gt;&gt;&#039;id&#039; = &#039;post_a&#039;\nORDER BY created_at DESC\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-batching\">\u00a7<\/a>Batching<\/h3>\n\n\n\n<p>A concept of <strong>branch or pull request<\/strong> can be easily replicated with an additional table. This table &#8220;branch&#8221; has a <code class=\"\" data-line=\"\">blob_ids<\/code> array column that will hold many <code class=\"\" data-line=\"\">blob_id<\/code>. Those blobs are not created in the source table until we decide to do so.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/table_with_branch.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<p>This is very handy to batch multiple updates to many different objects, or to allow the creation of multiple new objects in a single transaction. Specfy use this to allow users to modify everything, create a batch of change that is reviewable and diffable, like a pull request but for content.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.specfy.io\/posts\/7\/update_with_branch.png?w=910&#038;ssl=1\" alt=\"\"\/><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-table-migration\">\u00a7<\/a>Table migration<\/h2>\n\n\n\n<p>Now you might be wondering, we have N copies of our data, but only the original table has a versionned schema. If we add or drop a column, what happen to my now outdated blobs?<\/p>\n\n\n\n<p>You have multiples choices here:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-versionned-blobs\">\u00a7<\/a>Versionned blobs<\/h3>\n\n\n\n<p>Store the table version in the blob (or the date of last migration for example) to be able to determine if the schema is still compatible or not. This can help to adapt the API or UI that is built on top of this data, but requires keeping a strategy for all schema version. And not all schema version are incompatible, for example adding a new nullable column doesn&#8217;t change anything to the previous blobs.<\/p>\n\n\n\n<p>I do believe it&#8217;s the hard way, and something you wouldn&#8217;t do anyway with other alternatives.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-schema-migration-data-migration\">\u00a7<\/a>Schema migration + Data migration<\/h3>\n\n\n\n<p>Another possible way of dealing with migration: when you migrate your table, you also migrate your data, the data being basically a copy of the schema it would be the same migration translated to JSON.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">-- #1 Adding a nullable column\n-- Doesnt require additional fix for the data\nALTER TABLE &quot;posts&quot; ADD COLUMN &quot;description&quot; VARCHAR(250);\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">-- #2 Dropping a column\n-- Requires us to also delete the field in the blobs\nALTER TABLE &quot;posts&quot; DROP COLUMN &quot;description&quot;;\n\nUPDATE &quot;blobs&quot; SET content = sub.next\nFROM (SELECT id, content::jsonb - &#039;description&#039; AS next FROM &quot;Blobs&quot;) AS sub\nWHERE &quot;blobs&quot;.id = sub.id;\n<\/code><\/pre>\n\n\n\n<p>You might understand now that this the <strong>biggest drawback<\/strong> to this solution. The problem of <strong>schema drift exists with all versioning solution<\/strong> but this might be one of the slowest. And updating millions\/billions of rows can range from slow to impossible, so you might want to do it asynchronously or find another way to deal with it.<\/p>\n\n\n\n<p>For example, in Specfy, I have an <strong>array of ignored fields<\/strong>, so that when I merge an old blob into a row I can just omit some fields at insert time.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code class=\"\" data-line=\"\">\/\/ Pseudo code\n\n\/\/ List of previously existing fields that are now dropped\nconst ignored = &#091;&#039;description&#039;];\n\n\/\/ select the row\nconst row = SELECT * FROM &quot;blobs&quot; WHERE blob_id = &quot;blob_a&quot;;\n\n\/\/ filter deleted columns\nconst filtered = omit(row.content, ignored);\n\n\/\/ Update the original table\nUPDATE `posts` ${...filtered} WHERE id = row.content.id;\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-conclusion\">\u00a7<\/a>Conclusion<\/h2>\n\n\n\n<p>I feel like it&#8217;s a nice alternative to what we often do. It&#8217;s <strong>easy to use and agnostic<\/strong>. Want to version a new table? Nothing else to do, the system is already there. All the versioning can be done within <strong>a single system, in a single transaction<\/strong>. One table, no other API, no other system to maintain. It scales with your Postgres and can eventually be partionned or sharded.<\/p>\n\n\n\n<p>It has a <strong>major drawback, the migration of data<\/strong>. This is a common issue with all versioning system, but it&#8217;s especially complicated when deleting or updating a column type. If you plan on modifying the schema very often, it might not be the best choice.<\/p>\n\n\n\n<p>This is currently <strong>used by Specfy to version components, projects, documents, etc<\/strong>. With that, we created a <strong>notification system<\/strong> that always point to the appropriate version in time without duplication. Our <strong>revisions feature<\/strong> (pull request equivalent) was also built on top of this, it allows users to create temporary workspace, where they can branch from the HEAD of any data objects, update it, diff it, and merge it when they are are ready.<br \/><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-alternatives-in-postgres\">\u00a7<\/a>Alternatives in Postgres<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-in-table-versioning\">\u00a7<\/a>In-Table versioning<\/h3>\n\n\n\n<p>This is the WordPress way of doing thing. You have a <code class=\"\" data-line=\"\">posts<\/code> table with a column <code class=\"\" data-line=\"\">version<\/code> and SELECT the maximum version. It&#8217;s simple and doesn&#8217;t require maintaining multiple schema or ressources. However it has massive drawback in term of performance, and query simplicity. The table will inevitably grow, and SELECT needs to have an order by which can make joining\/grouping\/aggregating harder or slower.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-outside-versioning\">\u00a7<\/a>Outside versioning<\/h3>\n\n\n\n<p>You can use a NoSql database or even filesystem to version your objects, which can double as backup. However you now need to maintain another API and storage which can be down or require independent maintenance. And you are even at bigger risk of schema drift.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-copy-table-versioning\">\u00a7<\/a>Copy table versioning<\/h3>\n\n\n\n<p>This is the most simple and efficient alternative. Create a quasi equivalent copy of the table your are versioning, migration is almost 1:1. However you still need to add metadata fields, disable or rename primary key to allow the same <code class=\"\" data-line=\"\">id<\/code> to be inserted multiple times. And you obviously need one table per versionned table.<br \/><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-additional-questions\">\u00a7<\/a>Additional questions<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-why-not-using-only-the-blobs-table-directly\">\u00a7<\/a>Why not using only the &#8220;blobs&#8221; table directly?<\/h3>\n\n\n\n<p>For many reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>this table is not optimised for querying except by <code class=\"\" data-line=\"\">id<\/code>.<\/li>\n\n\n\n<li>table will grow subsequently so any other mean of query will be slower and slower over time<\/li>\n\n\n\n<li>indexing other fields requires <code class=\"\" data-line=\"\">jsonb<\/code> data type, that is slower and heavier in memory<\/li>\n\n\n\n<li>foreign key checks in <code class=\"\" data-line=\"\">json<\/code> are lost<\/li>\n\n\n\n<li><code class=\"\" data-line=\"\">json<\/code> syntax is not that well supported in all the tooling<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-why-do-you-spread-the-fields-in-the-end-table-and-not-just-copy-the-json\">\u00a7<\/a>Why do you spread the fields in the end table and not just copy the json?<\/h3>\n\n\n\n<p>Basically for the same reason as the previous question. Jsonb in Postgres is nice and performant but will always be slower than regular access. You also loose foreign key checks and it&#8217;s simply easier in most tooling.<\/p>\n\n\n\n<p>Having regular fields allow us to control the schema with regular tools.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"https:\/\/www.specfy.io\/blog\/7-git-like-versioning-in-postgres#h-is-there-any-database-alternatives-or-extension\">\u00a7<\/a>Is there any database alternatives or extension?<\/h3>\n\n\n\n<p>For me not using Postgres was not a question, but you might be in a different situation. Please mind I have not tried all those alternatives:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Neon <a href=\"https:\/\/neon.tech\/\">neon.tech<\/a> (Open Source)<\/li>\n\n\n\n<li>Dolt <a href=\"https:\/\/www.dolthub.com\/\">www.dolthub.com<\/a> (Open Source)<\/li>\n\n\n\n<li>temporal_tables (Postgres extension) <a href=\"https:\/\/pgxn.org\/dist\/temporal_tables\/\">pgxn.org<\/a><\/li>\n\n\n\n<li>postgresql-tableversion (Postgres extension) <a href=\"https:\/\/github.com\/linz\/postgresql-tableversion\/tree\/master\">github.com\/linz<\/a><\/li>\n<\/ol>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p class=\"excerpt\">Versioning data in Postgres? Testing a git like approach &#8211; Specfy is fashinating but I think that most of the time these two proposed alternatives fit most of the needs:<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"https:\/\/monodes.com\/predaelli\/2023\/11\/03\/versioning-data-in-postgres-testing-a-git-like-approach-specfy\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":4,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[278],"tags":[303],"class_list":["post-10940","post","type-post","status-publish","format-standard","hentry","category-tricks","tag-postgresql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p6daft-2Qs","jetpack-related-posts":[{"id":9759,"url":"https:\/\/monodes.com\/predaelli\/2022\/10\/24\/6-popular-postgresql-cli-libraries\/","url_meta":{"origin":10940,"position":0},"title":"6 Popular PostgreSQL CLI Libraries","author":"Paolo Redaelli","date":"2022-10-24","format":false,"excerpt":"6 Popular PostgreSQL CLI Libraries pgcli - Postgres CLI with autocompletion and syntax highlighting pgsh - Branch your PostgreSQL Database like Git psql - The built-in PostgreSQL CLI client psql2csv - Run a query in psql and output the result as CSV nancy - The Nancy CLI is a unified\u2026","rel":"","context":"In &quot;Documentations&quot;","block_context":{"text":"Documentations","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":9598,"url":"https:\/\/monodes.com\/predaelli\/2022\/09\/14\/choosing-a-postgres-primary-key\/","url_meta":{"origin":10940,"position":1},"title":"Choosing a Postgres Primary Key","author":"Paolo Redaelli","date":"2022-09-14","format":"link","excerpt":"\u00a0Choosing a Postgres Primary Key Choosing a Postgres Primary Key 2022-09-08 \u2022 24 minute read VictorGuest Author Primary keys are important. They uniquely identify rows of data in tables, and make it easy to fetch data. The job of a database is to archive and recall data and you're going\u2026","rel":"","context":"In &quot;Documentations&quot;","block_context":{"text":"Documentations","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3743,"url":"https:\/\/monodes.com\/predaelli\/2018\/02\/03\/postgres-hidden-gems-craig-kerstiens\/","url_meta":{"origin":10940,"position":2},"title":"Postgres hidden gems &#8211; Craig Kerstiens","author":"Paolo Redaelli","date":"2018-02-03","format":false,"excerpt":"citext I've always been fond of PostgreSQL, now this Postgres hidden gems - Craig Kerstiens shows its smart features even more! There are many interesting features of Postgresql that I didn't knew, as I haven't actually used it for a while Postgres has a rich set of features, even when\u2026","rel":"","context":"In &quot;Documentations&quot;","block_context":{"text":"Documentations","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11886,"url":"https:\/\/monodes.com\/predaelli\/2024\/09\/01\/git-copy-a-file-or-directory-from-another-repository-preserving-the-history\/","url_meta":{"origin":10940,"position":3},"title":"Git: Copy a file or directory from another repository preserving the history","author":"Paolo Redaelli","date":"2024-09-01","format":"quote","excerpt":"How to copy a file or directory from another GIT repository while preserving its history? Internet is full of magic formulas each one more complex. Here I\u2019m proposing a much simpler and faster one that is to make a git format-patch for the entire history of the file or subdirectory\u2026","rel":"","context":"In &quot;Tricks&quot;","block_context":{"text":"Tricks","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/tricks\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14199,"url":"https:\/\/monodes.com\/predaelli\/2025\/10\/26\/against-sql\/","url_meta":{"origin":10940,"position":4},"title":"Against SQL","author":"Paolo Redaelli","date":"2025-10-26","format":false,"excerpt":"Against SQL Against SQL Published 2021-07-09 TLDR The relational model is great: A shared universal data model allows cooperation between programs written in many different languages, running on different machines and with different lifespans. Normalization allows updating data without worrying about forgetting to update derived data. Physical data independence allows\u2026","rel":"","context":"In &quot;Documentations&quot;","block_context":{"text":"Documentations","link":"https:\/\/monodes.com\/predaelli\/category\/documentations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":8922,"url":"https:\/\/monodes.com\/predaelli\/2021\/11\/18\/an-introduction-to-the-pg_auto_failover-project\/","url_meta":{"origin":10940,"position":5},"title":"An introduction to the pg_auto_failover project","author":"Paolo Redaelli","date":"2021-11-18","format":"link","excerpt":"\u201cPostgres is awesome, makes developping my application so much easier, it solves so many problems for me!\u201d and the next step \u201cso, how do I run Postgres in Production?\u201d. If you\u2019re not sure how to bridge that gap yourself, how to deploy your first production system with automated failover, then\u2026","rel":"","context":"In &quot;Senza categoria&quot;","block_context":{"text":"Senza categoria","link":"https:\/\/monodes.com\/predaelli\/category\/senza-categoria\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/10940","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/comments?post=10940"}],"version-history":[{"count":0,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/posts\/10940\/revisions"}],"wp:attachment":[{"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/media?parent=10940"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/categories?post=10940"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/monodes.com\/predaelli\/wp-json\/wp\/v2\/tags?post=10940"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}