A new hash algorithm for Git
The Git source-code management system is famously built on the SHA‑1 hashing algorithm, which has become an increasingly weak foundation over the years. SHA‑1 is now considered to be broken and, despite the fact that it does not yet seem to be so broken that it could be used to compromise Git repositories, users are increasingly worried about its security. The good news is that work on moving Git past SHA‑1 has been underway for some time, and is slowly coming to fruition; there is a version of the code that can be looked at now.
How Git works, simplified
To understand why SHA‑1 matters to Git, it helps to have an idea of how the underlying Git database works. What follows is an oversimplified view of how Git manages objects that can be skipped by readers who are already familiar with this material.
Git is often described as being built on a content-addressable filesystem — one where you can look up an object if you know that object's contents. That may not seem particularly useful, but there's more than one way to "know" those contents. In particular, you can substitute a cryptographic hash for the contents themselves; that hash is rather easier to work with and has some other useful properties.
Git stores a number of object types, using SHA‑1 hashes to identify them. So, for example, the SHA‑1 hash of drivers/block/floppy.c in a 5.6-merge-window kernel, as calculated by Git, is 485865fd0412e40d041e861506bb3ac11a3a91e3. Conceptually, at least, Git will store that version of floppy.c in a file, using that hash as its name; early versions of Git actually did that. If somebody makes a change to floppy.c, even just removing an extra space from the end of a line, the result will have a completely different SHA‑1 hash and will be stored under a different name.
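As a rough illustration of how such a name is derived: Git hashes a short header (the object type, the length of the contents in bytes, and a NUL byte) followed by the contents themselves. The Python sketch below reproduces that calculation; the function name git_blob_hash and the sample file contents are made up for this example, and it should be read as a demonstration rather than as Git's own code.

```python
import hashlib


def git_blob_hash(data: bytes, algo: str = "sha1") -> str:
    """Return the hex object name Git would give a blob with these contents."""
    # Git prepends "blob <size>\0" to the contents before hashing.
    header = b"blob %d\x00" % len(data)
    return hashlib.new(algo, header + data).hexdigest()


# Hypothetical file contents; the only difference is one trailing space.
original = b"int main(void) { return 0; } \n"
trimmed = b"int main(void) { return 0; }\n"

print(git_blob_hash(b""))        # the name of the empty blob
print(git_blob_hash(original))
print(git_blob_hash(trimmed))    # unrelated to the name printed just above
```

For a real file, the result should match what git hash-object prints for it in a SHA‑1 repository.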
A Git repository is thus full of objects (often called "blobs") with SHA‑1 names; since a new one is created for each revision of a file, they tend to proliferate. Your editor's kernel repository currently contains 8,647,655 objects. But blobs are not the only types of objects stored in a Git repository.
An individual file object holds a particular set of contents, but it has no information about where that file appears in the repository hierarchy. If floppy.c is moved to drivers/staging someday, its hash will remain the same, so its representation in the Git object database will not change. Keeping track of how files are organized into a directory hierarchy is the job of a "tree" object. Any given tree object can be thought of as a collection of blobs (each identified by its SHA‑1 hash, of course) associated with their location in the directory tree. As one might expect, a tree object has an SHA‑1 hash of its own that is used to store it in the repository.
Finally, a "commit" object records the state of the repository at a particular point in time. A commit contains some metadata (committer, date, etc.) along with the SHA‑1 hash of a tree object reflecting the current state of the repository. With that information, Git can check out the repository at a given commit, reproducing the state of the files in the repository at that point. Importantly, a commit also contains the hash of the previous commit (or multiple commits in the case of a merge); it thus records not just the state of the repository, but the previous state, making it possible to determine exactly what changed.
Commits, too, have SHA‑1 hashes, and the hash of the previous commit (or commits) is included in that calculation. If two chains of development end up with the same file contents, the resulting commits will still have different hashes. Thus, unlike some other source-code management systems, Git does not (conceptually, at least) record "deltas" from one revision to the next. It thus forms a sort of blockchain, with each block containing the state of the repository at a given commit.
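The way a change ripples from a blob through its tree and into every later commit can be sketched in a few lines of Python. This is a simplified model under stated assumptions: trees are flat (no subdirectories), every file has mode 100644, and the author identity, timestamps, file contents, and helper names (git_object_hash, tree_body, commit_body, snapshot) are invented for the example.

```python
import hashlib


def git_object_hash(kind: bytes, body: bytes) -> bytes:
    """Raw SHA-1 of a Git object: the header "<kind> <size>" plus NUL, then the body."""
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).digest()


def tree_body(entries: dict) -> bytes:
    """Serialize a flat tree: "<mode> <name>" + NUL + raw hash, sorted by name."""
    return b"".join(
        b"100644 " + name.encode() + b"\x00" + raw
        for name, raw in sorted(entries.items())
    )


def commit_body(tree: bytes, parents: list, message: str) -> bytes:
    lines = ["tree " + tree.hex()]
    lines += ["parent " + p.hex() for p in parents]
    # Fixed, made-up identity and timestamp so the result is reproducible.
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def snapshot(files: dict, parents: list, message: str) -> bytes:
    """Hash blobs, then the tree naming them, then the commit naming the tree."""
    blobs = {name: git_object_hash(b"blob", data) for name, data in files.items()}
    tree = git_object_hash(b"tree", tree_body(blobs))
    return git_object_hash(b"commit", commit_body(tree, parents, message))


files_v1 = {"floppy.c": b"/* honest driver */\n"}
files_v2 = {"floppy.c": b"/* honest driver */\n", "README": b"docs\n"}

root = snapshot(files_v1, [], "initial import")
honest_head = snapshot(files_v2, [root], "add README")

# Tamper with floppy.c in the first commit only; the final file contents match.
evil_root = snapshot({"floppy.c": b"/* evil driver */\n"}, [], "initial import")
tampered_head = snapshot(files_v2, [evil_root], "add README")

print(honest_head.hex())
print(tampered_head.hex())
print(honest_head != tampered_head)
```

The two head commits describe identical file contents, yet their hashes differ because each commit's hash covers the hash of its parent.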
Why hash security matters
The compromise of kernel.org in 2011 created a fair amount of concern about the security of the kernel source repository. If an attacker were able to put a backdoor into the kernel code, the result could be the eventual compromise of vast numbers of deployed systems. Malicious code placed into the kernel's build system could be run behind any number of corporate and government firewalls. It was not a pleasant scenario but, thanks to the use of Git, it was also not a particularly likely one.
Let us imagine that some attacker has gained control of kernel.org and wants to place some evil code into floppy.c — something unspeakable like a change that replaces random sectors with segments from Rick Astley videos, say. Somehow this change would have to be incorporated into the repository so that it would be included in subsequent pulls. But the change to floppy.c changes its SHA‑1 hash; that, in turn, will change every tree object containing the evil floppy.c and every commit that includes it as well. The head commit for the repository would certainly change, as would older ones if the attacker tried to make the change appear to have happened in the distant past.
Somewhere out there is certainly some developer who actually memorizes SHA‑1 hashes and would immediately notice a change like that. The rest of us probably would not, but Git will. The distributed nature of Git means that there are many copies of the repository out there; as soon as a developer tries to pull from or push to the corrupted repository, the operation will fail due to the mismatched hashes between the two repositories and the corruption will come to light.
Repository integrity is also protected by signed tags, which include the hash for a specific commit and a cryptographic signature. The chain of hashes leading up to a given tag cannot be changed without invalidating the tag itself. The use of signed tags is not universal in the kernel community (and rare to nonexistent in many other projects), but mainline kernel releases are signed that way. When one sees Linus Torvalds's signature on a tag, one knows that the repository is in the state he intended when the tag was applied.
All of this depends on the strength of the hash used, though. If our attacker is able to modify floppy.c in such a way that its SHA‑1 hash does not change, that modification could well go undetected. That is why the news of SHA‑1 hash collisions creates concern; if SHA‑1 cannot be trusted to detect hostile changes, then it is no longer assuring the integrity of the repository.
The world has not ended yet, fortunately. It is still reasonably expensive to create any sort of SHA‑1 hash collision at all. Creating any new version of floppy.c with the same hash would be hard. An attacker would not just have to do that, though; this new version would have to contain the desired hostile code, still function as a working floppy driver, and not look like an obfuscated C code contest entry (at least not more than it already does). Creating such a beast is probably still unfeasible. But the writing is clearly on the wall; the time when SHA‑1 is too weak for Git is rapidly approaching.
Moving to a stronger hash
Back in the early days of Git, Torvalds was unconcerned about the possibility of SHA‑1 being broken. As a result, he never designed in the ability to switch to a different hash; SHA‑1 is fundamental to how Git operates. As of 2017, the Git code was full of declarations like:
unsigned char sha1[20];
In other words, the type of the hash was deeply wired into the code, and it was assumed that hashes would fit into a 20-byte array.
At that time, Git developer brian m. carlson was already at work to separate the Git core from the specific hash being used; indeed, he had been working on it since 2014. It was unclear what hash might eventually replace SHA‑1, but it was possible to create an abstract type for object hashes that would hide that detail. At this point, that work is done and merged.
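The idea of hiding the algorithm behind an abstract type can be sketched as follows. This is not Git's C code; the names HashAlgo, ObjectId, and hash_object are hypothetical, and the sketch only shows the general shape: the digest size travels with the algorithm instead of being a hard-coded 20.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HashAlgo:
    """Hypothetical descriptor for one object-hash algorithm."""
    name: str
    raw_size: int                      # digest length in bytes
    digest: Callable[[bytes], bytes]   # computes the raw digest of some bytes

    @property
    def hex_size(self) -> int:
        return 2 * self.raw_size


SHA1 = HashAlgo("sha1", 20, lambda data: hashlib.sha1(data).digest())
SHA256 = HashAlgo("sha256", 32, lambda data: hashlib.sha256(data).digest())


@dataclass(frozen=True)
class ObjectId:
    """An object name that remembers which algorithm produced it."""
    algo: HashAlgo
    raw: bytes

    def __post_init__(self) -> None:
        # Reject digests whose length does not fit the declared algorithm.
        if len(self.raw) != self.algo.raw_size:
            raise ValueError("wrong digest length for " + self.algo.name)

    def hex(self) -> str:
        return self.raw.hex()


def hash_object(algo: HashAlgo, kind: str, body: bytes) -> ObjectId:
    """Name an object under the given algorithm; callers never see a size."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return ObjectId(algo, algo.digest(header + body))


for algo in (SHA1, SHA256):
    oid = hash_object(algo, "blob", b"hello\n")
    print(algo.name, algo.hex_size, oid.hex())
```

Code written against ObjectId does not care how wide the hash is, which is the property that makes swapping the algorithm feasible.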
The decision on a replacement hash algorithm was made in 2018. A number of possibilities were considered, but the Git community settled on SHA‑256 as the next-generation Git hash. The commit enshrining that choice cites its relatively long history, wide support, and good performance. The community has also decided on (and mostly implemented) a transition plan that is well documented; most of what follows is shamelessly cribbed from that file.
With the hash algorithm abstracted out of the core Git code, the transition is, on the surface, relatively easy. A new version of Git can be made with a different hash algorithm, along with a tool that will convert a repository from the old hash to the new. With a simple command like:
git convert-repo --to-hash=sha-256 --frobnicate-blobs --climb-subtrees \
    --liability-waiver=none --use-shovels --carbon-offsets
a user can leave SHA‑1 behind (note that the specific command-line options may differ). There is only one problem with this plan, though: most Git repositories do not operate in a vacuum. This sort of flag-day conversion might work for a tiny project, but it's not going to work well for a project like the kernel. So Git needs to be able to work with both SHA‑1 and SHA‑256 hashes for the foreseeable future. There are a number of implications to this requirement that make themselves felt throughout the system.
One of the transition design goals is that SHA‑256 repositories should be able to interoperate with SHA‑1 repositories managed by older versions of Git. If kernel.org updates to the new format, developers running older versions should still be able to pull from (and push to) that site. That will only happen if Git continues to track the SHA‑1 hashes for each object indefinitely.
For blobs, this tracking will happen through the maintenance of a set of translation tables; given a hash generated with one algorithm, Git will be able to look up the corresponding hash from the other. Needless to say, this lookup will only succeed for objects that are actually in the repository. These translation tables will be maintained in the "pack files" that hold most objects in a contemporary Git repository. There will be a separate table for "loose objects" that are stored as separate files rather than in packs; the cost of lookups in that table is seen as being high enough that measures need to be taken to minimize the number of loose objects in any given repository.
The handling of other object types is a bit more complicated. An SHA‑1 tree object, for example, must contain SHA‑1 hashes for the objects in the tree. So if such a tree object is requested, Git will have to locate the SHA‑256 version, then translate all the object hashes contained within it before returning it. Similar translations will be required for commits. Signed tags will contain both hashes.
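A minimal sketch of that translation step, in Python, may make it more concrete. Several things here are assumptions for illustration: an in-memory dictionary stands in for the translation tables kept in pack files, the SHA‑256 form of a tree is assumed to use the same layout as the SHA‑1 form with wider hashes, and the helper names and file contents are invented.

```python
import hashlib


def object_hash(algo: str, kind: bytes, body: bytes) -> bytes:
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.new(algo, header + body).digest()


def tree_body(entries: list) -> bytes:
    """entries: (mode, name, raw_hash) tuples, already sorted by name."""
    return b"".join(mode + b" " + name + b"\x00" + raw for mode, name, raw in entries)


def parse_tree(body: bytes, raw_size: int) -> list:
    """Split a tree body back into (mode, name, raw_hash) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        raw = body[nul + 1 : nul + 1 + raw_size]
        entries.append((body[pos:space], body[space + 1 : nul], raw))
        pos = nul + 1 + raw_size
    return entries


def translate_tree(sha256_body: bytes, to_sha1: dict) -> bytes:
    """Rewrite a SHA-256 tree body into its SHA-1 form using the lookup table."""
    return tree_body(
        [(mode, name, to_sha1[raw]) for mode, name, raw in parse_tree(sha256_body, 32)]
    )


# Hypothetical repository contents.
files = {b"Makefile": b"all:\n", b"floppy.c": b"/* driver */\n"}

to_sha1 = {}            # stand-in for the SHA-256 -> SHA-1 translation table
entries_sha1, entries_sha256 = [], []
for name in sorted(files):
    old = object_hash("sha1", b"blob", files[name])
    new = object_hash("sha256", b"blob", files[name])
    to_sha1[new] = old
    entries_sha1.append((b"100644", name, old))
    entries_sha256.append((b"100644", name, new))

sha256_tree = tree_body(entries_sha256)
translated = translate_tree(sha256_tree, to_sha1)

# The translated tree is byte-for-byte what a SHA-1-only Git would have built.
print(translated == tree_body(entries_sha1))
print(object_hash("sha1", b"tree", translated).hex())
```

A lookup for a hash that is not in the table raises KeyError here, mirroring the point that translation only works for objects actually present in the repository.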
With this machinery in place, Git installations will be interoperable during the transition. Eventually, all users will have upgraded to SHA‑256-capable versions of Git, at which point repository owners could begin turning off the SHA‑1 capability and removing the translation tables. The transition will, at that point, be complete.
Some inconvenient details
There are likely to be some glitches along the way, naturally. One of them is a simple human-factors problem: when a user supplies a hash value, should it be interpreted as SHA‑1 or SHA‑256? In some cases, it's unambiguous; SHA‑1 hashes are 160 bits wide, so a 256-bit hash must be SHA‑256, for example. But a shorter hash could be either, since hashes can be (and often are) abbreviated. The transition document describes a multi-phase process during which the interpretation of hash values would change, but most users are unlikely to go through that process.
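The ambiguity is easy to see in a small Python sketch. The function name possible_algorithms is invented, and the rule it applies (decide by length alone) is an illustration of the problem, not the phased policy described in the transition document.

```python
import string

HEX_DIGITS = set(string.hexdigits.lower())


def possible_algorithms(name: str) -> list:
    """Return the algorithms a (possibly abbreviated) hex object name could belong to."""
    name = name.lower()
    if not name or not set(name) <= HEX_DIGITS:
        return []                      # not a hex string at all
    if len(name) > 64:
        return []                      # longer than a full SHA-256 name
    if len(name) > 40:
        return ["sha256"]              # too long to be SHA-1
    # 40 characters or fewer: a full or abbreviated SHA-1 name, or an
    # abbreviated SHA-256 name; the length alone cannot tell them apart.
    return ["sha1", "sha256"]


print(possible_algorithms("abac87a"))
print(possible_algorithms("f" * 64))
print(possible_algorithms("not-a-hash"))
```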
There is, of course, a way to unambiguously give a hash value in the new Git code, and they can even be mixed on the command line; this example comes from the transition document:
git --output-format=sha1 log abac87a^{sha1}..f787cac^{sha256}
For a Git user interface this is relatively straightforward and concise, but one can still imagine that users might tire of it relatively quickly. The obvious solution to this sort of bracket fatigue is to fully transition a project to SHA‑256 as quickly as possible.
There is another issue out there, though: there are a lot of SHA‑1 hash values in the wild. The kernel repository currently contains over 40,000 commits with a Fixes: tag; each one of those includes an SHA‑1 hash. These hash values also can be found in bug-tracker histories, release announcements, vulnerability disclosures, and more. In a repository without SHA‑1 compatibility, all of those hashes will become meaningless. To address this issue, one can imagine that the Git developers may eventually add a mode where translations for old SHA‑1 hashes remain in the repository, but no SHA‑1 hashes for new objects are added.
Current state
Much of the work to implement the SHA‑256 transition has been done, but it remains in a relatively unstable state and most of it is not even being actively tested yet. In mid-January, carlson posted the first part of this transition code, which clearly only solves part of the problem:
The value of write-only repositories is generally agreed to be relatively low; not even SCCS was so limited. Carlson's purpose in posting the code at this stage is to try to reveal any core issues that will be harder to change as the work progresses. Developers who are interested in where Git is going may well want to take a close look at this code; converting their working repositories over is not recommended, though.
As it turns out, carlson's work goes well beyond what has been put out for testing now; he will post it when he is ready, but really curious people can see it now in his GitHub repository. This work is unlikely to land on the systems of most Git users for some time yet, but it is good to know that it is getting close to ready. The Git developers (carlson in particular) have quietly been working on this project for years; we will all benefit from it.
A new hash algorithm for Git
Posted Feb 3, 2020 18:15 UTC (Mon) by IanKelling (subscriber, #89418) [Link] (1 responses)
A new hash algorithm for Git
Posted Feb 3, 2020 18:34 UTC (Mon) by zdavatz (guest, #70954) [Link]
A new hash algorithm for Git
Posted Feb 3, 2020 18:37 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (21 responses)
One trick that worked for me in a similar case was to switch the encoding. SHA-1 is encoded as hex digits; we can simply encode SHA-256 with the letters "g" to "v" instead, so the new hashes will be immediately recognizable.
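A small Python sketch of that suggestion, assuming the sixteen letters "g" through "v" stand in for the hex digits 0 through f in order (the comment does not spell out the exact mapping, so that ordering and the function names are assumptions):

```python
HEX = "0123456789abcdef"
ALT = "ghijklmnopqrstuv"   # sixteen letters, none of which is a hex digit

TO_ALT = str.maketrans(HEX, ALT)
FROM_ALT = str.maketrans(ALT, HEX)


def encode_alt(hex_name: str) -> str:
    """Re-spell a hex string using the letters g..v."""
    return hex_name.lower().translate(TO_ALT)


def decode_alt(alt_name: str) -> str:
    """Recover the ordinary hex spelling."""
    return alt_name.translate(FROM_ALT)


print(encode_alt("f787cac"))
print(decode_alt(encode_alt("f787cac")))
```

Because the two alphabets do not overlap, even a heavily abbreviated name reveals which encoding it uses.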
A new hash algorithm for Git
Posted Feb 3, 2020 18:41 UTC (Mon) by juliank (guest, #45896) [Link] (14 responses)
A new hash algorithm for Git
Posted Feb 3, 2020 23:48 UTC (Mon) by dsommers (subscriber, #55274) [Link] (13 responses)
* performance: Doing char replacing in strings is more CPU intensive than just skipping one single byte and continuing to use standard functions/libraries. This gets more evident when considering large repositories like the Linux kernel.
* future compatibility: Shifting a-f chars to another set of 6 other letters will only work 3 more times if only considering lower case letters - 6 letters (a-f) * 4 shifts = 24. So at the fifth change, something new must be done to avoid breaking compatibility. Of course the counter argument is "how often will such new algorithms occur in reality?"; but none of us really knows that for sure - just as we don't know how long a git repository will live and be accessed.
From this article (I've not paid attention to discussions in the git community), it seems like they account for the possibility of changing it again later on. So having a prefix possibility with just one prefix or suffix letter makes it possible to change algorithms 26 times, with no performance loss (except the "skip one byte" operation when evaluating the hash). If that is too little, 3 letters gives the possibility for 17,576 changes; which is probably enough for most of us alive today - but using 4 letters increases that once again to an even more insane number.
But say you then settle for a 4-letter prefix (456,976 possibilities) ... then you're not that far away from {sha256}, which is 8 characters, with basically an unlimited number of algorithm changes. What is inside the {} can be any length while containing a good description of what kind of algorithm is in use, without needing to look up that "AAAC" means SHA512.
A new hash algorithm for Git
Posted Feb 4, 2020 0:10 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
A new hash algorithm for Git
Posted Feb 4, 2020 5:56 UTC (Tue) by eru (subscriber, #2753) [Link] (4 responses)
But probably that would be enough, it is unlikely the hash changes more than once in a decade...
A new hash algorithm for Git
Posted Feb 4, 2020 6:06 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]
A new hash algorithm for Git
Posted Feb 4, 2020 16:07 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 2:56 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (1 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 3:32 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
> That'd be entertaining to watch.
Why?
A new hash algorithm for Git
Posted Feb 4, 2020 19:40 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
Hash functions don't operate on the hex encoding of the hash digest. If you need to parse base-16 to binary anyway, there's no penalty arising from choosing an alternate set of characters to represent that base-16 value.
A new hash algorithm for Git
Posted Feb 5, 2020 15:00 UTC (Wed) by obi (guest, #5784) [Link] (5 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 17:26 UTC (Wed) by juliank (guest, #45896) [Link]
A new hash algorithm for Git
Posted Feb 6, 2020 0:41 UTC (Thu) by pj (subscriber, #4506) [Link] (2 responses)
A new hash algorithm for Git
Posted Feb 7, 2020 22:14 UTC (Fri) by Jandar (subscriber, #85683) [Link] (1 responses)
A new hash algorithm for Git
Posted Feb 7, 2020 23:17 UTC (Fri) by Jandar (subscriber, #85683) [Link]
The hash is like an inode-number for a file.
A new hash algorithm for Git
Posted Feb 6, 2020 9:50 UTC (Thu) by ivyl (subscriber, #88764) [Link]
A new hash algorithm for Git
Posted Feb 3, 2020 21:44 UTC (Mon) by josh (subscriber, #17465) [Link] (2 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 17:13 UTC (Tue) by excors (subscriber, #95769) [Link] (1 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 11:55 UTC (Wed) by mgedmin (subscriber, #34497) [Link]
A new hash algorithm for Git
Posted Feb 4, 2020 14:58 UTC (Tue) by ballombe (subscriber, #9523) [Link] (1 responses)
12345678 is perfectly valid in both notations.
A new hash algorithm for Git
Posted Feb 4, 2020 15:51 UTC (Tue) by willy (subscriber, #9762) [Link]
A new hash algorithm for Git
Posted Feb 4, 2020 23:09 UTC (Tue) by flussence (guest, #85566) [Link]
A new hash algorithm for Git
Posted Feb 3, 2020 18:52 UTC (Mon) by meyert (subscriber, #32097) [Link] (29 responses)
A new hash algorithm for Git
Posted Feb 3, 2020 19:03 UTC (Mon) by martin.langhoff (subscriber, #61417) [Link] (19 responses)
A new hash algorithm for Git
Posted Feb 3, 2020 19:58 UTC (Mon) by mirabilos (subscriber, #84359) [Link] (18 responses)
Also, I wonder, will I be able to verify old signed commits and tags after the transition is complete? Doesn’t seem so…
A new hash algorithm for Git
Posted Feb 3, 2020 22:02 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)
SHA-256 offers 2^128 collision resistance to start with; any realistic attack won't lower the complexity below 2^100 (WAY outside the range of feasible attacks).
A new hash algorithm for Git
Posted Feb 4, 2020 2:47 UTC (Tue) by wahern (subscriber, #37304) [Link] (3 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 11:32 UTC (Tue) by heftig (subscriber, #73632) [Link] (2 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 9:19 UTC (Wed) by bmenrigh (subscriber, #63018) [Link] (1 responses)
A new hash algorithm for Git
Posted Feb 5, 2020 15:02 UTC (Wed) by johill (subscriber, #25196) [Link]
A new hash algorithm for Git
Posted Feb 4, 2020 2:16 UTC (Tue) by KaiRo (subscriber, #1987) [Link] (8 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 2:22 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (7 responses)
I was just wondering because the signature is over the SHA-1 hash.
A new hash algorithm for Git
Posted Feb 4, 2020 2:39 UTC (Tue) by KaiRo (subscriber, #1987) [Link] (6 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 19:57 UTC (Tue) by mirabilos (subscriber, #84359) [Link] (5 responses)
When I currently have a signed commit…
-----BEGIN cutting here may damage your screen surface-----
$ git cat-file -p HEAD
tree 937122472a792ada03309a60b7a31e02a29aa764
parent 53861b4a1544c7c8825f1414c37c9694c84c5d92
author mirabilos <m@mirbsd.org> 1580771045 +0100
committer mirabilos <mirabilos@evolvis.org> 1580771470 +0100
gpgsig -----BEGIN PGP SIGNATURE-----
Comment: ☃ ЦΤℱ—8 ☕☂☄
iQIcBAABCQAGBQJeOKiPAAoJEIlQwYleuNOzSzYP/3xowIYpxJwuHfdP8oRekbSZ
eVI9mO5g8KC+SUe5oGCbocH478pBUp5AOYlFGL0awetklijRmF+EeYp+a1IluCww
GD2pSPFCpxSjScERlED5YYpfaaw1XEutoGHYQNMAUQhlRMzS8NwhGJjTuoIbvE4X
hMntoMtDM7sPJ3CIADIoYzXIcdaqsELvqptuvNdo9S/PIyR6OFWhpF68Qn+SILqk
N+fOA/KpgQLsRmMEVy3YtqmMdToYXoP3m4ec0/QSoN90QVrO9ZnVG2+0f9yeEiVn
xEWiaSSsz5vtniBLzOvQ6FeE0h08ZsQi9dcTj8aq3tDtUJb2sQi6q79Gl5StmfHI
8HN9q8ZQP/Vh8kIT5z3lcuNnb3y7sc90ZzY5i7Q2YwfKNbJ5mAEMvSgzBxcrDflR
/kjUJcXJg98IzJsWbE3k9gRc9yatqKQii0GiaxID13fCfl++4klJrFMEyoTdhta4
5a7vGa6OuHr+MWsT+35yQsR6Mt1DnMY2oNArTgWG3DfNQK8zb7rIExPbuV6pLP2O
X67ZCVSHwRTrLWnDHjSuQH4Hfoibq96Ga9wJwEjw0+sWKzg4CgvQH6L+UiXIZO0/
2+hhF507WUCKh8Uit2nrRsGhVnXJrI5QZsD857oAifcBFslbTLwTCkj+3gccHxwH
A/BAeG4zN0JrdvMzx0pN
=9w0P
-----END PGP SIGNATURE-----
erm yes, the symlink…
-----END cutting here may damage your screen surface-----
… or tag…
-----BEGIN cutting here may damage your screen surface-----
$ git cat-file -p mksh-57-6
object 3ece4d6c67f32b8e2b9b00900d05cc06c658fc87
type commit
tag mksh-57-6
tagger mirabilos <mirabilos@evolvis.org> 1580771932 +0100
mksh (57-6) unstable; urgency=low
-----BEGIN PGP SIGNATURE-----
Comment: ☃ ЦΤℱ—8 ☕☂☄
iQIcBAABCQAGBQJeOKpdAAoJEIlQwYleuNOzE3EP/1Qu6w3ZnelCbTcR0/lR1QaH
qisRANlIKYq0MVDOmhzGZ4m6/ri9b2njI16x0R3otaIT2QfG2ldj8U/Sq7Vpm6Xb
uTpMluMzFj6sungPYOCvgbDVcVqt4+qCAwtFL5Lt2gpfN45KwYO0RdrSCY8wFD3N
TO3Wq7M3DXt99F9mMY/L+XfvbpDAMzjCEK0tgTAal4QWnnb7V2Y1bVnZjos5XZTV
hWW4kJMqBp2Hf99KLqnjijfPgZkqbSMYKy14Nsqo1cSujwPpOH2MgDbyuun1SuSA
K6U0JT1iyIsL/ixkCx8vi6ejIGGQXXpGEq4K4RA3Wc4ALB/FWC9Y2MrCEExG0wEV
tDkto90sbD6Nymnii1apG2Q7aSyDNDjsiRT2tzYN2S5EzItYtV0V8ZXoxiYk/c/Z
ttAcdXxh8R4+5p3yNYwAjTSzZe8ohvgHFXoAUGVpk7g9oArlNiJmqkrW3BGdFrCb
gH0h4UpiXr3pgnlPi247alGT18Xly5cBX3CbjORGDNsUDZoGPLlVuyW46PaRel3V
P8BODtOoFkoK7JyFCRP70Z97vQig+L9nbN5tf50haYlxhO7oOSU7RzQJxgv2tLza
AT0bg6Wfs4I9VV/MjocIirwrbihZY1gMgURgad5PdoNjoyNy+vd6OKMFQm1i/eUF
hGIwKngrue1A9RMKPaCG
=JPiZ
-----END PGP SIGNATURE-----
-----END cutting here may damage your screen surface-----
… these hardcode the SHA-1 hashes. These are, thus, needed to verify the signature. This also cannot be rewritten.
As a user, I’d expect that, after full git conversion to a new hash, I’ll still be able to verify these. That was the question.
A new hash algorithm for Git
Posted Feb 5, 2020 9:50 UTC (Wed) by geert (subscriber, #98403) [Link] (2 responses)
Cf. "To address this issue, one can imagine that the Git developers may eventually add a mode where translations for old SHA‑1 hashes remain in the repository, but no SHA‑1 hashes for new objects are added."
A new hash algorithm for Git
Posted Feb 13, 2020 23:57 UTC (Thu) by floppus (guest, #137245) [Link] (1 responses)
Otherwise, assuming you trust the person who generated the signature but you don't trust the contents of the git repository, you have no way of knowing that the commit you're looking at actually corresponds to the same source tree that the person signed.
A new hash algorithm for Git
Posted Feb 14, 2020 1:20 UTC (Fri) by excors (subscriber, #95769) [Link]
Then a regular user can verify a commit (identified by its SHA-256) by using the signed translation table to find the corresponding SHA-1 and checking the committer's signature of that SHA-1. That avoids the performance cost of having to fetch the entire object from disk to compute its SHA-1 before checking the signature, while avoiding the danger of a falsified translation table that tries to link the signed SHA-1 to a totally different commit that doesn't actually match that SHA-1.
As a bonus, if SHA-1 gets completely broken in the future, I think the repository would remain secure. If a future attacker can manufacture a commit whose SHA-1 matches an old signed commit, they could try to insert that commit into the repository with a valid translation table entry (containing the colliding SHA-1 and a new non-colliding SHA-256) and reuse the old commit's signature on their new commit (since it's only signing the SHA-1). If the translation table was unsigned, the attacker could succeed. But if it was signed, there's no way to insert the new translation table entry without tricking Linus into signing the new table. And Linus can avoid being tricked if he simply stops signing any new translation tables beyond the point when SHA-1 gets completely broken (which should be many years away).
A new hash algorithm for Git
Posted Feb 5, 2020 19:40 UTC (Wed) by KaiRo (subscriber, #1987) [Link] (1 responses)
The actual hash that is signed is a different topic. You should be able to verify those signed hashes as long as the original hash is available (part of what the original article is about) and the signature can be trusted (which may not be the case forever, as I was pointing to).
A new hash algorithm for Git
Posted Feb 6, 2020 15:40 UTC (Thu) by luto (subscriber, #39314) [Link]
Maybe Skip SHA-3
Posted Feb 4, 2020 8:19 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
https://www.imperialviolet.org/2017/05/31/skipsha3.html
SHA-3 is significantly slower than SHA-2, which is already very slow for a hash (if we didn't need a crypto hash, there are lots of very, very fast hashes used elsewhere). That is a big penalty when you aren't buying, say, future proofing, and you aren't: SHA-3 was agreed on way before the dust settled on how to do this style of hash, and there are currently half a dozen like it, all seemingly secure, most faster, none standardised. This isn't like AES, where the rough direction is understood and you are now buying hardware that accelerates it, so that not doing AES ends up slower because you lose the hardware assist.
Langley recommends SHA-512/256 (note for those unfamiliar this is literally the name of the hash, not two different hashes you can pick from) if you care about length extension attacks and otherwise SHA-256 is fine. The reason for SHA-512/256 is that the output isn't the entire internal state, it's only half the state, meaning a length extension fails, and it only needs the same size structure to store the hash as SHA-256 (but it is slower).
A new hash algorithm for Git
Posted Feb 4, 2020 10:41 UTC (Tue) by epa (subscriber, #39769) [Link] (1 responses)
> Also, I wonder, will I be able to verify old signed commits and tags after the transition is complete?
Perhaps you would be able to verify them slowly by recomputing the SHA-1 hashes of each object from scratch, even if they aren't stored in the repository.
A new hash algorithm for Git
Posted Feb 5, 2020 17:06 UTC (Wed) by droundy (subscriber, #4559) [Link]
A new hash algorithm for Git
Posted Feb 5, 2020 8:26 UTC (Wed) by bluss (subscriber, #47454) [Link]
It's true the structure and construction are the same, but SHA-2 has a bigger state and much more involved mixing steps per round, and those are the biggest differences versus SHA-1, not the digest length.
A new hash algorithm for Git
Posted Feb 3, 2020 20:00 UTC (Mon) by chfisher (subscriber, #106449) [Link]
A new hash algorithm for Git
Posted Feb 3, 2020 20:15 UTC (Mon)
by dkg (subscriber, #55359)
[Link] (7 responses)
A hash collision for SHA-1 is not theoretical at all. Rather, it is within reach of a moderately funded attacker, on the order of $100K, and has been practically demonstrated by a university+corporate team. The price is expected only to fall.
The authors of the recent "SHA-ttered" collision have this to say about git:
GIT strongly relies on SHA-1 for the identification and integrity checking of all file objects and commits. It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one. An attacker could potentially selectively serve either repository to targeted users. This will require attackers to compute their own collision.
Note that this weakness in git means that even git signatures made with strong modern crypto are vulnerable, because they are signing objects that refer to other objects only by their SHA-1 digest.
For instance, when signing tags, the signed tag itself cannot be replaced, but the thing that the tag points to can be replaced without invalidating the signature.
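The signed-tag point can be shown with a toy model. Everything below is an assumption made for illustration: the object ID is SHA-1 truncated to 16 bits so that a clash can be brute-forced in a moment, an HMAC stands in for a real GPG signature, and the names `weak_id`, `sign`, `verify` and `find_colliding` are hypothetical. A real SHA-1 attack would need the attacker to craft both files in advance rather than search like this.

```python
import hashlib
import hmac
import itertools

SIGNING_KEY = b"demo-key"  # hypothetical key; stands in for a real signing key


def weak_id(content: bytes) -> str:
    """Toy object ID: only the first 16 bits of SHA-1, so clashes are cheap."""
    return hashlib.sha1(content).hexdigest()[:4]


def sign(tag_body: bytes) -> bytes:
    """Stand-in for a strong signature over the tag text."""
    return hmac.new(SIGNING_KEY, tag_body, hashlib.sha256).digest()


def verify(tag_body: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(tag_body), signature)


def find_colliding(target_id: str, avoid: bytes) -> bytes:
    """Brute-force different content that has the same toy ID."""
    for i in itertools.count():
        candidate = b"backdoored source %d" % i
        if candidate != avoid and weak_id(candidate) == target_id:
            return candidate


benign = b"benign source"
# The tag text names its target only by ID; the signature covers the tag text.
tag_body = b"object " + weak_id(benign).encode() + b"\ntag v1.0\n"
signature = sign(tag_body)

evil = find_colliding(weak_id(benign), benign)

# The signature still verifies, yet the ID it vouches for now fits both files.
print(verify(tag_body, signature), weak_id(evil) == weak_id(benign), evil != benign)
```

The signature check never looks at the target's bytes, so its strength is capped by the strength of the hash inside the tag.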
Kudos to carlson for having been working on this; it's a shame that this kind of maintenance work never seems to get prioritized by projects until there is a fire that needs putting out. It would have been better if we had already completed this transition years ago.
A new hash algorithm for Git
Posted Feb 3, 2020 20:27 UTC (Mon)
by walters (subscriber, #7396)
[Link]
A new hash algorithm for Git
Posted Feb 3, 2020 21:11 UTC (Mon)
by martin.langhoff (subscriber, #61417)
[Link] (4 responses)
As many have pointed out, including this article, current attacks match the SHA-1 of an existing file that wasn't built to facilitate the attack in the first place... they have to add a bunch of "random" data to get to a collision. For a code file, which is the typical content of git, that's pretty "visible".
A new hash algorithm for Git
Posted Feb 3, 2020 21:49 UTC (Mon)
by dkg (subscriber, #55359)
[Link] (2 responses)
I recommend reading Joey Hess's discussion from 2011 (in particular the discussion in the comments) for why the legibility of the commit messages and code objects typically covered by git is not necessarily sufficient: other stuff can be included in the hashes that won't be visible to normal end users. (maybe this was fixed in the last decade? i haven't tested recently)
Even if it were somehow true that git hashes only cover the things that are directly exposed to the user, "git history is cryptographically strong for repositories that contain only human-readable code" is a significant reduction in scope from "git history is cryptographically strong". I don't think we want to make that reduction, and i know of no repositories (and no tooling) that would deliberately enforce that kind of limitation for the sake of retaining cryptographic strength of the git history.
Also, many "code only" repositories contain the occasional binary graphic file (screenshot, logo etc), firmware, test corpus, etc, all of which could be used to hide the "tumor" needed for this kind of collision attack.
This needs fixing, and we've known it needed fixing for nearly as long as git has been around. Why advance an argument that seems like it would only help to delay getting a fix deployed?
A new hash algorithm for Git
Posted Feb 3, 2020 23:39 UTC (Mon)
by martin.langhoff (subscriber, #61417)
[Link]
It's not known to be broken today in a useful, usable way, for the typical uses of git.
A new hash algorithm for Git
Posted Feb 4, 2020 15:25 UTC (Tue)
by joey (guest, #328)
[Link]
A new hash algorithm for Git
Posted Feb 3, 2020 22:03 UTC (Mon)
by khim (subscriber, #9252)
[Link]
I don't know where you get the notion that solving the problem of "creating an existing file that wasn't built to facilitate the attack" is even remotely possible.
Not even MD-4 is broken by preimage attacks in practice. MD-4 was "broken" with a preimage attack of complexity 2¹⁰², which is really worrying: maybe in a few more years with some ASICs… maybe… Very unlikely though: very few entities could spend literally trillions of dollars to show that an old, almost completely forgotten hash is no longer useful. There exists a theoretical attack on MD-5 of complexity 2¹²³⋅⁴, but if you recall that there are 2¹²⁸ MD-5 hashes in total… that's a pretty trivial improvement. SHA-1 doesn't even have a theoretical preimage attack currently (but there are a few for "reduced" versions, which means we may soon see something for the full one).
So no, don't expect a preimage attack on SHA-1 to happen in your lifetime… unless you plan to live for 300 years.
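To put rough numbers on those exponents, here is a back-of-the-envelope sketch. The throughput of 10¹² hash evaluations per second is an assumption picked purely for illustration, not a measured figure for any real hardware, and `years_for` is a helper invented for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
HASHES_PER_SECOND = 1e12  # assumed attacker throughput, purely illustrative


def years_for(log2_work: float, rate: float = HASHES_PER_SECOND) -> float:
    """Years needed to perform 2**log2_work hash evaluations at `rate` per second."""
    return 2.0 ** log2_work / rate / SECONDS_PER_YEAR


# How much the theoretical MD-5 attack saves over trying all 2^128 hashes.
md5_speedup = 2.0 ** (128 - 123.4)

print(f"MD-5 attack speedup over brute force: about {md5_speedup:.0f}x")
print(f"MD-4 preimage (2^102):   about {years_for(102):.1e} years")
print(f"MD-5 preimage (2^123.4): about {years_for(123.4):.1e} years")
```

Even at that generous rate, the MD-4 figure comes out far beyond any human timescale, and the MD-5 attack only shaves a factor of a couple of dozen off brute force.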
Now, collision attacks are pretty easy for MD-4 and MD-5, and relatively easy for SHA-1 (tens of thousands of dollars), but they all require the attacker to "plant a bomb" in the "good repo". These are still nasty enough to worry about, but as you could guess the urgency is quite low: I still think it's cheaper to just submit a dozen patches with subtle buffer overflows and get one of them accepted than to generate such a collision. But the price goes down each year…
A new hash algorithm for Git
Posted Feb 3, 2020 21:56 UTC (Mon)
by dkg (subscriber, #55359)
[Link]
I should also mention that the "shambles" attack published in January 2020 claims costs of $11K (USD) for an arbitrary collision and $45K (USD) for a chosen-prefix collision.
A new hash algorithm for Git
Posted Feb 3, 2020 22:02 UTC (Mon)
by josh (subscriber, #17465)
[Link] (4 responses)
(I *don't* want to bikeshed the hash selection here. But I wonder if that hash selection might be worth benchmarking and re-evaluating now that the infrastructure is ready.)
A new hash algorithm for Git
Posted Feb 4, 2020 2:07 UTC (Tue)
by KaiRo (subscriber, #1987)
[Link]
A new hash algorithm for Git
Posted Feb 4, 2020 14:40 UTC (Tue)
by cesarb (subscriber, #6266)
[Link]
A new hash algorithm for Git
Posted Feb 5, 2020 15:17 UTC (Wed)
by smoogen (subscriber, #97)
[Link]
1. Like a game of scissors-rocks-paper, there is no one right choice that 'wins'. You choose shasha versus chachacha and you find out that both are weak against an attack that plugh294 isn't. However, plugh294 is weak against an attack that shasha is good at and chachacha is sort of OK against... etc etc etc.
2. Because of that, and because the attacks get better, any choice you make at time X will look bad at time X+1. This leads to a lot of projects doing the 'jump to the latest findings' dance, switching crypto or checksums or signing to the latest thing, which was written to be stronger than whatever you chose at time X. However, also due to 1, you end up finding that you have to keep hopping.
In the end, you just have to choose something and implement it and know that you will have to choose something else again in 2-3 years and implement that. There is no 'right' choice. There are just infinitely many 'wrong' ones, which are either more wrong or less wrong.
A new hash algorithm for Git
Posted Feb 3, 2020 22:10 UTC (Mon)
by newren (guest, #5160)
[Link]
All that said, I'm glad Brian is doing such great work in transitioning the codebase over to a newer hash algorithm. It's a huge pile of work, and I'm glad he's been tackling it.
A new hash algorithm for Git
Posted Feb 4, 2020 5:21 UTC (Tue)
by pabs (subscriber, #43278)
[Link] (1 responses)
For example: I would like to see restic/borg style rolling chunking, for more efficient storage of large files.
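For readers who haven't met rolling chunking, here is a minimal sketch of the general idea (content-defined chunking). It is not the algorithm restic or borg actually use; it is a simple gear-style rolling hash, and the table seed, the `BOUNDARY_BITS` tuning and the function name `chunk` are all assumptions made for this example.

```python
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # fixed pseudo-random byte table
BOUNDARY_BITS = 10  # assumed tuning: roughly 1 KiB average chunk size


def chunk(data: bytes) -> list:
    """Split data wherever the top bits of a gear-style rolling hash are all zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - BOUNDARY_BITS) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


source = random.Random(1)
original = bytes(source.getrandbits(8) for _ in range(64 * 1024))
edited = b"X" + original  # a one-byte insertion at the front

original_chunks = chunk(original)
shared = set(original_chunks) & set(chunk(edited))
print(len(original_chunks), "chunks,", len(shared), "unchanged after the edit")
```

Because boundaries depend only on nearby content, a small edit disturbs the chunks around it and the rest can be stored once, which is where the storage saving for large files comes from.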
A new hash algorithm for Git
Posted Feb 5, 2020 11:54 UTC (Wed)
by nix (subscriber, #2304)
[Link]
A new hash algorithm for Git
Posted Feb 4, 2020 11:45 UTC (Tue)
by keeperofdakeys (guest, #82635)
[Link] (2 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 20:40 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
Collisions are not a second pre-image attack. The bad guys create two blobs, which are the same size, and have the same hash but are different. They get to show you either blob and trick you by substituting the other one which you'll believe is the same because it has the same SHA-1.
An attacker would need to target git specifically, yes, but it isn't particularly more difficult as a result of tracking size and type.
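A toy version of that collision search, with the size and type header included, looks like the sketch below. It assumes a Git-style "blob size" header and truncates SHA-1 to 24 bits so that a plain birthday search finishes quickly; the real attacks on full SHA-1 rely on cryptanalysis, not brute force. The names `toy_blob_id` and `find_collision` are invented for this example.

```python
import hashlib
import itertools


def toy_blob_id(content: bytes, bits: int = 24) -> int:
    """Git-style blob hash (type and size in the header), truncated to `bits` bits."""
    header = b"blob %d\0" % len(content)
    digest = hashlib.sha1(header + content).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits: int = 24):
    """Birthday search: generate same-size blobs until two share a toy ID."""
    seen = {}
    for i in itertools.count():
        blob = b"payload-%012d" % i  # fixed width, so every blob has the same size
        oid = toy_blob_id(blob, bits)
        if oid in seen:
            return seen[oid], blob
        seen[oid] = blob


a, b = find_collision()
print(a, b, hex(toy_blob_id(a)))
```

Since the attacker picks both blobs, keeping them the same size and type costs nothing extra; the header is simply part of what gets hashed for every candidate.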
A new hash algorithm for Git
Posted Feb 5, 2020 15:44 UTC (Wed)
by iabervon (subscriber, #722)
[Link]
Would your project notice unmotivated color table entries in an image and ask why it was done in such an unintuitive way? Would you go through the layout logic in a PDF, rather than just looking at it?
A new hash algorithm for Git
Posted Feb 4, 2020 15:04 UTC (Tue)
by osma (subscriber, #6912)
[Link] (3 responses)
A new hash algorithm for Git
Posted Feb 4, 2020 20:18 UTC (Tue)
by Hattifnattar (subscriber, #93737)
[Link]
A new hash algorithm for Git
Posted Feb 4, 2020 21:20 UTC (Tue)
by meuh (guest, #22042)
[Link]
I've not found this suggestion being rejected in https://github.com/git/git/blob/v2.25.0/Documentation/tec... but I would assume there's a catch!
A new hash algorithm for Git
Posted Feb 5, 2020 20:11 UTC (Wed)
by meuh (guest, #22042)
[Link]
A new hash algorithm for Git
Posted Feb 5, 2020 11:25 UTC (Wed)
by unixbhaskar (guest, #44758)
[Link] (1 responses)
A new hash algorithm for Git
Posted Feb 6, 2020 19:59 UTC (Thu)
by kpfleming (subscriber, #23250)
[Link]
You did it Jon; now the ICO vultures and others will be all over this site looking for their next HODL opportunity.