git-annex.git
9 months ago--json for addcomputed and recompute
Joey Hess [Mon, 17 Mar 2025 19:51:43 +0000 (15:51 -0400)]
--json for addcomputed and recompute

Not very useful, but it does work.

9 months agorecord fscked files in fsck db by default
Joey Hess [Mon, 17 Mar 2025 19:34:08 +0000 (15:34 -0400)]
record fscked files in fsck db by default

Remember the files that are checked, so a later run with --more will
skip them, without needing to use --incremental.

9 months agoMerge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Mon, 17 Mar 2025 18:33:11 +0000 (14:33 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com

9 months agodecided to leave message as-is
Joey Hess [Mon, 17 Mar 2025 18:31:43 +0000 (14:31 -0400)]
decided to leave message as-is

"getting input <file> from <remote>"  is talking about the original
input filename. I think that's ok.

9 months agodecided addcomputed will not support annex.smallfiles
Joey Hess [Mon, 17 Mar 2025 18:27:56 +0000 (14:27 -0400)]
decided addcomputed will not support annex.smallfiles

If it did, recompute would need to somehow support recomputing
non-annexed files.

And, annex.smallfiles is typically used for configuration files or
source code kind of things, where the user doesn't want it to be an
annexed file. Computed artifacts are not likely that kind of thing.

Also, git-annex importfeed is an example of something that does support
annex.addunlocked, but does not support annex.smallfiles.

9 months agoannex.addunlocked support for git-annex compute
Joey Hess [Mon, 17 Mar 2025 18:26:09 +0000 (14:26 -0400)]
annex.addunlocked support for git-annex compute

And for git-annex recompute, add the file unlocked when the original is
unlocked.

9 months agosupport building with old version of bytestring
Joey Hess [Fri, 14 Mar 2025 18:44:22 +0000 (14:44 -0400)]
support building with old version of bytestring

9 months agofix comment typo
Joey Hess [Fri, 14 Mar 2025 15:36:40 +0000 (11:36 -0400)]
fix comment typo

9 months ago(no commit message)
Atemu [Fri, 14 Mar 2025 12:32:04 +0000 (12:32 +0000)]

9 months ago(no commit message)
Atemu [Fri, 14 Mar 2025 12:14:03 +0000 (12:14 +0000)]

9 months ago(no commit message)
Atemu [Fri, 14 Mar 2025 12:11:19 +0000 (12:11 +0000)]

9 months ago(no commit message)
msz [Wed, 12 Mar 2025 19:50:01 +0000 (19:50 +0000)]

9 months agoAdded a comment
msz [Wed, 12 Mar 2025 19:44:23 +0000 (19:44 +0000)]
Added a comment

9 months agoadd compute tip
Joey Hess [Wed, 12 Mar 2025 17:43:50 +0000 (13:43 -0400)]
add compute tip

9 months agorecompute: stage new version of file in git
Joey Hess [Wed, 12 Mar 2025 17:36:16 +0000 (13:36 -0400)]
recompute: stage new version of file in git

When writing doc/tips/computing_annexed_files.mdwn, I noticed
that a recompute --reproducible followed by a drop and a re-get did not
actually test if the file could be reproducibly computed again.

Turns out that get and drop both operate on staged files. If there is an
unstaged modification in the work tree, that's ignored. Somewhat
surprisingly, other commands like info do operate on unstaged files. So
behavior is inconsistent, and fairly surprising really, when there are
unstaged modifications to files.

Probably this is rarely noticed because `git-annex add` is used to add a
new version of a file, and then it's staged. Or `git mv` is used to move
a file, rather than `mv` of a file over top of an existing file. So it's
uncommon to have an unstaged annexed file in a worktree.

It might be worth making things more consistent, but that's out of scope
for what I'm working on currently.

Also, I anticipate that supporting unlocked files with recompute will
require it to stage changes anyway.

So, make recompute stage the new version of the file.

I considered having recompute refuse to overwrite an existing staged
file. After all, whatever version was staged before will get lost when
the new version is staged over top of it. But, that's no different than
`git-annex addcomputed` being run with the name of an existing staged
file. Or `git-annex add` being run with a new file content when there is
an existing staged file. Or, for that matter, `git add` being run with a
new content when there is an existing staged file.

9 months agotodo
Joey Hess [Wed, 12 Mar 2025 16:11:39 +0000 (12:11 -0400)]
todo

9 months agofix recompute --reproducible run on a VURL key
Joey Hess [Wed, 12 Mar 2025 15:48:29 +0000 (11:48 -0400)]
fix recompute --reproducible run on a VURL key

This avoids "Cannot generate a key for backend VURL", and makes it use
the usual hashing backend.

9 months agoimprove
Joey Hess [Tue, 11 Mar 2025 16:54:34 +0000 (12:54 -0400)]
improve

9 months agocomment
Joey Hess [Tue, 11 Mar 2025 16:53:32 +0000 (12:53 -0400)]
comment

9 months agoMerge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Tue, 11 Mar 2025 16:42:10 +0000 (12:42 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com

9 months agobuffer responses to compute programs in a TQueue
Joey Hess [Tue, 11 Mar 2025 16:40:21 +0000 (12:40 -0400)]
buffer responses to compute programs in a TQueue

This avoids a potential problem where the program sends several INPUT
before reading responses, so flushing the response to the pipe could
block. It's unlikely, but seemed worth making sure it can't happen.
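
As a rough illustration of the idea (not the actual Haskell code, which uses a TQueue), the same decoupling can be sketched in Python. The names `serve`, `respond` and `write` are hypothetical; `queue.Queue` stands in for the TQueue, and a single writer thread is the only thing that ever writes to the pipe:

```python
import queue
import threading
from typing import Callable, Iterable


def serve(requests: Iterable[str],
          respond: Callable[[str], str],
          write: Callable[[str], None]) -> None:
    """Answer each request line without ever blocking the reader on a write."""
    pending = queue.Queue()  # unbounded FIFO, so put() never blocks

    def writer() -> None:
        # Drain buffered responses to the pipe, in order, until the sentinel.
        while True:
            item = pending.get()
            if item is None:
                return
            write(item)

    thread = threading.Thread(target=writer)
    thread.start()
    for line in requests:
        # Only enqueue here; a slow or stalled pipe cannot stall this loop.
        pending.put(respond(line))
    pending.put(None)  # sentinel: no more responses
    thread.join()
```

Because the queue is unbounded, the loop that reads requests never waits on the pipe, and responses still go out in request order.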

9 months agoclose off newline injection attacks against compute special remote protocol
Joey Hess [Tue, 11 Mar 2025 16:04:58 +0000 (12:04 -0400)]
close off newline injection attacks against compute special remote protocol
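
The commit message does not say how this was done. One common way to close this class of hole in a line-based protocol, sketched here in Python with hypothetical helper names (`is_safe_value`, `protocol_line`), is to refuse any value containing a line terminator before it is formatted into a message:

```python
def is_safe_value(value: str) -> bool:
    """A value is safe for a one-line message if it cannot end the line early."""
    return "\n" not in value and "\r" not in value


def protocol_line(keyword: str, value: str) -> str:
    """Format one protocol message, rejecting embedded line terminators.

    Without this check, a filename such as "a\\nINPUT secret" would be
    read by the other side as two separate messages.
    """
    if not is_safe_value(value):
        raise ValueError("line terminator in protocol value: %r" % value)
    return "%s %s\n" % (keyword, value)
```

This is only an assumption about the general technique, not a description of what git-annex does internally.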

9 months agoupdate
Joey Hess [Tue, 11 Mar 2025 15:53:14 +0000 (11:53 -0400)]
update

9 months agoavoid error on missing compute state in checkKey
Joey Hess [Tue, 11 Mar 2025 15:49:47 +0000 (11:49 -0400)]
avoid error on missing compute state in checkKey

This improves eg `git-annex move --to` a compute remote that does not
contain the key. Rather than erroring with "Missing compute state" when
it checks if the key is in the remote, it proceeds to trying to store to
it, which has a nice error message.

9 months agoadd INPUT-REQUIRED
Joey Hess [Tue, 11 Mar 2025 15:46:31 +0000 (11:46 -0400)]
add INPUT-REQUIRED

Used by git-annex-compute-singularity to make addcomputed --fast work.

Also, simplified git-annex-compute-singularity; there is no need to hard
link the container into place. singularity does not care about the
extension of the container, so can just pass it the annex object file.

9 months agoAdded a comment: just thinking out loud
yarikoptic [Tue, 11 Mar 2025 15:15:15 +0000 (15:15 +0000)]
Added a comment: just thinking out loud

9 months agoMerge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Tue, 11 Mar 2025 15:13:21 +0000 (11:13 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com

9 months agoreorg and expand security section
Joey Hess [Tue, 11 Mar 2025 15:12:59 +0000 (11:12 -0400)]
reorg and expand security section

9 months agoAdded a comment
yarikoptic [Tue, 11 Mar 2025 15:09:20 +0000 (15:09 +0000)]
Added a comment

9 months agoexpand
Joey Hess [Mon, 10 Mar 2025 21:35:34 +0000 (17:35 -0400)]
expand

9 months agoresponse
Joey Hess [Mon, 10 Mar 2025 20:46:55 +0000 (16:46 -0400)]
response

9 months agoMerge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Mon, 10 Mar 2025 20:42:24 +0000 (16:42 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com

9 months agoadded git-annex-compute-singularity
Joey Hess [Mon, 10 Mar 2025 20:41:26 +0000 (16:41 -0400)]
added git-annex-compute-singularity

And implemented SANDBOX, which it needs.

9 months agocompute protocol debugging
Joey Hess [Mon, 10 Mar 2025 19:14:59 +0000 (15:14 -0400)]
compute protocol debugging

9 months agodocument output files must be regular files
Joey Hess [Mon, 10 Mar 2025 18:15:07 +0000 (14:15 -0400)]
document output files must be regular files

9 months agomake usage an error
Joey Hess [Mon, 10 Mar 2025 17:47:23 +0000 (13:47 -0400)]
make usage an error

9 months agocompute: disallow output files that are not regular files
Joey Hess [Mon, 10 Mar 2025 16:52:10 +0000 (12:52 -0400)]
compute: disallow output files that are not regular files

Use case where this came up is a compute program using singularity,
where the process inside the container will be allowed to write to the temp
directory, so could make eg a /etc/shadow symlink, which could then be
used to exfiltrate that from the system to wherever the annex object
might be pushed to.

It seemed better to fix this once in git-annex rather than in any such
compute program.
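
A minimal Python sketch of this kind of check, assuming a hypothetical helper `is_regular_output` that is run on each output file before it is ingested. The important detail is using `lstat`, which inspects the symlink itself rather than whatever it points to:

```python
import os
import stat
import tempfile


def is_regular_output(path: str) -> bool:
    """True only for a regular file; symlinks are not followed."""
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISREG(mode)


def demo() -> tuple:
    """Try a regular file, a symlink and a directory in a scratch dir."""
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "out.txt")
        link = os.path.join(tmp, "shadow")
        subdir = os.path.join(tmp, "sub")
        with open(regular, "w") as f:
            f.write("computed\n")
        os.symlink("/etc/shadow", link)  # the target is never opened
        os.mkdir(subdir)
        return (is_regular_output(regular),
                is_regular_output(link),
                is_regular_output(subdir))
```

Here `demo()` accepts the regular file and rejects both the symlink and the directory.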

9 months agoAdded a comment
yarikoptic [Sun, 9 Mar 2025 01:02:55 +0000 (01:02 +0000)]
Added a comment

9 months agoAdded a comment: Any way to annotate what are input files?
yarikoptic [Sat, 8 Mar 2025 14:51:20 +0000 (14:51 +0000)]
Added a comment: Any way to annotate what are input files?

9 months agosymlink, don't hardlink
Joey Hess [Fri, 7 Mar 2025 21:15:54 +0000 (17:15 -0400)]
symlink, don't hardlink

hardlink can cause problems with unlocked files

9 months agodisconnect stdio for wasm binaries
Joey Hess [Fri, 7 Mar 2025 21:15:21 +0000 (17:15 -0400)]
disconnect stdio for wasm binaries

9 months agouse pwd and quote it
Joey Hess [Fri, 7 Mar 2025 20:06:37 +0000 (16:06 -0400)]
use pwd and quote it

Seems more portable and safe

9 months agocase
Joey Hess [Fri, 7 Mar 2025 20:03:35 +0000 (16:03 -0400)]
case

9 months agolayout
Joey Hess [Fri, 7 Mar 2025 20:03:09 +0000 (16:03 -0400)]
layout

9 months agolayout
Joey Hess [Fri, 7 Mar 2025 20:02:43 +0000 (16:02 -0400)]
layout

9 months agoadd git-annex-compute-wasmedge
Joey Hess [Fri, 7 Mar 2025 20:02:11 +0000 (16:02 -0400)]
add git-annex-compute-wasmedge

9 months agoredirect command stdout to stderr
Joey Hess [Fri, 7 Mar 2025 20:01:27 +0000 (16:01 -0400)]
redirect command stdout to stderr

Otherwise it will be interpreted as compute program protocol

9 months agomake OUTPUT subdirs
Joey Hess [Fri, 7 Mar 2025 18:57:12 +0000 (14:57 -0400)]
make OUTPUT subdirs

Simplifies compute programs.

9 months agoMerge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Fri, 7 Mar 2025 18:50:11 +0000 (14:50 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com

9 months agocompute: add response to OUTPUT
Joey Hess [Fri, 7 Mar 2025 18:47:34 +0000 (14:47 -0400)]
compute: add response to OUTPUT

This allows rejecting output filenames that are outside the repository,
and also handles converting eg "-foo" to "./-foo" to prevent a command
that it's passed to interpreting the output filename as a dashed option.
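
The two behaviors described above can be sketched in Python as follows. This is a simplified illustration, not git-annex's code: `normalize_output` is a hypothetical name, and it assumes POSIX-style names given relative to the top of the repository:

```python
import posixpath
from typing import Optional


def normalize_output(name: str) -> Optional[str]:
    """Return a safe relative output path, or None to reject the name."""
    if name == "" or posixpath.isabs(name):
        return None
    cleaned = posixpath.normpath(name)  # collapses "a/../b", "./x", etc.
    if cleaned == "." or cleaned == ".." or cleaned.startswith("../"):
        return None  # names the repository itself, or escapes it
    if cleaned.startswith("-"):
        # Keep a later command from parsing the name as a dashed option.
        cleaned = "./" + cleaned
    return cleaned
```

For example, `normalize_output("-foo")` returns `"./-foo"`, while `normalize_output("../etc/passwd")` returns `None`.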

9 months agoremove todo I just added
Joey Hess [Fri, 7 Mar 2025 17:29:57 +0000 (13:29 -0400)]
remove todo I just added

If a compute program does this, it has a security hole. Not git-annex.

9 months agotodo
Joey Hess [Fri, 7 Mar 2025 17:24:11 +0000 (13:24 -0400)]
todo

9 months ago(no commit message)
jasonb@ab4484d9961a46440958fa1a528e0fc435599057 [Fri, 7 Mar 2025 04:13:24 +0000 (04:13 +0000)]

9 months agoinitial report on slow thaw
yarikoptic [Thu, 6 Mar 2025 22:40:35 +0000 (22:40 +0000)]
initial report on slow thaw

9 months agoimprove
Joey Hess [Thu, 6 Mar 2025 18:54:05 +0000 (14:54 -0400)]
improve

9 months agoadd git-annex-compute-imageconvert
Joey Hess [Thu, 6 Mar 2025 18:47:22 +0000 (14:47 -0400)]
add git-annex-compute-imageconvert

9 months agoprefix output with ./ in example
Joey Hess [Thu, 6 Mar 2025 18:42:07 +0000 (14:42 -0400)]
prefix output with ./ in example

9 months agono longer a draft
Joey Hess [Thu, 6 Mar 2025 18:29:07 +0000 (14:29 -0400)]
no longer a draft

9 months agoMerge branch 'compute'
Joey Hess [Thu, 6 Mar 2025 18:23:58 +0000 (14:23 -0400)]
Merge branch 'compute'

9 months agopreparing to merge compute
Joey Hess [Thu, 6 Mar 2025 18:22:45 +0000 (14:22 -0400)]
preparing to merge compute

9 months agoupdate
Joey Hess [Thu, 6 Mar 2025 17:34:51 +0000 (13:34 -0400)]
update

9 months agoAdded a comment: Special use case for Scientific application
jerome.charousset@86fd8ed1bf55902989d7e70a11c38cb3a444b72d [Thu, 6 Mar 2025 17:02:22 +0000 (17:02 +0000)]
Added a comment: Special use case for Scientific application

9 months agoupdate
Joey Hess [Thu, 6 Mar 2025 16:52:12 +0000 (12:52 -0400)]
update

9 months agoavoid unnecessary git-annex branch changes for recompute and addcomputed
Joey Hess [Thu, 6 Mar 2025 16:41:30 +0000 (12:41 -0400)]
avoid unnecessary git-annex branch changes for recompute and addcomputed

9 months agocomputation progress display
Joey Hess [Wed, 5 Mar 2025 17:46:06 +0000 (13:46 -0400)]
computation progress display

9 months agoAdded a comment
matrss [Wed, 5 Mar 2025 15:40:44 +0000 (15:40 +0000)]
Added a comment

9 months agoAdded a comment
bpoldrack [Wed, 5 Mar 2025 14:23:57 +0000 (14:23 +0000)]
Added a comment

9 months agoTag copy_file_range todo with projects/INM7 (came from our cluster)
msz [Wed, 5 Mar 2025 13:35:19 +0000 (13:35 +0000)]
Tag copy_file_range todo with projects/INM7 (came from our cluster)

9 months agoAdded a comment: DataLad exploration of the compute on demand space
msz [Wed, 5 Mar 2025 13:31:41 +0000 (13:31 +0000)]
Added a comment: DataLad exploration of the compute on demand space

9 months agoAdded a comment
msz [Wed, 5 Mar 2025 11:27:39 +0000 (11:27 +0000)]
Added a comment

9 months agofilled out bug description
kenta [Wed, 5 Mar 2025 00:00:19 +0000 (00:00 +0000)]
filled out bug description

9 months agoOsPath build fixes
Joey Hess [Tue, 4 Mar 2025 19:50:15 +0000 (15:50 -0400)]
OsPath build fixes

9 months agomark unused parameter
Joey Hess [Tue, 4 Mar 2025 19:46:30 +0000 (15:46 -0400)]
mark unused parameter

While unused, it seems to make sense to keep it, since it explains what
the function is doing.

9 months agoupdate todo
Joey Hess [Tue, 4 Mar 2025 19:02:02 +0000 (15:02 -0400)]
update todo

9 months agosafer git sha object filename
Joey Hess [Tue, 4 Mar 2025 18:54:13 +0000 (14:54 -0400)]
safer git sha object filename

Rather than use the filename provided by INPUT, which could come from user
input, and so could be something that looks like a dashed parameter,
use a .git/object/<sha> filename.

This avoids user input passing through INPUT and back out, with the file
path then passed to a command, which could do something unexpected with
a dashed parameter, or other special parameter.

Added a note in the design about being careful of passing user input to
commands. They still have to be careful of that in general, just not in
this case.

9 months agocycle detection
Joey Hess [Tue, 4 Mar 2025 18:06:55 +0000 (14:06 -0400)]
cycle detection

9 months agoimprove error message when unable to get an input file
Joey Hess [Tue, 4 Mar 2025 17:13:18 +0000 (13:13 -0400)]
improve error message when unable to get an input file

In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.

9 months agoupdate location log after getting input file from remote
Joey Hess [Tue, 4 Mar 2025 16:51:38 +0000 (12:51 -0400)]
update location log after getting input file from remote

9 months agobetter wording
Joey Hess [Tue, 4 Mar 2025 16:43:50 +0000 (12:43 -0400)]
better wording

Avoids this contradiction:

(Auto enabling special remote foo...)

  Not enabling compute special remote c2 because [..]

9 months agocompute remote: get input files from other remotes
Joey Hess [Tue, 4 Mar 2025 15:06:58 +0000 (11:06 -0400)]
compute remote: get input files from other remotes

This needed some refactoring to avoid cycles, since Remote.Compute
cannot import Remote.List. Instead, it uses Annex.remotes. Which must be
populated by something else, but we know it has been, because something
is using Remote.Compute, which it must have found in the remote list,
which populates that.

In Remote.Compute, keyPossibilities' is called with all loggedLocations,
without the trustExclude DeadTrusted that keyLocations does. There is
another cycle there. This may be a problem if a dead repository is still
a remote.

This is missing cycle prevention, and it's certainly possible to make 2
files in the compute remote co-depend on one another. Hopefully not in a
real world situation, but an attacker could certainly do it. Cycle
prevention will need to be added to this.
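
For illustration only, here is a generic Python sketch of detecting such co-dependence with a depth-first search. It is not how git-annex implements it. The `inputs` mapping from each computed file to the files it needs as input is a hypothetical data structure:

```python
from typing import Dict, List


def has_cycle(inputs: Dict[str, List[str]]) -> bool:
    """Report whether any computed file depends, directly or not, on itself."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for dep in inputs.get(node, []):
            seen = state.get(dep, UNSEEN)
            if seen == IN_PROGRESS:
                return True  # reached a file we are still computing
            if seen == UNSEEN and visit(dep):
                return True
        state[node] = DONE
        return False

    return any(state.get(n, UNSEEN) == UNSEEN and visit(n) for n in inputs)
```

Two files that name each other as inputs, as in `{"a": ["b"], "b": ["a"]}`, are reported as a cycle, while a shared input used by several files is not.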

9 months agomove showOutput into compute remote
Joey Hess [Tue, 4 Mar 2025 14:02:33 +0000 (10:02 -0400)]
move showOutput into compute remote

9 months agorename config to annex.security.allowed-compute-programs
Joey Hess [Mon, 3 Mar 2025 20:07:04 +0000 (16:07 -0400)]
rename config to annex.security.allowed-compute-programs

And require it for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.

9 months agoautoenable security for compute special remote
Joey Hess [Mon, 3 Mar 2025 19:47:09 +0000 (15:47 -0400)]
autoenable security for compute special remote

Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.

9 months agorecompute: display one of the changed files
Joey Hess [Mon, 3 Mar 2025 19:12:19 +0000 (15:12 -0400)]
recompute: display one of the changed files

9 months agoavoid recomputing every time on git inputs
Joey Hess [Mon, 3 Mar 2025 18:56:49 +0000 (14:56 -0400)]
avoid recomputing every time on git inputs

9 months agosupport git files as input to computations
Joey Hess [Mon, 3 Mar 2025 15:59:04 +0000 (11:59 -0400)]
support git files as input to computations

Using GIT keys, like those used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repository.

9 months agofactor out Annex.GitShaKey
Joey Hess [Mon, 3 Mar 2025 15:08:36 +0000 (11:08 -0400)]
factor out Annex.GitShaKey

9 months agorecord VURL key hashes in addcomputed and recompute
Joey Hess [Mon, 3 Mar 2025 14:57:56 +0000 (10:57 -0400)]
record VURL key hashes in addcomputed and recompute

9 months agoAdded a comment: Permission fix
czard [Mon, 3 Mar 2025 12:08:28 +0000 (12:08 +0000)]
Added a comment: Permission fix

9 months agorecord VURL key hashes when getting from compute remote
Joey Hess [Thu, 27 Feb 2025 20:19:41 +0000 (16:19 -0400)]
record VURL key hashes when getting from compute remote

Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivalent key for the VURL key.

Still needs to be done for addcomputed and recompute.

9 months agofix build
Joey Hess [Thu, 27 Feb 2025 20:18:04 +0000 (16:18 -0400)]
fix build

9 months agorefactor
Joey Hess [Thu, 27 Feb 2025 20:17:42 +0000 (16:17 -0400)]
refactor

9 months agomany recompute improvements
Joey Hess [Thu, 27 Feb 2025 19:12:29 +0000 (15:12 -0400)]
many recompute improvements

I've lost track of them all, but it includes:

* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
  state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
  different but the key remains the same, now updates the object file
  with the new content
* Detecting some other ways the program behavior can change, just for
  completeness.
* Also adds --backend to addcomputed.

9 months agoAdded a comment
dmcardle [Thu, 27 Feb 2025 19:02:14 +0000 (19:02 +0000)]
Added a comment

9 months agorefactoring
Joey Hess [Thu, 27 Feb 2025 18:54:03 +0000 (14:54 -0400)]
refactoring

9 months agofix recompute of renamed files
Joey Hess [Thu, 27 Feb 2025 15:10:44 +0000 (11:10 -0400)]
fix recompute of renamed files

When a computed file has been renamed, a recompute needs to write to the
new filename.

I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.

Added temporary TODO-compute file

9 months agotodo
Joey Hess [Wed, 26 Feb 2025 19:59:47 +0000 (15:59 -0400)]
todo

9 months agorecompute closer to working properly
Joey Hess [Wed, 26 Feb 2025 19:51:31 +0000 (15:51 -0400)]
recompute closer to working properly

Proper behavior without --others implemented.

And eliminated most of the code duplication through refactoring.

Also, changed it to not stage recomputed files. This way, git diff will
show files that have differences.

9 months agorefactor
Joey Hess [Wed, 26 Feb 2025 18:05:37 +0000 (14:05 -0400)]
refactor

9 months agostarted git-annex recompute
Joey Hess [Wed, 26 Feb 2025 15:25:32 +0000 (11:25 -0400)]
started git-annex recompute

The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.