* Added OsPath build flag, which speeds up git-annex's operations on files.
* git-lfs: Added an optional apiurl parameter.
(This needs version 1.2.5 of the haskell git-lfs library to be used.)
+ * fsck: Remember the files that are checked, so a later run with --more
+ will skip them, without needing to use --incremental.
-- Joey Hess <id@joeyh.name> Mon, 20 Jan 2025 10:24:51 -0400
#endif
data Incremental
- = NonIncremental
+ = NonIncremental (Maybe FsckDb.FsckHandle)
| ScheduleIncremental Duration UUID Incremental
| StartIncremental FsckDb.FsckHandle
| ContIncremental FsckDb.FsckHandle
prepIncremental :: UUID -> Maybe IncrementalOpt -> Annex Incremental
-prepIncremental _ Nothing = pure NonIncremental
prepIncremental u (Just StartIncrementalO) = do
recordStartTime u
ifM (FsckDb.newPass u)
Nothing -> StartIncrementalO
Just _ -> MoreIncrementalO
return (ScheduleIncremental delta u i)
+prepIncremental u Nothing =
+ ifM (Annex.getRead Annex.fast)
+ -- Avoid recording fscked files in --fast mode,
+ -- since that can interfere with a non-fast incremental
+ -- fsck.
+ ( pure (NonIncremental Nothing)
+ , (NonIncremental . Just) <$> openFsckDb u
+ )
cleanupIncremental :: Incremental -> Annex ()
cleanupIncremental (ScheduleIncremental delta u i) = do
withFsckDb :: Incremental -> (FsckDb.FsckHandle -> Annex ()) -> Annex ()
withFsckDb (ContIncremental h) a = a h
withFsckDb (StartIncremental h) a = a h
-withFsckDb NonIncremental _ = noop
+withFsckDb (NonIncremental mh) a = maybe noop a mh
withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a
* `--incremental`
- Start a new incremental fsck pass. An incremental fsck can be interrupted
- at any time, with eg ctrl-c.
+ Start a new incremental fsck pass, clearing records of all files that
+ were checked in the previous incremental fsck pass.
* `--more`
- Resume the last incremental fsck pass, where it left off.
+ Skip files that were checked since the last incremental fsck pass
+ was started.
+
+ Note that before `--incremental` is used to start an incremental fsck
+ pass, files that are checked are still recorded, and using this option
+ will skip checking those files again.
Resuming may redundantly check some files that were checked
before. Any files that fsck found problems with before will be re-checked
on resume. Also, checkpoints are made every 1000 files or every 5 minutes
- during a fsck, and it resumes from the last checkpoint.
+ during a fsck, and it resumes from the last checkpoint, so if an
+ incremental fsck is interrupted using eg ctrl-c, it will recheck files
+ that didn't get into the last checkpoint.
* `--incremental-schedule=time`
On that note: there also does not appear to be a documented way to find out whether a fsck was interrupted before. You could infer its existence and date from the annex internal directory structure, but seeing the progress requires manual SQL queries.
Perhaps there could be a `fsck --info` flag that shows the progress of an interrupted fsck, and also the progress of the current fsck.
+
+> I've implemented the default recording to the fsck database. [[done]]
+> --[[Joey]]
--- /dev/null
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-03-17T18:34:20Z"
+ content="""
+I think it could make sense, when --incremental/--more are not passed, to
+initialize a new fsck database if there is not already one, and
+add each fscked key to the fsck database.
+
+That way, the user could run any combination of fscks, interrupted or not,
+and then use --more to fsck only new files. When the user wants to start
+a new fsck pass, they would use --incremental.
+
+It would need to avoid recording an incremental fsck pass start time,
+to avoid interfering with --incremental-schedule.
+
+The only problem I see with this is, someone might have a long-term
+incremental fsck they're running that is doing full checksumming.
+If they then do a quick fsck --fast for other reasons, it would
+record that every key has been fscked, and so lose their place.
+So it seems --fast should disable this new behavior. (Also incremental
+--fast fsck is not likely to be very useful anyway.)
+
+> I actually don't see much reason to not make use of an incremental fsck
+> either unless it's *really* old
+
+That's a hard judgement call for a program to make... someone might think
+10 minutes is really old, and someone else that a month is.
+
+As to figuring out whether a fsck was interrupted before, surely what
+matters is you remembering that? All git-annex has is a timestamp when
+the last fsck pass started, which is available in
+`.git/annex/fsck/*/state`, and a list of the keys that were fscked,
+which is not very useful as far as determining the progress of that fsck.
+"""]]
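
As a reading aid for the `Incremental` change in this patch, here is a minimal, self-contained sketch of how `NonIncremental (Maybe FsckHandle)` lets a plain fsck record keys unless `--fast` is in effect. It is not the git-annex implementation: `FsckHandle` is a hypothetical stand-in backed by an `IORef`, `recordKey` is an invented helper, `Duration` and `UUID` are simplified to type synonyms, and everything runs in `IO` rather than `Annex`.

```haskell
import Data.IORef

-- Hypothetical stand-ins for the real git-annex types (assumptions).
type Duration = Int
type UUID = String
newtype FsckHandle = FsckHandle (IORef [String])

-- Same shape as the patched type: a non-incremental fsck may now
-- carry a database handle, or none when recording is disabled.
data Incremental
    = NonIncremental (Maybe FsckHandle)
    | ScheduleIncremental Duration UUID Incremental
    | StartIncremental FsckHandle
    | ContIncremental FsckHandle

-- Run an action on the fsck database handle, if this mode has one.
withFsckDb :: Incremental -> (FsckHandle -> IO ()) -> IO ()
withFsckDb (ContIncremental h) a = a h
withFsckDb (StartIncremental h) a = a h
withFsckDb (NonIncremental mh) a = maybe (return ()) a mh
withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a

-- Invented helper: remember that a key was fscked.
recordKey :: String -> FsckHandle -> IO ()
recordKey k (FsckHandle ref) = modifyIORef ref (k :)

main :: IO ()
main = do
    ref <- newIORef []
    let h = FsckHandle ref
    -- As with --fast: no handle, so nothing is recorded.
    withFsckDb (NonIncremental Nothing) (recordKey "key1")
    -- Default run: a handle is present, so the key is recorded.
    withFsckDb (NonIncremental (Just h)) (recordKey "key2")
    readIORef ref >>= print  -- prints ["key2"]
```

Keeping the handle optional inside `NonIncremental` means callers of `withFsckDb` do not need to know whether `--fast` was used; the no-op case is handled in one place.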