Nebulous Issues


We currently do not have a validation system in place. A neb-fsck program would be useful, and could do the following:

  1. Scan the instance table, and confirm that the files exist on disk.
  2. Confirm that the md5sum matches for all instances of a given storage object.
  3. Scan the nebulous directory trees and identify files that do not have a matching entry in the instance table. Because the full disk path describes the ins_id and the ext_id for a given file, such a file could be reinserted and reassociated with its storage_object (if it exists), or simply deleted.
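
The three checks above can be sketched in Python as follows. This is only an illustration under stated assumptions: the instance table is represented as an in-memory list of (so_id, path) pairs, and the names fsck and md5sum are hypothetical, not part of nebulous.

```python
import hashlib
import os
from collections import defaultdict


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fsck(instances, roots):
    """Run the three checks over a simplified view of the instance table.

    instances: iterable of (so_id, path) pairs (hypothetical layout).
    roots: directories that make up the nebulous trees.
    Returns (missing, mismatched, orphans).
    """
    missing = []
    sums = defaultdict(set)
    known = set()
    for so_id, path in instances:
        real = os.path.realpath(path)
        known.add(real)
        if not os.path.isfile(real):
            missing.append(path)        # check 1: table row without a file
            continue
        sums[so_id].add(md5sum(real))   # collect digests per storage object

    # check 2: storage objects whose instances do not all share one digest
    mismatched = sorted(so_id for so_id, seen in sums.items() if len(seen) > 1)

    # check 3: files on disk with no entry in the instance table
    orphans = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.realpath(os.path.join(dirpath, name))
                if full not in known:
                    orphans.append(full)
    return missing, mismatched, sorted(orphans)
```

The sketch only reports problems. Deciding whether to reinsert or delete an orphan is left to the caller, since that depends on whether the storage_object still exists.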

In addition, all nebulous tasks that create a file or copy a file should validate that the files agree before the function completes successfully. Skipping this check can introduce errors, because we may silently replicate a broken copy of the file. The client-side replicate call already does md5sum validation.
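
A minimal sketch of a copy that validates before it reports success is shown below. The function name copy_validated is hypothetical, and the sketch assumes both files are reachable as local paths, which may not match how nebulous moves data between hosts.

```python
import hashlib
import os
import shutil


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_validated(src, dst):
    """Copy src to dst and succeed only if both files have the same digest."""
    expected = md5sum(src)
    shutil.copyfile(src, dst)
    actual = md5sum(dst)
    if actual != expected:
        os.remove(dst)  # do not leave a broken replica behind
        raise IOError("md5sum mismatch copying %s to %s" % (src, dst))
    return expected
```

Returning the digest lets the caller record it without reading the file a third time.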

Database structure

The user-defined number of copies of a storage_object is recorded in the storage_object_xattr table. Reading it requires a join, and since most storage objects have an entry there, it makes sense to move this information from storage_object_xattr into a column of storage_object.
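
The difference between the two layouts can be illustrated with Python's sqlite3 module as a stand-in database. Everything about the schema here is an assumption made for the example: the column names (so_id, ext_id, name, value), the attribute name 'copies', and the new column copies are hypothetical and may not match the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storage_object (so_id INTEGER PRIMARY KEY, ext_id TEXT);
    CREATE TABLE storage_object_xattr (so_id INTEGER, name TEXT, value TEXT);
    INSERT INTO storage_object VALUES (1, 'neb://a'), (2, 'neb://b');
    INSERT INTO storage_object_xattr VALUES (1, 'copies', '3');
""")

# Current layout: the copy count needs a join against the xattr table.
joined = conn.execute("""
    SELECT so.so_id, x.value
    FROM storage_object AS so
    LEFT JOIN storage_object_xattr AS x
      ON x.so_id = so.so_id AND x.name = 'copies'
    ORDER BY so.so_id
""").fetchall()

# Proposed layout: add a column, backfill it once, then read it directly.
conn.execute("ALTER TABLE storage_object ADD COLUMN copies INTEGER")
conn.execute("""
    UPDATE storage_object
    SET copies = (SELECT CAST(x.value AS INTEGER)
                  FROM storage_object_xattr AS x
                  WHERE x.so_id = storage_object.so_id
                    AND x.name = 'copies')
""")
direct = conn.execute(
    "SELECT so_id, copies FROM storage_object ORDER BY so_id"
).fetchall()

print(joined)
print(direct)
```

Storage objects without an xattr entry end up with a NULL in the new column, so a default would still have to be chosen for them.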

We also currently do not record an md5sum value for every storage_object. It is somewhat expensive to compute, but it would provide a record of what the file contents should be, so the neb-fsck program would have a reference to work from. Otherwise, if the md5sums disagree between two instances, how do we decide which file is correct?
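
The value of a recorded reference can be seen in a small sketch. The function classify_instances and its argument layout are hypothetical. With a reference digest, every instance can be labelled good or bad. Without one, a disagreement cannot be resolved.

```python
def classify_instances(instance_sums, reference=None):
    """Split instances into (good, bad) paths, or return None if undecidable.

    instance_sums: dict mapping instance path -> md5 hex digest.
    reference: digest recorded for the storage object, or None if unknown.
    """
    if reference is None:
        distinct = set(instance_sums.values())
        if len(distinct) != 1:
            # Digests disagree (or there are no instances) and there is
            # nothing recorded to compare against.
            return None
        reference = distinct.pop()
    good = sorted(p for p, s in instance_sums.items() if s == reference)
    bad = sorted(p for p, s in instance_sums.items() if s != reference)
    return good, bad


sums = {"/data/a": "aa", "/data/b": "bb"}
print(classify_instances(sums))        # no reference: cannot decide
print(classify_instances(sums, "aa"))  # reference picks out the good copy
```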

Location awareness

We do not currently tag each nebulous request with the host it came from. If we did, nebulous could attempt to supply a copy from that host when one exists. This would reduce network traffic and increase throughput. It can also be extended into a preference hierarchy: if the requesting host doesn't have a copy, look in its cabinet, then across all the cabinets at its location, and finally across the entire nebulous set.
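
One way to express this preference hierarchy is as a ranking function, sketched below. The Location record and the names preference and pick_instance are hypothetical, and the sketch assumes the host, cabinet, and location of both the client and each copy are known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    host: str
    cabinet: str
    site: str


def preference(client, copy):
    """Rank a copy for a client; lower is better."""
    if copy.host == client.host:
        return 0  # copy is on the requesting host
    if copy.site == client.site and copy.cabinet == client.cabinet:
        return 1  # same cabinet
    if copy.site == client.site:
        return 2  # same location, different cabinet
    return 3      # anywhere else in the nebulous set


def pick_instance(client, copies):
    """Return the closest copy; raises ValueError if copies is empty."""
    return min(copies, key=lambda copy: preference(client, copy))


client = Location("host1", "cab1", "siteA")
copies = [
    Location("host9", "cab7", "siteB"),
    Location("host4", "cab2", "siteA"),
    Location("host2", "cab1", "siteA"),
]
print(pick_instance(client, copies))
```

Copies with the same rank are not distinguished here, so min returns the first one encountered. A real implementation might break ties by load or free space.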