P2P Synchronization of Databases

This is a scratchpad document for ideas about replicating/synchronizing Photovault databases.

Basic requirements

  • Several Photovault databases can be members of a group
  • Only authorized databases can join a group
  • The database group has a group specific folder hierarchy
  • Every group member also has a private folder hierarchy that is not visible to other members
  • All group members send info about updates to the other members that are currently available. These updates are propagated to the remaining group members when they become available
  • Security of individual photos: the user can restrict whether a photo is published. If an already published photo is changed to no-publish state, it must be removed from the other group members.
  • If a photo file is not available in the local database, the database can query whether it is available from other members of the group
  • User can select how actual files are stored in local database
    • store original, view resolution copy or only thumbnail
    • Load on demand or all available photos
    • policy for removing photos. E.g. LRU, store permanently
    • Local database flushing policy: e.g. delete a file only if at least 2 databases are committed to keeping it indefinitely (or with the same policy as the local database?)
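
The flushing rule in the last bullet could be sketched as follows. This is only an illustration: it assumes each peer advertises whether it is committed to keeping a file indefinitely, and the names `PeerCopy`, `can_delete_local_copy` and `min_permanent_copies` are hypothetical, not existing Photovault code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerCopy:
    """Hypothetical record of another database's copy of a photo file."""
    database_id: str
    keeps_permanently: bool  # True if that peer stores the file indefinitely


def can_delete_local_copy(peer_copies, min_permanent_copies=2):
    """Return True if enough other databases are committed to keeping the file."""
    # Count distinct databases so a duplicated record is not counted twice.
    permanent = {c.database_id for c in peer_copies if c.keeps_permanently}
    return len(permanent) >= min_permanent_copies
```

With one permanent copy elsewhere the local file would be kept; once a second database commits to it, the local copy becomes eligible for removal.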

Architecture

Group communication & synchronization

  • Group communication using JXTA
    • Group discovery
    • Group joining
    • Peer discovery
    • Message passing between peers
    • Using JXTA CMS for photo file transfer?
  • Gossip-like protocol for propagating changes between peers
    • No need for transaction spanning multiple photos
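
A gossip-style exchange of change records might look roughly like the sketch below. It leaves out JXTA entirely and models peers as in-memory objects. The `Change` and `Peer` classes and the per-origin sequence numbers are assumptions made for illustration, not a decided design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One log record; it touches a single object, so no multi-photo transaction."""
    origin: str      # ID of the database that made the change
    seq: int         # per-origin sequence number, starting at 1
    object_id: str
    payload: str


@dataclass
class Peer:
    name: str
    log: dict = field(default_factory=dict)  # (origin, seq) -> Change

    def record_local_change(self, object_id, payload):
        seq = 1 + max((s for (o, s) in self.log if o == self.name), default=0)
        change = Change(self.name, seq, object_id, payload)
        self.log[(change.origin, change.seq)] = change
        return change

    def missing_from(self, other):
        """Changes this peer has that `other` has not seen yet."""
        return [c for key, c in self.log.items() if key not in other.log]

    def gossip_with(self, other):
        """Exchange unseen changes in both directions."""
        to_other = self.missing_from(other)
        to_self = other.missing_from(self)
        for c in to_other:
            other.log[(c.origin, c.seq)] = c
        for c in to_self:
            self.log[(c.origin, c.seq)] = c
```

Because every peer forwards whatever it has seen, a change made on one database can reach a peer that was never online at the same time as the originator, as long as some third peer has talked to both.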

Changes needed

JXTA integration

  • Creation of new groups
  • Peer discovery & authorization
  • Gossip message passing
  • Photo instance discovery & transfer

Gossip replication protocol

  • Folder hierarchy synchronization
  • Photo synchronization
  • Getting initial state to a peer

Data model changes

  • Version timestamps to PhotoInfo, PhotoFolder
  • Refactor PhotoInfo & ImageInstance to match the situation in which there may be several copies of the same file
  • UUIDs to all synchronizable objects
  • Change subfolder mapping to be n:m
    • To allow more flexible security model
    • Exact semantics still need to be worked out
  • Storage of gossip log records
  • Storage of information about discovered peers

Structural changes

  • Change all access to data objects to go through action objects that can be stored in a log record
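
As a rough sketch of the action-object idea: every modification is expressed as a small object that can both apply itself and describe itself as a log record. The `Photo` stand-in, the `SetDescription` action and the dictionary record layout are all hypothetical and only show the shape of the pattern.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    """Minimal stand-in for a synchronizable data object."""
    uuid: str
    description: str = ""


@dataclass(frozen=True)
class SetDescription:
    """Hypothetical action object: a storable description of one change."""
    target_uuid: str
    new_description: str

    def apply(self, photo):
        photo.description = self.new_description

    def to_log_record(self):
        return {"action": "set-description",
                "target": self.target_uuid,
                "value": self.new_description}


def execute(action, photos, log):
    """Apply an action to the local object and append it to the log."""
    action.apply(photos[action.target_uuid])
    log.append(action.to_log_record())
```

Routing every change through `execute` means the log always contains exactly what was applied locally, which is what a replication protocol would need to replay on other peers.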

Future improvements

What needs to be synchronized

  • Folder hierarchy
  • PhotoInfo objects
  • Photo instances (what instances exist, not information where the instance exists in certain database)
    • The GUIDs of a photo's folders are synchronized in any case, even if the folder itself is not visible to the receiving database.

General principles

  • All synchronizable objects contain a version field and a vector of the other known instances of the object (a list of database UID + version number pairs)
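
One way to read the principle above is as a conventional version vector. The sketch below assumes the usual comparison semantics (element-wise comparison, conflict when each side is ahead somewhere). The notes do not specify this, so treat the `compare` and `merge` rules as assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VersionVector:
    """Maps database UID -> latest version of the object known from that database."""
    versions: dict = field(default_factory=dict)

    def increment(self, db_uid):
        self.versions[db_uid] = self.versions.get(db_uid, 0) + 1

    def compare(self, other):
        """Return 'equal', 'newer', 'older' or 'conflict' relative to `other`."""
        keys = set(self.versions) | set(other.versions)
        ahead = any(self.versions.get(k, 0) > other.versions.get(k, 0) for k in keys)
        behind = any(self.versions.get(k, 0) < other.versions.get(k, 0) for k in keys)
        if ahead and behind:
            return "conflict"
        if ahead:
            return "newer"
        if behind:
            return "older"
        return "equal"

    def merge(self, other):
        """Take the element-wise maximum of both vectors."""
        for k, v in other.versions.items():
            self.versions[k] = max(self.versions.get(k, 0), v)
```

A "conflict" result means both databases changed the object independently, and some resolution rule (not defined in these notes) would have to decide the outcome.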

XML export format

Example data:

<?xml version='1.0' ?>
<photovault-data source="cf5768f9dbe768f7795da8a84d2c643">
  <folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="2" name="Top">
    <description>Root folder</description>
  </folder>
  <folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAg==" 
          parent-guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="4" name="Subfolder">
    <description>A subfolder</description>
    <known-instances>
      <instance db="cf5768f9dbe768f7795da8a84d24625" version="2"/>
    </known-instances>
  </folder>
</photovault-data>
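
For illustration, an export in this format could be read with Python's standard `xml.etree.ElementTree` module. The snippet embeds a copy of the example data with the `instance` element written as an empty element so that it parses. The `read_folders` helper and the dictionary layout it returns are hypothetical.

```python
import xml.etree.ElementTree as ET

EXPORT = """<?xml version='1.0' ?>
<photovault-data source="cf5768f9dbe768f7795da8a84d2c643">
  <folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="2" name="Top">
    <description>Root folder</description>
  </folder>
  <folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAg=="
          parent-guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="4" name="Subfolder">
    <description>A subfolder</description>
    <known-instances>
      <instance db="cf5768f9dbe768f7795da8a84d24625" version="2"/>
    </known-instances>
  </folder>
</photovault-data>
"""


def read_folders(xml_text):
    """Parse folder elements into plain dictionaries."""
    root = ET.fromstring(xml_text)
    folders = []
    for el in root.findall("folder"):
        folders.append({
            "guid": el.get("guid"),
            "parent_guid": el.get("parent-guid"),  # None for the root folder
            "version": int(el.get("version")),
            "name": el.get("name"),
            "description": el.findtext("description", default=""),
            # database UID -> version known to exist in that database
            "known_instances": {
                inst.get("db"): int(inst.get("version"))
                for inst in el.findall("known-instances/instance")
            },
        })
    return folders
```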

Persistence for version vectors

Each object capable of synchronization must contain a VersionVector instance.

Open Issues

How are folder trees of different machines mapped to each other?

  • Proposal: The root folder of the hierarchy is always synchronized & visible as "Root of server". Subfolders are visible under it down to the first folder that is not synchronized. Subfolders of a non-synchronized folder are not visible.
  • Privacy: Since photos that belong to a non-synchronized folder still contain the GUID of the folder, it might be possible to deduce too much from the association between several images. How could we avoid this?
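
The visibility proposal above can be sketched as a tree walk that stops descending at the first non-synchronized folder. The `Folder` class and `visible_folders` function are hypothetical, and the sketch assumes that a non-synchronized folder is itself hidden along with its subtree, which the proposal leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    synchronized: bool
    children: list = field(default_factory=list)


def visible_folders(root):
    """Return names of folders a remote peer would see, depth-first."""
    visible = [root.name]  # the root is always shared under the proposal
    stack = list(reversed(root.children))
    while stack:
        folder = stack.pop()
        if not folder.synchronized:
            continue  # hides this folder and its whole subtree
        visible.append(folder.name)
        stack.extend(reversed(folder.children))
    return visible
```

Note that a synchronized folder nested under a non-synchronized one stays hidden in this sketch, because the walk never reaches it.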

Security

  • Ensuring that no confidential information is leaked
  • Ensuring that unauthorized updates are not accepted
  • How to start synchronization if an object is changed so that it becomes available to a certain user?
  • How to continue synchronization if access rights are first denied & then given back?
  • Denial-of-service attacks:
    • Increasing version of an object to maxint
    • Flooding disk space with bogus log entries
    • ...
