P2P Synchronization of Databases
This is a scratchpad document for ideas about replicating/synchronizing Photovault databases.
Basic requirements
- Several Photovault databases can be members of a group
- Only authorized databases can join a group
- The database group has a group specific folder hierarchy
- All group members also have a private folder hierarchy that is not visible to other members
- All group members send information about updates to the other available members. These updates are propagated to the remaining group members when they become available
- Security of individual photos: the user can restrict whether a photo is published. If an already published photo is changed to the no-publish state, it must be removed from the other group members.
- If a photo file is not available in the local database, the database can query whether it is available from other members of the group
- User can select how actual files are stored in the local database
  - Store the original, a view-resolution copy or only a thumbnail
  - Load on demand or load all available photos
  - Policy for removing photos, e.g. LRU or store permanently
- Local database flushing policy: e.g. delete a file only if at least 2 databases are committed to keeping it indefinitely (or with the same policy as the local database?)
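The flushing rule in the last item could be checked roughly as follows. This is only a sketch: `KeepPolicy`, `RemoteCopy` and `FlushPolicy.canDeleteLocalFile` are hypothetical names that do not exist in Photovault, and the threshold is passed in as a parameter because the value 2 above is only given as an example.

```java
import java.util.List;

/** Hypothetical removal policies a database may apply to a stored file. */
enum KeepPolicy { LRU, STORE_PERMANENTLY }

/** Hypothetical description of a copy of a photo file held by another group member. */
record RemoteCopy(String databaseId, KeepPolicy policy) {}

final class FlushPolicy {
    private FlushPolicy() {}

    /**
     * Returns true if the local file may be deleted, i.e. at least
     * minPermanentCopies other databases are committed to keeping it indefinitely.
     */
    static boolean canDeleteLocalFile(List<RemoteCopy> remoteCopies, int minPermanentCopies) {
        long permanent = remoteCopies.stream()
                .filter(c -> c.policy() == KeepPolicy.STORE_PERMANENTLY)
                .map(RemoteCopy::databaseId)
                .distinct() // count each database only once
                .count();
        return permanent >= minPermanentCopies;
    }
}
```

A copy held under an LRU policy does not count toward the threshold, since that database may drop the file at any time.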
Architecture
Group communication & synchronization
- Group communication using JXTA
  - Group discovery
  - Group joining
  - Peer discovery
  - Message passing between peers
  - Using JXTA CMS for photo file transfer?
- Gossip-like protocol for submitting changes between peers
  - No need for transactions spanning multiple photos
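To make the gossip idea more concrete, here is a minimal sketch of one possible exchange. The notes above do not fix a protocol, so this assumes a simple anti-entropy scheme: each peer numbers its own changes sequentially, peers exchange a summary of the highest sequence number they know per origin database, and each side sends the records the other lacks. `ChangeRecord` and `GossipLog` are hypothetical names, and JXTA message passing is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log record: one change made by database originDb, numbered per origin. */
record ChangeRecord(String originDb, long seq, String description) {}

/** Hypothetical per-peer change log that is exchanged by gossiping. */
final class GossipLog {
    private final List<ChangeRecord> records = new ArrayList<>();
    // Highest sequence number seen so far from each origin database.
    private final Map<String, Long> lastSeen = new HashMap<>();

    /** Summary sent to a peer: highest known sequence number per origin. */
    Map<String, Long> summary() {
        return new HashMap<>(lastSeen);
    }

    /** Adds a record if it is the next unseen one from its origin; returns true if added. */
    boolean add(ChangeRecord r) {
        long known = lastSeen.getOrDefault(r.originDb(), 0L);
        if (r.seq() != known + 1) {
            return false; // duplicate or gap; simply ignored in this sketch
        }
        records.add(r);
        lastSeen.put(r.originDb(), r.seq());
        return true;
    }

    /** Records held here that the peer described by peerSummary does not have yet. */
    List<ChangeRecord> missingFrom(Map<String, Long> peerSummary) {
        List<ChangeRecord> out = new ArrayList<>();
        for (ChangeRecord r : records) {
            if (r.seq() > peerSummary.getOrDefault(r.originDb(), 0L)) {
                out.add(r);
            }
        }
        return out;
    }
}
```

Because each record describes a change to a single object, no transaction across several photos is needed: a peer can apply the records it receives one at a time.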
Changes needed
JXTA integration
- Creation of new groups
- Peer discovery & authorization
- Gossip message passing
- Photo instance discovery & transfer
Gossip replication protocol
- Folder hierarchy synchronization
- Photo synchronization
- Getting initial state to a peer
Data model changes
- Version timestamps to PhotoInfo, PhotoFolder
- Refactor PhotoInfo & ImageInstance to handle the situation in which there may be several copies of the same file
  - Photo - the current PhotoInfo
  - PhotoVersion - information about the instance file
  - PhotoVersionInstance - where the instance file is stored, whether it can be removed, ...
- UUIDs to all synchronizable objects
  - PhotoInfo
  - Photo instance
  - PhotoFolder
- Change subfolder mapping to be n:m
  - To allow a more flexible security model
  - Exact semantics need to be thought out
- Storage of gossip log records
- Storage of information about discovered peers
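The split of PhotoInfo and ImageInstance into Photo, PhotoVersion and PhotoVersionInstance listed above might look roughly like this. It is a sketch only: the field names (`kind`, `path`, `removable`, `databaseId`) and the `storedIn` helper are assumptions made for illustration, not the actual Photovault schema.

```java
import java.util.List;
import java.util.UUID;

/** Where one copy of an instance file is stored and whether it may be removed. */
record PhotoVersionInstance(UUID databaseId, String path, boolean removable) {}

/** Information about one instance file (e.g. original or thumbnail), shared by all its copies. */
record PhotoVersion(UUID uuid, String kind, List<PhotoVersionInstance> copies) {
    /** True if the given database holds a copy of this instance file. */
    boolean storedIn(UUID databaseId) {
        return copies.stream().anyMatch(c -> c.databaseId().equals(databaseId));
    }
}

/** The photo itself (the current PhotoInfo), identified by a UUID. */
record Photo(UUID uuid, long version, List<PhotoVersion> versions) {}
```

With this split, only Photo and PhotoVersion would need to be synchronized between peers, while PhotoVersionInstance stays local to each database.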
Structural changes
- Change all access to data objects to go through action objects that can be stored in a log record
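One way to read the action-object idea is the classic command pattern: every modification is an object that can be applied locally, appended to the log, and later replayed on another peer. The sketch below assumes this reading; `Folder`, `ChangeAction`, `RenameFolderAction` and `ActionLog` are hypothetical names, and serialization of the log is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical stand-in for a synchronizable folder. */
final class Folder {
    final UUID uuid;
    String name;

    Folder(UUID uuid, String name) {
        this.uuid = uuid;
        this.name = name;
    }
}

/** A change expressed as an object, so it can be applied and stored in a log record. */
interface ChangeAction {
    UUID targetUuid();
    void apply(Folder target);
}

/** Example action: rename the folder identified by targetUuid. */
record RenameFolderAction(UUID targetUuid, String newName) implements ChangeAction {
    @Override
    public void apply(Folder target) {
        target.name = newName;
    }
}

/** Applies actions to local objects and remembers them for later replication. */
final class ActionLog {
    private final List<ChangeAction> entries = new ArrayList<>();

    void execute(ChangeAction action, Folder target) {
        if (!action.targetUuid().equals(target.uuid)) {
            throw new IllegalArgumentException("action does not target this folder");
        }
        action.apply(target);
        entries.add(action);
    }

    /** Read-only snapshot of the logged actions, in execution order. */
    List<ChangeAction> entries() {
        return List.copyOf(entries);
    }
}
```

Replaying `entries()` against a replica with the same UUID brings it to the same state, which is what the gossip log records would rely on.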
Future improvements
What needs to be synchronized
- Folder hierarchy
- PhotoInfo objects
- Photo instances (which instances exist, not where an instance is stored in a particular database)
- The GUIDs of a photo's folders are synchronized in any case, even if a folder itself is not visible to the receiving database.
General principles
- All synchronizable objects contain a version field and a vector of the other known instances of the object (a list of database UID + version number pairs)
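The version field plus the list of (database UID, version) pairs amounts to a version vector. The notes do not say how two vectors are compared, so the sketch below assumes the standard dominance rule: one copy is newer if it is at least as high for every database and strictly higher for at least one, and two copies conflict if each is ahead for some database. The `Ordering` enum and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Result of comparing two version vectors. */
enum Ordering { EQUAL, OLDER, NEWER, CONFLICT }

final class VersionVector {
    // Database UID -> highest version of the object known from that database.
    private final Map<String, Long> versions = new HashMap<>();

    /** Records a local modification made by database dbId. */
    void increment(String dbId) {
        versions.merge(dbId, 1L, Long::sum);
    }

    long get(String dbId) {
        return versions.getOrDefault(dbId, 0L);
    }

    /** Compares this vector with other using the dominance rule. */
    Ordering compare(VersionVector other) {
        Set<String> ids = new HashSet<>(versions.keySet());
        ids.addAll(other.versions.keySet());
        boolean less = false;
        boolean greater = false;
        for (String id : ids) {
            long mine = get(id);
            long theirs = other.get(id);
            if (mine < theirs) less = true;
            if (mine > theirs) greater = true;
        }
        if (less && greater) return Ordering.CONFLICT;
        if (less) return Ordering.OLDER;
        if (greater) return Ordering.NEWER;
        return Ordering.EQUAL;
    }

    /** Takes the element-wise maximum, e.g. after accepting the other copy. */
    void mergeFrom(VersionVector other) {
        other.versions.forEach((id, v) -> versions.merge(id, v, Long::max));
    }
}
```

A `CONFLICT` result means both databases changed the object independently; how such conflicts are resolved is not covered here.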
XML export format
Example data:
<?xml version='1.0' ?>
<photovault-data source="cf5768f9dbe768f7795da8a84d2c643">
<folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="2" name="Top">
<description>Root folder</description>
</folder>
<folder guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAg=="
parent-guid="cf5768f9dbe768f7795da8a84d2c643_folder_AAAAAQ==" version="4" name="Subfolder">
<description>A subfolder</description>
<known-instances>
<instance db="cf5768f9dbe768f7795da8a84d24625" version="2"/>
</known-instances>
</folder>
</photovault-data>
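An export in this format could be read with the standard Java DOM parser. The following is a minimal sketch that extracts only the folder attributes shown in the example; `FolderEntry` and `ExportReader` are hypothetical names, and the `description` and `known-instances` elements are ignored to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical in-memory form of one exported folder element. */
record FolderEntry(String guid, String parentGuid, int version, String name) {}

final class ExportReader {
    private ExportReader() {}

    /** Parses the export and returns all folder elements in document order. */
    static List<FolderEntry> readFolders(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("folder");
            List<FolderEntry> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // The root folder has no parent-guid attribute.
                String parent = e.hasAttribute("parent-guid") ? e.getAttribute("parent-guid") : null;
                out.add(new FolderEntry(
                        e.getAttribute("guid"),
                        parent,
                        Integer.parseInt(e.getAttribute("version")),
                        e.getAttribute("name")));
            }
            return out;
        } catch (Exception ex) {
            // Malformed XML or a non-numeric version ends up here.
            throw new IllegalArgumentException("invalid export data", ex);
        }
    }
}
```

The parser rejects input that is not well-formed, so every element in the export, including each `instance` element, has to be closed properly.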
Persistence for version vectors
Each object capable of synchronization must contain a VersionVector instance.
Open Issues
How are folder trees of different machines mapped to each other
- Proposal: The root folder of the hierarchy is always synchronized & visible as "Root of server". Subfolders are visible under it until a folder that is not synchronized is reached. Subfolders of such a folder are not visible.
- Privacy: Since photos that belong to a non-synchronized folder still contain the GUID of the folder, it might be possible to deduce too much from the association between several images. How could we avoid this?
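The visibility rule in the proposal can be sketched as a tree walk that stops descending at the first non-synchronized folder. `FolderNode` and `FolderVisibility` are hypothetical names, and the sketch assumes that the non-synchronized folder itself is hidden too, which the proposal does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical folder tree node; shared means "synchronized with the group". */
final class FolderNode {
    final String name;
    final boolean shared;
    final List<FolderNode> children = new ArrayList<>();

    FolderNode(String name, boolean shared) {
        this.name = name;
        this.shared = shared;
    }

    FolderNode add(FolderNode child) {
        children.add(child);
        return this;
    }
}

final class FolderVisibility {
    private FolderVisibility() {}

    /** Paths that a remote database would see, starting from the always-visible root. */
    static List<String> visiblePaths(FolderNode root) {
        List<String> out = new ArrayList<>();
        collect(root, "Root of server", out);
        return out;
    }

    private static void collect(FolderNode node, String path, List<String> out) {
        out.add(path);
        for (FolderNode child : node.children) {
            // A non-shared folder hides itself and everything below it.
            if (child.shared) {
                collect(child, path + "/" + child.name, out);
            }
        }
    }
}
```

Note that a shared folder below a non-shared one is never reached by this walk, which matches the last sentence of the proposal.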
Security
- Ensuring that no confidential information is leaked
- Ensuring that unauthorized updates are not accepted
- How to start synchronization if an object is changed so that it becomes available to a certain user?
- How to continue synchronization if access rights are first denied & then given back?
- Denial-of-service attacks:
  - Increasing version of an object to maxint
  - Flooding disk space with bogus log entries
  - ...
Attachments
- cync-classes.png (6.9 kB) - added by harri 2 years ago.
