- ========
- Concepts
- ========
- The main goal of the system is to provide *users* and *groups* of users remote
- access to *datasets*. The workflow should center around users accessing files
- locally but allow mirroring datasets remotely. To make use of larger compute
- resources, unattended remote computation should also be possible.
- Datasets
- ========
- In a generic sense, datasets are files and directories of files. A dataset is
- *owned* by one user, who can grant *read* access to other users and groups of
- users.
- A dataset is either an *original* root dataset or *derived* from a parent
- dataset. For example, a tomographic scan yielding dark, reference and projection
- frames is the origin for subsequent datasets that might contain sinograms,
- reconstructions, segmentations and final analysis. To derive a dataset from
- another, the original dataset must be *closed* to prevent modifications and
- provide a reproducible chain of intermediate results.
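- The ownership, read-access and derivation rules above can be sketched as a
- small in-memory model. This is only an illustration under stated assumptions:
- the ``Dataset`` class and all of its method names are hypothetical and not
- part of the system described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Dataset:
    """Hypothetical in-memory model of a dataset; all names are illustrative."""

    name: str
    owner: str                                       # exactly one owning user
    readers: Set[str] = field(default_factory=set)   # users or groups with read access
    parent: Optional["Dataset"] = None               # None marks an original (root) dataset
    closed: bool = False                             # a closed dataset must not be modified

    def grant_read(self, principal: str) -> None:
        self.readers.add(principal)

    def can_read(self, principal: str) -> bool:
        return principal == self.owner or principal in self.readers

    def close(self) -> None:
        self.closed = True

    def derive(self, name: str, owner: str) -> "Dataset":
        # Derivation requires a closed parent so the chain stays reproducible.
        if not self.closed:
            raise ValueError(f"dataset {self.name!r} must be closed before deriving")
        return Dataset(name=name, owner=owner, parent=self)

    def lineage(self) -> List[str]:
        """Return dataset names from the root down to this dataset."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain[::-1]


scan = Dataset("scan", owner="alice")
scan.close()
sinograms = scan.derive("sinograms", owner="alice")
print(sinograms.lineage())
```

- In this sketch read access is not inherited by derived datasets; whether it
- should be is a design decision the text above leaves open.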
- Typed datasets
- --------------
- A generic dataset covers all kinds of data but cannot be used to deduce
- information for automatic processing. Hence, a hierarchy of types with
- pre-determined attributes is required.
- Types will also help to provide dataset-specific previews. For generic datasets
- only a generic file-browser-style preview is possible. For known dataset types we can
- provide specific previewers, for example a WebGL-based 3D visualization for a
- reconstruction or segmented volume.
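- One way to picture such a type hierarchy is a set of classes whose subtypes
- add pre-determined attributes, with the previewer chosen by type. The sketch
- below is an assumption for illustration: the class names, the attributes
- ``shape`` and ``voxel_size_um`` and the previewer identifiers are all
- hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GenericDataset:
    """Untyped dataset: nothing but its location is known."""

    path: str


@dataclass
class Volume(GenericDataset):
    """Typed dataset with pre-determined attributes (illustrative choice)."""

    shape: Tuple[int, int, int]   # number of voxels along each axis
    voxel_size_um: float          # edge length of one voxel in micrometres


@dataclass
class Reconstruction(Volume):
    """A reconstructed volume; inherits the attributes of Volume."""


# Hypothetical previewer identifiers, keyed by dataset type.
PREVIEWERS = {
    GenericDataset: "file-browser",
    Volume: "webgl-volume",
}


def select_previewer(dataset: GenericDataset) -> str:
    # Walk the method resolution order so that a subtype without its own
    # entry falls back to the previewer of its nearest registered ancestor.
    for cls in type(dataset).__mro__:
        if cls in PREVIEWERS:
            return PREVIEWERS[cls]
    raise TypeError(f"no previewer registered for {type(dataset).__name__}")


print(select_previewer(GenericDataset("/data/raw")))
print(select_previewer(Reconstruction("/data/rec", (512, 512, 512), 1.5)))
```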
- .. _programmatic-derivation:
- Programmatic derivation
- -----------------------
- Besides manual derivation by the user, it should be possible to let a background
- program process an input dataset on behalf of a user.
- Collections
- ===========
- A collection is a set of datasets that belong together *logically*. This
- concept models typical workflows: scanning a sample yields flats, darks and
- projections; reconstructing from this data yields a volume, which can then be
- segmented into yet another dataset.
- Architecture
- ============
- The system is based on a client-server architecture. The server manages user
- roles, authorization, authentication and remote data storage. The system
- distinguishes between the actual dataset, whose data and metadata are managed by
- the server, and a local *working directory* containing a copy of the dataset's data.
- The user either declares an existing working directory to be a dataset by
- *initializing* and registering it with the server, or checks out an existing
- dataset and retrieves its data into a working directory. The user *pushes* the
- working directory to synchronize the remote and the local state.
- .. note::
- From my point of view pushing is *not* the place for commit-like actions,
- i.e. denoting a new version or whatever but merely storing the data
- remotely. But this point is open for discussion.
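- How a push determines what to transfer is not specified here. As one possible
- approach, the sketch below assumes that the working directory is summarized as
- a manifest of relative paths and SHA-256 digests and compared with a manifest
- of the remote state. The function names and the manifest format are
- hypothetical. In line with the note above, nothing is versioned; the
- comparison only decides which files need to be stored remotely.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(workdir: Path) -> Dict[str, str]:
    """Map each file path (relative to workdir) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(workdir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(workdir).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def files_to_push(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return files that are new or changed locally compared with the remote."""
    return sorted(p for p, digest in local.items() if remote.get(p) != digest)
```

- Reading each file fully into memory keeps the sketch short; large frames or
- volumes would need chunked hashing instead.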
- The server provides a management view for web clients as well as an API
- view for programmatic access. This API is consumed by a local client to
- 1. create a new dataset from a local directory,
- 2. list available datasets for *cloning* and *deletion* as well as
- 3. push data to the remote server.
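- The three client operations listed above could map onto HTTP requests roughly
- as follows. This is a sketch only: the base URL, the endpoint paths and the
- function names are assumptions made for illustration, since the actual API
- layout is not defined in this document. The functions only construct the
- requests and do not send them.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://example.org/api"  # hypothetical server address


def create_dataset(name: str) -> Request:
    """1. Register a local directory as a new dataset."""
    body = json.dumps({"name": name}).encode("utf-8")
    return Request(
        f"{BASE_URL}/datasets",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def list_datasets() -> Request:
    """2. List datasets that are available for cloning or deletion."""
    return Request(f"{BASE_URL}/datasets", method="GET")


def push_file(dataset_id: str, rel_path: str, payload: bytes) -> Request:
    """3. Upload one file of the working directory to the remote server."""
    # quote() escapes unsafe characters but keeps "/" as a path separator.
    url = f"{BASE_URL}/datasets/{quote(dataset_id)}/files/{quote(rel_path)}"
    return Request(url, data=payload, method="PUT")
```

- Passing one of the returned objects to ``urllib.request.urlopen`` would send
- the request to the server.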
- Token-based authentication
- --------------------------
- Storing the user name and password in the local client is not advised. Instead,
- each user can generate a token that the API uses to authenticate and authorize
- resource access. To limit third-party abuse, tokens can be revoked at any
- time.
- Token-based authentication is also used for :ref:`programmatic-derivation` in
- which server-side processes use the authentication token to implement data
- processing on behalf of the user.
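- The token lifecycle described above (generate, authenticate, revoke) can be
- sketched with the Python standard library. The ``TokenStore`` class is
- hypothetical, and storing only a SHA-256 digest of each token is an assumption
- of this sketch rather than a stated requirement of the system.

```python
import hashlib
import secrets
from typing import Dict, Optional


class TokenStore:
    """Hypothetical server-side token registry, kept in memory."""

    def __init__(self) -> None:
        # Only digests are kept, so the stored table does not contain
        # the tokens themselves.
        self._user_by_digest: Dict[str, str] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def generate(self, user: str) -> str:
        """Create a new random token for the user and return it once."""
        token = secrets.token_urlsafe(32)
        self._user_by_digest[self._digest(token)] = user
        return token

    def authenticate(self, token: str) -> Optional[str]:
        """Return the user the token belongs to, or None if it is unknown."""
        return self._user_by_digest.get(self._digest(token))

    def revoke(self, token: str) -> bool:
        """Invalidate the token; return True if it existed."""
        return self._user_by_digest.pop(self._digest(token), None) is not None


store = TokenStore()
token = store.generate("alice")
print(store.authenticate(token))  # the owning user while the token is valid
store.revoke(token)
print(store.authenticate(token))  # None once the token has been revoked
```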
- Error handling
- --------------
- All server-side errors must be mapped to appropriate HTTP status codes with a
- JSON response detailing the error.
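- A minimal sketch of such a mapping is shown below. The exception classes, the
- chosen status codes and the JSON field names are assumptions for illustration;
- the document does not prescribe a particular error schema.

```python
import json
from http import HTTPStatus
from typing import Tuple


class DatasetNotFound(Exception):
    """Hypothetical error: the requested dataset does not exist."""


class PermissionDenied(Exception):
    """Hypothetical error: the user lacks access to the dataset."""


class DatasetClosed(Exception):
    """Hypothetical error: a closed dataset cannot be modified."""


STATUS_BY_ERROR = {
    DatasetNotFound: HTTPStatus.NOT_FOUND,
    PermissionDenied: HTTPStatus.FORBIDDEN,
    DatasetClosed: HTTPStatus.CONFLICT,
}


def error_response(exc: Exception) -> Tuple[int, str]:
    """Return (HTTP status code, JSON body) for a server-side exception."""
    status = STATUS_BY_ERROR.get(type(exc))
    if status is None:
        # Unknown errors become a 500 without leaking internal details.
        status = HTTPStatus.INTERNAL_SERVER_ERROR
        detail = "internal server error"
    else:
        detail = str(exc)
    body = {"status": status.value, "error": status.phrase, "detail": detail}
    return status.value, json.dumps(body)


code, body = error_response(DatasetNotFound("no dataset with id 7"))
print(code, body)
```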
- Related and prior work
- ======================
- * dCache
- * iCAT