
========
Concepts
========

The main goal of the system is to provide *users* and *groups* of users with
remote access to *datasets*. The workflow should center around users accessing
files locally but allow mirroring datasets remotely. To make use of larger
compute resources, unattended remote computation should be possible too.

Datasets
========

In a generic sense, datasets are files and directories of files. Datasets are
*owned* by one user, but *read* access can be granted to other users and
groups of users.

Datasets can either be *original* root datasets or be *derived* from a parent
dataset. For example, a tomographic scan yielding dark, reference and
projection frames is the origin for subsequent datasets that might contain
sinograms, reconstructions, segmentations and final analyses. To derive a
dataset from another, the original dataset must be *closed* to prevent
modifications and to provide a reproducible chain of intermediate results.

Typed datasets
--------------

A generic dataset covers all kinds of data but cannot be used to deduce
information for automatic processing. Hence, a hierarchy of types with
pre-determined attributes is required.

Types also help to provide dataset-specific previews. For generic datasets
only a vague, file-browser-like preview is possible. For known dataset types
we can provide specific previewers, for example a WebGL-based 3D visualization
for a reconstructed or segmented volume.

.. _programmatic-derivation:

Programmatic derivation
-----------------------

Besides manual derivation by the user, it should be possible to let a
background program process an input dataset on behalf of a user.

Collections
===========

Collections are a number of datasets that belong together *logically*. This
concept models typical workflows: e.g. scanning a sample yields flats, darks
and projections; reconstructing from this data yields a volume, which can then
be segmented into yet another dataset.

Architecture
============

The system is based on a client-server architecture. The server manages user
roles, authorization, authentication and remote data storage. The system
distinguishes between the actual dataset, whose data and metadata are managed
by the server, and a local *working directory* containing a copy of the
dataset's data.

The user either declares an existing working directory to be a dataset by
*initializing* it and registering it with the server, or *checks out* a
dataset and retrieves its data into a working directory. The user *pushes* the
working directory to synchronize the remote and the local state.

.. note::

    From my point of view, pushing is *not* the place for commit-like actions,
    i.e. denoting a new version or the like, but merely for storing the data
    remotely. But this point is open for discussion.

The server provides a management view for web clients as well as an API view
for programmatic access. This API is consumed by a local client to

1. create a new dataset from a local directory,
2. list available datasets for *cloning* and *deletion*, and
3. push data to the remote server.
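
The three client operations above can be sketched as follows. The endpoint
paths, the payload fields and the ``DatasetClient`` name are illustrative
assumptions, not part of the specification; a real client would actually send
the requests it builds here.

```python
"""Sketch of a local client consuming the dataset API.

All endpoint paths and names are hypothetical; only the three operations
(create, list, push) come from the concept described above.
"""
import json
from urllib import request


class DatasetClient:
    """Builds authenticated requests against a hypothetical dataset API."""

    def __init__(self, base_url, token):
        self.base_url = base_url.rstrip("/")
        self.token = token

    def _request(self, method, path, payload=None):
        data = json.dumps(payload).encode() if payload is not None else None
        req = request.Request(self.base_url + path, data=data, method=method)
        req.add_header("Authorization", "Token " + self.token)
        if data is not None:
            req.add_header("Content-Type", "application/json")
        # A real client would now call request.urlopen(req); the sketch
        # only returns the prepared request object.
        return req

    def create(self, name):
        # 1. register a local working directory as a new dataset
        return self._request("POST", "/datasets", {"name": name})

    def list(self):
        # 2. list datasets available for cloning and deletion
        return self._request("GET", "/datasets")

    def push(self, dataset_id):
        # 3. synchronize the local working directory with the server
        return self._request("POST", "/datasets/{}/push".format(dataset_id))
```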

Token-based authentication
--------------------------

Storing the user name and password for the local client is not advisable;
hence each user can generate a token that is used by the API to authenticate
and authorize resource access. To prevent third-party abuse, tokens can be
revoked at any time.
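
Server-side, generation and revocation could look like the following sketch,
assuming tokens are opaque random strings mapped to user names; the
``TokenStore`` class and its method names are assumptions for illustration.

```python
"""Sketch of server-side token management with revocation."""
import secrets


class TokenStore:
    def __init__(self):
        self._tokens = {}  # token -> user name

    def generate(self, user):
        """Create a fresh, unguessable token for *user*."""
        token = secrets.token_hex(32)
        self._tokens[token] = user
        return token

    def authenticate(self, token):
        """Return the user owning *token*, or None if unknown or revoked."""
        return self._tokens.get(token)

    def revoke(self, token):
        # Revocation immediately invalidates the token for all API calls.
        self._tokens.pop(token, None)
```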

Token-based authentication is also used for :ref:`programmatic-derivation`,
in which server-side processes use the authentication token to carry out data
processing on behalf of the user.

Error handling
--------------

All server-side errors must be mapped to appropriate HTTP status codes with a
JSON response detailing the error.
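
As a sketch of this mapping, each known error class could carry its HTTP
status code and serialize into a JSON body; the exception names, the
``status`` attribute and the response shape are assumptions, not a specified
format.

```python
"""Sketch of mapping server-side errors to HTTP status codes plus JSON."""
import json


class DatasetNotFound(Exception):
    status = 404


class NotAuthorized(Exception):
    status = 403


def error_response(exc):
    """Turn a server-side error into an (HTTP status, JSON body) pair.

    Unknown errors fall back to 500 Internal Server Error.
    """
    status = getattr(exc, "status", 500)
    body = json.dumps({"error": exc.__class__.__name__, "detail": str(exc)})
    return status, body
```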

Related and prior work
======================

* dCache
* iCAT