System Roles

Roles Overview

While the client consists of a single role (the Web Client role), server-side components include three roles: the Web Service role (implements a secure XML Web API), the Back-End role (implements distributed computations), and the Database role (implements a SQL-based storage system). All roles can be collapsed onto a single machine, or assigned to separate physical or virtual machines.

Web Client Role

Clients implementing the Web Client role run within a browser. Web clients send XML requests to Web Service machines to perform all data-related operations. They are also responsible for implementing data visualization and user interaction capabilities (using Microsoft Silverlight technology). To reduce traffic and latency, Web clients also support local data caching and data compression.

Web Service Role

Servers implementing the Web Service role are stateless Web servers which implement the secure XML-based Web API. Web Service servers are responsible for accepting and processing XML requests, while enforcing security. Web Service servers may be used to create and start new tasks. However, such tasks are processed in the background by Back-End servers. Multiple Web Service servers can be deployed to scale out.

Back-End Role

Servers implementing the Back-End role implement a computing mesh responsible for scheduling and executing tasks. Distributed computing is used to increase scalability, while allowing users to obtain results more quickly. The distributed computing infrastructure is able to manage task priorities, detect abandoned tasks, restart failed tasks, terminate long-running tasks, and synchronize task execution between nodes. Multiple Back-End servers can be deployed to scale out.

Database Role

Servers implementing the Database role implement a SQL-based storage system. The system uses multiple databases to manage data. A single master database acts as a resource directory, while data databases are used to store uploaded data and analysis results. For example, when new data is uploaded, it is stored in a new table within a data database. However, statistics about this data (as well as directory information about the new table) are stored in the master database.

System Data

Users, Workspaces, Rights

To create a new account, users must first register. For security reasons, users must solve a visual CAPTCHA challenge. If the system has been configured to require e-mail confirmation, users must click on an activation link before they can log in. Registered users can create workspaces, which are essentially secure containers encapsulating objects such as data sets, analysis results, images, or comments. Registered users can also share workspaces with other users by granting them specific rights (e.g., read, write, manage). However, a granted right only becomes effective once the invited user has accepted the invitation by approving it.

Databases, Tables, Fields

The master database does not store uploaded data permanently. Uploaded data (along with analysis results) is stored in separate data databases. The master database instead keeps an inventory of all data databases, the tables within these data databases, and even the fields within these tables. For example, for each data table, the master database keeps track of associated field names, field types, and field statistics. The inventory mechanism essentially allows the system to find where data is stored, and to quickly access summary information about the data. While an upload is in progress, incoming data is held in a staging area in the master database, in the form of chunk records. Once the upload is complete, a new table is created in a data database, and the chunk records are deleted.
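The staging flow described above can be sketched as follows. This is only an illustration under stated assumptions: the table names (chunk, table_directory, field_directory), the comma-separated payload format, and the helper functions are all hypothetical, and a single in-memory SQLite database stands in for both the master database and a data database.

```python
import sqlite3

# Hypothetical schema: one in-memory database stands in for both the
# master database (chunk staging + inventory) and a data database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE chunk (upload_id INTEGER, seq INTEGER, payload TEXT);
    CREATE TABLE table_directory (table_name TEXT, row_count INTEGER);
    CREATE TABLE field_directory (table_name TEXT, field_name TEXT, field_type TEXT);
""")

def stage_chunk(upload_id, seq, payload):
    """Store one piece of an in-progress upload as a chunk record."""
    db.execute("INSERT INTO chunk VALUES (?, ?, ?)", (upload_id, seq, payload))

def complete_upload(upload_id, table_name, fields):
    """Create the data table, record it in the inventory, delete the chunks."""
    rows = []
    cur = db.execute(
        "SELECT payload FROM chunk WHERE upload_id = ? ORDER BY seq", (upload_id,)
    )
    for (payload,) in cur:
        rows.extend(line.split(",") for line in payload.splitlines())

    # Table and field names come from trusted code in this sketch.
    columns = ", ".join(f'"{name}" {ftype}' for name, ftype in fields)
    db.execute(f'CREATE TABLE "{table_name}" ({columns})')
    placeholders = ", ".join("?" for _ in fields)
    db.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)

    # Inventory entries: where the data lives and what its fields are.
    db.execute("INSERT INTO table_directory VALUES (?, ?)", (table_name, len(rows)))
    db.executemany(
        "INSERT INTO field_directory VALUES (?, ?, ?)",
        [(table_name, name, ftype) for name, ftype in fields],
    )
    db.execute("DELETE FROM chunk WHERE upload_id = ?", (upload_id,))
    db.commit()
    return len(rows)

stage_chunk(1, 0, "alice,30\nbob,25")
stage_chunk(1, 1, "carol,41")
imported = complete_upload(1, "data_1", [("name", "TEXT"), ("age", "INTEGER")])
```

After complete_upload returns, the inventory tables describe the new data table and the staging area no longer holds any chunk for that upload.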

Nodes, Jobs, Tasks

The master database contains records which participate in distributed computing. When a server is added to the grid, a new processing node record is registered in SQL. When a new task is started, a new job record (marked as scheduled) is created. The distributed system ensures that nodes compete for job execution. Ultimately, each job is assigned to a single node, which becomes responsible for executing the task. Upon completion (or failure), the job record is updated, so as to signal its new status to all nodes. Each job is able to spawn child jobs, which may execute on different nodes. The distributed computing infrastructure is able to manage task priorities, detect abandoned tasks, restart failed tasks, terminate long-running tasks, and synchronize task execution between nodes.
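One common way to let nodes compete for a job so that exactly one of them wins is a conditional update on the job record. The sketch below illustrates that general technique only; the job table layout, the status values other than "scheduled", and the function names are assumptions, and the text does not say how Data Applied actually implements the competition.

```python
import sqlite3

# Hypothetical job table; only the 'scheduled' status comes from the text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT, node_id TEXT)")
db.execute("INSERT INTO job (status) VALUES ('scheduled')")
db.commit()

def try_claim(job_id, node_id):
    """Claim a job only if it is still scheduled; at most one node succeeds."""
    cur = db.execute(
        "UPDATE job SET status = 'running', node_id = ? "
        "WHERE id = ? AND status = 'scheduled'",
        (node_id, job_id),
    )
    db.commit()
    return cur.rowcount == 1

def finish(job_id, succeeded):
    """Update the job record so that other nodes can see the outcome."""
    status = "completed" if succeeded else "failed"
    db.execute("UPDATE job SET status = ? WHERE id = ?", (status, job_id))
    db.commit()

first = try_claim(1, "node-a")    # wins the job
second = try_claim(1, "node-b")   # loses: the job is no longer scheduled
finish(1, succeeded=True)
final = db.execute("SELECT status, node_id FROM job WHERE id = 1").fetchone()
```

Because the WHERE clause checks the status in the same statement that changes it, a second node that arrives later updates zero rows and knows it lost.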

Keys, Licenses, Logs

The master database contains records used to implement and enforce system security. The organization license table stores digitally signed license keys, which impose specific usage restrictions on the organization and its users. When authentication tickets, CAPTCHA challenges, or license keys must be verified, records stored in the key table are used to perform cryptographic operations. When a user makes a change to a workspace or to the objects it contains, a record containing event information is stored in the log table.

Comments, Downloads, Images, Settings

The master database contains tables used to implement collaboration features. When a user creates a comment, a record containing comment details is stored in the comment table. When a user requests a data download, a record specifying how the data should be filtered before it is streamed out is stored in the download table. When a user exports or publishes content as an image, a record containing bitmap information is stored in the image table.

System Security

Right Enforcement

To secure operations which read data, the XML Web API performs a SQL join between the target table, the workspace table, and the right table. For example, consider the case of a caller retrieving all comments he or she has access to. The system will find all workspaces the user has read (or better) rights to, and perform a join with comments under these workspaces. To secure operations which modify data, the XML Web API simply checks if the caller has rights to perform the operation. For example, consider the case of a caller deleting a comment. The system will find under which workspace the comment is located, and check if the caller has been granted write (or better) rights to this workspace.
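The two checks described above can be sketched with a small in-memory database. The schema is hypothetical: the table and column names, the numeric ranking of rights (read < write < manage), and the function names are assumptions made for illustration, not the product's actual schema.

```python
import sqlite3

# Assumed ordering of rights, so that "read (or better)" becomes level >= 1.
RANK = {"read": 1, "write": 2, "manage": 3}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE workspace (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_right (user_id INTEGER, workspace_id INTEGER, level INTEGER);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, workspace_id INTEGER, body TEXT);
    INSERT INTO workspace VALUES (1, 'sales'), (2, 'hr');
    INSERT INTO user_right VALUES (7, 1, 1), (8, 1, 2), (8, 2, 3);
    INSERT INTO comment VALUES (10, 1, 'Q1 looks good'), (11, 2, 'Confidential');
""")

def readable_comments(user_id):
    """Read path: join the target table with workspaces and rights."""
    rows = db.execute("""
        SELECT c.id, c.body
        FROM comment AS c
        JOIN workspace AS w ON w.id = c.workspace_id
        JOIN user_right AS r ON r.workspace_id = w.id
        WHERE r.user_id = ? AND r.level >= ?
        ORDER BY c.id
    """, (user_id, RANK["read"]))
    return rows.fetchall()

def delete_comment(user_id, comment_id):
    """Write path: locate the comment's workspace, then check write rights."""
    allowed = db.execute("""
        SELECT 1
        FROM comment AS c
        JOIN user_right AS r ON r.workspace_id = c.workspace_id
        WHERE c.id = ? AND r.user_id = ? AND r.level >= ?
    """, (comment_id, user_id, RANK["write"])).fetchone()
    if allowed is None:
        return False
    db.execute("DELETE FROM comment WHERE id = ?", (comment_id,))
    db.commit()
    return True
```

In this sample data, user 7 can only read the comment in workspace 1 and cannot delete it, while user 8 sees both comments and may delete either.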

License Restrictions

Licenses determine which restrictions are applied to the organization and its users. Each license key is digitally signed, and specifies limits such as an expiry date, how many rows can be imported, or how many data sets can be created. Data Applied uses a public/private key pair mechanism to generate license keys. This choice explains why our license keys are much longer than those used by many products. Organization licenses are cumulative: each added organization license unlocks additional capabilities by lifting restrictions. User licenses, however, are restrictive: they restrict the default capabilities granted by organization licenses. The system validates entered license information, and enforces such restrictions.
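The difference between cumulative and restrictive licenses can be illustrated with a small sketch. It rests on assumptions the text does not confirm: limits are modeled as plain numbers, organization licenses are combined by taking the most permissive value for each limit, and a user license can only lower a limit. The limit names are hypothetical.

```python
def effective_limits(org_licenses, user_license=None):
    """Combine organization licenses, then apply an optional user license."""
    limits = {}
    # Cumulative step (assumed rule): each added organization license can
    # only raise a limit, never lower it.
    for lic in org_licenses:
        for name, value in lic.items():
            limits[name] = max(limits.get(name, 0), value)
    # Restrictive step (assumed rule): a user license can only lower what
    # the organization licenses grant.
    for name, value in (user_license or {}).items():
        limits[name] = min(limits.get(name, 0), value)
    return limits

org = [{"max_rows": 1000, "max_data_sets": 5}, {"max_rows": 50000}]
org_wide = effective_limits(org)
restricted = effective_limits(org, {"max_rows": 10000})
```

Here the second organization license lifts the row limit to 50000 for everyone, while the user license brings it back down to 10000 for one user and leaves the data set limit untouched.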

Secure Delegation

Data Applied includes secure delegation capabilities: users can securely allow third parties to perform operations on their behalf. Authenticated users can issue restricted tickets which allow authentication but also impose a set of usage restrictions. For example, a user could obtain a restricted ticket which is only valid for 5 minutes, and only allows the caller to retrieve a specific entity type from a specific workspace. The restricted ticket could then be sent to a third-party application, allowing it to securely perform certain operations on behalf of the user. When a restricted ticket is generated, a record is created in the database to keep track of the list of restrictions to apply. When the XML Web API receives a request which includes a restricted ticket, it loads the list of restrictions, and applies them to the execution pipeline. As with non-restricted tickets, cryptographically signed hashes are used to verify tickets.
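A restricted ticket along these lines might look like the sketch below. Everything concrete in it is an assumption: the ticket layout, the use of HMAC-SHA256 as the signed hash, the in-memory dictionary standing in for the restriction records, and all function names. It covers only the three restrictions from the example (lifetime, workspace, entity type).

```python
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical server-side secret
RESTRICTIONS = {}                     # stands in for the restriction records

def _sign(payload):
    return hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()

def issue_restricted_ticket(user_id, workspace_id, entity_type, ttl_seconds, now=None):
    """Record the restrictions server-side and return a signed ticket."""
    now = time.time() if now is None else now
    ticket_id = secrets.token_hex(8)
    RESTRICTIONS[ticket_id] = {
        "expires_at": now + ttl_seconds,
        "workspace_id": workspace_id,
        "entity_type": entity_type,
    }
    payload = f"{user_id}:{ticket_id}"
    return f"{payload}:{_sign(payload)}"

def authorize(ticket, workspace_id, entity_type, now=None):
    """Return the user ID if the ticket is valid for this request, else None."""
    now = time.time() if now is None else now
    payload, _, signature = ticket.rpartition(":")
    # Verify the signature before trusting anything inside the ticket.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    user_id, _, ticket_id = payload.partition(":")
    rules = RESTRICTIONS.get(ticket_id)
    if rules is None or now >= rules["expires_at"]:
        return None
    if (workspace_id, entity_type) != (rules["workspace_id"], rules["entity_type"]):
        return None
    return int(user_id)

# A ticket valid for 5 minutes, limited to comments in workspace 3.
ticket = issue_restricted_ticket(42, 3, "comment", ttl_seconds=300, now=1000.0)
```

A third-party application holding this ticket can retrieve comments from workspace 3 on behalf of user 42 until the ticket expires, and nothing else.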

Cryptographic Validations

Users who register must supply a user name and password. The system computes and stores a cryptographic hash of the user name + password combination. This means that actual passwords are never stored. Because the user name is part of the hashed input, two users with the same password have different stored hashes, so password hashes cannot be compared across accounts. When authenticating using a login name and password, the system again computes a cryptographic hash of the user name + password combination, and checks whether it matches the stored hash. If so, the system issues an authentication ticket, composed of a user ID followed by a cryptographically signed hash of this user ID. When presented with a ticket, the system extracts the user ID, again computes the signed hash of this user ID, and checks whether it matches the hash specified by the ticket (all in memory). If valid, the system accepts the authentication claim, and performs further security checks using this identity. Finally, when processing license keys, the system uses a well-known public key to verify the digital signature. The private key, however, is kept confidential, because it is used to generate valid license keys.
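The registration, login, and ticket checks described above can be sketched as follows. The text does not name the hash or signature algorithms, so the choices here are stand-ins: PBKDF2-HMAC-SHA256 with the user name as salt for the credential hash, and HMAC-SHA256 for the signed hash in the ticket. The ticket layout and all names are hypothetical.

```python
import hashlib
import hmac
import secrets

SIGNING_KEY = secrets.token_bytes(32)  # hypothetical server-side key
STORED = {}                            # user name -> (user ID, credential hash)

def credential_hash(user_name, password):
    # Stand-in for "a hash of the user name + password combination".
    return hashlib.pbkdf2_hmac("sha256", password.encode(), user_name.encode(), 100_000)

def register(user_id, user_name, password):
    """Store only the hash; the password itself is never kept."""
    STORED[user_name] = (user_id, credential_hash(user_name, password))

def _sign(user_id):
    return hmac.new(SIGNING_KEY, str(user_id).encode(), hashlib.sha256).hexdigest()

def log_in(user_name, password):
    """Return a ticket (user ID + signed hash) if the credentials match."""
    record = STORED.get(user_name)
    if record is None:
        return None
    user_id, stored_hash = record
    if not hmac.compare_digest(stored_hash, credential_hash(user_name, password)):
        return None
    return f"{user_id}.{_sign(user_id)}"

def verify_ticket(ticket):
    """Recompute the signed hash in memory; no database access is needed."""
    user_id, _, signature = ticket.partition(".")
    if not (user_id.isascii() and user_id.isdigit()):
        return None
    if not hmac.compare_digest(signature.encode(), _sign(user_id).encode()):
        return None
    return int(user_id)

register(1, "alice", "correct horse")
register(2, "bob", "correct horse")
ticket = log_in("alice", "correct horse")
```

Alice and Bob share a password here, yet their stored hashes differ because the user name is part of the hashed input. Changing the user ID inside a ticket invalidates it, since the signature no longer matches.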
