Data-Scope Compute and Storage Allocation: Usage Policies and Application Procedure
Introduction
The Data-Scope machine provides a world-class, data-intensive analysis capability for massive scientific datasets. Cluster resources are available for not-for-profit research and education through a lightweight proposal process.
Proposals will be reviewed on demand by the Data-Scope Allocation Committee as they are submitted, and overall usage of the machine will be evaluated and reported quarterly. Inquiries about the Data-Scope should be sent to ds-proposals@pha.jhu.edu.
Committee
Randal Burns, Computer Science (Chair)
Tamas Budavari, Physics and Astronomy
Ben Langmead, Computer Science
Sarah Wheelan, Oncology Biostatistics and Bioinformatics
John Pilam, Institute for Data-Intensive Engineering and Science
Rich Ercolani, Data-Scope Systems Administration, Institute for Data-Intensive Engineering and Science
Usage Policies
The Data-Scope is intended to provide a data-intensive analysis capability for Big Data problems. As such, the majority of users will run projects of finite duration, typically 3 to 6 months, that leverage the Data-Scope's unique capabilities: fast I/O with SSDs and high computing density with GPUs. Proposals that would use the Data-Scope solely as a compute facility will be redirected to other JHU resources, such as the Homewood High-Performance Computing Cluster (HHPC) or the GPU Laboratory.
Data-Scope Application
Researchers interested in using the Data-Scope instrument should submit a short (1-2 page) document that addresses the following points. Proposals can be submitted as PDF documents by email to ds-proposals@pha.jhu.edu.
Describe the scientific importance of the computation.
What computation/analysis will be performed?
What are the size and format of the input and output data?
Describe the code/software to be executed. Does it need to be customized for the Data-Scope?
How many and what types of Data-Scope resources do you require?
Do you need Windows or Unix?
Do you need GPUs? How many per node (0-2)?
Do you need SSDs? How many per node (max 12)?
What are your storage requirements?
How much scratch storage will the computation use?
How much long-term storage will be needed, and for how long?
Data Handling
How will you ingest data into the system? Over the network, Internet2, or via Sneakernet (by shipping disk drives)?
How will you retrieve results and to what ultimate destination?
Have you optimized your data layout? If yes, please describe how the data are arranged.
Provide a timeline for use of the machine:
Initial deployment (small scale to develop and test codes in the Data-Scope environment)
Full-scale deployment (to perform analysis)
Destaging period (to remove results from the machine and deallocate resources)
Additional Information
Resident Services
It is expected that a small fraction of the machine will be used to run long-standing services. The Institute for Data-Intensive Engineering and Science (IDIES) runs many such services, including SDSS.org, the Turbulence Project, and the Open Connectome Project. Proposals of this kind will be considered; however, service hosting will always be a secondary use of the machine.
Long-Term Storage
Permanent, backed-up storage may be available for projects that are long-term or that generate data products the investigators cannot easily retrieve. Projects that wish to use long-term storage will be assessed a one-time charge-back that covers the acquisition and deployment of the storage, increasing the capacity of the Data-Scope commensurately; inquire separately about backup charges. Rates will be determined at the time of proposal, but we expect them not to exceed $100 per terabyte.