2. Concepts
This section defines and connects the jargon that we'll be using throughout the rest of this training. Some terms might be straightforward; others will only make sense after a practical example, which we shall tackle in the next section.
Compute
In non-technical language, the word compute is usually used as a verb synonymous with calculate.
In cloud computing, "compute" is more often used as an adjective or noun to refer to processing power; compute is one of the resources that you consume when you use the cloud or HPC for modelling and data analysis. It can be contrasted with other resources such as RAM/memory, storage, and networking.
Virtual machine
When using the cloud, the primary method for claiming and using compute is to create and access a virtual machine (VM). But what is a virtual machine?
The concept of a virtual machine goes back to the 1930s, when Alan Turing noted that any computer can simulate the calculations performed by any other computer. A powerful computer can even simulate several other computers simultaneously by dividing processing time between them.
When a real computer "acts like" another computer in this way, it is called emulation. The imaginary computer being emulated is called a virtual machine.
When you have a powerful real computer with a lot of central processing units (CPUs, or cores), it is able to divide up the cores between multiple smaller virtual machines. These powerful servers used for emulating multiple VMs are called compute hosts or hypervisors.
(Strictly speaking, "hypervisor" refers to the low-level software and firmware responsible for emulation. However, we often play fast and loose with the language, referring to the whole server as a hypervisor.)
The Melbourne Research Cloud has approximately 200 servers, and each server has between 32 and 512 cores to divide up between our users' virtual machines. When you gain access to a virtual machine, it will look to you like a remote computer with maybe 2, 4, or 8 cores; but in physical terms it is being emulated by a hypervisor which is also running several other virtual machines for other users.
Instance
An instance is another name for a virtual machine.
When launching a virtual machine, you usually base it on a pre-made image, which you can think of as a template. Thus, a virtual machine instantiates a pre-made image.
Image
An image is a file used to populate a blank instance with an operating system; depending on the image, additional software applications may be included. We provide a variety of official images covering the most common Linux distributions, as well as Microsoft Windows.
Instance snapshot
An instance snapshot is an image created from an existing instance.
Say you've installed some software on an instance and after much effort have configured it just the way you like, and now you want a second instance set up the same way. By taking a snapshot, you can launch a new instance which will be identical.
OpenStack
A variety of resources need to work with each other to provide a viable cloud computing service:
- Compute and memory
- Storage
- Networking
- Account management
- A user interface (dashboard)
The Melbourne Research Cloud is based on a family of open source software collectively referred to as OpenStack. OpenStack consists of components designed to work with each other:
- Nova for compute/virtualisation
- Cinder for volume storage
- Neutron for networking
- Keystone for account management
- Horizon for the dashboard
OpenStack has other components as well, but these are the primary ones.
OpenStack API and command line clients
OpenStack comes with an Application Programming Interface (API) which allows you to query and control your cloud resources in a lightweight and programmatic way without needing to use the browser-based dashboard. There are also free OpenStack command line clients designed to work with the API.
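As a rough illustration of what programmatic access can look like, the sketch below lists the instances in a project through the openstacksdk Python library. The cloud name "mrc" and the matching clouds.yaml entry are assumptions made for this example; this training does not specify how credentials are configured. A stand-in connection object is used so that the example runs without any cloud access.

```python
from types import SimpleNamespace


def summarise_instances(conn):
    """Return (name, status) pairs for every instance visible to `conn`.

    `conn` is expected to be an openstacksdk Connection, but any object
    with a compatible `compute.servers()` method will work.
    """
    return [(server.name, server.status) for server in conn.compute.servers()]


def connect(cloud_name="mrc"):
    """Open a connection using a clouds.yaml entry.

    Assumption: openstacksdk is installed and a clouds.yaml entry called
    `cloud_name` exists; "mrc" is a hypothetical name.
    """
    import openstack  # imported lazily so the rest runs without the SDK

    return openstack.connect(cloud=cloud_name)


# Offline demonstration with a stand-in connection object.
fake_conn = SimpleNamespace(
    compute=SimpleNamespace(
        servers=lambda: [
            SimpleNamespace(name="web-1", status="ACTIVE"),
            SimpleNamespace(name="analysis-1", status="SHUTOFF"),
        ]
    )
)
print(summarise_instances(fake_conn))
```

With real credentials in place, `summarise_instances(connect())` would make the same call against the live API.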
Flavors
In OpenStack, you can't simply state the number of cores and amount of memory and storage you want for each VM. Instead, you are constrained to choose from a variety of flavors; a flavor prescribes the size of a VM.
All our flavors come with 30G of storage (called the root disk), and an amount of compute/memory described in the flavor name.
For example, the uom.general.1c4g flavor has 1 vCPU (virtual CPU) and 4G of RAM, while the uom.general.4c16g flavor has 4 vCPUs and 16G of RAM.
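The naming convention can be read mechanically. The following sketch parses a flavor name into its vCPU count and RAM size; the pattern is inferred only from the two example names above, so other flavor families may not follow it, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class FlavorSize(NamedTuple):
    vcpus: int
    ram_gb: int


# Matches a trailing "<N>c<M>g" component, e.g. "1c4g" in "uom.general.1c4g".
_SIZE_PATTERN = re.compile(r"(\d+)c(\d+)g$")


def parse_flavor_name(name: str) -> FlavorSize:
    """Extract the vCPU count and RAM (in G) encoded at the end of a flavor name."""
    match = _SIZE_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised flavor name: {name!r}")
    return FlavorSize(vcpus=int(match.group(1)), ram_gb=int(match.group(2)))


print(parse_flavor_name("uom.general.1c4g"))
print(parse_flavor_name("uom.general.4c16g"))
```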
Availability Zone
An Availability Zone (AZ) describes the physical data centre location where an instance resides.
There is currently one MRC AZ: melbourne-qh2-uom.
Storage
We've already mentioned that each virtual machine comes with a 30G root disk. That might not seem like very much, and you might wonder why there are no flavors with more storage available.
The reason is that most Linux systems are fairly small from a storage point of view; a Linux-based virtual machine could run quite happily even with just 10G of storage. However, we understand that you will often want additional storage in which to hold large data sets, and options exist to accommodate these needs.
Volume Storage
Volume storage is like a (large) plug-in USB hard drive for your instance. You can make it whatever size you have quota for, and connect it to one instance at a time as needed. You can have multiple volumes per project, and you can connect multiple volumes to an instance simultaneously.
Although most Linux filesystems are resistant to fragmentation, a rule of thumb to optimise efficiency is to keep them less than half full. Therefore, you are encouraged to request volume storage quota if you will need to add more than about 10G of data to an instance, and you should request at least twice as much as you need.
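The rule of thumb above amounts to simple arithmetic, sketched here. The function names and default parameters are hypothetical; the factor of two and the 10G threshold come from the guidance above.

```python
import math


def needs_volume(data_gb: float, threshold_gb: float = 10.0) -> bool:
    """True if the data is too large to sit comfortably on the root disk."""
    return data_gb > threshold_gb


def recommended_volume_gb(data_gb: float, headroom_factor: float = 2.0) -> int:
    """Volume size (in G) that keeps the filesystem no more than half full."""
    if data_gb < 0:
        raise ValueError("data_gb must be non-negative")
    return math.ceil(data_gb * headroom_factor)


# Example: a 150G data set calls for a volume of at least 300G.
print(needs_volume(150), recommended_volume_gb(150))
```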
Object Storage
Object Storage allows you to store files without being constrained by the size of the disk in which they reside. They can be accessed by multiple instances at a time, or indeed from anywhere in the world if you so choose.
Behind the scenes, object storage maintains at least three copies of every file, allowing automatic recovery should a file become corrupt due to hardware failure. This makes object storage very robust. Object storage is similar to the S3 service offered by Amazon Web Services.
Networking
Since a virtual machine is like a remote computer, it needs a network connection, a service to accept login attempts, and a firewall to control what types of traffic are allowed.
Default networks
When launching a virtual machine, you will be able to choose from two possible networks for your instance:
- qh2-uom, which exposes your instance to the public internet.
- qh2-uom-internal, which places your instance inside a private network which can only be accessed from on campus or by VPN.
A third option, Classic Provider, is just equivalent to qh2-uom-internal.
You will need to choose qh2-uom if you want to offer a service such as a website to the public internet.
To increase your security, we recommend using qh2-uom-internal if you don't need to expose your instance to the public.
Secure Shell
Secure Shell (ssh) is a protocol analogous to https, which you will be familiar with if you have ever visited an encrypted website. Like https, ssh sets up encrypted connections between your local client device and a remote server.
Whereas https is used to access websites, the primary purpose of ssh is to permit remote command line logins to servers. Certain other systems administration tasks such as securely copying files from one machine to another are also possible over ssh.
The most common ssh software is OpenSSH. Our official Linux images all have an OpenSSH service running by default, so your primary means of interacting with your Linux instances will probably be via ssh.
Naturally, hackers love it when they can gain an ssh login to someone else's machine, so exposing an ssh service to the public internet is not risk-free. We ameliorate the risk in two ways:
- Our official images run fail2ban, which automatically blocks IP addresses after too many failed login attempts.
- We do not enable password logins; instead, you are expected to use keypair authentication, which is both more convenient and more secure.
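To give a feel for the first measure, here is a minimal sketch of the idea of blocking an address after repeated failures within a time window. It is not fail2ban's code or configuration (fail2ban itself works by monitoring log files and updating firewall rules, and its bans usually expire); the class name and thresholds are hypothetical.

```python
from collections import defaultdict, deque


class FailedLoginTracker:
    """Ban an IP address once it fails too often within a sliding window."""

    def __init__(self, max_failures=5, window_seconds=600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, timestamp):
        """Record a failed login; return True if `ip` is now banned."""
        if ip in self.banned:
            return True
        recent = self._failures[ip]
        recent.append(timestamp)
        # Drop failures that fall outside the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned.add(ip)
        return ip in self.banned

    def is_allowed(self, ip):
        return ip not in self.banned


tracker = FailedLoginTracker(max_failures=3, window_seconds=60)
for t in (0, 10, 20):
    tracker.record_failure("203.0.113.5", t)
print(tracker.is_allowed("203.0.113.5"))
```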
Keypairs
When logging into websites or physical machines, we are accustomed to authenticating with a username and password. This form of authentication has a couple of drawbacks:
- It is all based on the single factor of what you know. If what you know can be discovered or guessed, then it is easy for someone else to log in as you. Better security comes from layering multiple factors for authentication.
- It is less convenient to automate, if you want to programmatically access your instances in a non-interactive way.
Keypair authentication overcomes both of these issues: it adds an extra authentication factor (something you have), and it permits logging in without a password, which is much more convenient.
A keypair can be thought of as a lock and key (commonly known as a public key and private key respectively). The Cloud system keeps a copy of your lock and puts it on every instance you build. You can then open every instance with your private key. Your private key is simply a file which you keep on your computer.
Just like a lock, your public key is allowed to be visible to the public. Your private key must be kept secret; if anyone else ever gets a copy of your private key, they will be able to use it to impersonate you.
This article provides a similar - but more correct - use of the analogy.
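For readers who prefer code to analogies, the sketch below shows the underlying idea with textbook RSA and deliberately tiny numbers: the server holds only the public key, issues a random challenge, and the client proves possession of the private key by signing it. This is a toy under stated assumptions, not how OpenSSH is implemented; real keys are vastly larger and the real protocol involves more steps. All names are hypothetical.

```python
import secrets

# Toy key generation with tiny primes (insecure; for illustration only).
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (requires Python 3.8+)

public_key = (n, e)            # the "lock": safe to copy onto every instance
private_key = (n, d)           # the "key": stays on your own computer


def sign(challenge, key):
    """Client side: transform the challenge using the private key."""
    modulus, exponent = key
    return pow(challenge, exponent, modulus)


def verify(challenge, signature, key):
    """Server side: check the signature using only the public key."""
    modulus, exponent = key
    return pow(signature, exponent, modulus) == challenge


# The server picks a random challenge; the client signs it to prove identity.
challenge = secrets.randbelow(n - 2) + 2
signature = sign(challenge, private_key)
print(verify(challenge, signature, public_key))
```

Note that the private key never leaves the client; only the signature travels over the network.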
Firewall and Security Groups
By default, our compute hosts have a firewall which will not let any network traffic in or out of your instance. You can add rules to the firewall by specifying security groups. Security groups can be added when creating the instance, or at any later time; changing the security groups should take effect immediately.
You usually want to allow all outbound traffic, so that your instance can request data from the external internet. This is a default security group rule, but sometimes people remove it, and then they run into problems later.
You should be more careful about what traffic you allow in. Before opening any port to the public internet, you should think about whether it really needs to be public, and you should ensure that any service you expose via that port has been configured for security.
We can't provide security advice for every possible service you might want to offer; instead, when preparing to expose a service, you should do independent research. A rule of thumb is to do an internet search for the name of the software in combination with keywords such as "hardening" or "security".
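The default-deny behaviour of inbound rules described in this section can be modelled in a few lines. This is a simplified sketch of the concept, not OpenStack's implementation; the rule fields, the function name, and the 10.0.0.0/8 range are hypothetical and do not represent the actual campus network.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str        # e.g. "tcp"
    port_min: int
    port_max: int
    remote_cidr: str     # e.g. "0.0.0.0/0" for anywhere


def is_inbound_allowed(rules, protocol, port, source_ip):
    """Default-deny: traffic is allowed only if some rule matches it."""
    source = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (
            rule.protocol == protocol
            and rule.port_min <= port <= rule.port_max
            and source in ipaddress.ip_network(rule.remote_cidr)
        ):
            return True
    return False


rules = [
    IngressRule("tcp", 22, 22, "10.0.0.0/8"),    # ssh from a private range only
    IngressRule("tcp", 443, 443, "0.0.0.0/0"),   # https from anywhere
]

print(is_inbound_allowed(rules, "tcp", 22, "203.0.113.9"))
print(is_inbound_allowed(rules, "tcp", 443, "203.0.113.9"))
```

Restricting ssh to a narrow source range while leaving only the public-facing service open to everyone is one way to apply the advice above.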
Linux
The recommended way to get the most out of the Melbourne Research Cloud is to base your virtual machines on one of the Linux images. Ubuntu in particular tends to be easiest to work with, especially for people who are new to Linux.
Linux is a free and open source operating system first released in the early 1990s; it is now the most ubiquitous operating system.
Getting Help
If you have technical questions, or would like to discuss your needs with us, you may request help or further hands on training here: