Google Cloud Platform's Cloud Storage (GCS) is a "storage-as-a-service" solution that manages a project's data without requiring the management of disks and servers. The service can be set up to automatically replicate and distribute data across multiple data centers in multiple regions. The storage abstraction is object-based, similar to the web, and objects are commonly accessed via URIs with the gs:// prefix. In this set of codelabs, we will practice setting up and interacting with GCS.

In this lab, you will create a VM that will download up-to-date information about earthquakes that the USGS provides, create an image showing the data via a Python script, and then distribute the image via a storage bucket. As a result, the VM needs to be configured to create storage buckets as well as read and write objects to/from them.

Visit Compute Engine in the web console and begin the creation of a new Ubuntu 18.04 VM in us-west1-b using an f1-micro instance. Scroll down to the "Identity and API access" section. As the UI shows, unless otherwise specified, Compute Engine instances are assigned a "Compute Engine default service account" that controls its access to the platform.

In another window, bring up your project's IAM settings and find the service account. A partial example is shown below.

Answer the following questions:

Go back to your Compute Engine configuration. It is good practice to restrict the privileges of your VM in case it is ever compromised. The recommended approach is to create a custom service account that contains only the roles that the VM needs. Another way to restrict the VM's access is to use the legacy "Access scopes" mechanism shown in the UI. Hover over the question mark next to "Access scopes". Answer the following questions:

Thus, while the default service account has overly permissive roles attached to it, the default access scope overly restricts them.

We will configure the VM to use the default service account and its roles, but customize the access scope to enable the functions of the lab. Under "Access scopes", select "Set access for each API".

Scroll down to where Storage is configured to see its default setting. Change the setting to "Full".

Create the VM and wait for it to come up. Then, from the Compute Engine console, click on "ssh" to bring up a session on it.

Note that the VM can also be created via the gcloud CLI as shown below:

gcloud compute instances create usgs \
  --machine-type f1-micro --zone us-west1-b  \
  --image-project ubuntu-os-cloud --image-family ubuntu-1804-lts \
  --scopes storage-full

On your ssh session on the Compute Engine VM, clone the repository containing the code for the lab and change directories into it.

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

cd training-data-analyst/CPB100/lab2b

Download the latest earthquake data as a CSV file. Examine the first two rows of the file.

wget https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv -O earthquakes.csv

head -2 earthquakes.csv

Answer the following question:

Install the required Python 3 packages for the lab, including basemap, numpy, matplotlib, and requests:

sudo apt-get update -y
sudo apt-get install -y python3-mpltoolkits.basemap python3-numpy python3-matplotlib python3-requests

The Python script transform.py will ingest the earthquake data in the CSV file we have downloaded and create a visual representation of the earthquakes and their magnitudes on a world map. First, the script imports the packages it relies upon, including matplotlib (mpl) and numpy (np), as well as the Basemap class that provides the world map onto which we overlay the data.

Then, an Earthquake class is defined. Its constructor takes a parsed row from the CSV file. As the class shows, four individual fields from the row are accessed to create the instance.
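
A minimal sketch of the imports and the class is shown below; the field names and column positions assume the USGS CSV column order (time, latitude, longitude, depth, mag) and are illustrative, so refer to transform.py in the repository for the exact code.

import csv
import io
import requests
import matplotlib as mpl
mpl.use('Agg')                       # render to a file without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap

class Earthquake:
    def __init__(self, row):
        # row is one parsed CSV row: time, latitude, longitude, depth, mag, ...
        self.timestamp = row[0]
        self.lat = float(row[1])
        self.lon = float(row[2])
        try:
            self.magnitude = float(row[4])
        except ValueError:
            self.magnitude = 0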

The function get_earthquake_data(url) takes a URL (or a local file if given a file:/// path), downloads its contents, creates a CSV reader to parse each row, and builds a list of Earthquake objects from the rows using a Python list comprehension. A second list comprehension then filters out all quakes whose magnitude is not positive.
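
A simplified sketch of this function, building on the imports and Earthquake class sketched above, might look like the following (only the HTTP case is shown; the handling of file:/// paths is omitted):

def get_earthquake_data(url):
    # Fetch the CSV data and wrap it in a file-like object for the csv reader
    response = requests.get(url)
    csvio = io.StringIO(response.text)
    reader = csv.reader(csvio)
    next(reader)                                     # skip the header row
    quakes = [Earthquake(row) for row in reader]     # one Earthquake per row
    return [q for q in quakes if q.magnitude > 0]    # drop non-positive magnitudes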

Next, the Basemap (m) is created and visual markers are set up to signify different levels of earthquake magnitudes.

The script then takes all of the quakes and plots them using the markers on the map. It does so by calculating the x and y coordinates on the plot via the longitude and latitude information, then determining the marker size and color using get_marker.

Finally, the image is emitted to the given file (earthquakes.png).
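
A simplified sketch of these last steps, building on the sketches above, is shown below; the map projection, the magnitude thresholds, and the marker colors are illustrative, so consult transform.py for the exact values.

def get_marker(magnitude):
    # Bigger markers and warmer colors for stronger quakes (thresholds illustrative)
    markersize = magnitude * 2.5
    if magnitude < 1.0:
        return 'bo', markersize      # blue
    elif magnitude < 3.0:
        return 'go', markersize      # green
    elif magnitude < 5.0:
        return 'yo', markersize      # yellow
    else:
        return 'ro', markersize      # red

def create_png(url, outfile='earthquakes.png'):
    quakes = get_earthquake_data(url)

    # World map to overlay the quakes onto
    m = Basemap(projection='kav7', lon_0=-90, resolution='l', area_thresh=1000.0)
    m.drawcoastlines()
    m.drawcountries()
    m.drawmapboundary(fill_color='0.3')

    for q in quakes:
        x, y = m(q.lon, q.lat)                       # project lon/lat to plot coordinates
        marker, msize = get_marker(q.magnitude)
        m.plot(x, y, marker, markersize=msize)

    plt.savefig(outfile)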

Run the code to generate the image:

python3 transform.py

The file earthquakes.png should have been created. We now need a way to retrieve it from the VM. While we could try to copy it over ssh, we will instead use Cloud Storage. Most command-line functionality on Google Cloud is done via the gcloud command, with the exception of Cloud Storage and BigQuery, which have their own commands in the SDK (gsutil and bq). To begin with, make a new bucket with gsutil mb using a unique bucket name.

gsutil mb gs://<UNIQUE_BUCKET_NAME>

Then use gsutil cp to copy all of the earthquake files, including the image, to the bucket:

gsutil cp earthquakes.* gs://<UNIQUE_BUCKET_NAME>

In the web console, bring up Cloud Storage, navigate to the bucket you have created, and click on the earthquakes.png file. Show the image that has been created. Take a screenshot that includes your project identifier and include it in your notebook.

The previous lab used the Compute Engine default service account and set its access scopes to restrict the VM to full access only to Cloud Storage. The access scope method is a legacy mechanism, and it is undesirable to have multiple, disparate ways to perform access control, as it increases complexity. Instead, the best practice for implementing least privilege on Google Cloud is to set the access scope to allow the entire platform, but to create service accounts with only the minimal roles and permissions attached to them. This lab will demonstrate how this is done using the bucket previously created.

Go to IAM and visit "Service accounts".

Create a new service account and call it gcs-lab.

Then, add a role to the service account that only allows it to read objects in your buckets (Storage Object Viewer).

Skip the process for assigning per-user access to the service account as we will be assigning this account to a subsequent Compute Engine VM.

You should see the service account in the UI after its creation:

Visit Compute Engine in the web console and begin the creation of a new Ubuntu 18.04 VM in us-west1-b using an f1-micro instance. Scroll down to the "Identity and API access" section. Instead of using the Compute Engine default service account, select the service account you have just created.

With this service account, all accesses to resources will be done according to the roles that have been associated with it (i.e. Storage Object Viewer).

As before, the VM can also be created via the gcloud CLI; the command below specifies the service account in its arguments:

gcloud compute instances create gcs-lab-vm \
  --machine-type f1-micro --zone us-west1-b \
  --image-project ubuntu-os-cloud --image-family ubuntu-1804-lts \
  --scopes cloud-platform \
  --service-account gcs-lab@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

ssh into the VM.

Attempt the following command:

gcloud compute instances list 

Answer the following question for your lab notebook:

Go back to the service account in IAM and click the pencil icon on the far right to edit the account.

Then, click on "ADD ANOTHER ROLE" and, in the filter box, type "Compute".

Scroll the options to find the Compute role that best fits what you require to list instances. Add the role and save the changes.

Go back to the VM and repeat the command until it succeeds. Take a screenshot of the output for your notebook.

Answer the following question for your lab notebook.

In the same ssh session, use the gsutil command to copy the earthquake image file created in the previous lab from the storage bucket onto the VM.

gsutil cp gs://<UNIQUE_BUCKET_NAME>/earthquakes.png .

The command should succeed since the Storage Object Viewer role has been attached to the service account assigned to the VM.

Rename the file to a different name and then attempt to copy it back into the bucket.

cp earthquakes.png moonquakes.png

gsutil cp moonquakes.png gs://<UNIQUE_BUCKET_NAME>/

Answer the following question:

Go back to the service account in IAM and click the pencil icon on the far right to edit the account as done previously.

Then, click on "ADD ANOTHER ROLE" and, in the filter box, type "Storage Object". Scroll the options to find the role that best fits what you require. Add the role and save the changes.

Go back to the VM and repeat the gsutil command until it succeeds. Take a screenshot of the output for your notebook.

Answer the following question:

In many instances, applications will need to interact with Google Cloud Storage programmatically. Google Cloud provides client libraries for a variety of common programming languages that can be used to interact with storage buckets. For this lab, we will show how Python can be used to do so.

First, bring up a Cloud Shell session and download an image of your choice by filling in a number (00 to 19) and storing it in image.jpg.

# Fill in <NUM> with 00, 01, ..., 19
wget -O image.jpg http://chi-ni.com/motd/<NUM>.jpg

Then, set up a Python environment and install the Google Cloud Storage client library for Python.

python3 -m venv env
source env/bin/activate
pip3 install google-cloud-storage

Then launch a Python 3 interpreter:

python3

First, import the storage module from the package:

from google.cloud import storage

Instantiate a client from it.

storage_client = storage.Client()

Get a handle to your storage bucket, omitting the gs:// prefix.

bucket = storage_client.get_bucket('<UNIQUE_BUCKET_NAME>')

Create a blob, which represents a specific object stored in a bucket. The blob constructor takes the name the object will be stored under in the bucket (gcs-lab-image.jpg).

blob = bucket.blob('gcs-lab-image.jpg')

Open the image downloaded previously for reading in raw binary mode:

myImage = open('image.jpg', mode='rb')

Upload the contents of the image, specifying its content type:

blob.upload_from_string(myImage.read(), content_type='image/jpeg')

Make the object in the bucket publicly accessible via URL:

blob.make_public()

Then, get the URL for the object.

blob.public_url

Keep the Python interpreter running.

Visit the URL via a web browser and take a screenshot that shows the entire URL and the image that has been retrieved:

Back in the Python interpreter in Cloud Shell, delete the object from the bucket:

blob.delete()
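
For reference, the interactive steps above can be combined into a single script. A minimal sketch, assuming the same bucket, object, and file names used in this lab:

from google.cloud import storage

BUCKET_NAME = '<UNIQUE_BUCKET_NAME>'                 # substitute your bucket name

storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)
blob = bucket.blob('gcs-lab-image.jpg')

# Upload the local image as a JPEG object
with open('image.jpg', mode='rb') as my_image:
    blob.upload_from_string(my_image.read(), content_type='image/jpeg')

# Make the object publicly readable and print its URL
blob.make_public()
print(blob.public_url)

# Remove the object from the bucket when done
blob.delete()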

It is important to deploy any cloud infrastructure with the minimum privileges necessary to run. The principle of least privilege requires practice to apply appropriately. In this lab, you will learn how to reduce the permissions of service accounts with over-provisioned permissions.

Follow the directions at:

https://thunder-ctf.cloud/leastprivilege/

Play all levels and visit the scoreboard when complete.

From the web console