S3 (Simple Storage Service) is an object storage service offered by Amazon Web Services.
Object storage lets you store files of any type, such as miscellaneous CSV files, without first defining a schema.
S3 is an example of object storage done well: modular namespacing, multi-zone availability, easy file access, and so on.
S3 object storage is a great fit for teams collaborating on data generated by their programs: one program can write a file to S3, and a teammate can then access that file whenever they need it.
Object storage is also a good solution for staging data in an Extract, Transform, Load (ETL) pipeline, because it lets you establish consistent, idiomatic file paths and naming conventions.
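For example, a staging convention might encode the pipeline stage and run date directly in the object key. The sketch below is only an illustration; the "staging/" prefix and date layout are our own assumptions, not anything S3 imposes (it needs the "fmt" and "time" packages).
// stagingKey builds an object key like "staging/2021/05/17/orders.csv"
// from a file name and a timestamp. Example convention only.
func stagingKey(fileName string, t time.Time) string {
	return fmt.Sprintf("staging/%s/%s", t.UTC().Format("2006/01/02"), fileName)
}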
AWS maintains an official SDK for Go.
To install the SDK, run go get -u github.com/aws/aws-sdk-go from within your GOPATH.
package helpers

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)
// GetFile opens the file at fileDir and returns the open *os.File.
func GetFile(fileDir string) (*os.File, error) {
	file, err := os.Open(fileDir)
	if err != nil {
		return nil, err
	}
	return file, nil
}
// NewSession returns a new AWS session for the given region.
// It uses your credentials in ~/.aws.
func NewSession(S3Region string) (*session.Session, error) {
	sess, err := session.NewSession(&aws.Config{Region: aws.String(S3Region)})
	if err != nil {
		return nil, err
	}
	return sess, nil
}
// AddFileToS3 takes a session, an open local file, an S3 bucket name, and an S3 directory path.
// The file is the local file to read and upload to S3.
// s3DirPath is the directory (key prefix) within the S3 bucket to write to.
func AddFileToS3(s *session.Session, file *os.File, s3Bucket string, s3DirPath string) error {
	// Get the file size and read the file content into a buffer.
	fileInfo, err := file.Stat()
	if err != nil {
		return err
	}
	size := fileInfo.Size()
	buffer := make([]byte, size)
	if _, err := io.ReadFull(file, buffer); err != nil {
		return err
	}

	// Build the object key from the directory prefix and the file's base name.
	key := fmt.Sprintf("%s%s", s3DirPath, filepath.Base(file.Name()))
	_, err = s3.New(s).PutObject(&s3.PutObjectInput{
		Bucket:               aws.String(s3Bucket),
		Key:                  aws.String(key),
		ACL:                  aws.String("private"),
		Body:                 bytes.NewReader(buffer),
		ContentLength:        aws.Int64(size),
		ContentType:          aws.String(http.DetectContentType(buffer)),
		ContentDisposition:   aws.String("attachment"),
		ServerSideEncryption: aws.String("AES256"),
	})
	return err
}
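Uploading is only half of the collaboration story: a teammate will usually want to pull the object back down. The helper below is a minimal sketch of the reverse direction using the SDK's GetObject call; it relies only on the imports already in the helpers package, and the function name is our own, not part of the AWS SDK.
// GetFileFromS3 downloads an object from S3 and returns its contents.
// Minimal sketch: the whole body is read into memory, which is fine for
// small files like the CSVs discussed here.
func GetFileFromS3(s *session.Session, s3Bucket string, key string) ([]byte, error) {
	out, err := s3.New(s).GetObject(&s3.GetObjectInput{
		Bucket: aws.String(s3Bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return nil, err
	}
	defer out.Body.Close()

	var buf bytes.Buffer
	if _, err := buf.ReadFrom(out.Body); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}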
Say you have a program that generates a comma-separated value (CSV) or tab-separated value (TSV) file that you want stored in S3.
You would use the following snippet to read in the CSV or TSV and then upload it to S3.
f, err := GetFile("/path/to/file.csv")
if err != nil {
	fmt.Println(err)
	return
}
defer f.Close()

s, err := NewSession("s3-bucket-region")
if err != nil {
	fmt.Println(err)
	return
}

err = AddFileToS3(s, f, "my_bucket", "test/")
if err != nil {
	fmt.Println(err)
}
In the first block of code we read in the file using the GetFile function.
After that we create a new AWS session by passing in the region the S3 bucket lives in.
In the final step, we call our AddFileToS3 function, which uses the passed-in file object f to build an S3 PutObjectInput.
AddFileToS3 also takes a string s3Bucket, which is the S3 bucket to upload to, and s3DirPath, which is the path (key prefix) to store the file under. Passing in "" would place the file at the root of the bucket.
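For example, with the /path/to/file.csv from the snippet above, and since AddFileToS3 builds the key from the prefix plus the file's base name, these two calls produce the following object keys:
// Stored as "test/file.csv"
err = AddFileToS3(s, f, "my_bucket", "test/")

// Stored at the bucket root as "file.csv"
err = AddFileToS3(s, f, "my_bucket", "")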