Writing to S3 in Golang

Why use AWS S3?

Amazon S3 (Simple Storage Service) is an object storage service offered by Amazon Web Services.

Object storage lets you store any type of file, such as miscellaneous CSV files, without defining a schema first.

S3 is an example of object storage done well: modular namespacing, multi-zone access, easy file access, and so on.

S3 object storage is a great solution for teams that work collaboratively with data generated by programs.

A program can write a file to S3, and a team member can then access that file whenever it is needed.

Object storage is also a good fit for staging data in an Extract, Transform, Load (ETL) pipeline, because you can create idiomatic file paths and naming conventions.
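To make the idea of a naming convention concrete, here is a minimal sketch of a helper that builds a date-partitioned object key for staged files. The function name StagingKey and the pipeline/stage/yyyy/mm/dd/file layout are assumptions made for illustration, not a convention S3 requires.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// StagingKey is a hypothetical helper. It builds a key of the form
// pipeline/stage/yyyy/mm/dd/fileName from a run date given as "YYYY-MM-DD".
func StagingKey(pipeline, stage, runDate, fileName string) (string, error) {
	d, err := time.Parse("2006-01-02", runDate)
	if err != nil {
		return "", fmt.Errorf("invalid run date %q: %w", runDate, err)
	}
	// path.Join always uses forward slashes, which is what S3 keys expect.
	return path.Join(pipeline, stage, d.Format("2006/01/02"), fileName), nil
}

func main() {
	key, err := StagingKey("etl", "extract", "2024-01-15", "orders.csv")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(key) // etl/extract/2024/01/15/orders.csv
}
```

With a layout like this, every file written by one pipeline run shares a common prefix, so a whole day of staged data can be listed at once.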

Getting the Dependencies

AWS currently maintains an SDK for Golang.

To install the SDK, run: go get -u github.com/aws/aws-sdk-go while in your gopath.

The Code To Upload


        package helpers

        import (
          "bytes"
          "fmt"
          "io"
          "net/http"
          "os"
          "path/filepath"

          "github.com/aws/aws-sdk-go/aws"
          "github.com/aws/aws-sdk-go/aws/session"
          "github.com/aws/aws-sdk-go/service/s3"
        )

        // GetFile opens the file at fileDir and returns the open file handle.
        // The caller is responsible for closing it.
        func GetFile(fileDir string) (*os.File, error) {
          file, err := os.Open(fileDir)
          if err != nil {
            return nil, err
          }
          return file, nil
        }

        // NewSession returns a new AWS session for the given region.
        // Uses your creds in ~/.aws
        func NewSession(S3Region string) (*session.Session, error) {
          sess, err := session.NewSession(&aws.Config{Region: aws.String(S3Region)})
          if err != nil {
            return nil, err
          }
          return sess, nil
        }

        // AddFileToS3 takes in a session, an open file, s3Bucket, and s3DirPath.
        // file is the local file to upload to S3; we expect it to be local to the code.
        // s3DirPath is the directory within the S3 bucket to write to.
        func AddFileToS3(s *session.Session, file *os.File, s3Bucket string, s3DirPath string) error {

          // Get the file size and read the whole file into a buffer
          fileInfo, err := file.Stat()
          if err != nil {
            return err
          }
          size := fileInfo.Size()
          buffer := make([]byte, size)
          if _, err := io.ReadFull(file, buffer); err != nil {
            return err
          }

          // file.Name() returns the path that was passed to os.Open,
          // so keep only the base name when building the object key
          key := fmt.Sprintf("%s%s", s3DirPath, filepath.Base(file.Name()))

          _, err = s3.New(s).PutObject(&s3.PutObjectInput{
            Bucket:               aws.String(s3Bucket),
            Key:                  aws.String(key),
            ACL:                  aws.String("private"),
            Body:                 bytes.NewReader(buffer),
            ContentLength:        aws.Int64(size),
            ContentType:          aws.String(http.DetectContentType(buffer)),
            ContentDisposition:   aws.String("attachment"),
            ServerSideEncryption: aws.String("AES256"),
          })
          return err
        }
      

Utilizing this Code

Say you have a program that generates a comma-separated value (CSV) or tab-separated value (TSV) file that you want stored in S3.

You would use the following snippet to read in the CSV or TSV and then upload it to S3.


        f, err := GetFile("/path/to/file.csv")
        if err != nil {
          fmt.Println(err)
          return
        }
        // Only defer Close once we know the file was opened
        defer f.Close()

        s, err := NewSession("s3-bucket-region")
        if err != nil {
          fmt.Println(err)
          return
        }

        err = AddFileToS3(s, f, "my_bucket", "test/")
        if err != nil {
          fmt.Println(err)
        }
      

In the first block of code, we open the file using the GetFile function.

After this, we create a new AWS session by passing in the region where the S3 bucket lives.

In the final step, we call our AddFileToS3 function. This function uses the passed-in file object f to build an S3 PutObjectInput.

AddFileToS3 also takes in a string s3Bucket, which is the S3 bucket to upload to, and s3DirPath, which is the path to store the file in. Passing in "" would set the directory to be the root of the bucket.
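One detail of that PutObjectInput is worth a closer look: the ContentType is guessed with http.DetectContentType, which sniffs the file content and reports CSV data as plain text. The sketch below shows one possible way to prefer a type based on the file extension and fall back to sniffing. The helper ContentTypeFor and the knownTypes table are assumptions made for illustration and are not part of the AWS SDK.

```go
package main

import (
	"fmt"
	"net/http"
	"path/filepath"
	"strings"
)

// knownTypes is a hypothetical lookup table for the formats this article uses.
var knownTypes = map[string]string{
	".csv": "text/csv",
	".tsv": "text/tab-separated-values",
}

// ContentTypeFor is a hypothetical helper: it prefers a type based on the
// file extension and otherwise falls back to sniffing the content.
func ContentTypeFor(fileName string, data []byte) string {
	ext := strings.ToLower(filepath.Ext(fileName))
	if ct, ok := knownTypes[ext]; ok {
		return ct
	}
	return http.DetectContentType(data)
}

func main() {
	csv := []byte("id,name\n1,alice\n")
	fmt.Println(http.DetectContentType(csv))       // text/plain; charset=utf-8
	fmt.Println(ContentTypeFor("report.csv", csv)) // text/csv
}
```

Whether the more specific type matters depends on how the file is consumed; for files that are only downloaded as attachments, the sniffed type may be good enough.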
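To illustrate how the directory path and the file name could combine into an object key, here is a minimal sketch. ObjectKey is a hypothetical helper, and it assumes that only the base name of the local file should end up in the key, so that an empty directory path places the file at the root of the bucket.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ObjectKey is a hypothetical helper that joins an S3 "directory" prefix
// with the base name of a local file path.
func ObjectKey(s3DirPath, localPath string) string {
	name := filepath.Base(localPath)
	// Drop leading and trailing slashes so "test", "test/" and "/test/" behave the same.
	prefix := strings.Trim(s3DirPath, "/")
	if prefix == "" {
		return name
	}
	return prefix + "/" + name
}

func main() {
	fmt.Println(ObjectKey("", "/path/to/file.csv"))      // file.csv
	fmt.Println(ObjectKey("test/", "/path/to/file.csv")) // test/file.csv
}
```

Trimming the slashes is a design choice that keeps accidental double slashes out of keys, since S3 treats "test//file.csv" and "test/file.csv" as different objects.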