
Creating a Thumbnail Image Generator Using Pipeline Pattern

We already use image generation in Golang to make life easier for editors, so they don’t need another application to produce the image they want. Now santekno will build on that tutorial and create a thumbnail image generator. Suppose we want a thumbnail that is much smaller than the original image, so we need to convert it into a lighter file. If there are many images and we process them one by one in ordinary sequential Golang, the run will take a long time. So we will compare generating thumbnails sequentially against generating them concurrently with the Pipeline Pattern.

If you haven’t learned what a Pipeline Pattern is, you can check out this tutorial first.

Project Preparation

Now we will create a new project by creating the learn-golang-generator-image-thumbnail folder. After that, initialize the project module with this command.

go mod init github.com/santekno/learn-golang-generator-image-thumbnail

Prepare the images or photos you need, or take the sample photos from the santekno repository here

https://github.com/santekno/learn-golang-generator-image-thumbnail/tree/main/images

Generate Image Thumbnail Using Sequential

Before going into the code, we need to understand the main steps of this thumbnail image generator. Here are the stages of the process, which we will later divide into several functions.

  1. The function reads the files from the images/ folder and validates that each file is an image (the code below accepts only image/jpeg).
  2. The function resizes the image to 100 x 100 pixels using the library package github.com/disintegration/imaging.
  3. The function saves the resulting thumbnail image into the thumbnails/ folder.

Can you picture what the process will look like? Hopefully the steps are clear before we write them in Golang.

The illustration below describes the process more clearly.

flowchart LR
    subgraph subGraph1 ["func walkFiles()"]
    C("func\n getFileContentType()") --> D("func\n processImage()")
    D --> E("func\n saveThumbnail()")
    end
    id1((start)) --> d("main func") --> subGraph1 --> e("print\nprocess time") --> id2((finish))

Create a main.go file where we will create all the functions in this file. First we create this generator process with a normal sequential process.

Function Retrieve Image from Folder

The function we will create walks a folder that contains several files and checks whether each file is an image or not. The function is shown below.

func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {

		// filter out errors
		if err != nil {
			return err
		}

		// check if it is a file
		if !info.Mode().IsRegular() {
			return nil
		}

		// check if it is image/jpeg
		contentType, _ := getFileContentType(path)
		if contentType != "image/jpeg" {
			return nil
		}

		return nil
	})

	if err != nil {
		return err
	}
	return nil
}

// getFileContentType - return content type and error status
func getFileContentType(file string) (string, error) {

	out, err := os.Open(file)
	if err != nil {
		return "", err
	}
	defer out.Close()

	// Only the first 512 bytes are used to sniff the content type.
	buffer := make([]byte, 512)

	_, err = out.Read(buffer)
	if err != nil {
		return "", err
	}

	// Use the net/http package's handy DetectContentType function. Always returns a valid
	// content-type by returning "application/octet-stream" if no others seem to match.
	contentType := http.DetectContentType(buffer)

	return contentType, nil
}

In the code above we have created 2 functions. The walkFiles function reads the files in the folder passed as its parameter. The getFileContentType function checks the content type of a file, that is, whether it is an image or not. Because files that are not images are filtered out from the beginning, only images are passed on when we generate the thumbnails later.
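To see how the content-type check behaves without touching real files, here is a minimal sketch that feeds a few hand-written headers to http.DetectContentType. The isJPEG helper and the sample byte slices are assumptions made for this illustration; they are not part of the original program.

```go
package main

import (
	"fmt"
	"net/http"
)

// isJPEG reports whether the leading bytes of a file look like a JPEG.
// It uses the same sniffing call as getFileContentType, which only
// considers the first 512 bytes of the data.
func isJPEG(header []byte) bool {
	return http.DetectContentType(header) == "image/jpeg"
}

func main() {
	// Hypothetical in-memory headers used instead of real files.
	jpegHeader := []byte{0xFF, 0xD8, 0xFF, 0xE0}
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	text := []byte("just some text")

	fmt.Println(isJPEG(jpegHeader), http.DetectContentType(jpegHeader))
	fmt.Println(isJPEG(pngHeader), http.DetectContentType(pngHeader))
	fmt.Println(isJPEG(text), http.DetectContentType(text))
}
```

Because the check looks at the bytes and not at the file name, a PNG renamed to .jpg would still be skipped by walkFiles.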

Image File Manipulation Function

This function turns the source image into a thumbnail of 100x100 pixels. For this we use an additional library, github.com/disintegration/imaging, so we first need to add it with this command.

go get -u github.com/disintegration/imaging

Next, add the function below to the main.go file.

// processImage - takes image file as input
// return pointer to thumbnail image in memory.
func processImage(path string) (*image.NRGBA, error) {

	// load the image from file
	srcImage, err := imaging.Open(path)
	if err != nil {
		return nil, err
	}

	// scale the image to 100px * 100px
	thumbnailImage := imaging.Thumbnail(srcImage, 100, 100, imaging.Lanczos)

	return thumbnailImage, nil
}

And don’t forget to update the walkFiles() function so that it calls processImage after the image check.

func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {

		...

		// process the image
		thumbnailImage, err := processImage(path)
		if err != nil {
			return err
		}

    ...

		return nil
	})

	if err != nil {
		return err
	}
	return nil
}

Function Save Thumbnail Image Result

Next we save the resulting thumbnail into a folder named thumbnails/. The processImage function returns the thumbnail as thumbnailImage, and this function writes that image into the folder under the same file name as the source image. The complete function is below.

// saveThumbnail - save the thumbnail image to folder
func saveThumbnail(srcImagePath string, thumbnailImage *image.NRGBA) error {
	filename := filepath.Base(srcImagePath)
	dstImagePath := "thumbnails/" + filename

	// save the image in the thumbnails folder.
	err := imaging.Save(thumbnailImage, dstImagePath)
	if err != nil {
		return err
	}
	fmt.Printf("%s -> %s\n", srcImagePath, dstImagePath)
	return nil
}

This means we also need to prepare the thumbnails/ folder where the thumbnails are saved. We then call this function from walkFiles() right after the call to processImage().

func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {

		...

		// process the image
		thumbnailImage, err := processImage(path)
		if err != nil {
			return err
		}

		// save the thumbnail image to disk
		err = saveThumbnail(path, thumbnailImage)
		if err != nil {
			return err
		}
		return nil
	})

  ...

	return nil
}
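Because saveThumbnail writes into thumbnails/, that folder has to exist before the program runs. One option is to create it from code at startup. The sketch below assumes a hypothetical ensureDir helper built on os.MkdirAll; it is not part of the original program, and it works in a temporary directory so that it does not touch the project.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDir creates dir (and any missing parents) if it does not exist yet.
// Hypothetical helper, shown only to illustrate preparing the output folder.
func ensureDir(dir string) error {
	return os.MkdirAll(dir, 0o755)
}

func main() {
	// Temporary base directory so the sketch leaves the project untouched.
	base, err := os.MkdirTemp("", "thumbs-demo")
	if err != nil {
		fmt.Println("cannot create temp dir:", err)
		return
	}
	defer os.RemoveAll(base)

	out := filepath.Join(base, "thumbnails")
	if err := ensureDir(out); err != nil {
		fmt.Println("cannot create output folder:", err)
		return
	}

	// Calling it again is harmless: MkdirAll returns nil for an existing directory.
	fmt.Println("second call error:", ensureDir(out))
}
```

In the real program the call would be something like ensureDir("thumbnails") before the walk starts.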

The thumbnail image generator is now ready with the standard sequential process. Let’s run it with the command below.

➜ learn-golang-generator-image-thumbnail git:(main) ✗ ./learn-golang-generator-image-thumbnail images
images/sample-1.jpg -> thumbnails/sample-1.jpg
images/sample-10.jpg -> thumbnails/sample-10.jpg
images/sample-11.jpg -> thumbnails/sample-11.jpg
images/sample-12.jpg -> thumbnails/sample-12.jpg
images/sample-13.jpg -> thumbnails/sample-13.jpg
images/sample-14.jpg -> thumbnails/sample-14.jpg
images/sample-2.jpg -> thumbnails/sample-2.jpg
images/sample-3.jpg -> thumbnails/sample-3.jpg
images/sample-4.jpg -> thumbnails/sample-4.jpg
images/sample-5.jpg -> thumbnails/sample-5.jpg
images/sample-6.jpg -> thumbnails/sample-6.jpg
images/sample-7.jpg -> thumbnails/sample-7.jpg
images/sample-8.jpg -> thumbnails/sample-8.jpg
images/sample-9.jpg -> thumbnails/sample-9.jpg
Time taken: 145.78275ms

The result of the process of creating a thumbnail Image generator is around 145ms, which is quite fast because the images we use are not too many, only 14 image files.

Changing the Process Mechanism using Pipeline Pattern Concurrent Golang

We have seen above that generating thumbnails for 14 images sequentially takes about 145ms. Divided by 14, that is roughly 10.4ms per image. So if we have 1 million images, the time required is about 10.4ms x 1 million = 10,400,000ms, or about 2.89 hours. This is quite long if you want to process that much data. So we will implement the Pipeline Pattern and see whether it can make the process faster.
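The extrapolation can be checked with a few lines of Go. This is only a rough sketch: the perItem and estimateHours helpers are hypothetical names, and the estimate assumes that the cost per image stays constant as the number of images grows.

```go
package main

import (
	"fmt"
	"time"
)

// perItem returns the average time spent on one item.
func perItem(total time.Duration, count int) time.Duration {
	return total / time.Duration(count)
}

// estimateHours extrapolates a measured run to a larger number of items,
// assuming the cost per item stays constant.
func estimateHours(total time.Duration, count, target int) float64 {
	return (perItem(total, count) * time.Duration(target)).Hours()
}

func main() {
	// 145.78275ms is the "Time taken" value from the sequential run.
	measured := 145782750 * time.Nanosecond

	fmt.Println("per image:", perItem(measured, 14))
	fmt.Printf("1,000,000 images: %.2f hours\n", estimateHours(measured, 14, 1_000_000))
}
```

Dividing 145.78ms by 14 images gives a little over 10ms per image, which puts one million images at just under three hours.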

First, we need to make some code changes. So that our previous code is not lost, we create a sequential folder and move the code we wrote earlier into it. Then we create a new folder called pipeline-pattern, so that the folder structure of the project looks like this.

.
├── README.md
├── learn-golang-generator-image-thumbnail
├── go.mod
├── go.sum
├── images
│ ├── sample-1.jpg
│ ├── sample-10.jpg
│ ├── sample-11.jpg
│ ├── sample-12.jpg
│ ├── sample-13.jpg
│ ├── sample-14.jpg
│ ├── sample-2.jpg
│ ├── sample-3.jpg
│ ├── sample-4.jpg
│ ├── sample-5.jpg
│ ├── sample-6.jpg
│ ├── sample-7.jpg
│ ├── sample-8.jpg
│ └── sample-9.jpg
├── main.go
├── pipeline-pattern
│ └── pipeline.go
├── sequential
│ └── sequential.go
└── thumbnails

Following this folder structure, the sequential functions live in the sequential folder, while the pipeline pattern we are about to write goes in the pipeline-pattern folder. Let’s create it in the pipeline.go file.

First we need a struct that carries the data between pipeline stages, so that every stage works with the same data shape, like this.

type result struct {
	srcImagePath   string
	destImagePath  string
	thumbnailImage *image.NRGBA
	err            error
}

Changing the Function to Retrieve Image from Folder

In the pipeline.go file we create the same walkFiles() function, but with some changes: it now takes a done channel for cancellation and returns channels, so the file paths can be delivered asynchronously while the program runs.

func walkFiles(done <-chan struct{}, root string) (<-chan string, <-chan error) {
	// create output channels
	paths := make(chan string)
	errc := make(chan error, 1)

	go func() {
		defer close(paths)
		errc <- filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
			// filter out errors
			if err != nil {
				return err
			}

			// check if it is a file
			if !info.Mode().IsRegular() {
				return nil
			}

			// check if it is image/jpeg
			contentType, _ := sequential.GetFileContentType(path)
			if contentType != "image/jpeg" {
				return nil
			}

			// send file path to next stage
			select {
			case paths <- path:
			case <-done:
				return fmt.Errorf("walk canceled")
			}
			return nil
		})
	}()
	return paths, errc
}

This function starts a goroutine that walks the folder and sends each file path on the paths channel so the next stage can process it. For validation it calls sequential.GetFileContentType, which we take from the package in the sequential folder. For that to work, the function must be exported so that other packages can access it, so we rename it from

func getFileContentType(file string) (string, error)

to

func GetFileContentType(file string) (string, error)
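The select used in walkFiles(), which either sends on the output channel or gives up when done is closed, can be tried in isolation. The sketch below is a simplified first stage that emits a fixed list of strings instead of walking a folder. The emit and collect names and the sample file names are made up for this example.

```go
package main

import "fmt"

// emit sends each item on the returned channel and closes it when finished.
// It stops early if done is closed, so the goroutine never blocks forever
// on a receiver that has gone away.
func emit(done <-chan struct{}, items ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			select {
			case out <- it:
			case <-done:
				return
			}
		}
	}()
	return out
}

// collect drains a channel into a slice.
func collect(in <-chan string) []string {
	var got []string
	for s := range in {
		got = append(got, s)
	}
	return got
}

func main() {
	done := make(chan struct{})
	defer close(done)

	fmt.Println(collect(emit(done, "a.jpg", "b.jpg", "c.jpg")))
}
```

With a single producer goroutine the items arrive in the order they were sent. The order only becomes unpredictable once several workers read from the same channel, as in the next stage.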

Changing the Image File Manipulation Function

The image manipulation itself is the same, but we now read from and write to channels so that several images can be processed in parallel. The details are below.

func processImage(done <-chan struct{}, paths <-chan string) <-chan result {
	results := make(chan result)
	var wg sync.WaitGroup

	thumbnailer := func() {
		for srcImagePath := range paths {
			srcImage, err := imaging.Open(srcImagePath)
			if err != nil {
				// report the failure and move on to the next path
				select {
				case results <- result{srcImagePath: srcImagePath, err: err}:
				case <-done:
					return
				}
				continue
			}
			thumbnailImage := imaging.Thumbnail(srcImage, 100, 100, imaging.Lanczos)

			select {
			case results <- result{srcImagePath: srcImagePath, thumbnailImage: thumbnailImage}:
			case <-done:
				return
			}
		}
	}

	const numThumbnailer = 5
	for i := 0; i < numThumbnailer; i++ {
		wg.Add(1)
		go func() {
			thumbnailer()
			wg.Done()
		}()
	}

	go func() {
		wg.Wait()
		close(results)
	}()

	return results
}

The processImage() function is more involved because it uses channels and goroutines. It starts 5 worker goroutines that all read from the paths channel, so one image does not have to wait for another to finish. The workers keep running as long as the paths channel still delivers data, and the results channel is closed once all of them have returned.
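The worker arrangement in processImage() is often called fan-out and fan-in: several goroutines read from one input channel and write to one shared output channel. Here is a minimal sketch of that shape using plain integers instead of images. The fanOut and sumSquares names are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut starts n workers that each read from in, apply work, and write to
// one shared output channel. The output channel is closed once every worker
// has returned.
func fanOut(done <-chan struct{}, in <-chan int, n int, work func(int) int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for v := range in {
				select {
				case out <- work(v):
				case <-done:
					return
				}
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// sumSquares feeds 1..count through the worker pool and adds up the results.
func sumSquares(count, workers int) int {
	done := make(chan struct{})
	defer close(done)

	in := make(chan int)
	go func() {
		defer close(in)
		for i := 1; i <= count; i++ {
			in <- i
		}
	}()

	total := 0
	for v := range fanOut(done, in, workers, func(x int) int { return x * x }) {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares(10, 5))
}
```

The results arrive in no particular order, which is why the example sums them: the total does not depend on which worker finishes first.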

Changing the Save Thumbnail Image Result Function

In the save thumbnail function, we will also change the parameter to channel as below.

func saveThumbnail(done <-chan struct{}, thumbs <-chan result) <-chan result {
	results := make(chan result)
	var wg sync.WaitGroup

	saveThumbnailer := func() {
		for img := range thumbs {
			filename := filepath.Base(img.srcImagePath)
			dstImagePath := "thumbnails/" + filename

			// save the image in the thumbnails folder, unless the
			// previous stage already failed for this file.
			err := img.err
			if err == nil {
				err = imaging.Save(img.thumbnailImage, dstImagePath)
			}

			// forward exactly one result per image, success or failure
			select {
			case results <- result{srcImagePath: img.srcImagePath, destImagePath: dstImagePath, thumbnailImage: img.thumbnailImage, err: err}:
			case <-done:
				return
			}
		}
	}

	const numGoroutine = 5
	for i := 0; i < numGoroutine; i++ {
		wg.Add(1)
		go func() {
			saveThumbnailer()
			wg.Done()
		}()
	}

	go func() {
		wg.Wait()
		close(results)
	}()

	return results
}

Create SetupPipeLine Function

This SetupPipeLine function wires all the pipeline stages together in one function so that the main function can call them easily.

func SetupPipeLine(root string) error {
	done := make(chan struct{})
	defer close(done)

	// do the file walk
	paths, errc := walkFiles(done, root)

	// process the images
	resultImages := processImage(done, paths)

	// save thumbnail images
	results := saveThumbnail(done, resultImages)

	// report each result and stop at the first error
	for r := range results {
		if r.err != nil {
			return r.err
		}
		fmt.Printf("%s -> %s\n", r.srcImagePath, r.destImagePath)
	}

	// check for errors on the channel, from walkfiles stage.
	if err := <-errc; err != nil {
		return err
	}

	return nil
}
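The deferred close(done) is what lets SetupPipeLine return early on an error without leaving the earlier stages blocked on a send. The following sketch shows that effect with one producer and a consumer that stops at the first failure. The item type, produce, firstError and errBad are all hypothetical names for this illustration, and the stopped channel exists only so the example can confirm that the goroutine really exits.

```go
package main

import (
	"errors"
	"fmt"
)

// errBad is a made-up error value used to simulate a failing image.
var errBad = errors.New("bad image")

type item struct {
	name string
	err  error
}

// produce emits items until the list is exhausted or done is closed.
// The stopped channel is closed once the goroutine has exited.
func produce(done <-chan struct{}, items []item) (<-chan item, <-chan struct{}) {
	out := make(chan item)
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		defer close(out)
		for _, it := range items {
			select {
			case out <- it:
			case <-done:
				return
			}
		}
	}()
	return out, stopped
}

// firstError consumes items until the first failure. It returns how many
// items succeeded before that point, and the error that stopped the loop.
func firstError(items []item) (int, error) {
	done := make(chan struct{})
	out, stopped := produce(done, items)

	ok := 0
	var err error
	for it := range out {
		if it.err != nil {
			err = it.err
			break
		}
		ok++
	}

	close(done) // tell the producer to stop sending
	<-stopped   // wait until the producer goroutine has exited
	return ok, err
}

func main() {
	items := []item{{"a.jpg", nil}, {"b.jpg", errBad}, {"c.jpg", nil}}
	ok, err := firstError(items)
	fmt.Println(ok, err)
}
```

Without the done channel, the producer would stay blocked trying to send the third item after the consumer has stopped reading.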

We have now created all the functions needed for the pipeline version of the thumbnail generator. To run it, we first change the main.go file: previously it called the sequential function, and now it calls the pipeline pattern we have just written.

// Image processing - sequential or pipeline pattern
// Input - directory with images.
// output - thumbnail images
func main() {
	if len(os.Args) < 2 {
		log.Fatal("need to send directory path of images")
	}
	start := time.Now()

	// using sequential
	// err := sequential.WalkFiles(os.Args[1])

	// using pipeline pattern
	err := pipelinepattern.SetupPipeLine(os.Args[1])

	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Time taken: %s\n", time.Since(start))
}

As seen above, the sequential call is commented out so that it is not executed when the program runs. Run the program with the images folder as its argument, as before, namely

go run main.go images

The process is now roughly 2 times faster, taking approximately 64ms.

➜ learn-golang-generator-image-thumbnail git:(main) ✗ go run main.go images
images/sample-11.jpg -> thumbnails/sample-11.jpg
images/sample-10.jpg -> thumbnails/sample-10.jpg
images/sample-1.jpg -> thumbnails/sample-1.jpg
images/sample-12.jpg -> thumbnails/sample-12.jpg
images/sample-13.jpg -> thumbnails/sample-13.jpg
images/sample-14.jpg -> thumbnails/sample-14.jpg
images/sample-4.jpg -> thumbnails/sample-4.jpg
images/sample-2.jpg -> thumbnails/sample-2.jpg
images/sample-3.jpg -> thumbnails/sample-3.jpg
images/sample-5.jpg -> thumbnails/sample-5.jpg
images/sample-6.jpg -> thumbnails/sample-6.jpg
images/sample-7.jpg -> thumbnails/sample-7.jpg
images/sample-8.jpg -> thumbnails/sample-8.jpg
images/sample-9.jpg -> thumbnails/sample-9.jpg
Time taken: 64.981125ms

Experiment Results

Here is a table of experimental results with a larger amount of image data so that we can see the difference in processing time between the two flows that we have used.

Amount of Data    Sequential    Pipeline Pattern
14                145.78ms      64.98ms
1792              17.38s        6.46s
3584              33.51s        12.09s
14336             2m18.07s      50.83s
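To read the table as speedup factors, the durations can be parsed and divided. This is a small sketch with a hypothetical speedup helper. It assumes the last row means 14,336 images taking 2m18.07s sequentially and 50.83s with the pipeline.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster the pipeline run was than the
// sequential run. Both arguments use Go duration syntax, e.g. "2m18.07s".
func speedup(sequential, pipeline string) (float64, error) {
	s, err := time.ParseDuration(sequential)
	if err != nil {
		return 0, err
	}
	p, err := time.ParseDuration(pipeline)
	if err != nil {
		return 0, err
	}
	return s.Seconds() / p.Seconds(), nil
}

func main() {
	// Values copied from the experiment table.
	rows := []struct {
		images   int
		seq, pip string
	}{
		{14, "145.78ms", "64.98ms"},
		{1792, "17.38s", "6.46s"},
		{3584, "33.51s", "12.09s"},
		{14336, "2m18.07s", "50.83s"},
	}
	for _, r := range rows {
		x, err := speedup(r.seq, r.pip)
		if err != nil {
			fmt.Println("bad duration:", err)
			continue
		}
		fmt.Printf("%6d images: %.2fx\n", r.images, x)
	}
}
```

Every row comes out between 2 and 3, so the gain holds as the number of images grows.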

Conclusion

Pipeline Pattern is very useful when we have a chain of related processing steps, a lot of data, and items that do not depend on each other, so that we can run the steps in parallel. Applied to a process like this one, it makes the work more efficient because each item no longer has to wait for the previous item to finish.

This can be seen from our experiment comparing the two approaches. In the sequential process every item waits for the previous one to finish. In the pipeline pattern the first item, the second item and so on do not wait for each other, as long as every item goes through the same sequence of steps. In the measurements above this made processing roughly 2 to 3 times faster than the ordinary sequential process.

The walkthrough above uses only 14 images. If you want to explore further, you can add more images and see whether the process becomes more efficient or not.
