Generating images with Golang makes life easier for editors, because they do not need a separate application to prepare the images we want to publish. In this tutorial, santekno will build a thumbnail image generator. A thumbnail is a smaller, lighter version of the original image, so every source image has to be converted into a small file. When there are many images, processing them one by one in ordinary sequential Golang takes a long time. So we will compare generating thumbnails sequentially against generating them concurrently with the Pipeline Pattern.
If you haven’t learned what a Pipeline Pattern is, you can check out this tutorial first.
Project Preparation
Now we will create a new project by creating the learn-golang-generator-image-thumbnail
folder. After that, initialize the project module with this command.
go mod init github.com/santekno/learn-golang-generator-image-thumbnail
Prepare the images or photos you need, or take the sample photos from the santekno repository here
https://github.com/santekno/learn-golang-generator-image-thumbnail/tree/main/images
Generate Image Thumbnail Using Sequential
Before going into the code, we need to understand the main steps of this thumbnail image generator. Here are the stages of the process, which we will later split into separate functions.
- The function reads the image files from the images/ folder and validates that each file is actually an image (the code below accepts only JPEG).
- The function resizes the image to 100 x 100 pixels using the github.com/disintegration/imaging library package.
- The function saves the resulting thumbnail image into the thumbnails/ folder.
Can you picture the process? The flow is illustrated below.
flowchart LR
subgraph subGraph1 ["func walkFiles()"]
C("func\n getFileContentType()") --> D("func\n processImage()")
D --> E("func\n saveThumbnail()")
end
id1((start)) --> d("main func") --> subGraph1 --> e("print\nprocess time") --> id2((finish))
Create a main.go
file that will hold all of the functions. We start with the ordinary sequential version of the generator.
Function Retrieve Image from Folder
This function walks a folder that contains image files and checks whether each file is really an image. The function looks like this.
func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		// filter out errors
		if err != nil {
			return err
		}
		// check if it is a file
		if !info.Mode().IsRegular() {
			return nil
		}
		// check if it is image/jpeg
		contentType, _ := getFileContentType(path)
		if contentType != "image/jpeg" {
			return nil
		}
		return nil
	})
	if err != nil {
		return err
	}
	return nil
}
// getFileContentType - return content type and error status
func getFileContentType(file string) (string, error) {
	out, err := os.Open(file)
	if err != nil {
		return "", err
	}
	defer out.Close()
	// Only the first 512 bytes are used to sniff the content type.
	buffer := make([]byte, 512)
	_, err = out.Read(buffer)
	if err != nil {
		return "", err
	}
	// Use the net/http package's handy DetectContentType function. It always returns a valid
	// content-type, falling back to "application/octet-stream" if nothing else matches.
	contentType := http.DetectContentType(buffer)
	return contentType, nil
}
The code above defines two functions. walkFiles reads every file in the folder passed as its parameter. getFileContentType checks the content type of each file so that only real images (here, JPEG) are passed on. Everything else is filtered out up front, before any thumbnail is generated.
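To make the content-type check more concrete, here is a minimal sketch of how http.DetectContentType classifies data by its leading bytes. The helper name sniffType and the sample byte headers are assumptions made for illustration; they are not part of the tutorial's repository.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffType returns the MIME type that net/http infers from the leading
// bytes of a file. At most the first 512 bytes are inspected.
func sniffType(head []byte) string {
	return http.DetectContentType(head)
}

func main() {
	// Hypothetical sample headers, not read from the repository images.
	jpegHead := []byte{0xFF, 0xD8, 0xFF, 0xE0}
	pngHead := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}

	fmt.Println(sniffType(jpegHead))
	fmt.Println(sniffType(pngHead))
	fmt.Println(sniffType([]byte("just text")))
}
```

Because the check looks at the file content rather than the file name, a PNG or text file that was renamed to .jpg would still be skipped by the "image/jpeg" comparison in walkFiles.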
Image File Manipulation Function
This function turns the source image into a thumbnail of 100x100 pixels. It relies on an additional library, github.com/disintegration/imaging, which we add first with this command.
go get -u github.com/disintegration/imaging
Next, add the function below to the main.go file.
// processImage - takes image file as input
// return pointer to thumbnail image in memory.
func processImage(path string) (*image.NRGBA, error) {
	// load the image from file
	srcImage, err := imaging.Open(path)
	if err != nil {
		return nil, err
	}
	// scale the image to 100px * 100px
	thumbnailImage := imaging.Thumbnail(srcImage, 100, 100, imaging.Lanczos)
	return thumbnailImage, nil
}
Don't forget to update the walkFiles() function so that it calls processImage after the image check.
func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		...
		// process the image
		thumbnailImage, err := processImage(path)
		if err != nil {
			return err
		}
		...
		return nil
	})
	if err != nil {
		return err
	}
	return nil
}
Function Save Thumbnail Image Result
Next we save the resulting thumbnail into a folder named thumbnails/. The processImage function returns the thumbnail as thumbnailImage, and saveThumbnail writes it to that folder. The complete function is below.
// saveThumbnail - save the thumbnail image to folder
func saveThumbnail(srcImagePath string, thumbnailImage *image.NRGBA) error {
	filename := filepath.Base(srcImagePath)
	dstImagePath := "thumbnails/" + filename
	// save the image in the thumbnails folder.
	err := imaging.Save(thumbnailImage, dstImagePath)
	if err != nil {
		return err
	}
	fmt.Printf("%s -> %s\n", srcImagePath, dstImagePath)
	return nil
}
This means the thumbnails/ folder must exist before the program runs. We also call saveThumbnail() inside walkFiles(), right after the call to processImage().
func walkFiles(root string) error {
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		...
		// process the image
		thumbnailImage, err := processImage(path)
		if err != nil {
			return err
		}
		// save the thumbnail image to disk
		err = saveThumbnail(path, thumbnailImage)
		if err != nil {
			return err
		}
		return nil
	})
	...
	return nil
}
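Since the output folder has to exist before anything is saved, one option is to create it from code instead of by hand. The sketch below uses only the standard library; the helper names thumbnailPath and ensureDir are hypothetical and do not appear in the tutorial's repository.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// thumbnailPath builds the destination path for a source image inside dir.
func thumbnailPath(dir, srcImagePath string) string {
	return filepath.Join(dir, filepath.Base(srcImagePath))
}

// ensureDir creates dir (and any missing parents) if it does not exist yet.
// It returns nil when the directory is already there.
func ensureDir(dir string) error {
	return os.MkdirAll(dir, 0o755)
}

func main() {
	// A temporary directory keeps this sketch from leaving files behind.
	tmp, err := os.MkdirTemp("", "thumbs")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(tmp)

	dir := filepath.Join(tmp, "thumbnails")
	if err := ensureDir(dir); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(filepath.Base(thumbnailPath(dir, "images/sample-1.jpg")))
}
```

Calling something like ensureDir("thumbnails") once at the start of main would remove the manual preparation step, and filepath.Join keeps the path separator correct on every operating system.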
The sequential thumbnail generator is now ready. Let's run it with the command below.
➜ learn-golang-generator-image-thumbnail git:(main) ✗ ./learn-golang-generator-image-thumbnail images
images/sample-1.jpg -> thumbnails/sample-1.jpg
images/sample-10.jpg -> thumbnails/sample-10.jpg
images/sample-11.jpg -> thumbnails/sample-11.jpg
images/sample-12.jpg -> thumbnails/sample-12.jpg
images/sample-13.jpg -> thumbnails/sample-13.jpg
images/sample-14.jpg -> thumbnails/sample-14.jpg
images/sample-2.jpg -> thumbnails/sample-2.jpg
images/sample-3.jpg -> thumbnails/sample-3.jpg
images/sample-4.jpg -> thumbnails/sample-4.jpg
images/sample-5.jpg -> thumbnails/sample-5.jpg
images/sample-6.jpg -> thumbnails/sample-6.jpg
images/sample-7.jpg -> thumbnails/sample-7.jpg
images/sample-8.jpg -> thumbnails/sample-8.jpg
images/sample-9.jpg -> thumbnails/sample-9.jpg
Time taken: 145.78275ms
The sequential run takes about 145ms, which is quick because we only process 14 image files.
Changing the Process Mechanism using Pipeline Pattern Concurrent Golang
We saw above that the sequential process takes about 145ms to generate thumbnails for 14 images. Dividing 145ms by 14 gives roughly 10.4ms per image. So for 1 million images the time required would be about 10.4ms x 1 million = 10,400,000ms, or about 2.9 hours. That is a long time for that much data, so we will implement the Pipeline Pattern and see whether it makes the process faster.
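The linear extrapolation can be reproduced with a few lines of Go. This is only a back-of-the-envelope sketch: it assumes the per-image cost stays constant as the number of images grows, and the function names are made up for illustration.

```go
package main

import "fmt"

// perImageMillis returns the average processing time per image in milliseconds.
func perImageMillis(totalMillis float64, images int) float64 {
	return totalMillis / float64(images)
}

// estimateHours extrapolates the total run time, in hours, for a larger batch,
// assuming every image costs the same as the measured average.
func estimateHours(perImage float64, images int) float64 {
	const millisPerHour = 3600 * 1000
	return perImage * float64(images) / millisPerHour
}

func main() {
	// 145.78275ms for 14 images is the measured sequential run shown earlier.
	perImage := perImageMillis(145.78275, 14)
	fmt.Printf("per image: %.2f ms\n", perImage)
	fmt.Printf("1,000,000 images: %.2f hours\n", estimateHours(perImage, 1_000_000))
}
```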
First, we need to restructure the code. To keep the sequential version, we create a sequential folder and move the earlier code into it. Then we create a new folder called pipeline-pattern, so the project structure looks like this.
.
├── README.md
├── learn-golang-generator-image-thumbnail
├── go.mod
├── go.sum
├── images
│ ├── sample-1.jpg
│ ├── sample-10.jpg
│ ├── sample-11.jpg
│ ├── sample-12.jpg
│ ├── sample-13.jpg
│ ├── sample-14.jpg
│ ├── sample-2.jpg
│ ├── sample-3.jpg
│ ├── sample-4.jpg
│ ├── sample-5.jpg
│ ├── sample-6.jpg
│ ├── sample-7.jpg
│ ├── sample-8.jpg
│ └── sample-9.jpg
├── main.go
├── pipeline-pattern
│ └── pipeline.go
├── sequential
│ └── sequential.go
└── thumbnails
Following this structure, the sequential functions live in the sequential folder, while the pipeline pattern code we are about to write goes in the pipeline-pattern folder. Let's write it in the pipeline.go file.
First we need a struct that carries the data between pipeline stages, so that every stage receives data in the same shape.
type result struct {
	srcImagePath   string
	destImagePath  string // used by saveThumbnail and SetupPipeLine
	thumbnailImage *image.NRGBA
	err            error
}
Changing the Function to Retrieve Image from Folder
In the pipeline.go file we create the same walkFiles() function, with some changes: it now takes a done channel for cancellation and returns channels, so it can run asynchronously while the rest of the program works.
func walkFiles(done <-chan struct{}, root string) (<-chan string, <-chan error) {
	// create output channels
	paths := make(chan string)
	errc := make(chan error, 1)
	go func() {
		defer close(paths)
		errc <- filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
			// filter out errors
			if err != nil {
				return err
			}
			// check if it is a file
			if !info.Mode().IsRegular() {
				return nil
			}
			// check if it is image/jpeg
			contentType, _ := sequential.GetFileContentType(path)
			if contentType != "image/jpeg" {
				return nil
			}
			// send file path to next stage
			select {
			case paths <- path:
			case <-done:
				return fmt.Errorf("walk canceled")
			}
			return nil
		})
	}()
	return paths, errc
}
The walk now runs in a goroutine that sends each accepted file path on the paths channel for the next stage to process. For validation we reuse sequential.GetFileContentType from the sequential package. That requires exporting the function (capitalizing its name) so that other packages can call it, changing it from
func getFileContentType(file string) (string, error)
to
func GetFileContentType(file string) (string, error)
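The cancellation behaviour of this first stage can be tried out in isolation. Below is a minimal sketch of a producer that follows the same shape as walkFiles (an output channel, a buffered error channel, and a done channel), but sends strings from a slice instead of walking a folder. The names emit and takeFirst are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// emit sends each item on the returned channel until it runs out of items
// or done is closed. The final status goes to a buffered error channel,
// so the goroutine never blocks while reporting it.
func emit(done <-chan struct{}, items []string) (<-chan string, <-chan error) {
	out := make(chan string)
	errc := make(chan error, 1)
	go func() {
		defer close(out)
		for _, it := range items {
			select {
			case out <- it:
			case <-done:
				errc <- errors.New("emit canceled")
				return
			}
		}
		errc <- nil
	}()
	return out, errc
}

// takeFirst reads a single item and then cancels the producer.
func takeFirst(items []string) (string, error) {
	done := make(chan struct{})
	out, errc := emit(done, items)
	first := <-out
	close(done) // tell the producer to stop
	return first, <-errc
}

func main() {
	first, err := takeFirst([]string{"a.jpg", "b.jpg", "c.jpg"})
	fmt.Println(first, err)
}
```

Without the done case in the select, the producer goroutine would stay blocked forever on its second send once the consumer stops reading.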
Changing the Image File Manipulation Function
The manipulation logic is the same, but the function now reads from and writes to channels so that several images can be processed in parallel. The details are below.
func processImage(done <-chan struct{}, paths <-chan string) <-chan result {
	results := make(chan result)
	var wg sync.WaitGroup
	thumbnailer := func() {
		for srcImagePath := range paths {
			srcImage, err := imaging.Open(srcImagePath)
			if err != nil {
				select {
				case results <- result{srcImagePath: srcImagePath, err: err}:
				case <-done:
					return
				}
				// skip the resize: srcImage is nil when Open fails
				continue
			}
			thumbnailImage := imaging.Thumbnail(srcImage, 100, 100, imaging.Lanczos)
			select {
			case results <- result{srcImagePath: srcImagePath, thumbnailImage: thumbnailImage}:
			case <-done:
				return
			}
		}
	}
	const numThumbnailer = 5
	for i := 0; i < numThumbnailer; i++ {
		wg.Add(1)
		go func() {
			thumbnailer()
			wg.Done()
		}()
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}
processImage() is more complicated now because it uses channels and goroutines: five worker goroutines read from the paths channel, so images do not have to wait for one another. The workers keep running as long as the paths channel keeps delivering data.
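The worker-pool part of this stage (several goroutines reading one input channel, a WaitGroup deciding when to close the output) can be sketched without any image code. The example squares integers instead of resizing images; squareAll and sumSquares are illustrative names only.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll fans the work out to `workers` goroutines that all read from
// the same input channel, then fans the results back in on one channel.
func squareAll(done <-chan struct{}, in <-chan int, workers int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range in {
				select {
				case out <- n * n:
				case <-done:
					return
				}
			}
		}()
	}
	// Close the output only after every worker has returned.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// sumSquares feeds nums through the worker pool and adds up the results.
func sumSquares(nums []int, workers int) int {
	done := make(chan struct{})
	defer close(done)

	in := make(chan int)
	go func() {
		defer close(in)
		for _, n := range nums {
			in <- n
		}
	}()

	total := 0
	for sq := range squareAll(done, in, workers) {
		total += sq
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3))
}
```

The results arrive in whatever order the workers finish, which is why the pipeline output later in this article is not in the same order as the sequential output.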
Changing the Save Thumbnail Image Result Function
In the save thumbnail function, we will also change the parameter to channel as below.
func saveThumbnail(done <-chan struct{}, thumbs <-chan result) <-chan result {
	results := make(chan result)
	var wg sync.WaitGroup
	saveThumbnailer := func() {
		for img := range thumbs {
			filename := filepath.Base(img.srcImagePath)
			dstImagePath := "thumbnails/" + filename
			// keep an error from the previous stage instead of saving a nil image
			err := img.err
			if err == nil {
				// save the image in the thumbnails folder.
				err = imaging.Save(img.thumbnailImage, dstImagePath)
			}
			// report the outcome exactly once, success or failure
			select {
			case results <- result{srcImagePath: img.srcImagePath, destImagePath: dstImagePath, thumbnailImage: img.thumbnailImage, err: err}:
			case <-done:
				return
			}
		}
	}
	const numGoroutine = 5
	for i := 0; i < numGoroutine; i++ {
		wg.Add(1)
		go func() {
			saveThumbnailer()
			wg.Done()
		}()
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}
Create SetupPipeLine Function
The SetupPipeLine function wires all of the goroutine stages together in one place, so the main function only has to make a single call.
func SetupPipeLine(root string) error {
	done := make(chan struct{})
	defer close(done)
	// do the file walk
	paths, errc := walkFiles(done, root)
	// process the images
	resultImages := processImage(done, paths)
	// save thumbnail images
	results := saveThumbnail(done, resultImages)
	// print each saved thumbnail, stopping at the first error
	for r := range results {
		if r.err != nil {
			return r.err
		}
		fmt.Printf("%s -> %s\n", r.srcImagePath, r.destImagePath)
	}
	// check for errors on the channel, from walkfiles stage.
	if err := <-errc; err != nil {
		return err
	}
	return nil
}
All the functions for the pipeline version are now in place. To run it, we first change the main.go file so that it calls the pipeline pattern instead of the sequential functions.
// Image processing - sequential or pipeline pattern
// Input - directory with images.
// output - thumbnail images
func main() {
	if len(os.Args) < 2 {
		log.Fatal("need to send directory path of images")
	}
	start := time.Now()
	// using sequential
	// err := sequential.WalkFiles(os.Args[1])
	// using pipeline pattern
	err := pipelinepattern.SetupPipeLine(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Time taken: %s\n", time.Since(start))
}
Above, the call to the sequential function is commented out so that it is not executed. Run the program against the same images folder with
go run main.go images
The run is about 2 times faster, finishing in roughly 64ms
➜ learn-golang-generator-image-thumbnail git:(main) ✗ go run main.go images
images/sample-11.jpg -> thumbnails/sample-11.jpg
images/sample-10.jpg -> thumbnails/sample-10.jpg
images/sample-1.jpg -> thumbnails/sample-1.jpg
images/sample-12.jpg -> thumbnails/sample-12.jpg
images/sample-13.jpg -> thumbnails/sample-13.jpg
images/sample-14.jpg -> thumbnails/sample-14.jpg
images/sample-4.jpg -> thumbnails/sample-4.jpg
images/sample-2.jpg -> thumbnails/sample-2.jpg
images/sample-3.jpg -> thumbnails/sample-3.jpg
images/sample-5.jpg -> thumbnails/sample-5.jpg
images/sample-6.jpg -> thumbnails/sample-6.jpg
images/sample-7.jpg -> thumbnails/sample-7.jpg
images/sample-8.jpg -> thumbnails/sample-8.jpg
images/sample-9.jpg -> thumbnails/sample-9.jpg
Time taken: 64.981125ms
Experiment Results
Here is a table of results with larger numbers of images, so that we can compare the processing time of the two approaches.
Number of Images | Sequential | Pipeline Pattern |
---|---|---|
14 | 145.78ms | 64.98ms |
1792 | 17.38s | 6.46s |
3584 | 33.51s | 12.09s |
14336 | 2m18.07s | 50.83s |
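To read the table as a speedup factor, the timings can be divided row by row. The sketch below copies the figures from the table, converted to seconds, and treats the last row as 14336 images (the table's "14.336" read as a thousands separator, which is an assumption).

```go
package main

import "fmt"

// speedup returns how many times faster the pipeline run was
// compared with the sequential run.
func speedup(sequentialSec, pipelineSec float64) float64 {
	return sequentialSec / pipelineSec
}

func main() {
	// Timings copied from the table above, converted to seconds.
	runs := []struct {
		images               int
		sequential, pipeline float64
	}{
		{14, 0.14578, 0.06498},
		{1792, 17.38, 6.46},
		{3584, 33.51, 12.09},
		{14336, 138.07, 50.83}, // 2m18.07s = 138.07s
	}
	for _, r := range runs {
		fmt.Printf("%6d images: %.2fx\n", r.images, speedup(r.sequential, r.pipeline))
	}
}
```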
Conclusion
The Pipeline Pattern is very useful when a job consists of several dependent stages and there are many independent items to process: because one item does not need to wait for another, the stages can run in parallel. This makes the whole process more efficient, since each item no longer waits for the previous item to finish every stage.
The experiments show this. In the sequential version every image waits for the previous one to finish. In the pipeline version the first image, the second image, and so on move through the same sequence of stages without waiting for each other, which made processing roughly 2.2 to 2.8 times faster in the runs above.
The walkthrough itself uses only 14 images. If you want to explore further, add more images and see how the difference between the two approaches changes.