A brief introduction to Download Managers in GoLang

Have you ever wondered how web browsers download files from the internet? Imagine an application that needs to download a large file (a ZIP archive, a video, etc.) from a URL while it runs and then perform some operations on the downloaded file. I recently ran into exactly this problem while working on a GoLang application.

I wanted a solution that could save the downloaded file to a given location, resume the download after a connection failure, and keep track of the download progress. I was in a bit of a hurry to complete my task, so I decided to use an open-source library (grab) that satisfies all of these requirements and offers a few more features.

This blog post is a step-by-step explanation of the internal workings of a download manager like grab.

As a running example, let’s download a PDF of An Introduction to Programming in Go from the following URL: http://www.golang-book.com/public/pdf/gobook.pdf

Step#1 - Decide where to store the downloaded file

Step#2 - Find out the size of the file at the given URL and check whether it supports partial downloads or not

The result of a HEAD request for the URL (http://www.golang-book.com/public/pdf/gobook.pdf) is shown in the image below. The Content-Length header is 2556363 (~2.5 MB), which is the size of the file to be downloaded, and the Accept-Ranges header is bytes, which means the server supports partial (range) downloads.

Figure: HTTP-Headers (response headers returned by the HEAD request)

Step#3 - Determine the name of the downloadable file

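import (
	"errors"
	"mime"
	"net/http"
	"path"
	"path/filepath"
)

// guessFilename starts from the URL path and prefers the filename parameter
// of the Content-Disposition header when the server sends one.
func guessFilename(resp *http.Response) (string, error) {
	filename := resp.Request.URL.Path
	if cd := resp.Header.Get("Content-Disposition"); cd != "" {
		// e.g. `attachment; filename="gobook.pdf"`
		if _, params, err := mime.ParseMediaType(cd); err == nil {
			if val, ok := params["filename"]; ok {
				filename = val
			}
		}
	}
	// Cleaning against "/" and keeping only the base strips directory parts,
	// so a value such as "../../etc/passwd" cannot escape the target folder.
	filename = filepath.Base(path.Clean("/" + filename))
	if filename == "" || filename == "." || filename == "/" {
		return "", errors.New("filename couldn't be determined")
	}
	return filename, nil
}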

Step#4 - Save the checksum and digest algorithm

Step#5 - Decide whether to download from scratch or resume a partial download

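import (
	"errors"
	"fmt"
	"net/http"
	"os"
)

// sendRequest issues a GET for URL. If part of the file already exists on
// disk, it asks the server only for the remaining bytes via a Range header.
func sendRequest(filename, URL string) (*http.Response, error) {
	existingFileSize, err := getFileSize(filename)
	if err != nil {
		return nil, err
	}
	request, err := http.NewRequest(http.MethodGet, URL, nil)
	if err != nil {
		return nil, err
	}
	if existingFileSize > 0 {
		// "bytes=N-" requests everything from byte offset N to the end.
		request.Header.Set("Range", fmt.Sprintf("bytes=%d-", existingFileSize))
	}
	resp, err := http.DefaultClient.Do(request)
	if err != nil {
		return nil, err
	}
	// 206 Partial Content: the Range header was honoured, so append to the file.
	// 200 OK: the whole file is being sent, so the caller must start over.
	if resp.StatusCode == http.StatusOK || resp.StatusCode == http.StatusPartialContent {
		return resp, nil
	}
	// Any other status (e.g. 416 Range Not Satisfiable when the local file is
	// already complete) is reported as an error; close the body first.
	resp.Body.Close()
	return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
}

// getFileSize returns the size of a partially downloaded file, or 0 if
// nothing has been downloaded yet.
func getFileSize(name string) (int64, error) {
	fi, err := os.Stat(name)
	if errors.Is(err, os.ErrNotExist) {
		return 0, nil // first attempt: download from scratch
	}
	if err != nil {
		return 0, err
	}
	if fi.IsDir() {
		return 0, fmt.Errorf("%s is a directory, not a file", name)
	}
	return fi.Size(), nil
}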

Step#6 - Copy the data from the response body and write it to the destination file

Step#7 - Compute and compare the checksum

Additional Improvements

Conclusion

In this blog post, we have taken a quick look at the internals of a download manager. We also explored some HTTP features that we don’t use frequently in our day-to-day work, such as the HEAD method and the Content-Length, Accept-Ranges, Content-Disposition, Digest, and Range headers.