go-linkfinder

go-linkfinder is a simple web crawling and link extraction tool written in Go, inspired by https://github.com/GerbenJavado/LinkFinder. It can fetch content from URLs or local files, recursively crawl same-domain links up to a specified depth, and apply custom request headers, a User-Agent string, a request timeout, and a delay between requests.

WARNING: This tool is intended for authorized security testing and web research only. Unauthorized crawling may violate website terms of service or legal regulations. The author and contributors are not responsible for misuse. Always obtain explicit permission before crawling any site.

Features

  • Input Sources: Fetch content from URLs or local files.
  • Recursive Crawling: Configurable recursion depth to crawl same-domain links.
  • Request Customization: Set HTTP headers and user-agent strings.
  • Timeout Control: Customize HTTP request timeout.
  • Delay Between Requests: Configurable delay (default 4 seconds) to avoid overloading servers.
  • Automatic Gzip Handling: Supports gzip compressed HTTP responses.
  • URL Normalization: Resolves relative URLs to absolute form.
  • Duplicate Detection: Tracks visited URLs to avoid revisiting them (see the sketch after this list).
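
For context, the sketch below shows how relative-URL resolution and duplicate tracking commonly look in Go using the standard net/url package. It is an illustration of the technique, not go-linkfinder's actual code:

package main

import (
    "fmt"
    "net/url"
)

// normalize resolves a possibly-relative href against the page it was
// found on, returning an absolute URL string.
func normalize(base *url.URL, href string) (string, error) {
    ref, err := url.Parse(href)
    if err != nil {
        return "", err
    }
    return base.ResolveReference(ref).String(), nil
}

func main() {
    base, _ := url.Parse("https://example.com/docs/index.html")
    seen := map[string]struct{}{} // set of already-visited URLs

    for _, href := range []string{"../about", "/contact", "https://example.com/contact"} {
        abs, err := normalize(base, href)
        if err != nil {
            continue
        }
        if _, ok := seen[abs]; ok {
            continue // duplicate, skip
        }
        seen[abs] = struct{}{}
        fmt.Println(abs) // prints each unique absolute URL once
    }
}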

Installation

Prerequisites

  • Go: Version 1.21 or later.
  • Make: For building with the provided Makefile.
  • Git: To clone the repository.

Steps

  • Clone the repository:
$ git clone https://cgit.heqnx.com/go-linkfinder
$ cd go-linkfinder
  • Install dependencies:
$ go mod tidy
  • Build for all platforms:
$ make all
  • Binaries will be generated in the build/ directory for Linux, Windows, and macOS; alternatively, build for a specific platform:
$ make linux-amd64
$ make windows-amd64
$ make darwin-arm64
  • (Optional) Run directly with Go:
$ go run main.go -depth <depth> -delay <delay> -header <Header: Value> -input <url/file> -timeout <timeout> -user-agent <ua>

Usage

Command-Line Flags

Usage of ./go-linkfinder-linux-amd64:
  -depth int
        recursion depth for same-domain links (0 disables crawling) (default 0)
  -delay int
        delay between requests in seconds when crawling (only applies if depth > 0) (default 4)
  -header value
        add HTTP header to request (can be repeated, e.g. -header "Authorization: Bearer token")
  -input string
        url or file path (required)
  -timeout int
        timeout for HTTP requests in seconds (default 10)
  -user-agent string
        set User-Agent header (default "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1")
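
The repeatable -header flag has the shape of Go's flag.Value interface. Below is a minimal sketch of one plausible implementation; go-linkfinder's own code may differ:

package main

import (
    "flag"
    "fmt"
)

// headerList implements flag.Value so the same flag can be passed
// multiple times, collecting every occurrence in order.
type headerList []string

func (h *headerList) String() string { return fmt.Sprint(*h) }

func (h *headerList) Set(v string) error {
    *h = append(*h, v)
    return nil
}

func main() {
    var headers headerList
    flag.Var(&headers, "header", "add HTTP header to request (can be repeated)")
    flag.Parse()

    for _, h := range headers {
        fmt.Println("header:", h)
    }
}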

Examples

Crawl a single URL without recursion

$ ./go-linkfinder -input https://example.com -depth 0
  • Extracts and prints all links found on the initial page only.

Crawl recursively with a delay

$ ./go-linkfinder -input https://example.com -depth 2 -delay 5
  • Crawls links up to 2 levels deep within the same domain.
  • Waits 5 seconds between each HTTP request to avoid server overload.
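
For intuition, here is a rough sketch of a depth-limited, same-domain crawl loop with a per-request delay, timeout, and manual gzip handling, built from the standard library. extractLinks is a hypothetical stub, duplicate tracking (shown earlier) is omitted for brevity, and none of this is go-linkfinder's actual source:

package main

import (
    "compress/gzip"
    "io"
    "net/http"
    "net/url"
    "time"
)

// extractLinks is a hypothetical stand-in for the tool's real link
// extraction; it is stubbed out to keep the sketch focused.
func extractLinks(page []byte) []string { return nil }

// fetch GETs a page with the client's timeout applied. Requesting gzip
// explicitly disables Go's automatic decompression, so the body is
// unwrapped by hand when the server actually compressed it.
func fetch(client *http.Client, target string) ([]byte, error) {
    req, err := http.NewRequest("GET", target, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Accept-Encoding", "gzip")

    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    var body io.Reader = resp.Body
    if resp.Header.Get("Content-Encoding") == "gzip" {
        gz, err := gzip.NewReader(resp.Body)
        if err != nil {
            return nil, err
        }
        defer gz.Close()
        body = gz
    }
    return io.ReadAll(body)
}

// crawl fetches start, then follows same-domain links level by level up
// to depth, sleeping delay between requests. Depth 0 fetches the
// initial page only, matching the -depth flag's documented behavior.
func crawl(start string, depth int, delay, timeout time.Duration) error {
    client := &http.Client{Timeout: timeout}
    root, err := url.Parse(start)
    if err != nil {
        return err
    }

    queue := []string{start}
    for level := 0; level <= depth && len(queue) > 0; level++ {
        var next []string
        for _, target := range queue {
            page, err := fetch(client, target)
            if err != nil {
                continue
            }
            for _, link := range extractLinks(page) {
                u, err := url.Parse(link)
                if err != nil || u.Host != root.Host {
                    continue // skip cross-domain links
                }
                next = append(next, link)
            }
            time.Sleep(delay) // be polite between requests
        }
        queue = next
    }
    return nil
}

func main() {
    _ = crawl("https://example.com", 2, 5*time.Second, 10*time.Second)
}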

Use Custom Headers and User-Agent

$ ./go-linkfinder -input https://example.com -depth 1 -header "Authorization: Bearer mytoken" -user-agent "CustomAgent/1.0"
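
A sketch of how such flags could translate into an http.Request. Splitting the raw "Name: Value" string on the first colon is an assumption about the tool's parsing, not its confirmed behavior:

package main

import (
    "fmt"
    "net/http"
    "strings"
)

func main() {
    req, err := http.NewRequest("GET", "https://example.com", nil)
    if err != nil {
        panic(err)
    }

    // A raw "Name: Value" string, as passed via -header, split once on ":".
    raw := "Authorization: Bearer mytoken"
    if name, value, ok := strings.Cut(raw, ":"); ok {
        req.Header.Set(name, strings.TrimSpace(value))
    }
    req.Header.Set("User-Agent", "CustomAgent/1.0")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}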

License

This project is licensed under the MIT License. See the LICENSE file for details.

Disclaimer

go-linkfinder is provided "as is" without warranty. The author and contributors are not liable for any damages or legal consequences arising from its use. Use responsibly and only in authorized environments.