go-linkfinder
go-linkfinder is a simple web crawling and link extraction tool written in Go, inspired by https://github.com/GerbenJavado/LinkFinder. It supports fetching content from URLs or files, recursively crawling same-domain links up to a specified depth, and customizing request headers, user-agent, timeout, and delay between requests.
WARNING: This tool is intended for authorized security testing and web research only. Unauthorized crawling may violate website terms of service or legal regulations. The author and contributors are not responsible for misuse. Always obtain explicit permission before crawling any site.
Features
- Input Sources: Fetch content from URLs or local files.
- Recursive Crawling: Configurable recursion depth to crawl same-domain links.
- Request Customization: Set HTTP headers and user-agent strings.
- Timeout Control: Customize HTTP request timeout.
- Delay Between Requests: Configurable delay (default 4 seconds) to avoid overloading servers.
- Automatic Gzip Handling: Supports gzip compressed HTTP responses.
- URL Normalization: Resolves relative URLs to absolute form.
- Duplicate Detection: Tracks and avoids revisiting URLs.
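The URL normalization feature described above maps naturally onto Go's standard library. The sketch below shows how a relative link can be resolved against the page it was found on using net/url; the `normalize` function name is illustrative and not taken from go-linkfinder's source.

```go
package main

import (
	"fmt"
	"net/url"
)

// normalize resolves a possibly relative link against the base URL of the
// page it was found on, returning an absolute URL string.
func normalize(base, link string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	l, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	// ResolveReference implements RFC 3986 relative resolution.
	return b.ResolveReference(l).String(), nil
}

func main() {
	abs, _ := normalize("https://example.com/docs/index.html", "../assets/app.js")
	fmt.Println(abs) // https://example.com/assets/app.js
}
```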
Installation
Prerequisites
- Go: Version 1.21 or later.
- Make: For building with the provided Makefile.
- Git: To clone the repository.
Steps
- Clone the repository:
$ git clone https://cgit.heqnx.com/go-linkfinder
$ cd go-linkfinder
- Install dependencies:
$ go mod tidy
- Build for all platforms:
$ make all
- Binaries will be generated in the build/ directory for Linux, Windows, and macOS; alternatively, build for a specific platform:
$ make linux-amd64
$ make windows-amd64
$ make darwin-arm64
- (Optional) Run directly with Go:
$ go run main.go -depth <depth> -delay <delay> -header <Header: Value> -input <url/file> -timeout <timeout> -user-agent <ua>
Usage
Command-Line Flags
Usage of ./go-linkfinder-linux-amd64:
-depth int
recursion depth for same-domain links (0 disables crawling) (default 0)
-delay int
delay between requests in seconds when crawling (only applies if depth > 0) (default 4)
-header value
add HTTP header to request (can be repeated, e.g. -header "Authorization: Bearer token")
-input string
url or file path (required)
-timeout int
timeout for HTTP requests in seconds (default 10)
-user-agent string
set User-Agent header (default "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1")
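The help text above notes that -header can be repeated. In Go this is conventionally done with a custom type implementing the flag.Value interface; the sketch below shows that pattern, with the `headerList` type as an illustrative assumption rather than the tool's actual implementation.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// headerList collects every occurrence of a repeatable flag.
type headerList []string

// String and Set satisfy the flag.Value interface.
func (h *headerList) String() string { return strings.Join(*h, ", ") }

func (h *headerList) Set(v string) error {
	if !strings.Contains(v, ":") {
		return fmt.Errorf("header must be in 'Name: Value' form: %q", v)
	}
	*h = append(*h, v)
	return nil
}

func main() {
	var headers headerList
	flag.Var(&headers, "header", "add HTTP header to request (can be repeated)")
	flag.Parse()
	for _, h := range headers {
		parts := strings.SplitN(h, ":", 2)
		fmt.Printf("%s => %s\n", strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]))
	}
}
```

Run with, e.g., `-header "Authorization: Bearer token" -header "X-Test: 1"` to see both headers collected.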
Examples
Crawl a single URL without recursion
$ ./go-linkfinder -input https://example.com -depth 0
- Extracts and prints all links found on the initial page only.
Recursively crawl same-domain links with delay
$ ./go-linkfinder -input https://example.com -depth 2 -delay 5
- Crawls links up to 2 levels deep within the same domain.
- Waits 5 seconds between each HTTP request to avoid server overload.
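Same-domain crawling combines two checks mentioned in the feature list: a hostname comparison and duplicate detection via a visited set. A minimal sketch, assuming a hypothetical `sameHost` helper (not go-linkfinder's actual code):

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHost reports whether two URLs share a hostname, the check a
// same-domain crawler makes before following a link.
func sameHost(a, b string) bool {
	ua, err1 := url.Parse(a)
	ub, err2 := url.Parse(b)
	return err1 == nil && err2 == nil && ua.Hostname() == ub.Hostname()
}

func main() {
	start := "https://example.com/"
	seen := map[string]bool{} // duplicate detection: each URL visited once
	queue := []string{"https://example.com/", "https://example.com/about", "https://other.org/"}
	for _, u := range queue {
		if seen[u] || !sameHost(start, u) {
			continue // skip revisits and off-domain links
		}
		seen[u] = true
		fmt.Println("would fetch:", u)
	}
}
```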
Use Custom Headers and User-Agent
$ ./go-linkfinder -input https://example.com -depth 1 -header "Authorization: Bearer mytoken" -user-agent "CustomAgent/1.0"
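The flags in this example correspond directly to Go's net/http types: headers and user-agent attach to the request, and -timeout maps onto the client. The `buildRequest` helper below is a hypothetical sketch of that mapping, not the tool's internals. (Note that Go's default transport also decompresses gzip responses transparently when Accept-Encoding is left unset, matching the gzip feature above.)

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// buildRequest creates a GET request with a custom User-Agent and any
// extra headers, mirroring the -user-agent and -header flags.
func buildRequest(rawURL, userAgent string, headers map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	// -timeout 10 corresponds to a client-level deadline.
	client := &http.Client{Timeout: 10 * time.Second}
	req, _ := buildRequest("https://example.com", "CustomAgent/1.0",
		map[string]string{"Authorization": "Bearer mytoken"})
	fmt.Println(req.Header.Get("Authorization"), client.Timeout)
}
```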
License
This project is licensed under the MIT License. See the LICENSE file for details.
Disclaimer
go-linkfinder is provided "as is" without warranty. The author and contributors are not liable for any damages or legal consequences arising from its use. Use responsibly and only in authorized environments.