# go-linkfinder

`go-linkfinder` is a simple web crawling and link extraction tool written in Go, inspired by [GerbenJavado/LinkFinder](https://github.com/GerbenJavado/LinkFinder). It supports fetching content from URLs or files, recursively crawling same-domain links up to a specified depth, and customizing request headers, user-agent, timeout, and delay between requests.

> **WARNING**: This tool is intended for **authorized security testing and web research only**. Unauthorized crawling may violate website terms of service or legal regulations. The author and contributors are not responsible for misuse. Always obtain explicit permission before crawling any site.

## Features

- **Input Sources**: Fetch content from URLs or local files.
- **Recursive Crawling**: Configurable recursion depth to crawl same-domain links.
- **Request Customization**: Set HTTP headers and user-agent strings.
- **Timeout Control**: Customize the HTTP request timeout.
- **Delay Between Requests**: Configurable delay (default 4 seconds) to avoid overloading servers.
- **Automatic Gzip Handling**: Supports gzip-compressed HTTP responses.
- **URL Normalization**: Resolves relative URLs to absolute form.
- **Duplicate Detection**: Tracks visited URLs and avoids revisiting them.

## Installation

### Prerequisites

- **Go**: Version 1.21 or later.
- **Make**: For building with the provided Makefile.
- **Git**: To clone the repository.

### Steps

- Clone the repository:

```
$ git clone https://cgit.heqnx.com/go-linkfinder
$ cd go-linkfinder
```

- Install dependencies:

```
$ go mod tidy
```

- Build for all platforms:

```
$ make all
```

- Binaries are generated in the `build/` directory for Linux, Windows, and macOS. Alternatively, build for a specific platform:

```
$ make linux-amd64
$ make windows-amd64
$ make darwin-arm64
```

- (Optional) Run directly with Go:

```
$ go run main.go -input <url-or-file> [-depth N] [-delay N] [-header "Name: value"] [-timeout N] [-user-agent "UA string"]
```

## Usage

### Command-Line Flags

```
Usage of ./go-linkfinder-linux-amd64:
  -depth int
        recursion depth for same-domain links (0 disables crawling) (default 0)
  -delay int
        delay between requests in seconds when crawling (only applies if depth > 0) (default 4)
  -header value
        add HTTP header to request (can be repeated, e.g. -header "Authorization: Bearer token")
  -input string
        url or file path (required)
  -timeout int
        timeout for HTTP requests in seconds (default 10)
  -user-agent string
        set User-Agent header (default "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1")
```

## Examples

### Crawl a single URL without recursion

```
$ ./go-linkfinder -input https://example.com -depth 0
```

- Extracts and prints all links found on the initial page only.

### Recursively crawl same-domain links with delay

```
$ ./go-linkfinder -input https://example.com -depth 2 -delay 5
```

- Crawls links up to 2 levels deep within the same domain.
- Waits 5 seconds between each HTTP request to avoid server overload.

### Use custom headers and user-agent

```
$ ./go-linkfinder -input https://example.com -depth 1 -header "Authorization: Bearer mytoken" -user-agent "CustomAgent/1.0"
```

- Sends an `Authorization` header and a custom `User-Agent` string with every request while crawling one level deep.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Disclaimer

`go-linkfinder` is provided "as is" without warranty. The author and contributors are not liable for any damages or legal consequences arising from its use. Use responsibly and only in authorized environments.
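
## Appendix: Feature Sketches

The sketches below illustrate, in plain Go, the general idea behind two of the features listed above. They are simplified examples, not code from go-linkfinder itself; the URLs and link strings are hypothetical.

**URL normalization and duplicate detection.** A crawler typically resolves each extracted link against the URL of the page it was found on, then records the absolute form in a set so the same URL is never fetched twice. The standard `net/url` package covers the resolution step:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Base URL of the page the links were extracted from (hypothetical).
	base, err := url.Parse("https://example.com/docs/")
	if err != nil {
		panic(err)
	}

	// A mix of relative and absolute links, including one duplicate.
	links := []string{"../about", "team.html", "team.html", "https://example.com/contact"}

	seen := make(map[string]bool) // tracks URLs already emitted

	for _, l := range links {
		ref, err := url.Parse(l)
		if err != nil {
			continue // skip malformed links
		}
		abs := base.ResolveReference(ref).String() // relative -> absolute
		if seen[abs] {
			continue // duplicate detection: already saw this URL
		}
		seen[abs] = true
		fmt.Println(abs)
	}
}
```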
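
**Gzip handling.** Go's `net/http` client decompresses gzip responses transparently only when the transport added the `Accept-Encoding` header itself; a crawler that sets request headers explicitly must decompress on its own. A minimal sketch of that pattern (again, not go-linkfinder's actual code):

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	// Setting Accept-Encoding manually disables net/http's
	// transparent decompression, so we handle gzip ourselves.
	req.Header.Set("Accept-Encoding", "gzip")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var body io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "gzip" {
		gz, err := gzip.NewReader(resp.Body)
		if err != nil {
			panic(err)
		}
		defer gz.Close()
		body = gz
	}

	data, err := io.ReadAll(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```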