# go-linkfinder

`go-linkfinder` is a simple web crawling and link extraction tool written in Go, inspired by [GerbenJavado/LinkFinder](https://github.com/GerbenJavado/LinkFinder). It can fetch content from URLs or local files, recursively crawl same-domain links up to a specified depth, and customize request headers, the user-agent, the timeout, and the delay between requests.

> **WARNING**: This tool is intended for **authorized security testing and web research only**. Unauthorized crawling may violate website terms of service or legal regulations. The author and contributors are not responsible for misuse. Always obtain explicit permission before crawling any site.

## Features

- **Input Sources**: Fetch content from URLs or local files.
- **Recursive Crawling**: Configurable recursion depth for crawling same-domain links.
- **Request Customization**: Set HTTP headers and user-agent strings.
- **Timeout Control**: Customize the HTTP request timeout.
- **Delay Between Requests**: Configurable delay (default 4 seconds) to avoid overloading servers.
- **Automatic Gzip Handling**: Supports gzip-compressed HTTP responses.
- **URL Normalization**: Resolves relative URLs to absolute form.
- **Duplicate Detection**: Tracks and avoids revisiting URLs.
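
The URL normalization and duplicate detection features can be sketched roughly as below; this is an illustrative sketch using Go's standard `net/url` package, not the tool's actual internals (`normalize` is a hypothetical name):

```go
package main

import (
	"fmt"
	"net/url"
)

// normalize resolves href against base, producing an absolute URL.
// Illustrative only; the real tool's internals may differ.
func normalize(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	h, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(h).String(), nil
}

func main() {
	seen := map[string]bool{} // duplicate detection: track visited URLs
	for _, href := range []string{"/about", "contact.html", "/about"} {
		abs, err := normalize("https://example.com/index.html", href)
		if err != nil || seen[abs] {
			continue // skip malformed or already-seen links
		}
		seen[abs] = true
		fmt.Println(abs)
	}
}
```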

## Installation

### Prerequisites

- **Go**: Version 1.21 or later.
- **Make**: For building with the provided Makefile.
- **Git**: To clone the repository.

### Steps

- Clone the repository:

```
$ git clone https://cgit.heqnx.com/go-linkfinder
$ cd go-linkfinder
```

- Install dependencies:

```
$ go mod tidy
```

- Build for all platforms:

```
$ make all
```

- Binaries are placed in the `build/` directory for Linux, Windows, and macOS. Alternatively, build for a specific platform:

```
$ make linux-amd64
$ make windows-amd64
$ make darwin-arm64
```

- (Optional) Run directly with Go:

```
$ go run main.go -depth <depth> -delay <delay> -header <Header: Value> -input <url/file> -timeout <timeout> -user-agent <ua>
```

## Usage

### Command-Line Flags

```
Usage of ./go-linkfinder-linux-amd64:
  -depth int
        recursion depth for same-domain links (0 disables crawling) (default 0)
  -delay int
        delay between requests in seconds when crawling (only applies if depth > 0) (default 4)
  -header value
        add HTTP header to request (can be repeated, e.g. -header "Authorization: Bearer token")
  -input string
        url or file path (required)
  -timeout int
        timeout for HTTP requests in seconds (default 10)
  -user-agent string
        set User-Agent header (default "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1")
```
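
A repeatable flag like `-header` is typically implemented in Go with the standard `flag.Value` interface; a minimal sketch (the `headerList` type name is illustrative, not necessarily the tool's own):

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// headerList collects repeated -header values; it satisfies flag.Value.
type headerList []string

func (h *headerList) String() string { return strings.Join(*h, ", ") }

// Set is called once per occurrence of the flag on the command line.
func (h *headerList) Set(v string) error {
	if !strings.Contains(v, ":") {
		return fmt.Errorf("header must be in 'Name: Value' form, got %q", v)
	}
	*h = append(*h, v)
	return nil
}

func main() {
	var headers headerList
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Var(&headers, "header", "add HTTP header (repeatable)")
	fs.Parse([]string{"-header", "Authorization: Bearer token", "-header", "X-Test: 1"})
	fmt.Println(headers.String())
}
```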

## Examples

### Crawl a single URL without recursion

```
$ ./go-linkfinder -input https://example.com -depth 0
```

- Extracts and prints all links found on the initial page only.
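
The depth semantics can be sketched as a small recursive walk; in this sketch `fetch` is a stand-in for the tool's HTTP-fetch-and-extract step, not its actual API, and same-domain filtering is omitted:

```go
package main

import "fmt"

// crawl prints every link extracted from page and, while depth > 0,
// recurses into those links. A visited set prevents revisiting pages.
func crawl(page string, depth int, fetch func(string) []string, visited map[string]bool) {
	if visited[page] {
		return
	}
	visited[page] = true
	for _, link := range fetch(page) {
		fmt.Println(link)
		if depth > 0 {
			crawl(link, depth-1, fetch, visited)
		}
	}
}

func main() {
	// Tiny in-memory "site" standing in for HTTP responses.
	site := map[string][]string{
		"https://example.com":   {"https://example.com/a", "https://example.com/b"},
		"https://example.com/a": {"https://example.com/c"},
	}
	fetch := func(u string) []string { return site[u] }

	// -depth 0: print links on the initial page only, no recursion.
	crawl("https://example.com", 0, fetch, map[string]bool{})
}
```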

### Recursively crawl same-domain links with delay

```
$ ./go-linkfinder -input https://example.com -depth 2 -delay 5
```

- Crawls links up to 2 levels deep within the same domain.
- Waits 5 seconds between each HTTP request to avoid server overload.

### Use custom headers and user-agent

```
$ ./go-linkfinder -input https://example.com -depth 1 -header "Authorization: Bearer mytoken" -user-agent "CustomAgent/1.0"
```
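
Applying `-header` and `-user-agent` values to an outgoing request presumably looks something like the following; `buildRequest` is a hypothetical helper, verified here against a local `httptest` echo server rather than a real site:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// buildRequest applies -header and -user-agent style options to a GET
// request. Illustrative sketch, not the tool's actual API.
func buildRequest(rawURL, userAgent string, headers []string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	for _, h := range headers {
		if parts := strings.SplitN(h, ":", 2); len(parts) == 2 {
			req.Header.Set(strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]))
		}
	}
	return req, nil
}

func main() {
	// Echo server standing in for the target site.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "UA=%s Auth=%s", r.Header.Get("User-Agent"), r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	req, err := buildRequest(srv.URL, "CustomAgent/1.0", []string{"Authorization: Bearer mytoken"})
	if err != nil {
		panic(err)
	}
	client := &http.Client{Timeout: 10 * time.Second} // -timeout 10
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // echoes the headers the server received
}
```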

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Disclaimer

`go-linkfinder` is provided "as is" without warranty. The author and contributors are not liable for any damages or legal consequences arising from its use. Use responsibly and only in authorized environments.