recon.py - Zero-Dependency OSINT Tool for Kali Linux

Why I Built recon – Solving the OSINT Fragmentation Problem

As someone who spends a lot of time on defensive security research and penetration testing, I grew tired of juggling multiple OSINT tools, each with its own dependencies, syntax, and output format. One day I'd be using theHarvester for domain reconnaissance, the next I'd be switching to sherlock for username hunting, and then pulling out separate WHOIS and DNS tools. The context switching was killing my workflow.

I wanted something that could do everything from a single Python file – a Swiss Army knife that lived in my Kali Linux arsenal without requiring virtual environments or dependency management. More importantly, I wanted it to be completely self-contained. No pip installs, no external APIs (except for the ones I'd have to call anyway), just pure Python standard library magic.

The real pain point was repeatability. Every client engagement meant rebuilding my toolset, fighting with broken dependencies, and spending more time on setup than actual reconnaissance.

Technical Architecture – Designing for Zero Dependencies

The entire recon tool lives in a single recon.py file that clocks in at around 3,000 lines. Here's the architecture that makes it work:

Concurrent Execution Model

Rather than sequential lookups that waste time waiting for slow HTTP responses, I implemented a thread pool using concurrent.futures.ThreadPoolExecutor. Each module spins up its own executor with configurable workers:

# Simplified version of the concurrent execution pattern
from concurrent.futures import ThreadPoolExecutor, as_completed

def scan_usernames(username, sites, max_workers=20):
    # check_site(username, site) is defined elsewhere in recon.py
    results = {'found': [], 'not_found': [], 'error': []}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its site so failures can be attributed
        futures = {executor.submit(check_site, username, site): site
                   for site in sites}

        for future in as_completed(futures):
            site = futures[future]
            try:
                result = future.result()
                if result['status'] == 'found':
                    results['found'].append(result)
                else:
                    results['not_found'].append(site)
            except Exception as e:
                # A failed lookup is recorded instead of aborting the scan
                results['error'].append({site: str(e)})

    return results

This design allows checking 80+ social media sites in under 30 seconds instead of several minutes.

Direct Socket Implementations

For simple protocols where an external library would be overkill, I talk to the server directly over a plain TCP socket (socket.SOCK_STREAM, not SOCK_RAW). The WHOIS module connects to port 43 and handles referral chasing; the simplified lookup below shows a single query:

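import socket

def whois_lookup(domain, timeout=10):
    tld = domain.rsplit('.', 1)[-1]
    server = get_whois_server(tld)  # Built-in TLD routing table

    # create_connection resolves IPv4/IPv6 and applies a timeout;
    # the with-block closes the socket even if the query fails
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(f"{domain}\r\n".encode())

        # The server signals the end of the response by closing the connection
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    # WHOIS declares no encoding, so undecodable bytes are replaced
    return parse_whois_response(b"".join(chunks).decode(errors="replace"))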
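
The referral chasing mentioned above is not part of the simplified lookup. One way it could work is sketched here, assuming the response names the next server on a `refer:`, `whois:`, or `Registrar WHOIS Server:` line. These are common conventions, but formats vary by registry, and `find_referral` is a hypothetical helper name rather than something taken from recon.py.

```python
import re

# Field names commonly used to point at the next WHOIS server.
# Assumption: real registries vary, so this list is illustrative only.
_REFERRAL_RE = re.compile(
    r"^\s*(?:refer|whois|Registrar WHOIS Server):\s*(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def find_referral(response_text, current_server):
    """Return the next WHOIS server named in a response, or None."""
    for match in _REFERRAL_RE.finditer(response_text):
        candidate = match.group(1).lower()
        # Strip an optional scheme that some registrars include
        candidate = re.sub(r"^[a-z]+://", "", candidate).rstrip("/")
        # Ignore a referral that points back at the server just queried
        if candidate and candidate != current_server.lower():
            return candidate
    return None
```

A caller would repeat the query against the returned server until `find_referral` yields `None`, ideally with a small hop limit to guard against referral loops.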

Implementation Challenges – The Deep Technical Bits

Battle: Async vs Threads for HTTP Requests

I experimented extensively with asyncio before settling on threads. Async is theoretically more efficient for I/O-bound work, but in practice site behavior varied wildly: some sites blocked the requests from my async client, and others enforced different rate limits. Thread pools gave me more predictable behavior and easier debugging.

Challenge: Maintaining 80+ Site Selectors

Keeping up with website HTML/CSS changes requires constant maintenance. My solution was to implement a flexible selector system:

SITE_CONFIGS = {
    'github': {
        'url': 'https://github.com/{}',
        'error_type': 'status_code',
        'error_code': 404,
        'success_check': lambda r: 'Contributions' in r.text,
        'title_check': True
    },
    'instagram': {
        'url': 'https://instagram.com/{}',
        'error_type': 'text_check',
        'error_text': 'Sorry, this page',
        'success_check': lambda r: '"userId"' in r.text
    }
}

This configuration-driven approach makes adding new sites a matter of copying a template rather than rewriting logic.
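
To make the idea concrete, here is a minimal sketch of how one config entry might be applied to a response using only the standard library. The lambdas above read `r.text`, so the sketch assumes a thin wrapper exposing `status_code` and `text`. `SimpleResponse`, `fetch`, and `evaluate_site` are hypothetical names, and `title_check` is left out because its semantics are not described here.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass

@dataclass
class SimpleResponse:
    """Hypothetical stand-in exposing the attributes the configs use."""
    status_code: int
    text: str

def fetch(url, timeout=10):
    """Fetch a URL with urllib and wrap the result (assumed helper)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return SimpleResponse(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is still the signal
        body = err.read().decode("utf-8", errors="replace")
        return SimpleResponse(err.code, body)

def evaluate_site(config, response):
    """Apply one SITE_CONFIGS-style entry to a response."""
    if config["error_type"] == "status_code":
        if response.status_code == config["error_code"]:
            return "not_found"
    elif config["error_type"] == "text_check":
        if config["error_text"] in response.text:
            return "not_found"
    # Apply the positive check when one is configured
    check = config.get("success_check")
    if check is not None and not check(response):
        return "not_found"
    return "found"
```

With this split, a site check becomes `evaluate_site(cfg, fetch(cfg['url'].format(username)))`, and the decision logic can be tested against canned responses without touching the network.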

Trick: Phone Number Intelligence Without External APIs

Building a comprehensive phone number intelligence feature without relying on external APIs required embedding the ITU's list of E.164 country code assignments. I parsed the official ITU data and converted it into a compact trie-like structure, keyed by dialing prefix, that lives entirely in memory:

# Example of the embedded phone database structure
COUNTRY_DATA = {
    '1': {'US': {'area_codes': {'212': 'New York', '213': 'Los Angeles'}}},
    '44': {'GB': {'format': '07XXX XXX XXX'}},
    '91': {'IN': {'carriers': {'98': 'Airtel', '99': 'Vodafone'}}}
}
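
A lookup against a structure like this can be sketched as a longest-prefix match. E.164 country codes are one to three digits long, so trying the longest candidate first is enough. The function names below are hypothetical, and the table is a trimmed copy of the example above rather than the real embedded data.

```python
# Trimmed copy of the structure shown above, for illustration only
COUNTRY_DATA = {
    '1': {'US': {'area_codes': {'212': 'New York', '213': 'Los Angeles'}}},
    '44': {'GB': {'format': '07XXX XXX XXX'}},
    '91': {'IN': {'carriers': {'98': 'Airtel', '99': 'Vodafone'}}},
}

def match_country_code(number, table=COUNTRY_DATA):
    """Split an E.164 number into (country_code, national_number).

    Returns None when no known prefix matches.
    """
    digits = ''.join(ch for ch in number if ch.isdigit())
    # E.164 country codes are one to three digits; try the longest first
    for length in (3, 2, 1):
        prefix = digits[:length]
        if prefix in table:
            return prefix, digits[length:]
    return None

def describe(number, table=COUNTRY_DATA):
    """Return the country and, if listed, the area for a number."""
    match = match_country_code(number, table)
    if match is None:
        return None
    code, national = match
    # Assumption: take the first country listed under a shared code
    country, details = next(iter(table[code].items()))
    area = details.get('area_codes', {}).get(national[:3])
    return {'country': country, 'area': area}
```

For example, `describe('+12125551234')` resolves the `1` prefix and then the `212` area code from the table above.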

Getting Started – Real World Usage Examples

Let's walk through some practical scenarios:

Investigating a Suspicious Email

python recon.py -e suspicious@example.com -o email_report.json

# Sample output shows:
# ✓ MX record found: Google Workspace
# ✓ SPF record present (v=spf1 include:_spf.google.com ~all)
# ✓ Valid Gravatar profile
# ✗ Not found in HaveIBeenPwned breach database
# → Suggested next steps: Check HIBP directly for comprehensive breach search
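
A Gravatar check like the one in the sample output needs no API key. A minimal sketch, assuming the check requests the public avatar URL, which Gravatar derives from a hash of the trimmed, lowercased address; `gravatar_url` is a hypothetical helper name, and this is an illustration rather than recon.py's actual code.

```python
import hashlib

def gravatar_url(email):
    """Build the avatar URL Gravatar derives from an email address."""
    # Gravatar hashes the trimmed, lowercased address
    normalized = email.strip().lower().encode("utf-8")
    digest = hashlib.md5(normalized).hexdigest()
    # d=404 asks Gravatar to answer HTTP 404 when no avatar exists
    return f"https://www.gravatar.com/avatar/{digest}?d=404"
```

Requesting that URL and checking for a 200 versus a 404 status is then enough to tell whether an avatar is registered for the address.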

Domain Reconnaissance Before Pentest Engagement

python recon.py -d target-company.com

# Finds:
# • 15 subdomains via crt.sh certificate transparency
# • WordPress technology stack detection
# • Cloudflare protection layer identified
# • Expired SSL certificate in staging environment
# • Interesting directories from robots.txt analysis
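
The certificate transparency step can be sketched as follows, assuming the tool queries crt.sh with a URL of the form `https://crt.sh/?q=%25.target-company.com&output=json` and receives a JSON array whose entries carry a `name_value` field. `extract_subdomains` is a hypothetical name, and the sketch covers only the parsing, not the HTTP request.

```python
import json

def extract_subdomains(crtsh_json, domain):
    """Collect unique subdomains from a crt.sh JSON response body."""
    suffix = "." + domain.lower()
    found = set()
    for entry in json.loads(crtsh_json):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Wildcard entries such as *.example.com name no single host
            if name.startswith("*."):
                name = name[2:]
            # Keep only real subdomains, not the apex or lookalike domains
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)
```

Deduplicating matters here because the same hostname typically appears on many certificates over time.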

Username Hunting Across Platforms

python recon.py -u johnsmith -v

# Concurrent scan reveals:
# Found on: GitHub, LinkedIn, Twitter, Instagram, Steam
# Not found on: Facebook, Reddit, YouTube
# Interesting: Username available on TikTok and Threads

Pro tip: Use the verbose flag (-v) during investigations but omit it for automated scripting. The JSON output (-o flag) feeds beautifully into downstream analysis tools.

Phone Number Investigation

python recon.py -p +12125551234

# Output includes:
# Country: United States (US)
# Region: New York
# Area Code: 212 → New York City
# Carrier: Likely landline (based on prefix analysis)
# Generated investigation links for Truecaller, Whitepages, etc.

Future Roadmap – Where recon is Heading

The project is far from complete. Here's what I'm actively working on:

Rate Limiting Intelligence - Adding automatic backoff and retry logic that adapts to each site's rate limit patterns
Proxy Rotation Support - Built-in Tor integration and proxy pool management for large-scale reconnaissance
Cache Layer - SQLite-based result caching to avoid redundant lookups during extended investigations
Additional Modules - Cryptocurrency address lookups and dark web forum monitoring are in early testing
Web UI - A Flask-based web interface for teams who prefer visual reconnaissance workflows
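
As an illustration of the cache layer item, here is a rough sketch of what a SQLite-backed result cache could look like using only the standard library. The class name, table layout, and time-to-live handling are all assumptions made for the example, not a description of the planned design.

```python
import json
import sqlite3
import time

class ResultCache:
    """Hypothetical SQLite-backed cache keyed by module and target."""

    def __init__(self, path=":memory:", ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "module TEXT, target TEXT, stored_at REAL, payload TEXT, "
            "PRIMARY KEY (module, target))"
        )

    def put(self, module, target, result, now=None):
        stored_at = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
                (module, target, stored_at, json.dumps(result)),
            )

    def get(self, module, target, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT stored_at, payload FROM results "
            "WHERE module = ? AND target = ?",
            (module, target),
        ).fetchone()
        if row is None or now - row[0] > self.ttl:
            return None  # missing or expired
        return json.loads(row[1])
```

A module would call `get` before doing a lookup and `put` afterwards, so repeated queries during a long investigation hit the local file instead of the network.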

The goal remains the same: become the definitive single-file reconnaissance toolkit that security professionals can trust and rely on without the overhead of complex dependency chains.

Community Contributions Welcome

If you've discovered new sites that should be added to the username scanner, found edge cases in phone number parsing, or have ideas for additional OSINT modules, I'd love to hear from you. The project lives on GitHub under the MIT license – feel free to open issues, submit pull requests, or just star the repository if you find it useful.

You can reach me at aswinmathew.xyz or connect on LinkedIn. Happy hunting!