Why I Built recon – Solving the OSINT Fragmentation Problem
As someone who spends a significant amount of time on defensive security research and penetration testing, I grew tired of juggling multiple OSINT tools, each with its own dependencies, syntax, and output format. One day I'd be using theHarvester for domain reconnaissance, the next I'd switch to sherlock for username hunting, and then I'd pull out separate WHOIS and DNS tools. The context switching was killing my workflow efficiency.
I wanted something that could do everything from a single Python file – a Swiss Army knife that lived in my Kali Linux arsenal without requiring virtual environments or dependency management. More importantly, I wanted it to be completely self-contained. No pip installs, no external APIs (except for the ones I'd have to call anyway), just pure Python standard library magic.
The real pain point was repeatability. Every client engagement meant rebuilding my toolset, fighting with broken dependencies, and spending more time on setup than actual reconnaissance.
Technical Architecture – Designing for Zero Dependencies
The entire recon tool lives in a single recon.py file that clocks in at around 3,000 lines. Here's the architecture that makes it work:
Concurrent Execution Model
Rather than sequential lookups that waste time waiting for slow HTTP responses, I implemented a thread pool using concurrent.futures.ThreadPoolExecutor. Each module spins up its own executor with configurable workers:
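```python
# Simplified version of the concurrent execution pattern
from concurrent.futures import ThreadPoolExecutor, as_completed

def scan_usernames(username, sites):
    # check_site(username, site) is the per-site probe (not shown here);
    # it returns a dict with at least a 'status' key.
    results = {'found': [], 'not_found': [], 'error': []}
    with ThreadPoolExecutor(max_workers=20) as executor:
        # Map each future back to the site it is checking
        futures = {executor.submit(check_site, username, site): site
                   for site in sites}
        # Handle results in completion order, so slow sites don't block fast ones
        for future in as_completed(futures):
            site = futures[future]
            try:
                result = future.result()
                if result['status'] == 'found':
                    results['found'].append(result)
                else:
                    results['not_found'].append(site)
            except Exception as e:
                # One failing site must not abort the whole scan
                results['error'].append({site: str(e)})
    return results
```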
This design allows checking 80+ social media sites in under 30 seconds instead of several minutes.
Raw Socket Implementations
For critical protocols where external libraries would be overkill, I implemented raw socket connections. The WHOIS module connects directly to port 43 and handles referral chasing:
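```python
import socket

def whois_lookup(domain, timeout=10):
    tld = domain.rsplit('.', 1)[-1].lower()
    server = get_whois_server(tld)  # Built-in TLD routing table
    # create_connection resolves the host and applies a timeout so a silent
    # server cannot hang the scan; the with-block always closes the socket
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        # WHOIS query: the domain followed by CRLF; sendall sends every byte
        sock.sendall(f"{domain}\r\n".encode())
        response = b""
        # The server closes the connection once the full answer is sent
        while True:
            data = sock.recv(4096)
            if not data:
                break
            response += data
    # Referral chasing is omitted from this simplified version;
    # errors='replace' tolerates registries that reply in non-UTF-8 encodings
    return parse_whois_response(response.decode('utf-8', errors='replace'))
```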
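The snippet above leaves out both the TLD routing table and the referral chasing. Here is a minimal stdlib-only sketch of how the two could fit together. The `WHOIS_SERVERS` entries, the fallback to `whois.iana.org`, and the helper names `find_referral` and `whois_with_referrals` are assumptions for illustration, not recon's actual code. The sketch relies on two common conventions: IANA answers with a `refer:` line naming a TLD's server, and thin registries such as .com include a `Registrar WHOIS Server:` line.

```python
import re

# Hypothetical routing table; a real tool would ship an entry per TLD.
WHOIS_SERVERS = {
    'com': 'whois.verisign-grs.com',
    'net': 'whois.verisign-grs.com',
}
IANA_WHOIS = 'whois.iana.org'  # replies with a "refer:" line for a TLD's server

# "refer:" (IANA) and "Registrar WHOIS Server:" (thin registries such as .com)
# both point at a more specific server. [ \t] keeps each match on a single line.
_REFERRAL_RE = re.compile(
    r'^[ \t]*(?:refer|Registrar WHOIS Server)[ \t]*:[ \t]*(\S+)[ \t\r]*$',
    re.IGNORECASE | re.MULTILINE,
)

def get_whois_server(tld):
    """Route a TLD to its WHOIS server, falling back to IANA for unknown TLDs."""
    return WHOIS_SERVERS.get(tld.lower(), IANA_WHOIS)

def find_referral(response_text, current_server):
    """Return the next server to query, or None when there is no new referral."""
    for match in _REFERRAL_RE.finditer(response_text):
        # Some registries write the server as a URL; keep only the hostname
        server = re.sub(r'^(?:https?|whois)://', '', match.group(1).lower()).rstrip('/')
        if server and server != current_server.lower():
            return server
    return None

def whois_with_referrals(domain, query, max_hops=3):
    """Follow referrals; query(server, domain) -> str does the port-43 exchange."""
    server = get_whois_server(domain.rsplit('.', 1)[-1])
    responses = []
    for _ in range(max_hops):
        text = query(server, domain)
        responses.append((server, text))
        next_server = find_referral(text, server)
        if next_server is None:
            break
        server = next_server
    return responses
```

Injecting `query` keeps the socket code separate from the referral logic, which makes the chasing testable without network access. The `max_hops` cap also stops a registry that refers back and forth from looping forever.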
Implementation Challenges – The Deep Technical Bits
Battle: Async vs Threads for HTTP Requests
I experimented extensively with asyncio before settling on threads. While async is theoretically more efficient for I/O-bound tasks, in practice site blocking patterns varied wildly: some sites blocked the default user agents of async HTTP clients, and others enforced their own rate limits. Thread pools provided more predictable behavior and easier debugging.
Challenge: Maintaining 80+ Site Selectors
Keeping up with website HTML/CSS changes requires constant maintenance. My solution was to implement a flexible selector system:
```python
SITE_CONFIGS = {
    'github': {
        'url': 'https://github.com/{}',
        'error_type': 'status_code',
        'error_code': 404,
        'success_check': lambda r: 'Contributions' in r.text,
        'title_check': True
    },
    'instagram': {
        'url': 'https://instagram.com/{}',
        'error_type': 'text_check',
        'error_text': 'Sorry, this page',
        'success_check': lambda r: '"userId"' in r.text
    }
}
```
This configuration-driven approach makes adding new sites a matter of copying a template rather than rewriting logic.
Trick: Phone Number Intelligence Without External APIs
Building a comprehensive phone number intelligence feature without relying on external APIs required embedding the complete ITU E.164 database. I parsed the official ITU data and converted it into a compact trie structure that lives entirely in memory:
```python
# Example of the embedded phone database structure
COUNTRY_DATA = {
    '1': {'US': {'area_codes': {'212': 'New York', '213': 'Los Angeles'}}},
    '44': {'GB': {'format': '07XXX XXX XXX'}},
    '91': {'IN': {'carriers': {'98': 'Airtel', '99': 'Vodafone'}}}
}
```
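The example dictionary is keyed by whole country codes. A digit trie lets a lookup find the code without knowing whether it is one, two, or three digits long. E.164 country calling codes are designed so that no code is a prefix of another, which means the first terminal node reached is the answer. The following sketch assumes the nested shape shown above; `build_trie`, `longest_prefix`, and `lookup` are hypothetical names. It also assumes one ISO code per calling code, which glosses over shared codes such as `1` (used by the US, Canada, and other NANP countries).

```python
# Illustrative entries in the same shape as COUNTRY_DATA above
COUNTRY_DATA = {
    '1': {'US': {'area_codes': {'212': 'New York', '213': 'Los Angeles'}}},
    '44': {'GB': {'format': '07XXX XXX XXX'}},
    '91': {'IN': {'carriers': {'98': 'Airtel', '99': 'Vodafone'}}},
}

_END = '$'  # key marking a node where a complete country code ends

def build_trie(country_data):
    """Insert each country calling code into nested dicts, one digit per level."""
    root = {}
    for code, info in country_data.items():
        node = root
        for digit in code:
            node = node.setdefault(digit, {})
        node[_END] = (code, info)
    return root

def longest_prefix(digits, table):
    """Value for the longest key in table that prefixes digits, or None."""
    for length in range(len(digits), 0, -1):
        if digits[:length] in table:
            return table[digits[:length]]
    return None

def lookup(number, trie):
    """Resolve a number such as '+12125551234' against the trie."""
    digits = ''.join(ch for ch in number if ch.isdigit())
    node = trie
    for i, digit in enumerate(digits):
        node = node.get(digit)
        if node is None:
            return None  # no country code matches
        if _END in node:
            # Country codes are prefix-free, so the first terminal node is the match
            code, info = node[_END]
            iso, details = next(iter(info.items()))  # one ISO code per entry here
            national = digits[i + 1:]
            return {
                'country_code': code,
                'iso': iso,
                'national_number': national,
                'region': longest_prefix(national, details.get('area_codes', {})),
                'carrier': longest_prefix(national, details.get('carriers', {})),
            }
    return None
```

For example, `lookup('+12125551234', build_trie(COUNTRY_DATA))` walks a single digit to reach `1`, then matches `212` in the area-code table.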
Getting Started – Real World Usage Examples
Let's walk through some practical scenarios:
Investigating a Suspicious Email
```bash
python recon.py -e suspicious@example.com -o email_report.json
# Sample output shows:
# ✓ MX record found: Google Workspace
# ✓ SPF record present (v=spf1 include:_spf.google.com ~all)
# ✓ Valid Gravatar profile
# ✗ Not found in HaveIBeenPwned breach database
# → Suggested next steps: Check HIBP directly for comprehensive breach search
```
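Three of those checks come down to string and hash handling once the DNS records are in hand. The DNS queries themselves are not covered here, because Python's standard library has no MX/TXT resolver. The sketch below assumes the MX hostnames and TXT strings have already been fetched. The provider suffix table is an assumed heuristic, and the helper names are hypothetical. For Gravatar, my understanding of the documented scheme is to hash the trimmed, lower-cased address (SHA-256 today, MD5 in older integrations) and request the avatar with `d=404`, so a missing profile returns HTTP 404 instead of a default image.

```python
import hashlib

# Assumed hostname-suffix heuristics for well-known mail providers
MX_PROVIDERS = {
    'google.com': 'Google Workspace',
    'googlemail.com': 'Google Workspace',
    'protection.outlook.com': 'Microsoft 365',
}

def classify_mx(mx_host):
    """Map an MX hostname to a provider name, or None if unrecognized."""
    host = mx_host.rstrip('.').lower()
    for suffix, provider in MX_PROVIDERS.items():
        if host == suffix or host.endswith('.' + suffix):
            return provider
    return None

def parse_spf(txt):
    """Split an SPF TXT record into (qualifier, term) pairs, or None if not SPF."""
    terms = txt.split()
    if not terms or terms[0].lower() != 'v=spf1':
        return None
    parsed = []
    for term in terms[1:]:
        if term[0] in '+-~?':
            parsed.append((term[0], term[1:]))
        else:
            # '+' (pass) is SPF's default qualifier; modifiers such as
            # redirect= are passed through with '+' for simplicity
            parsed.append(('+', term))
    return parsed

def gravatar_url(email, algorithm='sha256'):
    """Avatar URL that answers 404 (via d=404) when no Gravatar exists."""
    digest = hashlib.new(algorithm, email.strip().lower().encode()).hexdigest()
    return f'https://www.gravatar.com/avatar/{digest}?d=404'
```

A trailing `~all` (softfail) versus `-all` (fail) is the detail worth surfacing in a report, since it shows how strictly the domain rejects spoofed mail.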
Domain Reconnaissance Before Pentest Engagement
```bash
python recon.py -d target-company.com
# Finds:
# • 15 subdomains via crt.sh certificate transparency
# • WordPress technology stack detection
# • Cloudflare protection layer identified
# • Expired SSL certificate in staging environment
# • Interesting directories from robots.txt analysis
```
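Two of these findings are straightforward to reproduce with the standard library alone. crt.sh accepts `%` as a wildcard and returns JSON when `output=json` is set. Each entry's `name_value` field can hold several certificate names separated by newlines. The sketch below covers only the parsing side: the HTTP fetch is left out, and the helper names are hypothetical rather than recon's own. `str.removeprefix` needs Python 3.9 or newer.

```python
import json
import urllib.parse

def crtsh_url(domain):
    # '%' is crt.sh's wildcard; urlencode escapes it as %25
    return 'https://crt.sh/?' + urllib.parse.urlencode({'q': f'%.{domain}', 'output': 'json'})

def subdomains_from_crtsh(json_text, domain):
    """Collect unique hostnames at or under domain from crt.sh JSON results."""
    domain = domain.lower()
    found = set()
    for entry in json.loads(json_text):
        # name_value may contain several names separated by newlines
        for name in entry.get('name_value', '').splitlines():
            name = name.strip().lower().removeprefix('*.')
            if name == domain or name.endswith('.' + domain):
                found.add(name)
    return sorted(found)

def robots_paths(robots_txt):
    """Paths named in Disallow/Allow lines, in file order, without duplicates."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments
        field, sep, value = line.partition(':')
        if sep and field.strip().lower() in ('disallow', 'allow'):
            value = value.strip()
            if value and value not in paths:
                paths.append(value)
    return paths
```

The suffix check matters. A name like `target-company.com.evil.net` contains the target string but is not a subdomain of it, so a plain substring test would report it by mistake.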
Username Hunting Across Platforms
```bash
python recon.py -u johnsmith -v
# Concurrent scan reveals:
# Found on: GitHub, LinkedIn, Twitter, Instagram, Steam
# Not found on: Facebook, Reddit, YouTube
# Interesting: Username available on TikTok and Threads
```
Pro tip: Use the verbose flag (-v) during investigations but omit it for automated scripting. The JSON output (-o flag) feeds beautifully into downstream analysis tools.
Phone Number Investigation
```bash
python recon.py -p +12125551234
# Output includes:
# Country: United States (US)
# Region: New York
# Area Code: 212 → New York City
# Carrier: Likely landline (based on prefix analysis)
# Generated investigation links for Truecaller, Whitepages, etc.
```
Future Roadmap – Where recon is Heading
The project is far from complete. Here's what I'm actively working on:
- Rate Limiting Intelligence - Adding automatic backoff and retry logic that adapts to each site's rate limit patterns
- Proxy Rotation Support - Built-in Tor integration and proxy pool management for large-scale reconnaissance
- Cache Layer - SQLite-based result caching to avoid redundant lookups during extended investigations
- Additional Modules - Cryptocurrency address lookups and dark web forum monitoring are in early testing
- Web UI - A Flask-based web interface for teams who prefer visual reconnaissance workflows
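The cache layer fits the zero-dependency goal well, because `sqlite3` ships with Python. Here is a rough sketch of one possible design, assuming results are JSON-serializable and keyed by module name plus query string. The `ResultCache` class, its table layout, the 24-hour default TTL, and `get_or_run` are all hypothetical.

```python
import json
import sqlite3
import time

class ResultCache:
    """Hypothetical SQLite-backed cache keyed by (module, query) with a TTL."""

    def __init__(self, path=':memory:', ttl_seconds=24 * 3600):
        self.conn = sqlite3.connect(path)
        self.ttl = ttl_seconds
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS results ('
            ' module TEXT NOT NULL, query TEXT NOT NULL,'
            ' fetched_at REAL NOT NULL, payload TEXT NOT NULL,'
            ' PRIMARY KEY (module, query))'
        )

    def get(self, module, query, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            'SELECT fetched_at, payload FROM results WHERE module = ? AND query = ?',
            (module, query),
        ).fetchone()
        if row is None or now - row[0] > self.ttl:
            return None  # missing or stale: the caller should re-run the lookup
        return json.loads(row[1])

    def put(self, module, query, result, now=None):
        now = time.time() if now is None else now
        with self.conn:  # commits the transaction on success
            self.conn.execute(
                'INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)',
                (module, query, now, json.dumps(result)),
            )

    def get_or_run(self, module, query, func):
        """Return a fresh cached result, or run func(query) and cache its output."""
        cached = self.get(module, query)
        if cached is not None:
            return cached
        result = func(query)
        self.put(module, query, result)
        return result
```

Passing a file path instead of `':memory:'` would let the cache survive across runs of an extended investigation. The optional `now` argument exists only so that expiry can be tested deterministically.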
The goal remains the same: become the definitive single-file reconnaissance toolkit that security professionals can trust and rely on without the overhead of complex dependency chains.
Community Contributions Welcome
If you've discovered new sites that should be added to the username scanner, found edge cases in phone number parsing, or have ideas for additional OSINT modules, I'd love to hear from you. The project lives on GitHub under the MIT license – feel free to open issues, submit pull requests, or just star the repository if you find it useful.
You can reach me at aswinmathew.xyz or connect on LinkedIn. Happy hunting!