Ethical Note: OSINT uses publicly available information only.
Do not hack, bypass logins, or invade privacy.
1. Introduction
OSINT stands for Open Source Intelligence.
It refers to collecting and analyzing information from publicly available sources such as
websites, social media, news, forums, and public records.
OSINT is widely used in:
- Cybersecurity
- Law enforcement
- Journalism
- Business intelligence
- Ethical hacking
The collected information helps in analysis, investigation, and decision-making.
2. What is Open Source Information?
Open source information is data that is:
- Legally available to the public
- Free or accessible without hacking
- Shared openly on the internet or in public databases
Examples:
- Social media profiles
- Websites and blogs
- Public government records
- Online maps
- News articles
3. Importance of OSINT
OSINT helps to:
- Identify threats and risks
- Support investigations related to cyber attacks
- Find digital footprints
- Verify identities
- Collect evidence legally
- Improve security awareness
4. Types of OSINT Sources
🌐 Internet Sources
- Google Search
- Websites
- Forums
- Blogs
- Online news
📱 Social Media
- Facebook
- Instagram
- LinkedIn
- Twitter (X)
🗂 Public Records
- Company registrations
- Domain records (WHOIS)
- Court records
- Government databases
🗺 Geographical Data
- Google Maps
- Satellite images
- Location tagging
5. OSINT Tools (Overview)
| Tool Name | Purpose |
| Google Dorking | Advanced search techniques |
| Maltego | Link analysis / relationship mapping |
| Shodan | Discover internet-connected devices |
| Have I Been Pwned | Check data breaches |
| WHOIS | Domain registration information |
| Recon-ng | Reconnaissance framework |
6. OSINT Process
- Define Objective – Decide what information is needed
- Collect Data – Gather data from public sources
- Filter & Verify – Remove false or irrelevant information
- Analyze Data – Identify patterns and insights
- Report – Document findings clearly
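The five steps above can be sketched as a small pipeline. This is a minimal illustration, not a real collection tool: the `Finding` class, the function names, and the sample data are all hypothetical, and "verified" is simply a flag set by hand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str      # public source the item came from
    value: str       # the observed data point
    verified: bool   # confirmed against a second, independent source?


def filter_and_verify(findings):
    """Drop unverified items and duplicate values, keeping first-seen order."""
    seen = set()
    kept = []
    for finding in findings:
        key = finding.value.lower()
        if finding.verified and key not in seen:
            seen.add(key)
            kept.append(finding)
    return kept


def build_report(objective, findings):
    """Analyze (count findings per source) and report (plain-text summary)."""
    verified = filter_and_verify(findings)
    per_source = Counter(f.source for f in verified)
    lines = [f"Objective: {objective}", f"Verified findings: {len(verified)}"]
    lines += [f"- {source}: {count}" for source, count in sorted(per_source.items())]
    return "\n".join(lines)


# Hypothetical, hand-written sample data (not real collection output).
SAMPLE = [
    Finding("company website", "info@example.com", True),
    Finding("forum post", "info@example.com", True),   # duplicate value
    Finding("forum post", "ceo@example.com", False),   # unverified, dropped
    Finding("WHOIS", "ns1.example.com", True),
]

print(build_report("Map public contact points for example.com", SAMPLE))
```

Keeping the filter step separate from the report step makes it easy to see which findings were discarded and why.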
7. Legal and Ethical Considerations
- Use only publicly available information
- Do not hack, bypass access controls, or invade privacy
- Follow your country’s laws and cyber regulations
- Use information responsibly and professionally
8. Advantages and Disadvantages
✅ Advantages
- Free and legal (when used correctly)
- Large amount of available data
- Easy to access
- Useful for research and security
❌ Disadvantages
- Fake or misleading information is possible
- Time-consuming
- Information overload
- Privacy concerns (must be handled ethically)
9. Conclusion
OSINT is a powerful method to collect intelligence using open sources.
When used ethically and legally, it improves cybersecurity, investigations,
and research quality.
OSINT Tools with Usage
1) Google Dorking
Google Dorking (also known as “Google Hacking”) uses advanced Google search operators
to locate information that may not be easy to find through normal searches.
Security professionals use it to identify exposed files, misconfigured servers, and potential vulnerabilities.
Important: Using search operators to view publicly indexed content is generally legal.
Using results to break into systems or access private data is illegal.
Google Dorking Operators
| Operator | Description | Example |
| allintext | Find pages containing all keywords in the page text | allintext:"keyword" |
| intext | Find pages containing a keyword in the page text | intext:"keyword" |
| inurl | Find pages with a keyword in the URL | inurl:"admin" |
| allinurl | Find pages where all keywords appear in the URL | allinurl:admin login |
| allintitle | Find pages where all keywords appear in the title | allintitle:"index of" |
| site | Limit results to a specific domain/website | site:example.com |
| filetype | Find results of a specific file type | filetype:pdf |
| link | Find pages that link to a given URL (limited use today) | link:example.com |
| numrange | Find results containing numbers within a range | numrange:321-325 |
| before / after | Search within a date range (often used with other operators) | before:2024-01-01 after:2023-01-01 |
| inanchor | Find pages with keywords in anchor text (links) | inanchor:"keyword" |
| allinanchor | All keywords must be in anchor text | allinanchor:"keyword" |
| inpostauthor | Blog search operator for author (works where supported) | inpostauthor:"name" |
| allinpostauthor | Blog search operator for author (all terms) | allinpostauthor:"name" |
| related | Find websites similar to a given website | related:example.com |
| cache | Show Google’s cached version of a page (retired by Google in 2024) | cache:example.com |
Combining Operators
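Operators can be combined in one query to narrow the results. Common building blocks:
- site: limits results to a domain (example: site:example.com)
- filetype: or ext: finds specific file extensions (example: filetype:pdf)
- inurl: finds keywords in the URL (example: inurl:admin)
- intitle: finds keywords in the title (example: intitle:"index of")
- intext: finds keywords in page text (example: intext:password)
- cache: shows the cached version of a page (retired by Google in 2024)
- A leading minus sign (-) excludes terms or sites (example: -site:youtube.com)
Combined example: site:example.com filetype:pdf -inurl:draft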
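As a rough sketch of how these operators combine, the helper below joins them into a single query string and URL-encodes it. The function names `build_dork` and `search_url` are hypothetical, and the search URL pattern is assumed; use this only for queries against domains you are allowed to audit.

```python
from urllib.parse import urlencode


def build_dork(site=None, filetype=None, inurl=None, intitle=None,
               exclude_sites=(), terms=()):
    """Join search operators into one query string (illustrative helper)."""
    def quote(value):
        # Multi-word values are quoted so they stay attached to the operator.
        return f'"{value}"' if " " in value else value

    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{quote(inurl)}")
    if intitle:
        parts.append(f"intitle:{quote(intitle)}")
    parts.extend(f"-site:{excluded}" for excluded in exclude_sites)
    parts.extend(quote(term) for term in terms)
    return " ".join(parts)


def search_url(query):
    """Return a search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork(site="example.com", filetype="pdf")
print(query)
print(search_url(query))
```

Building the query from named parts avoids quoting mistakes when a value contains spaces, such as "index of".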
Common Use Cases (Educational)
- Finding exposed login pages: inurl:admin login or inurl:login.php
- Finding sensitive files (for awareness/auditing): filetype:env "DB_PASSWORD"
- Open directories: intitle:"index of" "backup"
- WordPress plugin paths (to spot outdated or vulnerable plugins): inurl:/wp-content/plugins/
- Credential leaks (awareness): filetype:txt inurl:password
Protecting Against Google Dorking
- Use robots.txt to limit crawling of sensitive paths (it is advisory and publicly readable, so it is not an access control)
- Add noindex tags to pages you don’t want indexed
- Disable directory listing on Apache/Nginx
- Audit your own domain with safe dork queries
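One way to sanity-check a robots.txt policy before publishing it is Python's standard `urllib.robotparser`. The sketch below parses a made-up policy from a string, so no network request is involved; the sample rules, the paths, and the `crawlable_paths` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for a site you administer.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
"""


def crawlable_paths(robots_txt, paths, base="https://example.com", agent="*"):
    """Map each path to True if a compliant crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, base + path) for path in paths}


result = crawlable_paths(
    SAMPLE_ROBOTS_TXT,
    ["/admin/login", "/blog/post", "/backup/db.sql"],
)
for path, allowed in result.items():
    print(f"{path}: {'crawlable' if allowed else 'blocked for compliant crawlers'}")
```

A "blocked" result only means well-behaved crawlers are asked to stay away. The files remain reachable by anyone who requests them directly, so sensitive content still needs authentication or removal.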
How to Use (Simple Steps)
- Open Google
- Type a search operator query (example: site:example.com filetype:pdf)
- Analyze the search results
2) Shodan.io
Shodan is a search engine that discovers internet-connected devices such as routers, servers,
webcams, and industrial systems by collecting public “banner” information (open ports, services, versions, and headers).
What it does
- Device discovery: Finds IoT and internet-connected systems
- Metadata indexing: Collects banners, HTTP headers, SSL certificate data, etc.
- Exposure detection: Helps identify misconfigurations or outdated services
- Advanced filtering: Search by country, org, port, service, and more
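To make the idea of a "banner" concrete, the sketch below parses a made-up HTTP response header block into a status line and header fields, which is the kind of metadata described above. It is not Shodan's code or data format; the sample banner and the `parse_http_banner` name are hypothetical.

```python
# Hypothetical banner text, similar to what a web server sends back.
SAMPLE_BANNER = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx/1.18.0\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
    "Content-Type: text/html\r\n"
)


def parse_http_banner(banner):
    """Split a raw HTTP header block into (status_line, headers dict)."""
    lines = banner.strip().splitlines()
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a "name: value" shape
            headers[name.strip().lower()] = value.strip()
    return status_line, headers


status, headers = parse_http_banner(SAMPLE_BANNER)
print(status)
print("Server software:", headers.get("server", "unknown"))
```

A version string such as the `Server` header is what allows outdated services to be spotted from public data alone.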
How it’s used
- Cybersecurity: recon, asset discovery, threat intelligence
- Research: internet trends, device exposure, C2 tracking
- Ethical hacking: identify exposure (only with permission)
How to use
- Open shodan.io
- Create a free account (optional but helpful)
- Search for a keyword (example: router, webcam) or an IP address
- View open ports, location, and device/service details
3) Maltego
Maltego is an investigation platform used to mine, merge, and map data for OSINT and cyber investigations.
It helps reveal connections between people, organizations, domains, and digital footprints.
Core functionality
- Visualization: Node-based graph of relationships
- Transforms: Built-in queries/plugins to pull data from sources
- Data sources: Many connectors and datasets (some require accounts/keys)
Common use cases
- Cybersecurity: map attack surface, threat actor research
- Law enforcement: criminal network investigation
- Fraud & compliance: KYC support and relationship mapping
Versions
- Community Edition (CE): Free with limitations
- Professional/Enterprise: Paid plans with more features
How to use
- Download and install Maltego
- Open the software
- Enter a target (domain / email / name)
- Right-click → Run Transform
- Review results in graph format
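The node-and-link idea behind this kind of graph can be illustrated with a small breadth-first walk. This is not Maltego's API: the `LINKS` table stands in for transform results and is entirely hand-written, and `expand` is a hypothetical helper.

```python
from collections import deque

# Hypothetical relationships standing in for transform results.
LINKS = {
    "example.com": ["admin@example.com", "ns1.example.net"],
    "admin@example.com": ["Alice Example"],
    "ns1.example.net": ["example.org"],
}


def expand(start, links, max_depth=2):
    """Breadth-first walk; returns each reachable entity with its hop count."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if depths[entity] >= max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in links.get(entity, []):
            if neighbour not in depths:
                depths[neighbour] = depths[entity] + 1
                queue.append(neighbour)
    return depths


for entity, depth in expand("example.com", LINKS).items():
    print("  " * depth + entity)
```

Limiting the depth keeps the graph readable, since each additional hop tends to pull in entities that are less clearly related to the original target.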
4) Have I Been Pwned
Have I Been Pwned (HIBP) is a trusted service created by security researcher Troy Hunt
that helps users check whether their email, phone number, or passwords have appeared in known data breaches.
Key features
- Email breach search
- Pwned Passwords (checks if a password appeared in breaches)
- Domain breach monitoring (useful for organizations)
- Notifications/alerts for future breaches
How to use
- Open haveibeenpwned.com
- Enter an email address
- Click search/check
- Review breach details and affected data types
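The Pwned Passwords check can be done without sending the password itself. A minimal sketch, assuming the range-style lookup in which only the first five characters of the SHA-1 hash are sent and the response lists `SUFFIX:COUNT` lines: the code below runs offline, the response text in the demo is made up, and the helper names are hypothetical.

```python
import hashlib

RANGE_URL = "https://api.pwnedpasswords.com/range/"  # prefix is appended


def sha1_prefix_suffix(password):
    """Return the (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix, response_text):
    """Find our suffix in a 'SUFFIX:COUNT' response; 0 means not listed."""
    for line in response_text.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = sha1_prefix_suffix("password")
print("Would request:", RANGE_URL + prefix)

# Made-up response body for demonstration; real counts will differ.
fake_response = f"0018A45C4D1DEF81644B54AB7F969B88D65:3\n{suffix}:42\n"
print("Times seen (fake data):", count_in_range_response(suffix, fake_response))
```

Because only the hash prefix leaves the machine, the service cannot tell which of the many matching passwords was being checked.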
5) VirusTotal
VirusTotal is an online service (owned by Google) that analyzes files, URLs, domains, and IP addresses
using many antivirus engines and threat feeds. It is commonly used to validate suspicious links and files.
Key features
- Multi-engine scanning (files/URLs)
- Detailed reports with vendors’ detections
- Domain/IP analysis and community intelligence
- API access (public API has rate limits)
How to use
- Open virustotal.com
- Upload a file OR paste a URL
- Click scan/analyze
- Review detections and details
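When a file may contain sensitive data, a common alternative to uploading it is to compute its SHA-256 hash locally and paste the hash into the VirusTotal search box to see whether a report already exists. The sketch below only computes the hash; `sha256_of_stream` is a hypothetical helper name.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files do not fill memory."""
    hasher = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Demo with in-memory bytes; for a real file use open(path, "rb").
sample = io.BytesIO(b"hello")
print(sha256_of_stream(sample))
```

For a file on disk, the same function can be called as `sha256_of_stream(open("suspicious.bin", "rb"))`, where the file name is a placeholder.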
6) Censys
Censys is an internet intelligence platform that scans the internet to discover servers, websites,
IP addresses, open ports, SSL certificates, and exposed services. It only collects publicly visible data.
Purpose
- Identify exposed servers and services
- Analyze domains and IP addresses
- Detect misconfigured systems
- Study internet-wide security trends
How to use (step-by-step)
- Open censys.io
- Create a free account or log in
- Search a domain, IP, or service (example: example.com / 8.8.8.8 / https)
- Review results: open ports, services, certificates, hosting/provider, country
- Open a result for more details (protocols, software versions, TLS info)
Quick demo example
Search a well-known domain (e.g., google.com). The results typically show that large websites use multiple IP addresses,
HTTPS services, and valid SSL/TLS certificates.
7) WHOIS
WHOIS is a public directory that provides information about domain registrations and related records.
When someone registers a domain, details are stored in WHOIS (often privacy-protected).
What WHOIS can show
- Registrar company
- Registration/creation date
- Expiration date
- Name servers
- Domain status
- Sometimes owner details (if not privacy protected)
How to use
Method 1 (Website):
- Open a WHOIS lookup site (example: whois.domaintools.com)
- Enter a domain name
- View registrar, dates, nameservers, and status
Method 2 (Command line):
whois example.com
Note: Many domains use privacy protection, so owner details may be hidden.
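WHOIS output is mostly "Key: Value" text, so the fields listed above can be pulled out with a small parser. This is a sketch under assumptions: field names differ between registries, the sample record is made up, and `run_whois` simply assumes the `whois` command from Method 2 is installed (it is defined but not called here).

```python
import subprocess

# Made-up sample in the common "Key: Value" layout; real output varies.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Registry Expiry Date: 2030-08-13T04:00:00Z
Name Server: A.IANA-SERVERS.NET
Name Server: B.IANA-SERVERS.NET
Domain Status: clientTransferProhibited
"""


def run_whois(domain):
    """Run the system whois client and return its text output."""
    completed = subprocess.run(
        ["whois", domain], capture_output=True, text=True, check=True
    )
    return completed.stdout


def parse_whois(text):
    """Collect 'Key: Value' lines into a dict of lists (keys lowercased)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")):
            continue  # skip blanks, comments, and footer markers
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value:
            record.setdefault(key.strip().lower(), []).append(value)
    return record


record = parse_whois(SAMPLE_WHOIS)
print("Registrar:", record["registrar"][0])
print("Name servers:", ", ".join(record["name server"]))
```

Values are stored as lists because fields such as name servers and domain status usually appear more than once.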
8) Wayback Machine
The Wayback Machine (Internet Archive) stores historical snapshots of websites.
It helps you view older versions of a website—even if the content was later removed.
Why it’s useful in OSINT
- View deleted web pages
- Analyze changes over time
- Investigate scam/fake sites history
- Recover removed content
- Track company history
How to use
- Open archive.org
- Enter a website URL (example: example.com)
- Select a year from the timeline
- Choose a highlighted date
- Browse the archived version
Limitations
- Not every website is archived
- Some pages may not load fully
- Dynamic content may not be captured
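Snapshot lookups can also be scripted. The sketch below builds a request URL for the Internet Archive's availability endpoint and reads the closest snapshot from a JSON response. It runs offline: the response shown is a hand-written example of the expected shape, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def availability_url(site, timestamp=None):
    """Build the lookup URL; timestamp is YYYYMMDD (or longer), optional."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


# Hand-written example response; real values will differ.
SAMPLE_RESPONSE = json.dumps({
    "url": "example.com",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20200101000000/http://example.com/",
            "timestamp": "20200101000000",
        }
    },
})

print(availability_url("example.com", "20200101"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

An empty `archived_snapshots` object corresponds to the first limitation above: the site was never archived, so `closest_snapshot` returns `None`.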
9) theHarvester
theHarvester is an OSINT tool for collecting emails, subdomains, IP addresses, and names from public sources.
It is commonly used during the reconnaissance phase of security assessments.
How to use (command line)
theHarvester -d example.com -b google
Command explanation:
-d → Target domain
-b → Data source (e.g., google, bing, linkedin; the supported sources vary by version)
What the output shows
- Emails found
- Subdomains discovered
- Hosts / IP addresses
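The kind of extraction this output reflects can be illustrated with two regular expressions that pull emails and subdomains for one domain out of arbitrary public text. This is not theHarvester's implementation; the `harvest` function and the sample text are hypothetical, and the patterns are deliberately simple.

```python
import re

# Hand-written sample text standing in for public search results.
SAMPLE_TEXT = (
    "Contact Alice at alice@example.com or Bob at BOB@mail.example.com. "
    "See https://vpn.example.com/login and http://www.example.com. "
    "Unrelated: carol@other.org"
)


def harvest(text, domain):
    """Return (sorted emails, sorted subdomains) that belong to `domain`."""
    escaped = re.escape(domain)
    email_pattern = rf"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*{escaped}\b"
    host_pattern = rf"\b(?:[A-Za-z0-9-]+\.)+{escaped}\b"
    emails = {match.lower() for match in re.findall(email_pattern, text)}
    hosts = {match.lower() for match in re.findall(host_pattern, text)}
    return sorted(emails), sorted(hosts)


emails, hosts = harvest(SAMPLE_TEXT, "example.com")
print("Emails:", emails)
print("Subdomains:", hosts)
```

Lowercasing and collecting into sets removes duplicates that differ only in letter case, which are common when the same address appears on several pages.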
10) SpiderFoot
SpiderFoot is an automated OSINT tool that collects data about targets such as domains, IPs,
emails, subdomains, breaches, and social accounts. It generates structured reports and dashboards.
How to use (step-by-step)
- Install SpiderFoot (Windows/Linux/Kali)
- Start the web UI: python3 sf.py -l 127.0.0.1:5001
- Open: http://127.0.0.1:5001
- Click New Scan
- Enter target (example: example.com)
- Select scan type and start the scan
- Review categorized results and dashboard summary
Advantages
- Fully automated reconnaissance
- Easy web interface
- Saves time and generates reports
Limitations
- Some modules require API keys
- Large scans may take time