Published: Aug 12, 2024
Updated: Aug 12, 2024

Can AI Stop Phishing? Multimodal LLMs Show Promise

Multimodal Large Language Models for Phishing Webpage Detection and Identification
By Jehyun Lee, Peiyuan Lim, Bryan Hooi, Dinil Mon Divakaran

Summary

Phishing remains a pervasive threat. With almost a million unique phishing web pages appearing in the first quarter of 2024 alone, traditional defenses are struggling to keep up. Blacklists go stale quickly, and even sophisticated machine learning models can be evaded by phishing pages that closely mimic legitimate sites.
Researchers are now turning to multimodal Large Language Models (LLMs) to tackle this challenge. Trained on massive amounts of data, these models can analyze not just the text of a webpage but also its visual elements, such as logos, themes, and favicons, to identify the brand a page is impersonating. In a new study, researchers built a two-phase system using LLMs. The first phase uses the LLM to identify the brand being represented, while the second phase verifies whether the website's domain matches the identified brand. This helps distinguish between, say, a real PayPal page and a fake one hosted on a suspicious domain.
The research team tested GPT-4, Gemini Pro 1.0, and Claude 3, and found that all three showed promising results, particularly when given both a screenshot and the HTML of the page. GPT-4 and Claude 3 performed especially well, achieving high accuracy in detecting phishing attempts while also explaining their decisions. These LLM-based systems also outperformed existing state-of-the-art brand-based phishing detectors and were more resilient to adversarial attacks designed to fool AI.
Using LLMs for security also presents new challenges. The cost of running these models for real-time analysis, the potential for economic denial-of-service attacks, and the risk of indirect prompt injection are all important considerations. Because these LLMs are publicly accessible, attackers can also study them to develop new evasion techniques. Even so, this research highlights the potential of multimodal LLMs as a significant step forward in the fight against online phishing.

Questions & Answers

How does the two-phase LLM system work to detect phishing websites?
The system runs two phases in sequence. Phase 1: The LLM analyzes visual elements (screenshots, logos, themes) together with HTML data to identify the brand the webpage represents. Phase 2: The system checks whether the website's domain name legitimately belongs to the identified brand. For example, if the LLM identifies PayPal branding in Phase 1 but the domain is 'paypa1-secure.com' instead of 'paypal.com', the system flags the page as a potential phishing attempt. The approach is effective because it combines the visual and textual analysis capabilities of multimodal LLMs such as GPT-4 and Claude 3.
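The two phases can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: `identify_brand` stands in for the multimodal LLM call, `fake_identify_brand` is a hypothetical stub used only to make the example runnable, and Phase 2 is simplified to a lookup in a hand-written `KNOWN_BRAND_DOMAINS` table, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse

# Hypothetical table of brands and their legitimate domains (illustrative only).
KNOWN_BRAND_DOMAINS = {"paypal": {"paypal.com"}}


@dataclass
class Verdict:
    brand: Optional[str]
    domain: str
    is_phishing: bool
    reason: str


def registered_domain(url: str) -> str:
    """Naive approximation: keep the last two labels of the hostname.

    A real system would consult the Public Suffix List instead.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def detect(
    url: str,
    screenshot: bytes,
    html: str,
    identify_brand: Callable[[bytes, str], Optional[str]],
) -> Verdict:
    # Phase 1: ask the (stand-in) model which brand the page represents.
    brand = identify_brand(screenshot, html)
    domain = registered_domain(url)
    if brand is None:
        return Verdict(None, domain, False, "no brand identified")

    # Phase 2: check the page's domain against the brand's known domains.
    legitimate = KNOWN_BRAND_DOMAINS.get(brand.lower())
    if legitimate is None:
        return Verdict(brand, domain, False, "brand not in known-domain table")
    if domain in legitimate:
        return Verdict(brand, domain, False, "domain matches brand")
    return Verdict(brand, domain, True, f"{brand} branding on unrelated domain")


def fake_identify_brand(screenshot: bytes, html: str) -> Optional[str]:
    # Hypothetical stub replacing the LLM call, so the sketch runs offline.
    return "PayPal" if "paypal" in html.lower() else None


if __name__ == "__main__":
    page = "<title>Log in to PayPal</title>"
    print(detect("https://paypa1-secure.com/login", b"", page, fake_identify_brand))
    print(detect("https://www.paypal.com/signin", b"", page, fake_identify_brand))
```

Passing the brand identifier in as a function keeps the domain check testable without any model access, and makes it straightforward to swap one LLM for another.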
What are the main benefits of AI-powered phishing detection for everyday internet users?
AI-powered phishing detection offers enhanced protection during daily online activities. It acts like a smart guardian that can instantly analyze websites you visit, checking not just the text but also visual elements like logos and layout to determine if they're genuine. The main benefits include real-time protection while shopping or banking online, reduced risk of falling for sophisticated scams, and peace of mind when entering sensitive information. For instance, it can warn you immediately if a website pretending to be your bank uses suspicious elements or domains, helping prevent financial fraud and identity theft before it happens.
How can businesses protect themselves from phishing attacks in 2024?
Businesses can implement a multi-layered approach to protect against phishing attacks. This includes deploying advanced AI-powered detection tools that can analyze both visual and textual elements of suspicious websites, providing regular security awareness training to employees, and maintaining updated email security protocols. It's also crucial to use strong authentication methods like 2FA, keep software and systems updated, and establish clear protocols for handling sensitive information. Regular security audits and penetration testing can help identify vulnerabilities before they're exploited by attackers.

PromptLayer Features

  1. Testing & Evaluation
     The paper's systematic evaluation of multiple LLMs' phishing detection capabilities aligns with PromptLayer's testing infrastructure.
Implementation Details
Set up batch tests comparing LLM responses across different phishing scenarios, implement scoring metrics for accuracy, create regression tests for known phishing patterns
Key Benefits
• Standardized evaluation across multiple LLM models
• Reproducible testing framework for phishing detection
• Automated performance monitoring and comparison
Potential Improvements
• Add specialized phishing-specific scoring metrics
• Implement adversarial test cases
• Develop automated prompt optimization
Business Value
Efficiency Gains
Reduces manual testing effort by 70% through automated evaluation pipelines
Cost Savings
Optimizes LLM usage by identifying most cost-effective model for phishing detection
Quality Improvement
Ensures consistent detection accuracy through systematic testing
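A batch comparison of the kind described for this feature might look like the following sketch. It scores each model's phishing verdicts against ground-truth labels using accuracy, precision, and recall. The model names and the data are made up for illustration, and nothing here uses PromptLayer's actual API.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float


def score(predictions: list[bool], labels: list[bool]) -> Scores:
    """Score phishing verdicts (True = phishing) against ground truth."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)          # phishing, flagged
    fp = sum(1 for p, y in pairs if p and not y)      # benign, flagged
    fn = sum(1 for p, y in pairs if not p and y)      # phishing, missed
    tn = sum(1 for p, y in pairs if not p and not y)  # benign, passed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Scores((tp + tn) / len(pairs), precision, recall)


def compare(results: dict[str, list[bool]], labels: list[bool]) -> dict[str, Scores]:
    """Score every model's predictions on the same labeled test set."""
    return {name: score(preds, labels) for name, preds in results.items()}


if __name__ == "__main__":
    # Made-up labels and predictions for two hypothetical models.
    labels = [True, True, False, False]
    results = {
        "model_a": [True, True, False, False],
        "model_b": [True, False, True, False],
    }
    for name, s in compare(results, labels).items():
        print(name, s)
```

Running every model over one fixed, labeled test set is what makes the comparison reproducible; the same set can later serve as a regression test when a prompt changes.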
  2. Workflow Management
     The two-phase detection system maps directly to PromptLayer's multi-step orchestration capabilities.
Implementation Details
Create separate prompt templates for brand identification and domain verification, chain them in sequence, track versions of each component
Key Benefits
• Modular system design for easier maintenance
• Version control for both phases
• Reusable components for different security applications
Potential Improvements
• Add parallel processing capabilities
• Implement feedback loops between phases
• Create automated prompt updating system
Business Value
Efficiency Gains
Streamlines deployment and updates of multi-stage security systems
Cost Savings
Reduces development time by 40% through reusable components
Quality Improvement
Enables precise tracking and optimization of each detection phase
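To make the idea of separate, versioned prompt templates chained in sequence more concrete, here is a generic sketch. The `PromptTemplate` class, the prompt wording, and the `call_llm` parameter are all hypothetical; this is neither PromptLayer's API nor the prompts used in the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptTemplate:
    name: str
    version: int
    text: str

    def render(self, **values: str) -> str:
        # Fill the placeholders in the template text.
        return self.text.format(**values)


# Hypothetical prompt wording, for illustration only.
BRAND_PROMPT = PromptTemplate(
    "brand_identification", 1,
    "Which brand does this page represent?\nHTML:\n{html}",
)
DOMAIN_PROMPT = PromptTemplate(
    "domain_verification", 1,
    "Does the domain {domain} belong to the brand {brand}? Answer yes or no.",
)


def run_chain(html: str, domain: str, call_llm: Callable[[str], str]) -> dict:
    """Run both templates in sequence and record which versions were used."""
    brand = call_llm(BRAND_PROMPT.render(html=html)).strip()
    answer = call_llm(DOMAIN_PROMPT.render(brand=brand, domain=domain))
    return {
        "brand": brand,
        "domain_matches": answer.strip().lower().startswith("yes"),
        "prompt_versions": {
            BRAND_PROMPT.name: BRAND_PROMPT.version,
            DOMAIN_PROMPT.name: DOMAIN_PROMPT.version,
        },
    }


def fake_llm(prompt: str) -> str:
    # Hypothetical stub standing in for a real model call.
    return "PayPal" if prompt.startswith("Which brand") else "no"


if __name__ == "__main__":
    print(run_chain("<title>PayPal</title>", "paypa1-secure.com", fake_llm))
```

Recording the template versions alongside each result is what allows a change in detection quality to be traced back to a specific prompt revision.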
