At datarabbit, personal outreach has always been at the heart of how we grow our business. For years, we relied on attending conferences, tapping into personal referrals, and writing each outreach message by hand. These methods helped us build meaningful relationships and secure new leads. But as effective as they were, we quickly realized their limitations.
You simply can’t be everywhere at once. Conferences take time and resources, and referrals — no matter how reliable — eventually run their course. We needed something more scalable, a system that wouldn’t rely entirely on our personal efforts.
That’s when we started cold emailing. By combining automation with thoughtful personalization, we saw an opportunity to build a repeatable process that could reach more leads while maintaining the personal touch we value. In this blog post, we’ll walk you through how we developed our Cold Emailing Helper — a tool that automates lead scraping, verification, pain-point identification, and email composition. Our goal? To share our journey and inspire you to scale your outreach in a way that works for your business too.
But why even bother?
In today’s fast-paced business world, especially at the C-suite level, email inboxes are flooded with messages. Amid this daily deluge, it’s no surprise that most cold outreach emails are either ignored or deleted. To stand out, cold campaigns need to go beyond the generic; they need to feel personal and genuinely relevant.
But here’s the catch: personalization isn’t just about sprinkling a recipient’s name or company name into an email template. True personalization means deeply tailoring the content — showing an understanding of the recipient’s challenges, goals, or opportunities. This is where the magic happens. Emails that resonate on a deeper level feel more human, leading to better engagement and a higher chance of replies — the ultimate goal of email marketing.
The challenge is that while there are tools, often called sequencers, that automate cold email campaigns, their personalization capabilities are limited. These platforms typically allow for basic variable insertion, such as names or titles, but they don’t enable lead enrichment or true content customization. This leaves a gap in the market for tools that can deliver highly personalized and impactful messaging at scale.
That’s exactly the “why” behind our project. We wanted to create a solution that automates not just the sending of cold emails but also the critical steps of lead enrichment, pain-point identification, and crafting human-like, deeply tailored emails. With this foundation, our Cold Emailing Helper aims to bridge the gap between scalable automation and meaningful personalization.
Solution Architecture
1. Lead Scraping and Enrichment
- Lead Searching: We integrated Instantly’s API, a dependable solution for finding LinkedIn profiles and websites based on specific criteria like industry and role.
- Email Validation: To maintain high deliverability rates, we used Piloterr, a platform specializing in scalable email marketing APIs. Its robust email validation capabilities ensured our email lists remained clean and effective.
- LinkedIn Scraping: Scraping LinkedIn proved challenging due to CAPTCHA systems and frequent token exchanges that thwart custom implementations. After trying simulated logins and direct request mimicry, we turned to Piloterr again. Their LinkedIn scraping API handled the complexities of large-scale data extraction reliably and efficiently.
- Website Scraping: While tools like BeautifulSoup allow for custom scraping, we opted for ScrapeGraphAI for its ability to extract and structure website data quickly. This saved us time and provided consistent, high-quality output.
2. Content Generation
- LangChain: To simplify working with large language models (LLMs), we adopted LangChain, which abstracts interactions with LLMs and allows us to experiment with multiple AI providers seamlessly.
- GPT-4o-mini: We chose GPT-4o-mini for its balance between affordability and high-quality text generation, ensuring our emails feel natural and resonate with recipients.
3. Handling Data
- Python: Its rich ecosystem of libraries for scraping, data processing, and machine learning made Python the ideal language for this project. Its versatility allowed us to build and integrate various components effortlessly.
- Pandas: Managing large datasets was a breeze with Pandas, a powerful library for manipulating dataframes and handling CSV files. Its performance and ease of use were crucial for importing, cleaning, and exporting lead data.
Let's get to the code
Now that you know which technologies we intend to use, we can get to the details. Here is the list of steps we need to cover to fulfill our goal:
- Scrape leads from Instantly
- Verify the emails
- Scrape data (LinkedIn and Website)
- Create content
- Export data
1. Scrape leads
Here we can use Instantly's API to fetch the leads that match our criteria. In this example we filter only by role and industry, but you can add more criteria; look them up in the official documentation.
import requests

def search_leads_industry_role(api_key, industry, role):
    url = "https://api.instantly.ai/leads/search"
    payload = {
        "industry": industry,
        "role": role
    }
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Returns a list of LinkedIn profiles and websites
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_instantly_api_key"
leads = search_leads_industry_role(api_key, "Software Development", "CTO")
print(leads)
2. Verify the emails
Not all of the emails Instantly provides are up to date. Since we don't want to send messages to nonexistent email addresses, we can use the aforementioned Piloterr API to remove the invalid ones.
import requests

def validate_email(api_key, email):
    url = "https://api.piloterr.com/email/validate"
    payload = {"email": email}
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Includes email validity status
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_piloterr_api_key"
validation = validate_email(api_key, "example@domain.com")
print(validation)
3. Scrape data
We have two basic data sources to scrape: LinkedIn profiles and company websites.
To scrape the former we can again use Piloterr's API, as custom LinkedIn scraping is practically impossible to do at scale.
import requests

def scrape_linkedin_profile(api_key, profile_url):
    url = "https://api.piloterr.com/linkedin/scrape"
    payload = {"url": profile_url}
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Structured LinkedIn profile data
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_piloterr_api_key"
profile_data = scrape_linkedin_profile(api_key, "https://linkedin.com/in/some-profile")
print(profile_data)
To scrape the websites we can use ScrapeGraphAI, which uses an LLM to fetch the most important info and structure it for us.
from scrapegraphai.graphs import SmartScraperGraph

def scrape_website_data(graph_key, website_url):
    # SmartScraperGraph takes a prompt, a source URL, and an LLM config
    # (the model naming convention may vary with the installed version)
    graph_config = {
        "llm": {
            "api_key": graph_key,
            "model": "openai/gpt-4o-mini",
        },
    }
    scraper = SmartScraperGraph(
        prompt="Extract the company's offering, target audience, and any notable news or pain points.",
        source=website_url,
        config=graph_config,
    )
    try:
        # Scrape structured data from the website
        return scraper.run()
    except Exception as e:
        print(f"Error in Website Scraping: {str(e)}")
        return {}
4. Create content
In this step we can use an LLM like GPT-4o-mini to create personalized content based on the previously scraped data.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # GPT-4o-mini is reached through OpenAI's chat API

def analyze_pain_points_with_llm(api_key, linkedin_data, website_data, industry, role):
    """
    Use an LLM to extract pain points from LinkedIn and website data, enriched with industry and role context.
    """
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate(
        input_variables=["linkedin_data", "website_data", "industry", "role"],
        template="""Analyze the following information to identify potential pain points for a {role} in the {industry} industry:
LinkedIn Data: {linkedin_data}
Website Data: {website_data}
List key challenges, pain points, or areas where they might need help in bullet points."""
    )
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Generate pain points
    pain_points = chain.run(
        linkedin_data=linkedin_data,
        website_data=website_data,
        industry=industry,
        role=role
    )
    return pain_points.split("\n")  # Split the output into a list of pain points

def generate_custom_email(api_key, template, product_name, target_audience, tone, pain_points):
    """
    Generate a custom cold email using LangChain and GPT-4o-mini.
    """
    # Define LLM and prompt template; placeholders are inferred from the template string
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate.from_template(template)
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Generate email content; variables the template doesn't reference are simply ignored
    return chain.run(
        product_name=product_name,
        target_audience=target_audience,
        tone=tone,
        pain_points=", ".join(pain_points)
    )
5. Export the data
Now it's time to put it all together and see how it works.
import requests
import pandas as pd
from scrapegraphai.graphs import SmartScraperGraph
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
# 1. Lead Searching Using Instantly API
def search_leads_industry_role(api_key, industry, role):
    url = "https://api.instantly.ai/leads/search"
    payload = {"industry": industry, "role": role}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error in Lead Search: {response.status_code} - {response.text}")
        return []
# 2. Email Validation with Piloterr
def validate_email(api_key, email):
    url = "https://api.piloterr.com/email/validate"
    payload = {"email": email}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json().get("status", "invalid")
    else:
        print(f"Error in Email Validation: {response.status_code} - {response.text}")
        return "invalid"
# 3. LinkedIn Scraping via Piloterr API
def scrape_linkedin_profile(api_key, profile_url):
    url = "https://api.piloterr.com/linkedin/scrape"
    payload = {"url": profile_url}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error in LinkedIn Scraping: {response.status_code} - {response.text}")
        return {}
# 4. Website Scraping with SmartScraperGraph
def scrape_website_data(graph_key, website_url):
    graph_config = {"llm": {"api_key": graph_key, "model": "openai/gpt-4o-mini"}}
    scraper = SmartScraperGraph(
        prompt="Extract the company's offering, target audience, and any notable news or pain points.",
        source=website_url,
        config=graph_config,
    )
    try:
        return scraper.run()
    except Exception as e:
        print(f"Error in Website Scraping: {str(e)}")
        return {}
# 5. Analyze Pain Points Using GPT-4o-mini
def analyze_pain_points_with_llm(api_key, linkedin_data, website_data, industry, role):
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate(
        input_variables=["linkedin_data", "website_data", "industry", "role"],
        template="""Analyze the following information to identify potential pain points for a {role} in the {industry} industry:
LinkedIn Data: {linkedin_data}
Website Data: {website_data}
List key challenges, pain points, or areas where they might need help in bullet points."""
    )
    chain = LLMChain(llm=llm, prompt=prompt_template)
    pain_points = chain.run(
        linkedin_data=linkedin_data,
        website_data=website_data,
        industry=industry,
        role=role
    )
    return pain_points.split("\n")

# 6. Generate Custom Cold Email Using GPT-4o-mini
def generate_custom_email(api_key, template, product_name, target_audience, tone, pain_points):
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    # Placeholders are inferred from the template string itself
    prompt_template = PromptTemplate.from_template(template)
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Variables the template doesn't reference are simply ignored
    return chain.run(
        product_name=product_name,
        target_audience=target_audience,
        tone=tone,
        pain_points=", ".join(pain_points)
    )
# 7. Process Leads, Analyze Pain Points, and Generate Emails
def process_leads_with_pain_points(instantly_key, piloterr_key, scrapegraph_key, gpt4o_key, industry, role, product_name, tone, email_template):
    leads = search_leads_industry_role(instantly_key, industry, role)
    processed_leads = []
    for lead in leads:
        email = lead.get("email")
        linkedin_url = lead.get("linkedin_url")
        website_url = lead.get("website_url")
        # Validate email
        email_status = validate_email(piloterr_key, email)
        if email_status != "valid":
            continue
        # Scrape LinkedIn and website data
        linkedin_data = scrape_linkedin_profile(piloterr_key, linkedin_url)
        website_data = scrape_website_data(scrapegraph_key, website_url)
        # Analyze pain points
        pain_points = analyze_pain_points_with_llm(
            api_key=gpt4o_key,
            linkedin_data=linkedin_data,
            website_data=website_data,
            industry=industry,
            role=role
        )
        # Generate custom cold email
        email_content = generate_custom_email(
            api_key=gpt4o_key,
            template=email_template,
            product_name=product_name,
            target_audience=role,
            tone=tone,
            pain_points=pain_points
        )
        # Collect processed lead information
        processed_leads.append({
            "name": lead.get("name"),
            "email": email,
            "linkedin_data": linkedin_data,
            "website_data": website_data,
            "pain_points": pain_points,
            "email_content": email_content,
        })
    # Export to CSV
    leads_df = pd.DataFrame(processed_leads)
    leads_df.to_csv("processed_leads_with_pain_points.csv", index=False)
    print("Processed leads with pain points and customized emails exported to processed_leads_with_pain_points.csv")
# Example Workflow Execution
if __name__ == "__main__":
    # API Keys
    INSTANTLY_API_KEY = "your_instantly_api_key"
    PILOTERR_API_KEY = "your_piloterr_api_key"
    SCRAPEGRAPH_API_KEY = "your_scrapegraphai_api_key"
    GPT4O_API_KEY = "your_gpt4o_api_key"
    # Parameters
    INDUSTRY = "Software Development"
    ROLE = "CTO"
    PRODUCT_NAME = "CRM Software"
    TONE = "professional"
    EMAIL_TEMPLATE = """Dear {target_audience},
We understand that {pain_points} are key challenges in your industry. {product_name} is designed to address these needs directly by [brief product description].
Let’s connect to discuss how we can help solve these challenges effectively.
Best regards,
[Your Name]
"""
    process_leads_with_pain_points(
        INSTANTLY_API_KEY, PILOTERR_API_KEY, SCRAPEGRAPH_API_KEY, GPT4O_API_KEY,
        INDUSTRY, ROLE, PRODUCT_NAME, TONE, EMAIL_TEMPLATE
    )
Testing/Usage
Testing the tool focused on ensuring its effectiveness in lead acquisition (validation, enrichment) and content generation, as these were the two core functions it was designed to handle. For the final step — sending emails — we integrated with Instantly, a reliable email sequencer. However, the tool is compatible with any sequencer that supports importing leads from files and allows for an extensive number of custom variables.
1. Prepare the Sequencer
The key requirement for compatibility is the ability to import a CSV file containing enriched leads and personalized email content. The sequencer must also support enough custom variables to handle deeply personalized content. Some sequencers limit users to a small number of variables (e.g., 5), which may not suffice for campaigns requiring advanced personalization.
2. Import data
Once the CSV is generated by the tool:
- Upload the CSV to the chosen email sequencer.
- Map the column containing the email content to a custom variable in the sequencer (a quick sketch of this step follows below).
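What this mapping looks like depends on your sequencer, but it helps to normalize the export on our side first. Below is a minimal sketch, assuming the sequencer expects a custom variable named personalized_email (the exact name will differ from tool to tool):

import pandas as pd

# Load the tool's output and rename columns to match the sequencer's custom variables
df = pd.read_csv("processed_leads_with_pain_points.csv")
df = df.rename(columns={"email_content": "personalized_email"})

# Keep only the fields the sequencer needs; some tools cap custom variables (e.g., at 5)
df[["name", "email", "personalized_email"]].to_csv("sequencer_import.csv", index=False)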
3. Launch the campaign
After importing the leads:
- Create a new campaign in the sequencer.
- Use the custom variable containing the personalized email content within your email sequence.
- Configure the campaign settings (e.g., timing, follow-ups) as you normally would in your sequencer.
- Launch the campaign!
By using this workflow, the email sequencer takes over the sending process while our tool ensures the quality and personalization of the leads and content. This approach retains flexibility, allowing users to work with their preferred sequencer without sacrificing personalization or scalability.
Challenges and Lessons Learned
Building a tool like this was an insightful experience, but it wasn’t without its challenges. Along the way, we encountered several pitfalls and learned valuable lessons that improved the tool's reliability and usability. Here are some key takeaways:
1. Save your outputs
One of the earliest lessons we learned was the importance of saving data on the fly. Imagine running a batch of 500 prospects, only to have an error midway that results in losing all the progress. This can be incredibly frustrating and time-consuming.
To avoid this, we implemented automatic saving after processing each lead. Additionally, printing logs during execution allowed us to backtrace and identify the source of any errors. These measures ensured we didn’t lose valuable data or waste resources.
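For illustration, here is a minimal sketch of the incremental-saving idea: each processed lead is appended to the CSV immediately, so a crash midway costs at most one lead. The helper name and column set are ours, not part of any library:

import csv
import os

FIELDNAMES = ["name", "email", "pain_points", "email_content"]

def append_lead_to_csv(lead_row, path="processed_leads_with_pain_points.csv"):
    """Append one processed lead to the CSV, writing the header on first use."""
    file_exists = os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        if not file_exists:
            writer.writeheader()
        writer.writerow(lead_row)
    # Printing progress as we go makes it easy to backtrace where a run failed
    print(f"Saved lead: {lead_row.get('email')}")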
2. Test on a smaller scale
In the early stages, we made the mistake of running large batches without thoroughly testing the tool. In one instance, we ran a big batch only to find out the email column wasn’t being saved in the output file, rendering the entire run useless.
This taught us the value of starting small. By testing the tool with smaller batches of leads, we could identify and resolve issues before scaling up. This approach saved us time and minimized wasted effort.
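A cheap safeguard to pair with small test batches is a post-run sanity check on the output file. The check below, which mirrors the columns from the export step above, would have caught the missing email column immediately:

import pandas as pd

# Verify the output CSV contains every column downstream steps rely on
df = pd.read_csv("processed_leads_with_pain_points.csv")
expected = {"name", "email", "pain_points", "email_content"}
missing = expected - set(df.columns)
assert not missing, f"Output CSV is missing columns: {missing}"
print(f"Smoke test OK: {len(df)} leads, all expected columns present")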
3. Prepare for future users
Initially, the tool was designed for personal use, but as our sales team grows, we realized it needed to be accessible to others. Onboarding non-technical team members brings several challenges:
- Code Documentation: Clear, comprehensive documentation became essential. It helped other team members understand the tool’s logic and reduced reliance on us for troubleshooting. Having someone review the documentation and provide feedback ensured it was user-friendly.
- User Interface (UI): Since not everyone on the sales team is technical, we are considering adding a UI to simplify interactions with the tool. This would make it easier for team members to operate it without diving into the code.
Future Scope
Here are a few ideas for future development:
1. Scaling up
One natural direction is increasing the tool’s capacity to handle larger and larger batches of leads. As businesses grow and their outreach needs expand, the ability to process thousands of leads in a single run becomes critical. This would involve:
- Optimizing performance to reduce the processing time per lead
- Enhancing error-handling mechanisms to ensure smooth execution, especially at scale
- Potentially exploring parallel or distributed processing to handle data-heavy operations (see the sketch below)
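To illustrate that last point: per-lead enrichment is dominated by API calls, so even a simple thread pool can cut batch time substantially before full distributed processing is needed. A minimal sketch, assuming a hypothetical process_single_lead helper that wraps the per-lead steps from the script above:

from concurrent.futures import ThreadPoolExecutor, as_completed

def enrich_leads_in_parallel(leads, process_single_lead, max_workers=8):
    """Run per-lead enrichment concurrently; I/O-bound API calls dominate the runtime."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # process_single_lead is assumed to take one lead dict and return a processed row
        futures = {pool.submit(process_single_lead, lead): lead for lead in leads}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                # Log and continue so one bad lead doesn't sink the whole batch
                print(f"Lead failed: {futures[future].get('email')} - {e}")
    return results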
2. Cloud Integration (AWS)
Moving the tool to the cloud, with a focus on AWS, offers significant benefits:
- Accessibility. A remote setup allows users from different locations, devices, and roles to access and utilize the tool without requiring local installations.
- Scalability. Cloud resources can scale dynamically to accommodate larger workloads, ensuring consistent performance.
AWS ✨which we are proud partners of✨ offers a suite of services (EC2 for compute, S3 for storage, Lambda for serverless execution) that would make this transition smooth and efficient.
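As a taste of what that could look like, here is a minimal sketch of an AWS Lambda handler wrapping the pipeline; the cold_email_helper module name and the event fields are assumptions for illustration:

import json

from cold_email_helper import process_leads_with_pain_points  # hypothetical packaging of the script above

def lambda_handler(event, context):
    """Lambda entry point: run one enrichment batch per invocation."""
    params = json.loads(event["body"]) if "body" in event else event
    process_leads_with_pain_points(
        params["instantly_key"], params["piloterr_key"],
        params["scrapegraph_key"], params["gpt4o_key"],
        params["industry"], params["role"],
        params["product_name"], params["tone"], params["email_template"],
    )
    return {"statusCode": 200, "body": json.dumps({"status": "batch processed"})}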
3. Automating Email Sending
A logical next step in the tool’s evolution would be to incorporate sending automation, transforming it into a complete email sequencer. This would eliminate the need for external tools like Instantly, offering users a unified, end-to-end solution. Key features could include:
- Customizable email sequences with advanced scheduling options.
- Built-in analytics to track open rates, click-through rates, and replies.
- Advanced personalization capabilities to further enhance campaign performance.
To sum up
Building the Cold Emailing Helper has been quite a journey for our team at datarabbit. What started as a solution to automate tedious tasks like lead scraping and content generation has grown into a powerful tool for scaling personalized cold email campaigns. By addressing the challenges of lead enrichment, email validation, and human-like personalization, we’ve created a system that bridges the gap between automation and meaningful outreach.
However, this is just the beginning. As we look toward the future, we envision scaling the tool, moving it to the cloud for greater accessibility, and even integrating email-sending automation to create a fully end-to-end solution. The possibilities are endless, and the potential for improving business development through smarter automation is immense.
At datarabbit, we specialize in creating custom-tailored AI and data analysis solutions that address unique business challenges. If you’re looking to implement a similar tool or have an entirely different project in mind, we’d love to help. Let’s work together to turn your vision into reality. Reach out to us to explore how we can build the perfect solution for your business!