At datarabbit, personal outreach has always been at the heart of how we grow our business. For years, we relied on attending conferences, tapping into personal referrals, and writing each outreach message by hand. These methods helped us build meaningful relationships and secure new leads. But as effective as they were, we quickly realized their limitations.
You simply can’t be everywhere at once. Conferences take time and resources, and referrals — no matter how reliable — eventually run their course. We needed something more scalable, a system that wouldn’t rely entirely on our personal efforts.
That’s when we started cold emailing. By combining automation with thoughtful personalization, we saw an opportunity to build a repeatable process that could reach more leads while maintaining the personal touch we value. In this blog post, we’ll walk you through how we developed our Cold Emailing Helper — a tool that automates lead scraping, verification, pain-point identification, and email composition. Our goal? To share our journey and inspire you to scale your outreach in a way that works for your business too.
But why even bother?
In today’s fast-paced business world, especially at the C-suite level, email inboxes are flooded with messages. Amid this daily deluge, it’s no surprise that most cold outreach emails are either ignored or deleted. To stand out, cold campaigns need to go beyond the generic; they need to feel personal and genuinely relevant.
But here’s the catch: personalization isn’t just about sprinkling a recipient’s name or company name into an email template. True personalization means deeply tailoring the content — showing an understanding of the recipient’s challenges, goals, or opportunities. This is where the magic happens. Emails that resonate on a deeper level feel more human, leading to better engagement and a higher chance of replies — the ultimate goal of email marketing.
The challenge is that while there are tools, often called sequencers, that automate cold email campaigns, their personalization capabilities are limited. These platforms typically allow for basic variable insertion, such as names or titles, but they don’t enable lead enrichment or true content customization. This leaves a gap in the market for tools that can deliver highly personalized and impactful messaging at scale.
That’s exactly the “why” behind our project. We wanted to create a solution that automates not just the sending of cold emails but also the critical steps of lead enrichment, pain-point identification, and crafting human-like, deeply tailored emails. With this foundation, our Cold Emailing Helper aims to bridge the gap between scalable automation and meaningful personalization.
Solution Architecture
1. Lead Scraping and Enrichment
- Lead Searching: We integrated Instantly’s API, a dependable solution for finding LinkedIn profiles and websites based on specific criteria like industry and role.
- Email Validation: To maintain high deliverability rates, we used Piloterr, a platform specializing in scalable email marketing APIs. Its robust email validation capabilities ensured our email lists remained clean and effective.
- LinkedIn Scraping: Scraping LinkedIn proved challenging due to CAPTCHA systems and frequent token exchanges that thwart custom implementations. After trying simulated logins and direct request mimicry, we turned to Piloterr again. Their LinkedIn scraping API handled the complexities of large-scale data extraction reliably and efficiently.
- Website Scraping: While tools like BeautifulSoup allow for custom scraping, we opted for ScrapeGraphAI for its ability to extract and structure website data quickly. This saved us time and provided consistent, high-quality output.
2. Content Generation
- LangChain: To simplify working with large language models (LLMs), we adopted LangChain, which abstracts interactions with LLMs and allows us to experiment with multiple AI providers seamlessly.
- GPT-4o-mini: We chose GPT-4o-mini for its balance between affordability and high-quality text generation, ensuring our emails feel natural and resonate with recipients.
3. Handling Data
- Python: Its rich ecosystem of libraries for scraping, data processing, and machine learning made Python the ideal language for this project. Its versatility allowed us to build and integrate various components effortlessly.
- Pandas: Managing large datasets was a breeze with Pandas, a powerful library for manipulating dataframes and handling CSV files. Its performance and ease of use were crucial for importing, cleaning, and exporting lead data.
Let's get to the code
Now that you know which technologies we intend to use, we can get to the details. Here is the list of steps we need to cover to fulfill our goal:
- Scrape leads from Instantly
- Verify the emails
- Scrape data (LinkedIn and Website)
- Create content
- Export data
1. Scrape leads
Here we can use Instantly's API to fetch the leads that match our criteria. In this example we filter only by role and industry, but you can add more criteria; look them up in the official documentation.
import requests

def search_leads_industry_role(api_key, industry, role):
    url = "https://api.instantly.ai/leads/search"
    payload = {
        "industry": industry,
        "role": role
    }
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Returns a list of LinkedIn profiles and websites
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_instantly_api_key"
leads = search_leads_industry_role(api_key, "Software Development", "CTO")
print(leads)
2. Verify the emails
Not all of the emails Instantly provides are up to date. Since we don't want to send messages to nonexistent email addresses, we can use the aforementioned Piloterr API to remove the invalid ones.
import requests

def validate_email(api_key, email):
    url = "https://api.piloterr.com/email/validate"
    payload = {"email": email}
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Includes email validity status
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_piloterr_api_key"
validation = validate_email(api_key, "example@domain.com")
print(validation)
3. Scrape data
We have two basic data sources to scrape: LinkedIn profiles and company websites.
To scrape the former we can again use Piloterr's API, as custom LinkedIn scraping is practically impossible to do at scale.
import requests

def scrape_linkedin_profile(api_key, profile_url):
    url = "https://api.piloterr.com/linkedin/scrape"
    payload = {"url": profile_url}
    headers = {
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()  # Structured LinkedIn profile data
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Example Usage
api_key = "your_piloterr_api_key"
profile_data = scrape_linkedin_profile(api_key, "https://linkedin.com/in/some-profile")
print(profile_data)
To scrape the websites we can use ScrapeGraphAI, which uses an LLM to fetch the most important info and structure it for us.
from scrapegraphai.graphs import SmartScraperGraph

def scrape_website_data(graph_key, website_url):
    # SmartScraperGraph takes a prompt, a source URL, and an LLM config
    # (the model naming convention may vary with the installed version)
    graph_config = {
        "llm": {
            "api_key": graph_key,
            "model": "openai/gpt-4o-mini",
        },
    }
    scraper = SmartScraperGraph(
        prompt="Extract the company's offering, target audience, and any notable news or pain points.",
        source=website_url,
        config=graph_config,
    )
    try:
        # Scrape structured data from the website
        return scraper.run()
    except Exception as e:
        print(f"Error in Website Scraping: {str(e)}")
        return {}
4. Create content
In this step we can use an LLM like GPT-4o-mini to create personalized content based on the previously scraped data.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # GPT-4o-mini is reached through OpenAI's chat API

def analyze_pain_points_with_llm(api_key, linkedin_data, website_data, industry, role):
    """
    Use an LLM to extract pain points from LinkedIn and website data, enriched with industry and role context.
    """
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate(
        input_variables=["linkedin_data", "website_data", "industry", "role"],
        template="""Analyze the following information to identify potential pain points for a {role} in the {industry} industry:
LinkedIn Data: {linkedin_data}
Website Data: {website_data}
List key challenges, pain points, or areas where they might need help in bullet points."""
    )
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Generate pain points
    pain_points = chain.run(
        linkedin_data=linkedin_data,
        website_data=website_data,
        industry=industry,
        role=role
    )
    return pain_points.split("\n")  # Split the output into a list of pain points

def generate_custom_email(api_key, template, product_name, target_audience, tone, pain_points):
    """
    Generate a custom cold email using LangChain and GPT-4o-mini.
    """
    # Define LLM and prompt template; placeholders are inferred from the template string
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate.from_template(template)
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Generate email content; variables the template doesn't reference are simply ignored
    return chain.run(
        product_name=product_name,
        target_audience=target_audience,
        tone=tone,
        pain_points=", ".join(pain_points)
    )
5. Export the data
Now it's time to put it all together and see how it works.
import requests
import pandas as pd
from scrapegraphai.graphs import SmartScraperGraph
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
# 1. Lead Searching Using Instantly API
def search_leads_industry_role(api_key, industry, role):
    url = "https://api.instantly.ai/leads/search"
    payload = {"industry": industry, "role": role}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error in Lead Search: {response.status_code} - {response.text}")
        return []
# 2. Email Validation with Piloterr
def validate_email(api_key, email):
    url = "https://api.piloterr.com/email/validate"
    payload = {"email": email}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json().get("status", "invalid")
    else:
        print(f"Error in Email Validation: {response.status_code} - {response.text}")
        return "invalid"
# 3. LinkedIn Scraping via Piloterr API
def scrape_linkedin_profile(api_key, profile_url):
    url = "https://api.piloterr.com/linkedin/scrape"
    payload = {"url": profile_url}
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error in LinkedIn Scraping: {response.status_code} - {response.text}")
        return {}
# 4. Website Scraping with SmartScraperGraph
def scrape_website_data(graph_key, website_url):
    graph_config = {"llm": {"api_key": graph_key, "model": "openai/gpt-4o-mini"}}
    scraper = SmartScraperGraph(
        prompt="Extract the company's offering, target audience, and any notable news or pain points.",
        source=website_url,
        config=graph_config,
    )
    try:
        return scraper.run()
    except Exception as e:
        print(f"Error in Website Scraping: {str(e)}")
        return {}
# 5. Analyze Pain Points Using GPT-4o-mini
def analyze_pain_points_with_llm(api_key, linkedin_data, website_data, industry, role):
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    prompt_template = PromptTemplate(
        input_variables=["linkedin_data", "website_data", "industry", "role"],
        template="""Analyze the following information to identify potential pain points for a {role} in the {industry} industry:
LinkedIn Data: {linkedin_data}
Website Data: {website_data}
List key challenges, pain points, or areas where they might need help in bullet points."""
    )
    chain = LLMChain(llm=llm, prompt=prompt_template)
    pain_points = chain.run(
        linkedin_data=linkedin_data,
        website_data=website_data,
        industry=industry,
        role=role
    )
    return pain_points.split("\n")

# 6. Generate Custom Cold Email Using GPT-4o-mini
def generate_custom_email(api_key, template, product_name, target_audience, tone, pain_points):
    llm = ChatOpenAI(model="gpt-4o-mini", api_key=api_key)
    # Placeholders are inferred from the template string itself
    prompt_template = PromptTemplate.from_template(template)
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Variables the template doesn't reference are simply ignored
    return chain.run(
        product_name=product_name,
        target_audience=target_audience,
        tone=tone,
        pain_points=", ".join(pain_points)
    )
# 7. Process Leads, Analyze Pain Points, and Generate Emails
def process_leads_with_pain_points(instantly_key, piloterr_key, scrapegraph_key, gpt4o_key, industry, role, product_name, tone, email_template):
    leads = search_leads_industry_role(instantly_key, industry, role)
    processed_leads = []
    for lead in leads:
        email = lead.get("email")
        linkedin_url = lead.get("linkedin_url")
        website_url = lead.get("website_url")
        # Validate email
        email_status = validate_email(piloterr_key, email)
        if email_status != "valid":
            continue
        # Scrape LinkedIn and website data
        linkedin_data = scrape_linkedin_profile(piloterr_key, linkedin_url)
        website_data = scrape_website_data(scrapegraph_key, website_url)
        # Analyze pain points
        pain_points = analyze_pain_points_with_llm(
            api_key=gpt4o_key,
            linkedin_data=linkedin_data,
            website_data=website_data,
            industry=industry,
            role=role
        )
        # Generate custom cold email
        email_content = generate_custom_email(
            api_key=gpt4o_key,
            template=email_template,
            product_name=product_name,
            target_audience=role,
            tone=tone,
            pain_points=pain_points
        )
        # Collect processed lead information
        processed_leads.append({
            "name": lead.get("name"),
            "email": email,
            "linkedin_data": linkedin_data,
            "website_data": website_data,
            "pain_points": pain_points,
            "email_content": email_content,
        })
    # Export to CSV
    leads_df = pd.DataFrame(processed_leads)
    leads_df.to_csv("processed_leads_with_pain_points.csv", index=False)
    print("Processed leads with pain points and customized emails exported to processed_leads_with_pain_points.csv")
# Example Workflow Execution
if __name__ == "__main__":
    # API Keys
    INSTANTLY_API_KEY = "your_instantly_api_key"
    PILOTERR_API_KEY = "your_piloterr_api_key"
    SCRAPEGRAPH_API_KEY = "your_scrapegraphai_api_key"
    GPT4O_API_KEY = "your_gpt4o_api_key"
    # Parameters
    INDUSTRY = "Software Development"
    ROLE = "CTO"
    PRODUCT_NAME = "CRM Software"
    TONE = "professional"
    EMAIL_TEMPLATE = """Dear {target_audience},
We understand that {pain_points} are key challenges in your industry. {product_name} is designed to address these needs directly by [brief product description].
Let’s connect to discuss how we can help solve these challenges effectively.
Best regards,
[Your Name]
"""
    process_leads_with_pain_points(
        INSTANTLY_API_KEY, PILOTERR_API_KEY, SCRAPEGRAPH_API_KEY, GPT4O_API_KEY,
        INDUSTRY, ROLE, PRODUCT_NAME, TONE, EMAIL_TEMPLATE
    )
Testing/Usage
Testing the tool focused on ensuring its effectiveness in lead acquisition (validation, enrichment) and content generation, as these were the two core functions it was designed to handle. For the final step — sending emails — we integrated with Instantly, a reliable email sequencer. However, the tool is compatible with any sequencer that supports importing leads from files and allows for an extensive number of custom variables.
1. Prepare the Sequencer
The key requirement for compatibility is the ability to import a CSV file containing enriched leads and personalized email content. The sequencer must also support enough custom variables to handle deeply personalized content. Some sequencers limit users to a small number of variables (e.g., 5), which may not suffice for campaigns requiring advanced personalization.
2. Import data
Once the CSV is generated by the tool:
- Upload the CSV to the chosen email sequencer.
- Map the column containing the email content to a custom variable in the sequencer (a quick sketch of this step follows below).
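What this mapping looks like depends on your sequencer, but it helps to normalize the export on our side first. Below is a minimal sketch, assuming the sequencer expects a custom variable named personalized_email (the exact name will differ from tool to tool):

import pandas as pd

# Load the tool's output and rename columns to match the sequencer's custom variables
df = pd.read_csv("processed_leads_with_pain_points.csv")
df = df.rename(columns={"email_content": "personalized_email"})

# Keep only the fields the sequencer needs; some tools cap custom variables (e.g., at 5)
df[["name", "email", "personalized_email"]].to_csv("sequencer_import.csv", index=False)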
3. Launch the campaign
After importing the leads:
- Create a new campaign in the sequencer.
- Use the custom variable containing the personalized email content within your email sequence.
- Configure the campaign settings (e.g., timing, follow-ups) as you normally would in your sequencer.
- Launch the campaign!
By using this workflow, the email sequencer takes over the sending process while our tool ensures the quality and personalization of the leads and content. This approach retains flexibility, allowing users to work with their preferred sequencer without sacrificing personalization or scalability.
Challenges and Lessons Learned
Building a tool like this was an insightful experience, but it wasn’t without its challenges. Along the way, we encountered several pitfalls and learned valuable lessons that improved the tool's reliability and usability. Here are some key takeaways:
1. Save your outputs
One of the earliest lessons we learned was the importance of saving data on the fly. Imagine running a batch of 500 prospects, only to have an error midway that results in losing all the progress. This can be incredibly frustrating and time-consuming.
To avoid this, we implemented automatic saving after processing each lead. Additionally, printing logs during execution allowed us to backtrace and identify the source of any errors. These measures ensured we didn’t lose valuable data or waste resources.
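For illustration, here is a minimal sketch of the incremental-saving idea: each processed lead is appended to the CSV immediately, so a crash midway costs at most one lead. The helper name and column set are ours, not part of any library:

import csv
import os

FIELDNAMES = ["name", "email", "pain_points", "email_content"]

def append_lead_to_csv(lead_row, path="processed_leads_with_pain_points.csv"):
    """Append one processed lead to the CSV, writing the header on first use."""
    file_exists = os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        if not file_exists:
            writer.writeheader()
        writer.writerow(lead_row)
    # Printing progress as we go makes it easy to backtrace where a run failed
    print(f"Saved lead: {lead_row.get('email')}")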
2. Test on a smaller scale
In the early stages, we made the mistake of running large batches without thoroughly testing the tool. In one instance, we ran a big batch only to find out the email column wasn’t being saved in the output file, rendering the entire run useless.
This taught us the value of starting small. By testing the tool with smaller batches of leads, we could identify and resolve issues before scaling up. This approach saved us time and minimized wasted effort.
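A cheap safeguard to pair with small test batches is a post-run sanity check on the output file. The check below, which mirrors the columns from the export step above, would have caught the missing email column immediately:

import pandas as pd

# Verify the output CSV contains every column downstream steps rely on
df = pd.read_csv("processed_leads_with_pain_points.csv")
expected = {"name", "email", "pain_points", "email_content"}
missing = expected - set(df.columns)
assert not missing, f"Output CSV is missing columns: {missing}"
print(f"Smoke test OK: {len(df)} leads, all expected columns present")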
3. Prepare for future users
Initially, the tool was designed for personal use, but as our sales team grows, we realized it needed to be accessible to others. Onboarding non-technical team members brings several challenges:
- Code Documentation: Clear, comprehensive documentation became essential. It helped other team members understand the tool’s logic and reduced reliance on us for troubleshooting. Having someone review the documentation and provide feedback ensured it was user-friendly.
- User Interface (UI): Since not everyone on the sales team is technical, we are considering adding a UI to simplify interactions with the tool. This would make it easier for team members to operate it without diving into the code.
Future Scope
Here are a few ideas for future development:
1. Scaling up
One natural direction is increasing the tool’s capacity to handle larger and larger batches of leads. As businesses grow and their outreach needs expand, the ability to process thousands of leads in a single run becomes critical. This would involve:
- Optimizing performance to reduce the processing time per lead
- Enhancing error-handling mechanisms to ensure smooth execution, especially at scale
- Potentially exploring parallel or distributed processing to handle data-heavy operations (see the sketch below)
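To illustrate that last point: per-lead enrichment is dominated by API calls, so even a simple thread pool can cut batch time substantially before full distributed processing is needed. A minimal sketch, assuming a hypothetical process_single_lead helper that wraps the per-lead steps from the script above:

from concurrent.futures import ThreadPoolExecutor, as_completed

def enrich_leads_in_parallel(leads, process_single_lead, max_workers=8):
    """Run per-lead enrichment concurrently; I/O-bound API calls dominate the runtime."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # process_single_lead is assumed to take one lead dict and return a processed row
        futures = {pool.submit(process_single_lead, lead): lead for lead in leads}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                # Log and continue so one bad lead doesn't sink the whole batch
                print(f"Lead failed: {futures[future].get('email')} - {e}")
    return results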
2. Cloud Integration (AWS)
Moving the tool to the cloud, with a focus on AWS, offers significant benefits:
- Accessibility. A remote setup allows users from different locations, devices, and roles to access and utilize the tool without requiring local installations.
- Scalability. Cloud resources can scale dynamically to accommodate larger workloads, ensuring consistent performance.
AWS ✨which we are proud partners of✨ offers a suite of services (EC2 for compute, S3 for storage, Lambda for serverless execution) that would make this transition smooth and efficient.
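As a taste of what that could look like, here is a minimal sketch of an AWS Lambda handler wrapping the pipeline; the cold_email_helper module name and the event fields are assumptions for illustration:

import json

from cold_email_helper import process_leads_with_pain_points  # hypothetical packaging of the script above

def lambda_handler(event, context):
    """Lambda entry point: run one enrichment batch per invocation."""
    params = json.loads(event["body"]) if "body" in event else event
    process_leads_with_pain_points(
        params["instantly_key"], params["piloterr_key"],
        params["scrapegraph_key"], params["gpt4o_key"],
        params["industry"], params["role"],
        params["product_name"], params["tone"], params["email_template"],
    )
    return {"statusCode": 200, "body": json.dumps({"status": "batch processed"})}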
3. Automating Email Sending
A logical next step in the tool’s evolution would be to incorporate sending automation, transforming it into a complete email sequencer. This would eliminate the need for external tools like Instantly, offering users a unified, end-to-end solution. Key features could include:
- Customizable email sequences with advanced scheduling options.
- Built-in analytics to track open rates, click-through rates, and replies.
- Advanced personalization capabilities to further enhance campaign performance.
To sum up
Building the Cold Emailing Helper has been quite a journey for our team at datarabbit. What started as a solution to automate tedious tasks like lead scraping and content generation has grown into a powerful tool for scaling personalized cold email campaigns. By addressing the challenges of lead enrichment, email validation, and human-like personalization, we’ve created a system that bridges the gap between automation and meaningful outreach.
However, this is just the beginning. As we look toward the future, we envision scaling the tool, moving it to the cloud for greater accessibility, and even integrating email-sending automation to create a fully end-to-end solution. The possibilities are endless, and the potential for improving business development through smarter automation is immense.
At datarabbit, we specialize in creating custom-tailored AI and data analysis solutions that address unique business challenges. If you’re looking to implement a similar tool or have an entirely different project in mind, we’d love to help. Let’s work together to turn your vision into reality. Reach out to us to explore how we can build the perfect solution for your business!