Introduction
Hello! Welcome to my new series of articles focused on building a Salesforce extraction pipeline from scratch, without relying on any special libraries or SaaS services. In this series, I will use Python to demonstrate the pipeline, but I aim to present it in a generalized way that should allow you to port it over to any other language.
In this first part of the series, we will dive into authentication using the SOAP API method, which will enable connections via both the REST API and the BULK API. By the end of the series, you will have a stateless, distributed extraction pipeline that can handle both Salesforce APIs and could easily be extended to other data sources as well. This will all be done from scratch, without using any Salesforce-specific libraries.
I chose to use SOAP because, although it’s harder to find good documentation on it, it is actually easier to get a demo up and running. You do not need to have an app already set up in Salesforce, nor do you need to use the CLI to create an OAuth token. If you’re planning to deploy this in production and security is a concern (which it probably should be), then setting up OAuth would be ideal. But for learning, testing, and demo purposes, SOAP authentication is sufficient.
I will maintain a GitHub repository for this series, and I will put each part in its own branch, with each successive branch expanding upon the previous part’s codebase.
You can find the repository here:
Authentication Implementation
In this section, I will walk through the salesforce_authentication.py code step by step.
Imports and Requirements
The first section of the code includes the necessary imports:
import requests
import os
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET
- requests: Used to make HTTP requests to the Salesforce API.
- os: Used to access environment variables, such as the Salesforce URL.
- xml.sax.saxutils.escape: Used to escape special characters in XML, ensuring that the XML request is well-formed.
- xml.etree.ElementTree: Used to parse XML responses from Salesforce.
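To make the role of escape concrete, here is a small sketch of what it does and does not replace by default. The sample strings and the variable names (quote_entities, default_escaped, fully_escaped) are made up for illustration.

```python
from xml.sax.saxutils import escape

# By default, escape() only replaces &, < and >.
default_escaped = escape("p<ss&word>")
print(default_escaped)

# Quotes are left untouched unless you pass a mapping of extra entities.
quote_entities = {"'": "&apos;", '"': "&quot;"}
fully_escaped = escape("it's \"quoted\"", quote_entities)
print(fully_escaped)
```

This is why the module wraps escape in its own helper later on: passwords can contain quote characters too.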
Understanding Salesforce Security Tokens
Before we proceed, it’s important to understand what the security token is and how to obtain it. Salesforce requires a security token in addition to your password when logging in from an untrusted network. The security token is a unique key tied to your user account.
To obtain your security token:
- Log in to your Salesforce account.
- Click on your avatar or name in the top-right corner and select Settings.
- In the left-hand sidebar, navigate to My Personal Information > Reset My Security Token.
- Click Reset Security Token. Salesforce will send a new security token to your registered email address.
Make sure to keep your security token secure, as it provides access to your Salesforce data.
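In the SOAP login request, the token is not sent as a separate field. It is appended directly to the password, with no separator, which is what get_body does later in this article. A minimal sketch of that rule follows; the function name build_login_secret and the sample values are hypothetical and are not part of the repository code.

```python
def build_login_secret(password, security_token=""):
    """Return the value that goes into the SOAP <password> element."""
    # The token is concatenated onto the password with nothing in between.
    return f"{password}{security_token}"


print(build_login_secret("hunter2", "AbC123"))
```

The token defaults to an empty string here on the assumption that a login from a trusted IP range may not need one; check how your own org is configured.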
Main Authentication Function
The next section is the main public interface for the module, the get_session_id function:
def get_session_id(username, password, security_token):
    """
    Authenticates with Salesforce and retrieves a session ID.

    :param username: The Salesforce username.
    :param password: The Salesforce password.
    :param security_token: The Salesforce security token.
    :return: The Salesforce session ID.
    :rtype: str
    :raises Exception: If the authentication request fails or the session ID is not found.
    """
    response = requests.post(
        get_url(),
        data=get_body(username, password, security_token),
        headers=get_headers()
    )
    if response.status_code != 200:
        raise Exception(f"Authentication failed with status code {response.status_code}: {response.text}")
    session_id = extract_session_id(response.text)
    # Honor the docstring: a 200 response without a session ID is still a failure.
    if session_id is None:
        raise Exception("Authentication response did not contain a session ID.")
    return session_id
This function encapsulates the entire login process: constructing the request body, sending it, and handling the response.
- URL Construction: get_url() retrieves the Salesforce login URL for the SOAP API.
- Body Construction: get_body() builds the XML-formatted request that Salesforce expects, including your username, password, and security token.
- Header Setup: get_headers() defines the necessary headers to inform Salesforce that this is a SOAP request.
- Session ID Extraction: The session ID is extracted from the SOAP response using extract_session_id().
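When the status code is not 200, the raw response text in the exception can be hard to read. Below is a rough sketch of pulling a readable message out of a failed login, assuming the failure body is a SOAP 1.1 fault with an unqualified faultstring element. The helper name extract_fault_message and the sample XML are illustrative only, not captured from a real response.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def extract_fault_message(xml_str):
    """Return the faultstring from a SOAP 1.1 fault, or None if there is none."""
    try:
        root = ET.fromstring(xml_str)
    except ET.ParseError:
        return None  # The body was not XML (for example, an HTML error page).
    fault = root.find(f".//{{{SOAP_ENV_NS}}}Fault")
    if fault is None:
        return None
    # In SOAP 1.1, the children of Fault are not namespace-qualified.
    return fault.findtext("faultstring")


# Made-up fault body, used only to exercise the helper.
sample_fault = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <soapenv:Fault>
      <faultcode>INVALID_LOGIN</faultcode>
      <faultstring>INVALID_LOGIN: Invalid username, password, security token; or user locked out.</faultstring>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>"""

print(extract_fault_message(sample_fault))
```

A helper like this could be called before raising, falling back to response.text when it returns None.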
Helper Functions
The rest of the file consists of helper functions:
def get_url():
    url = f"{os.environ['SALESFORCE_URL']}/services/Soap/u/60.0"
    return url

def get_body(username, password, security_token):
    """
    Constructs the XML body for the Salesforce SOAP login request.

    :param username: The Salesforce username.
    :param password: The Salesforce password.
    :param security_token: The Salesforce security token.
    :return: The XML-formatted body for the login request.
    """
    # Escape every interpolated value so characters like & or < cannot break the XML.
    formatted_username = escape_chars(username)
    formatted_password = escape_chars(password)
    formatted_security_token = escape_chars(security_token)
    return f"""<?xml version="1.0" encoding="utf-8" ?>
<env:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <n1:login xmlns:n1="urn:partner.soap.sforce.com">
      <n1:username>{formatted_username}</n1:username>
      <n1:password>{formatted_password}{formatted_security_token}</n1:password>
    </n1:login>
  </env:Body>
</env:Envelope>"""

def escape_chars(string):
    """
    Escapes special XML characters in a string to ensure valid XML formatting.

    :param string: The string to escape.
    :return: The escaped string.
    """
    # escape() handles &, < and > on its own; quotes need explicit entities.
    extra_entities = {"'": "&apos;", '"': "&quot;"}
    escaped_string = escape(string, extra_entities)
    return escaped_string

def get_headers():
    headers = {
        'SOAPAction': 'login',
        # charset is a parameter of Content-Type, not a separate header.
        'Content-Type': 'text/xml; charset=UTF-8'
    }
    return headers

def extract_session_id(xml_str):
    """
    Parses the Salesforce SOAP response XML to extract the session ID.

    :param xml_str: The XML response as a string.
    :return: The extracted session ID, or None if not found.
    :raises ET.ParseError: If the XML parsing fails.
    """
    namespaces = {
        'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
        'ns': 'urn:partner.soap.sforce.com'
    }
    root = ET.fromstring(xml_str)
    session_id_elem = root.find(
        './/soapenv:Body/ns:loginResponse/ns:result/ns:sessionId', namespaces)
    if session_id_elem is not None:
        return session_id_elem.text
    else:
        print("sessionId not found in the response.")
        return None
- get_url: Fetches the base Salesforce URL from an environment variable (SALESFORCE_URL) and appends the path needed to call the SOAP authentication API (/services/Soap/u/60.0). The version number (60.0) corresponds to the API version you are targeting.
- get_body: Constructs the XML body required by the SOAP API using the provided username, password, and security_token. It sanitizes the password and security_token by escaping special XML characters to prevent formatting errors.
- escape_chars: Escapes special XML characters in a string to ensure valid XML formatting. This is important because certain characters (like <, >, &, ', ") can break the XML structure if not properly escaped.
- get_headers: Returns the necessary headers for the authentication request. Specifically, it sets the SOAPAction to login and specifies the content type as text/xml with UTF-8 charset.
- extract_session_id: Parses the SOAP response XML to extract the session ID. It uses the namespaces defined in the response to correctly locate the sessionId element. In the SOAP response, elements are namespaced with prefixes like soapenv and ns. The namespaces dictionary maps these prefixes to their corresponding URIs, allowing ElementTree to correctly parse the XML and find the sessionId element using an XPath expression.
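The namespace handling is easier to see on a small sample. In the sketch below, the response is a trimmed, made-up stand-in (the session ID is a placeholder and a real response carries more fields). It deliberately uses a default namespace instead of an ns prefix, to show that ElementTree matches on the URI and not on the prefix text.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed-down login response used only for illustration.
sample_response = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns="urn:partner.soap.sforce.com">
  <soapenv:Body>
    <loginResponse>
      <result>
        <sessionId>EXAMPLE_SESSION_ID</sessionId>
      </result>
    </loginResponse>
  </soapenv:Body>
</soapenv:Envelope>"""

namespaces = {
    "soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
    "ns": "urn:partner.soap.sforce.com",
}

root = ET.fromstring(sample_response)

# Without the mapping, the unqualified path matches nothing.
unqualified = root.find(".//Body/loginResponse/result/sessionId")
print(unqualified)

# With the mapping, each prefix resolves to its URI and the element is found.
qualified = root.find(
    ".//soapenv:Body/ns:loginResponse/ns:result/ns:sessionId", namespaces)
print(qualified.text)
```

The prefixes in the namespaces dictionary are local to your code, so they do not have to match whatever prefixes the server happens to use.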
In Action
Once you have created the salesforce_authentication.py file (or cloned it from the repository), you can test the authentication by running the following code:
import salesforce_authentication as SA
username = 'your_username_here'
password = 'your_password_here'
security_token = 'your_security_token_here'
session_id = SA.get_session_id(username, password, security_token)
print(session_id)
Remember to replace 'your_username_here', 'your_password_here', and 'your_security_token_here' with your actual Salesforce credentials.
Alternatively, you might want to store your credentials securely, such as in environment variables or a configuration file that is not checked into version control.
Example Using Environment Variables
Here’s how you could modify the code to use environment variables:
import os
import salesforce_authentication as SA
username = os.environ.get('SALESFORCE_USERNAME')
password = os.environ.get('SALESFORCE_PASSWORD')
security_token = os.environ.get('SALESFORCE_SECURITY_TOKEN')
session_id = SA.get_session_id(username, password, security_token)
print(session_id)
This approach helps keep your credentials out of your codebase.
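One caveat: os.environ.get returns None for a variable that is not set, and that tends to surface later as a less obvious error. A small sketch of failing early instead is shown below. The helper name load_salesforce_env is hypothetical, and the variable list assumes the names used in this article.

```python
import os

REQUIRED_VARS = (
    "SALESFORCE_URL",
    "SALESFORCE_USERNAME",
    "SALESFORCE_PASSWORD",
    "SALESFORCE_SECURITY_TOKEN",
)


def load_salesforce_env(environ=os.environ):
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_salesforce_env() once at startup reports every missing variable in a single message, before any request is sent.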
Fetching Data Using the Session ID
Once you have the session ID, you can use it to make authenticated requests to Salesforce’s REST API. Below is a modified sneak peek at next week’s focus: grabbing records from Salesforce using SOQL and the REST API.
import os
import requests
from salesforce_authentication import get_session_id


def get_rest_query_results(query):
    """
    Executes a REST API GET request to Salesforce with the given query.

    :param query: The SOQL query string to execute.
    :return: The JSON response from the Salesforce API.
    :raises Exception: If the REST query fails with a non-200 status code.
    """
    response = requests.get(
        # SOQL goes to the /query resource as the "q" parameter; requests URL-encodes it.
        f"{os.environ['SALESFORCE_URL']}/services/data/v60.0/query",
        params={"q": query},
        headers={
            "Authorization": f"Bearer {fetch_session_id()}"
        }
    )
    if response.status_code != 200:
        raise Exception(f"Rest query failed with a status of {response.status_code}: {response.text}")
    return response.json()

def fetch_session_id():
    """
    Fetches the current Salesforce session ID using stored credentials.

    :return: The Salesforce session ID as a string.
    """
    credentials = get_credentials("salesforce")
    return get_session_id(
        credentials["salesforce_username"],
        credentials["salesforce_password"],
        credentials["salesforce_security_token"]
    )
In this snippet:
- get_rest_query_results: Executes a REST API GET request to Salesforce with the given SOQL query.
  - It constructs the request URL by appending the query to the base Salesforce URL.
  - It sets the Authorization header using the session ID obtained from fetch_session_id().
  - It checks for a successful response and returns the JSON data.
- fetch_session_id: Fetches the current Salesforce session ID using stored credentials.
  - get_credentials("salesforce"): A function that retrieves your Salesforce credentials from a secure location (e.g., environment variables, a configuration file, or a secrets manager).
  - It then calls get_session_id with the retrieved credentials.
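The snippet above does not show get_credentials itself. As a placeholder, here is one possible sketch backed by environment variables. This is an assumption about its shape, not the repository's implementation: the only things taken from the snippet are the function name and the dictionary keys that fetch_session_id reads.

```python
import os


def get_credentials(source, environ=os.environ):
    """Return a dict of credentials for the named source, read from environment variables."""
    prefix = source.upper()  # "salesforce" -> "SALESFORCE"
    fields = ("username", "password", "security_token")
    # Keys mirror the ones fetch_session_id looks up, e.g. "salesforce_username",
    # and each value comes from a variable such as SALESFORCE_USERNAME.
    return {
        f"{source}_{field}": environ[f"{prefix}_{field.upper()}"]
        for field in fields
    }
```

A missing variable raises KeyError here. Swapping the body for a secrets manager lookup would leave fetch_session_id unchanged, as long as the returned keys stay the same.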
Conclusion
Authenticating with Salesforce using the SOAP API is relatively straightforward once you understand the process, but there are gaps in the documentation that can make it a bit frustrating to figure out. I hope that what I’ve shown here is helpful to you, regardless of the programming language you’re using. My goal was to demonstrate the underlying mechanisms needed for authentication, rather than just showing “This is how you connect to Salesforce using Python.”
If I left anything unclear, or if you have any comments or questions, please leave them below.
What’s Next
In the next article, I plan to show how to fetch records from Salesforce using the REST API. This will include query generation, dealing with how to add fields to queries (since bounded and unbounded queries have different limitations), and handling Salesforce’s pagination structure.
In the article after that, I will delve into the Salesforce BULK API, which presents its own challenges, mainly surrounding the asynchronous query system, where you have to wait and check on the query status before you can fetch results.