I'm trying to write a script that collects customers' phone numbers and adds them to a DataFrame. I have a dataset with customer IDs like the one below. The phone number for each customer is stored on a web page, as a link.
| Date | ID | Comment | 
|---|---|---|
| 20240514 May, 14 22:00 | R_111 | Le client ne répond pas | 
My idea is to take the list of customer IDs from the DataFrame and use a library to look up the phone number for each ID. Example URL: example.com/client/?id=111
The order (ID) page looks like this:
%%html
<!doctype html>
<html>
    <head>
        <title>id 111</title>
    </head>
    <body>
    <div>
            <div id="contactButton" class="bg-primary-subtle py-2 px-3 rounded-3 text-primary fw-medium" style="cursor: pointer">
                Contact
            </div>
            <div class="d-flex flex-column position-relative mt-2 d-none" id="contactBlock">
                <div id="phone" class="position-absolute end-0 text-nowrap">
                    <a href="tel:+77777777777" class="btn btn-lg btn-outline-primary fw-medium">
                    </a>
                    <button class="btn btn-lg btn-outline-secondary fw-medium" data-bs-toggle="modal" data-bs-target="#exampleModal">
                     
                    </button>
                </div>
            </div>
        </div>
</body>
</html>
I want to end up with a DataFrame like this:
| ID | Phone | 
|---|---|
| R_111 | 777777777 | 
I wrote the following code:
import requests
from bs4 import BeautifulSoup
def get_client_phone(client_id):
    # Client page URL
    _url = f"https://example.com/client/?id={client_id}"
    # cfg.payload and headers are assumed to be defined elsewhere (not shown here)
    response = requests.get(_url, data=cfg.payload, headers=headers)
    
    # Stop early on any non-200 status
    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        return None
    # Parse page
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find phone
    phone_element = soup.find(id='phone')
    
    if phone_element:
        # Extract the phone number from the tel: link
        phone_link = phone_element.find('a', href=True)
        if phone_link:
            phone_number = phone_link['href'].replace('tel:', '')  # Remove 'tel:'
            return phone_number
    # Reached when either the #phone block or the link inside it is missing
    print("The phone was not found")
    return None
client_id = 'R_111' 
phone_number = get_client_phone(client_id)
if phone_number:
    print(f"Phone {client_id}: {phone_number}")
else:
    print("Error")
    
        
            
                
                    
It seems the extraction itself works and the open question is the DataFrame part. Extract the IDs from your DataFrame as a series, iterate over them, and collect the results in a list of dictionaries, which you can then easily turn back into a DataFrame.
import pandas as pd

# list or series of your ids
client_id_series = ['R_111','R_222']
pd.DataFrame(
    [
        {'ID':client_id,'Phone':get_client_phone(client_id)} 
        for client_id 
        in client_id_series
    ]
)
| ID | Phone | 
|---|---|
| R_111 | +77777777777 | 
| R_222 | +88888888888 | 
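Two details from the question may need a small normalization step: the example URL uses `?id=111` while the IDs in the DataFrame look like `R_111`, and the desired output shows the phone without the leading `+`. A minimal sketch, assuming the site expects only the numeric part of the ID and that only digits are wanted in the phone column. Both helpers (`url_id`, `digits_only`) are hypothetical names, not part of the original code:

```python
import re

def url_id(client_id):
    # Assumption: the site expects only the numeric part, e.g. "R_111" -> "111".
    match = re.search(r"\d+", client_id)
    return match.group(0) if match else None

def digits_only(phone):
    # Drop everything except digits, e.g. "+77777777777" -> "77777777777".
    # None (phone not found) is passed through unchanged.
    return re.sub(r"\D", "", phone) if phone else None
```

If that assumption holds, `url_id(client_id)` would go into the f-string that builds the URL, and `digits_only` could be applied to the value returned by `get_client_phone`.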
Or simply apply the function to the `ID` column of your existing DataFrame and add only one new column with the resulting phone number:
data = {
    'Date': ['20240514 May, 14 22:00', '20240514 May, 14 23:00'],
    'ID': ['R_111', 'R_222'],
    'Comment': ['Le client ne répond pas', None]
}
df = pd.DataFrame(data)
df['Phone'] = df['ID'].apply(get_client_phone)
print(df)
| | Date | ID | Comment | Phone |
|---|---|---|---|---|
| 0 | 20240514 May, 14 22:00 | R_111 | Le client ne répond pas | +77777777777 |
| 1 | 20240514 May, 14 23:00 | R_222 | None | +88888888888 |
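One caveat with `apply`: it calls the function once per row, so an ID that appears in several rows would be requested several times. A minimal sketch of one way around that, wrapping the lookup in `functools.lru_cache`. Here `slow_lookup` is a hypothetical offline stand-in for `get_client_phone`, and the `calls` list exists only to show how often it is reached:

```python
from functools import lru_cache

calls = []  # records which IDs were actually looked up

def slow_lookup(client_id):
    # Hypothetical stand-in for get_client_phone, so the sketch runs offline.
    calls.append(client_id)
    return {"R_111": "+77777777777", "R_222": "+88888888888"}.get(client_id)

@lru_cache(maxsize=None)
def cached_lookup(client_id):
    # Each distinct ID reaches slow_lookup once; repeats are served from the cache.
    return slow_lookup(client_id)

ids = ["R_111", "R_222", "R_111"]
phones = [cached_lookup(i) for i in ids]
```

With the real function, the same wrapper could be passed to `df['ID'].apply(...)`. Note that a failed lookup (`None`) is cached as well, so a retry would need `cached_lookup.cache_clear()` first.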

… `soup` to make sure that they deliver the expected content. – HedgeHog, commented Jan 24 at 9:52
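One way to act on that comment is to separate parsing from fetching, so the parsing step can be checked against a saved copy of the page before any request is made. A minimal sketch, assuming the markup matches the sample in the question. `parse_phone` and `SAMPLE_HTML` are hypothetical names, and the sample is a trimmed version of the page with a placeholder link text:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the page from the question; "Call" is a placeholder link text.
SAMPLE_HTML = """
<div id="contactBlock">
    <div id="phone">
        <a href="tel:+77777777777" class="btn">Call</a>
    </div>
</div>
"""

def parse_phone(html):
    # Same lookup as in get_client_phone, but on a string instead of a response.
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one('#phone a[href^="tel:"]')
    # Strip the "tel:" scheme; return None when no such link exists.
    return link["href"][len("tel:"):] if link else None
```

If `parse_phone(SAMPLE_HTML)` returns the number but `parse_phone(response.text)` returns `None`, the server is probably delivering different HTML to the script than to the browser.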