Reverse engineering the Facebook Messenger API

2023-04-06 21:07:50

I recently had occasion to reverse engineer the Facebook Messenger
API for personal use, and realized the case study would make a great
tutorial. This blog post lays out how I deciphered the API, and
explains the reverse engineering techniques I used, so that you can
go forth and do the same. The required background knowledge is fairly
minimal.

Remember: Reverse engineering is ethical, pro-democratic, and
protected under US law, but you still need to exercise integrity and
responsibility when interacting with any online system. Examples of
irresponsible behavior include:

  • Sending automated (or non-automated) spam to other users
  • Downloading people’s data without their consent
  • Putting undue load on infrastructure you aren’t paying for

This kind of behavior is inappropriate regardless of how it’s
accomplished. But when put to the right use, reverse engineering is a
way to give yourself and others greater agency, freedom, and
creativity online. For example, you could use it to develop an
alternative interface to an online system which would otherwise be
inaccessible to users with disabilities. Or you could adjust an
application to be runnable on older systems which are no longer
supported, which would benefit users who cannot afford to buy new
hardware.

Warning: Notwithstanding the above, Facebook likes to
automatically suspend and/or ban people who look at their API funny,
even if you aren’t doing anything harmful. Explore with caution.

Goal

Our goal here will be to develop a command-line program called
Messyger that allows:

  • Seeing your most recent conversations, and which ones have unread
    messages.
  • Sending a message to a conversation.

Of course, this isn’t enough for a full Messenger client, but it’s
enough to show off the techniques without having too much busywork. To
practice your own skills, you could add more capabilities after
reading this post.

We’ll use Python because it makes the code concise and easy to read.

The full code from this blog post is available on GitHub.

Get the email and password

Step 1 of using Messenger is providing your email address and password
to log in. The same will be true of Messyger:

import argparse

parser = argparse.ArgumentParser("messyger")
parser.add_argument("-u", "--email", required=True)
parser.add_argument("-p", "--password", required=True)
args = parser.parse_args()

print("email:", args.email)
print("password:", args.password)

(Read more about argparse.)

And usage:

% python3 messyger.py -u camilla.woodward@protonmail.com -p 0aSPlneurgscxzpuEZb9
email: camilla.woodward@protonmail.com
password: 0aSPlneurgscxzpuEZb9

And before you ask, no, none of these credentials are valid. Because
Facebook banned all of the accounts for having too much suspicious
activity…

Inspect the login form

Messenger login UI

So what happens when we click the login button? We can find out by
opening the developer tools in Chrome (or its equivalent in other
browsers; any will suffice) and switching to the Network tab to see
the list of all HTTP requests made by the browser while loading the
page.

Network tab on login UI

When we click the login button, what we see (assuming we first check
the “Preserve log” checkbox) is a new request showing up at the bottom
of the log, to the relevant-seeming URL
https://www.messenger.com/login/password/.

Login request in Network tab

This is a POST request, which means that data is being submitted to
the server, which is what we expect for a request to log in. (Read
more about HTTP request methods.)
If we scroll down, Chrome will show us the form data that was
submitted as part of this request, which indeed includes the email and
password:

Form data for login request

(Read more about form data.)

There are also a bunch of other parameters here, so we’ll need to
figure out what those mean, and whether they’re important. But first,
we should figure out how making this request actually results in us
being logged in.

Typically, logins are handled using cookies: you provide your
username and password, and the server gives you some cookies for
the browser to store. (Read more about cookies.)
Then, the cookies are included in all subsequent requests, allowing
the server to verify that you’ve already logged in.

If we scroll up and look at the response headers, we can see that
indeed the response uses the Set-Cookie header to pass some cookies
back to the browser.

Set-Cookie headers in login response

(Read more about HTTP response headers.)
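To make the Set-Cookie mechanics concrete, here is a minimal sketch that parses one such header with Python's standard http.cookies module. The header string is a sample shaped like the ones in the login response; the variable names are just for illustration.

```python
from http.cookies import SimpleCookie

# Sample Set-Cookie header value, shaped like the ones in the login response.
raw_header = (
    "c_user=100075402451059; expires=Mon, 21-Nov-2022 20:39:40 GMT; "
    "Max-Age=31535999; path=/; domain=.messenger.com; secure; SameSite=None"
)

jar = SimpleCookie()
jar.load(raw_header)

morsel = jar["c_user"]
cookie_value = morsel.value        # the part the browser sends back later
cookie_domain = morsel["domain"]   # which hosts the cookie will be sent to

# On later requests only the name=value pairs travel back, in a Cookie header.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print("Cookie:", cookie_header)
```

Everything after the first name=value pair (expiry, path, domain, flags) is an instruction to the browser about storage and scope; it is not echoed back to the server.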

Replicate the login request

Now that we’ve identified the request that’s used to log in, we’ll
want to replicate it outside of the browser, so that we have full
control over it. The goal is to take the email and password, and
exchange them for the cookies that will allow us to make subsequent
authenticated requests.

Luckily, Chrome (and other browsers) provide an easy way to do this.
You can right-click the request and extract a cURL command that will
do the same thing as the browser did, but from the command line.

Copy as cURL right-click menu

(Read more about cURL.)

Here’s what that looks like:

% curl 'https://www.messenger.com/login/password/' \
    -H 'authority: www.messenger.com' \
    -H 'pragma: no-cache' \
    -H 'cache-control: no-cache' \
    -H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
    -H 'sec-ch-ua-mobile: ?0' \
    -H 'sec-ch-ua-platform: "Linux"' \
    -H 'origin: https://www.messenger.com' \
    -H 'upgrade-insecure-requests: 1' \
    -H 'dnt: 1' \
    -H 'content-type: application/x-www-form-urlencoded' \
    -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
    -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
    -H 'sec-fetch-site: same-origin' \
    -H 'sec-fetch-mode: navigate' \
    -H 'sec-fetch-user: ?1' \
    -H 'sec-fetch-dest: document' \
    -H 'referer: https://www.messenger.com/' \
    -H 'accept-language: en-US,en;q=0.9' \
    -H 'cookie: wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
    --data-raw 'jazoest=2913&lsd=AVrs5S09Cjw&initial_request_id=APeMI6-a6r5592s5ETA6Zr5&timezone=480&lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ%3D%3D&lgnrnd=114743_C4xH&lgnjs=n&email=camilla.woodward%40protonmail.com&pass=0aSPlneurgscxzpuEZb9&login=1&persistent=1&default_persistent=' \
    --compressed

When we run this command, we’ll see that it finishes successfully but
doesn’t print anything. This is because cURL doesn’t print response
headers by default, and this request only returns headers (no body
content). We can add the -i option to display response headers,
which do appear to have the cookies we were expecting:

HTTP/2 302
set-cookie: sb=ja6aYcS61HGuWo-I6JaD_8G3; expires=Tue, 21-Nov-2023 20:39:41 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
set-cookie: c_user=100075402451059; expires=Mon, 21-Nov-2022 20:39:40 GMT; Max-Age=31535999; path=/; domain=.messenger.com; secure; SameSite=None
set-cookie: xs=36%3Adbs1ryav8jfpEg%3A2%3A1637527181%3A-1%3A-1; expires=Mon, 21-Nov-2022 20:39:40 GMT; Max-Age=31535999; path=/; domain=.messenger.com; secure; httponly; SameSite=None
location: https://www.messenger.com/
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0
document-policy: force-load-at-top
cross-origin-resource-policy: same-origin
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
pragma: no-cache
cache-control: private, no-cache, no-store, must-revalidate
expires: Sat, 01 Jan 2000 00:00:00 GMT
x-content-type-options: nosniff
x-xss-protection: 0
x-frame-options: DENY
access-control-expose-headers: X-FB-Debug, X-Loader-Length
access-control-allow-methods: OPTIONS
access-control-allow-credentials: true
access-control-allow-origin: https://www.messenger.com
vary: Origin
strict-transport-security: max-age=15552000; preload; includeSubDomains
content-type: text/html; charset="utf-8"
x-fb-debug: niLVdyLSHPRUwRyAzfXpDUEgukRqEdXXNZ1yK3fO1yjJG1z8FrzwfA1OMfo1QbiSxCnBZx72f1nk6HEXi44NDg==
content-length: 0
date: Sun, 21 Nov 2021 20:39:42 GMT
priority: u=3,i
alt-svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600

However, cURL syntax is a little annoying, and I personally prefer to
use HTTPie instead. Fortunately there’s a nice
tool called CurliPie that converts
cURL syntax to HTTPie. That gives us this:

% http -f https://www.messenger.com/login/password/ \
    Authority:www.messenger.com \
    Pragma:no-cache \
    Cache-Control:no-cache \
    Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
    Sec-Ch-Ua-Mobile:'?0' \
    Sec-Ch-Ua-Platform:Linux \
    Origin:https://www.messenger.com \
    Upgrade-Insecure-Requests:1 \
    Dnt:1 \
    Content-Type:application/x-www-form-urlencoded \
    User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
    Accept:'text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9' \
    Sec-Fetch-Site:same-origin \
    Sec-Fetch-Mode:navigate \
    Sec-Fetch-User:'?1' \
    Sec-Fetch-Dest:document \
    Referer:https://www.messenger.com/ \
    Accept-Language:'en-US, en;q=0.9' \
    Cookie:'wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
    jazoest=2913 \
    lsd=AVrs5S09Cjw \
    initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
    timezone=480 \
    lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ== \
    lgnrnd=114743_C4xH \
    lgnjs=n \
    email=camilla.woodward@protonmail.com \
    pass=0aSPlneurgscxzpuEZb9 \
    login=1 \
    persistent=1

Notice how the email and password are now separated out into different
arguments, instead of crammed into one big long string under
--data-raw.

Running the HTTPie command above shows us the headers in a nice format
by default, including (again) the cookies:
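If you would rather pull the --data-raw string apart yourself, the standard library can do it. The sketch below uses a shortened copy of the form body (several fields omitted) and, as a side observation, decodes the lgndim field, which turns out to be base64-encoded JSON. The variable names are illustrative only.

```python
import base64
import json
from urllib.parse import parse_qs

# Shortened form body from the cURL command (several fields omitted).
raw_body = (
    "lsd=AVrs5S09Cjw&timezone=480"
    "&lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ%3D%3D"
    "&email=camilla.woodward%40protonmail.com&pass=0aSPlneurgscxzpuEZb9"
)

# parse_qs splits on "&" and "=" and undoes the percent-encoding.
fields = {name: values[0] for name, values in parse_qs(raw_body).items()}
for name, value in fields.items():
    print(f"{name} = {value}")

# lgndim is base64-encoded JSON; decoding it shows what it carries.
lgndim = json.loads(base64.b64decode(fields["lgndim"]))
print(lgndim)
```

The decoded lgndim object holds what look like screen dimensions, which is a hint that the parameter is telemetry rather than something the login itself depends on.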

HTTP/1.1 302 Found
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS
Access-Control-Allow-Origin: https://www.messenger.com
Access-Control-Expose-Headers: X-FB-Debug, X-Loader-Length
Alt-Svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600
Cache-Control: private, no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Length: 0
Content-Type: text/html; charset="utf-8"
Date: Sun, 21 Nov 2021 20:56:53 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: https://www.messenger.com/
Pragma: no-cache
Priority: u=3,i
Set-Cookie: sb=lLKaYbQYjhQ1r-tdd337y6b6; expires=Tue, 21-Nov-2023 20:56:52 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Set-Cookie: c_user=100075402451059; expires=Mon, 21-Nov-2022 20:56:50 GMT; Max-Age=31535998; path=/; domain=.messenger.com; secure; SameSite=None
Set-Cookie: xs=36%3AxbGwJByz_Zpfag%3A2%3A1637528212%3A-1%3A-1; expires=Mon, 21-Nov-2022 20:56:50 GMT; Max-Age=31535998; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Strict-Transport-Security: max-age=15552000; preload; includeSubDomains
Vary: Origin
X-Content-Type-Options: nosniff
X-FB-Debug: vR9/wct/iva6TWZRO48tsnEYT1xrMyIErMwNH0P47uFA65WrEtUiMR38CY6p8NLdT2aIh1nXbSszogNuHE6Bng==
X-Frame-Options: DENY
X-XSS-Protection: 0
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
cross-origin-resource-policy: same-origin
document-policy: force-load-at-top
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0

Notice that the response says 302 Found, and includes a Location: https://www.messenger.com/ header. This instructs the browser (after setting
the relevant cookies) to redirect the user to
https://www.messenger.com/, where you will now see your
conversations. (Read more about HTTP response codes.)

Simplify the login request

That HTTP request has a lot of parameters in it! Browsers send lots
of headers by default, and websites will often add on a bunch more
for good measure, but usually most of the headers (and even form
parameters) are unneeded.

Once we have a working request in HTTPie, we can strip out parameters
one at a time to see which ones are actually required. For example, if
we change the password, it stops working:

% http -f https://www.messenger.com/login/password/ \
    Authority:www.messenger.com \
    Pragma:no-cache \
    Cache-Control:no-cache \
    Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
    Sec-Ch-Ua-Mobile:'?0' \
    Sec-Ch-Ua-Platform:Linux \
    Origin:https://www.messenger.com \
    Upgrade-Insecure-Requests:1 \
    Dnt:1 \
    Content-Type:application/x-www-form-urlencoded \
    User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
    Accept:'text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9' \
    Sec-Fetch-Site:same-origin \
    Sec-Fetch-Mode:navigate \
    Sec-Fetch-User:'?1' \
    Sec-Fetch-Dest:document \
    Referer:https://www.messenger.com/ \
    Accept-Language:'en-US, en;q=0.9' \
    Cookie:'wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
    jazoest=2913 \
    lsd=AVrs5S09Cjw \
    initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
    timezone=480 \
    lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ== \
    lgnrnd=114743_C4xH \
    lgnjs=n \
    email=camilla.woodward@protonmail.com \
    pass=thisiswrong \
    login=1 \
    persistent=1

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS
Access-Control-Allow-Origin: https://www.messenger.com
Access-Control-Expose-Headers: X-FB-Debug, X-Loader-Length
Alt-Svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600
Cache-Control: private, no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset="utf-8"
Date: Sun, 21 Nov 2021 22:33:31 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
Priority: u=3,i
Set-Cookie: sb=O8maYf6WN7nBh2pODWFo9bTa; expires=Tue, 21-Nov-2023 22:33:31 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Strict-Transport-Security: max-age=15552000; preload; includeSubDomains
Transfer-Encoding: chunked
Vary: Origin
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-FB-Debug: 1JwQc9JQPTRqGrUzIfY/OED6es6VSkBWrdeTj8XQ3BOF6nbEtDsuPSsQ52lfVuYvS/8Xz1BNlQadiGzFgGzisQ==
X-Frame-Options: DENY
X-XSS-Protection: 0
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
cross-origin-resource-policy: same-origin
document-policy: force-load-at-top
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0

<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
... a bunch of HTML ...

Notice how we now get a 200 OK response instead of 302 Found, and
the c_user and xs cookies aren’t getting set anymore, which means
our login attempt is failing. (We’d expect to see 401 Unauthorized or 403 Forbidden instead of 200 OK for a failed
login attempt, but servers don’t always return the most sensible
status codes.)

If we go through removing each parameter that can be removed without
losing the cookies, then this is what we end up with:
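Because a failed login comes back as 200 OK here, a script cannot rely on the usual "4xx means failure" convention. One way to express the check, as a small sketch (the function name and its inputs are made up for illustration), is to look for the redirect and the session cookies together:

```python
def login_succeeded(status_code, cookie_names):
    """Judge a login attempt by its observable effects, not the status alone."""
    redirected = status_code == 302
    # c_user and xs are the cookies that only appeared on a successful login.
    has_session = {"c_user", "xs"}.issubset(cookie_names)
    return redirected and has_session


# Values taken from the two responses shown above.
good_attempt = login_succeeded(302, ["sb", "c_user", "xs"])
bad_attempt = login_succeeded(200, ["sb"])
print(good_attempt, bad_attempt)
```

Checking both signals guards against the case where the server changes one of them without the other.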

% http -f https://www.messenger.com/login/password/ \
    Cookie:'datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
    lsd=AVrs5S09Cjw \
    initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
    email=camilla.woodward@protonmail.com \
    pass=0aSPlneurgscxzpuEZb9
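The strip-one-at-a-time process can also be written down as a loop. The sketch below is not what I ran (I did it by hand in HTTPie); `minimize` and `still_works` are hypothetical names, and the "server" is a fake predicate so the example runs offline.

```python
def minimize(params, still_works):
    """Drop parameters one at a time, keeping those whose removal breaks things.

    `still_works` is a hypothetical callback that would send the request with
    the given parameters and return True if the login cookies came back.
    """
    required = dict(params)
    for name in list(params):
        trial = {k: v for k, v in required.items() if k != name}
        if still_works(trial):
            required = trial  # the request survived without it, so drop it
    return required


# Stand-in for a real request: pretend the server needs exactly these fields.
NEEDED = {"lsd", "initial_request_id", "email", "pass"}

def fake_still_works(trial):
    return NEEDED.issubset(trial)

full = {
    "jazoest": "2913", "lsd": "AVrs5S09Cjw",
    "initial_request_id": "APeMI6-a6r5592s5ETA6Zr5", "timezone": "480",
    "lgnjs": "n", "email": "user@example.com", "pass": "hunter2",
    "login": "1", "persistent": "1",
}
minimal = minimize(full, fake_still_works)
print(sorted(minimal))
```

Against a real service every trial is a real login attempt, so with a real `still_works` you would want generous delays between calls; given the warning at the top of this post, doing it by hand may be the safer choice.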

Track down hidden login parameters

We’ve now simplified the login command a lot, but what are those
datr, lsd, and initial_request_id values? Generally, when
seeing parameters like these in outgoing requests, there are three
possibilities:

  1. The client got the value from the server on a previous request, and
    is just sending it back.
  2. The client is generating the value from scratch (e.g. based on the
    current timestamp, or a random number generator).
  3. Some combination of the two (the client gets a value from the
    server and then modifies it in some way before sending it back).

A lot of reverse engineering is making educated guesses and seeing if
they pan out. Let’s make a guess that case (1) is what’s happening
here. This seems especially likely because initial_request_id sounds
like it’s referring to a previous request. There’s really only one
significant request that happens before the login request, which is
the initial HTML page containing the login form. So, a natural place
to start is to look at that page to see if it has anything that
looks relevant.

To do so, we can open a new private browsing window and load up
Messenger again. Then, we can go to the Sources tab to see the HTML
that’s being used to display the current page. (You might need to
reload the page if you loaded it before opening the developer tools.)

Source tab for login form

Fortunately, Chrome (as with other browsers) has a neat feature where
it will reformat code in the Sources tab to be easier to read. (If
there isn’t a popup telling you about it, it’s the button labeled {}
in the lower-left of the text pane.)

In the reformatted HTML, if we search for initial_request_id, check
out what we find:

Search results for initial_request_id in login form source

Not only is there a suspicious-looking value for initial_request_id
included in the HTML (AovnGK3QdvNgThxCxYXmSDz), but just a little
above it is a value for lsd (AVpERbgkGxw)! If we refresh the page,
we’ll find that these values change every time. Since the values are
included as <input> tags in the <form> for the login button,
they’ll automatically get submitted along with the email address and
password, just like we saw in the Network tab earlier.

Elsewhere in the page is a value for datr
(a26cYYDoj0oHu9oca8jmB8W6):

Search results for datr in login form source

If we want to check these values are used the way we think, we can
click the login button and see that the values in the POST request
match up as expected:

Form data matching up with parameters extracted from HTML

There’s a little gotcha that I ran into the first time I investigated
this. If you reload the page before logging in, mysteriously the
datr value goes away from the HTML! Correspondingly, by switching to
the Application tab and selecting https://www.messenger.com under
Cookies, you can see that there are no cookies the first time we load
the page:

No cookies set before reloading

But then when we reload… suddenly, cookies!

Cookies set after reloading

This is particularly tricky because you wouldn’t expect the HTML
response to magically change just by reloading the page. If I had to
guess, there’s some JavaScript on the frontend that detects when
you’re about to leave the page (e.g. by reloading), and sets the
datr cookie. Then, that cookie value is automatically included in
the new request, which causes the server to change the HTML to no
longer include a _js_datr value for some reason.

Parse the HTML response

Okay, now we know how to log in to Messenger:

  1. Fetch the HTML page at https://www.messenger.com
  2. Extract values for initial_request_id, lsd, and datr from
    various places in the HTML
  3. Make a POST request to https://www.messenger.com/login/password/
    with those values alongside the email address and password
  4. Extract the xs, sb, and c_user cookies from the response
    headers

Let’s finally get back to Messyger and implement these steps. We’ll
use the Requests
library to simplify our HTTP requests.

Step 1 is fairly easy with Requests. We make a request, check that
there was no error, and then get the HTML text:

import requests

html_resp = requests.get("https://www.messenger.com")
html_resp.raise_for_status()
html_page = html_resp.text

print(html_page)

For step 2, we’ll want to start by looking in the raw HTML that we
just printed to see where the values we want to extract are located.
Let’s start with initial_request_id:

<input type="hidden" autocomplete="off" id="initial_request_id" name="initial_request_id" value="AS49ZKW_DYimevm1SD-qQ9Q" />

The part that we really care about is
value="AS49ZKW_DYimevm1SD-qQ9Q". However, the text value= shows up
a lot of times in the HTML, so we also need the preceding
name="initial_request_id" to make sure we’re looking at the right
value.

Now that we know what we’re looking for (i.e. something like
name="initial_request_id" value="AS49ZKW_DYimevm1SD-qQ9Q"), we can
write a regular expression to
search for this pattern. That looks like this:

import re

initial_request_id = re.search(
    r'name="initial_request_id" value="([^"]+)"',
    html_page
).group(1)

print(initial_request_id)

(Read more about the Python re module.)

In this regular expression, [^"] stands for any character other than
a double quote, + means one or more, and the parentheses create a
sub-expression whose value can be returned by calling the .group()
method.

The lsd parameter occurs in HTML that looks like this:

<input type="hidden" name="lsd" value="AVosEZyGrXU" autocomplete="off" />

We can write a similar regular expression to extract its value:

lsd = re.search(
    r'name="lsd" value="([^"]+)"',
    html_page
).group(1)
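One thing to keep in mind is that re.search returns None when the pattern is not found, so calling .group(1) directly fails with an unhelpful AttributeError if the page layout ever changes. A small wrapper gives a clearer message. This is a sketch: the `extract` helper is my own naming, and the sample HTML (including the decoy input) is made up to mirror the login page.

```python
import re

def extract(pattern, html, label):
    """Return the first capture group, or raise a readable error if absent."""
    match = re.search(pattern, html)
    if match is None:
        raise ValueError(f"could not find {label} in the page")
    return match.group(1)

# Sample markup shaped like the login page's hidden inputs.
sample_html = (
    '<input type="hidden" name="decoy" value="2913" />'
    '<input type="hidden" autocomplete="off" id="initial_request_id" '
    'name="initial_request_id" value="AS49ZKW_DYimevm1SD-qQ9Q" />'
)

request_id = extract(
    r'name="initial_request_id" value="([^"]+)"',
    sample_html,
    "initial_request_id",
)
print(request_id)
```

Anchoring on the name attribute is what lets the pattern skip past the decoy value= earlier in the markup.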

print(lsd)

The datr parameter looks a bit different:

["_js_datr","-nacYQnDcFLPM5Sc66w7KQKG",63072000000,"/",true]

However, it’s not much harder to extract:

datr = re.search(
    r'"_js_datr","([^"]+)"',
    html_page
).group(1)

print(datr)

Make the login request

Now that we have all the necessary parameters, we can use the Requests
module to actually perform the login request to Messenger. As a
reminder, the login request using HTTPie looks like this:

% http -f https://www.messenger.com/login/password/ \
    Cookie:'datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
    lsd=AVrs5S09Cjw \
    initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
    email=camilla.woodward@protonmail.com \
    pass=0aSPlneurgscxzpuEZb9

We can replicate it in Python like so:

login = requests.post(
    "https://www.messenger.com/login/password/",
    cookies={"datr": datr},
    data={
        "lsd": lsd,
        "initial_request_id": initial_request_id,
        "email": args.email,
        "pass": args.password
    },
    allow_redirects=False  # don't follow the 302
)
assert login.status_code == 302

# Convert the cookie jar to a plain dict so it prints readably.
print(dict(login.cookies))

(The .cookies property in a Requests response contains the values
that were set by the Set-Cookie headers in the response.)

And here’s what we get from running the whole script so far:

% python3 messyger.py -u camilla.woodward@protonmail.com -p 0aSPlneurgscxzpuEZb9
{'c_user': '100075402451059', 'sb': 'yXmcYTct5V-EAvEuXfPrArCj', 'xs': '49%3AZuqoxpqqnfwF_A%3A2%3A1637644745%3A-1%3A-1'}

Success!
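As a quick aside, cookie values like xs are worth poking at, since their structure sometimes hints at what the server tracks. The sketch below percent-decodes the value printed above and splits it into fields. Reading the fourth field as a Unix timestamp is my assumption; it is supported by the earlier login responses, where the same field falls within a second of the Date header.

```python
from datetime import datetime, timezone
from urllib.parse import unquote

xs_raw = "49%3AZuqoxpqqnfwF_A%3A2%3A1637644745%3A-1%3A-1"

# %3A is a percent-encoded colon, so the value is really colon-separated.
xs_fields = unquote(xs_raw).split(":")
print(xs_fields)

# Assumption: the fourth field is a Unix timestamp for when the session began.
issued = datetime.fromtimestamp(int(xs_fields[3]), tz=timezone.utc)
print(issued.isoformat())
```

None of this is needed to use the cookie, which can be sent back exactly as received.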

Find the inbox request

Now that we’ve logged in, we should be able to fetch data about our
Messenger account. We’ll start by trying to get the information shown
in the left-hand side of the Messenger interface: your list of
conversations.

Requests after logging in

Since Messenger is a highly interactive application, it’s pretty
unlikely for it to send that information encoded as raw HTML (although
we could check by searching for a string like Hello Camilla in the HTML
response). Rather, it’s more likely that the information will be
fetched asynchronously via JavaScript. For various historical reasons,
this is called an XHR request, where XHR stands for XMLHttpRequest
despite having nothing to do with XML. (Read more about the different
types of asynchronous requests in JavaScript.)
We can filter for XHR requests in the Network tab:

XHR requests in Network inspector

However, there are a bunch of them, so it would be a pain to look at
each one and try to figure out if it has the data we’re looking for.
To deal with this problem, we can download all the request and
response data as an HTTP Archive (HAR), so that we can search through
all of them at the same time:

Right-click menu for "Save all as HAR with content"

The HAR format is actually just JSON, so we can use a tool like
jq to look through it. First we’ll
check the list of all the URLs that the browser made requests to:

% cat inbox-requests.har | jq '.log.entries | map(.request.url)'
[
  "https://www.messenger.com/login/password/",
  "https://www.messenger.com/",
  "https://www.messenger.com/t/100007424414992/",
  ... a bunch more URLs ...
  "https://www.messenger.com/ajax/bnzai?__a=1&__ccg=EXCELLENT&__comet_req=1&__hs=18957.HYP%3Amessengerdotcom_comet_pkg.2.1.0.0.&__hsi=0-0&__jssesw=1&__req=g&__rev=1004771992&__s=lryd0q%3Amohw6t%3Axnxaps&__spin_b=trunk&__spin_r=1004771992&__spin_t=1637886944&__user=100075402451059&dpr=2&fb_dtsg=AQE4bjFlv-4P3Xs%3A50%3A1637886942&jazoest=21949&lsd=M237eS5ouvAFHBqYl3StT7&ph=C3",
  "https://www.messenger.com/ajax/webstorage/process_keys/?state=0",
  "https://www.messenger.com/ajax/webstorage/process_keys/?state=0"
]

(Read more about how to use jq.)

The jq command above is equivalent to the following Python code, and
you could do it this way too, if it feels more comfortable:

import json
with open("inbox-requests.har") as f:
    requests = json.load(f)

urls = []
for entry in requests["log"]["entries"]:
    urls.append(entry["request"]["url"])

print(json.dumps(urls, indent=2))

(Read more about the Python json module.)

However, I like jq because once you learn how to use it, it’s a lot
faster than writing Python or searching through JSON by hand.

Now, rather than printing every URL, let’s print only the ones whose
responses contained the string Hello Camilla. This should allow us to
identify which ones have the data that shows up in the sidebar:

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("Hello Camilla"))
                    | .request.url)'
[
  "https://www.messenger.com/api/graphql/"
]

Or equivalently:

urls = []
for entry in requests["log"]["entries"]:
    try:
        if "Hello Camilla" in entry["response"]["content"]["text"]:
            urls.append(entry["request"]["url"])
    except KeyError:  # ignoring missing keys is ".?" in jq
        continue

print(json.dumps(urls, indent=2))

Great! There’s a request to https://www.messenger.com/api/graphql/
that returns the data we want. Unfortunately, the HAR download doesn’t
have a convenient “Copy as cURL” option, so we’ll go back to the
browser to do that.
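Since the same kind of search comes up again later for other strings, the Python loop can be wrapped in a small helper. This is only a sketch: the name urls_containing and the tiny in-memory HAR below are made up for illustration, and it uses dict.get instead of try/except so that entries without a saved response body are simply skipped.

```python
def urls_containing(har, needle):
    """Return the request URLs whose response body contains needle."""
    urls = []
    for entry in har["log"]["entries"]:
        # Not every entry has a saved body, so fall back to "" when the
        # "content" or "text" key is missing or null.
        content = entry.get("response", {}).get("content", {})
        text = content.get("text") or ""
        if needle in text:
            urls.append(entry["request"]["url"])
    return urls


# Made-up HAR structure, just enough to exercise the helper.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/a"},
                "response": {"content": {"text": "... Hello Camilla ..."}},
            },
            {
                "request": {"url": "https://example.com/b"},
                "response": {"content": {}},
            },
        ]
    }
}

print(urls_containing(sample_har, "Hello Camilla"))
```

With a real capture, the har argument would be the result of json.load on inbox-requests.har, as in the earlier snippet.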

The main tricky bit is there are actually a bunch of different
requests to this same endpoint, and we need to know which one to copy.
I’m sure there are plenty of smart ways to get around this, but I just
did it the easy way by printing out the requests that came
before and after it, so I could match them up visually.

% cat inbox-requests.har \
    | jq '.log.entries
            | map({url: .request.url,
                   hasMyData: (.response.content.text | .?
                                 | contains("Hello Camilla"))})'
... bunch of requests ...
  {
    "url": "https://static.xx.fbcdn.net/rsrc.php/v3/yy/r/DeeNYB34aTG.js?_nc_x=0OMkmbJTxss",
    "hasMyData": false
  },
  {
    "url": "https://static.xx.fbcdn.net/rsrc.php/ym/r/YQbyhl59TWY.ico",
    "hasMyData": false
  },
  {
    "url": "https://www.messenger.com/api/graphql/",
    "hasMyData": true
  },
  {
    "url": "https://www.messenger.com/api/graphql/",
    "hasMyData": false
  },
  {
    "url": "https://www.messenger.com/ajax/bootloader-endpoint/?modules=TransportSelectingClientSingletonpercent2CRequestStreamCommonRequestStreamCommonTypes&__user=100075402451059&__a=1&__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0&__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway&__req=3&__hs=18957.HYPpercent3Amessengerdotcom_comet_pkg.2.1.0.0.&dpr=2&__ccg=EXCELLENT&__rev=1004771992&__s=lryd0qpercent3Amohw6tpercent3Axnxaps&__hsi=0-0&__comet_req=1&fb_dtsg_ag=AQyUiaFmn4dTZRTOD6xvQejSbiWCOkwF_hArsm6-mgIByBR7percent3A50percent3A1637886942&jazoest=25004&__spin_r=1004771992&__spin_b=trunk&__spin_t=1637886944&__jssesw=1",
    "hasMyData": false
  },
... bunch of requests ...
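One of the smarter ways alluded to above might be to count, for each matching entry, which request to that URL it was (first, second, and so on). The sketch below does that in Python. The function name and sample data are hypothetical, and it assumes the HAR lists entries in the same order the Network tab shows them, which is worth double-checking against your own capture.

```python
def ordinal_among_same_url(har, needle):
    """For each entry whose response contains needle, report its URL and
    its 1-based position among all requests to that same URL."""
    seen = {}     # url -> number of requests to that URL so far
    matches = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        seen[url] = seen.get(url, 0) + 1
        text = entry.get("response", {}).get("content", {}).get("text") or ""
        if needle in text:
            matches.append((url, seen[url]))
    return matches


# Made-up entries mirroring the shape of the listing above.
GRAPHQL = "https://www.messenger.com/api/graphql/"
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/favicon.ico"},
             "response": {"content": {}}},
            {"request": {"url": GRAPHQL},
             "response": {"content": {"text": "... Hello Camilla ..."}}},
            {"request": {"url": GRAPHQL},
             "response": {"content": {"text": "something else"}}},
        ]
    }
}

for url, position in ordinal_among_same_url(sample_har, "Hello Camilla"):
    print(f"request #{position} to {url}")
```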

Then it was just a matter of finding the corresponding place in the
browser Network tab (note that the Network tab only shows the last bit
of each URL, after the last or second-to-last slash):

Copying the login request as
cURL

That gives us this:

% curl 'https://www.messenger.com/api/graphql/' 
    -H 'authority: www.messenger.com' 
    -H 'pragma: no-cache' 
    -H 'cache-control: no-cache' 
    -H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' 
    -H 'dnt: 1' 
    -H 'sec-ch-ua-mobile: ?0' 
    -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' 
    -H 'x-fb-friendly-name: LSPlatformGraphQLLightspeedRequestQuery' 
    -H 'x-fb-lsd: M237eS5ouvAFHBqYl3StT7' 
    -H 'content-type: application/x-www-form-urlencoded' 
    -H 'sec-ch-ua-platform: "Linux"' 
    -H 'accept: */*' 
    -H 'origin: https://www.messenger.com' 
    -H 'sec-fetch-site: same-origin' 
    -H 'sec-fetch-mode: cors' 
    -H 'sec-fetch-dest: empty' 
    -H 'referer: https://www.messenger.com/t/100007424414992/' 
    -H 'accept-language: en-US,en;q=0.9' 
    -H 'cookie: wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1' 
    --data-raw 'av=100075402451059&__user=100075402451059&__a=1&__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0&__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway&__req=1&__hs=18957.HYPpercent3Amessengerdotcom_comet_pkg.2.1.0.0.&dpr=2&__ccg=EXCELLENT&__rev=1004771992&__s=lryd0qpercent3Amohw6tpercent3Axnxaps&__hsi=0-0&__comet_req=1&fb_dtsg=AQE4bjFlv-4P3Xspercent3A50percent3A1637886942&jazoest=21949&lsd=M237eS5ouvAFHBqYl3StT7&__spin_r=1004771992&__spin_b=trunk&__spin_t=1637886944&__jssesw=1&fb_api_caller_class=RelayModern&fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery&variables=%7Bpercent22deviceIdpercent22percent3Apercent226a9252cb-2145-4f81-9d69-1834b84ba614percent22percent2Cpercent22requestIdpercent22percent3A0percent2Cpercent22requestPayloadpercent22percent3Apercent22percent7Bpercent5Cpercent22databasepercent5Cpercent22percent3A1percent2Cpercent5Cpercent22versionpercent5Cpercent22percent3A4680497022042598percent2Cpercent5Cpercent22sync_paramspercent5Cpercent22percent3Apercent5Cpercent22percent7Bpercent5Cpercent5Cpercent5Cpercent22scalepercent5Cpercent5Cpercent5Cpercent22percent3A1percent2Cpercent5Cpercent5Cpercent5Cpercent22preview_heightpercent5Cpercent5Cpercent5Cpercent22percent3A200percent2Cpercent5Cpercent5Cpercent5Cpercent22preview_widthpercent5Cpercent5Cpercent5Cpercent2
2percent3A150percent2Cpercent5Cpercent5Cpercent5Cpercent22preview_height_largepercent5Cpercent5Cpercent5Cpercent22percent3A400percent2Cpercent5Cpercent5Cpercent5Cpercent22preview_width_largepercent5Cpercent5Cpercent5Cpercent22percent3A300percent2Cpercent5Cpercent5Cpercent5Cpercent22full_heightpercent5Cpercent5Cpercent5Cpercent22percent3A200percent2Cpercent5Cpercent5Cpercent5Cpercent22snapshot_num_threads_per_pagepercent5Cpercent5Cpercent5Cpercent22percent3A15percent2Cpercent5Cpercent5Cpercent5Cpercent22localepercent5Cpercent5Cpercent5Cpercent22percent3Apercent5Cpercent5Cpercent5Cpercent22en_USpercent5Cpercent5Cpercent5Cpercent22percent7Dpercent5Cpercent22percent2Cpercent5Cpercent22epoch_idpercent5Cpercent22percent3A0percent2Cpercent5Cpercent22last_applied_cursorpercent5Cpercent22percent3Anullpercent7Dpercent22percent2Cpercent22requestTypepercent22percent3A1percent7D&server_timestamps=true&doc_id=4476599072415612' 
    --compressed

If we run that cURL command and search in the output (for example, by
putting | grep -o "Hello Camilla" on the end; read more about grep
options), we can see that the response does indeed have the data
we’re looking for.

Make the inbox request more readable

Although the cURL command works, that --data-raw parameter is
absolutely disgusting, so let’s convert it to HTTPie syntax using
CurliPie:

% http -f https://www.messenger.com/api/graphql/ 
    Authority:www.messenger.com 
    Pragma:no-cache 
    Cache-Control:no-cache 
    Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' 
    Dnt:1 
    Sec-Ch-Ua-Mobile:'?0' 
    User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' 
    X-Fb-Friendly-Name:LSPlatformGraphQLLightspeedRequestQuery 
    X-Fb-Lsd:M237eS5ouvAFHBqYl3StT7 
    Content-Type:application/x-www-form-urlencoded 
    Sec-Ch-Ua-Platform:Linux 
    Accept:'*/*' 
    Origin:https://www.messenger.com 
    Sec-Fetch-Site:same-origin 
    Sec-Fetch-Mode:cors 
    Sec-Fetch-Dest:empty 
    Referer:https://www.messenger.com/t/100007424414992/ 
    Accept-Language:'en-US, en;q=0.9' 
    Cookie:'wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1' 
    av=100075402451059 
    __user=100075402451059 
    __a=1 
    __dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0 
    __csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway 
    __req=1 
    __hs=18957.HYP:messengerdotcom_comet_pkg.2.1.0.0. 
    dpr=2 
    __ccg=EXCELLENT 
    __rev=1004771992 
    __s=lryd0q:mohw6t:xnxaps 
    __hsi=0-0 
    __comet_req=1 
    fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942 
    jazoest=21949 
    lsd=M237eS5ouvAFHBqYl3StT7 
    __spin_r=1004771992 
    __spin_b=trunk 
    __spin_t=1637886944 
    __jssesw=1 
    fb_api_caller_class=RelayModern 
    fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery 
    variables='{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{\"database\":1,\"version\":4680497022042598,\"sync_params\":\"{\\\"scale\\\":1,\\\"preview_height\\\":200,\\\"preview_width\\\":150,\\\"preview_height_large\\\":400,\\\"preview_width_large\\\":300,\\\"full_height\\\":200,\\\"snapshot_num_threads_per_page\\\":15,\\\"locale\\\":\\\"en_US\\\"}\",\"epoch_id\":0,\"last_applied_cursor\":null}","requestType":1}' 
    server_timestamps=true 
    doc_id=4476599072415612

Now if we run this command, we’ll actually see that it produces a
different result than the cURL one, which shouldn’t happen! Specifically:

{
    "data": {
        "viewer": {
            "lightspeed_web_request": null
        }
    },
    "errors": [
        {
            "message": "A server error missing_required_variable_value occured. Check server logs for details.",
            "severity": "WARNING"
        },
... a bunch more scary text ...

Well, we can’t check the server logs, but presumably something about
the conversion from cURL to HTTPie messed up the request. How can we
debug things when something like this happens?

Well, one way is to use a service like httpbin
to check what requests your tools are actually sending out to the
internet. Here’s an example of sending ostensibly the same request to
httpbin using cURL and HTTPie, and seeing that the two requests were
actually not quite identical (e.g., the User-Agent header was
different):

% curl https://httpbin.org/post \
       -H 'example-header: foobar' \
       --data-raw 'param1=baz&param2=quux'
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "param1": "baz",
    "param2": "quux"
  },
  "headers": {
    "Accept": "*/*",
    "Content-Length": "22",
    "Content-Type": "application/x-www-form-urlencoded",
    "Example-Header": "foobar",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.74.0",
    "X-Amzn-Trace-Id": "Root=1-61a036e4-21a4901c0f5a2bc9091cab5a"
  },
  "json": null,
  "origin": "67.180.179.80",
  "url": "https://httpbin.org/post"
}

% http -f https://httpbin.org/post \
    example-header:foobar \
    param1=baz \
    param2=quux
{
    "args": {},
    "data": "",
    "files": {},
    "form": {
        "param1": "baz",
        "param2": "quux"
    },
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "22",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Example-Header": "foobar",
        "Host": "httpbin.org",
        "User-Agent": "HTTPie/2.2.0",
        "X-Amzn-Trace-Id": "Root=1-61a03726-478a4c12219627695c0d62e9"
    },
    "json": null,
    "origin": "67.180.179.80",
    "url": "https://httpbin.org/post"
}
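Spotting differences like these by eye gets tedious once the requests are larger. Here is a minimal sketch of doing the comparison in Python with the standard difflib module. The function name diff_responses, the trimmed-down sample responses, and the decision to ignore the X-Amzn-Trace-Id header (which differs on every request) are all assumptions made for illustration.

```python
import difflib
import json


def diff_responses(first, second, ignore=("X-Amzn-Trace-Id",)):
    """Return unified-diff lines comparing the form data and headers of
    two httpbin-style responses, skipping headers listed in ignore."""

    def normalize(resp):
        headers = {k: v for k, v in resp["headers"].items() if k not in ignore}
        doc = {"form": resp["form"], "headers": headers}
        # Sorted keys keep the two dumps aligned line by line.
        return json.dumps(doc, indent=2, sort_keys=True).splitlines()

    return list(
        difflib.unified_diff(
            normalize(first),
            normalize(second),
            fromfile="curl",
            tofile="httpie",
            lineterm="",
        )
    )


# Trimmed copies of the two responses shown above.
curl_resp = {
    "form": {"param1": "baz", "param2": "quux"},
    "headers": {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "curl/7.74.0",
        "X-Amzn-Trace-Id": "Root=1-61a036e4-21a4901c0f5a2bc9091cab5a",
    },
}
httpie_resp = {
    "form": {"param1": "baz", "param2": "quux"},
    "headers": {
        "Accept-Encoding": "gzip, deflate",
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "User-Agent": "HTTPie/2.2.0",
        "X-Amzn-Trace-Id": "Root=1-61a03726-478a4c12219627695c0d62e9",
    },
}

print("\n".join(diff_responses(curl_resp, httpie_resp)))
```

Lines prefixed with - come from the cURL response and lines prefixed with + from the HTTPie one; identical form data shows up only as unprefixed context, if at all.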

By replacing https://www.messenger.com/api/graphql/ with
https://httpbin.org/post in our cURL and HTTPie commands above, then
carefully comparing the output (maybe with the aid of a command like
git diff --no-index to highlight differences between two files;
read more about git diff --no-index), we can find out
that HTTPie is doing something peculiar to the backslashes in the
variables= argument. Here’s a simpler example to show the behavior:

% curl -s https://httpbin.org/post \
       --data-urlencode backslashes='\\\\' \
    | jq .form.backslashes -r
\\\\

% http -f https://httpbin.org/post \
          backslashes='\\\\' \
    | jq .form.backslashes -r
\\

(Using -r tells jq to print the value as a raw string, instead of as
a JSON string with quotes. Using --data-urlencode instead of
--data-raw means we don’t have to worry about URL encoding ourselves.
Using -s prevents cURL from printing out a progress bar.)

With cURL, it’s four backslashes in, four backslashes out. But with
HTTPie, it’s four backslashes in, only two backslashes out! Why?
Well, if we Google, we end up finding this GitHub issue that mentions
HTTPie allows the use of backslash-escaping in form
parameters. This feature has the implication that if you actually
want to include a backslash in your form parameters, you need to
double it (use two backslashes instead of one). Indeed:

% http -f https://httpbin.org/post \
          backslashes='\\\\\\\\' \
    | jq .form.backslashes -r
\\\\

So, if we double every backslash in the request, we end up with a
working HTTPie command line to fetch our inbox data:

% http -f https://www.messenger.com/api/graphql/ 
    Authority:www.messenger.com 
    Pragma:no-cache 
    Cache-Control:no-cache 
    Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' 
    Dnt:1 
    Sec-Ch-Ua-Mobile:'?0' 
    User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' 
    X-Fb-Friendly-Name:LSPlatformGraphQLLightspeedRequestQuery 
    X-Fb-Lsd:M237eS5ouvAFHBqYl3StT7 
    Content-Type:application/x-www-form-urlencoded 
    Sec-Ch-Ua-Platform:Linux 
    Accept:'*/*' 
    Origin:https://www.messenger.com 
    Sec-Fetch-Site:same-origin 
    Sec-Fetch-Mode:cors 
    Sec-Fetch-Dest:empty 
    Referer:https://www.messenger.com/t/100007424414992/ 
    Accept-Language:'en-US, en;q=0.9' 
    Cookie:'wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1' 
    av=100075402451059 
    __user=100075402451059 
    __a=1 
    __dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0 
    __csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway 
    __req=1 
    __hs=18957.HYP:messengerdotcom_comet_pkg.2.1.0.0. 
    dpr=2 
    __ccg=EXCELLENT 
    __rev=1004771992 
    __s=lryd0q:mohw6t:xnxaps 
    __hsi=0-0 
    __comet_req=1 
    fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942 
    jazoest=21949 
    lsd=M237eS5ouvAFHBqYl3StT7 
    __spin_r=1004771992 
    __spin_b=trunk 
    __spin_t=1637886944 
    __jssesw=1 
    fb_api_caller_class=RelayModern 
    fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery 
    variables='{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{\\"database\\":1,\\"version\\":4680497022042598,\\"sync_params\\":\\"{\\\\\\"scale\\\\\\":1,\\\\\\"preview_height\\\\\\":200,\\\\\\"preview_width\\\\\\":150,\\\\\\"preview_height_large\\\\\\":400,\\\\\\"preview_width_large\\\\\\":300,\\\\\\"full_height\\\\\\":200,\\\\\\"snapshot_num_threads_per_page\\\\\\":15,\\\\\\"locale\\\\\\":\\\\\\"en_US\\\\\\"}\\",\\"epoch_id\\":0,\\"last_applied_cursor\\":null}","requestType":1}' 
    server_timestamps=true 
    doc_id=4476599072415612
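Doubling backslashes by hand in a value that long is error-prone, so it may be easier to let a couple of lines of Python do it. This is a sketch that handles only the backslash doubling described above; the helper name httpie_escape is made up, and I have not checked whether other characters need similar treatment in HTTPie request items.

```python
def httpie_escape(value):
    """Double every backslash so that HTTPie's backslash-escaping
    leaves the original value intact."""
    return value.replace("\\", "\\\\")


# A shortened stand-in for the real variables= value.
original = '{"requestPayload":"{\\"database\\":1}"}'

print(original)                 # what we want the server to receive
print(httpie_escape(original))  # what to put on the HTTPie command line
```

The escaped string still needs to be wrapped in single quotes on the command line so that the shell does not interpret the backslashes itself.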

And with this in hand, we can pare out unneeded data and parameters to
arrive at the following minimal request that gets us what we want:

% http -f https://www.messenger.com/api/graphql/ \
    Cookie:'c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1' \
    fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942 \
    doc_id=4476599072415612 \
    variables='{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{\\"database\\":1,\\"version\\":4680497022042598,\\"sync_params\\":\\"{}\\"}","requestType":1}'

Lovely!

Find hidden inbox query parameters

So now we have the request we need to make. However, once again we
have a bunch of parameters whose values seem inscrutable:

  • fb_dtsg (AQE4bjFlv-4P3Xs:50:1637886942)
  • doc_id (4476599072415612)
  • deviceId (6a9252cb-2145-4f81-9d69-1834b84ba614)
  • version (4680497022042598)

We’ll start, arbitrarily, with fb_dtsg. Since we have the whole HAR
in one place, we know we don’t have to worry about the value of the
parameter changing out from under us when we reload the page. So, we
can just search directly for the value of the parameter:

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("AQE4bjFlv-4P3Xs:50:1637886942"))
                    | .request.url)'
[
  "https://www.messenger.com/t/100007424414992/"
]

Apparently, once again we get parameter values for free just by
looking at the initial HTML response! Let’s extract that response from
the HAR and run it through Prettier to make
the HTML more readable:

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("AQE4bjFlv-4P3Xs:50:1637886942")))
            | .[0].response.content.text' -r \
    > inbox.html
% prettier inbox.html > inbox-pretty.html

Searching through inbox-pretty.html with our favorite text editor,
look what we find for fb_dtsg:

                [
                  "DTSGInitialData",
                  [],
                  { token: "AQE4bjFlv-4P3Xs:50:1637886942" },
                  258,
                ],

And while we’re at it, why not search for the other values too? Turns
out there are two of them just lying around (deviceId and
version):

                  {
                    syncScripts: [],
                    deviceId: "6a9252cb-2145-4f81-9d69-1834b84ba614",
                    schemaVersion: "4680497022042598",
                    schemaVersionV2: null,
                    accountKey: "",
                  },

That knocks out three parameter values, leaving only doc_id. Since
its value, 4476599072415612, tragically doesn’t show up in the HTML,
let’s go back to the HAR:

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("4476599072415612"))
                    | .request.url)'
[
  "https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
]

Apparently, it’s in one of those random, inscrutably-named JavaScript
files. Let’s check it out:

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("4476599072415612")))
            | .[0].response.content.text' -r \
    > docid.js
% prettier docid.js > docid-pretty.js

Aha, here it is, seeming to be part of the definition of something
called LSPlatformGraphQLLightspeedRequestQuery:

        params: {
          id: "4476599072415612",
          metadata: {},
          name: "LSPlatformGraphQLLightspeedRequestQuery",
          operationKind: "query",
          text: null,
        },

But we also need to know how to pick this particular script out of the
many that are loaded as part of the page. Let’s take a look at where
this script is referenced in the HTML page. It looks like this:

    <script
      src="https://static.xx.fbcdn.net/rsrc.php/v3/y1/r/HRDukpAcyqY.js?_nc_x=0OMkmbJTxss"
      data-bootloader-hash="LWwZBAL"
      async="1"
      crossorigin="anonymous"
      data-p=":1"
      data-c="1"
      onload='_btldr["LWwZBAL"]=1'
      onerror="_btldr["LWwZBAL"]=1"
      nonce="nW7qla6Q"
    ></script>
    <link
      rel="preload"
      href="https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
      as="script"
      crossorigin="anonymous"
      nonce="nW7qla6Q"
    />
    <script
      src="https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
      data-bootloader-hash="Ll9z/Tq"
      async="1"
      crossorigin="anonymous"
      data-p=":8,11,62,45,43,13,9,65,41,56,27,21,19,59,31,35,25,33,61,37,39,29,4,20"
      data-c="1"
      onload='_btldr["Ll9z/Tq"]=1'
      onerror="_btldr["Ll9z/Tq"]=1"
      nonce="nW7qla6Q"
    ></script>
    <link
      rel="preload"
      href="https://static.xx.fbcdn.net/rsrc.php/v3i7Mx4/ya/l/en_US/xovCaG3pRYkBA-zL6eV7sA_kvzFwGGA0iA_Y451ymmdRYKi3JNddo5uNTBXcpfVOaa67G_e6QNkiMZSz44Fa4IsHe1jJJzzjA8TXc-buPNEADH6ljxd0XkWPBzDnUKHZdnKhP0dmjvw4e8cHQ-wygr9Dbce6gKQUJ7j-7IAZomrkiSS24Cf0iRHhPpBbS9Mb_8VDsyKoohBoYL6MVOPJYEcxdVAvTH8o9Vk041xqiSJOXCzZm0UD5J6h5tubWo2SOY3BEhtTcw9z37VDEGPcTjrwwtNcGZfLak_UlVUiSZsFRYeVECK6mZFE3Bk1R6vYlIoidhMWzP.js?_nc_x=0OMkmbJTxss"
      as="script"
      crossorigin="anonymous"
      nonce="nW7qla6Q"
    />
    <script
      src="https://static.xx.fbcdn.net/rsrc.php/v3i7Mx4/ya/l/en_US/xovCaG3pRYkBA-zL6eV7sA_kvzFwGGA0iA_Y451ymmdRYKi3JNddo5uNTBXcpfVOaa67G_e6QNkiMZSz44Fa4IsHe1jJJzzjA8TXc-buPNEADH6ljxd0XkWPBzDnUKHZdnKhP0dmjvw4e8cHQ-wygr9Dbce6gKQUJ7j-7IAZomrkiSS24Cf0iRHhPpBbS9Mb_8VDsyKoohBoYL6MVOPJYEcxdVAvTH8o9Vk041xqiSJOXCzZm0UD5J6h5tubWo2SOY3BEhtTcw9z37VDEGPcTjrwwtNcGZfLak_UlVUiSZsFRYeVECK6mZFE3Bk1R6vYlIoidhMWzP.js?_nc_x=0OMkmbJTxss"
      data-bootloader-hash="2abuuwv"
      async="1"
      crossorigin="anonymous"
      data-p=":3,50,26,24,16,7,64,53,18,14,12,15,17,34,2,44,32,60,28,51,5,55,36,63,23,52,22,10,57,6"
      data-c="1"
      onload='_btldr["2abuuwv"]=1'
      onerror="_btldr["2abuuwv"]=1"
      nonce="nW7qla6Q"
    ></script>

Unfortunately, there appear to be a huge number of equally
inscrutably-named scripts all listed in the same section. But no
matter, we can always just download all of them and then see which one
has the content we’re looking for.

By now we have a procedure for constructing the inbox request:

  1. Fetch the HTML page for our Messenger inbox once logged in.
  2. Extract the parameters for fb_dtsg, deviceId, and version
    from the HTML.
  3. Get a list of all the scripts referenced in the HTML, and download
    each of those.
  4. Find the script that defines
    LSPlatformGraphQLLightspeedRequestQuery, and extract the
    parameter for doc_id.
  5. Construct a POST request using these parameters, and fetch the
    response.

We now need to replicate each of these steps in Python. Step 1 is
fairly straightforward; it’s the same as the first request we made,
except now we’re logged in and can provide cookies to authenticate
ourselves:

inbox_html_resp = requests.get(
    "https://www.messenger.com",
    cookies=login.cookies
)
inbox_html_resp.raise_for_status()
inbox_html_page = inbox_html_resp.text

print(inbox_html_page)

Step 2 is just a repeat of our earlier work writing regular
expressions to extract parameters from HTML:

dtsg = re.search(
    r'"DTSGInitialData",\[\],\{"token":"([^"]+)"',
    inbox_html_page
).group(1)

device_id = re.search(
    r'"deviceId":"([^"]+)"',
    inbox_html_page
).group(1)

schema_version = re.search(
    r'"schemaVersion":"([0-9]+)"',
    inbox_html_page
).group(1)

print("dtsg:", dtsg)
print("device_id:", device_id)
print("schema_version:", schema_version)

Here we’re escaping the brackets and curly braces in the dtsg
regular expression to avoid them being interpreted as regular
expression operators, and using [0-9] to mean any digit zero through
nine.
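One thing to watch for is that re.search returns None when a pattern does not match, so the .group(1) calls above fail with a fairly unhelpful AttributeError if Facebook changes the page. The sketch below wraps the lookup in a hypothetical extract helper with a clearer error, and runs the three patterns against a made-up fragment. That fragment assumes the raw page is minified with quoted keys (which is what the patterns imply), unlike the Prettier output shown earlier.

```python
import re


def extract(pattern, text, label):
    """Return the first capture group of pattern in text, or raise a
    ValueError naming the parameter that could not be found."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"could not find {label} in page")
    return match.group(1)


# Made-up fragment imitating the minified page source (an assumption).
sample_page = (
    '["DTSGInitialData",[],{"token":"AQE4bjFlv-4P3Xs:50:1637886942"},258],'
    '{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614",'
    '"schemaVersion":"4680497022042598"}'
)

dtsg = extract(
    r'"DTSGInitialData",\[\],\{"token":"([^"]+)"', sample_page, "fb_dtsg"
)
device_id = extract(r'"deviceId":"([^"]+)"', sample_page, "deviceId")
schema_version = extract(
    r'"schemaVersion":"([0-9]+)"', sample_page, "schemaVersion"
)

print("dtsg:", dtsg)
print("device_id:", device_id)
print("schema_version:", schema_version)
```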

For step 3, we’ll want to start by using regular expressions to get a
list of all the scripts that (like the one we’re looking for) have
rsrc.php in their URL and end in .js:

script_urls = re.findall(
  r'"([^"]+rsrc\.php/[^"]+\.js[^"]+)"',
  inbox_html_page
)

Then we’ll want to fetch each of them:

scripts = []
for url in script_urls:
    resp = requests.get(url)
    resp.raise_for_status()
    scripts.append(resp.text)
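One detail visible in the HTML excerpt earlier is that each script URL shows up twice, once in a preload link and once in the script tag itself, so re.findall will likely return duplicates and the loop would download each file twice. A possible refinement (not part of the original program) is to drop duplicates while keeping the order. The sample page below uses made-up URLs on a placeholder host.

```python
import re

# Made-up page fragment: one script referenced by both a preload link
# and a script tag, plus a second script referenced once.
sample_page = (
    '<link rel="preload" '
    'href="https://static.example/rsrc.php/v3/a/one.js?_nc_x=1" as="script" />'
    '<script src="https://static.example/rsrc.php/v3/a/one.js?_nc_x=1"></script>'
    '<script src="https://static.example/rsrc.php/v3/b/two.js?_nc_x=1"></script>'
)

script_urls = re.findall(
    r'"([^"]+rsrc\.php/[^"]+\.js[^"]+)"',
    sample_page,
)

# dict.fromkeys keeps the first occurrence of each URL, in order.
unique_urls = list(dict.fromkeys(script_urls))

print(len(script_urls), "matches,", len(unique_urls), "unique")
```

Passing unique_urls to the download loop instead of script_urls would then avoid the redundant requests.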

Next up, for step 4, we want to find the script that defines
LSPlatformGraphQLLightspeedRequestQuery, and extract doc_id from
it:

for script in scripts:
    if "LSPlatformGraphQLLightspeedRequestQuery" not in script:
        continue
    doc_id = re.search(
        r'id:"([0-9]+)",metadata:\{\},name:"LSPlatformGraphQLLightspeedRequestQuery"',
        script
    ).group(1)
    break

print("doc_id:", doc_id)
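If none of the scripts match, the loop above finishes without ever assigning doc_id, and the print fails with a NameError. A slightly more defensive variant is sketched below; the function name find_doc_id and the sample scripts are hypothetical, and the pattern assumes the minified script uses unquoted keys in the order id, metadata, name, as in the prettified excerpt.

```python
import re

QUERY_NAME = "LSPlatformGraphQLLightspeedRequestQuery"


def find_doc_id(scripts, query_name=QUERY_NAME):
    """Return the numeric id attached to query_name in the first script
    that defines it, or raise LookupError if no script does."""
    pattern = (
        r'id:"([0-9]+)",metadata:\{\},name:"' + re.escape(query_name) + '"'
    )
    for script in scripts:
        match = re.search(pattern, script)
        if match:
            return match.group(1)
    raise LookupError(f"no script defines {query_name}")


# Made-up script bodies; only the second one contains the definition.
sample_scripts = [
    "var unrelated=1;",
    'params:{id:"4476599072415612",metadata:{},'
    'name:"LSPlatformGraphQLLightspeedRequestQuery",'
    'operationKind:"query",text:null}',
]

print("doc_id:", find_doc_id(sample_scripts))
```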

Here’s what the parameter extraction looks like in action:

% python3 messyger.py -u [email protected] -p 0aSPlneurgscxzpuEZb9
dtsg: AQE0GKhvCGqVF3E:14:1637897352
device_id: 86fbb4b2-fe0e-43e9-8bd5-cd58f7bb763b
schema_version: 4680497022042598
doc_id: 4476599072415612

Finally, for step 5, we need to recreate the following HTTPie request
in Python, using our extracted parameters:

% http -f https://www.messenger.com/api/graphql/ \
    Cookie:'c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1' \
    fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942 \
    doc_id=4476599072415612 \
    variables='{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{\\"database\\":1,\\"version\\":4680497022042598,\\"sync_params\\":\\"{}\\"}","requestType":1}'

Here’s what that looks like:

import json

inbox_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "database": 1,
                "version": schema_version,
                "sync_params": json.dumps({})
            }),
            "requestType": 1
        })
    }
)
inbox_resp.raise_for_status()

print(inbox_resp.text)

You might ask why the Messenger API expects a JSON string inside a
JSON string inside a JSON string inside an HTML form. You’d have a
very good question. But at least it explains why we had so many
backslashes to deal with earlier.
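To see where the backslashes come from, here is a small sketch of the same triple nesting with a trimmed-down payload. The locale key is borrowed from the original request; the rest of the structure is cut down for brevity, so treat it as an illustration rather than the real request body.

```python
import json

# Innermost layer: a plain JSON object serialized to a string.
sync_params = json.dumps({"locale": "en_US"})

# Middle layer: the inner string is embedded as a string value, so its
# quotes have to be escaped with one backslash each.
request_payload = json.dumps({"database": 1, "sync_params": sync_params})

# Outer layer: the middle string is embedded again, so its quotes and
# its existing backslashes all get escaped once more.
variables = json.dumps({"requestPayload": request_payload, "requestType": 1})

for layer in (sync_params, request_payload, variables):
    print(layer.count("\\"), "backslashes:", layer)

# Decoding peels the layers off in the opposite order.
decoded = json.loads(
    json.loads(json.loads(variables)["requestPayload"])["sync_params"]
)
print(decoded)
```

Each extra layer of json.dumps adds one backslash per quote and doubles the backslashes already present, which is why the innermost keys end up behind three backslashes in the request.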

Decipher the inbox data response

Alright, now that we’ve successfully retrieved our inbox data… what
format is that data coming in, exactly? Well, here’s the start of it:

{
  "data": {
    "viewer": {
      "lightspeed_web_request": {
        "payload": "function f(){let inputs=arguments,LS=inputs[inputs.length-1],

Yep, that’s right. It’s a JSON object… with a bunch of JavaScript
code inside it! I can’t claim to understand why, but this Facebook
blog post about “Project LightSpeed” is probably related, given that
this JSON object is apparently a lightspeed_web_request. Apparently,
in the new version of Messenger, released in early 2020, the server
directly sends JavaScript for the client to blindly execute and
update its local state.

In any case, let’s take a look at this JavaScript and see what may lie
within:

% cat inbox-resp.json \
    | jq .data.viewer.lightspeed_web_request.payload -r \
    > inbox-payload.js
% prettier inbox-payload.js > inbox-payload-pretty.js

It looks like things start out with a bunch of initialization, setting
up some kind of sequence of operations to operate as a single
transaction using a bunch of higher-order functions:

function f() {
  let inputs = arguments,
    LS = inputs[inputs.length - 1],
    n = LS.n,
    m = [],
    output = [],
    U;
  return LS.seq([
    (_) =>
      LS.seq([
        (_) =>
          LS.sp(
            "executeFirstBlockForSyncTransaction",
            [0, 1],
            [-1, 4294967295],
            U,
            "HCwRAAAWlgEWlqOCxgETBAA",
            [0, 2],
            false,
            [0, 0],
            false,
            [0, 1],
            U
          ).then((r) => ([m[0]] = r)),
        (_) =>
          m[0]
            ? LS.seq([
                (_) =>
                  LS.seq([
                    (_) => LS.fe(LS.db.table(15).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(18).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(19).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(20).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(21).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(22).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(23).fetch(), (c) => c.delete()),
                    (_) => LS.fe(LS.db.table(24).fetch(), (c) => c.delete()),
                    ... lots more of this ...

After this boilerplate, however, the bulk of the script consists of
calls to the LS.sp function, like this:

                (_) =>
                  LS.sp(
                    "addParticipantIdToGroupThread",
                    [23284, 3405894928],
                    [23284, 3405894928],
                    [381, 1262926839],
                    [381, 1262927046],
                    [381, 1262844841],
                    U,
                    false,
                    U,
                    [0, 0],
                    [0, 80],
                    U,
                    U
                  ),

This is where we have to start staring at code and making guesses. One
reasonable guess is that the first argument to LS.sp represents the
action to be taken (e.g., register a user as one of the participants in
a group conversation), and the remaining arguments are parameters for
that action (e.g., identifying which user and which conversation are
to be operated on).

One thing that would be helpful is understanding what some of these
argument values are. For example, what is U? Fortunately, that one is
pretty easy to figure out. From the beginning of the script:

  let inputs = arguments,
    LS = inputs[inputs.length - 1],
    n = LS.n,
    m = [],
    output = [],
    U;

So U is actually just undefined (which is the default value of an
uninitialized variable in JavaScript); it looks like the code is using
the U alias to save characters.

What about these two-element arrays? They’re actually all over the
place in the generated code, and strangely enough, every array has
exactly two integers, no more, no less. Even more strangely, there
aren’t any normal integers! Numbers only show up inside two-element
arrays. Here are some examples of these arrays:

[-1, 4294967295]
[0, 0]
[0, 1]
[0, 2]
[0, 80]
[0, 1640485980]
[381, 1262844841]
[381, 1262926839]
[381, 1262927029]
[381, 1262927046]
[23284, 3405894928]
[23300, 2664454259]
[368832, 2185323521]
[230687821, 2208225279]

Looking at these arrays, we can see a few notable properties:

  • There are various “genres” of the arrays, which have different
    typical ranges. For example, there are a bunch that are
    [0, <small integer>], then a bunch that are [381, <large integer>],
    then some that are [<integer around 23000>, <large integer>].
  • We know that some of these values must somehow represent things like
    user IDs and conversation IDs, since functions like
    addParticipantIdToGroupThread don’t take any arguments that could
    convey this information other than the two-element arrays.
  • Values like [0, 0] and [0, 1] show up a lot.
  • The value 4294967295 that shows up in [-1, 4294967295] is
    exactly one less than 2 to the 32, or the maximum value of an
    unsigned 32-bit integer.

These properties are all suggestive, but the thing that gave me a
flash of insight was this code elsewhere in the script payload:

                        LS.i64.eq(i.a, [23284, 3405894928]) &&
                        LS.i64.eq([0, 0], [0, 0]) &&
                        LS.i64.eq(i.b, [381, 1262927029]) &&

The term i64 typically refers to a 64-bit integer, so i64.eq would
be a function for comparing 64-bit integers for equality. Oh, so these
arrays must be representing 64-bit integers! The first integer must be
the high 32 bits (which would often be zero), and the second integer
is the low 32 bits. I would assume Messenger does this because
JavaScript doesn’t have 64-bit integers.

For example, the recurring value [23284, 3405894928] would translate
to 2^32 * 23284 + 3405894928 = 100007424414992. And what do you
know, that’s exactly the value that was showing up in the URL
https://www.messenger.com/t/100007424414992/ in our screenshots!
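
As a quick sanity check of this interpretation, here is a minimal Python sketch of the conversion in both directions. The helper names combine_i64 and split_i64 are made up for illustration and do not come from Messenger’s code; the sketch also assumes the high word carries the sign, which is consistent with [-1, 4294967295] decoding to -1.

```python
def combine_i64(high, low):
    """Combine a [high, low] pair into one signed 64-bit integer."""
    # The low word is treated as an unsigned 32-bit value; the high word
    # carries the sign (an assumption based on the examples above).
    if not 0 <= low < 2**32:
        raise ValueError("low word must fit in 32 unsigned bits")
    return (high << 32) + low


def split_i64(value):
    """Inverse of combine_i64: split an integer into a [high, low] pair."""
    return [value >> 32, value & 0xFFFFFFFF]


print(combine_i64(23284, 3405894928))  # 100007424414992
print(combine_i64(-1, 4294967295))     # -1
print(split_i64(100007424414992))      # [23284, 3405894928]
```

Python integers are arbitrary precision, so no special handling is needed on this side; the two-element encoding only matters when reading or producing the JavaScript payload.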

Armed with this knowledge, let’s take a look at where Hello Camilla is
showing up, as that will probably have the inbox information we’re
looking for. There turn out to be three occurrences:
deleteThenInsertThread, upsertMessage, and
setMessageDisplayedContentTypes. Well, we’re looking to generate a
conversation list, so the first function sounds the most relevant.
Here are its arguments:

                  LS.sp(
                    "deleteThenInsertThread",
                    [381, 1262897440],
                    [381, 1262897440],
                    "Hello Camilla!",
                    U,
                    "https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX8eYoNq&_nc_ht=scontent-sjc3-1.xx&oh=8e70b2536bd14a3ecbe072f3753622a3&oe=61C56EF8",
                    U,
                    [0, 80],
                    [23300, 2492200183],
                    [0, 0],
                    [0, 1],
                    "inbox",
                    "/messaging/lightspeed/media_fallback/?entity_id=100075230196983&entity_type=10&width=200&height=200",
                    [0, 1640328952],
                    [0, 0],
                    [0, 0],
                    [0, 0],
                    false,
                    [23300, 2492200183],
                    U,
                    U,
                    ... lots more nulls and zeroes ...

Let’s break down the arguments here based on what we know:

  • Hello Camilla! would probably be the text that’s displayed in the
    sidebar, i.e. the latest message in its conversation.
  • [23300, 2492200183] = 100075230196983 is the ID that’s displayed
    in the URL for the conversation with Kane Woods (who sent the
    Hello Camilla! message). The same value appears as entity_id in
    the media_fallback URL. This parameter is repeated twice in
    different places for some reason.
  • [381, 1262897440] = 1637645437216 is a UNIX timestamp (in
    milliseconds) for Tuesday, November 23, 2021 5:30:37.216 AM GMT,
    which is roughly the time that I’m writing this guide. UNIX
    timestamps are a super common format to represent dates and times,
    so they’re worth remembering, and after you see them enough you’ll
    be able to see an integer and say “that kinda looks like it’s the
    right size to be a timestamp”. You can use an online tool to convert
    timestamps to human-readable representation, and vice versa. This
    parameter is also repeated twice for some reason.
  • The https://scontent-sjc3-1.xx.fbcdn.net URL can be opened
    directly in the browser and appears to be the profile picture
    displayed next to the conversation.
  • [0, 1640328952] looks like another UNIX timestamp, Friday,
    December 24, 2021 6:55:52 AM GMT (this one in seconds rather than
    milliseconds), but that’s a whole month after the first one, so the
    relevance of this one isn’t entirely clear.
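
If you’d rather not paste numbers into an online converter, a small Python helper can do the same job. This is only a sketch: the function name is hypothetical, and the cutoff used to tell seconds from milliseconds (values of 10^11 or more are treated as milliseconds) is a heuristic assumption, not something the API tells us.

```python
from datetime import datetime, timedelta, timezone


def describe_timestamp(value):
    """Render an integer as a UTC time, guessing seconds vs. milliseconds."""
    # Heuristic (assumption): anything >= 1e11 is taken to be milliseconds.
    if value >= 10**11:
        seconds, millis = divmod(value, 1000)
    else:
        seconds, millis = value, 0
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    moment += timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds")


print(describe_timestamp(1637645437216))  # 2021-11-23T05:30:37.216+00:00
print(describe_timestamp(1640328952))     # 2021-12-24T06:55:52.000+00:00
```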

Examine the behavior of the inbox data response

This is a good start, but we need more information to be sure we
understand what these parameters really mean. For one thing, we could
be making bad assumptions, and for another, there are already
unanswered questions, such as why there are seemingly duplicated
parameters. They could be truly redundant, or the different copies
could have different meanings and just happen to have the same value
in this particular situation. One way we can resolve the confusion is
by gathering more data:

  1. Reload the page and download the response data again. Compare it
    with the original (say, using git diff --no-index on the
    formatted JavaScript payloads) to see what changes between two
    subsequent requests for the same data.
  2. Now make some change by interacting with Messenger, e.g. by sending
    a new message. Download the response data a third time and compare
    it with the original as well. Anything that’s different now that
    wasn’t different in the first comparison must be a change caused by
    your latest interaction. This can help to figure out how the values
    of the parameters relate to the data being shown on the webpage.
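
The comparison step can also be scripted. Below is a rough sketch that uses Python’s standard difflib module in place of git diff --no-index; the function name and the two tiny payload strings are stand-ins for illustration, and in practice you would read the two formatted payloads from disk.

```python
import difflib


def diff_payloads(before, after):
    """Return a unified diff of two formatted JavaScript payloads."""
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before.js",
        tofile="after.js",
        lineterm="",
    )
    return "\n".join(lines)


# Tiny stand-in payloads; real ones are thousands of lines long.
before = 'LS.sp(\n  "deleteThenInsertThread",\n  [381, 1262927029],\n  U\n)'
after = 'LS.sp(\n  "deleteThenInsertThread",\n  [381, 1596534595],\n  U\n)'
print(diff_payloads(before, after))
```

Either tool works; the important part is formatting both payloads identically first, so that the diff only shows real changes.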

Let’s start by sending a new message from our test account in one of
its conversations. Here is how the invocation of
deleteThenInsertThread in the inbox API response changes:

                   LS.sp(
                     "deleteThenInsertThread",
-                    [381, 1262927029],
-                    [381, 1262927029],
-                    "Hey there Camilla!",
+                    [381, 1596534595],
+                    [381, 1596534595],
+                    "You: Good day, it is a response",
                     U,
                     "https://scontent-sjc3-1.xx.fbcdn.net/v/t31.18172-1/p200x200/13235223_1713693562221441_496736952870870067_o.jpg?_nc_cat=110&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=AlIiI4QhLQUAX_FB-Bd&_nc_ht=scontent-sjc3-1.xx&oh=fc8dd918473653af963d198ea106e8eb&oe=61C7D45C",
                     U,
                     [0, 80],
                     [23284, 3405894928],
                     [0, 0],
                     [0, 1],
                     "inbox",
                     "/messaging/lightspeed/media_fallback/?entity_id=100007424414992&entity_type=10&width=200&height=200",
                     [0, 1640485980],
                     [0, 0],
                     [0, 0],
                     [0, 0],
                     false,
-                    [23284, 3405894928],
+                    [23300, 2664454259],
                     U,
                     U,

The UNIX timestamps at the top have changed from
[381, 1262927029] = Tuesday, November 23, 2021 5:31:06.805 AM GMT to
[381, 1596534595] = Saturday, November 27, 2021 2:11:14.371 AM GMT,
with the new timestamp being exactly the time when we sent the latest
message. This suggests that one or both of those timestamps corresponds
to the time of the last message or other update to the conversation.
We’ll have to do more investigation to figure out the difference
between the two parameters.

The last-message string was updated from Hey there Camilla! to
You: Good day, it is a response, and the presence of You: in here
suggests that it’s not the raw message, but actually corresponds to
the literal text shown in the inbox sidebar.

And finally, the ID at the bottom has changed. Note that previously
there were two instances of [23284, 3405894928] = 100007424414992
(the user ID of the person we were messaging), but now one of them has
changed to [23300, 2664454259] = 100075402451059 (our own user ID,
which we can find by searching for ourselves in Messenger and checking
the URL). This behavior suggests that the first of the two IDs is the
ID of the person we’re messaging, while the second is the ID of the
person who sent the last message (previously the other person, now
us).

Let’s now explore read/unread behavior. We’ll receive a new message
from another account, and then see how the API response changes when
we read that message (clearing its unread status).

Aha, this produces a difference between the two mysterious timestamps
passed to deleteThenInsertThread:

                   LS.sp(
                     "deleteThenInsertThread",
                     [381, 1597327774],
-                    [381, 1596534595],
+                    [381, 1597327774],
                     "Have you ever learn this but?",
                     U,
                     "https://scontent-sjc3-1.xx.fbcdn.net/v/t31.18172-1/p200x200/13235223_1713693562221441_496736952870870067_o.jpg?_nc_cat=110&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=AlIiI4QhLQUAX_FB-Bd&_nc_ht=scontent-sjc3-1.xx&oh=fc8dd918473653af963d198ea106e8eb&oe=61C7D45C",

Based on this information, it seems likely that the first timestamp is
when the latest message was sent, while the second timestamp is
the timestamp of the latest message that’s been read so far. In
other words, the conversation has unread message(s) when these two
timestamps differ.

One last detail to clear up before we can assemble our inbox view:
how do we actually translate from these user IDs back to human names
we can display? Well, searching for a name like Kane Woods in the
response reveals that this information can be easily extracted from the
arguments to the verifyContactRowExists function:

                  LS.sp(
                    "verifyContactRowExists",
                    [23300, 2492200183],
                    [0, 1],
                    "https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/p100x100/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX9oSRfq&_nc_ht=scontent-sjc3-1.xx&oh=daaf26cd7b77fa48b5bc2aafc7ee8a0c&oe=61C90FD1",
                    "Kane Woods",
                    ... more arguments ...

Parse the inbox data response

We now have a checklist of information to extract from the inbox data
response:

  • Extract the embedded JavaScript snippet from the JSON response we
    get from Messenger.
  • Look at calls to deleteThenInsertThread to get a list of
    conversations that would be displayed in the sidebar.
  • Extract the last-sent message description from a string argument.
  • Get the user ID of the person the conversation is with, as well as
    the user ID of the person who sent the last message.
  • Compare the two timestamps in the initial arguments to determine
    whether the conversation is marked as unread or not.
  • Look at calls to verifyContactRowExists to map user IDs back to
    human names.

However, this code looks like a big mess to parse with regular
expressions, since the required data is stuck in specific positional
arguments of long function invocations, separated by lots of cruft we
don’t care about. Another approach is called for.

The standard approach to parsing programming languages without regular
expressions is to use a tool to convert them into their abstract
syntax tree, which allows manipulating the data embedded in the
language without needing to parse lots of syntax. We’ll use the
Esprima library to parse JavaScript from Python:

import esprima

inbox_json = inbox_resp.json()
inbox_js = inbox_json["data"]["viewer"]["lightspeed_web_request"]["payload"]

ast = esprima.parseScript(inbox_js)

print(ast)

Now, instead of having to parse a function call like this:

                  LS.sp(
                    "updateThreadsRangesV2",
                    "inbox",
                    [0, 0],
                    [0, 1],
                    [-2147483648, 0]
                  ),

We get a pre-parsed data structure like this, where all the arguments
are neatly identified for us and we can easily loop over them in Python:

{
    type: "CallExpression",
    callee: {
        type: "MemberExpression",
        computed: False,
        object: {
            type: "Identifier",
            name: "LS"
        },
        property: {
            type: "Identifier",
            name: "sp"
        }
    },
    arguments: [
        {
            type: "Literal",
            value: "updateThreadsRangesV2",
            raw: "\"updateThreadsRangesV2\""
        },
        {
            type: "Literal",
            value: "inbox",
            raw: "\"inbox\""
        },
        {
            type: "ArrayExpression",
            elements: [
                {
                    type: "Literal",
                    value: 0,
                    raw: "0"
                },
                {
                    type: "Literal",
                    value: 0,
                    raw: "0"
                }
            ]
        },
        {
            type: "ArrayExpression",
            elements: [
                {
                    type: "Literal",
                    value: 0,
                    raw: "0"
                },
                {
                    type: "Literal",
                    value: 1,
                    raw: "1"
                }
            ]
        },
        {
            type: "ArrayExpression",
            elements: [
                {
                    type: "UnaryExpression",
                    prefix: True,
                    operator: "-",
                    argument: {
                        type: "Literal",
                        value: 2147483648,
                        raw: "2147483648"
                    }
                },
                {
                    type: "Literal",
                    value: 0,
                    raw: "0"
                }
            ]
        }
    ]
},

Step 1 of processing the AST we get from Esprima will be to identify
all the uses of LS.sp. We can write a function to match this
pattern, based on looking at the AST snippet above:

def is_lightspeed_call(node):
    # Match call expressions of the exact form LS.sp(...)
    return (
        node.type == "CallExpression"
        and node.callee.type == "MemberExpression"
        and node.callee.object.type == "Identifier"
        and node.callee.object.name == "LS"
        and node.callee.property.type == "Identifier"
        and node.callee.property.name == "sp"
    )

Then we’ll want to transform the arguments into Python values rather
than the objects that appear in the AST, again by inspecting the AST
snippet above to see how things are represented:

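def parse_argument(node):
    # Strings, booleans, and plain numbers
    if node.type == "Literal":
        return node.value
    # Two-element arrays encode 64-bit integers as [high, low]
    if node.type == "ArrayExpression":
        assert len(node.elements) == 2
        high_bits, low_bits = map(parse_argument, node.elements)
        return (high_bits << 32) + low_bits
    # Negative numbers are parsed as unary minus applied to a literal
    if (
        node.type == "UnaryExpression" and
        node.prefix and
        node.operator == "-"
    ):
        return -parse_argument(node.argument)
    # Anything else (such as the identifier U) falls through to None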

(We’re using << 32 to implement Messenger’s multiply-by-2-to-the-32
operation; read more about <<.)

What we want to do now is go through every node in the AST, looking
for LS.sp invocations, and sort them by which function is being
called. Fortunately, this is exactly the kind of task that libraries
like Esprima are designed for. The usual way to do it is by writing
a function which the library will call for every node in the AST. Here
is what that looks like:

import collections
import json

fn_calls = collections.defaultdict(list)

def handle_node(node, meta):
    if not is_lightspeed_call(node):
        return

    args = [parse_argument(arg) for arg in node.arguments]
    (fn_name, *fn_args) = args

    fn_calls[fn_name].append(fn_args)

esprima.parseScript(inbox_js, delegate=handle_node)

print(json.dumps(fn_calls, indent=2))

Here’s what the output looks like:

% python3 messyger.py -u [email protected] -p 5155xKYdE1zi0KxGPMvF
{
  "executeFirstBlockForSyncTransaction": [
    [
      1,
      -1,
      null,
      "HCwRAAAWZhaql9XyAxMEAA",
      2,
      false,
      0,
      false,
      1,
      null
    ]
  ],
  "deleteThenInsertThread": [
    [
      1638035642666,
      1638035369213,
      "Totes agreed",
      null,
      "https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX8I78V4&_nc_ht=scontent-sjc3-1.xx&oh=c780f73897ab04a260dbbd203a14e2e5&oe=61C96378",
      null,
      80,
      100075475206906,
      ... lots more ...

(You may notice we’re using a different email address here, because
this is the point in the blog post where the original account I was
using for testing got banned for acting too suspicious.)

With this data in hand, we can finally assemble our parsed function
calls into a useful thread listing:

conversations = collections.defaultdict(dict)

for args in fn_calls["deleteThenInsertThread"]:
    last_sent_ts, last_read_ts, last_msg, *rest = args
    # User IDs are the only integers this large (timestamps are ~1e12)
    user_id, last_msg_author = [
        arg for arg in rest if isinstance(arg, int) and arg > 1e14
    ]
    conversations[user_id]["unread"] = last_sent_ts != last_read_ts
    conversations[user_id]["last_message"] = last_msg
    conversations[user_id]["last_message_author"] = last_msg_author

for args in fn_calls["verifyContactRowExists"]:
    user_id, _, _, name, *rest = args
    conversations[user_id]["name"] = name

print(json.dumps(conversations, indent=2))

And, huzzah! Conversations listed from most to least recent, with our
own user ID available as the last entry in the dictionary. (That last
bit is admittedly a little weird, but it can always be cleaned up
later if desired.)

% python3 messyger.py -u [email protected] -p 5155xKYdE1zi0KxGPMvF
{
  "100075475206906": {
    "unread": true,
    "last_message": "Totes agreed",
    "last_message_author": 100075475206906,
    "name": "Kerri Blackmore"
  },
  "100075217039998": {
    "unread": false,
    "last_message": "You: How ya doin",
    "last_message_author": 100075103764938,
    "name": "Astrid Mccallum"
  },
  "100075103764938": {
    "name": "Ailish Maldonado"
  }
}
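
If the stray entry for our own account gets in the way, one possible cleanup is to keep only the entries that actually came from a deleteThenInsertThread call. The sketch below is just one way to do it; the function name inbox_threads is hypothetical, and it assumes that only real conversations carry a last_message key.

```python
def inbox_threads(conversations):
    """Keep only entries that were populated by deleteThenInsertThread."""
    return {
        user_id: info
        for user_id, info in conversations.items()
        if "last_message" in info
    }


# Sample data shaped like the output above, trimmed for brevity.
sample = {
    100075475206906: {
        "unread": True,
        "last_message": "Totes agreed",
        "name": "Kerri Blackmore",
    },
    100075103764938: {"name": "Ailish Maldonado"},
}
print(list(inbox_threads(sample)))  # [100075475206906]
```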

Find the send-message request

Let’s move on to the last planned feature of Messyger: sending a
message to a conversation. We can start by opening up Messenger with
the developer tools open and sending a message, to see what request(s)
it triggers:

Requests triggered by sending a
message

Hmmm, something is odd here. In the right-hand column, labeled
Waterfall, we can see a visual representation of the time when each
request was made. Most of them were made at about the same time,
during the initial page load. The third-to-last request was made a
little bit after (it looks like the client makes a request like this
periodically, even if you don’t do anything). And the last two
requests are the only two that were made after we sent the message.
But these last two requests aren’t API requests, they’re just requests
to fetch images! So how was the client able to tell the server to send
our message?

Well, if we scroll up in the Network tab, we can see there are several
requests that are still listed as “Pending”, meaning they haven’t
finished transmitting data yet:

Websocket requests shown as
Pending

These connections are websocket connections, which are long-lasting
connections (opened via an HTTP upgrade request) that the server and
client can use to send data back and forth at any time. Indeed, if we
click on one of those connections, we can see that they’ve been hard
at work the entire time sending messages back and forth:

List of messages sent over chat
websocket

Since we didn’t see any new HTTP requests in the Network tab when
sending a message, it’s possible that the client used one of its
websocket connections to communicate with the server instead. Indeed,
if we check the last few websocket messages that were exchanged after
we pressed Return, we find this one, which is a message from the
client to the server containing exactly the text of the message we
sent:

Websocket message responsible for sending our
text

To inspect the contents, we can download that message from Chrome (in
base64 since it’s labeled as a “binary message”):

Right-click menu for downloading the websocket
message

Using a tool like base64decode.org,
we can then inspect the contents:

Using base64decode.org to decode the websocket
message

It appears that there’s a bit of junk at the beginning, but then the
rest is just a JSON object that looks awfully similar to the one that
we used when making our request to get the inbox data.

Here’s the raw JSON:

{
  "request_id": 76,
  "type": 3,
  "payload": "{\"version_id\":\"4680497022042598\",\"tasks\":[{\"label\":\"46\",\"payload\":\"{\\\"thread_id\\\":100075475206906,\\\"otid\\\":\\\"6870463702739115828\\\",\\\"source\\\":65537,\\\"send_type\\\":1,\\\"text\\\":\\\"Let's see how this message gets sent\\\",\\\"initiating_source\\\":1}\",\"queue_name\":\"100075475206906\",\"task_id\":10,\"failure_count\":null},{\"label\":\"21\",\"payload\":\"{\\\"thread_id\\\":100075475206906,\\\"last_read_watermark_ts\\\":1638046193775,\\\"sync_group\\\":1}\",\"queue_name\":\"100075475206906\",\"task_id\":11,\"failure_count\":null}],\"epoch_id\":6870463702858032614,\"data_trace_id\":\"#0Q0JVtIKTdGjTDCwodlbNg\"}",
  "app_id": "772021112871879"
}

Where the payload key is a JSON string that expands to this:

{
  "version_id": "4680497022042598",
  "tasks": [
    {
      "label": "46",
      "payload": "{\"thread_id\":100075475206906,\"otid\":\"6870463702739115828\",\"source\":65537,\"send_type\":1,\"text\":\"Let's see how this message gets sent\",\"initiating_source\":1}",
      "queue_name": "100075475206906",
      "task_id": 10,
      "failure_count": null
    },
    {
      "label": "21",
      "payload": "{\"thread_id\":100075475206906,\"last_read_watermark_ts\":1638046193775,\"sync_group\":1}",
      "queue_name": "100075475206906",
      "task_id": 11,
      "failure_count": null
    }
  ],
  "epoch_id": 6870463702858032000,
  "data_trace_id": "#0Q0JVtIKTdGjTDCwodlbNg"
}

And the payload keys in the tasks are JSON strings that expand to
these:

{
  "thread_id": 100075475206906,
  "otid": "6870463702739115828",
  "source": 65537,
  "send_type": 1,
  "text": "Let's see how this message gets sent",
  "initiating_source": 1
}

{
  "thread_id": 100075475206906,
  "last_read_watermark_ts": 1638046193775,
  "sync_group": 1
}
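
The same unwrapping can be done in a few lines of Python instead of by hand. This is a sketch under assumptions: the function name is made up, the binary prefix used in the demo is a placeholder (not the real framing bytes), and the code assumes the real prefix contains no "{" byte so that the JSON body starts at the first one.

```python
import base64
import json


def decode_ws_message(b64_message):
    """Decode a base64 websocket frame and expand its nested JSON strings."""
    raw = base64.b64decode(b64_message)
    # Assumption: the JSON body starts at the first "{" after the prefix.
    start = raw.index(b"{")
    outer = json.loads(raw[start:].decode("utf-8"))
    inner = json.loads(outer["payload"])
    for task in inner["tasks"]:
        task["payload"] = json.loads(task["payload"])
    outer["payload"] = inner
    return outer


# Build a synthetic frame with a placeholder prefix to exercise the decoder.
task_payload = json.dumps({"thread_id": 100075475206906, "text": "hi"})
inner = json.dumps({
    "version_id": "4680497022042598",
    "tasks": [{"label": "46", "payload": task_payload}],
    "epoch_id": 6870463702858032614,
})
outer = json.dumps({"request_id": 76, "type": 3, "payload": inner})
frame = base64.b64encode(b"\x00\x01junk" + outer.encode("utf-8")).decode("ascii")

decoded = decode_ws_message(frame)
print(decoded["payload"]["tasks"][0]["payload"]["text"])  # hi
print(decoded["payload"]["epoch_id"])  # 6870463702858032614
```

Python integers are arbitrary precision, so epoch_id survives the round trip exactly. A JavaScript-based formatter would round a value this large, which is presumably why the expanded JSON above shows 6870463702858032000 rather than the 6870463702858032614 seen in the raw message.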

Replicate the send-message request

Now that we’ve identified the request that the client uses to send a
message, we want to replicate it outside the browser for testing.
However, this one is a bit more complicated, because we can’t just
“Copy as cURL” for a websocket message. We could get the cURL command
to open the websocket, but then the server and client might need to
exchange a bunch of individual messages on the socket before we could
send the request we want.

However, if we look at the data in this websocket message, it looks
strangely similar to the HTTP request we made earlier when fetching
the inbox data. In particular:

  • Both requests have an embedded string of JSON (called payload and
    requestPayload respectively).
  • Both “payloads” have a top-level key that seems to specify some kind
    of schema version, with value 4680497022042598 (the key being
    called version_id and version respectively).
  • Alongside the “payload” there is also a sibling key that specifies
    some kind of request type, with value type: 3 in the websocket
    message and requestType: 1 in the inbox request.

Is it possible that the Messenger API supports making the same request
in two different ways (via individual HTTP request or as a message on
an already-open websocket)? If so, it would make things easier for us,
because then instead of figuring out how to do the stuff with
websockets, we could just make this request the same way as we made
the inbox request.

We have no particular reason to believe this will work, since we don’t
have a working example in the client to check against, but if our
guess happens to be right, we could save a lot of time, so let’s give
it a try. We’ll start with our existing code to make the inbox
request:

inbox_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "database": 1,
                "version": schema_version,
                "sync_params": json.dumps({})
            }),
            "requestType": 1
        })
    }
)
inbox_resp.raise_for_status()

Then we’ll modify it to swap out its requestPayload for the
one we saw in the websocket message:

send_message_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "version_id": "4680497022042598",
                "tasks": [
                  {
                    "label": "46",
                    "payload": json.dumps({
                        "thread_id": 100075475206906,
                        "otid": "6870463702739115828",
                        "source": 65537,
                        "send_type": 1,
                        "text": "Let's see how this message gets sent",
                        "initiating_source": 1
                    }),
                    "queue_name": "100075475206906",
                    "task_id": 10,
                    "failure_count": None
                  },
                  {
                    "label": "21",
                    "payload": json.dumps({
                        "thread_id": 100075475206906,
                        "last_read_watermark_ts": 1638046193775,
                        "sync_group": 1
                    }),
                    "queue_name": "100075475206906",
                    "task_id": 11,
                    "failure_count": None
                  }
                ],
                "epoch_id": 6870463702858032000,
                "data_trace_id": "#0Q0JVtIKTdGjTDCwodlbNg"
            }),
            "requestType": 3  # to match type: 3 in websocket message
        })
    }
)
send_message_resp.raise_for_status()

print(send_message_resp.text)

If we run this, the results are mixed: it doesn’t return an error
(instead it returns a bunch of embedded JavaScript just like the inbox
response), but it also doesn’t actually send a message that shows up
in the Messenger interface. There are a couple different possible
explanations for why this could be; for example:

  1. We might not be able to send messages using this API at all, and
    our original guess was incorrect.
  2. We could have a syntax error somewhere in our request, and the
    server might just ignore malformed requests and send back a generic
    response.
  3. Unlike getting the inbox data, which is read-only, sending a
    message is a write operation. The API might have been designed so
    that repeating the same request more than once doesn’t result in
    multiple messages getting sent.

From a design perspective, (3) actually makes a lot of sense. After
all, requests sometimes fail inside the network and have to be
retried, so it’s best if repeating a request multiple times doesn’t
result in action being taken multiple times.

One way to see if this is the case is by playing around with the
parameters of the request to see if one of them has to be changed
before the repeated request will be seen as a request to send a new
message, rather than be ignored as a duplicate.

And if we do this, we find that indeed, changing the otid value
(e.g., by adding 1 to it) results in a new copy of the message being
sent!

Two copies of our message

Apparently, otid is some kind of unique identifier used to prevent
messages from accidentally getting sent twice (each message from the
client gets assigned a unique otid, and each otid can only be
processed once by the server).
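
To make the idea concrete, here is a toy model of that behavior. Everything in it is illustrative: ToyServer is a stand-in that has nothing to do with how Facebook actually implements the check, and bump_otid simply mirrors the add-1 experiment described above.

```python
def bump_otid(otid):
    """Return a new otid string by adding 1 to the numeric value."""
    return str(int(otid) + 1)


class ToyServer:
    """Stand-in for a server that ignores otids it has already processed."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def send(self, otid, text):
        if otid in self.seen:
            return False  # duplicate request: silently ignored
        self.seen.add(otid)
        self.delivered.append(text)
        return True


server = ToyServer()
otid = "6870463702739115828"
print(server.send(otid, "hello"))             # True
print(server.send(otid, "hello"))             # False
print(server.send(bump_otid(otid), "hello"))  # True
```

This also explains why the replayed request returned a normal-looking response without an error: from the server’s point of view, a retry of an already-processed message is not a failure.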

Clean up the send-message request

Now that we have a proof of concept for how to send messages, we can
clean it up by using variables instead of hardcoding in values. We’ll
start by reading in command-line arguments for sending a message:

parser.add_argument("-m", "--message")
parser.add_argument("-r", "--recipient", type=int)

Then we can replace Let's see how this message gets sent with
args.message, and 100075475206906 with args.recipient.

Most of the other parameters in the request look reasonably important,
so we’ll leave most of them in. The one exception is data_trace_id,
which sounds like something used for debugging, so we’ll remove that.

There are a few leftover hardcoded numbers:

  • source is set to 65537 (exactly one more than 2 to the 16th
    power). However, testing suggests that the value doesn’t actually
    matter, so we’ll just set it to 0 for now and revisit later if it
    causes issues.
  • label is set to 46 and 21 respectively in the two elements of
    the tasks array. These values seem likely to be fixed ID numbers
    that are part of the API (46 meaning “send message” and 21
    meaning “update last-read indicator”).
  • last_read_watermark_ts is set to a UNIX timestamp that seems to be
    for the time the message was sent, which we can replace with one
    generated by our code.
  • task_id is set to 10 and 11 respectively for the two tasks.
    Testing suggests that the values don’t matter, so we’ll set them to
    0 and 1, respectively.
  • requestType is 3 and this comes from the websocket API request,
    so we’ll leave that as is.

Here’s what we end up with after making these substitutions:

import datetime

timestamp = int(datetime.datetime.now().timestamp() * 1000)

send_message_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps(
            {
                "deviceId": device_id,
                "requestId": 0,
                "requestPayload": json.dumps(
                    {
                        "version_id": str(schema_version),
                        "tasks": [
                            {
                                "label": "46",
                                "payload": json.dumps(
                                    {
                                        "thread_id": args.recipient,
                                        "otid": "6870463702739115830",
                                        "source": 0,
                                        "send_type": 1,
                                        "text": args.message,
                                        "initiating_source": 1,
                                    }
                                ),
                                "queue_name": str(args.recipient),
                                "task_id": 0,
                                "failure_count": None,
                            },
                            {
                                "label": "21",
                                "payload": json.dumps(
                                    {
                                        "thread_id": args.recipient,
                                        "last_read_watermark_ts": timestamp,
                                        "sync_group": 1,
                                    }
                                ),
                                "queue_name": str(args.recipient),
                                "task_id": 1,
                                "failure_count": None,
                            },
                        ],
                        "epoch_id": 6870463702858032000,
                    }
                ),
                "requestType": 3,
            }
        ),
    },
)
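
The request above nests JSON inside JSON inside JSON, which is easy to
get wrong by hand. As a rough sketch, the nesting can be pulled into a
helper and then peeled back apart to check it. The name
build_variables is hypothetical, and only the first task (label 46) is
included to keep the example short; the field names are the ones from
the request above.

```python
import json


def build_variables(recipient, message, otid, epoch_id, device_id, schema_version):
    """Build the "variables" form field: three layers of JSON encoding."""
    send_task = {
        "label": "46",
        # Innermost layer: the task payload is itself a JSON string.
        "payload": json.dumps(
            {
                "thread_id": recipient,
                "otid": str(otid),
                "source": 0,
                "send_type": 1,
                "text": message,
                "initiating_source": 1,
            }
        ),
        "queue_name": str(recipient),
        "task_id": 0,
        "failure_count": None,
    }
    request_payload = {
        "version_id": str(schema_version),
        "tasks": [send_task],
        "epoch_id": epoch_id,
    }
    # Outermost layer: requestPayload is a JSON string inside the
    # JSON-encoded variables object.
    return json.dumps(
        {
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps(request_payload),
            "requestType": 3,
        }
    )


# Dummy values, just to exercise the nesting.
variables = build_variables(100075475206906, "hi", 123, 456, "dev", 1)
outer = json.loads(variables)
inner = json.loads(outer["requestPayload"])
task_payload = json.loads(inner["tasks"][0]["payload"])
```

Decoding layer by layer like this is also a handy way to read requests
captured from the browser.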

All we have left now are the mysterious otid and epoch_id
parameters. Generating otid correctly is important for making sure
our messages are actually sent, so we’ll want to understand how to do
it.

Since the otid is different for every message, it would most likely
have to be generated on the client side. Therefore, a reasonable place
to start would be to search the client-side JavaScript for mentions of
otid.

% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("otid"))
                    | .request.url)'
[
  "https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
]
% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("otid")))
            | .[0].response.content.text' -r \
    > otid.js
% prettier otid.js > otid-pretty.js
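
If jq isn’t to hand, the same search can be sketched in Python. This
assumes the usual HAR layout (log.entries, each with request.url and
response.content.text); the function name urls_containing and the
sample data are made up so that the example runs without a capture
file. It does not handle bodies that the browser stored
base64-encoded.

```python
def urls_containing(har, needle):
    """Return request URLs whose response body contains `needle`."""
    matches = []
    for entry in har["log"]["entries"]:
        # Some entries have no body text at all (e.g. redirects or
        # images), so fall back to an empty string.
        text = entry.get("response", {}).get("content", {}).get("text") or ""
        if needle in text:
            matches.append(entry["request"]["url"])
    return matches


# Tiny in-memory stand-in for a capture. With a real file you would
# load it first: har = json.load(open("inbox-requests.har"))
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/a.js"},
                "response": {"content": {"text": 'd.set("otid", x)'}},
            },
            {
                "request": {"url": "https://example.com/b.png"},
                "response": {"content": {}},
            },
        ]
    }
}
found = urls_containing(sample_har, "otid")
```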

There are multiple appearances of otid in this script. By themselves,
none of them are very explanatory, but context is everything. One of
the matches looks like this:

                d[2].set("otid", c.i64.to_string(d[1])),

And if we look up at the top of the function where this statement
appears, we see it looks like this:

__d(
  "LSCreateGroupThreadWithAdminText",
  [
    "LSCreateOfflineThreadingID",
    "LSIssueNewTask",
    "LSLocalApplyOptimisticGroupThread",
  ],
  function (a, b, c, d, e, f) {

Hey, wait, does otid stand for OfflineThreadingID? Boom, let’s
search for the definition of this LSCreateOfflineThreadingID
function. Conveniently enough, it’s even in the same file!

__d(
  "LSCreateOfflineThreadingID",
  [],
  function (a, b, c, d, e, f) {
    a = function () {
      var a = arguments,
        b = a[a.length - 1];
      b.n;
      var c = [],
        d = [];
      return (
        (c[0] = b.i64.random()),
        (d[0] = b.i64.and_(
          b.i64.or_(
            b.i64.lsl_(a[0], b.i64.to_int32([0, 22])),
            b.i64.and_(c[0], [0, 4194303])
          ),
          [2147483647, 4294967295]
        )),
        b.resolve(d)
      );
    };
    e.exports = a;
  },
  null
);

Looking at this function, it seems like the algorithm is something
like:

  1. Generate a random 64-bit integer (b.i64.random()).
  2. Combine that with 4194303 using bitwise AND. Since 4194303
    happens to be one less than 2 to the 22nd power, this has the
    effect of dropping all bits past the rightmost 22, which converts
    the value from step 1 into a random 22-bit integer.
  3. Take the argument to LSCreateOfflineThreadingID, and shift it by
    22 bits to the left (lsl is a common abbreviation for logical
    shift left, which is usually written as <<).
  4. Combine the previous two values using bitwise OR. Since the first
    value has only the rightmost bits, while the second value has only
    the leftmost bits, this operation essentially concatenates the two
    numbers. Arithmetically, it works out to (lefthand bits) * 2^22 + (righthand bits).

(Read more about bitwise operators.)
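
The steps above can be sketched in Python as follows. This is one
reading of the minified code rather than anything Facebook documents:
in particular, it assumes the two-element arrays are [high 32 bits,
low 32 bits] pairs, so that the outermost and_ with [2147483647,
4294967295] keeps the low 63 bits and clears the sign bit. The
function and constant names are made up.

```python
import random
import time

MASK_22 = (1 << 22) - 1  # 4194303, i.e. [0, 4194303]
MASK_63 = (1 << 63) - 1  # assumed meaning of [2147483647, 4294967295]


def create_offline_threading_id(timestamp_ms, rand64=None):
    """Pack a millisecond timestamp and 22 random bits into one integer."""
    if rand64 is None:
        rand64 = random.getrandbits(64)  # step 1: random 64-bit integer
    low = rand64 & MASK_22               # step 2: keep the rightmost 22 bits
    high = timestamp_ms << 22            # step 3: shift the argument left
    # Step 4, followed by the outermost and_ in the JavaScript, which
    # keeps the result within 63 bits.
    return (high | low) & MASK_63


otid = create_offline_threading_id(int(time.time() * 1000))
```

Python integers never overflow, so the final mask only matters for
timestamps far larger than anything Date.now() returns today; it is
kept here to mirror the JavaScript.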

This raises the question: what actually is the argument passed to
LSCreateOfflineThreadingID? Well, if we look back at the original
function that mentioned LSCreateOfflineThreadingID, we can see that
this is where the call happens:

            function (a) {
              return c
                .sp(b("LSCreateOfflineThreadingID"), c.i64.of_float(Date.now()))
                .then(function (a) {
                  return (a = a), (d[1] = a[0]), a;
                });
            },

Aha! The argument is just a UNIX timestamp (in milliseconds) generated
by Date.now(). So, with that in mind, we can generate our own “offline
threading IDs” in Python:

timestamp = int(datetime.datetime.now().timestamp() * 1000)
otid = (timestamp << 22) + random.randrange(2 ** 22)

One last puzzle to solve: what about epoch_id? Actually, it’s
almost the same as otid! In the websocket message, we had
otid = 6870463702739115830 and epoch_id = 6870463702858032000. It
seems like epoch_id is just otid rounded down to some particular even
boundary. In fact, with a little more inspection, it turns out that
the boundary is just the same 22-bit boundary as above. So, this is
what’s happening:

import random

timestamp = int(datetime.datetime.now().timestamp() * 1000)
epoch = timestamp << 22
otid = epoch + random.randrange(2 ** 22)
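
As a sanity check on this layout, shifting either value right by 22
bits should give back a plausible millisecond timestamp. The sketch
below does that for the two sample values quoted above; the helper
name timestamp_ms_from_id is made up.

```python
import datetime

TIMESTAMP_SHIFT = 22


def timestamp_ms_from_id(value):
    """Recover the millisecond timestamp stored above the low 22 bits."""
    return value >> TIMESTAMP_SHIFT


otid_ms = timestamp_ms_from_id(6870463702739115830)
epoch_ms = timestamp_ms_from_id(6870463702858032000)

# Convert to a timezone-aware datetime for easier reading.
sent_at = datetime.datetime.fromtimestamp(
    otid_ms / 1000, tz=datetime.timezone.utc
)
gap_ms = epoch_ms - otid_ms
```

Both values decode to a moment in late 2021, which fits the timestamp
interpretation. For these two samples the decoded timestamps come out
a few tens of milliseconds apart, so the captured epoch_id may have
come from its own Date.now() call; that is a guess from a single pair
of values.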

In the end, we can add this code to Messyger and have a fully(?)
functioning Messenger client that can fetch our list of conversations,
and send a text message to any one of them. The full code is on
GitHub.

What next?

Messyger is nowhere near a full Messenger client. I don’t intend to
add any more features, because Messyger is an educational case study
rather than a practical application. However, if you want to practice
your skills, you might consider figuring out how to handle some of the
complexities that Messyger glosses over; for example:

  • We only had direct messages here; what about group chats?
  • How do you fetch past messages, in addition to the latest one?
  • How can you send and receive images, instead of just texts?
  • How are emojis and reactions represented?
  • How can you get information about when the recipient has read your
    message?

Or, you can find a website of your own that you wish exposed a better
API. Can you figure out how to make things work the way you want?

⛏️ Go forth and build something!

Important legal notice: This blog post is maintained by Radian LLC.


