Reverse engineering the Facebook Messenger API
I recently had occasion to reverse engineer the Facebook
Messenger API for personal use, and
realized the case study would make an excellent tutorial. This blog post
lays out how I deciphered the API, and explains the reverse
engineering techniques I used, so that you can go forth and do the
same. The required background knowledge is fairly minimal.
Remember: Reverse engineering is ethical, pro-democratic, and
protected under US law, but you still need to exercise integrity and
responsibility when interacting with any online system. Examples of
irresponsible behavior include:
- Sending automated (or non-automated) spam to other users
- Downloading people’s data without their consent
- Putting undue load on infrastructure you aren’t paying for
This kind of behavior is inappropriate regardless of how it’s
accomplished. But when put to the right use, reverse engineering is a
way to give yourself and others greater agency, freedom, and
creativity online. For example, you could use it to develop an
alternative interface to an online system which would otherwise be
inaccessible to users with disabilities. Or you could adjust an
application to be runnable on older systems that are no longer
supported, which would benefit users who cannot afford to buy new
hardware.
Warning: Notwithstanding the above, Facebook likes to
automatically suspend and/or ban people who look at their API funny,
even if you aren’t doing anything bad. Explore with caution.
Goal
Our goal here will be to develop a command-line program called
Messyger that allows:
- Seeing your most recent conversations, and which ones have unread
  messages.
- Sending a message to a conversation.
Of course, this isn’t enough for a full Messenger client, but it’s
enough to show off the techniques without having too much busywork. To
practice your own skills, you can add more capabilities after
reading this post.
We’ll use Python because it makes the code concise and easy to read.
The full code from this blog post is available on
GitHub.
Get the email and password
Step 1 of using Messenger is providing your email address and password
to log in. The same will be true of Messyger:
import argparse
parser = argparse.ArgumentParser("messyger")
parser.add_argument("-u", "--email", required=True)
parser.add_argument("-p", "--password", required=True)
args = parser.parse_args()
print("email:", args.email)
print("password:", args.password)
And usage:
% python3 messyger.py -u camilla.woodward@protonmail.com -p 0aSPlneurgscxzpuEZb9
email: camilla.woodward@protonmail.com
password: 0aSPlneurgscxzpuEZb9
And before you ask, no, none of these credentials are valid. Because
Facebook banned all the accounts for having too much suspicious
activity…
Inspect the login form
So what happens when we click the login button? We can find out by
opening the developer tools in Chrome (or its equivalent in other
browsers; any will suffice) and switching to the Network tab to see
the list of all HTTP requests made by the browser while loading the
page.
When we click the login button, what we see (assuming we first check
the “Preserve log” checkbox) is a new request showing up at the bottom
of the log, to the relevant-seeming URL
https://www.messenger.com/login/password/.
This is a POST request, which means that data is being submitted to
the server, which is what we expect for a request to log in. (Read
more about HTTP request methods.)
If we scroll down, Chrome will show us the form data that was
submitted as part of this request, which indeed includes the email and
password:
There are also a bunch of other parameters here, so we’ll want to
figure out what those mean, and whether they’re important. But first, we
should figure out how making this request actually results in us being
logged in.
Typically, logins are handled using cookies, so you’ll provide your
username and password, and the server will give you some cookies for
the browser to store. (Read more about cookies.)
Then, the cookies are included in all subsequent requests, allowing
the server to verify that you’ve already logged in.
If we scroll up and look at the response headers, we can see that
indeed the response uses the Set-Cookie header to pass some cookies
back to the browser.
(Read more about HTTP response headers.)
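To make the cookie round trip concrete, here is a minimal sketch using Python's standard http.cookies module. The header values are short made-up stand-ins (not real Messenger cookies), and the function names parse_set_cookie and cookie_header are hypothetical helpers for illustration only.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_values):
    """Collect cookie names and values from a list of Set-Cookie header values."""
    jar = {}
    for value in header_values:
        cookie = SimpleCookie()
        cookie.load(value)  # attributes like path and Max-Age are parsed, not stored as cookies
        for name, morsel in cookie.items():
            jar[name] = morsel.value
    return jar


def cookie_header(jar):
    """Build the Cookie request header a client would send on later requests."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Made-up values, shaped like what a server might send after a login.
example_headers = [
    "session=abc123; Max-Age=3600; path=/; secure; httponly",
    "user=42; Max-Age=3600; path=/; secure",
]
jar = parse_set_cookie(example_headers)
print(jar)
print(cookie_header(jar))
```

The browser does both halves of this automatically: it stores what Set-Cookie hands it, and it sends the stored names and values back in a Cookie header on every later request to the same site.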
Replicate the login request
Now that we’ve identified the request that’s used to log in, we’ll
want to replicate it outside of the browser, so that we have full
control over it. The goal is to take the email and password, and
exchange them for the cookies that will allow us to make subsequent
authenticated requests.
Luckily, Chrome (and other browsers) provide an easy way to do this.
You can right-click the request and extract a cURL command that will
do the same thing as the browser did, but from the command line.
Here’s what that looks like:
% curl 'https://www.messenger.com/login/password/' \
  -H 'authority: www.messenger.com' \
  -H 'pragma: no-cache' \
  -H 'cache-control: no-cache' \
  -H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'origin: https://www.messenger.com' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'dnt: 1' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-user: ?1' \
  -H 'sec-fetch-dest: document' \
  -H 'referer: https://www.messenger.com/' \
  -H 'accept-language: en-US,en;q=0.9' \
  -H 'cookie: wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
  --data-raw 'jazoest=2913&lsd=AVrs5S09Cjw&initial_request_id=APeMI6-a6r5592s5ETA6Zr5&timezone=480&lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ%3D%3D&lgnrnd=114743_C4xH&lgnjs=n&email=camilla.woodward%40protonmail.com&pass=0aSPlneurgscxzpuEZb9&login=1&persistent=1&default_persistent=' \
  --compressed
When we run this command, we’ll see that it finishes successfully but
doesn’t print anything. This is because cURL doesn’t print response
headers by default, and this request only returns headers (no body
content). We can add the -i option to display response headers,
which do appear to have the cookies we were expecting:
HTTP/2 302
set-cookie: sb=ja6aYcS61HGuWo-I6JaD_8G3; expires=Tue, 21-Nov-2023 20:39:41 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
set-cookie: c_user=100075402451059; expires=Mon, 21-Nov-2022 20:39:40 GMT; Max-Age=31535999; path=/; domain=.messenger.com; secure; SameSite=None
set-cookie: xs=36%3Adbs1ryav8jfpEg%3A2%3A1637527181%3A-1%3A-1; expires=Mon, 21-Nov-2022 20:39:40 GMT; Max-Age=31535999; path=/; domain=.messenger.com; secure; httponly; SameSite=None
location: https://www.messenger.com/
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0
document-policy: force-load-at-top
cross-origin-resource-policy: same-origin
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
pragma: no-cache
cache-control: private, no-cache, no-store, must-revalidate
expires: Sat, 01 Jan 2000 00:00:00 GMT
x-content-type-options: nosniff
x-xss-protection: 0
x-frame-options: DENY
access-control-expose-headers: X-FB-Debug, X-Loader-Length
access-control-allow-methods: OPTIONS
access-control-allow-credentials: true
access-control-allow-origin: https://www.messenger.com
vary: Origin
strict-transport-security: max-age=15552000; preload; includeSubDomains
content-type: text/html; charset="utf-8"
x-fb-debug: niLVdyLSHPRUwRyAzfXpDUEgukRqEdXXNZ1yK3fO1yjJG1z8FrzwfA1OMfo1QbiSxCnBZx72f1nk6HEXi44NDg==
content-length: 0
date: Sun, 21 Nov 2021 20:39:42 GMT
priority: u=3,i
alt-svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600
However, cURL syntax is a little annoying, and I personally prefer to
use HTTPie instead. Fortunately there’s a nice
tool called CurliPie that converts
cURL syntax to HTTPie. That gives us this:
% http -f https://www.messenger.com/login/password/ \
  Authority:www.messenger.com \
  Pragma:no-cache \
  Cache-Control:no-cache \
  Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
  Sec-Ch-Ua-Mobile:'?0' \
  Sec-Ch-Ua-Platform:Linux \
  Origin:https://www.messenger.com \
  Upgrade-Insecure-Requests:1 \
  Dnt:1 \
  Content-Type:application/x-www-form-urlencoded \
  User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
  Accept:'text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9' \
  Sec-Fetch-Site:same-origin \
  Sec-Fetch-Mode:navigate \
  Sec-Fetch-User:'?1' \
  Sec-Fetch-Dest:document \
  Referer:https://www.messenger.com/ \
  Accept-Language:'en-US, en;q=0.9' \
  Cookie:'wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
  jazoest=2913 \
  lsd=AVrs5S09Cjw \
  initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
  timezone=480 \
  lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ== \
  lgnrnd=114743_C4xH \
  lgnjs=n \
  email=camilla.woodward@protonmail.com \
  pass=0aSPlneurgscxzpuEZb9 \
  login=1 \
  persistent=1
Notice how the email and password are now separated out into different
arguments, instead of crammed into a big long string under
--data-raw.
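If you would rather inspect a --data-raw string without converting the whole command, Python's standard urllib.parse module can split it into fields. This is a small sketch on a shortened, made-up form body (placeholder values, not the real request):

```python
from urllib.parse import parse_qs

# Shortened, made-up body in the same application/x-www-form-urlencoded
# shape as the --data-raw string; the values are placeholders.
body = "lsd=TOKEN&timezone=480&email=user%40example.com&pass=hunter2&login=1"

# parse_qs splits on "&" and "=", and undoes percent-encoding (%40 becomes @).
# Each field maps to a list of values, so we keep the first one.
fields = {name: values[0] for name, values in parse_qs(body).items()}

for name, value in fields.items():
    print(f"{name}={value}")
```

Note that parse_qs drops fields with empty values by default; passing keep_blank_values=True keeps them.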
Running the HTTPie command above shows us the headers in a nice format
by default, including (again) the cookies:
HTTP/1.1 302 Found
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS
Access-Control-Allow-Origin: https://www.messenger.com
Access-Control-Expose-Headers: X-FB-Debug, X-Loader-Length
Alt-Svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600
Cache-Control: private, no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Length: 0
Content-Type: text/html; charset="utf-8"
Date: Sun, 21 Nov 2021 20:56:53 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Location: https://www.messenger.com/
Pragma: no-cache
Priority: u=3,i
Set-Cookie: sb=lLKaYbQYjhQ1r-tdd337y6b6; expires=Tue, 21-Nov-2023 20:56:52 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Set-Cookie: c_user=100075402451059; expires=Mon, 21-Nov-2022 20:56:50 GMT; Max-Age=31535998; path=/; domain=.messenger.com; secure; SameSite=None
Set-Cookie: xs=36%3AxbGwJByz_Zpfag%3A2%3A1637528212%3A-1%3A-1; expires=Mon, 21-Nov-2022 20:56:50 GMT; Max-Age=31535998; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Strict-Transport-Security: max-age=15552000; preload; includeSubDomains
Vary: Origin
X-Content-Type-Options: nosniff
X-FB-Debug: vR9/wct/iva6TWZRO48tsnEYT1xrMyIErMwNH0P47uFA65WrEtUiMR38CY6p8NLdT2aIh1nXbSszogNuHE6Bng==
X-Frame-Options: DENY
X-XSS-Protection: 0
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
cross-origin-resource-policy: same-origin
document-policy: force-load-at-top
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0
Notice that the response says 302 Found, and includes a
Location: https://www.messenger.com header. This instructs the browser
(after setting the relevant cookies) to redirect the user to
https://www.messenger.com, where you will now see your
conversations. (Read more about HTTP response codes.)
Simplify the login request
That HTTP request has a lot of parameters in it! Browsers send lots
of headers by default, and websites will usually add on a bunch more
for good measure, but usually most of the headers (and even form
parameters) are unneeded.
Once we have a working request in HTTPie, we can strip out parameters
one by one to see which ones are actually required. For example, if
we change the password, it stops working:
% http -f https://www.messenger.com/login/password/ \
  Authority:www.messenger.com \
  Pragma:no-cache \
  Cache-Control:no-cache \
  Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"' \
  Sec-Ch-Ua-Mobile:'?0' \
  Sec-Ch-Ua-Platform:Linux \
  Origin:https://www.messenger.com \
  Upgrade-Insecure-Requests:1 \
  Dnt:1 \
  Content-Type:application/x-www-form-urlencoded \
  User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36' \
  Accept:'text/html, application/xhtml+xml, application/xml;q=0.9, image/avif, image/webp, image/apng, */*;q=0.8, application/signed-exchange;v=b3;q=0.9' \
  Sec-Fetch-Site:same-origin \
  Sec-Fetch-Mode:navigate \
  Sec-Fetch-User:'?1' \
  Sec-Fetch-Dest:document \
  Referer:https://www.messenger.com/ \
  Accept-Language:'en-US, en;q=0.9' \
  Cookie:'wd=1010x980; dpr=2; datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
  jazoest=2913 \
  lsd=AVrs5S09Cjw \
  initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
  timezone=480 \
  lgndim=eyJ3IjoxOTIwLCJoIjoxMDgwLCJhdyI6MTkyMCwiYWgiOjEwNTMsImMiOjI0fQ== \
  lgnrnd=114743_C4xH \
  lgnjs=n \
  email=camilla.woodward@protonmail.com \
  pass=thisiswrong \
  login=1 \
  persistent=1
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS
Access-Control-Allow-Origin: https://www.messenger.com
Access-Control-Expose-Headers: X-FB-Debug, X-Loader-Length
Alt-Svc: h3=":443"; ma=3600, h3-29=":443"; ma=3600
Cache-Control: private, no-cache, no-store, must-revalidate
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset="utf-8"
Date: Sun, 21 Nov 2021 22:33:31 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Pragma: no-cache
Priority: u=3,i
Set-Cookie: sb=O8maYf6WN7nBh2pODWFo9bTa; expires=Tue, 21-Nov-2023 22:33:31 GMT; Max-Age=63072000; path=/; domain=.messenger.com; secure; httponly; SameSite=None
Strict-Transport-Security: max-age=15552000; preload; includeSubDomains
Transfer-Encoding: chunked
Vary: Origin
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-FB-Debug: 1JwQc9JQPTRqGrUzIfY/OED6es6VSkBWrdeTj8XQ3BOF6nbEtDsuPSsQ52lfVuYvS/8Xz1BNlQadiGzFgGzisQ==
X-Frame-Options: DENY
X-XSS-Protection: 0
content-security-policy: default-src data: blob: https://*.fbcdn.net https://*.facebook.com *.fbsbx.com *.messenger.com;script-src *.facebook.com *.fbcdn.net *.facebook.net *.google-analytics.com *.google.com 127.0.0.1:* 'unsafe-inline' 'unsafe-eval' blob: data: 'self' connect.facebook.net *.messenger.com;style-src data: blob: 'unsafe-inline' *.facebook.com *.fbcdn.net *.messenger.com;connect-src *.facebook.com facebook.com *.fbcdn.net *.facebook.net wss://*.facebook.com:* wss://*.whatsapp.com:* attachment.fbsbx.com ws://localhost:* blob: *.cdninstagram.com 'self' *.messenger.com wss://*.messenger.com www.messenger.com www.google-analytics.com wss://*.messenger.com:*;font-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.gstatic.com;img-src *.fbcdn.net https://*.facebook.com cdninstagram.com *.cdninstagram.com *.tenor.co *.tenor.com *.giphy.com data: *.fbsbx.com *.messenger.com messenger.com blob: android-webview-video-poster: *.xx.fbcdn.net https://messenger.com;media-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com https://*.giphy.com blob:;frame-src *.messenger.com *.facebook.com https://*.fbcdn.net data: *.fbsbx.com *.fbcdn.net *.cdninstagram.com blob: *.doubleclick.net;
content-security-policy-report-only: default-src https: data: wss: blob: chrome-extension: 'unsafe-inline' 'unsafe-eval';block-all-mixed-content;report-uri https://www.facebook.com/csp/reporting/?minimize=0;
cross-origin-embedder-policy-report-only: require-corp;report-to="coep_report"
cross-origin-opener-policy: same-origin-allow-popups
cross-origin-resource-policy: same-origin
document-policy: force-load-at-top
report-to: {"max_age":86400,"endpoints":[{"url":"https://www.facebook.com/browser_reporting/?minimize=0"}],"group":"coep_report"}
x-fb-rlafr: 0
<!DOCTYPE html>
<html lang="en" id="facebook" class="no_js">
... a bunch of HTML ...
Notice how we now get a 200 OK response instead of 302 Found, and
the c_user and xs cookies aren’t getting set anymore, which suggests
our login attempt is failing. (We’d expect to see 401 Unauthorized
or 403 Forbidden instead of 200 OK for a failed
login attempt, but servers don’t always return the most sensible
status codes.)
If we go through removing each parameter that can be removed without
losing the cookies, then this is what we end up with:
% http -f https://www.messenger.com/login/password/ \
  Cookie:'datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
  lsd=AVrs5S09Cjw \
  initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
  email=camilla.woodward@protonmail.com \
  pass=0aSPlneurgscxzpuEZb9
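The one-at-a-time elimination can be written down as a small loop. The sketch below is only an illustration of the idea: minimize_params and fake_still_works are hypothetical names, and the check is a stand-in that never touches the network. A real check would send the request and look for the c_user and xs cookies, and given the warning about bans at the top of this post, doing that slowly and by hand is probably the safer choice.

```python
def minimize_params(params, still_works):
    """Drop each parameter in turn, keeping it only if the request breaks without it."""
    needed = dict(params)
    for name in list(params):
        trial = {k: v for k, v in needed.items() if k != name}
        if still_works(trial):
            needed = trial  # still succeeds, so this parameter was unneeded
    return needed


# Stand-in for a real check: pretend the server only cares about these names.
def fake_still_works(params):
    return {"lsd", "email", "pass"} <= set(params)


# Placeholder values; the names mirror some of the form fields seen earlier.
all_params = {
    "jazoest": "0",
    "lsd": "TOKEN",
    "timezone": "480",
    "email": "user@example.com",
    "pass": "hunter2",
    "login": "1",
}
minimal = minimize_params(all_params, fake_still_works)
print(sorted(minimal))
```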
Track down hidden login parameters
We’ve now simplified the login command a lot, but what are those
datr, lsd, and initial_request_id values? In general, when
seeing parameters like these in outgoing requests, there are three
possibilities:
1. The client received the value from the server on a previous request,
   and is just sending it back.
2. The client is generating the value from scratch (e.g. based on the
   current timestamp, or a random number generator).
3. Some combination of the two (the client gets a value from the
   server and then modifies it in some way before sending it back).
A lot of reverse engineering is making educated guesses and seeing if
they pan out. Let’s make a guess that case (1) is what’s happening
here. This seems especially likely because initial_request_id sounds
like it’s referring to a previous request. There’s really only one
significant request that happens before the login request, which is
the initial HTML page containing the login form. So, a natural place
to start is to take a look at that page to see if it has anything that
looks relevant.
To do so, we can open a new private browsing window and load up
Messenger again. Then, we can go to the Sources tab to see the HTML
that’s being used to display the current page. (You might need to
reload the page if you loaded it before opening the developer tools.)
Fortunately, Chrome (as with other browsers) has a neat feature where
it will reformat code in the Sources tab to be easier to read. (If
there isn’t a popup telling you about it, it’s the button labeled {}
in the lower-left of the text pane.)
In the reformatted HTML, if we search for initial_request_id, check
out what we find:
Not only is there a suspicious-looking value for initial_request_id
included in the HTML (AovnGK3QdvNgThxCxYXmSDz), but just a little
above it is a value for lsd (AVpERbgkGxw)! If we refresh the page,
we’ll find that these values change every time. Since the values are
included as <input> tags in the <form> for the login button,
they’ll automatically get submitted along with the email address and
password, just like we saw in the Network tab earlier.
Elsewhere in the page is a value for datr
(a26cYYDoj0oHu9oca8jmB8W6):
If we want to check these values are used the way we think, we can
click the login button and see that the values in the POST request
match up as expected:
There’s a little gotcha that I ran into the first time I investigated
this. If you reload the page before logging in, mysteriously the
datr value goes away from the HTML! Correspondingly, by switching to
the Application tab and selecting https://www.messenger.com under
Cookies, you can see that there are no cookies the first time we load
the page:
But then when we reload… suddenly, cookies!
This is particularly tricky because you wouldn’t expect the HTML
response to magically change just by reloading the page. If I had to
guess, there’s some JavaScript on the frontend that detects when
you’re about to leave the page (e.g. by reloading), and sets the
datr cookie. Then, that cookie value is automatically included in
the new request, which causes the server to change the HTML to no
longer include a _js_datr value for some reason.
Parse the HTML response
Okay, now we know how to log in to Messenger:
1. Fetch the HTML page at https://www.messenger.com
2. Extract values for initial_request_id, lsd, and datr from
   various places in the HTML
3. Make a POST request to https://www.messenger.com/login/password/
   with these values alongside the email address and password
4. Extract the xs, sb, and c_user cookies from the response
   headers
Let’s finally get back to Messyger and implement these steps. We’ll
use the Requests library to simplify our HTTP requests.
Step 1 is fairly easy with Requests. We make a request, check that
there was no error, and then get the HTML text:
import requests
html_resp = requests.get("https://www.messenger.com")
html_resp.raise_for_status()
html_page = html_resp.text
print(html_page)
For step 2, we’ll want to start by looking in the raw HTML that we
just printed to see where the values that we want to extract are
located. Let’s start with initial_request_id:
<input type="hidden" autocomplete="off" id="initial_request_id" name="initial_request_id" value="AS49ZKW_DYimevm1SD-qQ9Q" />
The part that we really care about is
value="AS49ZKW_DYimevm1SD-qQ9Q". However, the text value= shows up
a lot of times in the HTML, so we also need the preceding
name="initial_request_id" to ensure we’re looking at the right
value.
Now that we know what we’re looking for (i.e. something like
name="initial_request_id" value="AS49ZKW_DYimevm1SD-qQ9Q"), we can
write a regular expression to
search for this pattern. That looks like this:
import re
initial_request_id = re.search(
    r'name="initial_request_id" value="([^"]+)"',
    html_page
).group(1)
print(initial_request_id)
(Read more about the Python re module.)
In this regular expression, [^"] stands for any character other than
a double quote, + means one or more, and the parentheses create a
sub-expression whose value can be returned by calling the .group()
method.
The lsd parameter occurs in HTML that looks like this:
<input type="hidden" name="lsd" value="AVosEZyGrXU" autocomplete="off" />
We can write a similar regular expression to extract its value:
lsd = re.search(
    r'name="lsd" value="([^"]+)"',
    html_page
).group(1)
print(lsd)
The datr parameter looks a bit different:
["_js_datr","-nacYQnDcFLPM5Sc66w7KQKG",63072000000,"/",true]
However, it’s not too much more difficult to extract it:
datr = re.search(
    r'"_js_datr","([^"]+)"',
    html_page
).group(1)
print(datr)
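One thing to watch for: re.search returns None when the pattern is missing, so calling .group(1) fails with a fairly unhelpful AttributeError. That can happen here, since the datr value disappears from the page once the cookie is set. A small wrapper gives a clearer message. This is just a sketch: the extract helper and the sample_html fragment are made up for illustration.

```python
import re


def extract(pattern, html, label):
    """Return the first capture group of pattern in html, or raise a clear error."""
    match = re.search(pattern, html)
    if match is None:
        raise ValueError(f"could not find {label} in the page")
    return match.group(1)


# Tiny made-up page fragment with the same shapes as the snippets above.
sample_html = (
    '<input type="hidden" name="lsd" value="EXAMPLE_LSD" autocomplete="off" />'
    '<input type="hidden" id="initial_request_id" '
    'name="initial_request_id" value="EXAMPLE_ID" />'
    '["_js_datr","EXAMPLE_DATR",63072000000,"/",true]'
)

lsd = extract(r'name="lsd" value="([^"]+)"', sample_html, "lsd")
initial_request_id = extract(
    r'name="initial_request_id" value="([^"]+)"', sample_html, "initial_request_id"
)
datr = extract(r'"_js_datr","([^"]+)"', sample_html, "datr")
print(lsd, initial_request_id, datr)
```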
Make the login request
Now that we have all the necessary parameters, we can use the Requests
module to actually perform the login request to Messenger. As a
reminder, the login request using HTTPie looks like this:
% http -f https://www.messenger.com/login/password/ \
  Cookie:'datr=UqKaYf_W73hoTmwXhi8ZqzZ4' \
  lsd=AVrs5S09Cjw \
  initial_request_id=APeMI6-a6r5592s5ETA6Zr5 \
  email=camilla.woodward@protonmail.com \
  pass=0aSPlneurgscxzpuEZb9
We can replicate it in Python like so:
login = requests.post(
    "https://www.messenger.com/login/password/",
    cookies={"datr": datr},
    data={
        "lsd": lsd,
        "initial_request_id": initial_request_id,
        "email": args.email,
        "pass": args.password,
    },
    allow_redirects=False,  # don't follow the 302
)
assert login.status_code == 302
print(login.cookies)
(The .cookies property in a Requests response contains the values
that were set by the Set-Cookie headers in the response.)
And here’s what we get from running the whole script so far:
% python3 messyger.py -u camilla.woodward@protonmail.com -p 0aSPlneurgscxzpuEZb9
{'c_user': '100075402451059', 'sb': 'yXmcYTct5V-EAvEuXfPrArCj', 'xs': '49%3AZuqoxpqqnfwF_A%3A2%3A1637644745%3A-1%3A-1'}
Success!
Find the inbox request
Now that we’ve logged in, we should be able to fetch data about our
Messenger account. We’ll start by trying to get the information shown
in the left-hand side of the Messenger interface: your list of
conversations.
Since Messenger is a highly interactive application, it’s fairly
unlikely for it to send that information encoded as raw HTML (although
we could check by searching for a string like Hello Camilla in the HTML
response). Rather, it’s more likely that the information will be
fetched asynchronously via JavaScript. For various historical reasons,
this is called an XHR request, where XHR stands for XMLHttpRequest
despite having nothing to do with XML. (Read more about the different
types of asynchronous requests in JavaScript.)
We can filter for XHR requests in the Network tab:
However, there are a bunch of them, so it would be a pain to look at
each one and try to figure out if it has the data we’re looking for.
To deal with this problem, we can download all the request and
response data as an HTTP Archive (HAR), so that we can search through
all of them at the same time:
The HAR format is actually just JSON, so we can use a tool like
jq to look through it. First we’ll
check the list of all the URLs that the browser made requests to:
% cat inbox-requests.har | jq '.log.entries | map(.request.url)'
[
"https://www.messenger.com/login/password/",
"https://www.messenger.com/",
"https://www.messenger.com/t/100007424414992/",
... a bunch more URLs ...
"https://www.messenger.com/ajax/bnzai?__a=1&__ccg=EXCELLENT&__comet_req=1&__hs=18957.HYP%3Amessengerdotcom_comet_pkg.2.1.0.0.&__hsi=0-0&__jssesw=1&__req=g&__rev=1004771992&__s=lryd0q%3Amohw6t%3Axnxaps&__spin_b=trunk&__spin_r=1004771992&__spin_t=1637886944&__user=100075402451059&dpr=2&fb_dtsg=AQE4bjFlv-4P3Xs%3A50%3A1637886942&jazoest=21949&lsd=M237eS5ouvAFHBqYl3StT7&ph=C3",
"https://www.messenger.com/ajax/webstorage/process_keys/?state=0",
"https://www.messenger.com/ajax/webstorage/process_keys/?state=0"
]
(Read more about how to use jq.)
The jq command above is equivalent to the following Python code, and
you can do it this way too, if it feels more comfortable:
import json
with open("inbox-requests.har") as f:
    requests = json.load(f)
urls = []
for entry in requests["log"]["entries"]:
    urls.append(entry["request"]["url"])
print(json.dumps(urls, indent=2))
(Read more about the Python json module.)
However, I like jq because once you learn how to use it, it’s a lot
faster than writing Python or searching through JSON by hand.
Now, rather than printing every URL, let’s print only the ones whose
responses contained the string Hello Camilla. This should allow us to
identify which ones have the data that shows up in the sidebar:
% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("Hello Camilla"))
                    | .request.url)'
[
"https://www.messenger.com/api/graphql/"
]
Or equivalently:
urls = []
for entry in requests["log"]["entries"]:
    try:
        if "Hello Camilla" in entry["response"]["content"]["text"]:
            urls.append(entry["request"]["url"])
    except KeyError:  # ignoring missing keys is ".?" in jq
        continue
print(json.dumps(urls, indent=2))
Great! There’s a request to https://www.messenger.com/api/graphql/
that returns the data we want. Unfortunately, the HAR download doesn’t
have a convenient “Copy as cURL” option, so we’ll go back to the
browser to do that.
The main tricky bit is that there are actually a bunch of different
requests to this same endpoint, and we need to know which one to copy.
I’m sure there are plenty of clever ways to get around this, but I just
did it the easy way by printing out the requests that came
before and after it, so I could match them up visually.
% cat inbox-requests.har
| jq '.log.entries
| map( .?
)'
... bunch of requests ...
{
"url": "https://static.xx.fbcdn.net/rsrc.php/v3/yy/r/DeeNYB34aTG.js?_nc_x=0OMkmbJTxss",
"hasMyData": false
},
{
"url": "https://static.xx.fbcdn.net/rsrc.php/ym/r/YQbyhl59TWY.ico",
"hasMyData": false
},
{
"url": "https://www.messenger.com/api/graphql/",
"hasMyData": true
},
{
"url": "https://www.messenger.com/api/graphql/",
"hasMyData": false
},
{
"url": "https://www.messenger.com/ajax/bootloader-endpoint/?modules=TransportSelectingClientSingleton%2CRequestStreamCommonRequestStreamCommonTypes&__user=100075402451059&__a=1&__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0&__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway&__req=3&__hs=18957.HYP%3Amessengerdotcom_comet_pkg.2.1.0.0.&dpr=2&__ccg=EXCELLENT&__rev=1004771992&__s=lryd0q%3Amohw6t%3Axnxaps&__hsi=0-0&__comet_req=1&fb_dtsg_ag=AQyUiaFmn4dTZRTOD6xvQejSbiWCOkwF_hArsm6-mgIByBR7%3A50%3A1637886942&jazoest=25004&__spin_r=1004771992&__spin_b=trunk&__spin_t=1637886944&__jssesw=1",
"hasMyData": false
},
... bunch of requests ...
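If you would rather not use jq, the same kind of listing can be produced with a few lines of Python. This is only a minimal sketch: the function name `list_entries` and its `needle` parameter are made up for illustration, and it assumes the response bodies in the HAR are stored as plain text (a HAR file may also store bodies base64-encoded, which this sketch does not handle).

```python
import json


def list_entries(har, needle):
    """Return one {"url", "hasMyData"} record per HAR entry, in request order."""
    records = []
    for entry in har["log"]["entries"]:
        # Some entries have no recorded body, so fall back to an empty string.
        text = entry.get("response", {}).get("content", {}).get("text") or ""
        records.append({
            "url": entry["request"]["url"],
            "hasMyData": needle in text,
        })
    return records


def load_har(path):
    """Parse a HAR file (which is just JSON) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `list_entries(load_har("inbox-requests.har"), "Hello Camilla")` would then give a list that can be scanned by eye in the same way as the jq output above.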
Then it was just a matter of finding the corresponding place in the
browser Network tab (note that the Network tab only shows the last bit
of each URL, after the last or second-to-last slash):
That gives us this:
% curl 'https://www.messenger.com/api/graphql/'
-H 'authority: www.messenger.com'
-H 'pragma: no-cache'
-H 'cache-control: no-cache'
-H 'sec-ch-ua: "Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"'
-H 'dnt: 1'
-H 'sec-ch-ua-mobile: ?0'
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
-H 'x-fb-friendly-name: LSPlatformGraphQLLightspeedRequestQuery'
-H 'x-fb-lsd: M237eS5ouvAFHBqYl3StT7'
-H 'content-type: application/x-www-form-urlencoded'
-H 'sec-ch-ua-platform: "Linux"'
-H 'accept: */*'
-H 'origin: https://www.messenger.com'
-H 'sec-fetch-site: same-origin'
-H 'sec-fetch-mode: cors'
-H 'sec-fetch-dest: empty'
-H 'referer: https://www.messenger.com/t/100007424414992/'
-H 'accept-language: en-US,en;q=0.9'
-H 'cookie: wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1'
--data-raw 'av=100075402451059&__user=100075402451059&__a=1&__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0&__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway&__req=1&__hs=18957.HYP%3Amessengerdotcom_comet_pkg.2.1.0.0.&dpr=2&__ccg=EXCELLENT&__rev=1004771992&__s=lryd0q%3Amohw6t%3Axnxaps&__hsi=0-0&__comet_req=1&fb_dtsg=AQE4bjFlv-4P3Xs%3A50%3A1637886942&jazoest=21949&lsd=M237eS5ouvAFHBqYl3StT7&__spin_r=1004771992&__spin_b=trunk&__spin_t=1637886944&__jssesw=1&fb_api_caller_class=RelayModern&fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery&variables=%7B%22deviceId%22%3A%226a9252cb-2145-4f81-9d69-1834b84ba614%22%2C%22requestId%22%3A0%2C%22requestPayload%22%3A%22%7B%5C%22database%5C%22%3A1%2C%5C%22version%5C%22%3A4680497022042598%2C%5C%22sync_params%5C%22%3A%5C%22%7B%5C%5C%5C%22scale%5C%5C%5C%22%3A1%2C%5C%5C%5C%22preview_height%5C%5C%5C%22%3A200%2C%5C%5C%5C%22preview_width%5C%5C%5C%22%3A150%2C%5C%5C%5C%22preview_height_large%5C%5C%5C%22%3A400%2C%5C%5C%5C%22preview_width_large%5C%5C%5C%22%3A300%2C%5C%5C%5C%22full_height%5C%5C%5C%22%3A200%2C%5C%5C%5C%22snapshot_num_threads_per_page%5C%5C%5C%22%3A15%2C%5C%5C%5C%22locale%5C%5C%5C%22%3A%5C%5C%5C%22en_US%5C%5C%5C%22%7D%5C%22%2C%5C%22epoch_id%5C%22%3A0%2C%5C%22last_applied_cursor%5C%22%3Anull%7D%22%2C%22requestType%22%3A1%7D&server_timestamps=true&doc_id=4476599072415612'
--compressed
If we run that cURL command and search in the output (for example, by
putting | grep -o "Hello Camilla"
at the end; read more about grep
options), we can see that the
response does indeed have the data we’re looking for.
Make the inbox request more readable
Although the cURL command works, that --data-raw
parameter is
absolutely disgusting, so let’s convert it to HTTPie syntax using
CurliPie:
% http -f https://www.messenger.com/api/graphql/
Authority:www.messenger.com
Pragma:no-cache
Cache-Control:no-cache
Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"'
Dnt:1
Sec-Ch-Ua-Mobile:'?0'
User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
X-Fb-Friendly-Name:LSPlatformGraphQLLightspeedRequestQuery
X-Fb-Lsd:M237eS5ouvAFHBqYl3StT7
Content-Type:application/x-www-form-urlencoded
Sec-Ch-Ua-Platform:Linux
Accept:'*/*'
Origin:https://www.messenger.com
Sec-Fetch-Site:same-origin
Sec-Fetch-Mode:cors
Sec-Fetch-Dest:empty
Referer:https://www.messenger.com/t/100007424414992/
Accept-Language:'en-US, en;q=0.9'
Cookie:'wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1'
av=100075402451059
__user=100075402451059
__a=1
__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0
__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway
__req=1
__hs=18957.HYP:messengerdotcom_comet_pkg.2.1.0.0.
dpr=2
__ccg=EXCELLENT
__rev=1004771992
__s=lryd0q:mohw6t:xnxaps
__hsi=0-0
__comet_req=1
fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942
jazoest=21949
lsd=M237eS5ouvAFHBqYl3StT7
__spin_r=1004771992
__spin_b=trunk
__spin_t=1637886944
__jssesw=1
fb_api_caller_class=RelayModern
fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery
variables="{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{"database":1,"version":4680497022042598,"sync_params":"{"scale":1,"preview_height":200,"preview_width":150,"preview_height_large":400,"preview_width_large":300,"full_height":200,"snapshot_num_threads_per_page":15,"locale":"en_US"}","epoch_id":0,"last_applied_cursor":null}","requestType":1}"
server_timestamps=true
doc_id=4476599072415612
Now if we run this command, we’ll actually see that it produces a
different result than the cURL one, which shouldn’t happen! Specifically:
{
"data": {
"viewer": {
"lightspeed_web_request": null
}
},
"errors": [
{
"message": "A server error missing_required_variable_value occured. Check server logs for details.",
"severity": "WARNING"
},
... a bunch more scary text ...
Well, we can’t check the server logs, but presumably something about
the conversion from cURL to HTTPie messed up the request. How can we
debug things when something like this happens?
Well, one way is to use a service like httpbin
to check what requests your tools are actually sending out to the
internet. Here’s an example of sending ostensibly the same request to
httpbin using cURL and HTTPie, and seeing that the two requests were
actually not quite identical (e.g., the User-Agent
header was
different):
% curl https://httpbin.org/post
-H 'example-header: foobar'
--data-raw 'param1=baz&param2=quux'
{
"args": {},
"data": "",
"files": {},
"form": {
"param1": "baz",
"param2": "quux"
},
"headers": {
"Accept": "*/*",
"Content-Length": "22",
"Content-Type": "application/x-www-form-urlencoded",
"Example-Header": "foobar",
"Host": "httpbin.org",
"User-Agent": "curl/7.74.0",
"X-Amzn-Trace-Id": "Root=1-61a036e4-21a4901c0f5a2bc9091cab5a"
},
"json": null,
"origin": "67.180.179.80",
"url": "https://httpbin.org/post"
}
% http -f https://httpbin.org/post
example-header:foobar
param1=baz
param2=quux
{
"args": {},
"data": "",
"files": {},
"form": {
"param1": "baz",
"param2": "quux"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "22",
"Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
"Example-Header": "foobar",
"Host": "httpbin.org",
"User-Agent": "HTTPie/2.2.0",
"X-Amzn-Trace-Id": "Root=1-61a03726-478a4c12219627695c0d62e9"
},
"json": null,
"origin": "67.180.179.80",
"url": "https://httpbin.org/post"
}
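Comparing two of these responses by eye gets tedious once there are more than a handful of headers. As a rough sketch (the helper name `diff_httpbin` and its `ignore` parameter are invented here, not part of any tool), the two parsed JSON responses can be compared section by section in Python, skipping the trace ID that differs on every request:

```python
def diff_httpbin(a, b, ignore=("X-Amzn-Trace-Id",)):
    """Compare two parsed httpbin responses.

    Returns {section: {key: (value_in_a, value_in_b)}} containing only the
    keys whose values differ. A missing key is reported as None.
    """
    differences = {}
    for section in ("headers", "form", "args"):
        left = a.get(section) or {}
        right = b.get(section) or {}
        changed = {
            key: (left.get(key), right.get(key))
            for key in sorted(set(left) | set(right))
            if key not in ignore and left.get(key) != right.get(key)
        }
        if changed:
            differences[section] = changed
    return differences
```

Feeding it the two responses shown above (after `json.loads`) should report the `User-Agent`, `Content-Type`, and `Accept-Encoding` headers and nothing in `form`, since both tools sent the same form fields.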
By replacing https://www.messenger.com/api/graphql/
with
https://httpbin.org/post
in our cURL and HTTPie commands above, then
carefully comparing the output (maybe with the aid of a command like
git diff --no-index
to highlight differences between two files;
read more about git diff --no-index
), we can find out
that HTTPie is doing something peculiar to the backslashes in the
variables=
argument. Here’s a simpler example to show the behavior:
% curl -s https://httpbin.org/post
--data-urlencode backslashes='\\\\'
| jq .form.backslashes -r
\\\\
% http -f https://httpbin.org/post
backslashes='\\\\'
| jq .form.backslashes -r
\\
(Using -r
tells jq to print the value as a raw string, instead of as
a JSON string with quotes. Using --data-urlencode
instead of
--data-raw
means we don’t have to worry about URL
encoding ourselves.
Using -s
prevents cURL from printing out a progress bar.)
With cURL, it’s four backslashes in, four backslashes out. But with
HTTPie, it’s four backslashes in, only two backslashes out! Why?
Well, if we Google, we end up finding this GitHub
issue
that mentions HTTPie allows the use of backslash-escaping in form
parameters. This feature has the implication that if you actually
want to include a backslash in your form parameters, you need to
double it (use two backslashes instead of one). Indeed:
% http -f https://httpbin.org/post
backslashes='\\\\\\\\'
| jq .form.backslashes -r
\\\\
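Doubling backslashes by hand in a long parameter is error-prone, so it may be easier to let a script do it. The helper below is a tiny sketch under the assumption that HTTPie's escaping works as observed above (each pair of backslashes collapses to one); the name `escape_for_httpie` is made up for this example.

```python
def escape_for_httpie(value):
    """Double every backslash so the value survives HTTPie's unescaping."""
    return value.replace("\\", "\\\\")


# A JSON string nested inside a JSON string, like the variables= parameter.
nested = '{"requestPayload":"{\\"database\\":1}"}'
print(escape_for_httpie(nested))
```

The printed value is what would go on the HTTPie command line (inside single quotes, so that the shell does not interpret the backslashes a second time).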
So, if we double every backslash in the request, we end up with a
working HTTPie command line to fetch our inbox data:
% http -f https://www.messenger.com/api/graphql/
Authority:www.messenger.com
Pragma:no-cache
Cache-Control:no-cache
Sec-Ch-Ua:'"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"'
Dnt:1
Sec-Ch-Ua-Mobile:'?0'
User-Agent:'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
X-Fb-Friendly-Name:LSPlatformGraphQLLightspeedRequestQuery
X-Fb-Lsd:M237eS5ouvAFHBqYl3StT7
Content-Type:application/x-www-form-urlencoded
Sec-Ch-Ua-Platform:Linux
Accept:'*/*'
Origin:https://www.messenger.com
Sec-Fetch-Site:same-origin
Sec-Fetch-Mode:cors
Sec-Fetch-Dest:empty
Referer:https://www.messenger.com/t/100007424414992/
Accept-Language:'en-US, en;q=0.9'
Cookie:'wd=1074x980; dpr=2; datr=0SugYWovp6j2RMqGVQqOqQwr; sb=3iugYaVtLi-qyDF0VndcCAKs; c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1'
av=100075402451059
__user=100075402451059
__a=1
__dyn=7AzHJ16U9ob8ng569yaxG4VuC0BVU98nwgU7SbGbwSwAyUcoeU5W2Sawba1DwUx60GE3Qwb-q7oc81xoswMwto886C1nzUO0n2US2G3i0Boy1PwBgK7o6C0Mo5W3S1lwlE-Uqw8y4UaEW0D8qBwJK5Umxm5o7GmdUlwhEe88o5i7-2K0_UbpEbUGdG0HE5d0
__csr=gacABdkJnqAlZjhsGiaCOR5PrKBrfh7KJd9qzbl5iKQJlQqQ_K8HBl6HJCzayXDyqiBHw75w5Iw3M40ju0578b81v81DFFQ9Ew0z-0MUeo4O0w9E1589ro3ew5TyU-3Sq0FFEymS2B0to2Lw1c2bw2t85W0B80jfw3ZU0wa0mq0vO0hi08uw8Grw3WE0we0hG054o4Yw4qh4xKex11WE2SWw3Eo0Pi0Yk0v-0WU0BG0fIGiq0mp1Slock2uey9d9wAwl8O19gtwhUx0Dwywj8W3-7Gjzp87Op2r80OpXz8qwhoC22l4xu448U4-4Uuwd279FU6-1owe62qywg8S1ew3dU4a5U3Awaa16xS0CQ9xO5t1O7XgoxSuE622e0F83ww6-wNwposw7uwae1oy9cBe0lu1zAG2e6o7S1gwWwZwn8aUoxeimFrxR6w5GwNyo0xO8w6zxu1awj8kw45wXw4im0wU2sxi3ulrw9-U1HUWuq514R0bi0ru581UA2m0d5woE2nwrpE1qAby4-iAldi2qm0hS16ge85G3K2q0nml06Vw8p02Pu2S0t20boway
__req=1
__hs=18957.HYP:messengerdotcom_comet_pkg.2.1.0.0.
dpr=2
__ccg=EXCELLENT
__rev=1004771992
__s=lryd0q:mohw6t:xnxaps
__hsi=0-0
__comet_req=1
fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942
jazoest=21949
lsd=M237eS5ouvAFHBqYl3StT7
__spin_r=1004771992
__spin_b=trunk
__spin_t=1637886944
__jssesw=1
fb_api_caller_class=RelayModern
fb_api_req_friendly_name=LSPlatformGraphQLLightspeedRequestQuery
variables="{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{"database":1,"version":4680497022042598,"sync_params":"{\"scale\":1,\"preview_height\":200,\"preview_width\":150,\"preview_height_large\":400,\"preview_width_large\":300,\"full_height\":200,\"snapshot_num_threads_per_page\":15,\"locale\":\"en_US\"}","epoch_id":0,"last_applied_cursor":null}","requestType":1}"
server_timestamps=true
doc_id=4476599072415612
And with this in hand, we can pare out unneeded data and parameters to
arrive at the following minimal request that gets us what we want:
% http -f https://www.messenger.com/api/graphql/
Cookie:'c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1'
fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942
doc_id=4476599072415612
variables="{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{"database":1,"version":4680497022042598,"sync_params":"{}"}","requestType":1}"
Lovely!
Find hidden inbox query parameters
So now we have the request we need to make. However, once again we
have a bunch of parameters whose values seem inscrutable:
fb_dtsg
(AQE4bjFlv-4P3Xs:50:1637886942
)doc_id
(4476599072415612
)deviceId
(6a9252cb-2145-4f81-9d69-1834b84ba614
)version
(4680497022042598
)
We’ll start, arbitrarily, with fb_dtsg
. Since we have the whole HAR
in one place, we know we don’t have to worry about the value of the
parameter changing out from under us when we reload the page. So, we
can just search directly for the value of the parameter:
% cat inbox-requests.har
| jq '.log.entries
| map(select(.response.content.text | .?
| contains("AQE4bjFlv-4P3Xs:50:1637886942"))
| .request.url)'
[
"https://www.messenger.com/t/100007424414992/"
]
Apparently, once again we get parameter values for free just by
looking at the initial HTML response! Let’s extract that response from
the HAR and run it through Prettier to make
the HTML more readable:
% cat inbox-requests.har
| jq '.log.entries
| map(select(.response.content.text | .?
| contains("AQE4bjFlv-4P3Xs:50:1637886942")))
| .[0].response.content.text' -r
> inbox.html
% prettier inbox.html > inbox-pretty.html
Searching through inbox-pretty.html
with our favorite text editor,
look what we find for fb_dtsg
:
[
"DTSGInitialData",
[],
{ token: "AQE4bjFlv-4P3Xs:50:1637886942" },
258,
],
And while we’re at it, why not search for the other values too? Turns
out there are two of them just lying around (deviceId
and
version
):
{
syncScripts: [],
deviceId: "6a9252cb-2145-4f81-9d69-1834b84ba614",
schemaVersion: "4680497022042598",
schemaVersionV2: null,
accountKey: "",
},
That knocks out three parameter values, leaving only doc_id
. Since
its value, 4476599072415612
, tragically doesn’t show up in the HTML,
let’s go back to the HAR:
% cat inbox-requests.har
| jq '.log.entries
| map(select(.response.content.text | .?
| contains("4476599072415612"))
| .request.url)'
[
"https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
]
Apparently, it’s in one of those random, inscrutably-named JavaScript
files. Let’s check it out:
% cat inbox-requests.har
| jq '.log.entries
| map(select(.response.content.text | .?
| contains("4476599072415612")))
| .[0].response.content.text' -r
> docid.js
% prettier docid.js > docid-pretty.js
Aha, here it is, seeming to be part of the definition of something
called LSPlatformGraphQLLightspeedRequestQuery
:
params: {
id: "4476599072415612",
metadata: {},
name: "LSPlatformGraphQLLightspeedRequestQuery",
operationKind: "query",
text: null,
},
But we also need to know how to pick this particular script out of the
many that are loaded as part of the page. Let’s take a look at where
this script is referenced in the HTML page. It looks like this:
<script
src="https://static.xx.fbcdn.net/rsrc.php/v3/y1/r/HRDukpAcyqY.js?_nc_x=0OMkmbJTxss"
data-bootloader-hash="LWwZBAL"
async="1"
crossorigin="anonymous"
data-p=":1"
data-c="1"
onload='_btldr["LWwZBAL"]=1'
onerror="_btldr["LWwZBAL"]=1"
nonce="nW7qla6Q"
></script>
<link
rel="preload"
href="https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
as="script"
crossorigin="anonymous"
nonce="nW7qla6Q"
/>
<script
src="https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
data-bootloader-hash="Ll9z/Tq"
async="1"
crossorigin="anonymous"
data-p=":8,11,62,45,43,13,9,65,41,56,27,21,19,59,31,35,25,33,61,37,39,29,4,20"
data-c="1"
onload='_btldr["Ll9z/Tq"]=1'
onerror="_btldr["Ll9z/Tq"]=1"
nonce="nW7qla6Q"
></script>
<link
rel="preload"
href="https://static.xx.fbcdn.net/rsrc.php/v3i7Mx4/ya/l/en_US/xovCaG3pRYkBA-zL6eV7sA_kvzFwGGA0iA_Y451ymmdRYKi3JNddo5uNTBXcpfVOaa67G_e6QNkiMZSz44Fa4IsHe1jJJzzjA8TXc-buPNEADH6ljxd0XkWPBzDnUKHZdnKhP0dmjvw4e8cHQ-wygr9Dbce6gKQUJ7j-7IAZomrkiSS24Cf0iRHhPpBbS9Mb_8VDsyKoohBoYL6MVOPJYEcxdVAvTH8o9Vk041xqiSJOXCzZm0UD5J6h5tubWo2SOY3BEhtTcw9z37VDEGPcTjrwwtNcGZfLak_UlVUiSZsFRYeVECK6mZFE3Bk1R6vYlIoidhMWzP.js?_nc_x=0OMkmbJTxss"
as="script"
crossorigin="anonymous"
nonce="nW7qla6Q"
/>
<script
src="https://static.xx.fbcdn.net/rsrc.php/v3i7Mx4/ya/l/en_US/xovCaG3pRYkBA-zL6eV7sA_kvzFwGGA0iA_Y451ymmdRYKi3JNddo5uNTBXcpfVOaa67G_e6QNkiMZSz44Fa4IsHe1jJJzzjA8TXc-buPNEADH6ljxd0XkWPBzDnUKHZdnKhP0dmjvw4e8cHQ-wygr9Dbce6gKQUJ7j-7IAZomrkiSS24Cf0iRHhPpBbS9Mb_8VDsyKoohBoYL6MVOPJYEcxdVAvTH8o9Vk041xqiSJOXCzZm0UD5J6h5tubWo2SOY3BEhtTcw9z37VDEGPcTjrwwtNcGZfLak_UlVUiSZsFRYeVECK6mZFE3Bk1R6vYlIoidhMWzP.js?_nc_x=0OMkmbJTxss"
data-bootloader-hash="2abuuwv"
async="1"
crossorigin="anonymous"
data-p=":3,50,26,24,16,7,64,53,18,14,12,15,17,34,2,44,32,60,28,51,5,55,36,63,23,52,22,10,57,6"
data-c="1"
onload='_btldr["2abuuwv"]=1'
onerror="_btldr["2abuuwv"]=1"
nonce="nW7qla6Q"
></script>
Unfortunately, there appear to be a huge number of similarly
inscrutably-named scripts all listed in the same section. But no
matter, we can always just download all of them and then see which one
has the content we’re looking for.
By now we have a procedure for constructing the inbox request:
- Fetch the HTML page for our Messenger inbox once logged in.
- Extract the parameters for fb_dtsg, deviceId, and version
from the HTML.
- Get a list of all the scripts referenced in the HTML, and download
each of those.
- Find the script that defines LSPlatformGraphQLLightspeedRequestQuery,
and extract the parameter for doc_id.
- Construct a POST request using these parameters, and fetch the
response.
We now need to replicate each of these steps in Python. Step 1 is
fairly simple; it’s the same as the first request we made,
except now we’re logged in and can provide cookies to authenticate
ourselves:
inbox_html_resp = requests.get(
    "https://www.messenger.com",
    cookies=login.cookies
)
inbox_html_resp.raise_for_status()
inbox_html_page = inbox_html_resp.text
print(inbox_html_page)
Step 2 is just a repeat of our earlier work writing regular
expressions to extract parameters from HTML:
dtsg = re.search(
    r'"DTSGInitialData",\[\],\{"token":"([^"]+)"',
    inbox_html_page
).group(1)
device_id = re.search(
    r'"deviceId":"([^"]+)"',
    inbox_html_page
).group(1)
schema_version = re.search(
    r'"schemaVersion":"([0-9]+)"',
    inbox_html_page
).group(1)
print("dtsg:", dtsg)
print("device_id:", device_id)
print("schema_version:", schema_version)
Here we’re escaping the brackets and curly braces in the dtsg
regular expression to avoid them being interpreted as regular
expression operators, and using [0-9]
to mean any digit zero through
9.
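One weakness of calling `.group(1)` directly on the result of `re.search` is that a missing parameter shows up as an unhelpful `AttributeError` on `None`. Here is a minimal sketch of a slightly more forgiving variant. The sample string is a hand-written stand-in for the raw (unprettified) HTML, and the names `PARAM_PATTERNS` and `extract_params` are invented for this example.

```python
import re

# Hand-written stand-in for a fragment of the raw HTML (an assumption; the
# real page is far larger and may be laid out differently).
SAMPLE_HTML = (
    '["DTSGInitialData",[],{"token":"AQE4bjFlv-4P3Xs:50:1637886942"},258],'
    '{"syncScripts":[],"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614",'
    '"schemaVersion":"4680497022042598"}'
)

PARAM_PATTERNS = {
    # Brackets and braces are escaped so they match literally.
    "fb_dtsg": r'"DTSGInitialData",\[\],\{"token":"([^"]+)"',
    "deviceId": r'"deviceId":"([^"]+)"',
    "version": r'"schemaVersion":"([0-9]+)"',
}


def extract_params(html):
    """Return {name: value}, with None for any parameter that is not found."""
    params = {}
    for name, pattern in PARAM_PATTERNS.items():
        match = re.search(pattern, html)
        params[name] = match.group(1) if match else None
    return params


print(extract_params(SAMPLE_HTML))
```

Returning `None` for a missing value makes it easy to report exactly which parameter disappeared if the page layout changes.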
For step 3, we’ll want to start by using regular expressions to get a
list of all the scripts that (like the one we’re looking for) have
rsrc.php
in their URL and end in .js
:
script_urls = re.findall(
    r'"([^"]+rsrc\.php/[^"]+\.js[^"]+)"',
    inbox_html_page
)
Then we’ll want to fetch each of them:
scripts = []
for url in script_urls:
    resp = requests.get(url)
    resp.raise_for_status()
    scripts.append(resp.text)
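In the HTML excerpt earlier, each long script URL appears twice: once in a preload tag and once in the script tag itself. Downloading every match would therefore fetch many files twice, which is exactly the kind of unnecessary load we want to avoid. A small sketch of deduplicating the list first (the helper name `unique_script_urls` is made up here):

```python
import re

SCRIPT_URL_RE = re.compile(r'"([^"]+rsrc\.php/[^"]+\.js[^"]+)"')


def unique_script_urls(html):
    """Return matching script URLs in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(SCRIPT_URL_RE.findall(html)))
```

The fetch loop can then iterate over `unique_script_urls(inbox_html_page)` instead of `script_urls`.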
Next up, for step 4, we want to find the script that defines
LSPlatformGraphQLLightspeedRequestQuery
, and extract doc_id
from
it:
for script in scripts:
    if "LSPlatformGraphQLLightspeedRequestQuery" not in script:
        continue
    doc_id = re.search(
        r'id:"([0-9]+)",metadata:\{\},name:"LSPlatformGraphQLLightspeedRequestQuery"',
        script
    ).group(1)
    break
print("doc_id:", doc_id)
Here’s what the parameter extraction looks like in action:
% python3 messyger.py -u [email protected] -p 0aSPlneurgscxzpuEZb9
dtsg: AQE0GKhvCGqVF3E:14:1637897352
device_id: 86fbb4b2-fe0e-43e9-8bd5-cd58f7bb763b
schema_version: 4680497022042598
doc_id: 4476599072415612
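If none of the downloaded scripts contains the definition (say, because the bundle layout changed), a plain loop like the one for step 4 leaves `doc_id` undefined and fails later with a confusing `NameError`. A sketch of wrapping the search in a function that fails loudly instead, assuming the minified scripts contain the definition in the form `id:"…",metadata:{},name:"…"`; the names `find_doc_id` and `DOC_ID_RE` are invented for this example:

```python
import re

QUERY_NAME = "LSPlatformGraphQLLightspeedRequestQuery"
DOC_ID_RE = re.compile(
    r'id:"([0-9]+)",metadata:\{\},name:"' + re.escape(QUERY_NAME) + '"'
)


def find_doc_id(scripts):
    """Return the doc_id from the first script that defines the query."""
    for script in scripts:
        match = DOC_ID_RE.search(script)
        if match:
            return match.group(1)
    raise LookupError(f"no script defines {QUERY_NAME}")
```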
Finally, for step 5, we need to recreate the following HTTPie request
in Python, using our extracted parameters:
% http -f https://www.messenger.com/api/graphql/
Cookie:'c_user=100075402451059; xs=50%3Aq86l0PoxUG0qew%3A2%3A1637886942%3A-1%3A-1'
fb_dtsg=AQE4bjFlv-4P3Xs:50:1637886942
doc_id=4476599072415612
variables="{"deviceId":"6a9252cb-2145-4f81-9d69-1834b84ba614","requestId":0,"requestPayload":"{"database":1,"version":4680497022042598,"sync_params":"{}"}","requestType":1}"
Here’s what that looks like:
import json
inbox_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "database": 1,
                "version": schema_version,
                "sync_params": json.dumps({})
            }),
            "requestType": 1
        })
    }
)
inbox_resp.raise_for_status()
print(inbox_resp.text)
You might ask why the Messenger API expects a JSON string inside a
JSON string inside a JSON string inside an HTML form. You’d have a
very good question. But at least it explains why we had so many
backslashes to deal with earlier.
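To see the nesting (and the backslashes) concretely, here is a small offline sketch that builds the same `variables` value and then peels the layers back off. The function names are invented for this example, and nothing here talks to Messenger.

```python
import json


def build_variables(device_id, schema_version):
    """Encode three layers of JSON, innermost first."""
    sync_params = json.dumps({})
    request_payload = json.dumps({
        "database": 1,
        "version": schema_version,
        "sync_params": sync_params,
    })
    return json.dumps({
        "deviceId": device_id,
        "requestId": 0,
        "requestPayload": request_payload,
        "requestType": 1,
    })


def unwrap_variables(variables):
    """Decode the layers again, returning (outer, payload, sync_params)."""
    outer = json.loads(variables)
    payload = json.loads(outer["requestPayload"])
    sync_params = json.loads(payload["sync_params"])
    return outer, payload, sync_params


variables = build_variables("6a9252cb-2145-4f81-9d69-1834b84ba614", "4680497022042598")
print(variables)
```

Every quote in the middle layer is preceded by a backslash in the printed string, and a non-empty innermost layer would need three backslashes per quote, which matches what we saw in the captured request.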
Decipher the inbox data response
Alright, now that we’ve successfully retrieved our inbox data… what
format is that data coming in, exactly? Well, here’s the start of it:
{
"data": {
"viewer": {
"lightspeed_web_request": {
"payload": "function f(){let inputs=arguments,LS=inputs[inputs.length-1],
Yep, that’s right. It’s a JSON object… with a bunch of JavaScript
code inside it! I can’t claim to understand why, but this Facebook
blog post about “Project
LightSpeed”
is probably related, given that this JSON object is apparently a
lightspeed_web_request
. Apparently, in the new version of Messenger,
released in early 2020, the server directly sends JavaScript for the
client to blindly execute and update its local state.
In any case, let’s take a look at this JavaScript and see what may lie
within:
% cat inbox-resp.json
| jq .data.viewer.lightspeed_web_request.payload -r
> inbox-payload.js
% prettier inbox-payload.js > inbox-payload-pretty.js
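Inside the Python program, the equivalent of that jq filter is a couple of dictionary lookups. The sketch below also checks for the failure mode seen earlier, where `lightspeed_web_request` comes back as `null` alongside an `errors` list; the function name `extract_payload` is made up for this example.

```python
import json


def extract_payload(response_text):
    """Return the JavaScript payload string from an inbox API response."""
    body = json.loads(response_text)
    request = body["data"]["viewer"]["lightspeed_web_request"]
    if request is None:
        # The server reports problems in a separate top-level "errors" list.
        messages = [err.get("message", "") for err in body.get("errors", [])]
        raise ValueError("inbox request failed: " + "; ".join(messages))
    return request["payload"]
```

In the program this would be called as `extract_payload(inbox_resp.text)`.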
It looks like things start out with a bunch of initialization, setting
up some kind of sequence of operations to operate as a single
transaction using a bunch of
higher-order
functions:
function f() {
let inputs = arguments,
LS = inputs[inputs.length - 1],
n = LS.n,
m = [],
output = [],
U;
return LS.seq([
(_) =>
LS.seq([
(_) =>
LS.sp(
"executeFirstBlockForSyncTransaction",
[0, 1],
[-1, 4294967295],
U,
"HCwRAAAWlgEWlqOCxgETBAA",
[0, 2],
false,
[0, 0],
false,
[0, 1],
U
).then((r) => ([m[0]] = r)),
(_) =>
m[0]
? LS.seq([
(_) =>
LS.seq([
(_) => LS.fe(LS.db.table(15).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(18).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(19).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(20).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(21).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(22).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(23).fetch(), (c) => c.delete()),
(_) => LS.fe(LS.db.table(24).fetch(), (c) => c.delete()),
... lots more of this ...
After this boilerplate, however, the bulk of the script consists of
calls to the LS.sp
function, like this:
(_) =>
LS.sp(
"addParticipantIdToGroupThread",
[23284, 3405894928],
[23284, 3405894928],
[381, 1262926839],
[381, 1262927046],
[381, 1262844841],
U,
false,
U,
[0, 0],
[0, 80],
U,
U
),
This is where we have to start staring at code and making guesses. One
reasonable guess is that the first argument to LS.sp
represents the
action to be taken (e.g. register a user as one of the participants in
a group conversation), and the remaining arguments are parameters for
that action (e.g., identifying which user and which conversation are
to be operated on).
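If that guess is right, a quick way to get an overview of a payload is to tally the first argument of every `LS.sp` call. This is only a rough sketch based on a regular expression (it does not actually parse JavaScript), and the helper name `count_procedures` is invented here:

```python
import re
from collections import Counter

# Matches LS.sp( followed by optional whitespace and a double-quoted name.
SP_CALL_RE = re.compile(r'LS\.sp\(\s*"([^"]+)"')


def count_procedures(payload_js):
    """Count how often each name appears as the first argument to LS.sp."""
    return Counter(SP_CALL_RE.findall(payload_js))
```

Printing `count_procedures(payload).most_common()` gives a list of action names to investigate, most frequent first.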
One thing that would be helpful is understanding what some of these
argument values are. For example, what is U
? Fortunately, that one is
pretty easy to figure out. From the beginning of the script:
let inputs = arguments,
LS = inputs[inputs.length - 1],
n = LS.n,
m = [],
output = [],
U;
So U
is actually just undefined
(which is the default value of an
uninitialized variable in JavaScript); it looks like the code is using
the U
alias to save characters.
What about these two-element arrays? They’re really all over the
place in the generated code, and strangely enough, every array has
exactly two integers, no more, no less. Even more strangely, there
aren’t any normal integers! Numbers only show up inside two-element
arrays. Here are some examples of these arrays:
[-1, 4294967295]
[0, 0]
[0, 1]
[0, 2]
[0, 80]
[0, 1640485980]
[381, 1262844841]
[381, 1262926839]
[381, 1262927029]
[381, 1262927046]
[23284, 3405894928]
[23300, 2664454259]
[368832, 2185323521]
[230687821, 2208225279]
Looking at these arrays, we can see a couple of notable properties:
- There are various “genres” of the arrays, which have different
typical ranges. For example, there are a bunch that are
[0, <small integer>], then a bunch that are [381, <large integer>],
then some that are [<integer around 23000>, <large integer>].
- We know that some of these values must somehow represent things like
user IDs and conversation IDs, since functions like
addParticipantIdToGroupThread don’t take any arguments that could
convey this information other than the two-element arrays.
- Values like [0, 0] and [0, 1] show up a lot.
- The value 4294967295 that shows up in [-1, 4294967295] is
exactly one less than 2 to the 32, or the maximum value of an unsigned
32-bit integer.
These properties are all suggestive, but the thing that gave me a
flash of insight was this code elsewhere in the script payload:
LS.i64.eq(i.a, [23284, 3405894928]) &&
LS.i64.eq([0, 0], [0, 0]) &&
LS.i64.eq(i.b, [381, 1262927029]) &&
The term i64
typically refers to a 64-bit integer, so i64.eq
would
be a function for comparing 64-bit integers for equality. Oh, so these
arrays must be representing 64-bit integers! The first integer must be
the high 32 bits (which would often be zero), and the second integer
is the low 32 bits. I would assume Messenger does this because
JavaScript doesn’t have 64-bit
integers.
For example, the recurring value [23284, 3405894928]
would translate
to 2^32 * 23284 + 3405894928 = 100007424414992
. And what do you
know, that’s exactly the value that was showing up in the URL
https://www.messenger.com/t/100007424414992/
in our screenshots!
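Since Python integers have arbitrary precision, converting between the two representations takes one line each way. This is a sketch of that arithmetic, under the assumption (suggested by `[-1, 4294967295]`) that the high half is signed and the low half is unsigned; the function names are made up for this example.

```python
def i64_to_int(pair):
    """Combine a [high, low] pair of 32-bit halves into one integer."""
    high, low = pair
    return high * 2**32 + low


def int_to_i64(value):
    """Split an integer into a [high, low] pair (inverse of i64_to_int)."""
    # >> floors toward negative infinity, so negative values get a negative
    # high half, and & 0xFFFFFFFF always yields a non-negative low half.
    return [value >> 32, value & 0xFFFFFFFF]


print(i64_to_int([23284, 3405894928]))
print(i64_to_int([-1, 4294967295]))
```

Under that assumption, `[-1, 4294967295]` decodes to -1, which would make sense as a "no value" marker.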
Armed with this knowledge, let’s take a look at where Hello Camilla
is
showing up, as that will probably have the inbox information we’re
looking for. There turn out to be three occurrences:
deleteThenInsertThread
, upsertMessage
, and
setMessageDisplayedContentTypes
. Well, we’re looking to generate a
conversation list, so the first function sounds the most relevant.
Here are its arguments:
LS.sp(
"deleteThenInsertThread",
[381, 1262897440],
[381, 1262897440],
"Hello Camilla!",
U,
"https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX8eYoNq&_nc_ht=scontent-sjc3-1.xx&oh=8e70b2536bd14a3ecbe072f3753622a3&oe=61C56EF8",
U,
[0, 80],
[23300, 2492200183],
[0, 0],
[0, 1],
"inbox",
"/messaging/lightspeed/media_fallback/?entity_id=100075230196983&entity_type=10&width=200&height=200",
[0, 1640328952],
[0, 0],
[0, 0],
[0, 0],
false,
[23300, 2492200183],
U,
U,
... lots more nulls and zeroes ...
Let’s break down the arguments here based on what we know:
- Hello Camilla! would probably be the text that’s displayed in the
sidebar, i.e. the latest message in its conversation.
- [23300, 2492200183] = 100075230196983 is the ID that’s displayed
in the URL for the conversation with Kane Woods (who sent the
Hello Camilla! message), and it also appears as entity_id in the
media_fallback URL. This parameter is repeated twice in different
places for some reason.
- [381, 1262897440] = 1637645437216 is a UNIX timestamp (in
milliseconds) for Tuesday, November 23, 2021 5:30:37.216 AM GMT,
which is roughly the time that I’m writing this guide. UNIX timestamp
is a super common format to represent dates and times, so it’s worth
remembering, and after you see it enough you’ll be able to see an
integer and say “that kinda looks like it’s the right size to be a
timestamp”. You can use an online tool to convert
timestamps to human-readable representation, and vice versa. This
parameter is also repeated twice in different places for some
reason.
- The https://scontent-sjc3-1.xx.fbcdn.net URL can be opened
directly in the browser and appears to be the profile picture
displayed next to the conversation.
- [0, 1640328952] looks like another UNIX timestamp,
Friday, December 24, 2021 6:55:52 AM GMT (this one in seconds rather than
milliseconds), but that’s a whole month after the first one, so the
relevance of this one isn’t entirely clear.
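Instead of pasting numbers into an online converter, the conversion can be done with Python's standard library. The sketch below adds a `timedelta` to the epoch rather than calling `datetime.fromtimestamp` with a float, which avoids rounding the milliseconds; the helper names are invented for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_millis(ms):
    """Interpret an integer as milliseconds since the UNIX epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=ms)


def from_seconds(s):
    """Interpret an integer as seconds since the UNIX epoch (UTC)."""
    return EPOCH + timedelta(seconds=s)


# [381, 1262897440] as a 64-bit value: high half times 2^32 plus low half.
print(from_millis(381 * 2**32 + 1262897440).isoformat())
print(from_seconds(1640328952).isoformat())
```

A handy rule of thumb for 2021-era data: a timestamp in seconds has 10 digits, and one in milliseconds has 13.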
Study the behavior of the inbox data response
This is a good start, but we need more information to make sure we
understand what these parameters really mean. For one thing, we could
be making bad assumptions, and for another, there are already
unanswered questions, such as why there are seemingly duplicated
parameters. They might be truly redundant, or the different copies
might have different meanings and just happen to have the same value
in this particular situation. One way we can resolve the confusion is
by gathering more data:
- Reload the page and download the response data again. Compare it
  with the original (say, using git diff --no-index on the formatted
  JavaScript payloads) to see what changes between two subsequent
  requests for the same data.
- Now make some change by interacting with Messenger, e.g. by sending
  a new message. Download the response data a third time and compare
  it with the original as well. Anything that’s different now that
  wasn’t different in the first comparison must be a change caused by
  your latest interaction. This might help to figure out how the values
  of the parameters relate to the data being shown on the webpage.
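If git isn’t handy, the same kind of comparison can be sketched with Python’s standard difflib module. The two payloads below are shortened, made-up stand-ins for formatted responses (only the timestamp pairs are borrowed from this post), so treat this as an illustration of the technique rather than real Messenger output.

```python
import difflib

# Hypothetical, heavily shortened payloads from two successive requests.
before = """LS.sp(
  "deleteThenInsertThread",
  [381, 1262927029],
  "an older message",
""".splitlines(keepends=True)

after = """LS.sp(
  "deleteThenInsertThread",
  [381, 1596534595],
  "You: a newer message",
""".splitlines(keepends=True)

# unified_diff yields header lines, then context, removals and additions.
diff = list(difflib.unified_diff(before, after, "before.js", "after.js"))
print("".join(diff))

# Keep only the lines that actually changed, skipping the file headers.
changed = [
    line for line in diff
    if line[0] in "+-" and not line.startswith(("+++", "---"))
]
print(len(changed), "changed lines")
```

Running the comparison once without interacting and once after sending a message, then looking at which changed lines appear only in the second run, is the whole trick.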
Let’s start by sending a new message from our test account in one of
its conversations. Here is how the invocation of
deleteThenInsertThread in the inbox API response changes:
LS.sp(
"deleteThenInsertThread",
- [381, 1262927029],
- [381, 1262927029],
- "Hey there Camilla!",
+ [381, 1596534595],
+ [381, 1596534595],
+ "You: Good day, it is a response",
U,
"https://scontent-sjc3-1.xx.fbcdn.net/v/t31.18172-1/p200x200/13235223_1713693562221441_496736952870870067_o.jpg?_nc_cat=110&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=AlIiI4QhLQUAX_FB-Bd&_nc_ht=scontent-sjc3-1.xx&oh=fc8dd918473653af963d198ea106e8eb&oe=61C7D45C",
U,
[0, 80],
[23284, 3405894928],
[0, 0],
[0, 1],
"inbox",
"/messaging/lightspeed/media_fallback/?entity_id=100007424414992&entity_type=10&width=200&height=200",
[0, 1640485980],
[0, 0],
[0, 0],
[0, 0],
false,
- [23284, 3405894928],
+ [23300, 2664454259],
U,
U,
The UNIX timestamps at the top have changed from
[381, 1262927029] = Tuesday, November 23, 2021 5:31:06.805 AM GMT to
[381, 1596534595] = Saturday, November 27, 2021 2:11:14.371 AM GMT,
with the new timestamp being exactly the time when we sent the latest
message. This suggests that one or both of those timestamps
corresponds to the time of the last message or other update to the
conversation. We’ll have to do more investigation to figure out the
difference between the two parameters.
The last-message string was updated from "Hey there Camilla!" to
"You: Good day, it is a response", and the presence of "You:" in here
suggests that it’s not the raw message, but actually corresponds to
the literal text shown in the inbox sidebar.
And finally, the ID at the bottom has changed. Note that previously
there were two instances of [23284, 3405894928] = 100007424414992
(the user ID of the person we were messaging), but now one of them has
changed to [23300, 2664454259] = 100075402451059 (our own user ID,
which we can find by searching for ourselves in Messenger and checking
the URL). This behavior suggests that the first of the two IDs is the
ID of the person we’re messaging, while the second is the ID of the
person who sent the last message (previously the other person, now
us).
Let’s now explore read/unread behavior. We’ll receive a new message
from another account, and then see how the API response changes when
we read that message (clearing its unread status).
Aha, this produces a difference between the two mysterious timestamps
passed to deleteThenInsertThread:
LS.sp(
"deleteThenInsertThread",
[381, 1597327774],
- [381, 1596534595],
+ [381, 1597327774],
"Have you read this yet?",
U,
"https://scontent-sjc3-1.xx.fbcdn.net/v/t31.18172-1/p200x200/13235223_1713693562221441_496736952870870067_o.jpg?_nc_cat=110&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=AlIiI4QhLQUAX_FB-Bd&_nc_ht=scontent-sjc3-1.xx&oh=fc8dd918473653af963d198ea106e8eb&oe=61C7D45C",
Based on this information, it seems likely that the first timestamp is
when the latest message was sent, while the second timestamp is
the timestamp of the latest message that’s been read so far. In
other words, the conversation has unread message(s) when these two
timestamps differ.
One last detail to clear up before we can assemble our inbox view:
how do we actually translate from these user IDs back to human names
we can display? Well, searching for a name like Kane Woods in the
response reveals that this information can be easily extracted from the
arguments to the verifyContactRowExists function:
LS.sp(
"verifyContactRowExists",
[23300, 2492200183],
[0, 1],
"https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/p100x100/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX9oSRfq&_nc_ht=scontent-sjc3-1.xx&oh=daaf26cd7b77fa48b5bc2aafc7ee8a0c&oe=61C90FD1",
"Kane Woods",
... extra arguments ...
Parse the inbox data response
We now have a checklist of data to extract from the inbox data
response:
- Extract the embedded JavaScript snippet from the JSON response we
  get from Messenger.
- Look at calls to deleteThenInsertThread to get a list of
  conversations that will be displayed in the sidebar.
- Extract the last-sent message description from a string argument.
- Get the user ID of the person the conversation is with, as well as
  the user ID of the person who sent the last message.
- Compare the two timestamps in the initial arguments to determine
  whether the conversation is marked as unread or not.
- Look at calls to verifyContactRowExists to map user IDs back to
  human names.
However, this code seems like a huge mess to parse with regular
expressions, since the required data is stuck in specific positional
arguments of long function invocations, separated by lots of cruft we
don’t care about. Another approach is called for.
The standard approach to parsing programming languages without regular
expressions is to use a tool to convert them into their abstract
syntax tree, which allows manipulating the data embedded in the
language without needing to parse lots of syntax. We’ll use the
Esprima library to parse JavaScript from Python:
import esprima
inbox_json = inbox_resp.json()
inbox_js = inbox_json["data"]["viewer"]["lightspeed_web_request"]["payload"]
ast = esprima.parseScript(inbox_js)
print(ast)
Now, instead of having to parse a function call like this:
LS.sp(
"updateThreadsRangesV2",
"inbox",
[0, 0],
[0, 1],
[-2147483648, 0]
),
We get a pre-parsed data structure like this, where all the arguments
are neatly identified for us and we can just loop over them in Python:
{
type: "CallExpression",
callee: {
type: "MemberExpression",
computed: False,
object: {
type: "Identifier",
name: "LS"
},
property: {
type: "Identifier",
name: "sp"
}
},
arguments: [
{
type: "Literal",
value: "updateThreadsRangesV2",
raw: "\"updateThreadsRangesV2\""
},
{
type: "Literal",
value: "inbox",
raw: "\"inbox\""
},
{
type: "ArrayExpression",
elements: [
{
type: "Literal",
value: 0,
raw: "0"
},
{
type: "Literal",
value: 0,
raw: "0"
}
]
},
{
type: "ArrayExpression",
elements: [
{
type: "Literal",
value: 0,
raw: "0"
},
{
type: "Literal",
value: 1,
raw: "1"
}
]
},
{
type: "ArrayExpression",
elements: [
{
type: "UnaryExpression",
prefix: True,
operator: "-",
argument: {
type: "Literal",
value: 2147483648,
raw: "2147483648"
}
},
{
type: "Literal",
value: 0,
raw: "0"
}
]
}
]
},
Step 1 of processing the AST we get from Esprima will be to identify
all the uses of LS.sp. We can write a function to match this
pattern, based on looking at the AST snippet above:
def is_lightspeed_call(node):
    # True only for call expressions of the exact form LS.sp(...).
    return (
        node.type == "CallExpression"
        and node.callee.type == "MemberExpression"
        and node.callee.object.type == "Identifier"
        and node.callee.object.name == "LS"
        and node.callee.property.type == "Identifier"
        and node.callee.property.name == "sp"
    )
Then we’ll want to transform the arguments into Python values rather
than the objects that appear in the AST, again by inspecting the AST
snippet above to see how things are represented:
def parse_argument(node):
    if node.type == "Literal":
        return node.value
    if node.type == "ArrayExpression":
        # Two-element arrays are [high 32 bits, low 32 bits] pairs.
        assert len(node.elements) == 2
        high_bits, low_bits = map(parse_argument, node.elements)
        return (high_bits << 32) + low_bits
    if (
        node.type == "UnaryExpression" and
        node.prefix and
        node.operator == "-"
    ):
        return -parse_argument(node.argument)
    # Anything else (such as the identifier U) falls through to None.
(We’re using << 32 to implement Messenger’s multiply-by-2-to-the-32
operation; read more about <<.)
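To see these conversions in action without needing a real response, here is a small sketch that runs the same logic over hand-built stand-ins for AST nodes. The lit, arr and neg helpers and the use of types.SimpleNamespace are assumptions made for this example; real Esprima nodes carry more fields than shown here.

```python
from types import SimpleNamespace


def parse_argument(node):
    # Same logic as the function above, repeated so this runs on its own.
    if node.type == "Literal":
        return node.value
    if node.type == "ArrayExpression":
        high_bits, low_bits = map(parse_argument, node.elements)
        return (high_bits << 32) + low_bits
    if node.type == "UnaryExpression" and node.prefix and node.operator == "-":
        return -parse_argument(node.argument)
    return None  # e.g. the identifier U


def lit(value):
    """Stand-in for a Literal node."""
    return SimpleNamespace(type="Literal", value=value)


def arr(*elements):
    """Stand-in for an ArrayExpression node."""
    return SimpleNamespace(type="ArrayExpression", elements=list(elements))


def neg(value):
    """Stand-in for a unary minus applied to a literal."""
    return SimpleNamespace(
        type="UnaryExpression", prefix=True, operator="-", argument=lit(value)
    )


# The arguments of the updateThreadsRangesV2 call shown earlier.
nodes = [
    lit("updateThreadsRangesV2"),
    lit("inbox"),
    arr(lit(0), lit(0)),
    arr(lit(0), lit(1)),
    arr(neg(2147483648), lit(0)),
]
parsed = [parse_argument(n) for n in nodes]
print(parsed)
# ['updateThreadsRangesV2', 'inbox', 0, 1, -9223372036854775808]
```

The last value is -2^63, the smallest signed 64-bit integer, which fits the idea that these pairs encode 64-bit numbers.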
What we want to do now is go through every node in the AST, looking
for LS.sp invocations, and sort them by which function is being
called. Fortunately, this is exactly the kind of task that libraries
like Esprima are designed for. The usual way to do it is by writing
a function which the library will call for every node in the AST. Here
is what that looks like:
import collections
# Maps each function name to the list of argument lists it was called with.
fn_calls = collections.defaultdict(list)
def handle_node(node, meta):
    if not is_lightspeed_call(node):
        return
    args = [parse_argument(arg) for arg in node.arguments]
    (fn_name, *fn_args) = args
    fn_calls[fn_name].append(fn_args)
esprima.parseScript(inbox_js, delegate=handle_node)
print(json.dumps(fn_calls, indent=2))
Here’s the output:
% python3 messyger.py -u [email protected] -p 5155xKYdE1zi0KxGPMvF
{
"executeFirstBlockForSyncTransaction": [
[
1,
-1,
null,
"HCwRAAAWZhaql9XyAxMEAA",
2,
false,
0,
false,
1,
null
]
],
"deleteThenInsertThread": [
[
1638035642666,
1638035369213,
"Totes agreed",
null,
"https://scontent-sjc3-1.xx.fbcdn.net/v/t1.30497-1/143086968_2856368904622192_1959732218791162458_n.png?_nc_cat=1&ccb=1-5&_nc_sid=dbb9e7&_nc_ohc=Q_Y6W2vdkywAX8I78V4&_nc_ht=scontent-sjc3-1.xx&oh=c780f73897ab04a260dbbd203a14e2e5&oe=61C96378",
null,
80,
100075475206906,
... lots more ...
(You may notice we’re using a different email address here, because
this is the point in the blog post where the original account I was
using for testing got banned for acting too suspicious.)
With this data in hand, we can finally assemble our parsed function
calls into a useful thread listing:
conversations = collections.defaultdict(dict)
for args in fn_calls["deleteThenInsertThread"]:
    last_sent_ts, last_read_ts, last_msg, *rest = args
    # User IDs are the only integer arguments this large (above 1e14).
    user_id, last_msg_author = [
        arg for arg in rest if isinstance(arg, int) and arg > 1e14
    ]
    conversations[user_id]["unread"] = last_sent_ts != last_read_ts
    conversations[user_id]["last_message"] = last_msg
    conversations[user_id]["last_message_author"] = last_msg_author
for args in fn_calls["verifyContactRowExists"]:
    user_id, _, _, name, *rest = args
    conversations[user_id]["name"] = name
print(json.dumps(conversations, indent=2))
And, huzzah! Conversations listed from most to least recent, with our
own user ID available as the last entry in the dictionary. (That last
bit is admittedly a little weird, but it can always be cleaned up
later if desired.)
% python3 messyger.py -u [email protected] -p 5155xKYdE1zi0KxGPMvF
{
  "100075475206906": {
    "unread": true,
    "last_message": "Totes agreed",
    "last_message_author": 100075475206906,
    "name": "Kerri Blackmore"
  },
  "100075217039998": {
    "unread": false,
    "last_message": "You: How ya doin",
    "last_message_author": 100075103764938,
    "name": "Astrid Mccallum"
  },
  "100075103764938": {
    "name": "Ailish Maldonado"
  }
}
Discover the send-message request
Let’s move on to the last planned feature of Messyger: sending a
message to a conversation. We can start by opening up Messenger with
the developer tools open and sending a message, to see what request(s)
it triggers:
Hmmm, something is odd here. In the right-hand column, labeled
Waterfall, we can see a visual representation of the time when each
request was made. Most of them were made at about the same time,
during the initial page load. The third-to-last request was made a
little bit after (it looks like the client makes a request like this
periodically, even if you don’t do anything). And the last two
requests are the only two that were made after we sent the message.
But these last two requests aren’t API requests, they’re just requests
to fetch images! So how was the client able to tell the server to send
our message?
Well, if we scroll up in the Network tab, we can see there are a few
requests that are still listed as “Pending”, meaning they haven’t
finished transmitting data yet:
These connections are websocket connections, which are long-lived
connections (opened via an HTTP request) that the server and client
can use to send data back and forth at any time. Indeed, if we click on
one of those connections, we can see that they’ve been hard at work
the entire time sending messages back and forth:
Since we didn’t see any new HTTP requests in the Network tab when
sending a message, it’s possible that the client used one of its
websocket connections to communicate with the server instead. Indeed,
if we check the last few websocket messages that were exchanged after
we pressed Return, we find this one, which is a message from the
client to the server containing exactly the text of the message we
sent:
To inspect the contents, we can download that message from Chrome (in
base64 since it’s labeled as a “binary message”):
Using a tool like base64decode.org, we can then inspect the contents:
It appears that there’s a bit of junk at the beginning, but then the
rest is just a JSON object that looks awfully like the one that
we used when making our request to get the inbox data.
Here’s the raw JSON:
{
"request_id": 76,
"type": 3,
"payload": "{\"version_id\":\"4680497022042598\",\"tasks\":[{\"label\":\"46\",\"payload\":\"{\\\"thread_id\\\":100075475206906,\\\"otid\\\":\\\"6870463702739115828\\\",\\\"source\\\":65537,\\\"send_type\\\":1,\\\"text\\\":\\\"Let's see how this message gets sent\\\",\\\"initiating_source\\\":1}\",\"queue_name\":\"100075475206906\",\"task_id\":10,\"failure_count\":null},{\"label\":\"21\",\"payload\":\"{\\\"thread_id\\\":100075475206906,\\\"last_read_watermark_ts\\\":1638046193775,\\\"sync_group\\\":1}\",\"queue_name\":\"100075475206906\",\"task_id\":11,\"failure_count\":null}],\"epoch_id\":6870463702858032614,\"data_trace_id\":\"#0Q0JVtIKTdGjTDCwodlbNg\"}",
"app_id": "772021112871879"
}
Where the payload key is a JSON string that expands to this:
{
"version_id": "4680497022042598",
"tasks": [
{
"label": "46",
"payload": "{\"thread_id\":100075475206906,\"otid\":\"6870463702739115828\",\"source\":65537,\"send_type\":1,\"text\":\"Let's see how this message gets sent\",\"initiating_source\":1}",
"queue_name": "100075475206906",
"task_id": 10,
"failure_count": null
},
{
"label": "21",
"payload": "{\"thread_id\":100075475206906,\"last_read_watermark_ts\":1638046193775,\"sync_group\":1}",
"queue_name": "100075475206906",
"task_id": 11,
"failure_count": null
}
],
"epoch_id": 6870463702858032000,
"data_trace_id": "#0Q0JVtIKTdGjTDCwodlbNg"
}
And the payload keys in that are JSON strings that expand to these:
{
"thread_id": 100075475206906,
"otid": "6870463702739115828",
"source": 65537,
"send_type": 1,
"text": "Let's see how this message gets sent",
"initiating_source": 1
}
{
"thread_id": 100075475206906,
"last_read_watermark_ts": 1638046193775,
"sync_group": 1
}
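The manual unwrapping above (base64, then JSON inside JSON inside JSON) can also be scripted. The following is a minimal sketch: the frame is built locally from trimmed-down data, the leading bytes are a made-up placeholder for the real binary prefix, and it assumes that prefix never contains a "{" byte. The unwrap function is a name invented for this example.

```python
import base64
import json

# Build a stand-in frame: placeholder prefix bytes, then nested JSON.
task_payload = json.dumps({"thread_id": 100075475206906, "text": "hi"})
request_payload = json.dumps(
    {"tasks": [{"label": "46", "payload": task_payload}]}
)
message = json.dumps({"request_id": 76, "type": 3, "payload": request_payload})
frame_b64 = base64.b64encode(b"\x00\x01junk" + message.encode()).decode()


def unwrap(frame_b64):
    """Decode a base64 frame and peel off each layer of embedded JSON."""
    raw = base64.b64decode(frame_b64)
    outer = json.loads(raw[raw.index(b"{"):])   # skip the leading junk
    payload = json.loads(outer["payload"])      # first embedded string
    tasks = [json.loads(task["payload"]) for task in payload["tasks"]]
    return outer, payload, tasks


outer, payload, tasks = unwrap(frame_b64)
print(outer["type"], payload["tasks"][0]["label"], tasks[0]["text"])
# 3 46 hi
```

Each json.loads call removes exactly one level of string escaping, which is why three calls are needed to reach the message text.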
Replicate the send-message request
Now that we’ve identified the request that the client uses to send a
message, we want to replicate it outside the browser for testing.
However, this one is a bit more complicated, because we can’t just
“Copy as cURL” for a websocket message. We could get the cURL command
to open the websocket, but then the server and client might need to
exchange a bunch of individual messages on the socket before we could
send the request we want.
However, if we look at the data in this websocket message, it seems
strangely similar to the HTTP request we made earlier when fetching
the inbox data. Specifically:
- Both requests have an embedded string of JSON (called payload and
  requestPayload respectively).
- Both “payloads” have a top-level key that seems to specify some kind
  of schema version, with value 4680497022042598 (the key being
  called version_id and version respectively).
- Alongside the “payload” there is also a sibling key that specifies
  some kind of request type, with value type: 3 in the websocket
  message and requestType: 1 in the inbox request.
Is it possible that the Messenger API supports making the same request
in two different ways (via individual HTTP request or as a message on
an already-open websocket)? If so, it would make things easier for us,
because then instead of figuring out how to do the stuff with
websockets, we could just make this request the same way as we made
the inbox request.
We have no particular reason to believe this will work, since we don’t
have a working example in the client to compare against, but if our
guess happens to be right, we could save a lot of time, so let’s give
it a try. We’ll start with our existing code to make the inbox
request:
inbox_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "database": 1,
                "version": schema_version,
                "sync_params": json.dumps({})
            }),
            "requestType": 1
        })
    }
)
inbox_resp.raise_for_status()
Then we’ll modify it to swap out its requestPayload for the one we
saw in the websocket message:
send_message_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps({
            "deviceId": device_id,
            "requestId": 0,
            "requestPayload": json.dumps({
                "version_id": "4680497022042598",
                "tasks": [
                    {
                        "label": "46",
                        "payload": json.dumps({
                            "thread_id": 100075475206906,
                            "otid": "6870463702739115828",
                            "source": 65537,
                            "send_type": 1,
                            "text": "Let's see how this message gets sent",
                            "initiating_source": 1
                        }),
                        "queue_name": "100075475206906",
                        "task_id": 10,
                        "failure_count": None
                    },
                    {
                        "label": "21",
                        "payload": json.dumps({
                            "thread_id": 100075475206906,
                            "last_read_watermark_ts": 1638046193775,
                            "sync_group": 1
                        }),
                        "queue_name": "100075475206906",
                        "task_id": 11,
                        "failure_count": None
                    }
                ],
                "epoch_id": 6870463702858032000,
                "data_trace_id": "#0Q0JVtIKTdGjTDCwodlbNg"
            }),
            "requestType": 3  # to match type: 3 in websocket message
        })
    }
)
send_message_resp.raise_for_status()
print(send_message_resp.text)
If we run this, the results are mixed: it doesn’t return an error
(instead it returns a bunch of embedded JavaScript just like the inbox
response), but it also doesn’t actually send a message that shows up
in the Messenger interface. There are a couple of different possible
explanations for why this could be; for example:
1. We might not be able to send messages using this API at all, and
   our original guess was wrong.
2. We could have a syntax error somewhere in our request, and the
   server might just ignore malformed requests and send back a generic
   response.
3. Unlike getting the inbox data, which is read-only, sending a
   message is a write operation. The API might have been designed so
   that repeating the same request more than once doesn’t result in
   multiple messages getting sent.
From a design perspective, (3) actually makes a lot of sense. After
all, requests sometimes fail inside the network and have to be
retried, so it’s best if repeating a request multiple times doesn’t
result in action being taken multiple times.
One way to see if this is the case is by playing around with the
parameters of the request to see if one of them has to be changed
before the repeated request will be seen as a request to send a new
message, rather than be ignored as a duplicate.
And if we do this, we find that indeed, changing the otid value
(e.g., by adding 1 to it) results in a new copy of the message being
sent!
Apparently, otid is some kind of unique identifier used to prevent
messages from accidentally getting sent twice (each message from the
client gets assigned a unique otid, and each otid can only be
processed once by the server).
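To make the idea concrete, here is a toy model of this kind of deduplication. It is purely illustrative: ToyServer and its methods are invented for this sketch, and we have no visibility into how Facebook actually implements it on their end.

```python
class ToyServer:
    """Made-up model of a server that deduplicates sends by otid."""

    def __init__(self):
        self.seen_otids = set()
        self.delivered = []

    def send(self, otid, text):
        """Deliver text unless this otid was already processed."""
        if otid in self.seen_otids:
            return False  # duplicate: request accepted, nothing new sent
        self.seen_otids.add(otid)
        self.delivered.append(text)
        return True


server = ToyServer()
otid = 6870463702739115828

first = server.send(str(otid), "hello")       # original request
retry = server.send(str(otid), "hello")       # same otid, e.g. a retry
fresh = server.send(str(otid + 1), "hello")   # otid changed by adding 1

print(first, retry, fresh)    # True False True
print(server.delivered)       # ['hello', 'hello']
```

This matches what we observed from the outside: replaying the captured request did nothing visible, while bumping the otid produced a second copy of the message.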
Clean up the send-message request
Now that we have a proof of concept for how to send messages, we can
clean it up by using variables instead of hardcoding in values. We’ll
start by reading in command-line arguments for sending a message:
parser.add_argument("-m", "--message")
parser.add_argument("-r", "--recipient", type=int)
Then we can replace "Let's see how this message gets sent" with
args.message, and 100075475206906 with args.recipient.
Most of the other parameters in the request look reasonably important,
so we’ll leave most of them in. The one exception is data_trace_id,
which suggests something used for debugging, so we’ll remove that.
There are a few leftover hardcoded numbers:
- source is set to 65537 (exactly one more than 2 to the 16th
  power). However, testing suggests that the value doesn’t actually
  matter, so we’ll just set it to 0 for now and revisit later if it
  causes issues.
- label is set to 46 and 21 respectively in the two elements of
  the tasks array. These values seem likely to be fixed ID numbers
  that are part of the API (46 meaning “send message” and 21
  meaning “update last-read indicator”).
- last_read_watermark_ts is set to a UNIX timestamp that seems to be
  for the time the message was sent, which we can replace with one
  generated by our code.
- task_id is set to 10 and 11 respectively for the two tasks.
  Testing suggests that the values don’t matter, so we’ll set them to
  0 and 1, respectively.
- requestType is 3 and this comes from the websocket API request,
  so we’ll leave that as is.
Here’s what we end up with after making these substitutions:
import datetime
# Current time as a millisecond UNIX timestamp.
timestamp = int(datetime.datetime.now().timestamp() * 1000)
send_message_resp = requests.post(
    "https://www.messenger.com/api/graphql/",
    cookies=login.cookies,
    data={
        "fb_dtsg": dtsg,
        "doc_id": doc_id,
        "variables": json.dumps(
            {
                "deviceId": device_id,
                "requestId": 0,
                "requestPayload": json.dumps(
                    {
                        "version_id": str(schema_version),
                        "tasks": [
                            {
                                "label": "46",
                                "payload": json.dumps(
                                    {
                                        "thread_id": args.recipient,
                                        "otid": "6870463702739115830",
                                        "source": 0,
                                        "send_type": 1,
                                        "text": args.message,
                                        "initiating_source": 1,
                                    }
                                ),
                                "queue_name": str(args.recipient),
                                "task_id": 0,
                                "failure_count": None,
                            },
                            {
                                "label": "21",
                                "payload": json.dumps(
                                    {
                                        "thread_id": args.recipient,
                                        "last_read_watermark_ts": timestamp,
                                        "sync_group": 1,
                                    }
                                ),
                                "queue_name": str(args.recipient),
                                "task_id": 1,
                                "failure_count": None,
                            },
                        ],
                        "epoch_id": 6870463702858032000,
                    }
                ),
                "requestType": 3,
            }
        ),
    },
)
All we have left now are the mysterious otid and epoch_id
parameters. Generating otid correctly is important for making sure
our messages are actually sent, so we’ll want to understand how to do
it.
Since the otid is different for every message, it would most likely
have to be generated on the client side. Therefore, a reasonable place
to start would be to search the client-side JavaScript for mentions of
otid.
% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("otid"))
                    | .request.url)'
[
  "https://static.xx.fbcdn.net/rsrc.php/v3iRYC4/y8/l/en_US/d0FzJm8Jr_2GGVI9daGiZfL5MEKqrgHqVIWF0joMj2QgTM4YRSBq4b1LW25pZd7TC-AjrHzCyljakIj8QgziINDKiIZu6XPLjKZJ_v74SDWO8lfwQznT2vHDG_5hUHYYOcBO0v0LGrADQfsxLTta97k6SMq0QFmd6lLAcfPeULHpocMm0pQ6ZiqCb9aFMEaLXT3_o_DtviHOB3GX1Isgz-QZRkiA16JwTjqxIM9tg2HGk3jOqpo4M8-E4se5OLtvP_50qxVk.js?_nc_x=0OMkmbJTxss"
]
% cat inbox-requests.har \
    | jq '.log.entries
            | map(select(.response.content.text | .?
                           | contains("otid")))
            | .[0].response.content.text' -r \
    > otid.js
% prettier otid.js > otid-pretty.js
There are multiple appearances of otid in this script. By themselves,
none of them are very explanatory, but context is everything. One of
the matches looks like this:
d[2].set("otid", c.i64.to_string(d[1])),
And if we look up at the top of the function where this statement
appears, we see it looks like this:
__d(
"LSCreateGroupThreadWithAdminText",
[
"LSCreateOfflineThreadingID",
"LSIssueNewTask",
"LSLocalApplyOptimisticGroupThread",
],
function (a, b, c, d, e, f) {
Hey, wait, does otid stand for OfflineThreadingID? Boom, let’s
search for the definition of this LSCreateOfflineThreadingID
function. Conveniently enough, it’s even in the same file!
__d(
"LSCreateOfflineThreadingID",
[],
function (a, b, c, d, e, f) {
a = function () {
var a = arguments,
b = a[a.length - 1];
b.n;
var c = [],
d = [];
return (
(c[0] = b.i64.random()),
(d[0] = b.i64.and_(
b.i64.or_(
b.i64.lsl_(a[0], b.i64.to_int32([0, 22])),
b.i64.and_(c[0], [0, 4194303])
),
[2147483647, 4294967295]
)),
b.resolve(d)
);
};
e.exports = a;
},
null
);
From this function, it seems like the algorithm is something like:
1. Generate a random 64-bit integer (b.i64.random()).
2. Combine that with 4194303 using bitwise AND. Since 4194303
   happens to be one less than 2 to the 22nd power, this has the
   effect of dropping all bits past the rightmost 22, which converts
   the value from step 1 into a random 22-bit integer.
3. Take the argument to LSCreateOfflineThreadingID, and shift it by
   22 bits to the left (lsl is a common abbreviation for logical shift
   left, which is usually written as <<).
4. Combine the preceding two values using bitwise OR. Since the first
   value has only the rightmost bits, while the second value has only
   the leftmost bits, this operation essentially concatenates the two
   numbers. Arithmetically, it works out to
   (lefthand bits) * 2^22 + (righthand bits).
(Read more about bitwise operators.)
This raises the question: what actually is the argument passed to
LSCreateOfflineThreadingID? Well, if we look back at the original
function that mentioned LSCreateOfflineThreadingID, we can see that
this is where the call happens:
function (a) {
return c
.sp(b("LSCreateOfflineThreadingID"), c.i64.of_float(Date.now()))
.then(function (a) {
return (a = a), (d[1] = a[0]), a;
});
},
Aha! The argument is just a UNIX timestamp generated by Date.now().
So, with that in mind, we can generate our own “offline threading IDs”
in Python:
timestamp = int(datetime.datetime.now().timestamp() * 1000)
otid = (timestamp << 22) + random.randrange(2 ** 22)
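As a sanity check on this layout, here is a sketch that mirrors the bit operations of the JavaScript function and then splits the otid captured from the websocket message back into its two halves. The function names are made up for this example, and treating the final AND with [2147483647, 4294967295] as "keep the low 63 bits" is an interpretation of the minified code rather than something the source spells out.

```python
import random

LOW_22_BITS = (1 << 22) - 1    # 4194303
LOW_63_BITS = (1 << 63) - 1    # [2147483647, 4294967295] as one integer


def create_offline_threading_id(timestamp_ms, random_bits=None):
    """Mirror the shifts and masks seen in LSCreateOfflineThreadingID."""
    if random_bits is None:
        random_bits = random.getrandbits(64)
    combined = (timestamp_ms << 22) | (random_bits & LOW_22_BITS)
    return combined & LOW_63_BITS


def split_offline_threading_id(otid):
    """Return (timestamp_ms, random_22_bits) for an existing otid."""
    return otid >> 22, otid & LOW_22_BITS


# The otid from the captured websocket message.
timestamp_ms, random_part = split_offline_threading_id(6870463702739115828)
print(timestamp_ms, random_part)
# 1638046193775 3858228

# Rebuilding from the two halves gives back the original value.
print(create_offline_threading_id(timestamp_ms, random_part))
# 6870463702739115828
```

The recovered timestamp, 1638046193775, is the same value as last_read_watermark_ts in the captured message, which is consistent with the upper bits being a millisecond timestamp.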
One last puzzle to solve: what about epoch_id? Actually, it’s
almost the same as otid! In the websocket message, we had
otid = 6870463702739115828 and epoch_id = 6870463702858032000. It
seems like epoch_id is just otid rounded down to some particular
even boundary. In fact, with a little more inspection, it turns out
that the boundary is just the same 22-bit boundary as above. So, this
is what’s happening:
import random
timestamp = int(datetime.datetime.now().timestamp() * 1000)
epoch = timestamp << 22
otid = epoch + random.randrange(2 ** 22)
Finally, we can add this code to Messyger and have a fully(?)
functioning Messenger client that can fetch our list of conversations,
and send a text message to any one of them. The full code is on
GitHub.
What next?
Messyger is nowhere near a full Messenger client. I don’t intend to
add any more features, because Messyger is an educational case study
rather than a practical application. However, if you want to practice
your skills, you might consider figuring out how to handle some of the
complexities that Messyger glosses over; for example:
- We only had direct messages here; what about group chats?
- How do you fetch past messages, in addition to the latest one?
- How can you send and receive images, instead of just texts?
- How are emojis and reactions represented?
- How can you get information about when the recipient has read your
  message?
Or, you can find a website of your own that you wish exposed a better
API. Can you figure out how to make things work the way you want?
⛏️ Go forth and build something!
Important legal notice: This blog post is maintained by Radian
LLC.