oss-sec mailing list archives

Analysis on who is Jia Tan, and who he could work for, reading xz.git


From: Alejandro Colomar <alx () kernel org>
Date: Wed, 10 Apr 2024 05:16:52 +0200

Hi!

Regarding <https://tukaani.org/xz-backdoor/>

I've been researching xz.git to learn about this malicious actor, and
who he might have worked for.

This Jia Tan seems to work mostly with the +0800 timezone:

        $ git log --all --author 'Jia Tan' \
        | grep ^Date \
        | grep -o '[+-][0-9][0-9][0-9]0' \
        | sort \
        | uniq -c;
              4 +0200
             10 +0300
            676 +0800

According to <https://www.timeanddate.com/time/map/>, in the summer,
+0800 corresponds to China, or Taiwan, or Hong Kong, or Irkutsk (Russia),
or the Philippines, or other small countries around them.  None of the regions
in +0800 use DST.

In the summer, +0300 corresponds to, among others, Israel and Moscow,
and then a bunch of other less relevant countries, including most of
eastern Europe, Arabia, and some of eastern Africa.  Of those, some have
DST and some don't.  Israel has DST (so it is +0200 in the winter), but
Moscow has no DST.  It is very interesting that all of his commits that
use a +0200 timezone were made in the winter, which rules out countries
that are at +0200 only in the summer due to DST (central and western
Europe).  It also makes Israel a good candidate, since there are very
few countries that have +0200 in the winter, and Israel is one of them.
And all of his +0300 commits were made during the summer, which is
consistent with the same.  We can rule out Russia.  And we can probably
suspect Israel (there are other powerful states in +0300 with DST that
could have been behind this, but Israel has more antecedents in this
matter).
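
The winter/summer pattern described above can be checked mechanically.
What follows is only a rough sketch, not part of the original analysis:
the helper name check_season is made up, and treating April through
October as "summer" is an assumption.  Real DST transitions fall on
specific days in late March and late October, so dates close to those
boundaries would need a proper tz database lookup instead.

```shell
# check_season: hypothetical helper.  Reads lines ending in
# "YYYY-MM-DD HH:MM:SS +HHMM" and reports whether each +0200/+0300
# offset fits a zone that is +0200 in winter and +0300 in summer.
# Assumption: months 4..10 count as summer (an approximation of DST).
check_season() {
    awk 'NF >= 3 {
        day = $(NF - 2)
        off = $NF
        split(day, d, "-")
        month = d[2] + 0
        summer = (month >= 4 && month <= 10)
        if (off != "+0200" && off != "+0300")
            verdict = "other"
        else if ((off == "+0300" && summer) || (off == "+0200" && !summer))
            verdict = "consistent"
        else
            verdict = "inconsistent"
        print day, off, verdict
    }'
}

# A few of the author dates quoted in this mail.
printf '%s\n' \
    '2024-02-12 17:09:10 +0200' \
    '2023-07-18 13:27:46 +0300' \
    '2022-11-07 16:24:14 +0200' \
    '2022-10-06 21:53:09 +0300' \
| check_season
```

Any northern-hemisphere zone with the same winter/summer offsets would
pass this check equally well, so it can only exclude candidates, not
single one out.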

I suspect those +0200 and +0300 correspond to a few times that this guy
would have traveled to his intelligence agency for some special work; so
a hypothesis could be that he works for Israel and he lives in China.
Following that hypothesis, the commits made in the +0200 and +0300
timezones could be special, perhaps done under some intensive planning
in an intelligence agency with entire teams, and should get special
attention.  Two of those commits are very recent, from this year (2024),
and that travel might have had something to do with the fact that he
rushed recently.

        $ git log --all --author='Jia Tan' \
        | grep -B2 '^Date.*+0[23]00' \
        | grep -v ^Author;
        commit de5c5e417645ad8906ef914bc059d08c1462fc29
        Date:   Mon Feb 12 17:09:10 2024 +0200
        --
        commit e446ab7a18abfde18f8d1cf02a914df72b1370e3
        Date:   Mon Feb 12 17:09:10 2024 +0200
        --
        commit c972d44103c4edf88e73917ef08bde69db9d06cb
        Date:   Tue Jul 18 13:27:46 2023 +0300
        --
        commit 3d1fdddf92321b516d55651888b9c669e254634e
        Date:   Tue Jun 27 17:27:09 2023 +0300
        --
        commit d4674dfbb7d1df1feb841f5dbce6ae1f0b026879
        Date:   Mon Nov 7 16:24:14 2022 +0200
        --
        commit 1fc6e7dd1fabdb60124d449b99273330ccab3ff1
        Date:   Mon Nov 7 16:24:14 2022 +0200
        --
        commit 6a86e81cab202d0a812a7b2e9efacaf70c58ba38
        Date:   Thu Oct 6 21:53:09 2022 +0300
        --
        commit e7a7ac744eb0f890ef52388de838596ef566c73f
        Date:   Thu Sep 8 15:07:00 2022 +0300
        --
        commit ba3e4ba2de034ae93a513f9c3a0823b80cdb66dc
        Date:   Thu Sep 8 15:07:00 2022 +0300
        --
        commit 76a5a752b8467ff591dd028deb61e9bf2c274c7e
        Date:   Mon Jul 25 18:30:05 2022 +0300
        --
        commit 749b86c2c18ab61a07f19ec8fefc67325da97397
        Date:   Mon Jul 25 18:20:01 2022 +0300
        --
        commit 61f8ec804abdb4c5dac01e8ae9b90c7be58a5c24
        Date:   Mon Jul 25 18:30:05 2022 +0300
        --
        commit 4d80b463a1251aa22eabc87d2732fec13b1adda6
        Date:   Mon Jul 25 18:20:01 2022 +0300
        --
        commit 86a30b0255d8064169fabfd213d907016d2f9f2a
        Date:   Thu Jun 16 17:32:19 2022 +0300

It is also interesting that the last commit in that timezone was
especially large:

        $ git log --all --author='Jia Tan' \
        | grep -B2 '^Date.*+0[23]00' \
        | grep ^commit \
        | awk '{print $2}' \
        | xargs -L1 git log -1 --oneline --stat \
        | grep -e '^[^ ]' -e '^ [1-9]';
        de5c5e41 liblzma: Creates Non-resumable and Resumable modes for lzma_decoder.
         2 files changed, 532 insertions(+), 224 deletions(-)
        e446ab7a liblzma: Creates separate "safe" range decoder mode.
         2 files changed, 83 insertions(+), 104 deletions(-)
        c972d441 xz: Fix typo in man page.
         1 file changed, 1 insertion(+), 1 deletion(-)
        3d1fdddf Docs: Document the configure option --disable-ifunc in INSTALL.
         1 file changed, 8 insertions(+)
        d4674dfb xz: Avoid a compiler warning in progress_speed() in message.c.
         1 file changed, 3 insertions(+), 6 deletions(-)
        1fc6e7dd xz: Avoid a compiler warning in progress_speed() in message.c.
         1 file changed, 3 insertions(+), 6 deletions(-)
        6a86e81c Tests: Refactor test_stream_flags.c.
         1 file changed, 441 insertions(+), 142 deletions(-)
        e7a7ac74 CMake: Clarify a comment about Windows symlinks without file extension.
         1 file changed, 3 insertions(+), 4 deletions(-)
        ba3e4ba2 CMake: Clarify a comment about Windows symlinks without file extension.
         1 file changed, 3 insertions(+), 4 deletions(-)
        76a5a752 liblzma: Refactor lzma_mf_is_supported() to use a switch-statement.
         1 file changed, 14 insertions(+), 18 deletions(-)
        749b86c2 Build: Don't allow empty LIST in --enable-match-finders=LIST.
         1 file changed, 4 insertions(+)
        61f8ec80 liblzma: Refactor lzma_mf_is_supported() to use a switch-statement.
         1 file changed, 14 insertions(+), 18 deletions(-)
        4d80b463 Build: Don't allow empty LIST in --enable-match-finders=LIST.
         1 file changed, 4 insertions(+)
        86a30b02 Tests: Add more tests into test_check.
         2 files changed, 295 insertions(+), 7 deletions(-)

Compare that to a random sample of the same size of his commits.  In the
+0800 timezone, his commits are on average quite small, as is usual in
most open source projects.  The two most recent commits in +0200 make me
suspect he started working big on this attack around that time.  Here's
a random sample of commits written in +0800 (presumably China):

        $ git log --all --author='Jia Tan' \
        | grep -B2 '^Date.*+0800' \
        | grep ^commit \
        | awk '{print $2}' \
        | sort -R \
        | head -n14 \
        | sort \
        | xargs -L1 git log -1 --oneline --stat \
        | grep -e '^[^ ]' -e '^ [1-9]';
        02ca4a7d Translations: Patch man pages to avoid fuzzy matches.
         4 files changed, 4 insertions(+), 4 deletions(-)
        1f157d21 liblzma: Omit lzma_index_iter's internal field from Doxygen docs.
         1 file changed, 8 insertions(+), 1 deletion(-)
        2a89670a liblzma: Cleans up old commented out code.
         1 file changed, 11 deletions(-)
        5c9fdd3b Tests: Refactors existing filter flags tests.
         1 file changed, 483 insertions(+), 224 deletions(-)
        6fcf4671 liblzma: Highlight liblzma API headers should not be included directly.
         14 files changed, 42 insertions(+), 28 deletions(-)
        7f2293cd Translations: Update the Spanish translation.
         1 file changed, 253 insertions(+), 166 deletions(-)
        898aad9f xzmore: Fix typo in xzmore.1.
         1 file changed, 1 insertion(+), 1 deletion(-)
        a6234f67 Build: Update getopt.m4 from Gnulib.
         1 file changed, 39 insertions(+), 40 deletions(-)
        afb2dbec xz: Validate --flush-timeout for all specified filter chains.
         1 file changed, 16 insertions(+), 8 deletions(-)
        b2ba1a48 CI: Reorder 32-bit build first for Linux autotool builds.
         1 file changed, 12 insertions(+), 5 deletions(-)
        ca9015f4 liblzma: Check HAVE_USABLE_CLMUL before omitting CRC64 table.
         1 file changed, 2 insertions(+), 2 deletions(-)
        db176567 lib: Silence -Wsign-conversion in getopt.c.
         1 file changed, 3 insertions(+), 3 deletions(-)
        e22d0b0f Translations: Update the Spanish translation.
         1 file changed, 158 insertions(+), 161 deletions(-)
        e970c28a liblzma: Fix bug in lzma_str_from_filters() not checking filters[] length.
         1 file changed, 7 insertions(+)
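
To compare the two sets by numbers rather than by eye, the --stat
summary lines can be totalled.  This is a minimal sketch under stated
assumptions: stat_summary is a made-up helper name, it only looks at
lines containing "file(s) changed", and the mean it prints is truncated
to an integer.  With samples this small, a single large commit
dominates the mean, so the maximum is printed as well.

```shell
# stat_summary: hypothetical helper.  Reads "git log --stat" summary
# lines such as " 2 files changed, 532 insertions(+), 224 deletions(-)"
# and prints the commit count, total changed lines, integer mean, and
# the largest single commit.
stat_summary() {
    awk '/files? changed/ {
        lines = 0
        # A number counts if the next field names insertions/deletions.
        for (i = 1; i < NF; i++)
            if ($(i + 1) ~ /^(insertions?|deletions?)\(/)
                lines += $i
        n++
        total += lines
        if (lines > max)
            max = lines
    }
    END {
        if (n > 0)
            printf "commits=%d total=%d mean=%d max=%d\n",
                n, total, total / n, max
    }'
}

# The first four summary lines of the +0200/+0300 list above.
printf '%s\n' \
    ' 2 files changed, 532 insertions(+), 224 deletions(-)' \
    ' 2 files changed, 83 insertions(+), 104 deletions(-)' \
    ' 1 file changed, 1 insertion(+), 1 deletion(-)' \
    ' 1 file changed, 8 insertions(+)' \
| stat_summary
```

Appending "| stat_summary" to either of the two pipelines above should
give comparable figures for the +0200/+0300 set and the +0800 sample.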

After some time, I thought of also checking the committer dates:

        $ git log --all --committer 'Jia Tan' --pretty=fuller \
        | grep ^CommitDate: \
        | grep -o '+[0-9][0-9][0-9]0' \
        | sort \
        | uniq -c;
              4 +0200
            690 +0800

Hmmm, let's see when those commits have been done in +0200:

        $ git log --all --committer 'Jia Tan' --pretty=fuller \
        | grep -B2 '^CommitDate:.*+0[23]00' \
        | grep -v Author;
        Commit:     Jia Tan <jiat0218@gmail.com>
        CommitDate: Tue Mar 5 23:21:26 2024 +0200
        --
        Commit:     Jia Tan <jiat0218@gmail.com>
        CommitDate: Mon Mar 4 19:23:18 2024 +0200
        --
        Commit:     Jia Tan <jiat0218@gmail.com>
        CommitDate: Thu Feb 29 16:35:52 2024 +0200
        --
        Commit:     Jia Tan <jiat0218@gmail.com>
        CommitDate: Thu Feb 29 16:35:52 2024 +0200

All were very recent, probably coinciding with the rush for attacking.
And they also match Israel (or nearby) winter +0200 timezones.  The last
rush in this attack was probably the period that started around Feb 12,
until it got caught in late March.

If a nation-state wants to investigate this, it would be interesting to
investigate flights between Israel and China on the dates where there
are commits authored/committed in both timezones.  These could be the
dates this spy probably had international flights between (presumably)
China and Israel (possibly off by a few countries and hours/days, but
these are the most likely ones, IMO).  This is assuming a single person
had access to this account.

        $ git log --all --date=iso --pretty=fuller \
        | grep -A1 ':     Jia Tan <jiat0218@gmail.com>' \
        | grep Date \
        | pcre2grep -M -e '\+0[23]00\n.*\+0800' -e '\+0800\n.*\+0[23]00';
        CommitDate: 2024-03-09 09:20:57 +0800
        CommitDate: 2024-03-05 23:21:26 +0200
        CommitDate: 2024-03-05 00:27:31 +0800
        CommitDate: 2024-03-04 19:23:18 +0200
        CommitDate: 2024-02-29 16:35:52 +0200
        AuthorDate: 2024-02-27 23:42:41 +0800
        CommitDate: 2024-02-15 01:53:40 +0800
        AuthorDate: 2024-02-12 17:09:10 +0200
        AuthorDate: 2024-02-12 17:09:10 +0200
        AuthorDate: 2024-02-13 22:38:58 +0800
        AuthorDate: 2023-07-14 21:10:27 +0800
        AuthorDate: 2023-07-18 13:27:46 +0300
        CommitDate: 2023-06-27 23:56:06 +0800
        AuthorDate: 2023-06-27 17:27:09 +0300
        AuthorDate: 2022-11-19 23:18:04 +0800
        AuthorDate: 2022-11-07 16:24:14 +0200
        AuthorDate: 2022-11-07 16:24:14 +0200
        AuthorDate: 2022-10-23 21:01:08 +0800
        AuthorDate: 2022-10-06 21:53:09 +0300
        AuthorDate: 2022-10-06 17:00:38 +0800
        AuthorDate: 2022-09-02 20:18:55 +0800
        AuthorDate: 2022-09-08 15:07:00 +0300
        AuthorDate: 2022-09-02 20:18:55 +0800
        AuthorDate: 2022-09-08 15:07:00 +0300
        AuthorDate: 2022-07-25 18:20:01 +0300
        AuthorDate: 2022-07-01 21:19:26 +0800
        AuthorDate: 2022-06-16 17:32:19 +0300
        AuthorDate: 2022-06-12 11:31:40 +0800
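
For readers without pcre2grep, the same neighbour comparison can be
approximated in awk.  This is a sketch, not the command used for the
output above: tz_transitions is a made-up name, and unlike the
multi-line pcre2grep match it prints every adjacent pair separately, so
a line that sits between two transitions shows up twice.

```shell
# tz_transitions: hypothetical awk stand-in for the pcre2grep step.
# Reads "AuthorDate: 2024-02-12 17:09:10 +0200"-style lines and prints
# each adjacent pair whose offsets switch between +0800 and
# +0200/+0300, followed by a "--" separator.
tz_transitions() {
    awk '{
        off = $NF
        if (off == "+0800")
            zone = "east"
        else if (off == "+0200" || off == "+0300")
            zone = "west"
        else
            zone = ""
        if (zone != "" && prev_zone != "" && zone != prev_zone) {
            print prev_line
            print $0
            print "--"
        }
        prev_zone = zone
        prev_line = $0
    }'
}

# Four consecutive lines taken from the output above.
printf '%s\n' \
    'CommitDate: 2023-06-27 23:56:06 +0800' \
    'AuthorDate: 2023-06-27 17:27:09 +0300' \
    'AuthorDate: 2022-11-19 23:18:04 +0800' \
    'AuthorDate: 2022-11-07 16:24:14 +0200' \
| tz_transitions
```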

But there are some dates that seem to say that at least two people had
access to this account: It's not possible to travel from +0300 to +0800
in 1.5 hours.  So they don't necessarily correspond to travel, but
maybe just dates where there was collaboration with the mother country.

        CommitDate: 2023-06-27 23:56:06 +0800
        AuthorDate: 2023-06-27 17:27:09 +0300
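
The 1.5 hours figure comes from converting both timestamps to UTC:
23:56:06 +0800 is 15:56:06 UTC, and 17:27:09 +0300 is 14:27:09 UTC.
A small sketch of that arithmetic follows; utc_gap is a made-up helper
name, and it assumes GNU date, whose -d option accepts this timestamp
format.

```shell
# utc_gap: hypothetical helper.  Prints the number of seconds between
# two "YYYY-MM-DD HH:MM:SS +HHMM" timestamps (first minus second).
# Assumes GNU date; other date implementations parse -d differently.
utc_gap() {
    later=$(date -d "$1" +%s) || return 1
    earlier=$(date -d "$2" +%s) || return 1
    echo $((later - earlier))
}

gap=$(utc_gap '2023-06-27 23:56:06 +0800' '2023-06-27 17:27:09 +0300')
printf '%d s = %d h %d min\n' "$gap" "$((gap / 3600))" "$((gap % 3600 / 60))"
# prints: 5337 s = 1 h 28 min
```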

Considering these periods of (likely) extra malicious activity, I would
be especially suspicious of anything since at least 2024-02-12:

        $ git log --all --date=iso --pretty=fuller --since=2024-02-11 \
        | grep ':     Jia Tan <jiat0218@gmail.com>' -C1 \
        | grep ^commit \
        | awk '{print $2}' \
        | tail -n1 \
        | xargs git describe --contains;
        v5.5.2beta~51

And in general, pay special attention to every commit made in
+0200/+0300 by them.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>
