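Data Masking

The challenge provides a data file that has already been masked, but the masking method is flawed and can be reversed. We first need to recover the original data, then mask it again according to the new rules.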

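Pitfalls:

  • The data file is large, so loading it with the json library may run out of memory; ijson can process it as a stream instead
  • After the birth date is filled back into the ID number, one digit is still missing; it has to be computed from the check digit to get the complete ID number
  • The bank card number also has to be worked backwards from its check digit
  • The latter part of the phone number has been reversed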
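
To make the ID-number pitfall concrete, here is a minimal sketch of recovering one missing digit from the check character. The helper names `check_char` and `recover_missing_digit` and the `?` placeholder are made up for illustration and are not part of the challenge. The weights and check-character table are the standard ones for 18-digit ID numbers, and the sample number is the one quoted in the challenge statement.

```python
from typing import Optional

# Weights for the first 17 digits, and the check characters indexed by
# (weighted sum mod 11), as used by 18-digit ID numbers.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def check_char(first17: str) -> str:
    """Return the check character for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def recover_missing_digit(id_number: str, placeholder: str = "?") -> Optional[str]:
    """Fill in a single unknown digit among the first 17 positions.

    Returns the completed number, or None if no digit 0-9 fits.
    """
    for digit in "0123456789":
        candidate = id_number.replace(placeholder, digit, 1)
        if check_char(candidate[:17]) == candidate[17].upper():
            return candidate
    return None


print(recover_missing_digit("14031119870515?344"))  # 140311198705150344
```

Every weight is non-zero modulo 11, so at most one digit can satisfy the check for a given position, which is why a simple brute-force over 0-9 is enough.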

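I didn't have time to look at this challenge during the competition. Afterwards I gave it a try and ended up with the script below. I think it covers everything it needs to, yet for some reason it still doesn't pass, so I'm recording it here for now 😢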

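# The data argument has the format [ [name, gender, birth date, ID number, phone number, password, phone number, bank card number, email], .... ], and the return value uses the same format.
# Every field is a string. Mask the data argument and return the masked array.
# Masking rules
# 1. Name: keep the surname and replace the given name with "**", e.g. "张三" becomes "张**" and "诸葛亮" becomes "诸葛**". Compound surnames are limited to the 81 that still exist: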
#  (欧阳、太史、端木、上官、司马、东方、独孤、南宫、万俟、闻人、夏侯、诸葛、尉迟、公羊、赫连、澹台、皇甫、宗政、濮阳、公冶、太叔、申屠、公孙、慕容、仲孙、钟离、长孙、宇文、司徒、鲜于、司空、闾丘、子车、亓官、司寇、巫马、公西、颛孙、壤驷、公良、漆雕、乐正、宰父、谷梁、拓跋、夹谷、轩辕、令狐、段干、百里、呼延、东郭、南门、羊舌、微生、公户、公玉、公仪、梁丘、公仲、公上、公门、公山、公坚、左丘、公伯、西门、公祖、第五、公乘、贯丘、公皙、南荣、东里、东宫、仲长、子书、子桑、即墨、达奚、褚师、吴铭)
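# 2. Gender: keep it, writing "M" for male and "F" for female
# 3. ID number: hide the middle 10 digits, e.g. "140311198705150344" becomes "1403**********0344"
# 4. Password: always replace with 12 asterisks "************", e.g. the password "123456" becomes "************"
# 5. Phone number: keep the first three and last four digits and replace the rest with asterisks, e.g. "13812345678" becomes "138****5678"
# 6. Bank card number: keep the first four and last four digits and replace the rest with asterisks, e.g. "6222026006705351988" becomes "6222***********1988"
# 7. Email: keep the first and last characters of the part before the @ sign and replace everything between them with 4 asterisks, e.g. "awh2aeg@foxmail.com" becomes "a****g@foxmail.com"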
 
import base64
import json
import ijson
from hashlib import md5, sha256
 
# fmt: off
d_lastname = ("欧阳", "太史", "端木", "上官", "司马", "东方", "独孤", "南宫", "万俟", "闻人", "夏侯", "诸葛", "尉迟", "公羊", "赫连", "澹台", "皇甫", "宗政", "濮阳", "公冶", "太叔", "申屠", "公孙", "慕容", "仲孙", "钟离", "长孙", "宇文", "司徒", "鲜于", "司空", "闾丘", "子车", "亓官", "司寇", "巫马", "公西", "颛孙", "壤驷", "公良", "漆雕", "乐正", "宰父", "谷梁", "拓跋", "夹谷", "轩辕", "令狐", "段干", "百里", "呼延", "东郭", "南门", "羊舌", "微生", "公户", "公玉", "公仪", "梁丘", "公仲", "公上", "公门", "公山", "公坚", "左丘", "公伯", "西门", "公祖", "第五", "公乘", "贯丘", "公皙", "南荣", "东里", "东宫", "仲长", "子书", "子桑", "即墨", "达奚", "褚师", "吴铭")
# fmt: on
 
 
def mask_name(name: str) -> str:
    if name[:2] in d_lastname:
        return name[:2] + "**"
    else:
        return name[0] + "**"
 
 
# Maps each ID check character to the value of (weighted sum mod 11) that produces it.
check_num = {
    "1": 0,
    "0": 1,
    "X": 2,
    "9": 3,
    "8": 4,
    "7": 5,
    "6": 6,
    "5": 7,
    "4": 8,
    "3": 9,
    "2": 10,
}
 
 
# Weights for the first 17 digits of an 18-digit ID number.
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)


def mask_id_card(id_number: str, birthdate: str) -> str:
    # Rebuild the number: region code + birth date + one unknown digit + last
    # three characters. Index 14 is the digit that is still missing.
    id_number = id_number[:6] + birthdate.replace("-", "") + "*" + id_number[-3:]
    # Weighted sum over the 16 known digits.
    partial = sum(int(id_number[i]) * ID_WEIGHTS[i] for i in range(17) if i != 14)
    # Remainder (mod 11) that the stored check character corresponds to.
    target = check_num[id_number[-1].upper()]
    for x in range(10):
        if (partial + x * ID_WEIGHTS[14]) % 11 == target:
            id_number = id_number.replace("*", str(x))
            break
    else:
        print(f"Recovery failed: {id_number}")
    return id_number[:4] + "*" * 10 + id_number[-4:]
 
 
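def mask_bank_card(card: str) -> str:
    def luhn_ok(number: str) -> bool:
        # Luhn check: walking from the right, double every second digit and
        # add the digits of each product; the total must be a multiple of 10.
        total = 0
        for pos, ch in enumerate(reversed(number), start=1):
            digit = int(ch)
            if pos % 2 == 0:
                digit *= 2
                if digit > 9:
                    digit -= 9  # same as adding the two digits of the product
            total += digit
        return total % 10 == 0

    # Undo the assumed per-digit shift: subtract 1 from every digit, with 0
    # wrapping back to 9.
    origin = "".join(str((int(ch) - 1) % 10) for ch in card)
    # Append the digit that makes the full number pass the Luhn check.
    for x in range(10):
        if luhn_ok(origin + str(x)):
            origin += str(x)
            break
    return origin[:4] + "*" * (len(origin) - 8) + origin[-4:]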
 
 
def mask_phone(phone: str) -> str:
    # Undo the reversal applied to everything after the first three digits.
    phone = phone[:3] + phone[3:][::-1]
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]
 
 
def mask_email(email_b64: str) -> str:
    # The stored email is Base64-encoded; decode it before masking.
    email = base64.b64decode(email_b64).decode()
    username, hostname = email.split("@")
    return f"{username[0]}****{username[-1]}@{hostname}"
 
 
def data_mask(data: list) -> list:
    masked_data = []
    ###
    ### please write your code here
    # Assumption: the raw file may store gender as 男/女; values that are
    # already "M"/"F" pass through unchanged.
    gender_map = {"男": "M", "女": "F"}
    count = 0
    for item in data:
        item_mask = []
        item_mask.append(mask_name(item[0]))  # name
        item_mask.append(gender_map.get(item[1], item[1]))  # gender
        item_mask.append(item[2])  # birth date
        item_mask.append(mask_id_card(item[3], item[2]))  # ID number
        item_mask.append(mask_phone(item[4]))  # phone number
        item_mask.append("*" * 12)  # password
        item_mask.append(mask_bank_card(item[6]))  # bank card number
        item_mask.append(mask_email(item[7]))  # email
        masked_data.append(item_mask)

        count += 1
        print(f"\rProcessed {count} records", end="")
    print("\n")
    ### end
    return masked_data
 
 
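# Open in binary mode, which is what ijson expects for file input.
with open("./data.json", "rb") as f:
    data = ijson.items(f, "item")  # stream the top-level array one record at a time
    info = json.dumps(data_mask(data))
    print(info)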
sha256_sum, md5_sum = sha256(info.encode()).hexdigest(), md5(info.encode()).hexdigest()
 
if sha256_sum == "c204be3d782e5d37a48b498364c60f4a610974e30d4aee76ca010ab0f8ba37cb":
    print(f"Correct! The submit answer is {md5_sum}")
else:
    print("Wrong! Try again!")
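
The bank-card step in the script relies on the Luhn checksum. As a standalone sketch, the two pieces can be tried on a commonly cited Luhn test number, 79927398713. The function names `luhn_check_digit` and `unshift_digits` are made up for illustration, and the per-digit shift only mirrors what the script above assumes about the flawed masking; it is not confirmed by the challenge.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the digit that makes payload + digit pass the Luhn check."""
    total = 0
    # Walk the payload from the right: the digit next to the (future) check
    # digit is doubled, then every second digit after it.
    for pos, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if pos % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return str((10 - total % 10) % 10)


def unshift_digits(shifted: str) -> str:
    """Subtract 1 from every digit, wrapping 0 back to 9 (assumed masking)."""
    return "".join(str((int(ch) - 1) % 10) for ch in shifted)


payload = unshift_digits("8003840982")
print(payload + luhn_check_digit(payload))  # 79927398713
```

Computing the check digit directly avoids trying all ten candidates, although the brute-force loop in the script gives the same result.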