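Data Masking

The challenge provides a data file that has already been masked, but the masking scheme is flawed and can be reversed. We first have to recover the original data, then mask it again according to the new rules.

Pitfalls:

- The data file is large. Loading it with the json library may exhaust memory, so ijson can be used to process it as a stream.
- After the birth date is filled back into the ID card number, one digit is still missing. It has to be computed from the check digit to get the complete number.
- The bank card number also has to be recovered by working backwards from its check digit.
- The trailing part of the phone number has been reversed.

I did not have time to look at this challenge during the competition. I tried it afterwards and ended up with the script below. As far as I can tell it covers everything it needs to, yet it still fails the final check and I don't know why, so I'm recording it here for now 😢

    # The data argument has the format [ [name, gender, birth date, ID card number, phone number, password, phone number, bank card number, email], .... ]; the return value uses the same format.
    # Every field is a string. Mask the data argument and return the masked array.
    # Masking rules
    # 1. Name: keep the surname and replace the whole given name with "**", e.g. "张三" becomes "张**" and "诸葛亮" becomes "诸葛**". Compound surnames are limited to the 81 that still exist
    # (欧阳、太史、端木、上官、司马、东方、独孤、南宫、万俟、闻人、夏侯、诸葛、尉迟、公羊、赫连、澹台、皇甫、宗政、濮阳、公冶、太叔、申屠、公孙、慕容、仲孙、钟离、长孙、宇文、司徒、鲜于、司空、闾丘、子车、亓官、司寇、巫马、公西、颛孙、壤驷、公良、漆雕、乐正、宰父、谷梁、拓跋、夹谷、轩辕、令狐、段干、百里、呼延、东郭、南门、羊舌、微生、公户、公玉、公仪、梁丘、公仲、公上、公门、公山、公坚、左丘、公伯、西门、公祖、第五、公乘、贯丘、公皙、南荣、东里、东宫、仲长、子书、子桑、即墨、达奚、褚师、吴铭)
    # 2. Gender: keep the gender, written as "M" for male and "F" for female
    # 3. ID card number: hide the middle 10 digits, e.g. "140311198705150344" becomes "1403**********0344"
    # 4. Password: always replace with 12 asterisks "************", e.g. the password "123456" becomes "************"
    # 5. Phone number: keep the first three and last four digits and replace the rest with asterisks, e.g. "13812345678" becomes "138****5678"
    # 6. Bank card number: keep the first four and last four digits and replace the rest with asterisks, e.g. "6222026006705351988" becomes "6222***********1988"
    # 7. Email: keep the first and last characters of the part before the @ sign and replace the middle with 4 asterisks, e.g. "awh2aeg@foxmail.com" becomes "a****g@foxmail.com"
    import base64
    import json
    from hashlib import md5, sha256

    import ijson

    # fmt: off
    d_lastname = ("欧阳", "太史", "端木", "上官", "司马", "东方", "独孤", "南宫", "万俟", "闻人", "夏侯", "诸葛", "尉迟", "公羊", "赫连", "澹台", "皇甫", "宗政", "濮阳", "公冶", "太叔", "申屠", "公孙", "慕容", "仲孙", "钟离", "长孙", "宇文", "司徒", "鲜于", "司空", "闾丘", "子车", "亓官", "司寇", "巫马", "公西", "颛孙", "壤驷", "公良", "漆雕", "乐正", "宰父", "谷梁", "拓跋", "夹谷", "轩辕", "令狐", "段干", "百里", "呼延", "东郭", "南门", "羊舌", "微生", "公户", "公玉", "公仪", "梁丘", "公仲", "公上", "公门", "公山", "公坚", "左丘", "公伯", "西门", "公祖", "第五", "公乘", "贯丘", "公皙", "南荣", "东里", "东宫", "仲长", "子书", "子桑", "即墨", "达奚", "褚师", "吴铭")
    # fmt: on


    def mask_name(name: str) -> str:
        # Keep a two-character compound surname if present, otherwise one character.
        if name[:2] in d_lastname:
            return name[:2] + "**"
        else:
            return name[0] + "**"


    # Maps the ID card check character to (weighted sum mod 11).
    check_num = {
        "1": 0,
        "0": 1,
        "X": 2,
        "9": 3,
        "8": 4,
        "7": 5,
        "6": 6,
        "5": 7,
        "4": 8,
        "3": 9,
        "2": 10,
    }


    def mask_id_card(id: str, birthdate: str) -> str:
        def calculate_x(Y: int, Z: int):
            # The missing digit sits at index 14, whose weight is 8.
            for x in range(10):
                if (Y + x * 8) % 11 == Z:
                    return x
            return None

        # Rebuild the number: region code + birth date + unknown digit + last three characters.
        id = id[:6] + birthdate.replace("-", "") + "*" + id[-3:]
        # Weighted sum of the 16 known digits (index 14 is the unknown one).
        Y = (
            int(id[0]) * 7
            + int(id[1]) * 9
            + int(id[2]) * 10
            + int(id[3]) * 5
            + int(id[4]) * 8
            + int(id[5]) * 4
            + int(id[6]) * 2
            + int(id[7]) * 1
            + int(id[8]) * 6
            + int(id[9]) * 3
            + int(id[10]) * 7
            + int(id[11]) * 9
            + int(id[12]) * 10
            + int(id[13]) * 5
            + int(id[15]) * 4
            + int(id[16]) * 2
        )
        Z = check_num[id[-1].upper()]
        if (x := calculate_x(Y, Z)) is not None:
            id = id.replace("*", str(x))
        else:
            print(f"Calculation failed: {id}")
        return id[:4] + "*" * 10 + id[-4:]


    def mask_bank_card(id: str) -> str:
        def check(id_: str) -> bool:
            # Luhn check: counting from the right, double every second digit.
            check_num = 0
            for i, num in enumerate(id_[::-1]):
                id = i + 1
                if id % 2 == 1:
                    check_num += int(num)
                else:
                    if (x := int(num) * 2) < 10:
                        check_num += x
                    else:
                        check_num += (x // 10) + (x % 10)
            return check_num % 10 == 0

        # Undo the old masking: shift every digit down by one (0 wraps around to 9).
        origin = ""
        for i in id:
            if (j := int(i) - 1) == -1:
                origin += "9"
            else:
                origin += str(j)
        # Brute-force the missing check digit.
        for x in range(10):
            if check(origin + str(x)):
                origin += str(x)
                break
        return origin[:4] + "*" * (len(origin) - 8) + origin[-4:]


    def mask_phone(phone: str) -> str:
        # Undo the reversal of everything after the first three digits.
        phone = phone[:3] + phone[3:][::-1]
        return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]


    def mask_email(email_b64: str) -> str:
        # The old masking only base64-encoded the address.
        email = base64.b64decode(email_b64).decode()
        username, hostname = email.split("@")
        return f"{username[0]}****{username[-1]}@{hostname}"


    def data_mask(data: list) -> list:
        masked_data = []
        ###
        ### please write your code here
        count = 0
        for item in data:
            item_mask = []
            item_mask.append(mask_name(item[0]))  # name
            item_mask.append(item[1])  # gender (passed through unchanged, although rule 2 asks for "M"/"F")
            item_mask.append(item[2])  # birth date
            item_mask.append(mask_id_card(item[3], item[2]))  # ID card number
            item_mask.append(mask_phone(item[4]))  # phone number
            item_mask.append("*" * 12)  # password
            item_mask.append(mask_bank_card(item[6]))  # bank card number
            item_mask.append(mask_email(item[7]))  # email
            masked_data.append(item_mask)
            count += 1
            print(f"\rProcessed {count} records", end="")
            # Debugging limit: stops after 11 records, so the hash check below
            # only ever sees a partial result while this is in place.
            if count > 10:
                break
        print("\n")
        ### end
        return masked_data


    # Open in binary mode, which ijson expects, and stream the top-level array items.
    with open("./data.json", "rb") as f:
        data = ijson.items(f, "item")
        info = json.dumps(data_mask(data))
        print(info)
        sha256_sum, md5_sum = sha256(info.encode()).hexdigest(), md5(info.encode()).hexdigest()
        if sha256_sum == "c204be3d782e5d37a48b498364c60f4a610974e30d4aee76ca010ab0f8ba37cb":
            print(f"Correct! The submit answer is {md5_sum}")
        else:
            print("Wrong! Try again!")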
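
The ID card pitfall comes down to one piece of arithmetic: the 18th character is a check character computed from a weighted sum of the first 17 digits modulo 11, so a single missing digit can be found by trying 0 through 9. Below is a minimal standalone sketch of that idea. The names `WEIGHTS`, `CHECK_CHARS`, `check_char`, `recover_missing_digit` and the `?` placeholder are illustrative and not part of the challenge; the sample number is the one from the masking rules.

```python
from typing import Optional

# Weights for the first 17 digits and the check characters,
# indexed by (weighted sum mod 11).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def check_char(first17: str) -> str:
    """Return the check character for the first 17 digits of an ID number."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def recover_missing_digit(partial: str, placeholder: str = "?") -> Optional[str]:
    """Fill in a single placeholder located among the first 17 positions.

    Returns the completed 18-character ID, or None if no digit
    satisfies the checksum.
    """
    pos = partial.index(placeholder)
    for digit in "0123456789":
        candidate = partial[:pos] + digit + partial[pos + 1:]
        if check_char(candidate[:17]) == candidate[17].upper():
            return candidate
    return None


print(recover_missing_digit("14031119870515?344"))  # 140311198705150344
```

Because every weight is coprime to 11, at most one digit can satisfy the checksum, which is why a brute-force loop over ten candidates is enough. A `None` result means the other digits or the check character are already wrong.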
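
The bank card pitfall works the same way, assuming (as the script above does) that the card number ends in a Luhn check digit. The following is a small sketch of the Luhn computation on its own, without the challenge-specific digit shift. The function names are hypothetical, and the sample value is the commonly used Luhn test number rather than data from the challenge.

```python
def luhn_sum(number: str) -> int:
    """Luhn sum: counting from the right, double every second digit
    and add the digits of each result."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, counted from the right
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total


def luhn_valid(number: str) -> bool:
    """True if the full number, check digit included, passes the Luhn check."""
    return luhn_sum(number) % 10 == 0


def append_check_digit(body: str) -> str:
    """Append the digit that makes body + digit pass the Luhn check."""
    # Appending a 0 first puts the body digits at their final positions.
    check = (10 - luhn_sum(body + "0") % 10) % 10
    return body + str(check)


print(append_check_digit("7992739871"))  # 79927398713
```

Computing the digit directly gives the same result as trying all ten candidates, so either approach works for a one-off script.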